claude-dev-env 1.37.0 → 1.38.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +3 -0
- package/_shared/pr-loop/audit-contract.md +4 -3
- package/_shared/pr-loop/fix-protocol.md +2 -0
- package/_shared/pr-loop/gh-payloads.md +38 -37
- package/_shared/pr-loop/scripts/README.md +0 -1
- package/_shared/pr-loop/scripts/preflight.py +2 -1
- package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +2 -2
- package/_shared/pr-loop/scripts/tests/test_preflight.py +22 -0
- package/_shared/pr-loop/state-schema.md +10 -10
- package/agents/clean-coder.md +4 -0
- package/agents/code-quality-agent.md +23 -85
- package/agents/groq-coder.md +8 -6
- package/hooks/blocking/__init__.py +0 -0
- package/hooks/blocking/hedging_language_blocker.py +2 -2
- package/hooks/blocking/state_description_blocker.py +243 -0
- package/hooks/blocking/tdd_enforcer.py +94 -0
- package/hooks/blocking/test_hedging_language_blocker.py +1 -1
- package/hooks/blocking/test_state_description_blocker.py +618 -0
- package/hooks/blocking/test_tdd_enforcer.py +152 -0
- package/hooks/config/state_description_blocker_constants.py +130 -0
- package/hooks/hooks.json +10 -0
- package/package.json +1 -1
- package/rules/gh-paginate.md +4 -50
- package/rules/no-historical-clutter.md +57 -0
- package/scripts/config/groq_bugteam_config.py +13 -5
- package/skills/bugteam/CONSTRAINTS.md +20 -27
- package/skills/bugteam/EXAMPLES.md +1 -1
- package/skills/bugteam/PROMPTS.md +78 -42
- package/skills/bugteam/SKILL.md +76 -63
- package/skills/bugteam/SKILL_EVALS.md +12 -12
- package/skills/bugteam/reference/audit-and-teammates.md +21 -48
- package/skills/bugteam/reference/audit-contract.md +7 -7
- package/skills/bugteam/reference/github-pr-reviews.md +31 -31
- package/skills/bugteam/reference/team-setup.md +1 -1
- package/skills/bugteam/reference/teardown-publish-permissions.md +4 -4
- package/skills/copilot-review/SKILL.md +7 -14
- package/skills/findbugs/SKILL.md +2 -2
- package/skills/fixbugs/SKILL.md +1 -1
- package/skills/monitor-open-prs/SKILL.md +6 -6
- package/skills/pr-converge/SKILL.md +7 -6
- package/skills/pr-converge/reference/convergence-gates.md +46 -44
- package/skills/pr-converge/reference/examples.md +4 -4
- package/skills/pr-converge/reference/fix-protocol.md +8 -8
- package/skills/pr-converge/reference/multi-pr-orchestration.md +10 -10
- package/skills/pr-converge/reference/per-tick.md +24 -36
- package/skills/pr-converge/reference/stop-conditions.md +7 -7
- package/skills/pr-converge/scripts/README.md +65 -117
- package/skills/pr-review-responder/EXAMPLES.md +2 -2
- package/skills/pr-review-responder/PRINCIPLES.md +2 -8
- package/skills/pr-review-responder/README.md +7 -48
- package/skills/pr-review-responder/SKILL.md +2 -3
- package/skills/pr-review-responder/TESTING.md +8 -65
- package/skills/qbug/SKILL.md +10 -16
- package/_shared/pr-loop/scripts/config/gh_util_constants.py +0 -31
- package/_shared/pr-loop/scripts/gh_util.py +0 -193
- package/_shared/pr-loop/scripts/tests/test_gh_util.py +0 -257
- package/_shared/pr-loop/scripts/tests/test_gh_util_constants.py +0 -61
- package/skills/pr-converge/scripts/check_pr_mergeability.py +0 -78
- package/skills/pr-converge/scripts/config/pr_converge_constants.py +0 -118
- package/skills/pr-converge/scripts/config/test_pr_converge_constants.py +0 -152
- package/skills/pr-converge/scripts/fetch_bugbot_inline_comments.py +0 -70
- package/skills/pr-converge/scripts/fetch_bugbot_reviews.py +0 -57
- package/skills/pr-converge/scripts/fetch_claude_inline_comments.py +0 -70
- package/skills/pr-converge/scripts/fetch_claude_reviews.py +0 -61
- package/skills/pr-converge/scripts/fetch_copilot_inline_comments.py +0 -70
- package/skills/pr-converge/scripts/fetch_copilot_reviews.py +0 -61
- package/skills/pr-converge/scripts/mark_pr_ready.py +0 -54
- package/skills/pr-converge/scripts/post-bugbot-run.helpers.ps1 +0 -49
- package/skills/pr-converge/scripts/post-bugbot-run.ps1 +0 -33
- package/skills/pr-converge/scripts/reply_to_inline_comment.py +0 -84
- package/skills/pr-converge/scripts/request_copilot_review.py +0 -71
- package/skills/pr-converge/scripts/resolve_pr_head.py +0 -58
- package/skills/pr-converge/scripts/review_field_helpers.py +0 -43
- package/skills/pr-converge/scripts/reviewer_fetch_core.py +0 -153
- package/skills/pr-converge/scripts/reviewer_specs.py +0 -98
- package/skills/pr-converge/scripts/test_check_pr_mergeability.py +0 -126
- package/skills/pr-converge/scripts/test_fetch_bugbot_inline_comments.py +0 -443
- package/skills/pr-converge/scripts/test_fetch_bugbot_reviews.py +0 -299
- package/skills/pr-converge/scripts/test_fetch_claude_inline_comments.py +0 -485
- package/skills/pr-converge/scripts/test_fetch_claude_reviews.py +0 -368
- package/skills/pr-converge/scripts/test_fetch_copilot_inline_comments.py +0 -440
- package/skills/pr-converge/scripts/test_fetch_copilot_reviews.py +0 -366
- package/skills/pr-converge/scripts/test_mark_pr_ready.py +0 -69
- package/skills/pr-converge/scripts/test_post_bugbot_run.py +0 -195
- package/skills/pr-converge/scripts/test_reply_to_inline_comment.py +0 -159
- package/skills/pr-converge/scripts/test_request_copilot_review.py +0 -101
- package/skills/pr-converge/scripts/test_resolve_pr_head.py +0 -79
- package/skills/pr-converge/scripts/test_review_field_helpers.py +0 -80
- package/skills/pr-converge/scripts/test_reviewer_fetch_core.py +0 -448
- package/skills/pr-converge/scripts/test_reviewer_specs.py +0 -107
- package/skills/pr-converge/scripts/test_trigger_bugbot.py +0 -139
- package/skills/pr-converge/scripts/test_view_pr_context.py +0 -111
- package/skills/pr-converge/scripts/trigger_bugbot.py +0 -77
- package/skills/pr-converge/scripts/view_pr_context.py +0 -47
- package/skills/pr-review-responder/scripts/respond_to_reviews.py +0 -376
@@ -10,21 +10,40 @@ Keep the spawn prompt self-contained: reference only the PR scope, audit rubric,
 <branch>head ref</branch>
 <base_branch>base ref</base_branch>
 <pr_url>full URL</pr_url>
-<loop>
+<loop>L</loop>
 <pr_number>N</pr_number>
 <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
 </context>

-cd into `<worktree_path>` before any git
+cd into `<worktree_path>` before any git or file operation.

 <scope>
 <diff_path>Absolute path to the per-PR patch file: <run_temp_dir>/pr-<N>/loop-<L>.patch (same path as gh pr diff redirect in AUDIT)</diff_path>
 <scope_rule>Audit only lines added or modified in the diff. Pre-existing code on untouched lines is out of scope.</scope_rule>
+<changed_files_rule>Build the list of changed file paths from the diff. Open each one with Read and audit cross-file consistency. Read every changed test file and cross-reference test assertions, expected values, and mock setup against the production code's config constants and function signatures. When a test file asserts a value that diverges from config, file a finding under category J.</changed_files_rule>
 </scope>

 <bug_categories>
-Investigate each
-one finding OR a verified-clean entry with the
+Investigate each of the eleven categories (A–K) explicitly. For each,
+return either at least one finding OR a verified-clean entry with the
+evidence used to clear it. A category is verified-clean only when one
+complete execution path through the changed code has been traced from
+entry to exit. Surface-level scanning is insufficient evidence. The
+evidence field must name (1) the specific function examined, (2) the
+code path traced from entry to exit, and (3) the specific check performed.
+Generic phrases such as "verified clean", "no issues found",
+"pattern appears correct", "looks good", "seems fine", and
+"no problems detected" do not satisfy the verified-clean requirement.
+When evidence contains any of these phrases, the category is not
+verified-clean -- re-audit with a concrete trace.
+
+Categories A–K (one-line summary; full rubric and sub-bucket decomposition
+for each is in `packages/claude-dev-env/audit-rubrics/category_rubrics/`;
+ready-to-send Variant C prompts — each with a PR/repo-independent
+generalized skeleton above a `---` separator and a worked example against
+an authentic PR below — are in
+`packages/claude-dev-env/audit-rubrics/prompts/`):
+
 A. API contract verification (signatures, return types, async/await correctness)
 B. Selector / query / engine compatibility
 C. Resource cleanup and lifecycle (file handles, connections, processes, locks)
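
The verified-clean rule above rejects any evidence that contains one of six generic phrases. A minimal Python sketch of just that check, assuming evidence arrives as a plain string and that a case-insensitive substring match is close enough; the names `GENERIC_EVIDENCE_PHRASES` and `evidence_is_acceptable` are hypothetical, not part of the package. It cannot confirm that a real entry-to-exit trace was performed, so it works as a filter, not a substitute for the three-part evidence requirement.

```python
# Phrases the audit prompt lists as insufficient verified-clean evidence.
GENERIC_EVIDENCE_PHRASES: tuple[str, ...] = (
    "verified clean",
    "no issues found",
    "pattern appears correct",
    "looks good",
    "seems fine",
    "no problems detected",
)


def evidence_is_acceptable(evidence: str) -> bool:
    """Return False when evidence is blank or contains any generic phrase.

    Hypothetical helper: a case-insensitive substring check only. It does not
    verify that the evidence names a function, a traced path, and a check.
    """
    normalized_evidence = evidence.strip().lower()
    if not normalized_evidence:
        return False
    return not any(
        phrase in normalized_evidence for phrase in GENERIC_EVIDENCE_PHRASES
    )
```

For example, `evidence_is_acceptable("Looks good to me")` is rejected, while an entry such as "Traced parse_config() from open() to return; the handle closes on the ValueError path." passes the phrase filter and still needs a human or validator to judge the trace itself.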
@@ -35,6 +54,10 @@ cd into `<worktree_path>` before any git, gh, or file operation.
 H. Security boundaries (injection, path traversal, auth bypass, secret leakage)
 I. Concurrency hazards (race conditions, missing awaits, shared mutable state)
 J. Magic values and configuration drift
+K. Codebase conflicts — a change updates one site of a pattern but a parallel
+site in unchanged code stays stale, producing contradictory behavior;
+the diff is internally consistent, the bug emerges only against unchanged
+code (canonical example: jl-cmd/claude-code-config PR #397 r3210166636)
 </bug_categories>

 <constraints>
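
With category K added, each audit must close out eleven categories (A–K), and each one needs either a finding or a verified-clean entry. A sketch of a coverage check a lead could run after reading the outcome XML, assuming the category letters have already been extracted into two lists; `AUDIT_CATEGORIES` and `uncovered_categories` are hypothetical names, not part of the package.

```python
from string import ascii_uppercase

# The eleven audit categories, A through K.
AUDIT_CATEGORIES: tuple[str, ...] = tuple(ascii_uppercase[:11])


def uncovered_categories(
    finding_categories: list[str], verified_clean_categories: list[str]
) -> list[str]:
    """Return categories with neither a finding nor a verified-clean entry, in A..K order."""
    covered = set(finding_categories) | set(verified_clean_categories)
    return [category for category in AUDIT_CATEGORIES if category not in covered]
```

An audit that reports findings under A and J and verified-clean entries for B through I would get `["K"]` back, flagging the newly added codebase-conflicts category as skipped.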
@@ -42,51 +65,57 @@ cd into `<worktree_path>` before any git, gh, or file operation.
 - Cite file:line for every finding.
 - When the diff alone does not provide enough context to confirm a bug,
 list it under "Open questions" rather than assert it.
+- For every finding, search `git grep` for all callers of the targeted function. When the obvious fix would silently change behavior for other call paths, include a fix constraint that preserves them.
 </constraints>

 <comment_posting>
-
-
-
-
-
-
-
-
-
-
-
+Sibling auditors (-b through -k): run only steps 1–2 (audit, assign IDs,
+capture excerpt, validate anchors), then write outcome XML per <output_format> and return.
+Skip steps 3–5 — sibling auditors do not post PR reviews.
+
+Validator (-a) and single-opus auditors: run all steps below.
+
+1. Audit the diff against the 11 categories above. Buffer the findings
+in memory; all posting happens at step 4 once anchors are validated.
+2. Assign each finding a stable finding_id of exactly the form `loop<L>-<K>`
+where <K> is 1-based within this loop.
+3. For each finding, capture a verbatim excerpt from the target file at the cited
+line. Populate the `<excerpt>` element in the outcome XML with it. Validate
+every finding's (file, line) against the captured diff. Split findings into two
+buckets: anchored (line is in the diff) and unanchored (line is not in the diff
+— goes into the review body's "Findings without a diff anchor" section per
+Step 2.5). Format each finding body as:

 **[severity] one-line title**
 Category: <letter> (<category name>)
 <2-3 sentence description with concrete trace>

-_From /bugteam audit loop
-
-
-
-
-
-
-
-
-
-
-
+_From /bugteam audit loop <L>._
+
+4. Post ONE review via `pull_request_review_write(method="create",
+event="COMMENT", body=<review_body>, owner=<O>, repo=<R>,
+pullNumber=<N>, comments=[...])`. See Step 2.5 in SKILL.md for the full
+parameter shape. Harvest the parent review `html_url` from the response
+and the `comments[]` child entries (each with its own `id` and `html_url`).
+Match child entries to anchored findings in index order.
+5. If the review POST fails, use `add_issue_comment(owner=<O>, repo=<R>,
+issueNumber=<N>, body=<full_text>)` as fallback.
+Body text is passed directly as string parameters to the MCP tool calls —
+no temp files, no jq, no shell pipes.
 </comment_posting>

 <output_format>
-For the
-the PR's worktree directory (<worktree_path>). For sibling auditors (-b
+For the (-a) validator: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
+the PR's worktree directory (<worktree_path>). For sibling auditors (-b through -k): write to <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml (absolute path passed in prompt). Sibling auditors do not post PR reviews; set review_url, finding_comment_id, and finding_comment_url to empty strings, and used_fallback to "false". Omit unanchored findings from sibling output — only the validator handles those. Return only that path on stdout. The schema:
 </output_format>
 ```

 ## AUDIT outcome XML schema (bugfind writes this)

 ```xml
-<bugteam_audit loop="<
+<bugteam_audit loop="<L>" review_url="<url>">
 <finding
-finding_id="loop<
+finding_id="loop<L>-<K>"
 severity="P0|P1|P2"
 category="<letter>"
 file="<path>"
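
Steps 2 and 3 of the comment-posting block assign `loop<L>-<K>` ids and then split findings into anchored and unanchored buckets by checking each (file, line) against the captured diff. A minimal sketch of that split, assuming the patch is a standard unified diff with `+++ b/<path>` headers and that "in the diff" means any RIGHT-side line inside a hunk (added or context), since those are the lines a review comment can target. `Finding`, `right_side_lines`, and `assign_ids_and_split` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Captures old-count, new-start, new-count from "@@ -a,b +c,d @@" (counts default to 1).
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Finding:
    file: str
    line: int
    title: str
    finding_id: str = ""


def right_side_lines(diff_text: str) -> set[tuple[str, int]]:
    """Return (path, new-file line) pairs that are commentable on the RIGHT side."""
    anchorable: set[tuple[str, int]] = set()
    current_path = ""
    new_line = 0
    old_remaining = 0
    new_remaining = 0
    for raw_line in diff_text.splitlines():
        if old_remaining <= 0 and new_remaining <= 0:
            # Outside a hunk: only file headers and hunk headers matter.
            if raw_line.startswith("+++ "):
                target = raw_line[4:]
                current_path = target[2:] if target.startswith("b/") else target
                continue
            header_match = HUNK_HEADER.match(raw_line)
            if header_match:
                old_remaining = int(header_match.group(1) or "1")
                new_line = int(header_match.group(2))
                new_remaining = int(header_match.group(3) or "1")
            continue
        if raw_line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        if raw_line.startswith("-"):
            old_remaining -= 1  # LEFT side only: no new-file line number
        elif raw_line.startswith("+"):
            anchorable.add((current_path, new_line))
            new_line += 1
            new_remaining -= 1
        else:
            # Context line (leading space, or empty if whitespace was stripped).
            anchorable.add((current_path, new_line))
            new_line += 1
            old_remaining -= 1
            new_remaining -= 1
    return anchorable


def assign_ids_and_split(
    findings: list[Finding], diff_text: str, loop_number: int
) -> tuple[list[Finding], list[Finding]]:
    """Give each finding a loop<L>-<K> id (K is 1-based), then split by anchor."""
    anchorable = right_side_lines(diff_text)
    anchored: list[Finding] = []
    unanchored: list[Finding] = []
    for index, finding in enumerate(findings, start=1):
        finding.finding_id = f"loop{loop_number}-{index}"
        bucket = anchored if (finding.file, finding.line) in anchorable else unanchored
        bucket.append(finding)
    return anchored, unanchored
```

Ids are assigned before the split, so unanchored findings keep their place in the `<K>` sequence even though they end up in the review body's "Findings without a diff anchor" section rather than in `comments[]`.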
@@ -96,6 +125,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
 used_fallback="true|false"
 >
 <title>one-line title</title>
+<excerpt>verbatim source line or snippet from the file at the cited line</excerpt>
 <description>2-3 sentence description with concrete trace</description>
 </finding>
 <verified_clean>
@@ -114,17 +144,17 @@ After the teammate writes the XML and returns, the lead reads `.bugteam-pr<N>-lo
 <branch>head</branch>
 <base_branch>base</base_branch>
 <pr_url>url</pr_url>
-<loop>
+<loop>L</loop>
 <pr_number>N</pr_number>
 <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
 </context>

-cd into `<worktree_path>` before any git
+cd into `<worktree_path>` before any git or file operation.

 <bugs_to_fix>
 [for each P0/P1/P2 finding from last_findings:]
 <bug
-finding_id="loop<
+finding_id="loop<L>-<K>"
 severity="P0|P1|P2"
 file="<path>"
 line="<int>"
@@ -140,25 +170,28 @@ cd into `<worktree_path>` before any git, gh, or file operation.
 1. Read each referenced file before editing.
 2. Apply each fix you can address.
 3. Run `python -m py_compile` (or language-equivalent) on every modified file.
-4.
+4. Run the project's test suite and confirm all existing tests pass. If a test fails, diagnose the regression and fix it before committing.
+5. Read the previous loop's outcome XML (`<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`) and obtain its total finding count. If this is the first loop (L <= 1) or the file does not exist, skip this comparison. Otherwise, re-read each changed file and count any new violations. Compute the post-fix total: previous total minus bugs fixed in this round plus new violations. If the post-fix total exceeds the previous total, flag all new findings as same-loop fix-targets and revise before committing.
+6. git add by explicit path, then git commit with a message summarizing the bugs fixed.
 - If the commit fails because a git hook (pre-commit, commit-msg, etc.) blocked it,
 capture the hook's stderr, write status=hook_blocked for every finding in this loop
 (the commit was atomic; if it failed, no finding was applied), populate hook_output
 on each outcome, and return WITHOUT retrying. The lead will treat this loop as no-progress.
-
-
-
+7. git push with a plain fast-forward push (the default, no flag overrides).
+8. For each bug, post a fix reply to its finding_comment_id via
+`add_reply_to_pull_request_comment(commentId=<id>, body=<reply_text>,
+owner=<O>, repo=<R>, pullNumber=<N>)`:
 - "Fixed in <commit_sha>" if the bug was addressed by your commit
 - "Could not address this loop: <one-line reason>" if you skipped or failed it
 - "Hook blocked the fix commit: <one-line summary>" if the commit was hook-blocked
-
-
+Body text is passed directly as string parameters -- no temp files, no jq, no shell pipes.
+9. Write `.bugteam-pr<N>-loop<L>.fix-outcomes.xml` inside `<worktree_path>` (schema below) and return its path.
 </execution>

 <outcome_xml_schema>
-<bugteam_fix loop="<
+<bugteam_fix loop="<L>" commit_sha="<sha or empty if no commit>">
 <outcome
-finding_id="loop<
+finding_id="loop<L>-<K>"
 status="fixed|could_not_address|hook_blocked"
 commit_sha="<sha if fixed, empty otherwise>"
 reply_comment_id="<id of the reply posted>"
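
Step 5 of the fix prompt (and the **No regression** constraint further down) reduces to a small piece of arithmetic: post-fix total = previous total − bugs fixed + new violations, and a fix is a regression when that exceeds the previous total. A sketch under stated assumptions: the previous loop's outcome XML follows the schema shown earlier (direct `<finding>` children of `<bugteam_audit>`), and counting "new violations" is left to the caller. `previous_finding_total` and `is_regression` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def previous_finding_total(
    worktree_path: Path, pr_number: int, loop_number: int
) -> Optional[int]:
    """Count <finding> elements in the previous loop's outcome XML.

    Returns None when the guard should be skipped: first loop (L <= 1) or no file.
    """
    if loop_number <= 1:
        return None
    previous_xml = (
        worktree_path / f".bugteam-pr{pr_number}-loop{loop_number - 1}.outcomes.xml"
    )
    if not previous_xml.is_file():
        return None
    root = ET.parse(previous_xml).getroot()
    return len(root.findall("finding"))  # direct <finding> children only


def is_regression(
    previous_total: Optional[int], bugs_fixed: int, new_violations: int
) -> bool:
    """True when previous - fixed + new exceeds previous (i.e. new > fixed)."""
    if previous_total is None:
        return False  # comparison skipped, so nothing to flag
    post_fix_total = previous_total - bugs_fixed + new_violations
    return post_fix_total > previous_total
```

With 3 previous findings, fixing 1 while introducing 2 new violations gives a post-fix total of 4 and is flagged; fixing 1 and introducing 1 stays flat at 3, which the constraint allows.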
@@ -179,5 +212,8 @@ cd into `<worktree_path>` before any git, gh, or file operation.
 - git add by explicit path — name each file being staged.
 - Preserve existing comments on lines you do not modify.
 - Type hints on every signature you touch.
+- **Narrow scope.** Fix only the exact defect at the specified file:line. No restructuring, no inlining helpers, no renames, no "while I'm here" cleanup.
+- **Preserve helpers.** Do not remove or inline existing helper functions unless the finding explicitly names the helper as the problem.
+- **No regression.** Before committing, re-read each changed file and count any new violations. Compare the post-fix total (previous total minus bugs fixed plus new violations) against the previous loop's total finding count (from `<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`). On the first loop (L <= 1) or when the file does not exist, skip this guard. The post-fix total must be flat or decreased relative to the previous loop. An increase means the fix introduced new bugs — revise before committing. Do not commit a regression.
 </constraints>
 ```
package/skills/bugteam/SKILL.md (changed)
@@ -120,9 +120,9 @@ Non-zero → stop. Revoke in Step 5 on every exit path.

 ### Step 1: Resolve PR scope (once)

-Accept one or more PR numbers from the invocation. For each PR,
-
-path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
+Accept one or more PR numbers from the invocation. For each PR, call
+`pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` (falling back
+to the merge-base diff path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
 headRef, url}, ...]`. A single-PR invocation produces a one-element list and
 follows the same downstream rules.

@@ -184,43 +184,36 @@ only PR write before Step 4.5 is the final description rewrite.

 Order: audit → buffer → validate anchors vs diff → single review POST.
 Review body states counts; zero findings → still one review, `comments: []`,
-body `## /bugteam loop <
+body `## /bugteam loop <L> audit: 0P0 / 0P1 / 0P2 → clean`.

-**Payloads:**
-
-
+**Payloads:** Use MCP tool calls (see below). Body text with markdown (backticks,
+newlines, quotes) passes through safely as string parameters — no temp files, no
+jq, no shell pipes.

 **Review POST** (one `comments[]` object per anchored finding; single-line
 `{path, line, side: "RIGHT", body}`; multi-line add `start_line`, `start_side:
 "RIGHT"`):

 ```
-
-
-
-
-
-
-[
-
-
-
-
-comments: [
-{path: $path_1, line: $line_1, side: "RIGHT", body: $finding_body_1}
-[, ... ]
-]
-}' \
-| gh api repos/<owner>/<repo>/pulls/<number>/reviews -X POST --input -
+pull_request_review_write(
+method="create",
+event="COMMENT",
+body=<review_body_text>,
+commitID=<head_sha_at_post_time>,
+owner=<owner>, repo=<repo>, pullNumber=<number>,
+comments=[
+{path: <path_1>, line: <line_1>, side: "RIGHT", body: <finding_body_1>}
+[, ... ]
+]
+)
 ```

-**Fix reply:** `
-
-POST --input -`
+**Fix reply:** `add_reply_to_pull_request_comment(commentId=<finding_comment_id>,
+body=<reply_text>, owner=<owner>, repo=<repo>, pullNumber=<number>)`

-**Review POST fails:** issue comment fallback:
-
-
+**Review POST fails:** issue comment fallback:
+`add_issue_comment(owner=<owner>, repo=<repo>, issueNumber=<number>,
+body=<fallback_text>)`

 `<head_sha_at_post_time>`: `git rev-parse HEAD` in subagent cwd immediately
 before POST.
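
The `comments[]` shape described for the review POST (single-line `{path, line, side: "RIGHT", body}`, with `start_line` and `start_side: "RIGHT"` added for multi-line findings) and the `<head_sha_at_post_time>` rule can be prepared as plain Python values before the `pull_request_review_write` call, which is not shown here. A sketch only: `AnchoredFinding`, `review_comment`, and `head_sha_at_post_time` are hypothetical names, and treating a `start_line` equal to `line` as single-line is an assumption.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnchoredFinding:
    path: str
    line: int
    body: str
    start_line: Optional[int] = None  # first line of a multi-line finding, if any


def review_comment(finding: AnchoredFinding) -> dict[str, object]:
    """Build one comments[] entry; multi-line only when start_line precedes line."""
    comment: dict[str, object] = {
        "path": finding.path,
        "line": finding.line,
        "side": "RIGHT",
        "body": finding.body,
    }
    if finding.start_line is not None and finding.start_line < finding.line:
        comment["start_line"] = finding.start_line
        comment["start_side"] = "RIGHT"
    return comment


def head_sha_at_post_time(worktree_path: str) -> str:
    """Run `git rev-parse HEAD` in the worktree immediately before posting."""
    completed = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=worktree_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Reading the SHA at post time, rather than caching it at loop start, keeps `commitID` aligned with the diff the auditor actually validated anchors against.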
@@ -228,7 +221,7 @@ before POST.
 **Review body template (`<tmp_review_body.md>`):**

 ```
-## /bugteam loop <
+## /bugteam loop <L> audit: <P0>P0 / <P1>P1 / <P2>P2

 ### Findings without a diff anchor
 (only if needed)
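
The template's first line and the zero-findings form (`## /bugteam loop <L> audit: 0P0 / 0P1 / 0P2 → clean`) can be produced from a list of finding severities. A minimal sketch, assuming severities are the strings `"P0"`, `"P1"`, `"P2"`; `review_heading` is a hypothetical name.

```python
from collections import Counter


def review_heading(loop_number: int, severities: list[str]) -> str:
    """First line of the per-loop review body, with the clean suffix at zero findings."""
    counts = Counter(severities)  # missing severities count as 0
    heading = (
        f"## /bugteam loop {loop_number} audit: "
        f"{counts['P0']}P0 / {counts['P1']}P1 / {counts['P2']}P2"
    )
    if counts["P0"] + counts["P1"] + counts["P2"] == 0:
        heading += " → clean"
    return heading
```

Every heading starts with `## /bugteam loop `, which is the same prefix the re-invocation scan later filters prior reviews on, so keeping this string stable matters.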
@@ -263,10 +256,16 @@ and before iteration begins, when `last_action == "fresh"`). A re-invocation of
 cleaned this HEAD (short-circuit) and otherwise records that prior loops were
 dirty so the AUDIT runs against the latest diff with that signal in mind:

-```
-dirty_review_count=0
-
-
+```python
+dirty_review_count = 0
+all_reviews = pull_request_read(
+method="get_reviews", pullNumber=N, owner=O, repo=R
+)
+prior_reviews = [
+rev for rev in all_reviews
+if rev.get("body", "").startswith("## /bugteam loop ")
+]
+prior_reviews.sort(key=lambda rev: rev["submitted_at"], reverse=True)
 ```

 Iterate from index 0 (most recent) toward older entries:
@@ -313,16 +312,16 @@ Iterate from index 0 (most recent) toward older entries:
 Lead only; merge-base / diff semantics:
 [`../../_shared/pr-loop/code-rules-gate.md`][path-code-rules]; shared script
 inventory: [`../../_shared/pr-loop/scripts/README.md`][path-scripts-readme].
-Non-zero → spawn **clean-coder** standards-fix (read stderr, edit, re-run
+Non-zero → spawn **clean-coder** standards-fix (`mode="bypassPermissions"`) (read stderr, edit, re-run
 **this same** command, one commit, `git push`, shutdown) until exit **0** or
 **5**
 failed gate rounds → `error: code rules gate failed pre-audit`. After **0**:
 `loop_count += 1`; if `loop_count > 10` → `cap reached`. Then **AUDIT**
-(bugfind); print `Loop <
+(bugfind); print `Loop <L> audit: ...`.

 3. **FIX** (`last_action == "audited"` and `last_findings.total > 0`):
 `loop_count += 1`; if `loop_count > 10` → `cap reached`; **FIX** (bugfix);
-print `Loop <
+print `Loop <L> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
 to step 1.

 4. After **AUDIT**: update `last_action`, `last_findings`, `audit_log`; print
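
The pre-audit gate loop and the shared loop cap can be sketched as two small functions. This reads "until exit 0 or 5 failed gate rounds" as five failed gate runs in total, with a standards-fix spawned between failures; that interpretation, the exception classes, and the callable parameters standing in for the real gate command and `clean-coder` spawn are all assumptions.

```python
from typing import Callable

MAX_FAILED_GATE_ROUNDS = 5
MAX_LOOPS = 10


class GateFailedPreAudit(RuntimeError):
    """Raised after the code-rules gate fails MAX_FAILED_GATE_ROUNDS times."""


class LoopCapReached(RuntimeError):
    """Raised when an AUDIT or FIX would push loop_count past MAX_LOOPS."""


def run_pre_audit_gate(
    run_gate: Callable[[], int], spawn_standards_fix: Callable[[], None]
) -> None:
    """Re-run the gate after each standards-fix until it exits 0 or fails too often."""
    failed_rounds = 0
    while run_gate() != 0:
        failed_rounds += 1
        if failed_rounds >= MAX_FAILED_GATE_ROUNDS:
            raise GateFailedPreAudit("error: code rules gate failed pre-audit")
        spawn_standards_fix()


def advance_loop(loop_count: int) -> int:
    """Apply `loop_count += 1` and the `loop_count > 10` cap shared by AUDIT and FIX."""
    loop_count += 1
    if loop_count > MAX_LOOPS:
        raise LoopCapReached("cap reached")
    return loop_count
```

Because both AUDIT and FIX call the same increment, ten loops can be spent as any mix of audits and fixes before the cap stops the run.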
@@ -335,12 +334,10 @@ before the next AUDIT.

 ### AUDIT action

-
-
-
-
-
-**Spawn:**
+1. Create the directory: `mkdir -p "<run_temp_dir>/pr-<N>"`.
+2. Call `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)`
+to capture the diff text, then write it to
+`"<run_temp_dir>/pr-<N>/loop-<L>.patch"` using the `Write` tool.

 ```
 Agent(
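
The two new AUDIT steps (create `<run_temp_dir>/pr-<N>`, then write the diff to `loop-<L>.patch`) map onto `pathlib` directly if the same layout were reproduced outside the agent tools. A sketch only: the skill itself uses `mkdir -p` and the `Write` tool, and `loop_patch_path` and `write_loop_patch` are hypothetical helpers.

```python
from pathlib import Path


def loop_patch_path(run_temp_dir: Path, pr_number: int, loop_number: int) -> Path:
    """<run_temp_dir>/pr-<N>/loop-<L>.patch"""
    return run_temp_dir / f"pr-{pr_number}" / f"loop-{loop_number}.patch"


def write_loop_patch(
    run_temp_dir: Path, pr_number: int, loop_number: int, diff_text: str
) -> Path:
    """mkdir -p the per-PR directory, write the diff text, return the absolute path."""
    patch_path = loop_patch_path(run_temp_dir, pr_number, loop_number)
    patch_path.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    patch_path.write_text(diff_text, encoding="utf-8")
    return patch_path.resolve()
```

Returning the absolute path matches the audit prompt's `<diff_path>` element, which expects an absolute path to the per-PR patch file.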
@@ -361,18 +358,31 @@ background-completion notification, then reads
 `last_action = "audited"`; append audit line to `audit_log`.

 **Parallel auditors (`loop_count >= 4`):** gate passes immediately before;
-after three full audit/fix rounds without convergence, issue
-calls in one assistant message (`run_in_background=true`):
-
-
-
-
-
-
-
-
-
-
+after three full audit/fix rounds without convergence, issue eleven `Agent`
+calls in one assistant message (`run_in_background=true`):
+
+- **10 haiku auditors (`-b` through `-k`):** `subagent_type="code-quality-agent"`,
+`model="haiku"`, write sibling XML to
+`<run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml`, skip PR posting.
+Prompts must pass literal absolute sibling paths.
+- **1 opus validator (`-a`):** `subagent_type="code-quality-agent"`,
+`model="opus"`:
+- Polls for all 10 sibling XMLs before proceeding (60s timeout, 2s interval). On timeout: log diagnostics entry, proceed with validated findings from available XMLs, report count in validator output.
+- Validates each finding: file exists, line in bounds, excerpt contains the exact
+text of the cited line, category is A–J, severity is P0/P1/P2.
+- Hallucinated findings → quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under
+`validator_rejected` (added alongside the required diagnostics keys defined in the shared audit contract).
+- De-dups by `(file, line, category)`, max severity wins; on conflict, keep longest description text.
+- Re-ids as `loop<L>-<K>`.
+- Writes `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`, posts review.
+
+Lead awaits the opus validator (-a) background-completion notification (120s
+timeout). The validator independently polls all 10 sibling XMLs; the lead does
+not gate on haiku peer completion. On lead timeout: the validator did not post
+a merged review — treat as a hard blocker and abort the loop.
+
+The sibling-output paths in [`PROMPTS.md`](PROMPTS.md) must cover the full
+`-b` through `-k` range.

 ### FIX action

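
Three of the validator duties above (poll for the ten sibling XMLs with a 60s timeout and 2s interval, de-dup on `(file, line, category)` with max severity winning and the longest description kept, then re-id as `loop<L>-<K>`) can be sketched independently of XML parsing, validation, and quarantine, which are omitted. `SiblingFinding` and the three functions are hypothetical; re-ids follow first-seen order, an assumption, and the injectable `clock`/`sleep` exist only to make the sketch testable.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

SIBLING_LETTERS = "bcdefghijk"  # sibling auditors -b through -k
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}  # lower rank means more severe


@dataclass
class SiblingFinding:
    file: str
    line: int
    category: str
    severity: str
    description: str


def sibling_outcome_paths(run_temp_dir: Path, pr_number: int, loop_number: int) -> list[Path]:
    """Paths <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml for -b through -k."""
    return [
        run_temp_dir / f"pr-{pr_number}" / f"loop-{loop_number}-{letter}.outcomes.xml"
        for letter in SIBLING_LETTERS
    ]


def wait_for_sibling_outputs(
    expected_paths: list[Path],
    timeout_seconds: float = 60.0,
    interval_seconds: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Path]:
    """Poll until every expected XML exists or the timeout passes; return those present."""
    deadline = clock() + timeout_seconds
    while True:
        present = [path for path in expected_paths if path.exists()]
        if len(present) == len(expected_paths) or clock() >= deadline:
            return present
        sleep(interval_seconds)


def merge_findings(
    findings: list[SiblingFinding], loop_number: int
) -> list[tuple[str, SiblingFinding]]:
    """De-dup on (file, line, category) and re-id survivors as loop<L>-<K>."""
    merged: dict[tuple[str, int, str], SiblingFinding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.category)
        kept = merged.get(key)
        if kept is None:
            merged[key] = finding
            continue
        # Max severity wins; the longer description is kept when the two differ.
        severity = min(kept.severity, finding.severity, key=SEVERITY_RANK.__getitem__)
        description = max(kept.description, finding.description, key=len)
        merged[key] = SiblingFinding(*key, severity, description)
    return [
        (f"loop{loop_number}-{index}", finding)
        for index, finding in enumerate(merged.values(), start=1)
    ]
```

On timeout, `wait_for_sibling_outputs` returns whatever subset exists, which matches the "proceed with validated findings from available XMLs" fallback; the caller would still need to log the diagnostics entry and report the count.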
@@ -398,6 +408,8 @@ advanced; `git -C "<run_temp_dir>/pr-<N>/worktree" fetch origin <branch> && git
 `HEAD`. Unchanged HEAD →
 `stuck — bugfix subagent could not address findings`.

+**Scope verification.** Run `git diff HEAD~1 --name-only` and compare against the set of files referenced in bugs_to_fix. When the commit touches any file NOT in the bugs_to_fix list, downgrade the outcome to `unverified_fixed` with reason "commit touched unexpected files: <list>".
+
 ### Step 4: Teardown

 1. For each PR in `all_prs`: `git worktree remove
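
The new **Scope verification** step compares the files in the fix commit with the files named in `bugs_to_fix`. A sketch, assuming both lists hold repo-relative paths in the same form; note that `git diff HEAD~1 --name-only` compares `HEAD~1` with the working tree, which matches the commit's own files only when the worktree is clean right after the fix commit. `changed_files_in_last_commit` and `scope_downgrade_reason` are hypothetical names.

```python
import subprocess
from typing import Optional


def changed_files_in_last_commit(worktree_path: str) -> list[str]:
    """Run `git diff HEAD~1 --name-only` in the worktree and return the listed paths."""
    completed = subprocess.run(
        ["git", "diff", "HEAD~1", "--name-only"],
        cwd=worktree_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return [path for path in completed.stdout.splitlines() if path]


def scope_downgrade_reason(
    changed_files: list[str], bug_files: list[str]
) -> Optional[str]:
    """Return the unverified_fixed reason, or None when every touched file was expected."""
    expected_files = set(bug_files)
    unexpected = sorted(path for path in changed_files if path not in expected_files)
    if not unexpected:
        return None
    return "commit touched unexpected files: " + ", ".join(unexpected)
```

The check is one-directional: a bug file the commit did not touch is not flagged here, since that case already surfaces through the per-finding `could_not_address` status.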
@@ -418,16 +430,17 @@ else {'onerror': h}))"
 ### Step 4.5: PR description

 Lead only; cumulative product narrative (not process). Delegate body to
-`pr-description-writer` via `Agent` (else `general-purpose`) so the
-mandatory-pr-description hook accepts `
+`pr-description-writer` via `Agent` (`mode="bypassPermissions"`) (else `general-purpose`) so the
+mandatory-pr-description hook accepts `update_pull_request`.

-1. `
-
-
+1. `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)` → write
+output to `.bugteam-final.diff` with `Write` tool.
+2. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → extract
+`.body` from response, write to `.bugteam-original-body.md` with `Write` tool.
 3. Agent brief: paths + branch names; describe merge-ready change from diff;
 keep curated original sections intact; return markdown body.
-4. Write `.bugteam-final-body.md`; `
-
+4. Write `.bugteam-final-body.md`; `update_pull_request(pullNumber=N, owner=O,
+repo=R, body=<body_text>)`.
 5. Delete the three temp files.

 On failure: log in final report; continue to Step 5.
@@ -22,9 +22,9 @@ Each invariant cites the normative section or companion file it derives from. Al
 | I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
 | I-3 | Orchestration uses `Agent(..., run_in_background=true)` only — no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
 | I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` — **Fresh subagent per loop** |
-| I-5 | Audit and fix spawns pass `model="opus"
+| I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
 | I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
-| I-7 | From loop 4 onward without convergence,
+| I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
 | I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
 | I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
 | I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
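The loop-shape invariants above (I-4, I-6, I-7) are mechanical enough to check against a recorded trace once the harness exists. The following is a minimal sketch under a hypothetical trace format: each assistant message that spawns auditors is recorded as the list of its `Agent(name=...)` values. The function `check_loop_invariants` and the name regex are illustrative. The regex is derived from the `bugfind-pr<N>-loop<L>[-a..k]` naming used in the predicted traces.

```python
import re

# Hypothetical name pattern, derived from the predicted traces:
# bugfind-pr<N>-loop<L> for loops 1-3, bugfind-pr<N>-loop<L>-[a..k] afterwards.
AUDIT_NAME = re.compile(r"^bugfind-pr(\d+)-loop(\d+)(?:-([a-k]))?$")

MAX_AUDIT_LOOPS = 10    # I-6: the 11th audit never fires
PARALLEL_FROM_LOOP = 4  # I-7: parallel auditors from loop 4 onward
PARALLEL_AUDITORS = 11  # I-7: eleven Agent calls in one message


def check_loop_invariants(audit_messages: list[list[str]]) -> list[str]:
    """Return violated-invariant messages; an empty list means pass."""
    problems: list[str] = []
    seen_names: set[str] = set()
    loops_seen: list[int] = []

    for message in audit_messages:
        loops: set[int] = set()
        for name in message:
            match = AUDIT_NAME.match(name)
            if match is None:
                problems.append(f"unrecognized audit name: {name}")
                continue
            if name in seen_names:
                problems.append(f"I-4: name reused: {name}")
            seen_names.add(name)
            loops.add(int(match.group(2)))

        # One assistant message should belong to exactly one loop.
        if len(loops) != 1:
            problems.append(f"one message spans loops {sorted(loops)}")
            continue
        loop = loops.pop()
        loops_seen.append(loop)

        expected = PARALLEL_AUDITORS if loop >= PARALLEL_FROM_LOOP else 1
        if len(message) != expected:
            problems.append(
                f"I-7: loop {loop} emitted {len(message)} audit calls, expected {expected}"
            )

    if len(loops_seen) > MAX_AUDIT_LOOPS:
        problems.append(
            f"I-6: {len(loops_seen)} audit loops exceed the cap of {MAX_AUDIT_LOOPS}"
        )
    return problems
```

The predicted cap-reached shape has one name per loop for loops 1–3 and suffixes `-a` through `-k` for loops 4–10. On a trace of that shape, the checker should return an empty list. It deliberately leaves I-5's model assignments alone, because checking them would need the full `Agent` arguments rather than just the names.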
@@ -68,7 +68,7 @@ The harness does not yet exist; this document defines its contract.
 **Scenario.** Current branch is `main` with no PR and no upstream difference.
 
 **Layer B predicted trace.**
-1. `
+1. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → fails / no matching PR.
 2. `Bash("git merge-base HEAD origin/main")` → empty.
 3. No grant script.
 
@@ -103,10 +103,10 @@ The harness does not yet exist; this document defines its contract.
 | # | Tool call | Source |
 |---|---|---|
 | 1 | `Bash("python .../scripts/grant_project_claude_permissions.py")` | `SKILL.md` § Step 0 |
-| 2 | `
+| 2 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` | `SKILL.md` § Step 1 |
 | 3 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
 | 4 | `Bash("mkdir -p <run_temp_dir>/pr-42")` | `SKILL.md` § AUDIT action |
-| 5 | `
+| 5 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-1.patch` | `SKILL.md` § AUDIT action |
 | 6 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
 | 7 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
 | 8 | `Read(".bugteam-pr42-loop1.outcomes.xml")` | `SKILL.md` § AUDIT action |
@@ -116,17 +116,17 @@ The harness does not yet exist; this document defines its contract.
 | 12 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
 | 13 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" fetch origin <branch>")` → fetch remote state | `SKILL.md` § FIX action (**Verify**) |
 | 14 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse origin/<branch>")` → confirm matches HEAD | `SKILL.md` § FIX action (**Verify**) |
-| 15 | `
+| 15 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-2.patch` | `SKILL.md` § AUDIT action |
 | 16 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop2", run_in_background=true, ...)` (loop 2) | `SKILL.md` § AUDIT action |
 | 17 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
 | 18 | `Read(".bugteam-pr42-loop2.outcomes.xml")` — zero findings | `SKILL.md` § AUDIT action |
 | 19 | `Bash("git worktree remove \"<run_temp_dir>/pr-42/worktree\"")` | `SKILL.md` § Step 4 step 1 |
 | 20 | `Bash("python -c \"...shutil.rmtree(r'<run_temp_dir>', ...)\"")` | `SKILL.md` § Step 4 step 2 (Windows-safe teardown) |
-| 21 | `
-| 22 | `
+| 21 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `.bugteam-final.diff` | `SKILL.md` § Step 4.5 step 1 |
+| 22 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` → extract `.body`, write to `.bugteam-original-body.md` | `SKILL.md` § Step 4.5 step 2 |
 | 23 | `Agent(subagent_type="pr-description-writer", description=..., prompt=<brief>)` | `SKILL.md` § Step 4.5 |
 | 24 | `Write(".bugteam-final-body.md", <returned body>)` | `SKILL.md` § Step 4.5 step 4 |
-| 25 | `
+| 25 | `update_pull_request(pullNumber=42, owner=..., repo=..., body=...)` | `SKILL.md` § Step 4.5 step 4 |
 | 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 |
 | 27 | `Bash("python .../scripts/revoke_project_claude_permissions.py")` | `SKILL.md` § Step 5 |
 
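Rows 19–27 of the Eval 5 trace are where I-2 (revoke exactly once) and I-9 (teardown order) become observable. The following is a minimal sketch under two assumptions. First, the harness records each tool call as a plain string shaped like the `Tool call` column. Second, Step 4.5 is represented by its `update_pull_request` call. The substring markers come from the table rows; the helper names and the string-trace format are hypothetical.

```python
# Substring markers taken from the Eval 5 trace rows; the trace format is assumed.
WORKTREE_REMOVE = "git worktree remove"          # Step 4 step 1
RMTREE = "shutil.rmtree"                         # Step 4 step 2
DESCRIPTION_REWRITE = "update_pull_request("     # Step 4.5
REVOKE = "revoke_project_claude_permissions.py"  # Step 5


def indices(trace: list[str], marker: str) -> list[int]:
    """Positions of every recorded call whose text contains marker."""
    return [i for i, call in enumerate(trace) if marker in call]


def check_teardown(trace: list[str]) -> list[str]:
    """Check I-2 (revoke exactly once) and I-9 (teardown order)."""
    problems: list[str] = []
    revokes = indices(trace, REVOKE)
    if len(revokes) != 1:
        problems.append(f"I-2: revoke ran {len(revokes)} times, expected 1")

    removes = indices(trace, WORKTREE_REMOVE)
    rmtrees = indices(trace, RMTREE)
    rewrites = indices(trace, DESCRIPTION_REWRITE)
    # Every worktree removal precedes rmtree, which precedes the Step 4.5
    # description rewrite, which precedes the (first) revoke.
    if not (removes and rmtrees and rewrites and revokes):
        problems.append("I-9: a teardown stage is missing")
    elif not (removes[-1] < rmtrees[0] < rewrites[0] < revokes[0]):
        problems.append("I-9: teardown stages ran out of order")
    return problems
```

The check uses the last `git worktree remove` rather than the first, because I-9 requires every PR's worktree to be removed before `<run_temp_dir>` is deleted.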
@@ -171,14 +171,14 @@ Patch this table to match observation and annotate each correction.
 
 **Layer B predicted behavior.**
 - Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
-- Loops 4–10:
+- Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
 - Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
 - Exactly 10 audit phases, exactly 10 fix phases.
 - Steps 19–26 from Eval 5 fire at teardown.
 
 **Pass criteria.**
 - I-6 holds: exactly 10 audit phases.
-- I-7 holds: loops 4–10 each emit
+- I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
 - Final report contains `/bugteam exit: cap reached` and the remaining bug count.
 
 **Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
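The predicted behavior above implies a concrete audit-call count for a run that hits the cap. Loops 1–3 spawn one auditor each, and loops 4–10 spawn eleven each, which gives 3 × 1 + 7 × 11 = 80 distinct audit `Agent(name=...)` calls. The small sketch below restates that prediction as a function so the first real run can be compared against it. The function name and parameter defaults are illustrative; they encode only the Layer B prediction, not observed behavior.

```python
def predicted_audit_calls(loops_run: int, parallel_from: int = 4,
                          parallel_width: int = 11, cap: int = 10) -> int:
    """Distinct audit Agent(name=...) calls predicted by Layer B.

    One auditor per loop before `parallel_from`; `parallel_width`
    auditors (10 haiku + 1 opus validator) per loop from then on;
    never more than `cap` loops.
    """
    loops = min(loops_run, cap)
    serial_loops = min(loops, parallel_from - 1)
    parallel_loops = loops - serial_loops
    return serial_loops + parallel_loops * parallel_width


print(predicted_audit_calls(10))  # 80 for the cap-reached scenario
```

If the real run disagrees, update the defaults (or the formula itself) to match the observation, as the process check asks.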
@@ -224,7 +224,7 @@ Patch this table to match observation and annotate each correction.
 - Every finding's outcome XML carries `used_fallback="true"` and the issue-comment URL as `finding_comment_url`.
 - Cycle continues to the FIX action without aborting.
 
-**Open item for the real run.** The issue-comments fallback
+**Open item for the real run.** The issue-comments fallback uses `add_issue_comment(owner=..., repo=..., issueNumber=42, body=...)` (`SKILL.md` § Step 2.5 **Review POST fails**; full narrative in `reference/github-pr-reviews.md` § **Review POST failure fallback**). Before running Eval 10 for real, confirm the teammate obeys this shape — the fixture must assert the `add_issue_comment` tool call.
 
 ---
 
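A fixture for the fallback pass criteria could parse each outcomes file and check both attributes on every finding. The following is a minimal sketch that rests on three assumptions:

- Each finding is a `<finding>` element carrying `used_fallback` and `finding_comment_url` as XML attributes. The real outcome schema is defined by the audit contract and may differ.
- Issue-comment URLs contain GitHub's `#issuecomment-` fragment, whereas review comments use `#discussion_r`.
- The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_fallback_outcomes(outcome_xml: str) -> list[str]:
    """Return violations of the fallback pass criteria for one outcomes file."""
    problems: list[str] = []
    root = ET.fromstring(outcome_xml)
    # Hypothetical schema: one <finding> element per finding, attributes inline.
    for index, finding in enumerate(root.iter("finding"), start=1):
        if finding.get("used_fallback") != "true":
            problems.append(f'finding {index}: used_fallback is not "true"')
        url = finding.get("finding_comment_url", "")
        # Issue comments link as ...#issuecomment-<id>; review comments do not.
        if "#issuecomment-" not in url:
            problems.append(f"finding {index}: finding_comment_url is not an issue-comment URL")
    return problems
```

This covers only the outcome-file half of the criteria. Asserting that the `add_issue_comment` tool call itself was made still belongs in the trace fixture.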