claude-dev-env 1.37.0 → 1.37.1
- package/package.json +1 -1
- package/rules/gh-paginate.md +4 -50
- package/rules/no-historical-clutter.md +36 -0
- package/skills/bugteam/CONSTRAINTS.md +2 -2
- package/skills/bugteam/PROMPTS.md +20 -13
- package/skills/bugteam/SKILL.md +29 -16
- package/skills/bugteam/SKILL_EVALS.md +4 -4
- package/skills/bugteam/reference/audit-and-teammates.md +21 -48
- package/skills/bugteam/reference/audit-contract.md +7 -7
- package/skills/pr-converge/reference/convergence-gates.md +22 -18
- package/skills/pr-converge/reference/fix-protocol.md +2 -0
- package/skills/pr-converge/reference/per-tick.md +10 -7
- package/skills/pr-converge/scripts/config/pr_converge_constants.py +16 -0
- package/skills/pr-converge/scripts/test_view_pr_context.py +44 -0
- package/skills/pr-converge/scripts/view_pr_context.py +35 -4
package/package.json
CHANGED
package/rules/gh-paginate.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# gh API Pagination Rule
|
|
2
2
|
|
|
3
|
-
**Root cause:**
|
|
3
|
+
**Root cause:** GitHub REST API list endpoints paginate by default. Without `--paginate --slurp`, callers see only the oldest page, and cross-page jq operations (e.g., `sort_by | last`) operate within a single page — producing wrong-but-confident results.
|
|
4
4
|
|
|
5
5
|
**Rule:** All `gh api` calls that read `pulls/<number>/reviews`, `pulls/<number>/comments`, `issues/<number>/comments`, or any other paginated GitHub list endpoint **must** request the full set of pages AND apply any cross-page jq operation through external `jq`, not through `gh`'s built-in `--jq`. Use `--paginate --slurp | jq` (preferred — see [Safe patterns](#safe-patterns)). Never call these endpoints with their default pagination, and never use `gh`'s `--jq` for cross-page operations like `sort_by | last` or `| reverse | .[0]`.
|
|
6
6
|
|
|
@@ -8,8 +8,8 @@
|
|
|
8
8
|
|
|
9
9
|
This rule guards against two distinct silent-truncation defects that compound:
|
|
10
10
|
|
|
11
|
-
1. **Default
|
|
12
|
-
2. **`--jq` runs per-page, not on the concatenated result.** Per [GitHub CLI #10459](https://github.com/cli/cli/issues/10459), `gh api --paginate --jq '<filter>'` applies `<filter>` to each page **separately** and emits one output per page. Cross-page operations like `sort_by(.submitted_at) | last` therefore operate within each page independently, not across the merged result set.
|
|
11
|
+
1. **Default page truncation.** Without `--paginate`, only the first page is fetched.
|
|
12
|
+
2. **`--jq` runs per-page, not on the concatenated result.** Per [GitHub CLI #10459](https://github.com/cli/cli/issues/10459), `gh api --paginate --jq '<filter>'` applies `<filter>` to each page **separately** and emits one output per page. Cross-page operations like `sort_by(.submitted_at) | last` therefore operate within each page independently, not across the merged result set.
|
|
13
13
|
|
|
14
14
|
The safe patterns below fix both defects together: `--paginate --slurp` walks every page AND emits a single merged structure, and an **external** `jq` then runs cross-page operations on that merged structure.
|
|
15
15
|
|
|
@@ -39,7 +39,7 @@ gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate --s
|
|
|
39
39
|
| jq '[.[][] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
|
|
40
40
|
```
|
|
41
41
|
|
|
42
|
-
The `.[][]` flattens the array-of-pages into one stream of items before the cross-page operators (`sort_by`, `last`, `reverse`) run. Combine with `?per_page=100`
|
|
42
|
+
The `.[][]` flattens the array-of-pages into one stream of items before the cross-page operators (`sort_by`, `last`, `reverse`) run. Combine with `?per_page=100` to reduce round-trips on long PRs.
|
|
43
43
|
|
|
44
44
|
`gh`'s `--jq` flag and `--slurp` flag are mutually exclusive (gh CLI rejects `--paginate --slurp --jq` with `the --slurp option is not supported with --jq or --template`), which is why the filter must run in an external `jq` invocation.
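The per-page defect can be reproduced without touching the network. The following is a minimal Python sketch, not part of the package: the `pages` data and the `latest_bot_review` helper are hypothetical stand-ins for the array-of-pages that `--paginate --slurp` emits and for the `select | sort_by | last` filter. It contrasts applying the filter to each page separately with applying it once to the flattened result.

```python
# Hypothetical review records split across two pages, oldest first.
# This stands in for the array-of-pages structure that --slurp produces.
pages = [
    [
        {"user": {"login": "cursor[bot]"}, "submitted_at": "2024-01-01T00:00:00Z"},
        {"user": {"login": "alice"}, "submitted_at": "2024-01-02T00:00:00Z"},
    ],
    [
        {"user": {"login": "cursor[bot]"}, "submitted_at": "2024-01-03T00:00:00Z"},
    ],
]


def latest_bot_review(reviews, login="cursor[bot]"):
    """Mirror `[... | select(.user.login==login)] | sort_by(.submitted_at) | last`."""
    matching = [review for review in reviews if review["user"]["login"] == login]
    if not matching:
        return None
    # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
    return sorted(matching, key=lambda review: review["submitted_at"])[-1]


# Per-page application: one "latest" per page, which is the defect described above.
per_page_results = [latest_bot_review(page) for page in pages]

# Merged application: flatten the pages first (the role of `.[][]`), then filter once.
merged = [review for page in pages for review in page]
merged_result = latest_bot_review(merged)
```

Here `per_page_results` holds two candidates, one per page, while `merged_result` is the single latest review across the full set.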
|
|
45
45
|
|
|
@@ -74,52 +74,6 @@ gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate --s
|
|
|
74
74
|
|
|
75
75
|
This is the canonical pattern for the bugbot ↔ bugteam convergence loop: walk newest-first, stop at the first clean review.
|
|
76
76
|
|
|
77
|
-
## What NOT to do
|
|
78
|
-
|
|
79
|
-
```bash
|
|
80
|
-
# BAD — default 30-item page silently truncates on long PRs
|
|
81
|
-
gh api repos/<owner>/<repo>/pulls/<number>/reviews \
|
|
82
|
-
--jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
|
|
83
|
-
|
|
84
|
-
# BAD — `?per_page=100` alone caps at 100 items; PRs with 100+ reviews still truncate
|
|
85
|
-
gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' \
|
|
86
|
-
--jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
|
|
87
|
-
|
|
88
|
-
# BAD — --paginate fetches every page, but `--jq` runs PER-PAGE (gh CLI #10459).
|
|
89
|
-
# `sort_by(.submitted_at) | last` operates within each page independently and
|
|
90
|
-
# emits one "latest" per page, not the actual latest across the full result set.
|
|
91
|
-
gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate \
|
|
92
|
-
--jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
|
|
93
|
-
|
|
94
|
-
# BAD — taking `| last` on an unpaginated read returns the latest of the first 30,
|
|
95
|
-
# not the actual latest. Same defect for `| reverse | .[0]`.
|
|
96
|
-
```
|
|
97
|
-
|
|
98
|
-
## Why both defects matter
|
|
99
|
-
|
|
100
|
-
`gh api`'s default page is the FIRST page of results, ordered oldest-to-newest by the GitHub API. When the result set exceeds 30 items, page 1 contains the OLDEST 30 — not the newest. A jq `| last` after `sort_by(.submitted_at)` picks the latest entry within those 30 oldest items, producing output that looks correct but reports a state from days or weeks ago.
|
|
101
|
-
|
|
102
|
-
`--paginate` alone does NOT fix this when paired with `--jq`: gh applies the jq filter to each page separately and emits one result per page. A consumer reading "the last line of output" still gets the latest within a single page, not the latest across all pages. The skill that consumes this output then makes decisions (re-trigger bugbot, mark a finding stale, report convergence) against an obsolete view of the PR.
|
|
103
|
-
|
|
104
|
-
`--paginate --slurp | jq` fixes both defects: every page is fetched, every page is merged into one structure before any jq operator runs, and cross-page operations see the full result set.
|
|
105
|
-
|
|
106
|
-
## Consumers
|
|
107
|
-
|
|
108
|
-
Skills and scripts in this repo that read paginated endpoints and must therefore use `--paginate --slurp` plus external `jq`:
|
|
109
|
-
|
|
110
|
-
- `pr-converge` — bugbot review walk (BUGBOT phase, Step 2.a) and inline-comments fetch (Step 2.b).
|
|
111
|
-
- `bugteam` — review threads, inline comments, audit-loop history.
|
|
112
|
-
- `qbug` — same as bugteam, scoped to a single subagent loop.
|
|
113
|
-
- `pr-review-responder` — review comments fetch (already enforced; this rule extends the same constraint to reviews and other endpoints).
|
|
114
|
-
- `monitor-many` — open-PR enumeration and per-PR review/comment scans.
|
|
115
|
-
- `babysit-pr` — review-comment polling.
|
|
116
|
-
|
|
117
|
-
Updating any of these to read paginated endpoints requires `--paginate --slurp` plus external `jq` (or a documented single-page bound on a small list).
|
|
118
|
-
|
|
119
77
|
## Enforcement
|
|
120
78
|
|
|
121
79
|
This rule is documentation-only at present. A future PreToolUse hook may pattern-match `Bash` invocations of `gh api repos/.../pulls/<n>/(reviews|comments)` without `--paginate --slurp` (or with `--paginate --jq` doing cross-page operations) and return a corrective message. Until that hook lands, treat this rule as binding by review and rely on it during skill authoring.
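As an illustration of what such a check could look like, here is a rough Python sketch. It is an assumption, not the planned hook: the function name, the regular expression, and the message wording are all hypothetical, and it simplifies the rule by flagging every `--paginate --jq` call without `--slurp` rather than detecting cross-page operations specifically.

```python
import re
from typing import Optional

# Hypothetical pattern: a `gh api` call that reads a reviews/comments list endpoint.
PAGINATED_ENDPOINT = re.compile(
    r"gh\s+api\b.*\b(?:pulls|issues)/[^/\s]+/(?:reviews|comments)\b"
)


def pagination_problem(command: str) -> Optional[str]:
    """Return a corrective message for an unsafe call, or None when the call looks safe."""
    if not PAGINATED_ENDPOINT.search(command):
        return None
    has_paginate = "--paginate" in command
    has_slurp = "--slurp" in command
    if has_paginate and has_slurp:
        return None
    if has_paginate and "--jq" in command:
        return "--jq runs per page; use --paginate --slurp and pipe to external jq"
    return "paginated endpoint read without --paginate --slurp"
```

A real hook would also need to decide how to treat documented single-page reads on small lists, which this sketch does not attempt.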
|
|
122
|
-
|
|
123
|
-
## Precedent
|
|
124
|
-
|
|
125
|
-
The `pr-review-responder` skill predated this rule and forbids default pagination on `pulls/<n>/comments` reads (`packages/claude-dev-env/skills/pr-review-responder/SKILL.md` Rule 1). This file generalizes that constraint to every paginated GitHub endpoint, adds the `--jq` per-page defect (gh CLI #10459) discovered while reviewing this rule, and centralizes the safe patterns so additional skills inherit the rule by reference instead of restating it.
|
|
package/rules/no-historical-clutter.md
CHANGED
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
---
|
|
2
|
+
paths: **/*.md
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# No Historical Clutter in Documentation
|
|
6
|
+
|
|
7
|
+
**When this applies:** Any Write or Edit to `.md` files.
|
|
8
|
+
|
|
9
|
+
## Rule
|
|
10
|
+
|
|
11
|
+
Never reference removed implementations, old defaults, prior behaviors, or how something "used to be" when updating documentation. The current state is all that matters.
|
|
12
|
+
|
|
13
|
+
## Examples of prohibited patterns
|
|
14
|
+
|
|
15
|
+
| Pattern | Why it's clutter |
|
|
16
|
+
|---------|-----------------|
|
|
17
|
+
| "instead of 30" in a pagination rule | The old default no longer exists in code; the rule reader doesn't need to know what it was |
|
|
18
|
+
| "previously this used X" | If X is gone, it's noise |
|
|
19
|
+
| "before this rule, we did Y" | The rule exists now; the before-state is irrelevant |
|
|
20
|
+
| "migrated from Z to W" | If Z is fully removed, the migration story is git history, not documentation |
|
|
21
|
+
| "the old implementation did A" | If A is gone, the reader gains nothing from knowing it existed |
|
|
22
|
+
| "originally" / "used to be" | Same — dead context |
|
|
23
|
+
|
|
24
|
+
## What IS allowed
|
|
25
|
+
|
|
26
|
+
- Comparisons to *currently existing* alternatives (e.g., "use `--paginate --slurp | jq`, not `--jq` alone")
|
|
27
|
+
- Rationale that explains *why* a pattern is wrong in terms of present behavior (e.g., "`--jq` runs per-page, so cross-page operations produce wrong results")
|
|
28
|
+
- References to external sources for defects that still exist (e.g., gh CLI #10459)
|
|
29
|
+
|
|
30
|
+
## The test
|
|
31
|
+
|
|
32
|
+
After writing documentation, ask: **"If someone reads this a year from now, with no knowledge of what came before, does every sentence still make sense and add value?"** If a sentence only adds value to someone who knew the old state, delete it.
|
|
33
|
+
|
|
34
|
+
## Why
|
|
35
|
+
|
|
36
|
+
Historical references clog context windows and force readers to mentally filter "what was" from "what is." The git log is the authoritative record of what changed and why. Documentation describes the current contract.
|
|
package/skills/bugteam/CONSTRAINTS.md
CHANGED
|
@@ -10,12 +10,12 @@
|
|
|
10
10
|
- **Code rules gate before every AUDIT.** Run `_shared/pr-loop/scripts/code_rules_gate.py` (resolved via `${CLAUDE_SKILL_DIR}/../../_shared/pr-loop/scripts/code_rules_gate.py`) until exit **0** before spawning **bugfind**. Same `validate_content` logic as `hooks/blocking/code_rules_enforcer.py`.
|
|
11
11
|
- **Clean-room audits, every loop.** Each bugfind subagent's spawn prompt contains only the PR scope, audit rubric, and the current loop number. Prior loop history stays in the lead.
|
|
12
12
|
- **Targeted fixes.** Each fix subagent sees ONLY the most recent audit's findings. Prior loops are invisible to the fix subagent.
|
|
13
|
-
- **Opus 4.7 at xhigh effort for
|
|
13
|
+
- **Opus 4.7 at xhigh effort for validator and fix subagents.** Single-auditor mode, validator, and fix spawns pass `model="opus"`; parallel-auditor siblings (`-b` through `-k`) pass `model="haiku"`. Opus 4.7's default effort level in Claude Code is `xhigh` (https://code.claude.com/docs/en/model-config — *"On Opus 4.7, the default effort is `xhigh` for all plans and providers."*), so no `effort` override is needed at spawn time. Effort is set per-subagent in YAML frontmatter, not via the `Agent` tool's parameters; `code-quality-agent` and `clean-coder` rely on the model default. The trade vs Sonnet is higher per-loop cost in exchange for deeper audit recall and stronger fix correctness on bug-hunting work, which the per-PR loop economics tolerate (10-loop hard cap bounds total spend).
|
|
14
14
|
- **Fix subagent receives the latest audit as its input contract.** Passing the audit's findings to the fix subagent is the input contract — each loop's fix run operates on the current audit's output and only that.
|
|
15
15
|
- **One commit per fix action.** Loops produce one commit per loop, not one per bug.
|
|
16
16
|
- **Linear branch, fixed PR base.** Every loop appends one forward-only commit; existing commits and the PR base stay intact throughout the cycle.
|
|
17
17
|
- **Lead-only cleanup.** Cleanup runs in the lead (this session) only. Step 4 removes the full `<run_temp_dir>` so no loop patches leak between runs.
|
|
18
|
-
- **Cleanup all `.bugteam-*` files on exit.** The per-run `<run_temp_dir>` is removed entirely by Step 4, which covers `<run_temp_dir>/pr-<N>/loop-<L>.patch` and `<run_temp_dir>/pr-<N>/loop-<L
|
|
18
|
+
- **Cleanup all `.bugteam-*` files on exit.** The per-run `<run_temp_dir>` is removed entirely by Step 4, which covers `<run_temp_dir>/pr-<N>/loop-<L>.patch` and `<run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml`. The per-loop outcomes XML at `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml` is removed with the worktree. Step 4.5 deletes `.bugteam-final.diff`, `.bugteam-original-body.md`, and `.bugteam-final-body.md`. Working directory ends clean.
|
|
19
19
|
- **Audit/fix comment posting.** The bugfind subagent posts ONE per-loop review (parent body + child finding comments in a single batched POST, with review-fallback to a top-level issue comment). The bugfix subagent posts the fix replies after committing. All comment, review, and reply POSTs belong to the subagents; the lead's single PR-write action is the final description rewrite at Step 4.5.
|
|
20
20
|
- **Lead owns the final PR description rewrite only** (Step 4.5), and only via the `pr-description-writer` agent. The lead does not compose the description inline.
|
|
21
21
|
- **One review per loop, findings as child comments of that review.** Each loop posts a single pull-request review whose body is the loop header and whose `comments[]` are the anchored findings. Each loop's review stands alone — one review created per loop, fully self-contained on the PR conversation.
|
|
package/skills/bugteam/PROMPTS.md
CHANGED
|
@@ -10,7 +10,7 @@ Keep the spawn prompt self-contained: reference only the PR scope, audit rubric,
|
|
|
10
10
|
<branch>head ref</branch>
|
|
11
11
|
<base_branch>base ref</base_branch>
|
|
12
12
|
<pr_url>full URL</pr_url>
|
|
13
|
-
<loop>
|
|
13
|
+
<loop>L</loop>
|
|
14
14
|
<pr_number>N</pr_number>
|
|
15
15
|
<worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
|
|
16
16
|
</context>
|
|
@@ -45,11 +45,17 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
45
45
|
</constraints>
|
|
46
46
|
|
|
47
47
|
<comment_posting>
|
|
48
|
+
Sibling auditors (-b through -k): run only steps 1–3 (audit, assign IDs,
|
|
49
|
+
capture excerpt, validate anchors), then write outcome XML per <output_format> and return.
|
|
50
|
+
Skip steps 4–8 — sibling auditors do not post PR reviews.
|
|
51
|
+
|
|
52
|
+
Validator (-a) and single-opus auditors: run all steps below.
|
|
53
|
+
|
|
48
54
|
1. Audit the diff against the 10 categories above. Buffer the findings
|
|
49
55
|
in memory; all posting happens at step 6 once anchors are validated.
|
|
50
|
-
2. Assign each finding a stable finding_id of exactly the form `
|
|
51
|
-
where K is 1-based within this loop.
|
|
52
|
-
3. Validate every finding's (file, line) against the captured diff. Split
|
|
56
|
+
2. Assign each finding a stable finding_id of exactly the form `loop<L>-<K>`
|
|
57
|
+
where <K> is 1-based within this loop.
|
|
58
|
+
3. For each finding, capture a verbatim excerpt from the target file at the cited line. Populate the `<excerpt>` element in the outcome XML with it. Validate every finding's (file, line) against the captured diff. Split
|
|
53
59
|
findings into two buckets: anchored (line is in the diff) and
|
|
54
60
|
unanchored (line is not in the diff — goes into the review body's
|
|
55
61
|
"Findings without a diff anchor" section per Step 2.5).
|
|
@@ -61,7 +67,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
61
67
|
Category: <letter> (<category name>)
|
|
62
68
|
<2-3 sentence description with concrete trace>
|
|
63
69
|
|
|
64
|
-
_From /bugteam audit loop
|
|
70
|
+
_From /bugteam audit loop <L>._
|
|
65
71
|
|
|
66
72
|
6. Post ONE review via Step 2.5's per-loop review CLI shape. Harvest the
|
|
67
73
|
parent review `html_url` from the response JSON and the `comments[]`
|
|
@@ -76,17 +82,17 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
76
82
|
</comment_posting>
|
|
77
83
|
|
|
78
84
|
<output_format>
|
|
79
|
-
For the
|
|
80
|
-
the PR's worktree directory (<worktree_path>). For sibling auditors (-b
|
|
85
|
+
For the (-a) validator: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
|
|
86
|
+
the PR's worktree directory (<worktree_path>). For sibling auditors (-b through -k): write to <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml (absolute path passed in prompt). Sibling auditors do not post PR reviews; set review_url, finding_comment_id, and finding_comment_url to empty strings, and used_fallback to "false". Omit unanchored findings from sibling output — only the validator handles those. Return only that path on stdout. The schema:
|
|
81
87
|
</output_format>
|
|
82
88
|
```
|
|
83
89
|
|
|
84
90
|
## AUDIT outcome XML schema (bugfind writes this)
|
|
85
91
|
|
|
86
92
|
```xml
|
|
87
|
-
<bugteam_audit loop="<
|
|
93
|
+
<bugteam_audit loop="<L>" review_url="<url>">
|
|
88
94
|
<finding
|
|
89
|
-
finding_id="loop<
|
|
95
|
+
finding_id="loop<L>-<K>"
|
|
90
96
|
severity="P0|P1|P2"
|
|
91
97
|
category="<letter>"
|
|
92
98
|
file="<path>"
|
|
@@ -96,6 +102,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
96
102
|
used_fallback="true|false"
|
|
97
103
|
>
|
|
98
104
|
<title>one-line title</title>
|
|
105
|
+
<excerpt>verbatim source line or snippet from the file at the cited line</excerpt>
|
|
99
106
|
<description>2-3 sentence description with concrete trace</description>
|
|
100
107
|
</finding>
|
|
101
108
|
<verified_clean>
|
|
@@ -114,7 +121,7 @@ After the teammate writes the XML and returns, the lead reads `.bugteam-pr<N>-lo
|
|
|
114
121
|
<branch>head</branch>
|
|
115
122
|
<base_branch>base</base_branch>
|
|
116
123
|
<pr_url>url</pr_url>
|
|
117
|
-
<loop>
|
|
124
|
+
<loop>L</loop>
|
|
118
125
|
<pr_number>N</pr_number>
|
|
119
126
|
<worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
|
|
120
127
|
</context>
|
|
@@ -124,7 +131,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
124
131
|
<bugs_to_fix>
|
|
125
132
|
[for each P0/P1/P2 finding from last_findings:]
|
|
126
133
|
<bug
|
|
127
|
-
finding_id="loop<
|
|
134
|
+
finding_id="loop<L>-<K>"
|
|
128
135
|
severity="P0|P1|P2"
|
|
129
136
|
file="<path>"
|
|
130
137
|
line="<int>"
|
|
@@ -156,9 +163,9 @@ cd into `<worktree_path>` before any git, gh, or file operation.
|
|
|
156
163
|
</execution>
|
|
157
164
|
|
|
158
165
|
<outcome_xml_schema>
|
|
159
|
-
<bugteam_fix loop="<
|
|
166
|
+
<bugteam_fix loop="<L>" commit_sha="<sha or empty if no commit>">
|
|
160
167
|
<outcome
|
|
161
|
-
finding_id="loop<
|
|
168
|
+
finding_id="loop<L>-<K>"
|
|
162
169
|
status="fixed|could_not_address|hook_blocked"
|
|
163
170
|
commit_sha="<sha if fixed, empty otherwise>"
|
|
164
171
|
reply_comment_id="<id of the reply posted>"
|
package/skills/bugteam/SKILL.md
CHANGED
|
@@ -184,7 +184,7 @@ only PR write before Step 4.5 is the final description rewrite.
|
|
|
184
184
|
|
|
185
185
|
Order: audit → buffer → validate anchors vs diff → single review POST.
|
|
186
186
|
Review body states counts; zero findings → still one review, `comments: []`,
|
|
187
|
-
body `## /bugteam loop <
|
|
187
|
+
body `## /bugteam loop <L> audit: 0P0 / 0P1 / 0P2 → clean`.
|
|
188
188
|
|
|
189
189
|
**Payloads:** build JSON with `jq --rawfile` / `-Rs`, pipe to `gh api ...
|
|
190
190
|
--input -` (avoids shell-quoting; satisfies `gh-body-backtick-guard`). Write
|
|
@@ -228,7 +228,7 @@ before POST.
|
|
|
228
228
|
**Review body template (`<tmp_review_body.md>`):**
|
|
229
229
|
|
|
230
230
|
```
|
|
231
|
-
## /bugteam loop <
|
|
231
|
+
## /bugteam loop <L> audit: <P0>P0 / <P1>P1 / <P2>P2
|
|
232
232
|
|
|
233
233
|
### Findings without a diff anchor
|
|
234
234
|
(only if needed)
|
|
@@ -318,11 +318,11 @@ Non-zero → spawn **clean-coder** standards-fix (read stderr, edit, re-run
|
|
|
318
318
|
**5**
|
|
319
319
|
failed gate rounds → `error: code rules gate failed pre-audit`. After **0**:
|
|
320
320
|
`loop_count += 1`; if `loop_count > 10` → `cap reached`. Then **AUDIT**
|
|
321
|
-
(bugfind); print `Loop <
|
|
321
|
+
(bugfind); print `Loop <L> audit: ...`.
|
|
322
322
|
|
|
323
323
|
3. **FIX** (`last_action == "audited"` and `last_findings.total > 0`):
|
|
324
324
|
`loop_count += 1`; if `loop_count > 10` → `cap reached`; **FIX** (bugfix);
|
|
325
|
-
print `Loop <
|
|
325
|
+
print `Loop <L> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
|
|
326
326
|
to step 1.
|
|
327
327
|
|
|
328
328
|
4. After **AUDIT**: update `last_action`, `last_findings`, `audit_log`; print
|
|
@@ -361,18 +361,31 @@ background-completion notification, then reads
|
|
|
361
361
|
`last_action = "audited"`; append audit line to `audit_log`.
|
|
362
362
|
|
|
363
363
|
**Parallel auditors (`loop_count >= 4`):** gate passes immediately before;
|
|
364
|
-
after three full audit/fix rounds without convergence, issue
|
|
365
|
-
calls in one assistant message (`run_in_background=true`):
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
|
|
364
|
+
after three full audit/fix rounds without convergence, issue eleven `Agent`
|
|
365
|
+
calls in one assistant message (`run_in_background=true`):
|
|
366
|
+
|
|
367
|
+
- **10 haiku auditors (`-b` through `-k`):** `subagent_type="code-quality-agent"`,
|
|
368
|
+
`model="haiku"`, write sibling XML to
|
|
369
|
+
`<run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml`, skip PR posting.
|
|
370
|
+
Prompts must pass literal absolute sibling paths.
|
|
371
|
+
- **1 opus validator (`-a`):** `subagent_type="code-quality-agent"`,
|
|
372
|
+
`model="opus"`:
|
|
373
|
+
- Polls for all 10 sibling XMLs before proceeding (60s timeout, 2s interval). On timeout: log diagnostics entry, proceed with validated findings from available XMLs, report count in validator output.
|
|
374
|
+
- Validates each finding: file exists, line in bounds, excerpt contains the exact
|
|
375
|
+
text of the cited line, category is A–J, severity is P0/P1/P2.
|
|
376
|
+
- Hallucinated findings → quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under
|
|
377
|
+
`validator_rejected` (added alongside the required diagnostics keys defined in the shared audit contract).
|
|
378
|
+
- De-dups by `(file, line, category)`, max severity wins; on conflict, keep longest description text.
|
|
379
|
+
- Re-ids as `loop<L>-<K>`.
|
|
380
|
+
- Writes `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`, posts review.
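The validator's checks and merge step can be sketched in Python as follows. This is a hedged illustration rather than the validator's actual code: the `Finding` dataclass, `is_valid`, and `merge_findings` are hypothetical names. It also assumes that P0 is the most severe level, that whitespace around the cited line is ignored when comparing against the excerpt, and that "keep longest description" applies when two duplicates share the same severity.

```python
from dataclasses import dataclass, replace
from pathlib import Path

SEVERITY_RANK = {"P0": 3, "P1": 2, "P2": 1}  # assumption: P0 is most severe
VALID_CATEGORIES = set("ABCDEFGHIJ")


@dataclass(frozen=True)
class Finding:
    file: str
    line: int  # 1-based line number in the target file
    category: str
    severity: str
    excerpt: str
    description: str
    finding_id: str = ""


def is_valid(finding: Finding, worktree: Path) -> bool:
    """File exists, line in bounds, excerpt contains the cited line, enums valid."""
    target = worktree / finding.file
    if not target.is_file():
        return False
    lines = target.read_text(encoding="utf-8").splitlines()
    if not 1 <= finding.line <= len(lines):
        return False
    cited = lines[finding.line - 1].strip()
    return (
        cited in finding.excerpt
        and finding.category in VALID_CATEGORIES
        and finding.severity in SEVERITY_RANK
    )


def _preference(finding: Finding) -> tuple:
    # Higher severity first; longer description breaks ties.
    return (SEVERITY_RANK[finding.severity], len(finding.description))


def merge_findings(findings: list, loop: int) -> list:
    """De-dup by (file, line, category), then re-id as loop<L>-<K> (K is 1-based)."""
    best = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.category)
        kept = best.get(key)
        if kept is None or _preference(finding) > _preference(kept):
            best[key] = finding
    return [
        replace(finding, finding_id=f"loop{loop}-{index}")
        for index, finding in enumerate(best.values(), start=1)
    ]
```

Findings that fail `is_valid` would be the ones quarantined under `validator_rejected`; only the survivors are passed to `merge_findings`.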
|
|
381
|
+
|
|
382
|
+
Lead awaits the opus validator (-a) background-completion notification (120s
|
|
383
|
+
timeout). The validator independently polls all 10 sibling XMLs; the lead does
|
|
384
|
+
not gate on haiku peer completion. On lead timeout: the validator did not post
|
|
385
|
+
a merged review — treat as a hard blocker and abort the loop.
|
|
386
|
+
|
|
387
|
+
The sibling-output paths in [`PROMPTS.md`](PROMPTS.md) must cover the full
|
|
388
|
+
`-b` through `-k` range.
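A minimal Python sketch of the sibling poll, assuming the path layout and the 60s/2s timings given above. The helper names `sibling_paths` and `wait_for_siblings` are hypothetical; the validator is free to implement the wait differently.

```python
import time
from pathlib import Path
from string import ascii_lowercase

SIBLING_LETTERS = ascii_lowercase[1:11]  # the ten letters "b" through "k"


def sibling_paths(run_temp_dir: Path, pr_number: int, loop: int) -> list:
    """Build <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml for b through k."""
    return [
        run_temp_dir / f"pr-{pr_number}" / f"loop-{loop}-{letter}.outcomes.xml"
        for letter in SIBLING_LETTERS
    ]


def wait_for_siblings(paths, timeout_seconds=60.0, interval_seconds=2.0):
    """Poll until every path exists or the timeout passes; return (present, missing)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        present = [path for path in paths if path.is_file()]
        if len(present) == len(paths) or time.monotonic() >= deadline:
            missing = [path for path in paths if path not in present]
            return present, missing
        time.sleep(interval_seconds)
```

On timeout the caller receives the partial `present` list plus the `missing` paths, which matches the "proceed with available XMLs and report the count" behavior described for the validator.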
|
|
376
389
|
|
|
377
390
|
### FIX action
|
|
378
391
|
|
|
package/skills/bugteam/SKILL_EVALS.md
CHANGED
|
@@ -22,9 +22,9 @@ Each invariant cites the normative section or companion file it derives from. Al
|
|
|
22
22
|
| I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
|
|
23
23
|
| I-3 | Orchestration uses `Agent(..., run_in_background=true)` only — no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
|
|
24
24
|
| I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` — **Fresh subagent per loop** |
|
|
25
|
-
| I-5 | Audit and fix spawns pass `model="opus"
|
|
25
|
+
| I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
|
|
26
26
|
| I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
|
|
27
|
-
| I-7 | From loop 4 onward without convergence,
|
|
27
|
+
| I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
|
|
28
28
|
| I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
|
|
29
29
|
| I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
|
|
30
30
|
| I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
|
|
@@ -171,14 +171,14 @@ Patch this table to match observation and annotate each correction.
|
|
|
171
171
|
|
|
172
172
|
**Layer B predicted behavior.**
|
|
173
173
|
- Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
|
|
174
|
-
- Loops 4–10:
|
|
174
|
+
- Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
|
|
175
175
|
- Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
|
|
176
176
|
- Exactly 10 audit phases, exactly 10 fix phases.
|
|
177
177
|
- Steps 19–26 from Eval 5 fire at teardown.
|
|
178
178
|
|
|
179
179
|
**Pass criteria.**
|
|
180
180
|
- I-6 holds: exactly 10 audit phases.
|
|
181
|
-
- I-7 holds: loops 4–10 each emit
|
|
181
|
+
- I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
|
|
182
182
|
- Final report contains `/bugteam exit: cap reached` and the remaining bug count.
|
|
183
183
|
|
|
184
184
|
**Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
|
|
package/skills/bugteam/reference/audit-and-teammates.md
CHANGED
|
@@ -24,11 +24,11 @@ Repeat until an exit condition fires.
|
|
|
24
24
|
2. If exit code **0** → continue to step 2.5 (AUDIT spawn) below.
|
|
25
25
|
3. If exit code **non-zero** → spawn a new **clean-coder** teammate — **standards-fix pass** — with instructions: read the script’s stderr, edit the repo until a **re-run** of the **same** gate command exits **0**, then one commit, `git push`, shutdown. Repeat standards-fix spawns until the gate exits **0** or **5** failed gate rounds (each round = one teammate session after a non-zero gate). If still non-zero after 5 rounds → exit reason = `error: code rules gate failed pre-audit`.
|
|
26
26
|
4. After gate exit **0**, increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached` (counts **audits**, not standards-only rounds).
|
|
27
|
-
5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <
|
|
27
|
+
5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <L> audit: ...`
|
|
28
28
|
|
|
29
29
|
3. **FIX path** (when `last_action == "audited"` and `last_findings.total > 0`):
|
|
30
30
|
1. Increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached`.
|
|
31
|
-
2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <
|
|
31
|
+
2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <L> fix: commit ...`
|
|
32
32
|
3. Set `last_action = "fixed"`, update `audit_log`, loop to step 1 (next iteration hits **pre-audit path** before the next AUDIT).
|
|
33
33
|
|
|
34
34
|
4. After **AUDIT**, update `last_action`, `last_findings`, `audit_log`; print the audit progress line if not already printed.
|
|
@@ -39,62 +39,45 @@ Repeat until an exit condition fires.
|
|
|
39
39
|
|
|
40
40
|
## AUDIT action (clean-room teammate, fresh per loop)
|
|
41
41
|
|
|
42
|
-
Capture a fresh PR diff for this loop into the per-
|
|
42
|
+
Capture a fresh PR diff for this loop into the per-PR scoped directory so concurrent `/bugteam` runs keep patches isolated. Use the literal `<run_temp_dir>` resolved once in Step 2 — Claude resolves the absolute path; every shell receives the same literal value.
|
|
43
43
|
|
|
44
44
|
Commands and `Agent(...)` shape: `SKILL.md`.
|
|
45
45
|
|
|
46
|
-
`<
|
|
46
|
+
`<run_temp_dir>` includes the sanitized `team_name` and timestamp; `team_name` is already prefixed with `bugteam-`. Claude resolves `Path(tempfile.gettempdir()) / team_name` once and passes that absolute path to every shell. `tempfile.gettempdir()` honors `TMPDIR`, `TEMP`, `TMP` and falls back to the OS temp directory, so the same approach works on macOS, Linux, Windows cmd.exe, and PowerShell.
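The resolution described above amounts to a couple of lines of Python. The sketch below is illustrative only: `resolve_run_temp_dir` and `loop_patch_path` are hypothetical helper names, and the `team_name` value in the usage note is made up.

```python
import tempfile
from pathlib import Path


def resolve_run_temp_dir(team_name: str) -> Path:
    """Resolve <run_temp_dir> once; team_name already carries the bugteam- prefix."""
    return Path(tempfile.gettempdir()) / team_name


def loop_patch_path(run_temp_dir: Path, pr_number: int, loop: int) -> Path:
    """Per-PR, per-loop patch location: <run_temp_dir>/pr-<N>/loop-<L>.patch."""
    return run_temp_dir / f"pr-{pr_number}" / f"loop-{loop}.patch"
```

For example, `loop_patch_path(resolve_run_temp_dir("bugteam-example-20240101"), 7, 2)` yields a path ending in `bugteam-example-20240101/pr-7/loop-2.patch` under whatever temp directory the platform reports.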
|
|
47
47
|
|
|
48
48
|
Each loop calls `Agent` again with a fresh invocation so the teammate starts with its own context window. Doc line on lead history: [`../sources.md`](../sources.md).
|
|
49
49
|
|
|
50
50
|
See [`../PROMPTS.md`](../PROMPTS.md) for AUDIT spawn-prompt XML and bugfind outcome schema. Substitute placeholders (`repo`, `branch`, `base_branch`, `pr_url`, `loop`, `diff_path`) into the `prompt` argument.
|
|
51
51
|
|
|
52
|
-
After the teammate returns, the lead reads `.bugteam-loop
|
|
52
|
+
After the teammate returns, the lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` from the worktree directory with the `Read` tool, parses it, and populates `loop_comment_index` from `<finding>` elements.
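For orientation, parsing that file could look like the Python sketch below. It is an assumption-laden illustration: the shape of `loop_comment_index` (a mapping from `finding_id` to comment metadata) is a guess, the sample XML uses only attributes named in the schema, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the AUDIT outcome schema; values are placeholders.
SAMPLE_OUTCOMES = """<bugteam_audit loop="2" review_url="https://example.invalid/review/1">
  <finding finding_id="loop2-1" severity="P1" category="A" file="src/app.py"
           finding_comment_id="101"
           finding_comment_url="https://example.invalid/comment/101"
           used_fallback="false">
    <title>one-line title</title>
    <excerpt>value = items[index]</excerpt>
    <description>2-3 sentence description with concrete trace</description>
  </finding>
</bugteam_audit>"""


def build_loop_comment_index(xml_text: str) -> dict:
    """Map each finding_id to its comment metadata (index shape is an assumption)."""
    root = ET.fromstring(xml_text)
    index = {}
    for finding in root.iter("finding"):
        index[finding.get("finding_id")] = {
            "comment_id": finding.get("finding_comment_id", ""),
            "comment_url": finding.get("finding_comment_url", ""),
            "severity": finding.get("severity"),
            "file": finding.get("file"),
        }
    return index


loop_comment_index = build_loop_comment_index(SAMPLE_OUTCOMES)
```

Missing comment attributes default to empty strings, so a file with unposted findings still yields an entry per `finding_id`.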
|
|
53
53
|
|
|
54
54
|
### Shutdown (bugfind)
|
|
55
55
|
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
**Fallback — lead-initiated shutdown:** If the teammate still appears active after `Agent` returns, send:
|
|
59
|
-
|
|
60
|
-
```
|
|
61
|
-
SendMessage(
|
|
62
|
-
to="bugfind",
|
|
63
|
-
message={
|
|
64
|
-
"type": "shutdown_request",
|
|
65
|
-
"reason": "audit loop <N> complete; outcome XML captured"
|
|
66
|
-
}
|
|
67
|
-
)
|
|
68
|
-
```
|
|
69
|
-
|
|
70
|
-
The teammate replies with `{type: "shutdown_response", approve: true}`. If `approve` is `false`, exit reason = `error: bugfind teammate refused shutdown` → Step 4 teardown then Step 5 revoke.
|
|
56
|
+
Teammates self-terminate when complete — the background-completion notification arrives and the lead reads the outcomes XML. If the notification does not arrive within the lead timeout (120s), treat as a hard blocker and abort the loop.
|
|
71
57
|
|
|
72
58
|
`last_action = "audited"`. Append audit metadata to `audit_log`.
|
|
73
59
|
|
|
74
60
|
### Parallel auditors (`loop_count >= 4`)
|
|
75
61
|
|
|
76
|
-
The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue
|
|
62
|
+
The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue eleven `Agent` calls in **one** assistant message so they run in parallel:
|
|
77
63
|
|
|
78
64
|
```
|
|
79
|
-
Agent(subagent_type="code-quality-agent", name="bugfind-
|
|
80
|
-
Agent(subagent_type="code-quality-agent", name="bugfind-
|
|
81
|
-
Agent(subagent_type="code-quality-agent", name="bugfind-
|
|
65
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-a", team_name="<team_name>", model="opus", run_in_background=true, description="Bugfind audit PR <N> loop <L> validator", prompt="<audit XML; poll for all 10 sibling XMLs at <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml through <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml (60s timeout, 2s interval); on timeout: log diagnostics entry, proceed with validated findings from available XMLs; validate each finding: file exists, line in bounds, excerpt matches claimed line, category A-J, severity P0/P1/P2; quarantine hallucinated findings to <run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json under validator_rejected; de-dup by (file, line, category), max severity wins, keep longest description on conflict; re-id as loop<L>-<K>; write <worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml; post review>")
|
|
66
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-b", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant b", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml; skip PR posting>")
|
|
67
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-c", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant c", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-c.outcomes.xml; skip PR posting>")
|
|
68
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-d", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant d", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-d.outcomes.xml; skip PR posting>")
|
|
69
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-e", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant e", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-e.outcomes.xml; skip PR posting>")
|
|
70
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-f", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant f", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-f.outcomes.xml; skip PR posting>")
|
|
71
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-g", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant g", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-g.outcomes.xml; skip PR posting>")
|
|
72
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-h", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant h", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-h.outcomes.xml; skip PR posting>")
|
|
73
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-i", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant i", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-i.outcomes.xml; skip PR posting>")
|
|
74
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-j", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant j", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-j.outcomes.xml; skip PR posting>")
|
|
75
|
+
Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-k", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant k", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml; skip PR posting>")
|
|
82
76
|
```
|
|
83
77
|
|
|
84
|
-
Teammate `-a` is the
|
|
85
|
-
|
|
86
|
-
Shutdown order: parallel `SendMessage` to `b` and `c`, then `a`:
|
|
78
|
+
Teammate `-a` is the opus validator: polls for all 10 sibling XMLs at explicit absolute paths under `<run_temp_dir>/pr-<N>` (60s timeout, 2s interval; on timeout: log diagnostics entry, proceed with validated findings from available XMLs), then validates each finding — file exists, line in bounds, excerpt matches claimed line, category is A–J, severity is P0/P1/P2. Hallucinated findings are quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under `validator_rejected`. Valid findings are de-duplicated by `(file, line, category)` (max severity wins, keep longest description on conflict) and re-assigned merged IDs as `loop<L>-<K>`. The `-a` prompt must embed sibling paths as literal absolutes so `Read` works without discovery.
|
|
87
79
|
|
|
88
|
-
|
|
89
|
-
SendMessage(to="bugfind-loop-<N>-b", message={"type": "shutdown_request", "reason": "variant XML captured"})
|
|
90
|
-
SendMessage(to="bugfind-loop-<N>-c", message={"type": "shutdown_request", "reason": "variant XML captured"})
|
|
91
|
-
```
|
|
92
|
-
|
|
93
|
-
then
|
|
94
|
-
|
|
95
|
-
```
|
|
96
|
-
SendMessage(to="bugfind-loop-<N>-a", message={"type": "shutdown_request", "reason": "merged review posted"})
|
|
97
|
-
```
|
|
80
|
+
All subagents self-terminate via background completion. The lead awaits only the validator (-a) notification (120s timeout). Missing notification → hard blocker.
|
|
98
81
|
|
|
99
82
|
## FIX action (fresh teammate)
|
|
100
83
|
|
|
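The validator merge rules in the hunk above (de-dup by `(file, line, category)`, max severity wins, keep the longest description on conflict, re-id as `loop<L>-<K>`) can be sketched as follows. This is only an illustration, not the package's implementation: the `Finding` dataclass and `merge_findings` are hypothetical names, and it assumes that P0 is the most severe level and that merged IDs are numbered in first-seen order.

```python
from dataclasses import dataclass, replace

# Assumption: P0 is the most severe level, so a lower rank wins a conflict.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory shape of one validated finding."""

    file: str
    line: int
    category: str
    severity: str
    description: str
    id: str = ""


def merge_findings(findings: list[Finding], loop: int) -> list[Finding]:
    """De-duplicate by (file, line, category) and re-id as loop<L>-<K>."""
    merged: dict[tuple[str, int, str], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.category)
        kept = merged.get(key)
        if kept is None:
            merged[key] = finding
            continue
        # Max severity wins; the longest description is kept on conflict.
        severity = min(kept.severity, finding.severity, key=SEVERITY_RANK.__getitem__)
        description = max(kept.description, finding.description, key=len)
        merged[key] = replace(kept, severity=severity, description=description)
    # Assumption: K is a 1-based index in first-seen order.
    return [
        replace(finding, id=f"loop{loop}-{index}")
        for index, finding in enumerate(merged.values(), start=1)
    ]
```

Keying on the full `(file, line, category)` triple means two auditors that flag the same line under different categories produce two separate merged findings.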
@@ -106,17 +89,7 @@ After replies, the teammate writes outcome XML (schema in [`../PROMPTS.md`](../P
|
|
|
106
89
|
|
|
107
90
|
### Shutdown (bugfix)
|
|
108
91
|
|
|
109
|
-
Same self-termination
|
|
110
|
-
|
|
111
|
-
```
|
|
112
|
-
SendMessage(
|
|
113
|
-
to="bugfix",
|
|
114
|
-
message={
|
|
115
|
-
"type": "shutdown_request",
|
|
116
|
-
"reason": "fix loop <N> complete; commit <sha7> pushed"
|
|
117
|
-
}
|
|
118
|
-
)
|
|
119
|
-
```
|
|
92
|
+
Same self-termination model as bugfind. Missing notification → hard blocker.
|
|
120
93
|
|
|
121
94
|
`approve: false` → `error: bugfix teammate refused shutdown` → Step 4 then 5.
|
|
122
95
|
|
|
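The validator's wait for sibling outcome files (60s timeout, 2s interval, proceed with whatever is available on timeout) has the semantics sketched below. In the skill itself this step is carried out by the `-a` teammate following its prompt, so the code is purely illustrative; `wait_for_files` and its parameters are hypothetical names.

```python
import time
from pathlib import Path


def wait_for_files(
    paths: list[Path],
    timeout_seconds: float = 60.0,
    interval_seconds: float = 2.0,
) -> tuple[list[Path], list[Path]]:
    """Poll until every path exists or the timeout elapses.

    Returns (present, missing). The caller proceeds with `present`
    and records `missing` as a diagnostics entry.
    """
    deadline = time.monotonic() + timeout_seconds
    pending = set(paths)
    while True:
        # Drop every path that has appeared since the last check.
        pending = {path for path in pending if not path.exists()}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval_seconds)
    present = [path for path in paths if path not in pending]
    return present, sorted(pending)
```

Returning the missing paths instead of raising matches the documented behaviour: a timeout is logged, and validation continues with the XMLs that did arrive.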
@@ -8,7 +8,7 @@ Shared output schema and audit-loop contract used by `/bugteam`, `/qbug`, `/find
|
|
|
8
8
|
- Adversarial second pass
|
|
9
9
|
- Haiku secondary auditor
|
|
10
10
|
- Post-fix self-audit
|
|
11
|
-
- Persistence (loop
|
|
11
|
+
- Persistence (loop-<L>-audit.json, loop-<L>-diagnostics.json)
|
|
12
12
|
|
|
13
13
|
## Finding schema
|
|
14
14
|
|
|
@@ -18,7 +18,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
|
|
|
18
18
|
|
|
19
19
|
```json
|
|
20
20
|
{
|
|
21
|
-
"id": "loop<
|
|
21
|
+
"id": "loop<L>-<K>",
|
|
22
22
|
"file": "path/relative/to/repo/root.py",
|
|
23
23
|
"line": 123,
|
|
24
24
|
"category": "A | B | C | D | E | F | G | H | I | J",
|
|
@@ -29,7 +29,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
|
|
|
29
29
|
}
|
|
30
30
|
```
|
|
31
31
|
|
|
32
|
-
`id` is `loop<
|
|
32
|
+
`id` is `loop<L>-<K>` where `L` is the loop counter (1-based) and `K` is the 1-based index within the loop. For `/findbugs` which runs once, use `find<K>`.
|
|
33
33
|
|
|
34
34
|
### Shape B — structured proof-of-absence
|
|
35
35
|
|
|
@@ -105,9 +105,9 @@ Merge rules:
|
|
|
105
105
|
- **Unique-to-Haiku findings**: added to the primary set with Haiku's severity and source annotation.
|
|
106
106
|
- **Unique-to-primary findings**: kept as-is.
|
|
107
107
|
- **Zero Haiku findings**: primary set trusted; proceed.
|
|
108
|
-
- **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<
|
|
108
|
+
- **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<L>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
|
|
109
109
|
|
|
110
|
-
For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via
|
|
110
|
+
For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via 10 haiku auditors + opus validator.
|
|
111
111
|
|
|
112
112
|
## Post-fix self-audit
|
|
113
113
|
|
|
@@ -131,7 +131,7 @@ Sequence:
|
|
|
131
131
|
|
|
132
132
|
Every audit loop writes two JSON files under the skill's scoped temp directory (resolved via `tempfile.gettempdir()`):
|
|
133
133
|
|
|
134
|
-
### `loop-<
|
|
134
|
+
### `loop-<L>-audit.json`
|
|
135
135
|
|
|
136
136
|
```json
|
|
137
137
|
{
|
|
@@ -141,7 +141,7 @@ Every audit loop writes two JSON files under the skill's scoped temp directory (
|
|
|
141
141
|
}
|
|
142
142
|
```
|
|
143
143
|
|
|
144
|
-
### `loop-<
|
|
144
|
+
### `loop-<L>-diagnostics.json`
|
|
145
145
|
|
|
146
146
|
```json
|
|
147
147
|
{
|
|
@@ -23,20 +23,25 @@ Decide (four branches; match first whose predicate holds):
|
|
|
23
23
|
|
|
24
24
|
- **`classification == "dirty"` with non-empty inline comments matching
|
|
25
25
|
`pull_request_review_id`:** Fix protocol input (same shape as bugbot
|
|
26
|
-
dirty).
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
26
|
+
dirty). Spawn Agent (subagent_type: clean-coder) to implement → push → reply inline on each thread via
|
|
27
|
+
`reply_to_inline_comment.py` → Step 3 in same tick (see
|
|
28
|
+
[Single-PR fix workflow](fix-protocol.md#single-pr-fix-workflow) for
|
|
29
|
+
full contract).
|
|
30
|
+
Reset `bugbot_clean_at = null` AND `copilot_clean_at = null`, `phase =
|
|
31
|
+
BUGBOT`, schedule next wakeup, return. Full back-to-back-clean cycle
|
|
32
|
+
plus all four gates must hold again on new HEAD.
|
|
31
33
|
- **`classification == "dirty"` with empty inline comments matching
|
|
32
34
|
`pull_request_review_id`:** Copilot posted findings only in review body
|
|
33
35
|
(`CHANGES_REQUESTED` or `COMMENTED` with non-empty body, no inline
|
|
34
|
-
threads). Parse body for actionable findings
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
36
|
+
threads). Parse body for actionable findings. Spawn Agent (subagent_type: clean-coder) to implement → push → post
|
|
37
|
+
top-level review reply citing new HEAD SHA → Step 3 in same tick (see
|
|
38
|
+
[Single-PR fix workflow](fix-protocol.md#single-pr-fix-workflow) for
|
|
39
|
+
full contract).
|
|
40
|
+
Reset
|
|
41
|
+
`bugbot_clean_at = null` AND
|
|
42
|
+
`copilot_clean_at = null`, `phase = BUGBOT`, Step 3 on new HEAD,
|
|
43
|
+
schedule next wakeup, return. Convergence requires full
|
|
44
|
+
back-to-back-clean on new HEAD.
|
|
40
45
|
- **`classification == "clean"` (state `APPROVED`):** Set
|
|
41
46
|
`copilot_clean_at = current_head`. Continue to gate (b).
|
|
42
47
|
- **No Copilot review on `current_head` yet:** Skip — gate (c) issues
|
|
@@ -89,13 +94,12 @@ Next tick with `phase == BUGTEAM` and prior state preserved → re-run gate
|
|
|
89
94
|
current_head`. Mark PR ready (`mark_pr_ready.py`), report convergence
|
|
90
95
|
per §(d), terminate per [stop-conditions.md](stop-conditions.md) / Convergence.
|
|
91
96
|
- **Copilot review `dirty`:** Treat identically to gate (a) dirty path —
|
|
92
|
-
fix in same PR, restart convergence from BUGBOT.
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
back-to-back-clean cycle plus all four gates must hold again on new HEAD.
|
|
97
|
+
spawn Agent (subagent_type: clean-coder) to fix in same PR, restart convergence from BUGBOT. Follow [Single-PR fix workflow](fix-protocol.md#single-pr-fix-workflow).
|
|
98
|
+
For body-only findings with empty inline, spawn Agent (subagent_type: clean-coder) to implement, then post top-level review reply
|
|
99
|
+
citing new HEAD SHA. Reset `bugbot_clean_at = null` AND
|
|
100
|
+
`copilot_clean_at = null`, `phase = BUGBOT`, schedule next wakeup,
|
|
101
|
+
return. Full back-to-back-clean cycle plus all four gates must hold
|
|
102
|
+
again on new HEAD.
|
|
99
103
|
- **No Copilot review at `current_head` yet (still propagating):**
|
|
100
104
|
Schedule one more wakeup (270s), re-check next tick. After three consecutive empty waits,
|
|
101
105
|
escalate as hard blocker per [stop-conditions.md](stop-conditions.md).
|
|
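The state reset that every dirty-review branch above performs can be pictured with a small sketch. The `ConvergenceState` dataclass and both helper functions are hypothetical. The check in `is_back_to_back_clean` is an assumption about what "back-to-back-clean on new HEAD" means, inferred from the `*_clean_at = current_head` assignments in the hunk, and it does not model the four gates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConvergenceState:
    """Hypothetical holder for the fields named in the gate text."""

    phase: str
    bugbot_clean_at: Optional[str] = None
    copilot_clean_at: Optional[str] = None


def reset_after_fix(state: ConvergenceState) -> ConvergenceState:
    """Apply the documented reset once a fix has been pushed."""
    state.bugbot_clean_at = None
    state.copilot_clean_at = None
    state.phase = "BUGBOT"
    return state


def is_back_to_back_clean(state: ConvergenceState, current_head: str) -> bool:
    # Assumption: both reviewers must have been clean on the same HEAD SHA.
    return state.bugbot_clean_at == current_head and state.copilot_clean_at == current_head
```

Clearing both markers forces a full re-review of the new HEAD, so a clean result recorded against the pre-fix commit can never count toward convergence.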
@@ -20,6 +20,8 @@ per [ground-rules.md](ground-rules.md).
|
|
|
20
20
|
Orchestrator does not reply inline, trigger bugbot, or read repo source
|
|
21
21
|
files during fix phase in multi-PR mode.
|
|
22
22
|
|
|
23
|
+
### Single-PR fix workflow
|
|
24
|
+
|
|
23
25
|
**Single-PR (no `state.json`) — same gates, main session executor:**
|
|
24
26
|
|
|
25
27
|
- Read each referenced file:line.
|
|
@@ -37,9 +37,11 @@ state line when **no** `state.json` (single-PR only). With `state.json`, do
|
|
|
37
37
|
**not** increment here — orchestrator's per-tick bump is sole increment.
|
|
38
38
|
|
|
39
39
|
```bash
|
|
40
|
-
python "${CLAUDE_SKILL_DIR}/scripts/view_pr_context.py"
|
|
40
|
+
python "${CLAUDE_SKILL_DIR}/scripts/view_pr_context.py" --owner <OWNER> --repo <REPO> --number <NUMBER>
|
|
41
41
|
```
|
|
42
42
|
|
|
43
|
+
If owner/repo/number are not yet known, extract them from the PR URL or run without flags in a repo checkout.
|
|
44
|
+
|
|
43
45
|
Capture `number`, `headRefOid` (= `current_head`), owner/repo, branch.
|
|
44
46
|
|
|
45
47
|
## Step 2: Branch on `phase`
|
|
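The added note about extracting owner, repo, and number from the PR URL could be implemented along these lines. This is a sketch only: `parse_pr_url` is a hypothetical helper that is not part of the package, and it assumes a `https://github.com/<owner>/<repo>/pull/<number>` URL.

```python
import re

# Assumption: PR URLs follow the github.com/<owner>/<repo>/pull/<number> shape.
PR_URL_PATTERN = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)"
)


def parse_pr_url(pr_url: str) -> tuple[str, str, str]:
    """Return (owner, repo, number) as strings, ready to pass as CLI flags."""
    match = PR_URL_PATTERN.match(pr_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {pr_url!r}")
    return match["owner"], match["repo"], match["number"]
```

The number is kept as a string because the `--number` flag, like any argparse argument without a `type`, arrives as text.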
@@ -93,9 +95,11 @@ c. Decide (four branches; match first whose predicate holds):
|
|
|
93
95
|
`state.json`: clean-coder teammate pushes, replies inline, writes
|
|
94
96
|
`state.json`, goes idle; Step 3 on new HEAD runs after via
|
|
95
97
|
orchestrator-spawned follow-up agent (§Fix result → general-purpose).
|
|
96
|
-
No `state.json` (single-PR): implement → push → inline
|
|
97
|
-
→ Step 3 in same tick
|
|
98
|
-
|
|
98
|
+
No `state.json` (single-PR): spawn Agent (subagent_type: clean-coder) to implement → push → reply inline on each thread
|
|
99
|
+
via `reply_to_inline_comment.py` → Step 3 in same tick (see
|
|
100
|
+
[Single-PR fix workflow](fix-protocol.md#single-pr-fix-workflow) for
|
|
101
|
+
full contract).
|
|
102
|
+
Schedule next wakeup, return.
|
|
99
103
|
- **`commit_id == current_head` AND review body findings AND inline
|
|
100
104
|
API zero matching for `current_head`:** Transient API lag. Increment
|
|
101
105
|
`inline_lag_streak`. `>= 3` → hard blocker; report and terminate with
|
|
@@ -142,9 +146,8 @@ never falsely terminates:
|
|
|
142
146
|
**omit loop pacing** per **Convergence** of active pacing workflow.
|
|
143
147
|
- **Convergence BUT `bugbot_clean_at != current_head` (no push):**
|
|
144
148
|
`phase = BUGBOT`, schedule next wakeup, return.
|
|
145
|
-
- **Findings without committed fixes:**
|
|
146
|
-
|
|
147
|
-
single-PR. `phase = BUGBOT`, schedule next wakeup, return.
|
|
149
|
+
- **Findings without committed fixes:** spawn Agent (subagent_type: clean-coder) to implement fixes and push, then reply inline via `reply_to_inline_comment.py`, following [Single-PR fix workflow](fix-protocol.md#single-pr-fix-workflow).
|
|
150
|
+
`phase = BUGBOT`, schedule next wakeup, return.
|
|
148
151
|
|
|
149
152
|
## Step 3: Re-trigger bugbot
|
|
150
153
|
|
|
@@ -67,6 +67,22 @@ BUGBOT_RUN_TEMPFILE_PREFIX: str = "pr-converge-bugbot-run-"
|
|
|
67
67
|
|
|
68
68
|
PR_CONTEXT_FIELDS: str = "number,url,headRefOid,baseRefName,headRefName,isDraft"
|
|
69
69
|
|
|
70
|
+
PR_DETACHED_HEAD_ARGS_ERROR: str = "--owner and --repo require --number; all three must be provided together for detached-HEAD PR resolution"
|
|
71
|
+
|
|
72
|
+
PR_NUMBER_ARG_FLAG: str = "--number"
|
|
73
|
+
|
|
74
|
+
PR_NUMBER_ARG_HELP: str = "PR number"
|
|
75
|
+
|
|
76
|
+
PR_OWNER_ARG_FLAG: str = "--owner"
|
|
77
|
+
|
|
78
|
+
PR_OWNER_ARG_HELP: str = "GitHub repository owner"
|
|
79
|
+
|
|
80
|
+
PR_REPO_ARG_FLAG: str = "--repo"
|
|
81
|
+
|
|
82
|
+
PR_REPO_ARG_HELP: str = "GitHub repository name"
|
|
83
|
+
|
|
84
|
+
GH_REPO_FLAG: str = "--repo"
|
|
85
|
+
|
|
70
86
|
MERGEABILITY_FIELDS: str = "mergeable,mergeStateStatus,headRefOid"
|
|
71
87
|
|
|
72
88
|
GH_FIELD_BODY_AT_PREFIX: str = "body=@"
|
|
@@ -91,6 +91,26 @@ def test_should_raise_when_gh_subprocess_fails() -> None:
|
|
|
91
91
|
view_pr_context_module.view_pr_context()
|
|
92
92
|
|
|
93
93
|
|
|
94
|
+
def test_should_append_number_and_repo_flag_when_owner_repo_and_number_provided() -> None:
|
|
95
|
+
payload = json.dumps(
|
|
96
|
+
{
|
|
97
|
+
"number": 25,
|
|
98
|
+
"url": "https://github.com/acme/widget/pull/25",
|
|
99
|
+
"headRefOid": "abc123",
|
|
100
|
+
"baseRefName": "main",
|
|
101
|
+
"headRefName": "feat/x",
|
|
102
|
+
"isDraft": True,
|
|
103
|
+
}
|
|
104
|
+
)
|
|
105
|
+
with patch("subprocess.run") as mock_run:
|
|
106
|
+
mock_run.return_value = _completed(payload)
|
|
107
|
+
view_pr_context_module.view_pr_context(number="25", owner="acme", repo="widget")
|
|
108
|
+
invoked_argv = mock_run.call_args[0][0]
|
|
109
|
+
assert "25" in invoked_argv
|
|
110
|
+
assert "--repo" in invoked_argv
|
|
111
|
+
assert "acme/widget" in invoked_argv
|
|
112
|
+
|
|
113
|
+
|
|
94
114
|
def test_should_pass_imported_constant_directly_without_local_alias() -> None:
|
|
95
115
|
payload = json.dumps(
|
|
96
116
|
{
|
|
@@ -109,3 +129,27 @@ def test_should_pass_imported_constant_directly_without_local_alias() -> None:
|
|
|
109
129
|
fields_arg = invoked_argv[invoked_argv.index("--json") + 1]
|
|
110
130
|
expected_fields = view_pr_context_module.PR_CONTEXT_FIELDS
|
|
111
131
|
assert fields_arg is expected_fields
|
|
132
|
+
|
|
133
|
+
|
|
134
|
+
def test_should_not_exit_when_number_provided_alone() -> None:
|
|
135
|
+
payload = json.dumps(
|
|
136
|
+
{
|
|
137
|
+
"number": 42,
|
|
138
|
+
"url": "https://github.com/acme/widget/pull/42",
|
|
139
|
+
"headRefOid": "abc123",
|
|
140
|
+
"baseRefName": "main",
|
|
141
|
+
"headRefName": "feat/x",
|
|
142
|
+
"isDraft": True,
|
|
143
|
+
}
|
|
144
|
+
)
|
|
145
|
+
with patch("subprocess.run") as mock_run:
|
|
146
|
+
mock_run.return_value = _completed(payload)
|
|
147
|
+
with patch("sys.argv", ["view_pr_context.py", "--number", "42"]):
|
|
148
|
+
return_code = view_pr_context_module.main()
|
|
149
|
+
assert return_code == 0
|
|
150
|
+
|
|
151
|
+
|
|
152
|
+
def test_should_exit_when_owner_and_repo_provided_without_number() -> None:
|
|
153
|
+
with patch("sys.argv", ["view_pr_context.py", "--owner", "acme", "--repo", "widget"]):
|
|
154
|
+
with pytest.raises(SystemExit):
|
|
155
|
+
view_pr_context_module.main()
|
|
@@ -17,12 +17,33 @@ from evict_cached_config_modules import evict_cached_config_modules
|
|
|
17
17
|
|
|
18
18
|
evict_cached_config_modules()
|
|
19
19
|
|
|
20
|
-
from config.pr_converge_constants import
|
|
20
|
+
from config.pr_converge_constants import (
|
|
21
|
+
GH_REPO_ARG_TEMPLATE,
|
|
22
|
+
GH_REPO_FLAG,
|
|
23
|
+
PR_CONTEXT_FIELDS,
|
|
24
|
+
PR_DETACHED_HEAD_ARGS_ERROR,
|
|
25
|
+
PR_NUMBER_ARG_FLAG,
|
|
26
|
+
PR_NUMBER_ARG_HELP,
|
|
27
|
+
PR_OWNER_ARG_FLAG,
|
|
28
|
+
PR_OWNER_ARG_HELP,
|
|
29
|
+
PR_REPO_ARG_FLAG,
|
|
30
|
+
PR_REPO_ARG_HELP,
|
|
31
|
+
)
|
|
21
32
|
|
|
22
33
|
|
|
23
|
-
def view_pr_context(
|
|
34
|
+
def view_pr_context(
|
|
35
|
+
number: str | None = None,
|
|
36
|
+
owner: str | None = None,
|
|
37
|
+
repo: str | None = None,
|
|
38
|
+
) -> dict[str, object]:
|
|
24
39
|
"""Return the parsed JSON object from `gh pr view --json <fields>`."""
|
|
25
40
|
gh_command: list[str] = ["gh", "pr", "view", "--json", PR_CONTEXT_FIELDS]
|
|
41
|
+
if owner and repo and number:
|
|
42
|
+
gh_command.append(number)
|
|
43
|
+
gh_command.append(GH_REPO_FLAG)
|
|
44
|
+
gh_command.append(GH_REPO_ARG_TEMPLATE.format(owner=owner, repo=repo))
|
|
45
|
+
elif number:
|
|
46
|
+
gh_command.append(number)
|
|
26
47
|
completed = subprocess.run(
|
|
27
48
|
gh_command,
|
|
28
49
|
capture_output=True,
|
|
@@ -36,8 +57,18 @@ def view_pr_context() -> dict[str, object]:
|
|
|
36
57
|
|
|
37
58
|
def main() -> int:
|
|
38
59
|
parser = argparse.ArgumentParser(description=__doc__)
|
|
39
|
-
parser.
|
|
40
|
-
|
|
60
|
+
parser.add_argument(PR_NUMBER_ARG_FLAG, default=None, help=PR_NUMBER_ARG_HELP)
|
|
61
|
+
parser.add_argument(PR_OWNER_ARG_FLAG, default=None, help=PR_OWNER_ARG_HELP)
|
|
62
|
+
parser.add_argument(PR_REPO_ARG_FLAG, default=None, help=PR_REPO_ARG_HELP)
|
|
63
|
+
parsed = parser.parse_args()
|
|
64
|
+
number = (parsed.number.strip() or None) if parsed.number else None
|
|
65
|
+
owner = (parsed.owner.strip() or None) if parsed.owner else None
|
|
66
|
+
repo = (parsed.repo.strip() or None) if parsed.repo else None
|
|
67
|
+
needs_repo = owner is not None or repo is not None
|
|
68
|
+
has_all = number is not None and owner is not None and repo is not None
|
|
69
|
+
if needs_repo and not has_all:
|
|
70
|
+
parser.error(PR_DETACHED_HEAD_ARGS_ERROR)
|
|
71
|
+
pr_context = view_pr_context(number=number, owner=owner, repo=repo)
|
|
41
72
|
json.dump(pr_context, sys.stdout)
|
|
42
73
|
sys.stdout.write("\n")
|
|
43
74
|
return 0
|
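The three invocation shapes that `view_pr_context` now supports can be seen in isolation with the sketch below. It mirrors the branching in the diff but is a standalone illustration: `build_gh_command` is a hypothetical name, and the value of `GH_REPO_ARG_TEMPLATE` is an assumption, since its definition is not shown in the diff (the new test only checks that `"acme/widget"` appears in the argv).

```python
from typing import Optional

PR_CONTEXT_FIELDS = "number,url,headRefOid,baseRefName,headRefName,isDraft"
GH_REPO_FLAG = "--repo"
# Assumption: the real constant is not shown in the diff.
GH_REPO_ARG_TEMPLATE = "{owner}/{repo}"


def build_gh_command(
    number: Optional[str] = None,
    owner: Optional[str] = None,
    repo: Optional[str] = None,
) -> list[str]:
    """Build the `gh pr view` argv for the three supported cases."""
    gh_command = ["gh", "pr", "view", "--json", PR_CONTEXT_FIELDS]
    if number:
        # Explicit PR number: works without a checked-out PR branch.
        gh_command.append(number)
    if owner and repo and number:
        # Fully qualified: also works outside any checkout of the repo.
        gh_command += [GH_REPO_FLAG, GH_REPO_ARG_TEMPLATE.format(owner=owner, repo=repo)]
    return gh_command
```

With no arguments the command falls back to the pull request of the current branch. `main()` in the diff rejects `--owner` or `--repo` without `--number`, so the partial combination never reaches this point.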