claude-dev-env 1.37.0 → 1.38.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (95)
  1. package/CLAUDE.md +3 -0
  2. package/_shared/pr-loop/audit-contract.md +4 -3
  3. package/_shared/pr-loop/fix-protocol.md +2 -0
  4. package/_shared/pr-loop/gh-payloads.md +38 -37
  5. package/_shared/pr-loop/scripts/README.md +0 -1
  6. package/_shared/pr-loop/scripts/preflight.py +2 -1
  7. package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +2 -2
  8. package/_shared/pr-loop/scripts/tests/test_preflight.py +22 -0
  9. package/_shared/pr-loop/state-schema.md +10 -10
  10. package/agents/clean-coder.md +4 -0
  11. package/agents/code-quality-agent.md +23 -85
  12. package/agents/groq-coder.md +8 -6
  13. package/hooks/blocking/__init__.py +0 -0
  14. package/hooks/blocking/hedging_language_blocker.py +2 -2
  15. package/hooks/blocking/state_description_blocker.py +243 -0
  16. package/hooks/blocking/tdd_enforcer.py +94 -0
  17. package/hooks/blocking/test_hedging_language_blocker.py +1 -1
  18. package/hooks/blocking/test_state_description_blocker.py +618 -0
  19. package/hooks/blocking/test_tdd_enforcer.py +152 -0
  20. package/hooks/config/state_description_blocker_constants.py +130 -0
  21. package/hooks/hooks.json +10 -0
  22. package/package.json +1 -1
  23. package/rules/gh-paginate.md +4 -50
  24. package/rules/no-historical-clutter.md +57 -0
  25. package/scripts/config/groq_bugteam_config.py +13 -5
  26. package/skills/bugteam/CONSTRAINTS.md +20 -27
  27. package/skills/bugteam/EXAMPLES.md +1 -1
  28. package/skills/bugteam/PROMPTS.md +78 -42
  29. package/skills/bugteam/SKILL.md +76 -63
  30. package/skills/bugteam/SKILL_EVALS.md +12 -12
  31. package/skills/bugteam/reference/audit-and-teammates.md +21 -48
  32. package/skills/bugteam/reference/audit-contract.md +7 -7
  33. package/skills/bugteam/reference/github-pr-reviews.md +31 -31
  34. package/skills/bugteam/reference/team-setup.md +1 -1
  35. package/skills/bugteam/reference/teardown-publish-permissions.md +4 -4
  36. package/skills/copilot-review/SKILL.md +7 -14
  37. package/skills/findbugs/SKILL.md +2 -2
  38. package/skills/fixbugs/SKILL.md +1 -1
  39. package/skills/monitor-open-prs/SKILL.md +6 -6
  40. package/skills/pr-converge/SKILL.md +7 -6
  41. package/skills/pr-converge/reference/convergence-gates.md +46 -44
  42. package/skills/pr-converge/reference/examples.md +4 -4
  43. package/skills/pr-converge/reference/fix-protocol.md +8 -8
  44. package/skills/pr-converge/reference/multi-pr-orchestration.md +10 -10
  45. package/skills/pr-converge/reference/per-tick.md +24 -36
  46. package/skills/pr-converge/reference/stop-conditions.md +7 -7
  47. package/skills/pr-converge/scripts/README.md +65 -117
  48. package/skills/pr-review-responder/EXAMPLES.md +2 -2
  49. package/skills/pr-review-responder/PRINCIPLES.md +2 -8
  50. package/skills/pr-review-responder/README.md +7 -48
  51. package/skills/pr-review-responder/SKILL.md +2 -3
  52. package/skills/pr-review-responder/TESTING.md +8 -65
  53. package/skills/qbug/SKILL.md +10 -16
  54. package/_shared/pr-loop/scripts/config/gh_util_constants.py +0 -31
  55. package/_shared/pr-loop/scripts/gh_util.py +0 -193
  56. package/_shared/pr-loop/scripts/tests/test_gh_util.py +0 -257
  57. package/_shared/pr-loop/scripts/tests/test_gh_util_constants.py +0 -61
  58. package/skills/pr-converge/scripts/check_pr_mergeability.py +0 -78
  59. package/skills/pr-converge/scripts/config/pr_converge_constants.py +0 -118
  60. package/skills/pr-converge/scripts/config/test_pr_converge_constants.py +0 -152
  61. package/skills/pr-converge/scripts/fetch_bugbot_inline_comments.py +0 -70
  62. package/skills/pr-converge/scripts/fetch_bugbot_reviews.py +0 -57
  63. package/skills/pr-converge/scripts/fetch_claude_inline_comments.py +0 -70
  64. package/skills/pr-converge/scripts/fetch_claude_reviews.py +0 -61
  65. package/skills/pr-converge/scripts/fetch_copilot_inline_comments.py +0 -70
  66. package/skills/pr-converge/scripts/fetch_copilot_reviews.py +0 -61
  67. package/skills/pr-converge/scripts/mark_pr_ready.py +0 -54
  68. package/skills/pr-converge/scripts/post-bugbot-run.helpers.ps1 +0 -49
  69. package/skills/pr-converge/scripts/post-bugbot-run.ps1 +0 -33
  70. package/skills/pr-converge/scripts/reply_to_inline_comment.py +0 -84
  71. package/skills/pr-converge/scripts/request_copilot_review.py +0 -71
  72. package/skills/pr-converge/scripts/resolve_pr_head.py +0 -58
  73. package/skills/pr-converge/scripts/review_field_helpers.py +0 -43
  74. package/skills/pr-converge/scripts/reviewer_fetch_core.py +0 -153
  75. package/skills/pr-converge/scripts/reviewer_specs.py +0 -98
  76. package/skills/pr-converge/scripts/test_check_pr_mergeability.py +0 -126
  77. package/skills/pr-converge/scripts/test_fetch_bugbot_inline_comments.py +0 -443
  78. package/skills/pr-converge/scripts/test_fetch_bugbot_reviews.py +0 -299
  79. package/skills/pr-converge/scripts/test_fetch_claude_inline_comments.py +0 -485
  80. package/skills/pr-converge/scripts/test_fetch_claude_reviews.py +0 -368
  81. package/skills/pr-converge/scripts/test_fetch_copilot_inline_comments.py +0 -440
  82. package/skills/pr-converge/scripts/test_fetch_copilot_reviews.py +0 -366
  83. package/skills/pr-converge/scripts/test_mark_pr_ready.py +0 -69
  84. package/skills/pr-converge/scripts/test_post_bugbot_run.py +0 -195
  85. package/skills/pr-converge/scripts/test_reply_to_inline_comment.py +0 -159
  86. package/skills/pr-converge/scripts/test_request_copilot_review.py +0 -101
  87. package/skills/pr-converge/scripts/test_resolve_pr_head.py +0 -79
  88. package/skills/pr-converge/scripts/test_review_field_helpers.py +0 -80
  89. package/skills/pr-converge/scripts/test_reviewer_fetch_core.py +0 -448
  90. package/skills/pr-converge/scripts/test_reviewer_specs.py +0 -107
  91. package/skills/pr-converge/scripts/test_trigger_bugbot.py +0 -139
  92. package/skills/pr-converge/scripts/test_view_pr_context.py +0 -111
  93. package/skills/pr-converge/scripts/trigger_bugbot.py +0 -77
  94. package/skills/pr-converge/scripts/view_pr_context.py +0 -47
  95. package/skills/pr-review-responder/scripts/respond_to_reviews.py +0 -376
@@ -190,6 +190,17 @@ def test_should_deny_edit_when_pragma_sentinel_present_in_new_string_without_tes
     assert _decision_from(completed) == "deny"
 
 
+def _make_edit_payload(file_path: Path, old_string: str, new_string: str) -> dict:
+    return {
+        "tool_name": "Edit",
+        "tool_input": {
+            "file_path": str(file_path),
+            "old_string": old_string,
+            "new_string": new_string,
+        },
+    }
+
+
 def test_should_allow_python_file_with_only_module_level_constants(tmp_path: Path) -> None:
     sandbox = _sandbox(tmp_path)
     constants_file = sandbox / "constants.py"
@@ -209,6 +220,147 @@ def test_should_allow_python_file_with_only_module_level_constants(tmp_path: Pat
     assert _decision_from(completed) == "allow"
 
 
+def test_should_allow_edit_to_change_constant_value_in_constants_only_file(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    constants_file = sandbox / "constants.py"
+    constants_file.write_text(
+        '"""Module-level constants."""\n'
+        "MAXIMUM_RETRIES: int = 3\n"
+        "DEFAULT_TIMEOUT_SECONDS: float = 30.0\n"
+    )
+
+    completed = _run_hook_with_payload(
+        _make_edit_payload(
+            constants_file,
+            old_string="MAXIMUM_RETRIES: int = 3",
+            new_string="MAXIMUM_RETRIES: int = 5",
+        )
+    )
+
+    assert _decision_from(completed) == "allow"
+
+
+def _make_multiedit_payload(file_path: Path, edits: list[dict]) -> dict:
+    return {
+        "tool_name": "MultiEdit",
+        "tool_input": {
+            "file_path": str(file_path),
+            "edits": edits,
+        },
+    }
+
+
+def test_should_allow_multiedit_to_change_constant_value_in_constants_only_file(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    constants_file = sandbox / "constants.py"
+    constants_file.write_text(
+        '"""Module-level constants."""\n'
+        "MAXIMUM_RETRIES: int = 3\n"
+        "DEFAULT_TIMEOUT_SECONDS: float = 30.0\n"
+    )
+
+    completed = _run_hook_with_payload(
+        _make_multiedit_payload(
+            constants_file,
+            edits=[
+                {
+                    "old_string": "MAXIMUM_RETRIES: int = 3",
+                    "new_string": "MAXIMUM_RETRIES: int = 5",
+                },
+            ],
+        )
+    )
+
+    assert _decision_from(completed) == "allow"
+
+
+def test_should_deny_multiedit_that_adds_function_to_constants_only_file(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    constants_file = sandbox / "constants.py"
+    constants_file.write_text(
+        '"""Module-level constants."""\n'
+        "MAXIMUM_RETRIES: int = 3\n"
+    )
+
+    completed = _run_hook_with_payload(
+        _make_multiedit_payload(
+            constants_file,
+            edits=[
+                {
+                    "old_string": "MAXIMUM_RETRIES: int = 3",
+                    "new_string": "MAXIMUM_RETRIES: int = 3\n\ndef reset() -> None:\n return None",
+                },
+            ],
+        )
+    )
+
+    assert _decision_from(completed) == "deny"
+
+
+def test_should_deny_edit_that_adds_function_to_constants_only_file(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    constants_file = sandbox / "constants.py"
+    constants_file.write_text(
+        '"""Module-level constants."""\n'
+        "MAXIMUM_RETRIES: int = 3\n"
+    )
+
+    completed = _run_hook_with_payload(
+        _make_edit_payload(
+            constants_file,
+            old_string="MAXIMUM_RETRIES: int = 3",
+            new_string="MAXIMUM_RETRIES: int = 3\n\ndef reset() -> None:\n return None",
+        )
+    )
+
+    assert _decision_from(completed) == "deny"
+
+
+def test_should_deny_python_file_with_assignment_calling_undefined_function(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    unsafe_file = sandbox / "unsafe.py"
+    unsafe_content = (
+        '"""Config with unsafe call."""\n'
+        "VALUE: str = compute()\n"
+    )
+    unsafe_file.write_text(unsafe_content)
+
+    completed = _run_hook_with_payload(
+        _make_write_payload(unsafe_file, unsafe_content)
+    )
+
+    assert _decision_from(completed) == "deny"
+
+
+def test_should_allow_python_file_with_assignment_calling_imported_function(
+    tmp_path: Path,
+) -> None:
+    sandbox = _sandbox(tmp_path)
+    safe_file = sandbox / "safe.py"
+    safe_content = (
+        '"""Config with imported call."""\n'
+        "from pathlib import Path\n"
+        "BASE_PATH = Path(r'C:\\\\data')\n"
+    )
+    safe_file.write_text(safe_content)
+
+    completed = _run_hook_with_payload(
+        _make_write_payload(safe_file, safe_content)
+    )
+
+    assert _decision_from(completed) == "allow"
+
+
 def test_should_deny_python_file_when_any_function_definition_is_present(tmp_path: Path) -> None:
     sandbox = _sandbox(tmp_path)
     mixed_file = sandbox / "mixed.py"
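The tests above pin down when a Python file counts as constants-only: plain assignments and calls to imported names are allowed, while function definitions and calls to undefined names are denied. The hook's implementation is not part of this hunk, so the following is only a rough sketch of one way such a check could be written with the standard `ast` module. The names `is_constants_only` and `_calls_imported`, and the exact rules, are assumptions inferred from the test names.

```python
import ast


def _calls_imported(call: ast.Call, imported_names: set) -> bool:
    """Return True when the call's root name was brought in by an import."""
    root = call.func
    while isinstance(root, ast.Attribute):  # unwrap dotted calls such as a.b.c()
        root = root.value
    return isinstance(root, ast.Name) and root.id in imported_names


def is_constants_only(source: str) -> bool:
    """Hypothetical check: True when the module holds only imports, a docstring
    (or other bare literal), and assignments whose calls use imported names."""
    imported_names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                imported_names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant):
            continue  # module docstring or bare literal
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            if node.value is None:  # annotation without a value
                continue
            for inner in ast.walk(node.value):
                if isinstance(inner, ast.Call) and not _calls_imported(inner, imported_names):
                    return False
        else:
            return False  # def, class, if, loops, and so on
    return True
```

Under these assumed rules, `VALUE: str = compute()` is rejected because `compute` was never imported, while `BASE_PATH = Path(...)` after `from pathlib import Path` is accepted.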
@@ -0,0 +1,130 @@
+"""Configuration constants for the state_description_blocker PreToolUse hook."""
+
+from re import IGNORECASE, Pattern, compile
+
+ALL_COMMENT_TRANSITION_PATTERNS: list[Pattern[str]] = [
+    compile(r"\binstead of\b", IGNORECASE),
+    compile(r"\bpreviously\b", IGNORECASE),
+    compile(r"\bnow uses\b", IGNORECASE),
+    compile(r"\bnow does\b", IGNORECASE),
+    compile(r"\bnow handles\b", IGNORECASE),
+    compile(r"\bnow supports\b", IGNORECASE),
+    compile(r"\bnow names\b", IGNORECASE),
+    compile(r"\bnow includes\b", IGNORECASE),
+    compile(r"\bwas previously\b", IGNORECASE),
+    compile(r"\bwere previously\b", IGNORECASE),
+    compile(r"\bwas formerly\b", IGNORECASE),
+    compile(r"\bwas added\b", IGNORECASE),
+    compile(r"\bused to\b", IGNORECASE),
+    compile(r"\bno longer\b", IGNORECASE),
+    compile(r"\bhas been updated\b", IGNORECASE),
+    compile(r"\bhave been updated\b", IGNORECASE),
+    compile(r"\bhas been changed\b", IGNORECASE),
+    compile(r"\bhave been changed\b", IGNORECASE),
+    compile(r"\breplaced by\b", IGNORECASE),
+    compile(r"\breplaces\b", IGNORECASE),
+    compile(r"\bsuperseded by\b", IGNORECASE),
+    compile(r"\bsupersedes\b", IGNORECASE),
+    compile(r"\bchanged from\b", IGNORECASE),
+    compile(r"\bchanges from\b", IGNORECASE),
+    compile(r"\bswitched from\b", IGNORECASE),
+    compile(r"\bswitched to\b", IGNORECASE),
+    compile(r"\bmigrated from\b", IGNORECASE),
+    compile(r"\bmigrated to\b", IGNORECASE),
+    compile(r"\bmoved to\b", IGNORECASE),
+    compile(r"\bmoved into\b", IGNORECASE),
+    compile(r"\bextracted as\b", IGNORECASE),
+    compile(r"\bupdated to\b", IGNORECASE),
+    compile(r"\boriginally\b", IGNORECASE),
+    compile(r"\bas of\b", IGNORECASE),
+]
+
+CODE_FENCE_PATTERN: Pattern[str] = compile(r"```[\s\S]*?```")
+INLINE_CODE_PATTERN: Pattern[str] = compile(r"``[^`]+``|`[^`]+`")
+
+ALL_MARKDOWN_EXTENSIONS: frozenset[str] = frozenset(
+    {".md", ".mdx", ".markdown", ".rmd"}
+)
+
+ALL_HASH_ONLY_EXTENSIONS: frozenset[str] = frozenset(
+    {
+        ".py",
+        ".rb",
+        ".sh",
+        ".bash",
+        ".zsh",
+        ".ps1",
+        ".psm1",
+        ".yaml",
+        ".yml",
+        ".tf",
+    }
+)
+
+ALL_BLOCK_COMMENT_ONLY_EXTENSIONS: frozenset[str] = frozenset(
+    {
+        ".css",
+    }
+)
+
+ALL_HASH_AND_SLASH_EXTENSIONS: frozenset[str] = frozenset(
+    {
+        ".php",
+    }
+)
+
+ALL_BLOCK_COMMENT_EXTENSIONS: frozenset[str] = frozenset(
+    {
+        ".js",
+        ".jsx",
+        ".ts",
+        ".tsx",
+        ".java",
+        ".c",
+        ".cpp",
+        ".h",
+        ".hpp",
+        ".rs",
+        ".go",
+        ".swift",
+        ".kt",
+        ".scala",
+        ".php",
+        ".css",
+        ".scss",
+        ".less",
+    }
+)
+
+ALL_COMMENT_BEARING_EXTENSIONS: frozenset[str] = frozenset(
+    {
+        ".py",
+        ".js",
+        ".jsx",
+        ".ts",
+        ".tsx",
+        ".java",
+        ".c",
+        ".cpp",
+        ".h",
+        ".hpp",
+        ".rs",
+        ".go",
+        ".rb",
+        ".php",
+        ".swift",
+        ".kt",
+        ".scala",
+        ".sh",
+        ".bash",
+        ".zsh",
+        ".ps1",
+        ".psm1",
+        ".yaml",
+        ".yml",
+        ".tf",
+        ".css",
+        ".scss",
+        ".less",
+    }
+)
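The constants above pair the transition patterns with `CODE_FENCE_PATTERN` and `INLINE_CODE_PATTERN`, which suggests that Markdown text is scanned only outside code spans. The hook body is not shown in this excerpt, so the following is a minimal sketch of that idea rather than the hook's code. The function name `find_transition_phrases` is hypothetical, and only three of the listed patterns are reproduced.

```python
import re

# Three of the transition patterns from the constants file, for brevity.
TRANSITION_PATTERNS = [
    re.compile(r"\binstead of\b", re.IGNORECASE),
    re.compile(r"\bno longer\b", re.IGNORECASE),
    re.compile(r"\bused to\b", re.IGNORECASE),
]

# Same regexes as in the constants file; the fence is built from single
# backticks so this example can itself sit inside a fenced block.
FENCE = "`" * 3
CODE_FENCE_PATTERN = re.compile(FENCE + r"[\s\S]*?" + FENCE)
INLINE_CODE_PATTERN = re.compile(r"``[^`]+``|`[^`]+`")


def find_transition_phrases(markdown_text: str) -> list:
    """Hypothetical helper: return the transition phrases found outside code spans."""
    prose_only = CODE_FENCE_PATTERN.sub(" ", markdown_text)  # drop fenced blocks first
    prose_only = INLINE_CODE_PATTERN.sub(" ", prose_only)    # then inline code
    matched = []
    for pattern in TRANSITION_PATTERNS:
        matched.extend(match.group(0) for match in pattern.finditer(prose_only))
    return matched
```

With this masking order, a phrase quoted in backticks is ignored while the same phrase in running prose is reported.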
package/hooks/hooks.json CHANGED
@@ -30,6 +30,11 @@
       "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/validation/hook_format_validator.py",
       "timeout": 15
     },
+    {
+      "type": "command",
+      "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/blocking/code_rules_enforcer.py",
+      "timeout": 30
+    },
     {
       "type": "command",
       "command": "python3 -c \"import sys; sys.path.insert(0, r'${CLAUDE_PLUGIN_ROOT}/hooks'); from validators.run_all_validators import main; sys.exit(main())\"",
@@ -44,6 +49,11 @@
       "type": "command",
       "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/blocking/windows_rmtree_blocker.py",
       "timeout": 10
+    },
+    {
+      "type": "command",
+      "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/blocking/state_description_blocker.py",
+      "timeout": 10
     }
   ]
 },
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "claude-dev-env",
-  "version": "1.37.0",
+  "version": "1.38.0",
   "description": "Claude Code development standards — rules, hooks, agents, commands, and skills",
   "type": "module",
   "bin": {
@@ -1,6 +1,6 @@
 # gh API Pagination Rule
 
-**Root cause:** The GitHub REST API returns 30 items per page by default. `gh api repos/<owner>/<repo>/pulls/<number>/reviews` and `gh api repos/<owner>/<repo>/pulls/<number>/comments` silently truncate at 30 results without warning. PRs that have accumulated more than 30 reviews or inline comments — common on long PR-loop cycles where bugbot, copilot, or the in-house bugteam each post repeatedly — return only the **oldest** 30, hiding the most recent reviews and findings entirely. A `sort_by(.submitted_at) | last` (or `| reverse`) on a truncated array picks the latest entry **within the first 30**, not the actual latest, which produces a stale-but-confident report that then drives wrong decisions (e.g., re-triggering bugbot when it has already posted a CLEAN review on a later page).
+**Root cause:** GitHub REST API list endpoints paginate by default. Without `--paginate --slurp`, callers see only the oldest page, and cross-page jq operations (e.g., `sort_by | last`) operate within a single page producing wrong-but-confident results.
 
 **Rule:** All `gh api` calls that read `pulls/<number>/reviews`, `pulls/<number>/comments`, `issues/<number>/comments`, or any other paginated GitHub list endpoint **must** request the full set of pages AND apply any cross-page jq operation through external `jq`, not through `gh`'s built-in `--jq`. Use `--paginate --slurp | jq` (preferred — see [Safe patterns](#safe-patterns)). Never call these endpoints with their default pagination, and never use `gh`'s `--jq` for cross-page operations like `sort_by | last` or `| reverse | .[0]`.
 
@@ -8,8 +8,8 @@
 
 This rule guards against two distinct silent-truncation defects that compound:
 
-1. **Default 30-item page.** Without `--paginate`, only the first page is fetched. On long PRs this hides the most recent reviews entirely.
-2. **`--jq` runs per-page, not on the concatenated result.** Per [GitHub CLI #10459](https://github.com/cli/cli/issues/10459), `gh api --paginate --jq '<filter>'` applies `<filter>` to each page **separately** and emits one output per page. Cross-page operations like `sort_by(.submitted_at) | last` therefore operate within each page independently, not across the merged result set. On PRs with more than 100 reviews this still produces a wrong-but-confident "latest" review even when `--paginate` is set.
+1. **Default page truncation.** Without `--paginate`, only the first page is fetched.
+2. **`--jq` runs per-page, not on the concatenated result.** Per [GitHub CLI #10459](https://github.com/cli/cli/issues/10459), `gh api --paginate --jq '<filter>'` applies `<filter>` to each page **separately** and emits one output per page. Cross-page operations like `sort_by(.submitted_at) | last` therefore operate within each page independently, not across the merged result set.
 
 The safe patterns below fix both defects together: `--paginate --slurp` walks every page AND emits a single merged structure, and an **external** `jq` then runs cross-page operations on that merged structure.
 
@@ -39,7 +39,7 @@ gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate --s
 | jq '[.[][] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
 ```
 
-The `.[][]` flattens the array-of-pages into one stream of items before the cross-page operators (`sort_by`, `last`, `reverse`) run. Combine with `?per_page=100` so each page fetches 100 items instead of 30, reducing round-trips on long PRs without changing correctness.
+The `.[][]` flattens the array-of-pages into one stream of items before the cross-page operators (`sort_by`, `last`, `reverse`) run. Combine with `?per_page=100` to reduce round-trips on long PRs.
 
 `gh`'s `--jq` flag and `--slurp` flag are mutually exclusive (gh CLI rejects `--paginate --slurp --jq` with `the --slurp option is not supported with --jq or --template`), which is why the filter must run in an external `jq` invocation.
 
@@ -74,52 +74,6 @@ gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate --s
 
 This is the canonical pattern for the bugbot ↔ bugteam convergence loop: walk newest-first, stop at the first clean review.
 
-## What NOT to do
-
-```bash
-# BAD — default 30-item page silently truncates on long PRs
-gh api repos/<owner>/<repo>/pulls/<number>/reviews \
---jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
-
-# BAD — `?per_page=100` alone caps at 100 items; PRs with 100+ reviews still truncate
-gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' \
---jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
-
-# BAD — --paginate fetches every page, but `--jq` runs PER-PAGE (gh CLI #10459).
-# `sort_by(.submitted_at) | last` operates within each page independently and
-# emits one "latest" per page, not the actual latest across the full result set.
-gh api 'repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100' --paginate \
---jq '[.[] | select(.user.login=="cursor[bot]")] | sort_by(.submitted_at) | last'
-
-# BAD — taking `| last` on an unpaginated read returns the latest of the first 30,
-# not the actual latest. Same defect for `| reverse | .[0]`.
-```
-
-## Why both defects matter
-
-`gh api`'s default page is the FIRST page of results, ordered oldest-to-newest by the GitHub API. When the result set exceeds 30 items, page 1 contains the OLDEST 30 — not the newest. A jq `| last` after `sort_by(.submitted_at)` picks the latest entry within those 30 oldest items, producing output that looks correct but reports a state from days or weeks ago.
-
-`--paginate` alone does NOT fix this when paired with `--jq`: gh applies the jq filter to each page separately and emits one result per page. A consumer reading "the last line of output" still gets the latest within a single page, not the latest across all pages. The skill that consumes this output then makes decisions (re-trigger bugbot, mark a finding stale, report convergence) against an obsolete view of the PR.
-
-`--paginate --slurp | jq` fixes both defects: every page is fetched, every page is merged into one structure before any jq operator runs, and cross-page operations see the full result set.
-
-## Consumers
-
-Skills and scripts in this repo that read paginated endpoints and must therefore use `--paginate --slurp` plus external `jq`:
-
-- `pr-converge` — bugbot review walk (BUGBOT phase, Step 2.a) and inline-comments fetch (Step 2.b).
-- `bugteam` — review threads, inline comments, audit-loop history.
-- `qbug` — same as bugteam, scoped to a single subagent loop.
-- `pr-review-responder` — review comments fetch (already enforced; this rule extends the same constraint to reviews and other endpoints).
-- `monitor-many` — open-PR enumeration and per-PR review/comment scans.
-- `babysit-pr` — review-comment polling.
-
-Updating any of these to read paginated endpoints requires `--paginate --slurp` plus external `jq` (or a documented single-page bound on a small list).
-
 ## Enforcement
 
 This rule is documentation-only at present. A future PreToolUse hook may pattern-match `Bash` invocations of `gh api repos/.../pulls/<n>/(reviews|comments)` without `--paginate --slurp` (or with `--paginate --jq` doing cross-page operations) and return a corrective message. Until that hook lands, treat this rule as binding by review and rely on it during skill authoring.
-
-## Precedent
-
-The `pr-review-responder` skill predated this rule and forbids default pagination on `pulls/<n>/comments` reads (`packages/claude-dev-env/skills/pr-review-responder/SKILL.md` Rule 1). This file generalizes that constraint to every paginated GitHub endpoint, adds the `--jq` per-page defect (gh CLI #10459) discovered while reviewing this rule, and centralizes the safe patterns so additional skills inherit the rule by reference instead of restating it.
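The rule text in this file diff relies on the jq filter `[.[][] | select(...)] | sort_by(.submitted_at) | last` running over the merged array of pages. As an illustration only, the same flatten, filter, sort, and pick-last sequence might look like this in Python. The function name `latest_review_by` and the sample data are made up for this sketch; the field names `user.login` and `submitted_at` come from the jq filter in the rule.

```python
from typing import Optional


def latest_review_by(pages: list, login: str) -> Optional[dict]:
    """Flatten an array of pages, filter by author, and return the newest review."""
    merged = [review for page in pages for review in page]         # jq: .[][]
    matching = [r for r in merged if r["user"]["login"] == login]  # jq: select(...)
    if not matching:
        return None
    # jq: sort_by(.submitted_at) | last
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    return sorted(matching, key=lambda r: r["submitted_at"])[-1]


# Made-up sample: two pages shaped like the output of --paginate --slurp.
sample_pages = [
    [{"id": 1, "user": {"login": "cursor[bot]"}, "submitted_at": "2024-01-01T00:00:00Z"}],
    [{"id": 2, "user": {"login": "cursor[bot]"}, "submitted_at": "2024-02-01T00:00:00Z"}],
]
```

Calling the function on `sample_pages[:1]` alone returns the review with `id` 1, which mirrors the single-page defect the rule describes: the answer looks valid but is not the latest across all pages.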
@@ -0,0 +1,57 @@
+---
+paths: **/*
+---
+
+# No Historical Clutter in Documentation or Comments
+
+**When this applies:** Any Write or Edit to files containing comments or documentation.
+
+**Hook enforcement:** `state-description-blocker` (PreToolUse on Write|Edit) blocks historical/comparative language automatically. See `hooks.json` for registration.
+
+## Rule
+
+Never reference removed implementations, old defaults, prior behaviors, or how something `"used to be"` when updating documentation. The current state is all that matters.
+
+## Examples of prohibited patterns
+
+### In documentation (.md files)
+
+| Pattern | Why it's clutter |
+|---------|-----------------|
+| `` `"instead of 30"` `` in a pagination rule | The old default `no longer` exists in code; the rule reader doesn't need to know what it was |
+| `` `"previously this used X"` `` | If X is gone, it's noise |
+| `` `"before this rule, we did Y"` `` | The rule exists now; the before-state is irrelevant |
+| `` `"migrated from Z to W"` `` | If Z is fully removed, the migration story is git history, not documentation |
+| `` `"the old implementation did A"` `` | If A is gone, the reader gains nothing from knowing it existed |
+| `` `"originally"` `` / `` `"used to be"` `` | Same — dead context |
+
+### In code comments
+
+| Pattern | Good replacement |
+|---------|-----------------|
+| `# Uses X instead of Y` | `# Uses X` |
+| `# Previously configured via Z` | `# Configured via Z` |
+| `# Now uses the new API client` | `# Uses the new API client` |
+| `# No longer supports legacy mode` | `# Supports modern mode only` |
+| `// Switched to async processing` | `// Processes asynchronously` |
+| `# Replaced by the cache layer` | `# Cache layer handles reads` |
+
+### Hook-detected patterns
+
+The `state-description-blocker` hook (PreToolUse on Write\|Edit) enforces these patterns automatically:
+
+`instead of`, `previously`, `now uses/does/handles/supports/names/includes`, `was previously`, `were previously`, `was formerly`, `was added`, `used to`, `no longer`, `has/have been updated/changed`, `replaced by`, `replaces`, `superseded by`, `supersedes`, `changed from`, `changes from`, `switched from/to`, `migrated from/to`, `moved to/into`, `extracted as`, `updated to`, `originally`, `as of`
+
+## What IS allowed
+
+- Comparisons to *currently existing* alternatives (e.g., "use `--paginate --slurp | jq`, not `--jq` alone")
+- Rationale that explains *why* a pattern is wrong in terms of present behavior (e.g., "`--jq` runs per-page, so cross-page operations produce wrong results")
+- References to external sources for defects that still exist (e.g., gh CLI #10459)
+
+## The test
+
+After writing documentation, ask: **"If someone reads this a year from now, with no knowledge of what came before, does every sentence still make sense and add value?"** If a sentence only adds value to someone who knew the old state, delete it.
+
+## Why
+
+Historical references clog context windows and force readers to mentally filter "what was" from "what is." The git log is the authoritative record of what changed and why. Documentation describes the current contract.
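The "In code comments" table in this rule implies that, for source files, only comment text is checked and not string literals or identifiers. How the hook isolates comments is not shown in this excerpt. As one possible approach for Python files, the standard `tokenize` module can separate comments from the rest of the source. The function name `flag_historical_comments` is hypothetical, and only three of the listed patterns are used.

```python
import io
import re
import tokenize

# Three of the hook-detected patterns from the rule, for brevity.
HISTORICAL_PATTERNS = [
    re.compile(r"\bpreviously\b", re.IGNORECASE),
    re.compile(r"\bnow uses\b", re.IGNORECASE),
    re.compile(r"\breplaced by\b", re.IGNORECASE),
]


def flag_historical_comments(python_source: str) -> list:
    """Hypothetical helper: return (line_number, comment) pairs that match a pattern."""
    flagged = []
    tokens = tokenize.generate_tokens(io.StringIO(python_source).readline)
    for token in tokens:
        if token.type != tokenize.COMMENT:
            continue  # skip code and string literals
        if any(pattern.search(token.string) for pattern in HISTORICAL_PATTERNS):
            flagged.append((token.start[0], token.string))
    return flagged
```

Because the tokenizer classifies string literals separately, a string such as `"now uses strings"` is not reported, while `# Previously configured via Z` is.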
@@ -22,7 +22,7 @@ GROQ_RETRY_BACKOFF_SECONDS = (2, 4, 8)
 REVIEW_BODY_HEADER_TEMPLATE = "## groq-bugteam audit: {p0} P0 / {p1} P1 / {p2} P2"
 NO_FINDINGS_REVIEW_BODY = (
     "## groq-bugteam audit: clean\n\n"
-    "Groq ({model}) reviewed the diff against categories A-J and found no issues."
+    "Groq ({model}) reviewed the diff against categories A-K and found no issues."
 )
 
 AUDIT_SYSTEM_PROMPT = """You are an adversarial code reviewer auditing a pull request diff.
@@ -31,8 +31,13 @@ Inspect ONLY lines added or modified in the diff. Pre-existing code on
 untouched lines is out of scope. Cite file:line for every finding -- the line
 number MUST refer to the NEW side of the diff (post-change line number).
 
-Investigate these ten categories. Skip a category silently when you find
-nothing; do not emit verified-clean entries.
+Investigate these eleven categories. Skip a category silently when you find
+nothing; do not emit verified-clean entries. For the canonical rubric and
+sub-bucket decomposition for each category, see
+packages/claude-dev-env/audit-rubrics/category_rubrics/. For ready-to-send
+Variant C audit prompts (each containing a PR/repo-independent generalized
+skeleton above a `---` separator and a worked example against an authentic
+PR below it), see packages/claude-dev-env/audit-rubrics/prompts/.
 
 A. API contract verification (signatures, return types, async/await)
 B. Selector / query / engine compatibility
@@ -44,6 +49,9 @@ G. Off-by-one, bounds, integer overflow
 H. Security boundaries (injection, path traversal, auth bypass, secret leakage)
 I. Concurrency hazards (race conditions, missing awaits, shared mutable state)
 J. Magic values and configuration drift
+K. Codebase conflicts (a change updates one site of a pattern but a parallel
+site in unchanged code stays stale, producing contradictory behavior;
+diff is internally consistent, bug emerges only against unchanged code)
 
 Severity rubric:
 - P0: crashes, data loss, security breach, broken production invariant
@@ -56,7 +64,7 @@ Respond with JSON only -- no prose outside the JSON object. Shape:
 "findings": [
 {
 "severity": "P0" | "P1" | "P2",
-"category": "A" | ... | "J",
+"category": "A" | ... | "K",
 "file": "relative path from repo root",
 "line": int,
 "title": "one-line summary",
@@ -126,7 +134,7 @@ SPEC_IMPLEMENTER_SYSTEM_PROMPT = """<groq_spec_implementer>
 
 - finding_index (int, stable across audit and fix)
 - severity (P0 | P1 | P2)
-- category (single letter A–J)
+- category (single letter A–K)
 - file (relative path, must match the file being patched)
 - target_line_start (int, 1-based, inclusive)
 - target_line_end (int, 1-based, inclusive; equals target_line_start for single-line edits)
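With category K added, any consumer of the audit JSON needs to accept eleven category letters. The validation code in this package is not shown in the diff, so the following is only a sketch of what a shape check for a single finding might look like. It covers just the fields visible in the hunk above (`severity`, `category`, `file`, `line`, `title`). The function name `finding_problems` and the requirement that `line` be at least 1 are assumptions.

```python
VALID_SEVERITIES = {"P0", "P1", "P2"}
VALID_CATEGORIES = set("ABCDEFGHIJK")  # A through K, eleven categories


def finding_problems(finding: dict) -> list:
    """Hypothetical validator: list the problems found in one audit finding."""
    problems = []
    if finding.get("severity") not in VALID_SEVERITIES:
        problems.append("severity must be P0, P1, or P2")
    if finding.get("category") not in VALID_CATEGORIES:
        problems.append("category must be a single letter A-K")
    file_path = finding.get("file")
    if not isinstance(file_path, str) or not file_path:
        problems.append("file must be a non-empty relative path")
    line = finding.get("line")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(line, int) or isinstance(line, bool) or line < 1:
        problems.append("line must be a positive integer")  # assumed 1-based
    title = finding.get("title")
    if not isinstance(title, str) or not title:
        problems.append("title must be a non-empty string")
    return problems
```

An empty list means the visible fields are well-formed; fields elided from the hunk are not checked here.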
@@ -1,35 +1,28 @@
-# Bugteam — invariants and design rationale
-
-## Constraints
-
-- **One run per invocation, multi-PR supported.** All PRs in a single /bugteam invocation share one `run_temp_dir`. Per-PR identity lives in the subagent name prefix (`bugfind-pr<N>-loop<L>` / `bugfix-pr<N>-loop<L>`) and the `<run_temp_dir>/pr-<N>/` subfolder containing that PR's git worktree, diff patches, and outcome XML files.
-- **Grant before any spawn, revoke before any return.** Step 0 grants project `.claude/**` permissions; Step 5 revokes. Both are mandatory. Revoke runs on every exit path including error, cap-reached, and stuck.
-- **Fresh subagent per loop.** Both bugfind and bugfix are spawned new each loop. Reusing a subagent across loops accumulates context inside that subagent's window defeats clean-room.
-- **One up-front confirmation = whole cycle.** The `/bugteam` invocation authorizes the entire cycle; every subsequent decision runs on that single authorization.
-- **10-loop hard cap.** Counted as **AUDIT** completions (increment in Step 3). Standards-fix passes before an audit do not advance `loop_count`. Worst case includes extra clean-coder spawns for the code-rules gate.
-- **Code rules gate before every AUDIT.** Run `_shared/pr-loop/scripts/code_rules_gate.py` (resolved via `${CLAUDE_SKILL_DIR}/../../_shared/pr-loop/scripts/code_rules_gate.py`) until exit **0** before spawning **bugfind**. Same `validate_content` logic as `hooks/blocking/code_rules_enforcer.py`.
-- **Clean-room audits, every loop.** Each bugfind subagent's spawn prompt contains only the PR scope, audit rubric, and the current loop number. Prior loop history stays in the lead.
-- **Targeted fixes.** Each fix subagent sees ONLY the most recent audit's findings. Prior loops are invisible to the fix subagent.
-- **Opus 4.7 at xhigh effort for both subagents.** Both `Agent(...)` spawns pass `model="opus"`, which resolves to Opus 4.7 on the Anthropic API. Opus 4.7's default effort level in Claude Code is `xhigh` (https://code.claude.com/docs/en/model-config — *"On Opus 4.7, the default effort is `xhigh` for all plans and providers."*), so no `effort` override is needed at spawn time. Effort is set per-subagent in YAML frontmatter, not via the `Agent` tool's parameters; `code-quality-agent` and `clean-coder` rely on the model default. The trade vs Sonnet is higher per-loop cost in exchange for deeper audit recall and stronger fix correctness on bug-hunting work, which the per-PR loop economics tolerate (10-loop hard cap bounds total spend).
-- **Fix subagent receives the latest audit as its input contract.** Passing the audit's findings to the fix subagent is the input contract — each loop's fix run operates on the current audit's output and only that.
-- **One commit per fix action.** Loops produce one commit per loop, not one per bug.
-- **Linear branch, fixed PR base.** Every loop appends one forward-only commit; existing commits and the PR base stay intact throughout the cycle.
-- **Lead-only cleanup.** Cleanup runs in the lead (this session) only. Step 4 removes the full `<run_temp_dir>` so no loop patches leak between runs.
-- **Cleanup all `.bugteam-*` files on exit.** The per-run `<run_temp_dir>` is removed entirely by Step 4, which covers `<run_temp_dir>/pr-<N>/loop-<L>.patch` and `<run_temp_dir>/pr-<N>/loop-<L>-{b,c}.outcomes.xml`. The per-loop outcomes XML at `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml` is removed with the worktree. Step 4.5 deletes `.bugteam-final.diff`, `.bugteam-original-body.md`, and `.bugteam-final-body.md`. Working directory ends clean.
- - **Cleanup all `.bugteam-*` files on exit.** The per-run `<run_temp_dir>` is removed entirely by Step 4, which covers `<run_temp_dir>/pr-<N>/loop-<L>.patch` and `<run_temp_dir>/pr-<N>/loop-<L>-{b,c}.outcomes.xml`. The per-loop outcomes XML at `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml` is removed with the worktree. Step 4.5 deletes `.bugteam-final.diff`, `.bugteam-original-body.md`, and `.bugteam-final-body.md`. Working directory ends clean.
19
- - **Audit/fix comment posting.** The bugfind subagent posts ONE per-loop review (parent body + child finding comments in a single batched POST, with review-fallback to a top-level issue comment). The bugfix subagent posts the fix replies after committing. All comment, review, and reply POSTs belong to the subagents; the lead's single PR-write action is the final description rewrite at Step 4.5.
20
- - **Lead owns the final PR description rewrite only** (Step 4.5), and only via the `pr-description-writer` agent. The lead does not compose the description inline.
1
+ # Bugteam constraints
2
+
3
+ ## Non-Negotiable
4
+
5
+ - **Pre-flight is mandatory.** `preflight.py` must exit 0 before Step 0. If it fails for `core.hooksPath`, auto-remediate with `fix_hookspath.py`. All other failures require manual fixes.
6
+ - **Looping against a fixed known count.** Hard cap of 10 audit loops. No exceptions. The cap is a safety valve, set high enough to converge on most non-trivial PRs while preventing infinite loops.
7
+ - **`loop_count` is the iteration counter.** It increments before each AUDIT in Step 3. A FIX without a preceding AUDIT does not advance `loop_count`. The `loop_count > 10` check runs before each AUDIT. After 10 AUDITs, the cycle exits regardless of remaining FIX rounds. Standards-fix passes before an audit do not advance `loop_count`.
21
8
  - **One review per loop, findings as child comments of that review.** Each loop posts a single pull-request review whose body is the loop header and whose `comments[]` are the anchored findings. Each loop's review stands alone — one review created per loop, fully self-contained on the PR conversation.
22
9
  - **PR description rewrite on every exit.** Step 4.5 runs on `converged`, `cap reached`, and `stuck`. On `error`, the rewrite is best-effort; if it fails, surface the error in the final report and continue to revoke.
23
- - **Outcome XML, not JSON.** Both subagents write structured outcome data (findings or fix outcomes) to `.bugteam-pr<N>-loop<L>.outcomes.xml`. The lead reads these files between actions. XML chosen for parser robustness against multi-line, special-character, and quoted reason fields.
10
+ - **Outcome XML, not JSON.** The AUDIT subagent writes findings to `.bugteam-pr<N>-loop<L>.outcomes.xml` and the FIX subagent writes fix outcomes to `.bugteam-pr<N>-loop<L>.fix-outcomes.xml`. The lead reads these files between actions. Separate paths prevent the FIX output from overwriting the AUDIT's findings file. XML chosen for parser robustness against multi-line, special-character, and quoted reason fields.
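The `loop_count` rule in the list above can be sketched as follows. This is a minimal sketch, not the skill's implementation: `run_cycle`, `audit`, and `fix` are hypothetical names, the `stuck` and `error` exits are omitted, and placing the cap check immediately after the increment is an assumption about ordering.

```python
from typing import Callable, List, Tuple

MAX_AUDIT_LOOPS = 10  # hard cap on AUDIT passes


def run_cycle(
    audit: Callable[[int], List[str]],
    fix: Callable[[List[str]], None],
) -> Tuple[str, int]:
    """Run audit/fix rounds; return (exit_status, audits_completed)."""
    loop_count = 0
    while True:
        # The counter advances before each AUDIT, never for a FIX.
        loop_count += 1
        # Assumed ordering: the cap check runs right before the AUDIT.
        if loop_count > MAX_AUDIT_LOOPS:
            return ("cap reached", MAX_AUDIT_LOOPS)
        findings = audit(loop_count)
        if not findings:
            return ("converged", loop_count)
        fix(findings)
```

With an audit that always reports findings, the sketch runs exactly 10 audits and then exits with `cap reached`.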
24
11
 
25
12
  ## Why this design
26
13
 
27
- The three sibling skills compose, but `/bugteam` solves a problem they cannot solve in sequence:
14
+ ### Why retry with a fix instead of just rejecting and moving on
15
+
16
+ Bugteam's purpose is to make real PRs better before they ship, not to just point out problems. A review that says "fix this bug" without giving the author (subagent) a chance to fix it in the same session would be a weaker intervention: the PR author still has to go back, figure out the fix, apply it, re-push, and re-trigger review. By bundling fix attempts into the same loop, bugteam reduces round-trips from N audits + N manual fix cycles to N audits + N automated fix attempts, with no human context-switching.
17
+
18
+ ### Why 10 loops — why not unlimited
19
+
20
+ A PR that needs more than 10 audit-fix rounds has deeper problems than bugteam can address. The 10-loop cap is a forcing function: after 10 rounds, escalate to `/findbugs` or human review rather than grinding on diminishing returns.
21
+
22
+ ### Why outcome XML — why not JSON
28
23
 
29
- - `/findbugs` audits once and stops.
30
- - `/fixbugs` fixes the findings of one audit and stops.
31
- - A human-driven `/findbugs` → `/fixbugs` → `/findbugs` → `/fixbugs` cycle works but requires the user to drive it.
24
+ JSON escapes `\n` inside `"reason": "could not address: some\nmulti-line\ntext"`, making the file hard to read and grep. XML preserves the raw text as element content, so `<reason>could not address: some&#10;multi-line&#10;text</reason>` renders legibly in every markdown-capable viewer. The choice is ergonomic, not technical; both formats carry the same information.
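The difference described above can be seen with Python's standard library. This is a minimal sketch, assuming `json` and `xml.etree.ElementTree` as the serializers; the package's own writer is not shown in this diff and may differ.

```python
import json
import xml.etree.ElementTree as ET

reason_text = "could not address: some\nmulti-line\ntext"

# JSON must escape the newline inside a string literal.
as_json = json.dumps({"reason": reason_text})

# ElementTree writes newlines in element text literally.
reason_element = ET.Element("reason")
reason_element.text = reason_text
as_xml = ET.tostring(reason_element, encoding="unicode")

# Both formats round-trip to the same text.
from_json = json.loads(as_json)["reason"]
from_xml = ET.fromstring(as_xml).text

# A character reference such as &#10; parses to the same newline.
from_char_ref = ET.fromstring(
    "<reason>could not address: some&#10;multi-line&#10;text</reason>"
).text
```

Here `as_json` stays on one physical line with `\n` escapes, while `as_xml` spans three lines, which is what makes it easier to read and grep.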
32
25
 
33
- `/bugteam` automates that cycle. The clean-room property is preserved by spawning a fresh audit agent each loop with no inherited context — every audit is independent of the prior loop's verdict. The 10-loop cap is the safety: pathological cases (audit agent oscillating, fix agent regressing) cannot run away.
26
+ ### Why sibling auditor paths diverge (worktree vs temp)
34
27
 
35
- The single up-front confirmation is the explicit trade — `/bugteam` is more autonomous than `/findbugs`+`/fixbugs` chained manually. The user accepts that autonomy by typing the command. Stop conditions and the loop log give the user full visibility on exit.
28
+ Only the -a validator writes to the worktree `.bugteam-pr<N>-loop<L>.outcomes.xml` path, which the lead reads. Sibling auditors (-b through -k) write to unique paths under `<run_temp_dir>` to avoid collisions. Without this split, parallel haiku auditors writing to the same path would clobber each other's output, and the lead consuming one path would see only whichever writer finished last.
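The path split described above can be sketched as a single resolver. This is a minimal sketch: `outcome_path` is a hypothetical helper, and the sibling layout `pr-<N>/loop-<L>-<suffix>.outcomes.xml` is an assumption modeled on the pattern in the removed cleanup bullet, not a confirmed detail of the new version.

```python
from pathlib import PurePosixPath

LEAD_READ_SUFFIX = "a"                  # the -a validator
SIBLING_SUFFIXES = set("bcdefghijk")    # siblings -b through -k


def outcome_path(
    worktree_path: str,
    run_temp_dir: str,
    pr_number: int,
    loop_number: int,
    auditor_suffix: str,
) -> PurePosixPath:
    """Return where an auditor writes its outcomes XML."""
    if auditor_suffix == LEAD_READ_SUFFIX:
        # Only this path is read by the lead.
        name = f".bugteam-pr{pr_number}-loop{loop_number}.outcomes.xml"
        return PurePosixPath(worktree_path) / name
    if auditor_suffix not in SIBLING_SUFFIXES:
        raise ValueError(f"unknown auditor suffix: {auditor_suffix!r}")
    # Assumed layout: one unique file per sibling under the run temp dir.
    name = f"loop-{loop_number}-{auditor_suffix}.outcomes.xml"
    return PurePosixPath(run_temp_dir) / f"pr-{pr_number}" / name
```

Because each suffix maps to its own file, parallel writers never share a destination, and the lead's read path has exactly one writer.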
@@ -50,7 +50,7 @@ Claude: [resolves PR #99, runs loop with partial-fix outcomes]
50
50
  `Loops: 2`
51
51
  `Unresolved findings (2): src/auth.py:45 (P1: file is generated, cannot edit); src/legacy.py:200 (P1: rewrite scope exceeds the bug)`
52
52
 
53
- The bugfix teammate writes one outcome per finding to `.bugteam-loop-2.outcomes.xml`. Findings with `status=could_not_address` carry their `<reason>` text, and the teammate posts a matching reply to each finding comment so the reviewer sees why each bug stayed open.
53
+ The bugfix teammate writes one outcome per finding to `.bugteam-pr99-loop2.fix-outcomes.xml`. Findings with `status=could_not_address` carry their `<reason>` text, and the teammate posts a matching reply to each finding comment so the reviewer sees why each bug stayed open.
54
54
  </example>
55
55
 
56
56
  <example>