claude-dev-env 1.28.0 → 1.29.0

Files changed (54)
  1. package/agents/caveman.md +74 -0
  2. package/hooks/blocking/code_rules_enforcer.py +82 -7
  3. package/hooks/blocking/code_rules_path_utils.py +31 -0
  4. package/hooks/blocking/es_exe_path_rewriter.py +159 -0
  5. package/hooks/blocking/hedging_language_blocker.py +12 -2
  6. package/hooks/blocking/test_code_rules_enforcer.py +148 -0
  7. package/hooks/blocking/test_code_rules_enforcer_config_path.py +123 -0
  8. package/hooks/blocking/test_code_rules_enforcer_magic_allowlist.py +1 -1
  9. package/hooks/blocking/test_code_rules_path_utils.py +52 -0
  10. package/hooks/blocking/test_es_exe_path_rewriter.py +369 -0
  11. package/hooks/blocking/test_hedging_language_blocker.py +7 -6
  12. package/hooks/config/dynamic_stderr_handler.py +22 -0
  13. package/hooks/config/path_rewriter_constants.py +13 -0
  14. package/hooks/config/project_paths_reader.py +78 -0
  15. package/hooks/config/setup_project_paths_constants.py +41 -0
  16. package/hooks/config/test_dynamic_stderr_handler.py +48 -0
  17. package/hooks/config/test_messages.py +5 -1
  18. package/hooks/config/test_path_rewriter_constants.py +57 -0
  19. package/hooks/config/test_project_paths_reader.py +149 -0
  20. package/hooks/config/test_setup_project_paths_constants.py +74 -0
  21. package/hooks/git-hooks/test_config.py +1 -0
  22. package/hooks/git-hooks/test_gate_utils.py +1 -0
  23. package/hooks/git-hooks/test_pre_commit.py +1 -0
  24. package/hooks/git-hooks/test_pre_push.py +1 -0
  25. package/hooks/hooks.json +10 -0
  26. package/hooks/session/test_untracked_repo_detector.py +192 -0
  27. package/hooks/session/untracked_repo_detector.py +103 -0
  28. package/hooks/validators/exempt_paths.py +17 -14
  29. package/hooks/validators/test_exempt_paths.py +65 -0
  30. package/hooks/validators/test_git_checks.py +17 -17
  31. package/package.json +1 -1
  32. package/scripts/config/__init__.py +1 -0
  33. package/scripts/config/groq_bugteam_config.py +118 -0
  34. package/scripts/config/test_groq_bugteam_config.py +72 -0
  35. package/scripts/groq_bugteam.README.md +129 -0
  36. package/scripts/groq_bugteam.py +586 -0
  37. package/scripts/setup_project_paths.py +347 -0
  38. package/scripts/test_groq_bugteam.py +391 -0
  39. package/scripts/test_setup_project_paths.py +532 -0
  40. package/scripts/test_setup_project_paths_config.py +6 -0
  41. package/skills/bugteam/CONSTRAINTS.md +1 -1
  42. package/skills/bugteam/PROMPTS.md +1 -1
  43. package/skills/bugteam/SKILL.md +5 -5
  44. package/skills/bugteam/SKILL_EVALS.md +5 -5
  45. package/skills/bugteam/reference/audit-and-teammates.md +3 -3
  46. package/skills/bugteam/reference/audit-contract.md +159 -0
  47. package/skills/bugteam/reference/team-setup.md +2 -2
  48. package/skills/bugteam/scripts/bugteam_preflight.py +66 -0
  49. package/skills/bugteam/scripts/test_bugteam_preflight.py +189 -0
  50. package/skills/copilot-review/SKILL.md +145 -0
  51. package/skills/findbugs/SKILL.md +14 -22
  52. package/skills/qbug/SKILL.md +56 -12
  53. package/skills/qbug/test_qbug_skill_audit_schema.py +156 -0
  54. package/skills/qbug/test_qbug_skill_post_fix_audit.py +103 -0
@@ -0,0 +1,159 @@
1
+ # Audit contract
2
+
3
+ Shared output schema and audit-loop contract used by `/bugteam`, `/qbug`, `/findbugs`, and `/fixbugs`. Changing a shape here is a breaking change for every consuming skill.
4
+
5
+ ## Contents
6
+
7
+ - Finding schema (Shape A, Shape B)
8
+ - Adversarial second pass
9
+ - Haiku secondary auditor
10
+ - Post-fix self-audit
11
+ - Persistence (`loop-<N>-audit.json`, `loop-<N>-diagnostics.json`)
12
+
13
+ ## Finding schema
14
+
15
+ Each finding an audit produces MUST be one of exactly two shapes.
16
+
17
+ ### Shape A — structured finding
18
+
19
+ ```json
20
+ {
21
+ "id": "loop<N>-<K>",
22
+ "file": "path/relative/to/repo/root.py",
23
+ "line": 123,
24
+ "category": "A | B | C | D | E | F | G | H | I | J",
25
+ "severity": "P0 | P1 | P2",
26
+ "excerpt": "verbatim code snippet from the offending line(s)",
27
+ "failure_mode": "one sentence describing what goes wrong and when",
28
+ "evidence_files": ["additional/files/opened.py"]
29
+ }
30
+ ```
31
+
32
+ `id` is `loop<N>-<K>`, where `N` is the 1-based loop counter and `K` is the 1-based index of the finding within that loop. For `/findbugs`, which runs once, use `find<K>`.
33
+
34
+ ### Shape B — structured proof-of-absence
35
+
36
+ Used when an audit investigates a category and does NOT find a bug. Bare "verified clean" claims are REJECTED because they hide shallow reading.
37
+
38
+ ```json
39
+ {
40
+ "category": "A | B | C | D | E | F | G | H | I | J",
41
+ "files_opened": ["file1.py", "file2.py"],
42
+ "lines_quoted": [
43
+ {"file": "file1.py", "line": 88, "text": "verbatim line content"}
44
+ ],
45
+ "adversarial_probes": [
46
+ "what failure mode was tested for and how it was ruled out"
47
+ ]
48
+ }
49
+ ```
50
+
51
+ Every category an audit touches MUST have either at least one Shape A finding OR at least one Shape B proof-of-absence entry. A category with neither is a protocol violation.
52
+
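The two-shape rule and the per-category coverage rule above can be checked mechanically. The following is a minimal sketch, not part of the package: the function names `classify_entry` and `uncovered_categories` are hypothetical, and treating a shape as "exactly this key set" is an assumption, since the contract shows the keys but does not say whether extra keys are tolerated.

```python
from typing import Any

# Key sets copied from the Shape A and Shape B JSON templates in the contract.
SHAPE_A_KEYS = {
    "id", "file", "line", "category", "severity",
    "excerpt", "failure_mode", "evidence_files",
}
SHAPE_B_KEYS = {"category", "files_opened", "lines_quoted", "adversarial_probes"}


def classify_entry(entry: dict[str, Any]) -> str:
    """Return "A" or "B" when the entry's keys match a shape, else raise.

    Assumption: an exact key-set match is required.
    """
    keys = set(entry)
    if keys == SHAPE_A_KEYS:
        return "A"
    if keys == SHAPE_B_KEYS:
        return "B"
    raise ValueError(f"entry matches neither shape: {sorted(keys)}")


def uncovered_categories(
    touched: set[str], entries: list[dict[str, Any]]
) -> set[str]:
    """Return touched categories that have neither a Shape A nor a Shape B entry."""
    covered: set[str] = set()
    for entry in entries:
        classify_entry(entry)  # rejects malformed entries before counting them
        covered.add(entry["category"])
    return touched - covered
```

A non-empty result from `uncovered_categories` would correspond to the protocol violation described above.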
53
+ ### Example — Shape A
54
+
55
+ ```json
56
+ {
57
+ "id": "loop1-1",
58
+ "file": "scripts/db/neon.py",
59
+ "line": 43,
60
+ "category": "C",
61
+ "severity": "P1",
62
+ "excerpt": "load_dotenv(env_path, override=False)",
63
+ "failure_mode": "Called on every connect() — repeats file I/O per connection in scripts that open multiple short-lived connections.",
64
+ "evidence_files": ["scripts/db/neon.py", "scripts/update_new_releases.py"]
65
+ }
66
+ ```
67
+
68
+ ### Example — Shape B
69
+
70
+ ```json
71
+ {
72
+ "category": "H",
73
+ "files_opened": ["scripts/db/neon.py", "scripts/db/config.py"],
74
+ "lines_quoted": [
75
+ {"file": "scripts/db/neon.py", "line": 30, "text": "dsn = os.environ.get(\"DATABASE_URL\")"}
76
+ ],
77
+ "adversarial_probes": [
78
+ "Checked whether DATABASE_URL is interpolated into a shell — it is passed to psycopg.connect() directly with no shell involvement.",
79
+ "Checked whether the env path is user-controlled — it is derived from a fixed Y: drive constant, not user input."
80
+ ]
81
+ }
82
+ ```
83
+
84
+ ## Adversarial second pass
85
+
86
+ After the primary finding list is complete, every audit runs a second pass against itself with the prompt:
87
+
88
+ > Assume your first pass missed at least 3 P1 bugs. Where are they?
89
+
90
+ The audit must either produce new Shape A findings citing new file:line references not present in the first pass, or cite explicit Shape B adversarial-probe entries for each category it re-examined. An adversarial pass that returns "nothing new, confident first pass was complete" is REJECTED — produce evidence or findings, not confidence.
91
+
92
+ ## Haiku secondary auditor
93
+
94
+ For single-subagent skills (`/qbug`, `/findbugs`) the LEAD spawns two `Agent()` calls in one message:
95
+
96
+ - **Primary** — `subagent_type=clean-coder`, `model=sonnet` (for qbug cycle) or `subagent_type=code-quality-agent`, `model=sonnet` (for findbugs clean-room).
97
+ - **Secondary (Haiku)** — `subagent_type=code-quality-agent`, `model=haiku`, same self-contained clean-room prompt shape used by `/findbugs`.
98
+
99
+ Both audit the same diff. The secondary returns findings to the LEAD only — never posted to the PR.
100
+
101
+ Merge rules:
102
+
103
+ - **De-dup key**: `(file, line, category)`.
104
+ - **Severity conflict**: max wins (P0 > P1 > P2).
105
+ - **Unique-to-Haiku findings**: added to the primary set with Haiku's severity and source annotation.
106
+ - **Unique-to-primary findings**: kept as-is.
107
+ - **Zero Haiku findings**: primary set trusted; proceed.
108
+ - **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<N>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
109
+
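The merge rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the package's implementation: `merge_findings` is a hypothetical name, the `source` field used for the annotation is an assumed spelling (the contract only says "source annotation"), and the malformed-output fallback is left out.

```python
from typing import Any

# Lower rank means more severe, so P0 wins over P1, and P1 over P2.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


def merge_findings(
    primary: list[dict[str, Any]], haiku: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge two finding lists using the (file, line, category) de-dup key."""
    merged: dict[tuple[str, int, str], dict[str, Any]] = {}
    for source, findings in (("primary", primary), ("haiku", haiku)):
        for finding in findings:
            key = (finding["file"], finding["line"], finding["category"])
            existing = merged.get(key)
            if existing is None:
                # First sighting: copy the finding and record where it came from.
                merged[key] = {**finding, "source": source}
            elif SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[existing["severity"]]:
                # Severity conflict on the same key: the more severe value wins.
                existing["severity"] = finding["severity"]
    return list(merged.values())
```

With an empty `haiku` list the function returns the primary set unchanged apart from the annotation, which matches the "zero Haiku findings" rule.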
110
+ For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via the three variant teammates.
111
+
112
+ ## Post-fix self-audit
113
+
114
+ Audit-and-fix skills (`/qbug`, `/bugteam`) MUST re-audit modified files between `py_compile` and `git add`. This catches fix-induced regressions in the same loop that introduced them rather than on loop N+1.
115
+
116
+ Sequence:
117
+
118
+ 1. Capture pre-fix file contents for every file this FIX will touch.
119
+ 2. Apply edits.
120
+ 3. Run `py_compile` (or language-equivalent) on each modified file.
121
+ 4. Compute `fix_diff` against pre-fix contents for the modified set.
122
+ 5. Run `bugteam_code_rules_gate.py` with explicit paths for every modified file.
123
+ 6. Spawn a scoped audit of `fix_diff` with full A–J rigor, Shape A/B contract, adversarial pass, AND Haiku secondary in parallel (paranoid mode on post-fix).
124
+ 7. Any new findings become same-loop fix-targets. Internal iteration count increments by one.
125
+ 8. After 3 internal iterations with fresh findings each time, exit `stuck: post-fix audit not converging`.
126
+ 9. Only when `gate_findings` AND `post_fix_findings` are both empty: `git add`, commit, push.
127
+
128
+ `converged` exit condition: `primary_audit_clean AND post_fix_audit_clean` for the committing loop.
129
+
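The control flow of the sequence above, reduced to its iteration cap, might look like the sketch below. The callables `audit_fix_diff` and `apply_fixes` are hypothetical stand-ins for steps 1 to 6 (capture, edit, compile, gate, scoped audit), and the `"ready_to_commit"` return label is an assumption made for illustration; only the `stuck: ...` string comes from the contract.

```python
from typing import Any, Callable

MAX_INTERNAL_ITERATIONS = 3  # step 8: give up after three rounds of fresh findings


def run_post_fix_loop(
    audit_fix_diff: Callable[[], list[dict[str, Any]]],
    apply_fixes: Callable[[list[dict[str, Any]]], None],
) -> str:
    """Re-audit after each fix round until the audit is clean or the cap is hit.

    audit_fix_diff is assumed to return gate findings and post-fix findings
    combined; an empty list means both are empty (step 9).
    """
    for _ in range(MAX_INTERNAL_ITERATIONS):
        findings = audit_fix_diff()
        if not findings:
            return "ready_to_commit"
        # Step 7: new findings become fix targets within the same loop.
        apply_fixes(findings)
    return "stuck: post-fix audit not converging"
```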
130
+ ## Persistence
131
+
132
+ Every audit loop writes two JSON files under the skill's scoped temp directory (resolved via `tempfile.gettempdir()`):
133
+
134
+ ### `loop-<N>-audit.json`
135
+
136
+ ```json
137
+ {
138
+ "findings": [],
139
+ "proof_of_absence": [],
140
+ "source": "primary | haiku | adversarial | merged"
141
+ }
142
+ ```
143
+
144
+ ### `loop-<N>-diagnostics.json`
145
+
146
+ ```json
147
+ {
148
+ "loop": 1,
149
+ "gate_findings": [],
150
+ "primary_findings": [],
151
+ "adversarial_findings": [],
152
+ "haiku_findings": [],
153
+ "post_fix_findings": [],
154
+ "merged": [],
155
+ "deduped": []
156
+ }
157
+ ```
158
+
159
+ All eight keys MUST be present. Missing keys break convergence debugging.
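
A writer that enforces the eight-key rule before persisting could be sketched as follows. `write_diagnostics` and its `scope_name` parameter are hypothetical; the contract only says the directory is resolved via `tempfile.gettempdir()`, so the exact subdirectory layout here is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# The eight keys shown in the loop-<N>-diagnostics.json template.
DIAGNOSTIC_KEYS = (
    "loop", "gate_findings", "primary_findings", "adversarial_findings",
    "haiku_findings", "post_fix_findings", "merged", "deduped",
)


def write_diagnostics(scope_name: str, diagnostics: dict[str, Any]) -> Path:
    """Validate that all eight keys are present, then write the JSON file."""
    missing = [key for key in DIAGNOSTIC_KEYS if key not in diagnostics]
    if missing:
        raise ValueError(f"diagnostics missing keys: {missing}")
    # Assumed layout: <system temp dir>/<scope_name>/loop-<N>-diagnostics.json
    directory = Path(tempfile.gettempdir()) / scope_name
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"loop-{diagnostics['loop']}-diagnostics.json"
    target.write_text(json.dumps(diagnostics, indent=2), encoding="utf-8")
    return target
```

Failing fast on a missing key keeps an incomplete file from reaching disk, which is one way to honor the "MUST be present" requirement.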
@@ -35,8 +35,8 @@ This session is the **team lead**. Create the team with `TeamCreate` using the e
35
35
  - **Per-team temp directory (resolved once, reused everywhere):** After `team_name` is captured, resolve a portable absolute path with a Claude-side lookup using Python’s `tempfile.gettempdir()`, which honors `TMPDIR`, `TEMP`, and `TMP` in the platform-correct order and falls back to `C:\Users\<user>\AppData\Local\Temp` on Windows or `/tmp` on Unix: `Path(tempfile.gettempdir()) / team_name` (requires `import tempfile`). The `team_name` value already carries the `bugteam-` prefix; keep it as-is here. Let `tempfile.gettempdir()` perform the lookup; use its result directly. Capture the resolved absolute path as `<team_temp_dir>` and pass that literal path to every shell command that follows. Claude performs all temp-root resolution so every shell (bash, cmd.exe, PowerShell) receives the same literal absolute value.
36
36
 
37
37
  - **Roles defined up front (spawned per loop, not at team creation):**
38
- - `bugfind` — teammate role `code-quality-agent`, model sonnet
39
- - `bugfix` — teammate role `clean-coder`, model sonnet
38
+ - `bugfind` — teammate role `code-quality-agent`, model opus (Opus 4.7 at default xhigh effort)
39
+ - `bugfix` — teammate role `clean-coder`, model opus (Opus 4.7 at default xhigh effort)
40
40
 
41
41
  - **Display mode:** inherit the user’s default (`teammateMode` in `~/.claude.json`); do not override.
42
42
 
@@ -7,6 +7,69 @@ import sys
7
7
  from pathlib import Path
8
8
 
9
9
 
10
+ def verify_git_hooks_path(repository_root: Path | None = None) -> int:
11
+ """Check that core.hooksPath resolves to the claude-dev-env git-hooks directory.
12
+
13
+ When *repository_root* is provided, queries the effective config for that
14
+ repository (``git -C <root> config --get``), which detects repo-level
15
+ overrides such as Husky or lefthook. Falls back to the current working
16
+ directory's effective config when *repository_root* is None.
17
+
18
+ Returns zero when the configured path ends with the expected hooks suffix.
19
+ Returns non-zero and prints a correction message when unset or pointing elsewhere.
20
+ """
21
+ expected_hooks_path_suffix = "hooks/git-hooks"
22
+ enforcement_absent_message = (
23
+ "Git-side CODE_RULES enforcement is not active on this host.\n"
24
+ "Run: npx claude-dev-env .\n"
25
+ "Or set core.hooksPath at any scope, e.g.:\n"
26
+ " git config --global core.hooksPath ~/.claude/hooks/git-hooks"
27
+ )
28
+ git_command: list[str] = ["git"]
29
+ if repository_root is not None:
30
+ git_command.extend(["-C", str(repository_root)])
31
+ git_command.extend(["config", "--get", "core.hooksPath"])
32
+ try:
33
+ query_result = subprocess.run(
34
+ git_command,
35
+ capture_output=True,
36
+ text=True,
37
+ encoding="utf-8",
38
+ errors="replace",
39
+ check=False,
40
+ )
41
+ except FileNotFoundError:
42
+ print(
43
+ "bugteam_preflight: git is not installed or not available on PATH.\n"
44
+ f"{enforcement_absent_message}",
45
+ file=sys.stderr,
46
+ )
47
+ return 1
48
+ except OSError as os_error:
49
+ print(
50
+ f"bugteam_preflight: failed to run git: {os_error}\n"
51
+ f"{enforcement_absent_message}",
52
+ file=sys.stderr,
53
+ )
54
+ return 1
55
+ if query_result.returncode != 0:
56
+ print(
57
+ f"bugteam_preflight: {enforcement_absent_message}",
58
+ file=sys.stderr,
59
+ )
60
+ return 1
61
+ configured_path = query_result.stdout.strip().replace("\\", "/").rstrip("/")
62
+ if not configured_path.endswith(expected_hooks_path_suffix):
63
+ print(
64
+ f"bugteam_preflight: core.hooksPath is '{configured_path}' — "
65
+ f"expected path ending in '{expected_hooks_path_suffix}'.\n"
66
+ f"{enforcement_absent_message}",
67
+ file=sys.stderr,
68
+ )
69
+ return 1
70
+ return 0
71
+
72
+
10
73
  def find_repository_root(start: Path) -> Path:
11
74
  resolved = start.resolve()
12
75
  candidates = [resolved, *resolved.parents]
@@ -109,6 +172,9 @@ def main(argv: list[str] | None = None) -> int:
109
172
  if arguments.repo_root is not None
110
173
  else find_repository_root(start)
111
174
  )
175
+ hooks_path_exit_code = verify_git_hooks_path(repository_root)
176
+ if hooks_path_exit_code != 0:
177
+ return hooks_path_exit_code
112
178
  if not arguments.no_pytest and has_pytest_configuration(repository_root):
113
179
  if not has_discoverable_tests(repository_root):
114
180
  print(
@@ -0,0 +1,189 @@
1
+ """Tests for bugteam_preflight git hooks path verification.
2
+
3
+ Covers:
4
+ - core.hooksPath unset: exits non-zero with correction message
5
+ - core.hooksPath pointing to the correct claude hooks dir: exits zero
6
+ - core.hooksPath pointing elsewhere (husky override): exits non-zero
7
+ - core.hooksPath with trailing slash: must still pass after normalization
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ import importlib.util
13
+ import subprocess
14
+ from pathlib import Path
15
+ from types import ModuleType
16
+ from unittest.mock import MagicMock, patch
17
+
18
+ import pytest
19
+
20
+
21
+ def _load_preflight_module() -> ModuleType:
22
+ module_path = Path(__file__).parent / "bugteam_preflight.py"
23
+ spec = importlib.util.spec_from_file_location("bugteam_preflight", module_path)
24
+ assert spec is not None
25
+ assert spec.loader is not None
26
+ module = importlib.util.module_from_spec(spec)
27
+ spec.loader.exec_module(module)
28
+ return module
29
+
30
+
31
+ bugteam_preflight = _load_preflight_module()
32
+
33
+
34
+ def _make_completed_process(
35
+ stdout: str, returncode: int
36
+ ) -> subprocess.CompletedProcess:
37
+ process = MagicMock(spec=subprocess.CompletedProcess)
38
+ process.stdout = stdout
39
+ process.returncode = returncode
40
+ return process
41
+
42
+
43
+ def test_should_exit_nonzero_when_core_hooks_path_unset(capsys: pytest.CaptureFixture[str]) -> None:
44
+ with patch("subprocess.run") as mock_run:
45
+ mock_run.return_value = _make_completed_process("", returncode=1)
46
+ exit_code = bugteam_preflight.verify_git_hooks_path()
47
+ assert exit_code != 0
48
+ captured = capsys.readouterr()
49
+ assert "core.hooksPath" in captured.err
50
+ assert "npx claude-dev-env" in captured.err or "git config" in captured.err
51
+
52
+
53
+ def test_should_exit_zero_when_core_hooks_path_points_to_claude_hooks(tmp_path: Path) -> None:
54
+ claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
55
+ claude_hooks_path.mkdir(parents=True)
56
+ with patch("subprocess.run") as mock_run:
57
+ mock_run.return_value = _make_completed_process(
58
+ str(claude_hooks_path) + "\n", returncode=0
59
+ )
60
+ exit_code = bugteam_preflight.verify_git_hooks_path()
61
+ assert exit_code == 0
62
+
63
+
64
+ def test_should_exit_nonzero_when_core_hooks_path_points_elsewhere(capsys: pytest.CaptureFixture[str]) -> None:
65
+ with patch("subprocess.run") as mock_run:
66
+ mock_run.return_value = _make_completed_process(
67
+ "/some/other/path/.husky\n", returncode=0
68
+ )
69
+ exit_code = bugteam_preflight.verify_git_hooks_path()
70
+ assert exit_code != 0
71
+ captured = capsys.readouterr()
72
+ assert "core.hooksPath" in captured.err
73
+
74
+
75
+ def test_should_include_correction_commands_in_error_message(capsys: pytest.CaptureFixture[str]) -> None:
76
+ with patch("subprocess.run") as mock_run:
77
+ mock_run.return_value = _make_completed_process("", returncode=1)
78
+ bugteam_preflight.verify_git_hooks_path()
79
+ captured = capsys.readouterr()
80
+ assert (
81
+ "npx claude-dev-env" in captured.err
82
+ or "git config --global core.hooksPath" in captured.err
83
+ )
84
+
85
+
86
+ def test_main_should_exit_nonzero_when_hooks_path_unset() -> None:
87
+ with patch("subprocess.run") as mock_run:
88
+ mock_run.return_value = _make_completed_process("", returncode=1)
89
+ exit_code = bugteam_preflight.main(["--no-pytest"])
90
+ assert exit_code != 0
91
+
92
+
93
+ def test_main_should_continue_when_hooks_path_valid(tmp_path: Path) -> None:
94
+ claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
95
+ claude_hooks_path.mkdir(parents=True)
96
+ with patch("subprocess.run") as mock_run:
97
+ mock_run.return_value = _make_completed_process(
98
+ str(claude_hooks_path) + "\n", returncode=0
99
+ )
100
+ exit_code = bugteam_preflight.main(["--no-pytest"])
101
+ assert exit_code == 0
102
+
103
+
104
+ def test_should_accept_hooks_path_with_trailing_slash() -> None:
105
+ with patch("subprocess.run") as mock_run:
106
+ mock_run.return_value = _make_completed_process(
107
+ "/home/user/.claude/hooks/git-hooks/\n", returncode=0
108
+ )
109
+ exit_code = bugteam_preflight.verify_git_hooks_path()
110
+ assert exit_code == 0, (
111
+ "hooksPath with trailing slash must pass verification after normalization"
112
+ )
113
+
114
+
115
+ def test_should_exit_zero_when_hooks_path_set_at_repo_scope(tmp_path: Path) -> None:
116
+ claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
117
+ claude_hooks_path.mkdir(parents=True)
118
+ repo_root = tmp_path / "my-repo"
119
+ repo_root.mkdir()
120
+ with patch("subprocess.run") as mock_run:
121
+ mock_run.return_value = _make_completed_process(
122
+ str(claude_hooks_path) + "\n", returncode=0
123
+ )
124
+ exit_code = bugteam_preflight.verify_git_hooks_path(repo_root)
125
+ assert exit_code == 0, (
126
+ "verify_git_hooks_path must accept a valid path returned by effective "
127
+ "config query (not restricted to --global scope)"
128
+ )
129
+ called_command = mock_run.call_args[0][0]
130
+ assert "--global" not in called_command, (
131
+ "verify_git_hooks_path must query effective config, not --global only"
132
+ )
133
+ assert "-C" in called_command, (
134
+ "verify_git_hooks_path must use git -C <repo_root> for repo-effective config"
135
+ )
136
+ dash_c_index = called_command.index("-C")
137
+ assert called_command[dash_c_index + 1] == str(repo_root), (
138
+ "git -C must receive the resolved repository root path"
139
+ )
140
+
141
+
142
+ def test_should_accept_hooks_path_with_backslash_and_trailing_slash() -> None:
143
+ with patch("subprocess.run") as mock_run:
144
+ mock_run.return_value = _make_completed_process(
145
+ "C:\\Users\\user\\.claude\\hooks\\git-hooks\\\n", returncode=0
146
+ )
147
+ exit_code = bugteam_preflight.verify_git_hooks_path()
148
+ assert exit_code == 0, (
149
+ "Windows hooksPath with trailing backslash must pass after normalization"
150
+ )
151
+
152
+
153
+ def test_should_exit_nonzero_when_git_executable_not_found(
154
+ capsys: pytest.CaptureFixture[str],
155
+ ) -> None:
156
+ """Preflight must not crash with a traceback when git is missing from PATH."""
157
+ with patch("subprocess.run", side_effect=FileNotFoundError()):
158
+ exit_code = bugteam_preflight.verify_git_hooks_path()
159
+ assert exit_code != 0, (
160
+ "FileNotFoundError from subprocess.run must produce a non-zero exit, "
161
+ "not a propagated traceback"
162
+ )
163
+ captured = capsys.readouterr()
164
+ assert "git" in captured.err.lower(), (
165
+ "Error message must mention git so the user knows what is missing"
166
+ )
167
+ assert (
168
+ "npx claude-dev-env" in captured.err
169
+ or "git config --global core.hooksPath" in captured.err
170
+ ), "Error message must include the enforcement-absent remediation hints"
171
+
172
+
173
+ def test_should_exit_nonzero_when_subprocess_run_raises_os_error(
174
+ capsys: pytest.CaptureFixture[str],
175
+ ) -> None:
176
+ """Preflight must surface a clean error for other OS-level git launch failures."""
177
+ with patch("subprocess.run", side_effect=OSError("permission denied")):
178
+ exit_code = bugteam_preflight.verify_git_hooks_path()
179
+ assert exit_code != 0, (
180
+ "OSError from subprocess.run must produce a non-zero exit, "
181
+ "not a propagated traceback"
182
+ )
183
+ captured = capsys.readouterr()
184
+ assert "bugteam_preflight" in captured.err, (
185
+ "Error message must be prefixed with the preflight tool name for context"
186
+ )
187
+ assert "permission denied" in captured.err, (
188
+ "Error message must include the underlying OSError detail for diagnosis"
189
+ )
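
The normalization these tests exercise is a single expression in `verify_git_hooks_path`. The standalone sketch below isolates it so its behavior on the tested inputs is easy to see; the helper names `normalize_hooks_path` and `has_expected_suffix` are hypothetical and do not exist in the package.

```python
def normalize_hooks_path(raw_stdout: str) -> str:
    """Mirror the normalization used in verify_git_hooks_path.

    Strips surrounding whitespace, converts backslashes to forward slashes,
    and drops any trailing slash.
    """
    return raw_stdout.strip().replace("\\", "/").rstrip("/")


def has_expected_suffix(raw_stdout: str, suffix: str = "hooks/git-hooks") -> bool:
    """Return True when the normalized path ends with the expected suffix."""
    return normalize_hooks_path(raw_stdout).endswith(suffix)
```

Because the check is a plain string suffix match, a directory such as `/x/myhooks/git-hooks` also passes. Whether that is acceptable depends on how strict the preflight is meant to be.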
@@ -0,0 +1,145 @@
1
+ ---
2
+ name: copilot-review
3
+ description: >-
4
+ Spawns a background subagent that babysits the GitHub Copilot reviewer on the
5
+ current PR. The subagent self-paces at ~5 minutes per tick, fetches the
6
+ latest copilot-pull-request-reviewer[bot] review, fixes unaddressed inline
7
+ findings against current HEAD (new commit, push, inline replies), and
8
+ re-requests review via the documented requested_reviewers API. The subagent
9
+ terminates on convergence (clean review against HEAD) and reports back.
10
+ Triggers: '/copilot-review', 'watch copilot', 'babysit copilot review',
11
+ 'loop copilot reviews', 're-request copilot', 'keep re-requesting copilot'.
12
+ ---
13
+
14
+ # Copilot Review
15
+
16
+ Delegates Copilot babysitting to a background subagent so the main session stays free. The subagent loops internally and closes itself on convergence.
17
+
18
+ ## When this skill applies
19
+
20
+ The user is on a PR branch, wants Copilot (the GitHub Copilot reviewer bot) to keep re-reviewing after each push, and wants findings auto-addressed between ticks — but does not want the main conversation consumed by polling.
21
+
22
+ ## The Process
23
+
24
+ ### Step 1: Gather PR context
25
+
26
+ From the current repo:
27
+
28
+ ```bash
29
+ gh pr view --json number,url,headRefOid,baseRefName,headRefName,isDraft
30
+ ```
31
+
32
+ Capture `number`, `headRefOid`, owner/repo (from `url`), and branch name. Pass these to the subagent so it does not rediscover them.
33
+
34
+ ### Step 2: Spawn the background subagent
35
+
36
+ Invoke the `Agent` tool with:
37
+
38
+ - `subagent_type: "general-purpose"`
39
+ - `run_in_background: true`
40
+ - `description: "Copilot review loop for PR #<N>"`
41
+ - `prompt`: the full instructions in **Step 3 (Subagent prompt template)**, with placeholders filled in from Step 1.
42
+
43
+ Record the returned agent ID. Report to the user in one or two lines:
44
+
45
+ - The subagent is running in the background.
46
+ - It self-terminates on convergence.
47
+ - To stop it early, the user says "stop the copilot loop" and you call `TaskStop <agent_id>`.
48
+ - The main session stays free; completion arrives as a notification.
49
+
50
+ Let the subagent own the cadence. The skill's job in the main session ends once the subagent is spawned and reported.
51
+
52
+ ### Step 3: Subagent prompt template
53
+
54
+ Pass this verbatim to the subagent (substituting the bracketed values):
55
+
56
+ > You are babysitting the GitHub Copilot reviewer on PR **#[NUMBER]** at **[OWNER]/[REPO]** (branch `[BRANCH]`, current HEAD `[HEAD_SHA]`). Your job: keep the loop running until Copilot returns a clean review against the current HEAD, then stop.
57
+ >
58
+ > **Per-tick work** (do this now, then on each wakeup):
59
+ >
60
+ > 1. Resolve current HEAD: `gh api repos/[OWNER]/[REPO]/pulls/[NUMBER] --jq '.head.sha'`.
61
+ > 2. Fetch latest Copilot review:
62
+ > ```bash
63
+ > gh api repos/[OWNER]/[REPO]/pulls/[NUMBER]/reviews \
64
+ > --jq '[.[] | select(.user.login=="copilot-pull-request-reviewer[bot]")] | sort_by(.submitted_at) | last'
65
+ > ```
66
+ > Capture `commit_id`, `state`, `submitted_at`, `id`.
67
+ > 3. Decide the branch:
68
+ > - **No review exists:** re-request (step 4), schedule next wakeup, return.
69
+ > - **Latest review's `commit_id` != current HEAD:** re-request (step 4), schedule next wakeup, return.
70
+ > - **Latest review's `commit_id` == current HEAD with unresolved inline findings:** TDD-fix them, push, reply inline on each thread, re-request (step 4), schedule next wakeup, return.
71
+ > - **Latest review's `commit_id` == current HEAD and clean:** report convergence to the parent with a one-sentence summary and terminate. The loop is done; skip the ScheduleWakeup call.
72
+ > 4. Re-request Copilot. The reviewer ID **must** be `copilot-pull-request-reviewer[bot]`, including the `[bot]` suffix. Empirically verified: `Copilot`, `copilot`, and `github-copilot` all return `requested_reviewers: []` with no error, so the request is a silent no-op.
73
+ > ```bash
74
+ > gh api -X POST repos/[OWNER]/[REPO]/pulls/[NUMBER]/requested_reviewers \
75
+ > -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
76
+ > ```
77
+ > 5. Schedule the next wakeup with `ScheduleWakeup`:
78
+ > - `delaySeconds: 300`
79
+ > - `reason`: one short sentence on what you are waiting for.
80
+ > - `prompt`: the literal sentinel `<<autonomous-loop-dynamic>>` so the next firing re-enters these instructions.
81
+ >
82
+ > **Fix protocol** (step 3, third branch):
83
+ >
84
+ > - Read each referenced file:line.
85
+ > - Write a failing test first when the finding has behavior to test. For pure doc or comment nits that have no behavior, go straight to the fix.
86
+ > - Implement the fix.
87
+ > - Stage the fix and create one new commit on the existing branch: `git add <files> && git commit -m "fix(review): ..."`.
88
+ > - Push the new commit: `git push origin [BRANCH]`.
89
+ > - Reply inline on each comment thread with `gh api -X POST repos/[OWNER]/[REPO]/pulls/[NUMBER]/comments` using `in_reply_to` set to the comment id, referencing the new commit SHA.
90
+ >
91
+ > When a pre-push, pre-commit, or other hook rejects the change, solve it. Read the hook's error message, diagnose the root cause in the code or test, and fix that. Then rerun the commit or push. Hooks exist to catch real problems; treat each rejection as new evidence to act on.
92
+ >
93
+ > **Stop conditions:**
94
+ >
95
+ > - Convergence (clean review against HEAD): report one-sentence summary to parent and terminate.
96
+ > - Blocker you have exhausted fix attempts on (API auth failure persists, CI regression whose root cause falls outside this PR, a hook you have investigated and cannot resolve in one commit): report the specific blocker and its diagnosis to the parent, then terminate without scheduling another wakeup.
97
+ > - Parent sends `TaskStop`: terminate immediately.
98
+ >
99
+ > **Safety cap:** after 20 ticks without convergence, stop and report. That many rounds means something structural is wrong with the loop.
100
+
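The per-tick branch decision in the prompt template above can be read as a small pure function. This is a sketch for orientation only: `decide_action`, its parameters, and the returned labels are hypothetical, and two points are assumptions the template does not settle: ticks are counted from 1, and a clean review on the capped tick still counts as convergence.

```python
from typing import Optional

SAFETY_CAP_TICKS = 20  # from the template's safety cap


def decide_action(
    latest_review_commit: Optional[str],
    head_sha: str,
    unresolved_findings: int,
    tick: int,
) -> str:
    """Map the state observed on one tick to the action the template prescribes."""
    reviewed_head = latest_review_commit == head_sha
    if reviewed_head and unresolved_findings == 0:
        return "report_convergence"  # clean review against HEAD: terminate
    if tick >= SAFETY_CAP_TICKS:
        return "stop_and_report"  # assumption: the cap is checked after convergence
    if latest_review_commit is None or not reviewed_head:
        return "re_request"  # no review yet, or the review targets an older commit
    return "fix_then_re_request"  # review is on HEAD with unresolved findings
```

Keeping the decision free of `gh` calls makes each branch testable without network access; the surrounding loop would supply the fetched values.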
101
+ ### Step 4: Report back to the user
102
+
103
+ After spawning, tell the user in one or two lines: subagent ID, PR URL, that it will notify on convergence or blocker. Nothing else.
104
+
105
+ ## Stopping the subagent
106
+
107
+ - Convergence → subagent stops itself.
108
+ - Blocker → subagent reports and stops.
109
+ - User says stop → `TaskStop <agent_id>`.
110
+ - User asks what loops are running → `TaskList`.
111
+
112
+ ## Ground rules (for the subagent)
113
+
114
+ - **Append commits.** Each tick adds one new commit on the existing branch with `git commit` and `git push origin [BRANCH]`.
115
+ - **Honor pre-push and pre-commit hooks.** When a hook rejects the change, read its output, fix the underlying issue (the failing test, the missing constant, the broken import), and retry. Solve, do not punt.
116
+ - **Respect the PR's current state.** Whatever draft-vs-ready state the PR has when the loop starts is the state the subagent preserves. The user decides when to flip it.
117
+ - **One fix commit per tick.** Batch all of the current tick's findings into a single commit; the next tick handles the next review round.
118
+ - **Use `copilot-pull-request-reviewer[bot]` with the `[bot]` suffix for the reviewer ID.** That exact spelling is load-bearing — it is the only form the API accepts.
119
+
120
+ ## Examples
121
+
122
+ <example>
123
+ User: `/copilot-review`
124
+ Claude: [reads PR context, spawns background subagent with the Step 3 template, reports "subagent X watching PR #123; will notify on convergence"]
125
+ </example>
126
+
127
+ <example>
128
+ User: "babysit copilot on this PR until it's clean"
129
+ Claude: [same as above]
130
+ </example>
131
+
132
+ <example>
133
+ Subagent tick fires, latest Copilot review is against an older commit.
134
+ Subagent: [re-requests review, schedules next wakeup, returns]
135
+ </example>
136
+
137
+ <example>
138
+ Subagent tick fires, Copilot has 2 unaddressed inline findings on HEAD.
139
+ Subagent: [TDD-fixes both, one commit, pushes, replies inline on both threads, re-requests review, schedules next wakeup]
140
+ </example>
141
+
142
+ <example>
143
+ Subagent tick fires, latest review is clean against HEAD.
144
+ Subagent: [reports convergence to parent, terminates — no further wakeups]
145
+ </example>
@@ -47,12 +47,12 @@ The audit's authoritative scope is this single diff file. Do not inject extra fi
47
47
 
48
48
  ### Step 3: Spawn the code-quality-agent — clean room
49
49
 
50
- Call the Agent tool with:
50
+ Call the Agent tool twice in a single message (primary + Haiku secondary per the audit contract's Haiku secondary section):
51
51
 
52
- - `subagent_type: code-quality-agent`
53
- - `model: sonnet`
54
- - `description: "PR bug audit"`
55
- - `run_in_background: false` the user invoked `/findbugs` to get a result on this turn
52
+ - Primary: `subagent_type: code-quality-agent`, `model: sonnet`, `description: "PR bug audit"`, `run_in_background: false`
53
+ - Secondary: `subagent_type: code-quality-agent`, `model: haiku`, `description: "PR bug audit (secondary)"`, `run_in_background: false`
54
+
55
+ After both return, merge per the contract's Haiku secondary section (de-dup key, max-wins severity, malformed-output fallback) before reporting to the user.
56
56
 
57
57
  **The agent prompt must be self-contained and context-free.** Specifically:
58
58
 
@@ -98,25 +98,17 @@ The XML prompt skeleton:
98
98
  </constraints>
99
99
 
100
100
  <output_format>
101
- P0 = will not run / data corruption
102
- P1 = regression or silent failure
103
- P2 = dead code, minor smell
104
-
105
- ## Summary
106
- Total: N (P0=N, P1=N, P2=N)
107
-
108
- ## Findings
109
- ### [P_] short title
110
- File: file/path:line
111
- Category: A-J
112
- Issue: 2-3 sentence description with concrete trace
113
- Evidence: code excerpt or grep result
101
+ Follow the shared audit contract at ../bugteam/reference/audit-contract.md:
114
102
 
115
- ## Verified clean
116
- Per category investigated, name the evidence and the conclusion.
103
+ - Severity: P0 = will not run / data corruption; P1 = regression or silent
104
+ failure; P2 = dead code, minor smell.
105
+ - Per category, produce either Shape A (structured finding) or Shape B
106
+ (proof-of-absence). Bare "verified clean" labels are REJECTED.
107
+ - Run the contract's adversarial second pass after the primary list.
117
108
 
118
- ## Open questions
119
- Anything ambiguous from the diff alone.
109
+ Preamble: `Total: N (P0=N, P1=N, P2=N)`. Emit findings and proof-of-absence
110
+ entries in the JSON shapes defined by the contract. Include an "Open
111
+ questions" section for items the diff alone cannot resolve.
120
112
  </output_format>
121
113
  ```
122
114