claude-dev-env 1.28.1 → 1.29.1
- package/agents/caveman.md +74 -0
- package/hooks/blocking/code_rules_enforcer.py +82 -7
- package/hooks/blocking/code_rules_path_utils.py +31 -0
- package/hooks/blocking/es_exe_path_rewriter.py +159 -0
- package/hooks/blocking/hedging_language_blocker.py +12 -2
- package/hooks/blocking/test_code_rules_enforcer.py +148 -0
- package/hooks/blocking/test_code_rules_enforcer_config_path.py +123 -0
- package/hooks/blocking/test_code_rules_enforcer_magic_allowlist.py +1 -1
- package/hooks/blocking/test_code_rules_path_utils.py +52 -0
- package/hooks/blocking/test_es_exe_path_rewriter.py +369 -0
- package/hooks/blocking/test_hedging_language_blocker.py +7 -6
- package/hooks/config/dynamic_stderr_handler.py +22 -0
- package/hooks/config/path_rewriter_constants.py +13 -0
- package/hooks/config/project_paths_reader.py +78 -0
- package/hooks/config/setup_project_paths_constants.py +41 -0
- package/hooks/config/test_dynamic_stderr_handler.py +48 -0
- package/hooks/config/test_messages.py +5 -1
- package/hooks/config/test_path_rewriter_constants.py +57 -0
- package/hooks/config/test_project_paths_reader.py +149 -0
- package/hooks/config/test_setup_project_paths_constants.py +74 -0
- package/hooks/git-hooks/test_config.py +1 -0
- package/hooks/git-hooks/test_gate_utils.py +1 -0
- package/hooks/git-hooks/test_pre_commit.py +1 -0
- package/hooks/git-hooks/test_pre_push.py +1 -0
- package/hooks/hooks.json +10 -0
- package/hooks/session/test_untracked_repo_detector.py +192 -0
- package/hooks/session/untracked_repo_detector.py +103 -0
- package/hooks/validators/exempt_paths.py +17 -14
- package/hooks/validators/test_exempt_paths.py +65 -0
- package/hooks/validators/test_git_checks.py +17 -17
- package/package.json +1 -1
- package/scripts/config/__init__.py +1 -0
- package/scripts/config/groq_bugteam_config.py +118 -0
- package/scripts/config/test_groq_bugteam_config.py +72 -0
- package/scripts/groq_bugteam.README.md +129 -0
- package/scripts/groq_bugteam.py +586 -0
- package/scripts/setup_project_paths.py +352 -0
- package/scripts/test_groq_bugteam.py +391 -0
- package/scripts/test_setup_project_paths.py +532 -0
- package/scripts/test_setup_project_paths_config.py +6 -0
- package/skills/bugteam/CONSTRAINTS.md +1 -1
- package/skills/bugteam/PROMPTS.md +1 -1
- package/skills/bugteam/SKILL.md +5 -5
- package/skills/bugteam/SKILL_EVALS.md +5 -5
- package/skills/bugteam/reference/audit-and-teammates.md +3 -3
- package/skills/bugteam/reference/audit-contract.md +159 -0
- package/skills/bugteam/reference/team-setup.md +2 -2
- package/skills/bugteam/scripts/bugteam_preflight.py +66 -0
- package/skills/bugteam/scripts/test_bugteam_preflight.py +189 -0
- package/skills/copilot-review/SKILL.md +145 -0
- package/skills/findbugs/SKILL.md +14 -22
- package/skills/qbug/SKILL.md +56 -13
- package/skills/qbug/test_qbug_skill_audit_schema.py +156 -0
- package/skills/qbug/test_qbug_skill_post_fix_audit.py +103 -0
|
@@ -0,0 +1,159 @@
|
|
|
1
|
+
# Audit contract
|
|
2
|
+
|
|
3
|
+
Shared output schema and audit-loop contract used by `/bugteam`, `/qbug`, `/findbugs`, and `/fixbugs`. Changing a shape here is a breaking change for every consuming skill.
|
|
4
|
+
|
|
5
|
+
## Contents
|
|
6
|
+
|
|
7
|
+
- Finding schema (Shape A, Shape B)
|
|
8
|
+
- Adversarial second pass
|
|
9
|
+
- Haiku secondary auditor
|
|
10
|
+
- Post-fix self-audit
|
|
11
|
+
- Persistence (loop-N-audit.json, loop-N-diagnostics.json)
|
|
12
|
+
|
|
13
|
+
## Finding schema
|
|
14
|
+
|
|
15
|
+
Each finding an audit produces MUST be one of exactly two shapes.
|
|
16
|
+
|
|
17
|
+
### Shape A — structured finding
|
|
18
|
+
|
|
19
|
+
```json
|
|
20
|
+
{
|
|
21
|
+
"id": "loop<N>-<K>",
|
|
22
|
+
"file": "path/relative/to/repo/root.py",
|
|
23
|
+
"line": 123,
|
|
24
|
+
"category": "A | B | C | D | E | F | G | H | I | J",
|
|
25
|
+
"severity": "P0 | P1 | P2",
|
|
26
|
+
"excerpt": "verbatim code snippet from the offending line(s)",
|
|
27
|
+
"failure_mode": "one sentence describing what goes wrong and when",
|
|
28
|
+
"evidence_files": ["additional/files/opened.py"]
|
|
29
|
+
}
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
`id` is `loop<N>-<K>`, where `N` is the 1-based loop counter and `K` is the 1-based index of the finding within that loop. For `/findbugs`, which runs once, use `find<K>`.
|
|
33
|
+
|
|
34
|
+
### Shape B — structured proof-of-absence
|
|
35
|
+
|
|
36
|
+
Used when an audit investigates a category and does NOT find a bug. Bare "verified clean" claims are REJECTED because they hide shallow reading.
|
|
37
|
+
|
|
38
|
+
```json
|
|
39
|
+
{
|
|
40
|
+
"category": "A | B | C | D | E | F | G | H | I | J",
|
|
41
|
+
"files_opened": ["file1.py", "file2.py"],
|
|
42
|
+
"lines_quoted": [
|
|
43
|
+
{"file": "file1.py", "line": 88, "text": "verbatim line content"}
|
|
44
|
+
],
|
|
45
|
+
"adversarial_probes": [
|
|
46
|
+
"what failure mode was tested for and how it was ruled out"
|
|
47
|
+
]
|
|
48
|
+
}
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Every category an audit touches MUST have either at least one Shape A finding OR at least one Shape B proof-of-absence entry. A category with neither is a protocol violation.
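The coverage rule above can be checked mechanically. The following is a minimal sketch, assuming audit entries are plain dictionaries. The helper names `classify_entry` and `uncovered_categories` are hypothetical and not part of the package, and tolerating extra keys on an entry is an assumption the contract does not state.

```python
from typing import Any

# Key sets copied from the Shape A and Shape B templates in this contract.
SHAPE_A_KEYS = frozenset(
    {"id", "file", "line", "category", "severity",
     "excerpt", "failure_mode", "evidence_files"}
)
SHAPE_B_KEYS = frozenset(
    {"category", "files_opened", "lines_quoted", "adversarial_probes"}
)


def classify_entry(entry: dict[str, Any]) -> str:
    """Return "A" or "B"; raise ValueError for any other shape."""
    keys = set(entry)
    if SHAPE_A_KEYS <= keys:
        return "A"
    if SHAPE_B_KEYS <= keys:
        return "B"
    raise ValueError(f"entry matches neither shape: {sorted(keys)}")


def uncovered_categories(
    touched: set[str], entries: list[dict[str, Any]]
) -> set[str]:
    """Return touched categories with no Shape A or Shape B entry."""
    covered: set[str] = set()
    for entry in entries:
        classify_entry(entry)  # reject malformed entries before counting them
        covered.add(entry["category"])
    return touched - covered
```

A non-empty result from `uncovered_categories` would correspond to the protocol violation described above.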
|
|
52
|
+
|
|
53
|
+
### Example — Shape A
|
|
54
|
+
|
|
55
|
+
```json
|
|
56
|
+
{
|
|
57
|
+
"id": "loop1-1",
|
|
58
|
+
"file": "scripts/db/neon.py",
|
|
59
|
+
"line": 43,
|
|
60
|
+
"category": "C",
|
|
61
|
+
"severity": "P1",
|
|
62
|
+
"excerpt": "load_dotenv(env_path, override=False)",
|
|
63
|
+
"failure_mode": "Called on every connect() — repeats file I/O per connection in scripts that open multiple short-lived connections.",
|
|
64
|
+
"evidence_files": ["scripts/db/neon.py", "scripts/update_new_releases.py"]
|
|
65
|
+
}
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
### Example — Shape B
|
|
69
|
+
|
|
70
|
+
```json
|
|
71
|
+
{
|
|
72
|
+
"category": "H",
|
|
73
|
+
"files_opened": ["scripts/db/neon.py", "scripts/db/config.py"],
|
|
74
|
+
"lines_quoted": [
|
|
75
|
+
{"file": "scripts/db/neon.py", "line": 30, "text": "dsn = os.environ.get(\"DATABASE_URL\")"}
|
|
76
|
+
],
|
|
77
|
+
"adversarial_probes": [
|
|
78
|
+
"Checked whether DATABASE_URL is interpolated into a shell — it is passed to psycopg.connect() directly with no shell involvement.",
|
|
79
|
+
"Checked whether the env path is user-controlled — it is derived from a fixed Y: drive constant, not user input."
|
|
80
|
+
]
|
|
81
|
+
}
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
## Adversarial second pass
|
|
85
|
+
|
|
86
|
+
After the primary finding list is complete, every audit runs a second pass against itself with the prompt:
|
|
87
|
+
|
|
88
|
+
> Assume your first pass missed at least 3 P1 bugs. Where are they?
|
|
89
|
+
|
|
90
|
+
The audit must either produce new Shape A findings citing new file:line references not present in the first pass, or cite explicit Shape B adversarial-probe entries for each category it re-examined. An adversarial pass that returns "nothing new, confident first pass was complete" is REJECTED — produce evidence or findings, not confidence.
|
|
91
|
+
|
|
92
|
+
## Haiku secondary auditor
|
|
93
|
+
|
|
94
|
+
For single-subagent skills (`/qbug`, `/findbugs`) the LEAD spawns two `Agent()` calls in one message:
|
|
95
|
+
|
|
96
|
+
- **Primary** — `subagent_type=clean-coder`, `model=sonnet` (for qbug cycle) or `subagent_type=code-quality-agent`, `model=sonnet` (for findbugs clean-room).
|
|
97
|
+
- **Secondary (Haiku)** — `subagent_type=code-quality-agent`, `model=haiku`, same self-contained clean-room prompt shape used by `/findbugs`.
|
|
98
|
+
|
|
99
|
+
Both audit the same diff. The secondary returns findings to the LEAD only — never posted to the PR.
|
|
100
|
+
|
|
101
|
+
Merge rules:
|
|
102
|
+
|
|
103
|
+
- **De-dup key**: `(file, line, category)`.
|
|
104
|
+
- **Severity conflict**: max wins (P0 > P1 > P2).
|
|
105
|
+
- **Unique-to-Haiku findings**: added to the primary set with Haiku's severity and source annotation.
|
|
106
|
+
- **Unique-to-primary findings**: kept as-is.
|
|
107
|
+
- **Zero Haiku findings**: primary set trusted; proceed.
|
|
108
|
+
- **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<N>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
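The merge rules above can be sketched as a small function. This is an illustrative sketch, not the package's implementation: the function name `merge_findings` and the `"source"` key used for the Haiku annotation are assumptions, since the contract names the annotation but not its field.

```python
from typing import Any

# Lower rank means higher severity, so P0 outranks P1, which outranks P2.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}

Finding = dict[str, Any]


def merge_findings(primary: list[Finding], haiku: list[Finding]) -> list[Finding]:
    """Merge two finding lists on the (file, line, category) de-dup key."""
    merged: dict[tuple[str, int, str], Finding] = {}
    for finding in primary:
        key = (finding["file"], finding["line"], finding["category"])
        merged[key] = dict(finding)  # unique-to-primary findings are kept as-is
    for finding in haiku:
        key = (finding["file"], finding["line"], finding["category"])
        existing = merged.get(key)
        if existing is None:
            # Unique-to-Haiku: keep Haiku's severity and annotate the source.
            merged[key] = {**finding, "source": "haiku"}
        elif SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[existing["severity"]]:
            # Severity conflict: the more severe value wins.
            existing["severity"] = finding["severity"]
    return list(merged.values())
```

Passing an empty `haiku` list returns a copy of the primary set, which matches the zero-findings and parse-error rules.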
|
|
109
|
+
|
|
110
|
+
For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via the three variant teammates.
|
|
111
|
+
|
|
112
|
+
## Post-fix self-audit
|
|
113
|
+
|
|
114
|
+
Audit-and-fix skills (`/qbug`, `/bugteam`) MUST re-audit modified files between `py_compile` and `git add`. This catches fix-induced regressions in the same loop that introduced them rather than on loop N+1.
|
|
115
|
+
|
|
116
|
+
Sequence:
|
|
117
|
+
|
|
118
|
+
1. Capture pre-fix file contents for every file this FIX will touch.
|
|
119
|
+
2. Apply edits.
|
|
120
|
+
3. Run `py_compile` (or language-equivalent) on each modified file.
|
|
121
|
+
4. Compute `fix_diff` against pre-fix contents for the modified set.
|
|
122
|
+
5. Run `bugteam_code_rules_gate.py` with explicit paths for every modified file.
|
|
123
|
+
6. Spawn a scoped audit of `fix_diff` with full A–J rigor, Shape A/B contract, adversarial pass, AND Haiku secondary in parallel (paranoid mode on post-fix).
|
|
124
|
+
7. Any new findings become same-loop fix-targets. Internal iteration count increments by one.
|
|
125
|
+
8. After 3 internal iterations with fresh findings each time, exit `stuck: post-fix audit not converging`.
|
|
126
|
+
9. Only when `gate_findings` is empty AND `post_fix_findings` is empty: `git add`, commit, push.
|
|
127
|
+
|
|
128
|
+
`converged` exit condition: `primary_audit_clean AND post_fix_audit_clean` for the committing loop.
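The control flow of the sequence above can be sketched as a bounded loop. This is a simplified sketch under stated assumptions: `run_post_fix_loop`, its callable parameters, and the `"ready_to_commit"` label are hypothetical, and the callables stand in for the edit, gate, and audit steps that the skill actually performs.

```python
from typing import Callable

MAX_INTERNAL_ITERATIONS = 3  # the contract's cap on same-loop fix attempts

Findings = list[dict]


def run_post_fix_loop(
    fix_targets: Findings,
    apply_fix: Callable[[Findings], None],
    run_gate: Callable[[], Findings],
    run_post_fix_audit: Callable[[], Findings],
) -> str:
    """Return "ready_to_commit" or the contract's stuck message."""
    targets = list(fix_targets)
    for _ in range(MAX_INTERNAL_ITERATIONS):
        apply_fix(targets)                        # steps 1-4: snapshot, edit, compile, diff
        gate_findings = run_gate()                # step 5: code-rules gate
        post_fix_findings = run_post_fix_audit()  # step 6: scoped audit of the fix diff
        if not gate_findings and not post_fix_findings:
            return "ready_to_commit"              # step 9: safe to add, commit, push
        # Step 7: new findings become fix targets within the same loop.
        targets = [*gate_findings, *post_fix_findings]
    return "stuck: post-fix audit not converging"  # step 8
```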
|
|
129
|
+
|
|
130
|
+
## Persistence
|
|
131
|
+
|
|
132
|
+
Every audit loop writes two JSON files under the skill's scoped temp directory (resolved via `tempfile.gettempdir()`):
|
|
133
|
+
|
|
134
|
+
### `loop-<N>-audit.json`
|
|
135
|
+
|
|
136
|
+
```json
|
|
137
|
+
{
|
|
138
|
+
"findings": [],
|
|
139
|
+
"proof_of_absence": [],
|
|
140
|
+
"source": "primary | haiku | adversarial | merged"
|
|
141
|
+
}
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
### `loop-<N>-diagnostics.json`
|
|
145
|
+
|
|
146
|
+
```json
|
|
147
|
+
{
|
|
148
|
+
"loop": 1,
|
|
149
|
+
"gate_findings": [],
|
|
150
|
+
"primary_findings": [],
|
|
151
|
+
"adversarial_findings": [],
|
|
152
|
+
"haiku_findings": [],
|
|
153
|
+
"post_fix_findings": [],
|
|
154
|
+
"merged": [],
|
|
155
|
+
"deduped": []
|
|
156
|
+
}
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
All eight keys MUST be present. Missing keys break convergence debugging.
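A consumer can verify the eight-key requirement before relying on a diagnostics file. The sketch below is one possible check, assuming the file content is already read into a string; `missing_diagnostics_keys` is a hypothetical helper, not part of the package.

```python
import json

# The eight keys from the loop-<N>-diagnostics.json template above.
REQUIRED_DIAGNOSTICS_KEYS = (
    "loop",
    "gate_findings",
    "primary_findings",
    "adversarial_findings",
    "haiku_findings",
    "post_fix_findings",
    "merged",
    "deduped",
)


def missing_diagnostics_keys(raw_json: str) -> list[str]:
    """Return the required keys absent from a diagnostics payload."""
    payload = json.loads(raw_json)
    if not isinstance(payload, dict):
        raise ValueError("diagnostics payload must be a JSON object")
    return [key for key in REQUIRED_DIAGNOSTICS_KEYS if key not in payload]
```

An empty list means the payload carries every required key; the values themselves are not validated here.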
|
|
@@ -35,8 +35,8 @@ This session is the **team lead**. Create the team with `TeamCreate` using the e
|
|
|
35
35
|
- **Per-team temp directory (resolved once, reused everywhere):** After `team_name` is captured, resolve a portable absolute path with a Claude-side lookup using Python’s `tempfile.gettempdir()`, which honors `TMPDIR`, `TEMP`, and `TMP` in the platform-correct order and falls back to `C:\Users\<user>\AppData\Local\Temp` on Windows or `/tmp` on Unix: `Path(tempfile.gettempdir()) / team_name` (requires `import tempfile`). The `team_name` value already carries the `bugteam-` prefix; keep it as-is here. Let `tempfile.gettempdir()` perform the lookup; use its result directly. Capture the resolved absolute path as `<team_temp_dir>` and pass that literal path to every shell command that follows. Claude performs all temp-root resolution so every shell (bash, cmd.exe, PowerShell) receives the same literal absolute value.
|
|
36
36
|
|
|
37
37
|
- **Roles defined up front (spawned per loop, not at team creation):**
|
|
38
|
-
- `bugfind` — teammate role `code-quality-agent`, model
|
|
39
|
-
- `bugfix` — teammate role `clean-coder`, model
|
|
38
|
+
- `bugfind` — teammate role `code-quality-agent`, model opus (Opus 4.7 at default xhigh effort)
|
|
39
|
+
- `bugfix` — teammate role `clean-coder`, model opus (Opus 4.7 at default xhigh effort)
|
|
40
40
|
|
|
41
41
|
- **Display mode:** inherit the user’s default (`teammateMode` in `~/.claude.json`); do not override.
|
|
42
42
|
|
|
@@ -7,6 +7,69 @@ import sys
|
|
|
7
7
|
from pathlib import Path
|
|
8
8
|
|
|
9
9
|
|
|
10
|
+
def verify_git_hooks_path(repository_root: Path | None = None) -> int:
    """Check that core.hooksPath resolves to the claude-dev-env git-hooks directory.

    When *repository_root* is provided, queries the effective config for that
    repository (``git -C <root> config --get``), which detects repo-level
    overrides such as Husky or lefthook. Falls back to the current working
    directory's effective config when *repository_root* is None.

    Returns zero when the configured path ends with the expected hooks suffix.
    Returns non-zero and prints a correction message when unset or pointing elsewhere.
    """
    expected_hooks_path_suffix = "hooks/git-hooks"
    enforcement_absent_message = (
        "Git-side CODE_RULES enforcement is not active on this host.\n"
        "Run: npx claude-dev-env .\n"
        "Or set core.hooksPath at any scope, e.g.:\n"
        " git config --global core.hooksPath ~/.claude/hooks/git-hooks"
    )
    git_command: list[str] = ["git"]
    if repository_root is not None:
        git_command.extend(["-C", str(repository_root)])
    git_command.extend(["config", "--get", "core.hooksPath"])
    try:
        query_result = subprocess.run(
            git_command,
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
            check=False,
        )
    except FileNotFoundError:
        print(
            "bugteam_preflight: git is not installed or not available on PATH.\n"
            f"{enforcement_absent_message}",
            file=sys.stderr,
        )
        return 1
    except OSError as os_error:
        print(
            f"bugteam_preflight: failed to run git: {os_error}\n"
            f"{enforcement_absent_message}",
            file=sys.stderr,
        )
        return 1
    if query_result.returncode != 0:
        print(
            f"bugteam_preflight: {enforcement_absent_message}",
            file=sys.stderr,
        )
        return 1
    configured_path = query_result.stdout.strip().replace("\\", "/").rstrip("/")
    if not configured_path.endswith(expected_hooks_path_suffix):
        print(
            f"bugteam_preflight: core.hooksPath is '{configured_path}' — "
            f"expected path ending in '{expected_hooks_path_suffix}'.\n"
            f"{enforcement_absent_message}",
            file=sys.stderr,
        )
        return 1
    return 0
|
|
71
|
+
|
|
72
|
+
|
|
10
73
|
def find_repository_root(start: Path) -> Path:
|
|
11
74
|
resolved = start.resolve()
|
|
12
75
|
candidates = [resolved, *resolved.parents]
|
|
@@ -109,6 +172,9 @@ def main(argv: list[str] | None = None) -> int:
|
|
|
109
172
|
if arguments.repo_root is not None
|
|
110
173
|
else find_repository_root(start)
|
|
111
174
|
)
|
|
175
|
+
hooks_path_exit_code = verify_git_hooks_path(repository_root)
|
|
176
|
+
if hooks_path_exit_code != 0:
|
|
177
|
+
return hooks_path_exit_code
|
|
112
178
|
if not arguments.no_pytest and has_pytest_configuration(repository_root):
|
|
113
179
|
if not has_discoverable_tests(repository_root):
|
|
114
180
|
print(
|
|
@@ -0,0 +1,189 @@
|
|
|
1
|
+
"""Tests for bugteam_preflight git hooks path verification.

Covers:
- core.hooksPath unset: exits non-zero with correction message
- core.hooksPath pointing to the correct claude hooks dir: exits zero
- core.hooksPath pointing elsewhere (husky override): exits non-zero
- core.hooksPath with trailing slash: must still pass after normalization
"""

from __future__ import annotations

import importlib.util
import subprocess
from pathlib import Path
from types import ModuleType
from unittest.mock import MagicMock, patch

import pytest


def _load_preflight_module() -> ModuleType:
    module_path = Path(__file__).parent / "bugteam_preflight.py"
    spec = importlib.util.spec_from_file_location("bugteam_preflight", module_path)
    assert spec is not None
    assert spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


bugteam_preflight = _load_preflight_module()


def _make_completed_process(
    stdout: str, returncode: int
) -> subprocess.CompletedProcess:
    process = MagicMock(spec=subprocess.CompletedProcess)
    process.stdout = stdout
    process.returncode = returncode
    return process


def test_should_exit_nonzero_when_core_hooks_path_unset(capsys: pytest.CaptureFixture[str]) -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process("", returncode=1)
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code != 0
    captured = capsys.readouterr()
    assert "core.hooksPath" in captured.err
    assert "npx claude-dev-env" in captured.err or "git config" in captured.err


def test_should_exit_zero_when_core_hooks_path_points_to_claude_hooks(tmp_path: Path) -> None:
    claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
    claude_hooks_path.mkdir(parents=True)
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            str(claude_hooks_path) + "\n", returncode=0
        )
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code == 0


def test_should_exit_nonzero_when_core_hooks_path_points_elsewhere(capsys: pytest.CaptureFixture[str]) -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            "/some/other/path/.husky\n", returncode=0
        )
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code != 0
    captured = capsys.readouterr()
    assert "core.hooksPath" in captured.err


def test_should_include_correction_commands_in_error_message(capsys: pytest.CaptureFixture[str]) -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process("", returncode=1)
        bugteam_preflight.verify_git_hooks_path()
    captured = capsys.readouterr()
    assert (
        "npx claude-dev-env" in captured.err
        or "git config --global core.hooksPath" in captured.err
    )


def test_main_should_exit_nonzero_when_hooks_path_unset() -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process("", returncode=1)
        exit_code = bugteam_preflight.main(["--no-pytest"])
    assert exit_code != 0


def test_main_should_continue_when_hooks_path_valid(tmp_path: Path) -> None:
    claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
    claude_hooks_path.mkdir(parents=True)
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            str(claude_hooks_path) + "\n", returncode=0
        )
        exit_code = bugteam_preflight.main(["--no-pytest"])
    assert exit_code == 0


def test_should_accept_hooks_path_with_trailing_slash() -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            "/home/user/.claude/hooks/git-hooks/\n", returncode=0
        )
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code == 0, (
        "hooksPath with trailing slash must pass verification after normalization"
    )


def test_should_exit_zero_when_hooks_path_set_at_repo_scope(tmp_path: Path) -> None:
    claude_hooks_path = tmp_path / ".claude" / "hooks" / "git-hooks"
    claude_hooks_path.mkdir(parents=True)
    repo_root = tmp_path / "my-repo"
    repo_root.mkdir()
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            str(claude_hooks_path) + "\n", returncode=0
        )
        exit_code = bugteam_preflight.verify_git_hooks_path(repo_root)
    assert exit_code == 0, (
        "verify_git_hooks_path must accept a valid path returned by effective "
        "config query (not restricted to --global scope)"
    )
    called_command = mock_run.call_args[0][0]
    assert "--global" not in called_command, (
        "verify_git_hooks_path must query effective config, not --global only"
    )
    assert "-C" in called_command, (
        "verify_git_hooks_path must use git -C <repo_root> for repo-effective config"
    )
    dash_c_index = called_command.index("-C")
    assert called_command[dash_c_index + 1] == str(repo_root), (
        "git -C must receive the resolved repository root path"
    )


def test_should_accept_hooks_path_with_backslash_and_trailing_slash() -> None:
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = _make_completed_process(
            "C:\\Users\\user\\.claude\\hooks\\git-hooks\\\n", returncode=0
        )
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code == 0, (
        "Windows hooksPath with trailing backslash must pass after normalization"
    )


def test_should_exit_nonzero_when_git_executable_not_found(
    capsys: pytest.CaptureFixture[str],
) -> None:
    """Preflight must not crash with a traceback when git is missing from PATH."""
    with patch("subprocess.run", side_effect=FileNotFoundError()):
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code != 0, (
        "FileNotFoundError from subprocess.run must produce a non-zero exit, "
        "not a propagated traceback"
    )
    captured = capsys.readouterr()
    assert "git" in captured.err.lower(), (
        "Error message must mention git so the user knows what is missing"
    )
    assert (
        "npx claude-dev-env" in captured.err
        or "git config --global core.hooksPath" in captured.err
    ), "Error message must include the enforcement-absent remediation hints"


def test_should_exit_nonzero_when_subprocess_run_raises_os_error(
    capsys: pytest.CaptureFixture[str],
) -> None:
    """Preflight must surface a clean error for other OS-level git launch failures."""
    with patch("subprocess.run", side_effect=OSError("permission denied")):
        exit_code = bugteam_preflight.verify_git_hooks_path()
    assert exit_code != 0, (
        "OSError from subprocess.run must produce a non-zero exit, "
        "not a propagated traceback"
    )
    captured = capsys.readouterr()
    assert "bugteam_preflight" in captured.err, (
        "Error message must be prefixed with the preflight tool name for context"
    )
    assert "permission denied" in captured.err, (
        "Error message must include the underlying OSError detail for diagnosis"
    )
|
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: copilot-review
|
|
3
|
+
description: >-
|
|
4
|
+
Spawns a background subagent that babysits the GitHub Copilot reviewer on the
|
|
5
|
+
current PR. The subagent self-paces at ~5 minutes per tick, fetches the
|
|
6
|
+
latest copilot-pull-request-reviewer[bot] review, fixes unaddressed inline
|
|
7
|
+
findings against current HEAD (new commit, push, inline replies), and
|
|
8
|
+
re-requests review via the documented requested_reviewers API. The subagent
|
|
9
|
+
terminates on convergence (clean review against HEAD) and reports back.
|
|
10
|
+
Triggers: '/copilot-review', 'watch copilot', 'babysit copilot review',
|
|
11
|
+
'loop copilot reviews', 're-request copilot', 'keep re-requesting copilot'.
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Copilot Review
|
|
15
|
+
|
|
16
|
+
Delegates Copilot babysitting to a background subagent so the main session stays free. The subagent loops internally and closes itself on convergence.
|
|
17
|
+
|
|
18
|
+
## When this skill applies
|
|
19
|
+
|
|
20
|
+
The user is on a PR branch, wants Copilot (the GitHub Copilot reviewer bot) to keep re-reviewing after each push, and wants findings auto-addressed between ticks — but does not want the main conversation consumed by polling.
|
|
21
|
+
|
|
22
|
+
## The Process
|
|
23
|
+
|
|
24
|
+
### Step 1: Gather PR context
|
|
25
|
+
|
|
26
|
+
From the current repo:
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
gh pr view --json number,url,headRefOid,baseRefName,headRefName,isDraft
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
Capture `number`, `headRefOid`, owner/repo (from `url`), and branch name. Pass these to the subagent so it does not rediscover them.
|
|
33
|
+
|
|
34
|
+
### Step 2: Spawn the background subagent
|
|
35
|
+
|
|
36
|
+
Invoke the `Agent` tool with:
|
|
37
|
+
|
|
38
|
+
- `subagent_type: "general-purpose"`
|
|
39
|
+
- `run_in_background: true`
|
|
40
|
+
- `description: "Copilot review loop for PR #<N>"`
|
|
41
|
+
- `prompt`: the full instructions in **Step 3 (Subagent prompt template)**, with placeholders filled in from Step 1.
|
|
42
|
+
|
|
43
|
+
Record the returned agent ID. Report to the user in one or two lines:
|
|
44
|
+
|
|
45
|
+
- The subagent is running in the background.
|
|
46
|
+
- It self-terminates on convergence.
|
|
47
|
+
- To stop it early, the user says "stop the copilot loop" and you call `TaskStop <agent_id>`.
|
|
48
|
+
- The main session stays free; completion arrives as a notification.
|
|
49
|
+
|
|
50
|
+
Let the subagent own the cadence. The skill's job in the main session ends once the subagent is spawned and reported.
|
|
51
|
+
|
|
52
|
+
### Step 3: Subagent prompt template
|
|
53
|
+
|
|
54
|
+
Pass this verbatim to the subagent (substituting the bracketed values):
|
|
55
|
+
|
|
56
|
+
> You are babysitting the GitHub Copilot reviewer on PR **#[NUMBER]** at **[OWNER]/[REPO]** (branch `[BRANCH]`, current HEAD `[HEAD_SHA]`). Your job: keep the loop running until Copilot returns a clean review against the current HEAD, then stop.
|
|
57
|
+
>
|
|
58
|
+
> **Per-tick work** (do this now, then on each wakeup):
|
|
59
|
+
>
|
|
60
|
+
> 1. Resolve current HEAD: `gh api repos/[OWNER]/[REPO]/pulls/[NUMBER] --jq '.head.sha'`.
|
|
61
|
+
> 2. Fetch latest Copilot review:
|
|
62
|
+
> ```bash
|
|
63
|
+
> gh api repos/[OWNER]/[REPO]/pulls/[NUMBER]/reviews \
|
|
64
|
+
> --jq '[.[] | select(.user.login=="copilot-pull-request-reviewer[bot]")] | sort_by(.submitted_at) | last'
|
|
65
|
+
> ```
|
|
66
|
+
> Capture `commit_id`, `state`, `submitted_at`, `id`.
|
|
67
|
+
> 3. Decide the branch:
|
|
68
|
+
> - **No review exists:** re-request (step 4), schedule next wakeup, return.
|
|
69
|
+
> - **Latest review's `commit_id` != current HEAD:** re-request (step 4), schedule next wakeup, return.
|
|
70
|
+
> - **Latest review's `commit_id` == current HEAD with unresolved inline findings:** TDD-fix them, push, reply inline on each thread, re-request (step 4), schedule next wakeup, return.
|
|
71
|
+
> - **Latest review's `commit_id` == current HEAD and clean:** report convergence to the parent with a one-sentence summary and terminate. The loop is done; skip the ScheduleWakeup call.
|
|
72
|
+
> 4. Re-request Copilot. The reviewer ID **must** be `copilot-pull-request-reviewer[bot]`, including the `[bot]` suffix. Empirically verified: `Copilot`, `copilot`, and `github-copilot` all return `requested_reviewers: []` with no error, so the request silently no-ops.
|
|
73
|
+
> ```bash
|
|
74
|
+
> gh api -X POST repos/[OWNER]/[REPO]/pulls/[NUMBER]/requested_reviewers \
|
|
75
|
+
> -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
76
|
+
> ```
|
|
77
|
+
> 5. Schedule the next wakeup with `ScheduleWakeup`:
|
|
78
|
+
> - `delaySeconds: 300`
|
|
79
|
+
> - `reason`: one short sentence on what you are waiting for.
|
|
80
|
+
> - `prompt`: the literal sentinel `<<autonomous-loop-dynamic>>` so the next firing re-enters these instructions.
|
|
81
|
+
>
|
|
82
|
+
> **Fix protocol** (step 3, third branch):
|
|
83
|
+
>
|
|
84
|
+
> - Read each referenced file:line.
|
|
85
|
+
> - Write a failing test first when the finding has behavior to test. For pure doc or comment nits that have no behavior, go straight to the fix.
|
|
86
|
+
> - Implement the fix.
|
|
87
|
+
> - Stage the fix and create one new commit on the existing branch: `git add <files> && git commit -m "fix(review): ..."`.
|
|
88
|
+
> - Push the new commit: `git push origin [BRANCH]`.
|
|
89
|
+
> - Reply inline on each comment thread with `gh api -X POST repos/[OWNER]/[REPO]/pulls/[NUMBER]/comments` using `in_reply_to` set to the comment id, referencing the new commit SHA.
|
|
90
|
+
>
|
|
91
|
+
> When a pre-push, pre-commit, or other hook rejects the change, solve it. Read the hook's error message, diagnose the root cause in the code or test, and fix that. Then rerun the commit or push. Hooks exist to catch real problems; treat each rejection as new evidence to act on.
|
|
92
|
+
>
|
|
93
|
+
> **Stop conditions:**
|
|
94
|
+
>
|
|
95
|
+
> - Convergence (clean review against HEAD): report one-sentence summary to parent and terminate.
|
|
96
|
+
> - Blocker you have exhausted fix attempts on (API auth failure persists, CI regression whose root cause falls outside this PR, a hook you have investigated and cannot resolve in one commit): report the specific blocker and its diagnosis to the parent, then terminate without scheduling another wakeup.
|
|
97
|
+
> - Parent sends `TaskStop`: terminate immediately.
|
|
98
|
+
>
|
|
99
|
+
> **Safety cap:** after 20 ticks without convergence, stop and report. That many rounds means something structural is wrong with the loop.
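
The four branches of step 3 in the prompt above reduce to a small decision function. The following is a minimal sketch, assuming the latest review has been parsed into a dictionary carrying the `commit_id` field and that the count of unresolved inline findings is already known. `TickAction` and `decide_tick_action` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Any, Optional


class TickAction(Enum):
    REREQUEST = "rerequest"                    # no review yet, or review is stale
    FIX_THEN_REREQUEST = "fix_then_rerequest"  # review on HEAD has open findings
    CONVERGED = "converged"                    # clean review on HEAD: stop the loop


def decide_tick_action(
    latest_review: Optional[dict[str, Any]],
    head_sha: str,
    unresolved_inline_findings: int,
) -> TickAction:
    """Map the state observed on one tick to the branch the prompt describes."""
    if latest_review is None:
        return TickAction.REREQUEST
    if latest_review["commit_id"] != head_sha:
        return TickAction.REREQUEST
    if unresolved_inline_findings > 0:
        return TickAction.FIX_THEN_REREQUEST
    return TickAction.CONVERGED
```

Only `CONVERGED` ends the loop; both other actions are followed by another scheduled wakeup.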
|
|
100
|
+
|
|
101
|
+
### Step 4: Report back to the user
|
|
102
|
+
|
|
103
|
+
After spawning, tell the user in one or two lines: the subagent ID, the PR URL, and that it will notify on convergence or blocker. Nothing else.
|
|
104
|
+
|
|
105
|
+
## Stopping the subagent
|
|
106
|
+
|
|
107
|
+
- Convergence → subagent stops itself.
|
|
108
|
+
- Blocker → subagent reports and stops.
|
|
109
|
+
- User says stop → `TaskStop <agent_id>`.
|
|
110
|
+
- User asks what loops are running → `TaskList`.
|
|
111
|
+
|
|
112
|
+
## Ground rules (for the subagent)
|
|
113
|
+
|
|
114
|
+
- **Append commits.** Each tick adds one new commit on the existing branch with `git commit` and `git push origin [BRANCH]`.
|
|
115
|
+
- **Honor pre-push and pre-commit hooks.** When a hook rejects the change, read its output, fix the underlying issue (the failing test, the missing constant, the broken import), and retry. Solve, do not punt.
|
|
116
|
+
- **Respect the PR's current state.** Whatever draft-vs-ready state the PR has when the loop starts is the state the subagent preserves. The user decides when to flip it.
|
|
117
|
+
- **One fix commit per tick.** Batch all of the current tick's findings into a single commit; the next tick handles the next review round.
|
|
118
|
+
- **Use `copilot-pull-request-reviewer[bot]` with the `[bot]` suffix for the reviewer ID.** That exact spelling is load-bearing — it is the only form the API accepts.
|
|
119
|
+
|
|
120
|
+
## Examples
|
|
121
|
+
|
|
122
|
+
<example>
|
|
123
|
+
User: `/copilot-review`
|
|
124
|
+
Claude: [reads PR context, spawns background subagent with the Step 3 template, reports "subagent X watching PR #123; will notify on convergence"]
|
|
125
|
+
</example>
|
|
126
|
+
|
|
127
|
+
<example>
|
|
128
|
+
User: "babysit copilot on this PR until it's clean"
|
|
129
|
+
Claude: [same as above]
|
|
130
|
+
</example>
|
|
131
|
+
|
|
132
|
+
<example>
|
|
133
|
+
Subagent tick fires, latest Copilot review is against an older commit.
|
|
134
|
+
Subagent: [re-requests review, schedules next wakeup, returns]
|
|
135
|
+
</example>
|
|
136
|
+
|
|
137
|
+
<example>
|
|
138
|
+
Subagent tick fires, Copilot has 2 unaddressed inline findings on HEAD.
|
|
139
|
+
Subagent: [TDD-fixes both, one commit, pushes, replies inline on both threads, re-requests review, schedules next wakeup]
|
|
140
|
+
</example>
|
|
141
|
+
|
|
142
|
+
<example>
|
|
143
|
+
Subagent tick fires, latest review is clean against HEAD.
|
|
144
|
+
Subagent: [reports convergence to parent, terminates — no further wakeups]
|
|
145
|
+
</example>
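
The per-tick behavior described by the stop conditions, the safety cap, and the examples above can be pictured as a small decision function. This is a minimal sketch, not the skill's implementation: `TickState`, `Action`, and `next_action` are hypothetical names, and treating the 20th tick itself as the cutoff is an assumption about how "after 20 ticks" is counted. Only the 20-tick cap and the three review outcomes come from the text; the blocker and `TaskStop` paths are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MAX_TICKS = 20  # safety cap stated in the skill text


class Action(Enum):
    # Hypothetical action names; the skill describes these steps in prose only.
    REREQUEST_REVIEW = "re-request review, schedule next wakeup"
    FIX_AND_PUSH = "fix findings in one commit, push, reply inline, re-request review"
    REPORT_CONVERGENCE = "report convergence to parent, terminate"
    REPORT_CAP_REACHED = "report that the safety cap was hit, terminate"


@dataclass
class TickState:
    """Hypothetical snapshot of what the subagent knows at the start of a tick."""
    tick_number: int                   # 1-based count of ticks so far
    review_commit_sha: Optional[str]   # commit the latest review targeted, if any
    head_sha: str                      # current HEAD of the PR branch
    unaddressed_findings: int          # open inline findings on HEAD


def next_action(state: TickState) -> Action:
    review_is_current = state.review_commit_sha == state.head_sha
    # A clean review against HEAD ends the loop, whatever the tick count.
    if review_is_current and state.unaddressed_findings == 0:
        return Action.REPORT_CONVERGENCE
    # Assumption: reaching tick 20 without convergence triggers the cap.
    if state.tick_number >= MAX_TICKS:
        return Action.REPORT_CAP_REACHED
    # A review against an older commit (or no review yet) is stale.
    if not review_is_current:
        return Action.REREQUEST_REVIEW
    return Action.FIX_AND_PUSH
```

Checking convergence before the cap means a loop that turns clean on its last allowed tick still reports success rather than a structural failure.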
|
package/skills/findbugs/SKILL.md
CHANGED
|
@@ -47,12 +47,12 @@ The audit's authoritative scope is this single diff file. Do not inject extra fi
|
|
|
47
47
|
|
|
48
48
|
### Step 3: Spawn the code-quality-agent — clean room
|
|
49
49
|
|
|
50
|
-
Call the Agent tool
|
|
50
|
+
Call the Agent tool twice in a single message (primary + Haiku secondary per the audit contract's Haiku secondary section):
|
|
51
51
|
|
|
52
|
-
- `subagent_type: code-quality-agent`
|
|
53
|
-
- `model:
|
|
54
|
-
|
|
55
|
-
|
|
52
|
+
- Primary: `subagent_type: code-quality-agent`, `model: sonnet`, `description: "PR bug audit"`, `run_in_background: false`
|
|
53
|
+
- Secondary: `subagent_type: code-quality-agent`, `model: haiku`, `description: "PR bug audit (secondary)"`, `run_in_background: false`
|
|
54
|
+
|
|
55
|
+
After both return, merge per the contract's Haiku secondary section (de-dup key, max-wins severity, malformed-output fallback) before reporting to the user.
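
The merge step can be pictured with a short sketch. The contract file itself does not appear in this diff, so the details are assumptions: the de-dup key (`file`, `line`, `category`), the finding fields, the JSON-list output shape, and the choice to fall back to the other agent's list when one output is malformed are hypothetical stand-ins for whatever the contract defines. Only the three named ideas (de-dup key, max-wins severity, malformed-output fallback) come from the text.

```python
import json
from typing import Optional

# Lower rank means more severe; the P0/P1/P2 labels come from the skill text.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}
REQUIRED_FIELDS = {"file", "line", "category", "severity"}  # assumed field names


def dedup_key(finding: dict) -> tuple:
    # Hypothetical key; the real one is defined by the audit contract.
    return (str(finding["file"]), str(finding["line"]), str(finding["category"]))


def parse_findings(raw_output: str) -> Optional[list]:
    """Return a list of finding dicts, or None when the output is malformed."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, list):
        return None
    for item in parsed:
        if not isinstance(item, dict) or not REQUIRED_FIELDS <= item.keys():
            return None
        if item["severity"] not in SEVERITY_RANK:
            return None
    return parsed


def merge_findings(primary_raw: str, secondary_raw: str) -> list:
    # Assumed fallback: a malformed output contributes no findings.
    primary = parse_findings(primary_raw) or []
    secondary = parse_findings(secondary_raw) or []
    merged = {}
    for finding in primary + secondary:
        key = dedup_key(finding)
        kept = merged.get(key)
        # Max-wins: keep whichever duplicate carries the more severe label.
        if kept is None or SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[kept["severity"]]:
            merged[key] = finding
    return list(merged.values())
```

With this shape, a finding both agents report appears once, at the higher of the two severities, and a secondary run that returns non-JSON leaves the primary list unchanged.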
|
|
56
56
|
|
|
57
57
|
**The agent prompt must be self-contained and context-free.** Specifically:
|
|
58
58
|
|
|
@@ -98,25 +98,17 @@ The XML prompt skeleton:
|
|
|
98
98
|
</constraints>
|
|
99
99
|
|
|
100
100
|
<output_format>
|
|
101
|
-
|
|
102
|
-
P1 = regression or silent failure
|
|
103
|
-
P2 = dead code, minor smell
|
|
104
|
-
|
|
105
|
-
## Summary
|
|
106
|
-
Total: N (P0=N, P1=N, P2=N)
|
|
107
|
-
|
|
108
|
-
## Findings
|
|
109
|
-
### [P_] short title
|
|
110
|
-
File: file/path:line
|
|
111
|
-
Category: A-J
|
|
112
|
-
Issue: 2-3 sentence description with concrete trace
|
|
113
|
-
Evidence: code excerpt or grep result
|
|
101
|
+
Follow the shared audit contract at ../bugteam/reference/audit-contract.md:
|
|
114
102
|
|
|
115
|
-
|
|
116
|
-
|
|
103
|
+
- Severity: P0 = will not run / data corruption; P1 = regression or silent
|
|
104
|
+
failure; P2 = dead code, minor smell.
|
|
105
|
+
- Per category, produce either Shape A (structured finding) or Shape B
|
|
106
|
+
(proof-of-absence). Bare "verified clean" labels are REJECTED.
|
|
107
|
+
- Run the contract's adversarial second pass after the primary list.
|
|
117
108
|
|
|
118
|
-
|
|
119
|
-
|
|
109
|
+
Preamble: `Total: N (P0=N, P1=N, P2=N)`. Emit findings and proof-of-absence
|
|
110
|
+
entries in the JSON shapes defined by the contract. Include an "Open
|
|
111
|
+
questions" section for items the diff alone cannot resolve.
|
|
120
112
|
</output_format>
|
|
121
113
|
```
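
The preamble line named in the output format above, `Total: N (P0=N, P1=N, P2=N)`, can be produced from a list of severity labels. This is a sketch under assumptions: `format_preamble` is a hypothetical helper, and rejecting unknown labels is a choice made here so the total always equals the sum of the three counts.

```python
from collections import Counter

KNOWN_SEVERITIES = ("P0", "P1", "P2")  # labels from the skill text


def format_preamble(severities: list) -> str:
    """Build the 'Total: N (P0=N, P1=N, P2=N)' line from severity labels."""
    unknown = [label for label in severities if label not in KNOWN_SEVERITIES]
    if unknown:
        raise ValueError("unknown severity labels: {}".format(unknown))
    counts = Counter(severities)
    return "Total: {} (P0={}, P1={}, P2={})".format(
        len(severities), counts["P0"], counts["P1"], counts["P2"]
    )
```

For example, `format_preamble(["P1", "P2", "P1"])` yields `Total: 3 (P0=0, P1=2, P2=1)`.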
|
|
122
114
|
|