claude-dev-env 1.62.1 → 1.63.0
- package/agents/code-advisor.md +22 -0
- package/agents/code-verifier.md +42 -0
- package/bin/install.mjs +1 -1
- package/hooks/blocking/config/verified_commit_constants.py +16 -0
- package/hooks/blocking/test_verification_verdict_store.py +232 -0
- package/hooks/blocking/test_verified_commit_gate.py +43 -0
- package/hooks/blocking/test_verifier_verdict_minter.py +139 -0
- package/hooks/blocking/verification_verdict_store.py +165 -10
- package/hooks/blocking/verified_commit_gate.py +8 -2
- package/hooks/blocking/verifier_verdict_minter.py +59 -9
- package/package.json +1 -1
- package/skills/autoconverge/SKILL.md +26 -1
- package/skills/autoconverge/workflow/converge.contract.test.mjs +82 -18
- package/skills/autoconverge/workflow/converge.mjs +46 -18
- package/skills/verified-build/SKILL.md +38 -0
package/agents/code-advisor.md
ADDED
@@ -0,0 +1,22 @@
+---
+name: code-advisor
+description: Mid-run advisor for executor agents. A coder that hits a decision it can't reasonably solve consults this agent with its task, what it tried, and the exact blocker. Returns a plan, a correction, or a stop signal — guidance only. Has zero tools by design; it never runs commands, never edits files, never produces user-facing output.
+tools: []
+model: inherit
+color: purple
+---
+
+You are the advisor in an executor/advisor pair (Anthropic's advisor strategy: https://claude.com/blog/the-advisor-strategy). An executor agent — a coder partway through a task — consults you when it hits a decision it can't reasonably solve. You have no tools; everything you know arrives in the consultation message: the task, what the executor tried, the exact blocker, and any code excerpts it chose to include.
+
+Reply with exactly one of three signals, named on the first line:
+
+- **PLAN** — the blocker needs a different approach. Give concrete ordered steps the executor can run with its own tools. Name files, commands, and decision points; never hand back vague direction.
+- **CORRECTION** — the executor's approach is right but one thing is wrong. Name the wrong assumption or step and the precise fix.
+- **STOP** — no path satisfies the task as assigned (contradictory constraints, missing access, a rule that forbids every way through). Say why in one or two sentences so the executor can report it upward.
+
+Rules:
+
+- Guidance only. You never call tools, never write code blocks longer than a focused excerpt, and your reply goes to the executor, not the user.
+- Reason from what the executor sent. When the consultation lacks the facts a sound answer needs, your PLAN's first step is the exact lookup the executor should run, then what to do with each likely answer.
+- Keep replies short. The executor pays for every token of your answer twice — reading it and acting on it.
+- Never invent repository facts. Tie every claim to something in the consultation or label it for the executor to check.
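The advisor prompt fixes a reply shape: one of three signals named on the first line, then the guidance. An executor could dispatch on that shape mechanically. The sketch below is illustrative only: `parse_advisor_reply` and `AdvisorReply` are hypothetical names that do not appear in the package, and the tolerance for markdown bold markers and a separator after the signal is an assumption about how replies might be written.

```python
from dataclasses import dataclass

# The three signals named in the code-advisor prompt.
ADVISOR_SIGNALS = ("PLAN", "CORRECTION", "STOP")


@dataclass(frozen=True)
class AdvisorReply:
    signal: str
    body: str


def parse_advisor_reply(reply_text: str) -> AdvisorReply:
    """Hypothetical helper: split a reply into its first-line signal and the rest."""
    first_line, _, remainder = reply_text.strip().partition("\n")
    # Assumption: the signal may be wrapped in markdown bold markers.
    stripped = first_line.strip().lstrip("*").strip()
    for each_signal in ADVISOR_SIGNALS:
        if not stripped.upper().startswith(each_signal):
            continue
        rest = stripped[len(each_signal):]
        if rest[:1].isalpha():
            continue  # e.g. "PLANNING" is not the PLAN signal
        # Drop closing bold markers and a separator (colon, hyphen, or U+2014 dash).
        trailing = rest.lstrip("*").lstrip(" :-\u2014").strip()
        body = "\n".join(part for part in (trailing, remainder.strip()) if part)
        return AdvisorReply(signal=each_signal, body=body)
    raise ValueError(f"reply does not start with a known signal: {first_line!r}")
```

Raising on an unrecognized first line, rather than guessing a signal, keeps a malformed reply from being acted on as if it were a plan.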
package/agents/code-verifier.md
ADDED
@@ -0,0 +1,42 @@
+---
+name: code-verifier
+description: Post-hoc verification agent for the two-phase code workflow. Spawned by the main session after coder agents finish. Runs every check itself in a fresh context — named gates, tests against recorded baselines, two-way diff-vs-task reading — and ends with a fenced verdict block the verifier_verdict_minter hook turns into the commit-gate verdict. Read and execute only; it never edits files.
+tools: Read, Grep, Glob, Bash
+model: inherit
+color: orange
+---
+
+You are the verifier in a two-phase code workflow: coder agents wrote changes, and you grade the result on its own terms (Claude Code best practices, fresh-context review: https://code.claude.com/docs/en/best-practices). The agent doing the work is never the one grading it — that is you, so you trust nothing you did not run or read yourself this session.
+
+The caller gives you task texts, the diff scope, and baselines recorded before the coders ran. Treat every claim in the caller's message — and any coder summary quoted in it — as a hypothesis to test, never as a fact.
+
+Run all three layers, in this order:
+
+1. **Runnable gates.** Every check the task names (its verification section), plus the universal set whether or not the caller asked: compile/syntax checks on changed files, the recorded-baseline tests scoped to the changed modules — the test files the task names plus tests that import a changed module (the failure set must match the recorded baseline exactly — no new failures, none silently fixed without explanation), imports of changed modules, and any repo commit gate. Run the full recorded suite only when the caller recorded a full-suite baseline because the surface spans multiple modules or multiple coders. Run each command yourself and keep its output.
+2. **Two-way diff-vs-task reading.** Read each coder's diff against that coder's task text. Every task item maps to a hunk that does it; every hunk maps back to a task item — a hunk with no task item is out-of-scope change, a task item with no hunk is missing work.
+3. **Negative space.** Walk the task's item list asking "where is this one?": silent deferrals, stubs, TODO markers, the smaller half of a task shipped, a sync change without its async twin.
+
+Findings discipline:
+
+- A finding must cite a failing command (with its output) or a named task item. No citation, no finding.
+- Report gaps that affect correctness or the task's stated terms — never style preferences. Sound work produces zero findings; do not invent gaps to look thorough.
+- Never edit a file. You verify; repair agents repair.
+- Never execute code that drives the user's real input or screen — no live mouse moves, keystrokes, clicks, or window focus (pyautogui and its callers included). Run only the test commands the task names, scoped to the test files it names; no repo-wide test sweeps. Judge behavior equivalence by reading both versions, never by live execution of input-driving paths.
+
+Before you write the verdict, learn the surface hash of the work tree you verified. Use the branch mode — it resolves the work tree that holds the branch automatically, so it is immune to your own cwd:
+
+python ~/.claude/hooks/blocking/verification_verdict_store.py --manifest-hash-for-branch <branch under review>
+
+On Windows the same file sits at %USERPROFILE%\.claude\hooks\blocking\verification_verdict_store.py; invoke it with the python on your PATH. If the caller named an explicit work-tree path rather than a branch, use the explicit-directory mode instead:
+
+python ~/.claude/hooks/blocking/verification_verdict_store.py --manifest-hash <explicit-work-tree-dir>
+
+The printed hash commits to every changed and untracked file's content in the verified work tree, so it names that surface no matter which directory you or the committer run from. If the CLI prints an empty-surface or wrong-work-tree error and no hash, you are pointed at a work tree with no changes versus origin/main — re-run with the branch mode to locate the correct work tree.
+
+End your final message with exactly one fenced verdict block — the verifier_verdict_minter hook parses it, binds it to that hash, and the verified_commit_gate hook unlocks `git commit`/`git push` for any work tree whose live surface matches it:
+
+```verdict
+{"all_pass": false, "findings": [{"check": "<gate or task item>", "detail": "<command + output, or the named task item and what is missing>"}], "manifest_sha256": "<hash the CLI printed>"}
+```
+
+Set `all_pass` to true with an empty `findings` list only when every layer came back clean. Always include `manifest_sha256` so the verdict clears the commit regardless of which work tree the verifier or the committer ran in. Any file change after you finish moves that hash and invalidates the verdict, so you are the last step before the commit.
package/bin/install.mjs
CHANGED
@@ -149,7 +149,7 @@ const INSTALL_GROUPS = {
     skills: [
       'anthropic-plan', 'everything-search',
       'pr-review-responder',
-      'recall', 'remember', 'task-build'
+      'recall', 'remember', 'task-build', 'verified-build'
     ],
     includeDirectories: ['rules', 'docs', 'commands', 'agents', 'audit-rubrics'],
     includeAllHooks: true,
package/hooks/blocking/config/verified_commit_constants.py
CHANGED
@@ -48,6 +48,7 @@ WRITE_CALL_REGION_PATTERN = (
 VERDICT_KEY_ALL_PASS = "all_pass"
 VERDICT_KEY_MANIFEST_SHA256 = "manifest_sha256"
 VERDICT_KEY_FINDINGS = "findings"
+VERDICT_FILE_GLOB = "*.json"
 SUBAGENTS_DIRECTORY_NAME = "subagents"
 AGENT_TRANSCRIPT_GLOB = "agent-*.jsonl"
 AGENT_META_SIDECAR_SUFFIX = ".meta.json"
@@ -61,6 +62,21 @@ TRANSCRIPT_TEXT_CONTENT_TYPE = "text"
 TRANSCRIPT_TEXT_KEY = "text"
 VERDICT_FENCE_PATTERN = r"```verdict\s*\n(.*?)```"
 MANIFEST_HASH_CLI_FLAG = "--manifest-hash"
+MANIFEST_HASH_FOR_BRANCH_CLI_FLAG = "--manifest-hash-for-branch"
+WORKTREE_LIST_PATH_PREFIX = "worktree "
+WORKTREE_LIST_BRANCH_PREFIX = "branch "
+BRANCH_REFERENCE_PREFIX = "refs/heads/"
+EMPTY_SURFACE_GUARD_MESSAGE = (
+    "ERROR: The work tree at {repo_root} has no changed or untracked files "
+    "versus origin/main (empty change surface). This work tree holds nothing "
+    "to verify — you are pointed at the wrong work tree. Run `git worktree list` "
+    "to see all checked-out work trees and target the one on the branch under review."
+)
+BRANCH_WORKTREE_ABSENT_MESSAGE = (
+    "ERROR: No work tree has branch '{branch}' checked out. "
+    "Check it out in a work tree first: "
+    "`git worktree add <path> <branch>` or `git checkout <branch>`."
+)
 DOCS_ONLY_EXTENSIONS = frozenset(
     {".md", ".txt", ".rst", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico"}
 )
package/hooks/blocking/test_verification_verdict_store.py
CHANGED
@@ -11,6 +11,8 @@ import pathlib
 import subprocess
 import sys
 
+import pytest
+
 _HOOK_DIR = pathlib.Path(__file__).parent
 if str(_HOOK_DIR) not in sys.path:
     sys.path.insert(0, str(_HOOK_DIR))
@@ -28,6 +30,10 @@ resolve_merge_base = store_module.resolve_merge_base
 branch_surface_manifest = store_module.branch_surface_manifest
 manifest_sha256 = store_module.manifest_sha256
 workflow_verdict_covers_surface = store_module.workflow_verdict_covers_surface
+minted_verdict_covers_surface = store_module.minted_verdict_covers_surface
+write_verdict = store_module.write_verdict
+worktree_path_for_branch = store_module.worktree_path_for_branch
+empty_surface_hash = store_module.empty_surface_hash
 
 constants_spec = importlib.util.spec_from_file_location(
     "verified_commit_constants",
@@ -38,6 +44,8 @@ assert constants_spec.loader is not None
 constants_module = importlib.util.module_from_spec(constants_spec)
 constants_spec.loader.exec_module(constants_module)
 CORRECTIVE_MESSAGE = constants_module.CORRECTIVE_MESSAGE
+EMPTY_SURFACE_GUARD_MESSAGE = constants_module.EMPTY_SURFACE_GUARD_MESSAGE
+BRANCH_WORKTREE_ABSENT_MESSAGE = constants_module.BRANCH_WORKTREE_ABSENT_MESSAGE
 
 PRODUCTION_SOURCE = "def add(left: int, right: int) -> int:\n return left + right\n"
 TEST_SOURCE = "def test_add() -> None:\n assert 1 + 1 == 2\n"
@@ -488,3 +496,227 @@ def test_manifest_hash_cli_prints_live_surface_hash(tmp_path: pathlib.Path) -> N
         text=True,
     )
     assert completed_process.stdout.strip() == expected_hash
+
+
+def _isolate_home(monkeypatch: pytest.MonkeyPatch, fake_home: pathlib.Path) -> None:
+    home_text = str(fake_home)
+    monkeypatch.setenv("HOME", home_text)
+    monkeypatch.setenv("USERPROFILE", home_text)
+    monkeypatch.delenv("HOMEDRIVE", raising=False)
+    monkeypatch.delenv("HOMEPATH", raising=False)
+
+
+def test_minted_verdict_covers_surface_matches_other_worktree_by_hash(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    write_verdict(
+        str(tmp_path / "other" / "worktree"),
+        MATCHING_MANIFEST_SHA256,
+        True,
+        [],
+        "agent-x",
+    )
+    assert minted_verdict_covers_surface(MATCHING_MANIFEST_SHA256) is True
+
+
+def test_minted_verdict_covers_surface_false_for_other_hash(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    write_verdict(
+        str(tmp_path / "other" / "worktree"),
+        OTHER_MANIFEST_SHA256,
+        True,
+        [],
+        "agent-x",
+    )
+    assert minted_verdict_covers_surface(MATCHING_MANIFEST_SHA256) is False
+
+
+def test_minted_verdict_covers_surface_false_for_failing_verdict(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    write_verdict(
+        str(tmp_path / "other" / "worktree"),
+        MATCHING_MANIFEST_SHA256,
+        False,
+        [{"severity": "P0", "summary": "boom"}],
+        "agent-x",
+    )
+    assert minted_verdict_covers_surface(MATCHING_MANIFEST_SHA256) is False
+
+
+def test_minted_verdict_covers_surface_false_when_directory_absent(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    assert minted_verdict_covers_surface(MATCHING_MANIFEST_SHA256) is False
+
+
+def _make_repo_with_branch_worktree(
+    tmp_path: pathlib.Path, branch_name: str
+) -> tuple[pathlib.Path, pathlib.Path]:
+    """Create a repo with a branch checked out in a separate worktree.
+
+    Returns:
+        A tuple of (main worktree path, branch worktree path).
+    """
+    empty_hooks_dir = tmp_path / "nohooks"
+    empty_hooks_dir.mkdir()
+
+    main_dir = tmp_path / "main"
+    main_dir.mkdir()
+    _run_git(main_dir, "init", "--initial-branch=main")
+    _run_git(main_dir, "config", "user.email", "tests@example.com")
+    _run_git(main_dir, "config", "user.name", "Worktree Tests")
+    _run_git(main_dir, "config", "core.hooksPath", str(empty_hooks_dir))
+    (main_dir / "app.py").write_text(PRODUCTION_SOURCE, encoding="utf-8")
+    _run_git(main_dir, "add", "-A")
+    _run_git(main_dir, "commit", "-m", "base")
+
+    origin_dir = tmp_path / "origin.git"
+    subprocess.run(
+        ["git", "init", "--bare", "--initial-branch=main", str(origin_dir)],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    _run_git(main_dir, "remote", "add", "origin", str(origin_dir))
+    _run_git(main_dir, "push", "-u", "origin", "main")
+
+    _run_git(main_dir, "branch", branch_name)
+
+    branch_worktree_dir = tmp_path / "branch-worktree"
+    _run_git(main_dir, "worktree", "add", str(branch_worktree_dir), branch_name)
+
+    return main_dir, branch_worktree_dir
+
+
+def test_worktree_path_for_branch_returns_path_when_branch_present(
+    tmp_path: pathlib.Path,
+) -> None:
+    _main_dir, branch_worktree_dir = _make_repo_with_branch_worktree(
+        tmp_path, "feature-x"
+    )
+    resolved_path = worktree_path_for_branch(str(branch_worktree_dir), "feature-x")
+    assert resolved_path is not None
+    assert pathlib.Path(resolved_path).resolve() == branch_worktree_dir.resolve()
+
+
+def test_worktree_path_for_branch_returns_none_when_branch_absent(
+    tmp_path: pathlib.Path,
+) -> None:
+    main_dir, _branch_worktree_dir = _make_repo_with_branch_worktree(
+        tmp_path, "feature-x"
+    )
+    resolved_path = worktree_path_for_branch(str(main_dir), "branch-never-checked-out")
+    assert resolved_path is None
+
+
+def test_empty_surface_hash_equals_hash_of_empty_string() -> None:
+    assert empty_surface_hash() == manifest_sha256("")
+
+
+def test_manifest_hash_cli_empty_surface_writes_guard_message_to_stderr(
+    tmp_path: pathlib.Path,
+) -> None:
+    work_dir = _make_repo_with_origin(tmp_path)
+    completed_process = subprocess.run(
+        [
+            sys.executable,
+            str(_HOOK_DIR / "verification_verdict_store.py"),
+            "--manifest-hash",
+            str(work_dir),
+        ],
+        capture_output=True,
+        text=True,
+    )
+    assert completed_process.returncode != 0
+    assert completed_process.stdout.strip() == ""
+    lowered_stderr = completed_process.stderr.lower()
+    assert "wrong work tree" in lowered_stderr or "empty" in lowered_stderr
+
+
+def test_manifest_hash_cli_empty_surface_prints_nothing_on_stdout(
+    tmp_path: pathlib.Path,
+) -> None:
+    work_dir = _make_repo_with_origin(tmp_path)
+    completed_process = subprocess.run(
+        [
+            sys.executable,
+            str(_HOOK_DIR / "verification_verdict_store.py"),
+            "--manifest-hash",
+            str(work_dir),
+        ],
+        capture_output=True,
+        text=True,
+    )
+    assert completed_process.stdout.strip() == ""
+
+
+def test_manifest_hash_for_branch_cli_prints_same_hash_as_explicit_dir(
+    tmp_path: pathlib.Path,
+) -> None:
+    _main_dir, branch_worktree_dir = _make_repo_with_branch_worktree(
+        tmp_path, "feature-branch"
+    )
+    (branch_worktree_dir / "app.py").write_text(
+        "def add(left: int, right: int) -> int:\n return left - right\n",
+        encoding="utf-8",
+    )
+    direct_process = subprocess.run(
+        [
+            sys.executable,
+            str(_HOOK_DIR / "verification_verdict_store.py"),
+            "--manifest-hash",
+            str(branch_worktree_dir),
+        ],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    branch_process = subprocess.run(
+        [
+            sys.executable,
+            str(_HOOK_DIR / "verification_verdict_store.py"),
+            "--manifest-hash-for-branch",
+            "feature-branch",
+        ],
+        capture_output=True,
+        text=True,
+        check=True,
+        cwd=str(branch_worktree_dir),
+    )
+    assert direct_process.stdout.strip() == branch_process.stdout.strip()
+    assert direct_process.stdout.strip() != ""
+
+
+def test_manifest_hash_for_branch_cli_returns_nonzero_when_branch_absent(
+    tmp_path: pathlib.Path,
+) -> None:
+    main_dir, _branch_worktree_dir = _make_repo_with_branch_worktree(
+        tmp_path, "feature-branch"
+    )
+    completed_process = subprocess.run(
+        [
+            sys.executable,
+            str(_HOOK_DIR / "verification_verdict_store.py"),
+            "--manifest-hash-for-branch",
+            "branch-never-checked-out",
+        ],
+        capture_output=True,
+        text=True,
+        cwd=str(main_dir),
+    )
+    assert completed_process.returncode != 0
+    assert completed_process.stdout.strip() == ""
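The four `minted_verdict_covers_surface` tests above pin down a lookup keyed on the surface hash, not on the work-tree path: a passing verdict written for another work tree covers the surface when its hash matches, while a different hash, a failing verdict, or a missing verdict directory does not. The body of `minted_verdict_covers_surface` is not shown in this part of the diff, so the following is only a sketch of behavior consistent with those tests. `passing_verdict_exists` is a hypothetical name, and passing the directory as an argument (instead of resolving it from the user's home) is a simplification made for the example.

```python
import json
from pathlib import Path

VERDICT_FILE_GLOB = "*.json"  # value taken from verified_commit_constants.py


def passing_verdict_exists(verdict_dir: Path, live_surface_hash: str) -> bool:
    """Hypothetical stand-in: does any stored passing verdict match this hash?"""
    if not verdict_dir.is_dir():
        return False
    for each_path in sorted(verdict_dir.glob(VERDICT_FILE_GLOB)):
        try:
            record = json.loads(each_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # assumption: unreadable or corrupt files never unlock a commit
        if not isinstance(record, dict):
            continue
        if record.get("all_pass") is True and record.get("manifest_sha256") == live_surface_hash:
            return True
    return False
```

Matching on the hash alone is what lets a verdict minted in one work tree clear a commit from a sibling work tree that holds the same change surface.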
package/hooks/blocking/test_verified_commit_gate.py
CHANGED
@@ -525,3 +525,46 @@ def test_verification_bypass_marker_allows_an_otherwise_gated_commit(
     assert "VERIFIED_COMMIT_GATE" in capsys.readouterr().out
     _run_gate_main(monkeypatch, "git commit -m x # verify-skip", work_dir)
     assert capsys.readouterr().out == ""
+
+
+def test_minted_verdict_from_other_worktree_allows_commit_by_hash(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    work_dir = _make_gated_repo(tmp_path)
+    live_surface_hash = _live_surface_hash(work_dir)
+    store_module.write_verdict(
+        str(tmp_path / "sibling" / "worktree"),
+        live_surface_hash,
+        True,
+        [],
+        "agent-x",
+    )
+    transcript_path = tmp_path / "projects" / "demo" / "sess1.jsonl"
+    transcript_path.parent.mkdir(parents=True)
+    transcript_path.write_text("", encoding="utf-8")
+    assert deny_reason_for_directory(str(work_dir), str(transcript_path)) is None
+
+
+def test_minted_verdict_from_other_worktree_with_wrong_hash_denies(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: pathlib.Path
+) -> None:
+    fake_home = tmp_path / "home"
+    fake_home.mkdir()
+    _isolate_home(monkeypatch, fake_home)
+    work_dir = _make_gated_repo(tmp_path)
+    store_module.write_verdict(
+        str(tmp_path / "sibling" / "worktree"),
+        "d" * 64,
+        True,
+        [],
+        "agent-x",
+    )
+    transcript_path = tmp_path / "projects" / "demo" / "sess1.jsonl"
+    transcript_path.parent.mkdir(parents=True)
+    transcript_path.write_text("", encoding="utf-8")
+    deny_reason = deny_reason_for_directory(str(work_dir), str(transcript_path))
+    assert deny_reason is not None
+    assert "VERIFIED_COMMIT_GATE" in deny_reason
package/hooks/blocking/test_verifier_verdict_minter.py
CHANGED
@@ -37,6 +37,16 @@ minter_spec.loader.exec_module(minter_module)
 mint_for_payload = minter_module.mint_for_payload
 resolved_subagent_type = minter_module.resolved_subagent_type
 
+store_spec = importlib.util.spec_from_file_location(
+    "verification_verdict_store",
+    _HOOK_DIR / "verification_verdict_store.py",
+)
+assert store_spec is not None
+assert store_spec.loader is not None
+store_module = importlib.util.module_from_spec(store_spec)
+store_spec.loader.exec_module(store_module)
+empty_surface_hash = store_module.empty_surface_hash
+
 constants_spec = importlib.util.spec_from_file_location(
     "verified_commit_constants",
     _HOOK_DIR / "config" / "verified_commit_constants.py",
@@ -191,3 +201,132 @@ def test_settings_deny_verdict_directory_write() -> None:
 
 def test_settings_deny_verdict_directory_edit() -> None:
     assert "Edit($HOME/.claude/verification/**)" in _deny_rules()
+
+
+def test_minter_refuses_when_attested_hash_equals_empty_surface_hash(
+    tmp_path: pathlib.Path,
+) -> None:
+    repo_root = tmp_path / "repo"
+    repo_root.mkdir()
+    _init_repo_with_upstream_and_edit(repo_root)
+    attested_empty = empty_surface_hash()
+    verdict_fence = json.dumps(
+        {"all_pass": True, "findings": [], "manifest_sha256": attested_empty}
+    )
+    agent_transcript = tmp_path / "agent-7.jsonl"
+    agent_transcript.write_text(
+        json.dumps(
+            {
+                "type": "assistant",
+                "message": {
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": f"ok\n```verdict\n{verdict_fence}\n```\n",
+                        }
+                    ]
+                },
+            }
+        )
+        + "\n",
+        encoding="utf-8",
+    )
+    _write_sidecar(agent_transcript, MINTING_AGENT_TYPE)
+    payload = {
+        "agent_transcript_path": str(agent_transcript),
+        "cwd": str(repo_root),
+        "agent_id": "empty-surface-1",
+    }
+    assert mint_for_payload(payload) is None
+
+
+def test_minter_refuses_when_recomputed_surface_is_empty(
+    tmp_path: pathlib.Path,
+) -> None:
+    repo_root = tmp_path / "repo"
+    repo_root.mkdir()
+    subprocess.run(["git", "-C", str(repo_root), "init", "-q"], check=True)
+    subprocess.run(
+        ["git", "-C", str(repo_root), "config", "user.email", "verifier@test"],
+        check=True,
+    )
+    subprocess.run(
+        ["git", "-C", str(repo_root), "config", "user.name", "verifier"],
+        check=True,
+    )
+    (repo_root / "module.py").write_text("answer = 1\n", encoding="utf-8")
+    subprocess.run(["git", "-C", str(repo_root), "add", "-A"], check=True)
+    subprocess.run(
+        ["git", "-C", str(repo_root), "commit", "-qm", "init"], check=True
+    )
+    subprocess.run(
+        ["git", "-C", str(repo_root), "branch", "-f", "origin/main", "HEAD"],
+        check=True,
+    )
+    agent_transcript = tmp_path / "agent-7.jsonl"
+    agent_transcript.write_text(
+        json.dumps(
+            {
+                "type": "assistant",
+                "message": {
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": 'ok\n```verdict\n{"all_pass": true, "findings": []}\n```\n',
+                        }
+                    ]
+                },
+            }
+        )
+        + "\n",
+        encoding="utf-8",
+    )
+    _write_sidecar(agent_transcript, MINTING_AGENT_TYPE)
+    payload = {
+        "agent_transcript_path": str(agent_transcript),
+        "cwd": str(repo_root),
+        "agent_id": "empty-recompute-1",
+    }
+    assert mint_for_payload(payload) is None
+
+
+def test_attested_manifest_hash_binds_over_cwd_surface(tmp_path: pathlib.Path) -> None:
+    repo_root = tmp_path / "repo"
+    repo_root.mkdir()
+    _init_repo_with_upstream_and_edit(repo_root)
+    attested_hash = "c" * 64
+    agent_transcript = tmp_path / "agent-7.jsonl"
+    verdict_fence = json.dumps(
+        {"all_pass": True, "findings": [], "manifest_sha256": attested_hash}
+    )
+    agent_transcript.write_text(
+        json.dumps(
+            {
+                "type": "assistant",
+                "message": {
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": f"ok\n```verdict\n{verdict_fence}\n```\n",
+                        }
+                    ]
+                },
+            }
+        )
+        + "\n",
+        encoding="utf-8",
+    )
+    _write_sidecar(agent_transcript, MINTING_AGENT_TYPE)
+    payload = {
+        "agent_transcript_path": str(agent_transcript),
+        "cwd": str(repo_root),
+        "agent_id": "attest-1",
+    }
+    verdict_path = mint_for_payload(payload)
+    try:
+        assert verdict_path is not None
+        verdict_record = json.loads(verdict_path.read_text(encoding="utf-8"))
+        assert verdict_record["manifest_sha256"] == attested_hash
+    finally:
+        if verdict_path is not None and verdict_path.exists():
+            verdict_path.unlink()
package/hooks/blocking/verification_verdict_store.py
CHANGED
@@ -29,12 +29,16 @@ from config.verified_commit_constants import (
     AGENT_META_SIDECAR_SUFFIX,
     AGENT_META_TYPE_KEY,
     AGENT_TRANSCRIPT_GLOB,
+    BRANCH_REFERENCE_PREFIX,
+    BRANCH_WORKTREE_ABSENT_MESSAGE,
     CLAUDE_HOME_DIRECTORY_NAME,
     CONFTEST_FILE_NAME,
     DOCS_ONLY_EXTENSIONS,
     ALL_FALLBACK_BASE_REFERENCES,
+    EMPTY_SURFACE_GUARD_MESSAGE,
     GIT_TIMEOUT_SECONDS,
     MANIFEST_HASH_CLI_FLAG,
+    MANIFEST_HASH_FOR_BRANCH_CLI_FLAG,
     MINIMUM_STATUS_FIELD_COUNT,
     MINTING_AGENT_TYPE,
     PYTHON_EXTENSION,
@@ -52,10 +56,13 @@ from config.verified_commit_constants import (
     TRANSCRIPT_TEXT_KEY,
     VERDICT_DIRECTORY_NAME,
     VERDICT_FENCE_PATTERN,
+    VERDICT_FILE_GLOB,
     VERDICT_JSON_INDENT,
     VERDICT_KEY_ALL_PASS,
     VERDICT_KEY_FINDINGS,
     VERDICT_KEY_MANIFEST_SHA256,
+    WORKTREE_LIST_BRANCH_PREFIX,
+    WORKTREE_LIST_PATH_PREFIX,
 )
 
 
@@ -243,6 +250,65 @@ def manifest_sha256(surface_manifest_text: str) -> str:
|
|
|
243
250
|
return hashlib.sha256(surface_manifest_text.encode("utf-8")).hexdigest()
|
|
244
251
|
|
|
245
252
|
|
|
253
|
+
def empty_surface_hash() -> str:
|
|
254
|
+
"""Return the manifest hash that represents an empty change surface.
|
|
255
|
+
|
|
256
|
+
A work tree whose HEAD equals the merge base has no changed or untracked
|
|
257
|
+
files, so ``branch_surface_manifest`` returns ``""`` and this hash is what
|
|
258
|
+
the minter or store CLI would produce for it. Comparing an attested hash
|
|
259
|
+
against this value lets the minter refuse to mint for a verifier that ran
|
|
260
|
+
in the wrong work tree.
|
|
261
|
+
|
|
262
|
+
Returns:
|
|
263
|
+
The hex sha256 digest of the empty surface manifest (the empty string).
|
|
264
|
+
"""
|
|
265
|
+
return manifest_sha256("")
|
|
266
|
+
|
|
267
|
+
|
|
268
|
+
def worktree_path_for_branch(repo_directory: str, branch_name: str) -> str | None:
|
|
269
|
+
"""Find the work-tree directory that has a given branch checked out.
|
|
270
|
+
|
|
271
|
+
Parses the porcelain output of ``git worktree list --porcelain`` and
|
|
272
|
+
returns the ``worktree <path>`` line whose block carries a
|
|
273
|
+
``branch refs/heads/<branch_name>`` line. Returns None when git fails
|
|
274
|
+
or no work tree holds the branch.
|
|
275
|
+
|
|
276
|
+
Args:
|
|
277
|
+
repo_directory: Any directory inside the repository.
|
|
278
|
+
branch_name: The short branch name to locate (without ``refs/heads/``).
|
|
279
|
+
|
|
280
|
+
Returns:
|
|
281
|
+
The absolute path of the work tree that has the branch checked out,
|
|
282
|
+
or None when no work tree holds the branch or git fails.
|
|
283
|
+
"""
|
|
284
|
+
porcelain_output = run_git(repo_directory, "worktree", "list", "--porcelain")
|
|
285
|
+
if porcelain_output is None:
|
|
286
|
+
return None
|
|
287
|
+
target_branch_reference = f"{BRANCH_REFERENCE_PREFIX}{branch_name}"
|
|
288
|
+
current_worktree_path: str | None = None
|
|
289
|
+
for each_line in porcelain_output.splitlines():
|
|
290
|
+
if each_line.startswith(WORKTREE_LIST_PATH_PREFIX):
|
|
291
|
+
current_worktree_path = each_line[len(WORKTREE_LIST_PATH_PREFIX):]
|
|
292
|
+
elif each_line.startswith(WORKTREE_LIST_BRANCH_PREFIX):
|
|
293
|
+
branch_reference = each_line[len(WORKTREE_LIST_BRANCH_PREFIX):]
|
|
294
|
+
if branch_reference == target_branch_reference and current_worktree_path:
|
|
295
|
+
return current_worktree_path
|
|
296
|
+
return None
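Reviewer sketch, not part of the package: the same block-by-block parse run against canned porcelain text instead of a live `git worktree list --porcelain` call. The three prefix values are assumptions (the constants' definitions are not shown in this hunk), and `worktree_path_from_porcelain` and `SAMPLE_PORCELAIN` are hypothetical names.

```python
from typing import Optional

WORKTREE_LIST_PATH_PREFIX = "worktree "   # assumed value of the constant
WORKTREE_LIST_BRANCH_PREFIX = "branch "   # assumed value of the constant
BRANCH_REFERENCE_PREFIX = "refs/heads/"   # assumed value of the constant

# Hand-written sample in the shape git prints: one block per work tree.
SAMPLE_PORCELAIN = (
    "worktree /repos/app\n"
    "HEAD 1111111111111111111111111111111111111111\n"
    "branch refs/heads/main\n"
    "\n"
    "worktree /repos/app-fix\n"
    "HEAD 2222222222222222222222222222222222222222\n"
    "branch refs/heads/fix/login\n"
    "\n"
    "worktree /repos/app-detached\n"
    "HEAD 3333333333333333333333333333333333333333\n"
    "detached\n"
)


def worktree_path_from_porcelain(porcelain_output: str, branch_name: str) -> Optional[str]:
    target_branch_reference = f"{BRANCH_REFERENCE_PREFIX}{branch_name}"
    current_worktree_path = None
    for each_line in porcelain_output.splitlines():
        if each_line.startswith(WORKTREE_LIST_PATH_PREFIX):
            # A "worktree" line opens a new block; remember its path.
            current_worktree_path = each_line[len(WORKTREE_LIST_PATH_PREFIX):]
        elif each_line.startswith(WORKTREE_LIST_BRANCH_PREFIX):
            branch_reference = each_line[len(WORKTREE_LIST_BRANCH_PREFIX):]
            if branch_reference == target_branch_reference and current_worktree_path:
                return current_worktree_path
    return None


print(worktree_path_from_porcelain(SAMPLE_PORCELAIN, "fix/login"))
```

A detached work tree carries no `branch` line, so it can never match; a branch name containing slashes still matches because the whole reference is compared.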
|
|
297
|
+
|
|
298
|
+
|
|
299
|
+
def verdict_directory() -> Path:
|
|
300
|
+
"""Return the shared directory holding every work tree's verdict file.
|
|
301
|
+
|
|
302
|
+
Verdicts live outside any repository (under the user's Claude home) so no
|
|
303
|
+
repo accumulates untracked verdict files; every work tree's verdict shares
|
|
304
|
+
this one directory, distinguished by file name.
|
|
305
|
+
|
|
306
|
+
Returns:
|
|
307
|
+
The verdict directory under the user's Claude home.
|
|
308
|
+
"""
|
|
309
|
+
return Path.home() / CLAUDE_HOME_DIRECTORY_NAME / VERDICT_DIRECTORY_NAME
|
|
310
|
+
|
|
311
|
+
|
|
246
312
|
def verdict_path_for_repo(repo_root: str) -> Path:
|
|
247
313
|
"""Derive the verdict file path for a repository work tree.
|
|
248
314
|
|
|
@@ -258,9 +324,7 @@ def verdict_path_for_repo(repo_root: str) -> Path:
|
|
|
258
324
|
"""
|
|
259
325
|
normalized_root = str(Path(repo_root).resolve()).replace("\\", "/").lower()
|
|
260
326
|
root_key = hashlib.sha256(normalized_root.encode("utf-8")).hexdigest()[:ROOT_KEY_HEX_LENGTH]
|
|
261
|
-
return (
|
|
262
|
-
Path.home() / CLAUDE_HOME_DIRECTORY_NAME / VERDICT_DIRECTORY_NAME / f"{root_key}.json"
|
|
263
|
-
)
|
|
327
|
+
return verdict_directory() / f"{root_key}.json"
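Reviewer sketch, not part of the package: how the path-keyed file name behaves. It keeps the slash and case normalization from the hunk but leaves out `Path.resolve()` so the result does not depend on the local filesystem. The length `16` is an assumed stand-in for `ROOT_KEY_HEX_LENGTH`, whose real value is not shown here, and `root_key_for` is a hypothetical name.

```python
import hashlib

ROOT_KEY_HEX_LENGTH = 16  # assumed value; the real constant is defined elsewhere


def root_key_for(repo_root: str) -> str:
    # Same normalization as the hunk, minus Path.resolve(): forward slashes, lower case.
    normalized_root = repo_root.replace("\\", "/").lower()
    digest = hashlib.sha256(normalized_root.encode("utf-8")).hexdigest()
    return digest[:ROOT_KEY_HEX_LENGTH]


print(root_key_for("C:\\Repos\\App") + ".json")
print(root_key_for("/repos/app-fix") + ".json")
```

Two sibling work trees of one branch live at different paths, so they get different file names. That is why a lookup keyed only by path cannot find a verdict minted for the sibling.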
|
|
264
328
|
|
|
265
329
|
|
|
266
330
|
def load_valid_verdict(repo_root: str, expected_manifest_sha256: str) -> dict | None:
|
|
@@ -289,6 +353,44 @@ def load_valid_verdict(repo_root: str, expected_manifest_sha256: str) -> dict |
|
|
|
289
353
|
return verdict_record
|
|
290
354
|
|
|
291
355
|
|
|
356
|
+
def minted_verdict_covers_surface(expected_manifest_sha256: str) -> bool:
|
|
357
|
+
"""Decide whether any minted verdict covers the live surface, keyed by hash.
|
|
358
|
+
|
|
359
|
+
A verdict's bound ``manifest_sha256`` commits to the exact set of surface
|
|
360
|
+
file paths and their byte contents; the work tree's location never enters
|
|
361
|
+
the hash. A clean verdict minted while verifying one work tree therefore
|
|
362
|
+
proves the same change surface in a sibling work tree of the same branch,
|
|
363
|
+
even though each work tree files its verdict under its own path-keyed name.
|
|
364
|
+
Scanning every verdict file by bound hash lets that verdict clear the
|
|
365
|
+
sibling's commit, while a verdict bound to a different hash — different
|
|
366
|
+
code — never matches. The path-keyed ``load_valid_verdict`` stays the fast
|
|
367
|
+
same-work-tree lookup; this is the cross-work-tree fallback.
|
|
368
|
+
|
|
369
|
+
Args:
|
|
370
|
+
expected_manifest_sha256: Hash of the live surface manifest the verdict
|
|
371
|
+
must match exactly.
|
|
372
|
+
|
|
373
|
+
Returns:
|
|
374
|
+
True as soon as one verdict file reports ``all_pass`` true and binds to
|
|
375
|
+
the expected hash; False when none match or the directory is absent.
|
|
376
|
+
"""
|
|
377
|
+
verdict_dir = verdict_directory()
|
|
378
|
+
if not verdict_dir.is_dir():
|
|
379
|
+
return False
|
|
380
|
+
for each_verdict_file in sorted(verdict_dir.glob(VERDICT_FILE_GLOB)):
|
|
381
|
+
try:
|
|
382
|
+
verdict_record = json.loads(each_verdict_file.read_text(encoding="utf-8"))
|
|
383
|
+
except (OSError, json.JSONDecodeError):
|
|
384
|
+
continue
|
|
385
|
+
if not isinstance(verdict_record, dict):
|
|
386
|
+
continue
|
|
387
|
+
if verdict_record.get(VERDICT_KEY_ALL_PASS) is not True:
|
|
388
|
+
continue
|
|
389
|
+
if verdict_record.get(VERDICT_KEY_MANIFEST_SHA256) == expected_manifest_sha256:
|
|
390
|
+
return True
|
|
391
|
+
return False
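Reviewer sketch, not part of the package: the hash-keyed scan exercised against a temporary directory. The glob `*.json` is an assumed value for `VERDICT_FILE_GLOB`; the key names `all_pass` and `manifest_sha256` are taken from the verdict fence shown elsewhere in this diff. `any_verdict_covers` and `demo` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

VERDICT_FILE_GLOB = "*.json"  # assumed value of the constant


def any_verdict_covers(verdict_dir: Path, expected_manifest_sha256: str) -> bool:
    if not verdict_dir.is_dir():
        return False
    for each_verdict_file in sorted(verdict_dir.glob(VERDICT_FILE_GLOB)):
        try:
            record = json.loads(each_verdict_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed files are skipped, never fatal
        if not isinstance(record, dict):
            continue
        if record.get("all_pass") is not True:
            continue  # a failing verdict never clears a commit
        if record.get("manifest_sha256") == expected_manifest_sha256:
            return True
    return False


def demo():
    with tempfile.TemporaryDirectory() as temporary_directory:
        verdict_dir = Path(temporary_directory)
        (verdict_dir / "a.json").write_text(
            json.dumps({"all_pass": True, "manifest_sha256": "aaa"}), encoding="utf-8"
        )
        (verdict_dir / "b.json").write_text(
            json.dumps({"all_pass": False, "manifest_sha256": "bbb"}), encoding="utf-8"
        )
        (verdict_dir / "c.json").write_text("not json", encoding="utf-8")
        return (
            any_verdict_covers(verdict_dir, "aaa"),
            any_verdict_covers(verdict_dir, "bbb"),
            any_verdict_covers(verdict_dir, "ccc"),
        )


print(demo())
```

Only the passing verdict bound to the requested hash counts; the failing verdict and the malformed file are both ignored.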
|
|
392
|
+
|
|
393
|
+
|
|
292
394
|
def _subagents_directory_for_transcript(transcript_path: str) -> Path | None:
|
|
293
395
|
"""Locate the live session's subagents directory from a transcript path.
|
|
294
396
|
|
|
@@ -647,14 +749,18 @@ def _print_live_manifest_hash(repo_directory: str) -> int:
|
|
|
647
749
|
"""Print the live surface manifest hash for a repo, for a workflow verifier.
|
|
648
750
|
|
|
649
751
|
A workflow code-verifier runs this to learn the exact hash to bind its
|
|
650
|
-
verdict to, so stdout carries only the hash and nothing else.
|
|
752
|
+
verdict to, so stdout carries only the hash and nothing else. When the
|
|
753
|
+
work tree has no changed or untracked files (empty change surface), this
|
|
754
|
+
prints the empty-surface guard message to stderr and returns nonzero —
|
|
755
|
+
an empty surface means the verifier is pointed at the wrong work tree.
|
|
651
756
|
|
|
652
757
|
Args:
|
|
653
758
|
repo_directory: A directory inside the work tree to bind the verdict to.
|
|
654
759
|
|
|
655
760
|
Returns:
|
|
656
|
-
0 after printing the hash; nonzero with no stdout when the repo root
|
|
657
|
-
merge base cannot be resolved
|
|
761
|
+
0 after printing the hash; nonzero with no stdout when the repo root,
|
|
762
|
+
merge base, or manifest cannot be resolved, or when the change surface
|
|
763
|
+
is empty (wrong work tree).
|
|
658
764
|
"""
|
|
659
765
|
repo_root = resolve_repo_root(repo_directory)
|
|
660
766
|
if repo_root is None:
|
|
@@ -665,20 +771,69 @@ def _print_live_manifest_hash(repo_directory: str) -> int:
|
|
|
665
771
|
surface_manifest_text = branch_surface_manifest(repo_root, merge_base_sha)
|
|
666
772
|
if surface_manifest_text is None:
|
|
667
773
|
return 1
|
|
774
|
+
if surface_manifest_text == "":
|
|
775
|
+
print(EMPTY_SURFACE_GUARD_MESSAGE.format(repo_root=repo_root), file=sys.stderr)
|
|
776
|
+
return 1
|
|
668
777
|
print(manifest_sha256(surface_manifest_text))
|
|
669
778
|
return 0
|
|
670
779
|
|
|
671
780
|
|
|
781
|
+
def _print_branch_manifest_hash(branch_name: str) -> int:
|
|
782
|
+
"""Print the manifest hash for the work tree that holds a given branch.
|
|
783
|
+
|
|
784
|
+
Resolves the repository root from the current working directory, then
|
|
785
|
+
finds the work tree that has ``branch_name`` checked out, and delegates
|
|
786
|
+
to ``_print_live_manifest_hash`` for that work tree. This mode is immune
|
|
787
|
+
to the verifier's own cwd: it always hashes the work tree that holds the
|
|
788
|
+
branch under review, regardless of where the verifier itself is running.
|
|
789
|
+
|
|
790
|
+
Args:
|
|
791
|
+
branch_name: The short branch name to locate (without ``refs/heads/``).
|
|
792
|
+
|
|
793
|
+
Returns:
|
|
794
|
+
0 after printing the hash; nonzero with a stderr message when the
|
|
795
|
+
repo root cannot be resolved, no work tree holds the branch, or the
|
|
796
|
+
located work tree has an empty change surface.
|
|
797
|
+
"""
|
|
798
|
+
repo_root = resolve_repo_root(str(Path.cwd()))
|
|
799
|
+
if repo_root is None:
|
|
800
|
+
print("ERROR: Current directory is not inside a git repository.", file=sys.stderr)
|
|
801
|
+
return 1
|
|
802
|
+
branch_worktree_path = worktree_path_for_branch(repo_root, branch_name)
|
|
803
|
+
if branch_worktree_path is None:
|
|
804
|
+
print(
|
|
805
|
+
BRANCH_WORKTREE_ABSENT_MESSAGE.format(branch=branch_name),
|
|
806
|
+
file=sys.stderr,
|
|
807
|
+
)
|
|
808
|
+
return 1
|
|
809
|
+
return _print_live_manifest_hash(branch_worktree_path)
|
|
810
|
+
|
|
811
|
+
|
|
672
812
|
def main() -> None:
|
|
673
813
|
"""Run the verdict-store CLI: compute the live surface-manifest hash.
|
|
674
814
|
|
|
675
|
-
|
|
676
|
-
|
|
677
|
-
|
|
678
|
-
|
|
815
|
+
Two modes:
|
|
816
|
+
|
|
817
|
+
``--manifest-hash <work-tree-dir>``
|
|
818
|
+
Print the live ``manifest_sha256`` for the given work tree directory
|
|
819
|
+
so a workflow code-verifier can bind its verdict to the exact surface
|
|
820
|
+
the gate checks. Fails with a stderr message when the change surface
|
|
821
|
+
is empty (wrong work tree).
|
|
822
|
+
|
|
823
|
+
``--manifest-hash-for-branch <branch>``
|
|
824
|
+
Resolve the work tree that has ``<branch>`` checked out (via
|
|
825
|
+
``git worktree list --porcelain``) and print its manifest hash.
|
|
826
|
+
Immune to the verifier's own cwd — always targets the branch's own
|
|
827
|
+
work tree. Fails when no work tree holds the branch or the surface
|
|
828
|
+
is empty.
|
|
829
|
+
|
|
830
|
+
Exits nonzero with no stdout on any other argument shape or when the
|
|
831
|
+
surface cannot be resolved.
|
|
679
832
|
"""
|
|
680
833
|
if len(sys.argv) == 3 and sys.argv[1] == MANIFEST_HASH_CLI_FLAG:
|
|
681
834
|
sys.exit(_print_live_manifest_hash(sys.argv[2]))
|
|
835
|
+
if len(sys.argv) == 3 and sys.argv[1] == MANIFEST_HASH_FOR_BRANCH_CLI_FLAG:
|
|
836
|
+
sys.exit(_print_branch_manifest_hash(sys.argv[2]))
|
|
682
837
|
sys.exit(1)
|
|
683
838
|
|
|
684
839
|
|
|
@@ -13,8 +13,11 @@ and allows the command only when one of these holds:
|
|
|
13
13
|
- the surface is mechanically exempt (docs/images by extension, pytest
|
|
14
14
|
test files by name convention, Python files whose docstring-stripped
|
|
15
15
|
AST is unchanged), or
|
|
16
|
-
- a verdict
|
|
17
|
-
|
|
16
|
+
- a passing verifier verdict binds to the exact live manifest hash —
|
|
17
|
+
matched by content hash, not by work-tree location, so a verdict
|
|
18
|
+
``verifier_verdict_minter.py`` minted while verifying any work tree of
|
|
19
|
+
the surface clears the commit, as does one a workflow ``code-verifier``
|
|
20
|
+
emitted in its own transcript.
|
|
18
21
|
|
|
19
22
|
The surface binds every changed and untracked file's content, so slicing
|
|
20
23
|
work into small commits or staging files cannot move the hash, while any
|
|
@@ -57,6 +60,7 @@ from verification_verdict_store import (
|
|
|
57
60
|
is_verification_exempt_diff,
|
|
58
61
|
load_valid_verdict,
|
|
59
62
|
manifest_sha256,
|
|
63
|
+
minted_verdict_covers_surface,
|
|
60
64
|
resolve_merge_base,
|
|
61
65
|
resolve_repo_root,
|
|
62
66
|
workflow_verdict_covers_surface,
|
|
@@ -500,6 +504,8 @@ def deny_reason_for_directory(target_directory: str, transcript_path: str) -> st
|
|
|
500
504
|
live_manifest_sha256 = manifest_sha256(surface_manifest_text)
|
|
501
505
|
if load_valid_verdict(repo_root, live_manifest_sha256) is not None:
|
|
502
506
|
return None
|
|
507
|
+
if minted_verdict_covers_surface(live_manifest_sha256):
|
|
508
|
+
return None
|
|
503
509
|
if workflow_verdict_covers_surface(transcript_path, live_manifest_sha256):
|
|
504
510
|
return None
|
|
505
511
|
hash_preview = live_manifest_sha256[:HASH_PREVIEW_LENGTH]
|
|
@@ -35,9 +35,13 @@ blocking_directory = str(Path(__file__).resolve().parent)
|
|
|
35
35
|
if blocking_directory not in sys.path:
|
|
36
36
|
sys.path.insert(0, blocking_directory)
|
|
37
37
|
|
|
38
|
-
from config.verified_commit_constants import
|
|
38
|
+
from config.verified_commit_constants import (
|
|
39
|
+
MINTING_AGENT_TYPE,
|
|
40
|
+
VERDICT_KEY_MANIFEST_SHA256,
|
|
41
|
+
)
|
|
39
42
|
from verification_verdict_store import (
|
|
40
43
|
branch_surface_manifest,
|
|
44
|
+
empty_surface_hash,
|
|
41
45
|
manifest_sha256,
|
|
42
46
|
resolve_merge_base,
|
|
43
47
|
resolve_repo_root,
|
|
@@ -169,6 +173,53 @@ def resolved_subagent_type(subagent_stop_payload: dict) -> str | None:
|
|
|
169
173
|
)
|
|
170
174
|
|
|
171
175
|
|
|
176
|
+
def _attested_or_recomputed_hash(verdict_record: dict, repo_root: str) -> str | None:
|
|
177
|
+
"""Choose the surface hash the minted verdict binds to.
|
|
178
|
+
|
|
179
|
+
A code-verifier that verifies a work tree other than the stop event's cwd
|
|
180
|
+
attests the surface it checked by emitting ``manifest_sha256`` in its
|
|
181
|
+
verdict fence (computed against the verified work tree via the
|
|
182
|
+
``--manifest-hash`` CLI). Binding the minted verdict to that attested hash
|
|
183
|
+
keeps the verdict tied to the code actually verified rather than the
|
|
184
|
+
subagent's cwd, so a verdict earned for one work tree clears a commit in a
|
|
185
|
+
sibling work tree of the same surface. When the fence attests no hash, the
|
|
186
|
+
minter recomputes one from the cwd work tree, which is correct whenever the
|
|
187
|
+
verifier ran in the work tree it verified.
|
|
188
|
+
|
|
189
|
+
Returns None (mints nothing) in two empty-surface cases:
|
|
190
|
+
|
|
191
|
+
- The attested hash equals ``empty_surface_hash()`` — the verifier called
|
|
192
|
+
the store CLI on a wrong (empty) work tree and the hash it got back
|
|
193
|
+
represents nothing.
|
|
194
|
+
- The recompute branch produces an empty manifest — the cwd work tree also
|
|
195
|
+
has no changed or untracked files versus the merge base.
|
|
196
|
+
|
|
197
|
+
Args:
|
|
198
|
+
verdict_record: The parsed verdict fence from the verifier transcript.
|
|
199
|
+
repo_root: The work-tree root resolved from the stop event's cwd, used
|
|
200
|
+
for the recompute fallback.
|
|
201
|
+
|
|
202
|
+
Returns:
|
|
203
|
+
The attested ``manifest_sha256`` when the fence carries a non-empty
|
|
204
|
+
string one that is not the empty-surface sentinel; the cwd work tree's
|
|
205
|
+
recomputed surface hash when the fence attests nothing and the surface
|
|
206
|
+
is non-empty; or None when the attested hash is the empty-surface
|
|
207
|
+
sentinel, the surface manifest is empty, or no upstream base resolves.
|
|
208
|
+
"""
|
|
209
|
+
attested_manifest_sha256 = verdict_record.get(VERDICT_KEY_MANIFEST_SHA256)
|
|
210
|
+
if isinstance(attested_manifest_sha256, str) and attested_manifest_sha256:
|
|
211
|
+
if attested_manifest_sha256 == empty_surface_hash():
|
|
212
|
+
return None
|
|
213
|
+
return attested_manifest_sha256
|
|
214
|
+
merge_base_sha = resolve_merge_base(repo_root)
|
|
215
|
+
if merge_base_sha is None:
|
|
216
|
+
return None
|
|
217
|
+
surface_manifest_text = branch_surface_manifest(repo_root, merge_base_sha)
|
|
218
|
+
if not surface_manifest_text:
|
|
219
|
+
return None
|
|
220
|
+
return manifest_sha256(surface_manifest_text)
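Reviewer sketch, not part of the package: the attested-versus-recomputed decision with the git calls replaced by a callback, so each branch can be exercised in isolation. `choose_bound_hash` and `recompute_manifest` are hypothetical names; the callback stands in for the `resolve_merge_base` plus `branch_surface_manifest` pair and returns `None` or `""` when nothing can be hashed.

```python
import hashlib
from typing import Callable, Optional

VERDICT_KEY_MANIFEST_SHA256 = "manifest_sha256"  # key name as it appears in the verdict fence


def manifest_sha256(surface_manifest_text: str) -> str:
    return hashlib.sha256(surface_manifest_text.encode("utf-8")).hexdigest()


def choose_bound_hash(
    verdict_record: dict,
    recompute_manifest: Callable[[], Optional[str]],
) -> Optional[str]:
    attested = verdict_record.get(VERDICT_KEY_MANIFEST_SHA256)
    if isinstance(attested, str) and attested:
        if attested == manifest_sha256(""):
            return None  # empty-surface sentinel: verifier hashed the wrong work tree
        return attested  # trust the hash the verifier attested
    surface_manifest_text = recompute_manifest()
    if not surface_manifest_text:
        return None  # no base to diff against, or nothing changed
    return manifest_sha256(surface_manifest_text)


print(choose_bound_hash({"manifest_sha256": "abc123"}, lambda: "ignored"))
```

Note that, as in the hunk, a non-empty attested string is accepted as-is; the sketch does not check that it looks like a 64-character hex digest.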
|
|
221
|
+
|
|
222
|
+
|
|
172
223
|
def mint_for_payload(subagent_stop_payload: dict) -> Path | None:
|
|
173
224
|
"""Mint a verdict file for a code-verifier stop event.
|
|
174
225
|
|
|
@@ -177,8 +228,10 @@ def mint_for_payload(subagent_stop_payload: dict) -> Path | None:
|
|
|
177
228
|
|
|
178
229
|
Returns:
|
|
179
230
|
The verdict file path when minted; None when the payload is not a
|
|
180
|
-
code-verifier stop, the transcript holds no verdict,
|
|
181
|
-
|
|
231
|
+
code-verifier stop, the transcript holds no verdict, the cwd is not a
|
|
232
|
+
work tree, or — for a verdict that attests no ``manifest_sha256`` of
|
|
233
|
+
its own — that work tree has no upstream base to recompute the surface
|
|
234
|
+
hash from.
|
|
182
235
|
"""
|
|
183
236
|
if resolved_subagent_type(subagent_stop_payload) != MINTING_AGENT_TYPE:
|
|
184
237
|
return None
|
|
@@ -191,15 +244,12 @@ def mint_for_payload(subagent_stop_payload: dict) -> Path | None:
|
|
|
191
244
|
repo_root = resolve_repo_root(subagent_stop_payload.get("cwd", "."))
|
|
192
245
|
if repo_root is None:
|
|
193
246
|
return None
|
|
194
|
-
|
|
195
|
-
if
|
|
196
|
-
return None
|
|
197
|
-
surface_manifest_text = branch_surface_manifest(repo_root, merge_base_sha)
|
|
198
|
-
if surface_manifest_text is None:
|
|
247
|
+
bound_manifest_sha256 = _attested_or_recomputed_hash(verdict_record, repo_root)
|
|
248
|
+
if bound_manifest_sha256 is None:
|
|
199
249
|
return None
|
|
200
250
|
return write_verdict(
|
|
201
251
|
repo_root,
|
|
202
|
-
|
|
252
|
+
bound_manifest_sha256,
|
|
203
253
|
verdict_record["all_pass"],
|
|
204
254
|
verdict_record["findings"],
|
|
205
255
|
str(subagent_stop_payload.get("agent_id", "")),
|
package/package.json
CHANGED
|
@@ -34,7 +34,9 @@ PR's owner.
|
|
|
34
34
|
|
|
35
35
|
1. **Enter a worktree.** Call `EnterWorktree` with no arguments before any
|
|
36
36
|
`gh`, `git`, file read, or edit. `gh`/`git` Bash calls do not auto-isolate,
|
|
37
|
-
so this is mandatory. If it fails, report and stop.
|
|
37
|
+
so this is mandatory. If it fails, report and stop. A bare `EnterWorktree`
|
|
38
|
+
branches from `origin/main`; step 2 positions the worktree on the PR's head
|
|
39
|
+
ref, which the workflow needs.
|
|
38
40
|
|
|
39
41
|
2. **Resolve PR scope.** When the user passed a PR URL or number, parse owner,
|
|
40
42
|
repo, and number from it. Otherwise read the current branch's PR:
|
|
@@ -43,6 +45,18 @@ PR's owner.
|
|
|
43
45
|
ready, mark it draft first (`gh pr ready <n> --repo <o>/<r> --undo`) so the
|
|
44
46
|
loop owns the ready transition.
|
|
45
47
|
|
|
48
|
+
**Position the worktree on the PR branch.** The workflow reviews
|
|
49
|
+
`git diff origin/main...HEAD` against this worktree's local `HEAD` and pushes
|
|
50
|
+
each fix to the PR branch, so the worktree sits on the PR's head ref at the PR
|
|
51
|
+
HEAD before the workflow launches. A worktree fresh off `origin/main` has
|
|
52
|
+
`HEAD == origin/main`, shows an empty diff, and reports a false convergence
|
|
53
|
+
with zero findings. When a local worktree already tracks the PR branch, enter
|
|
54
|
+
that one by passing its path to `EnterWorktree`; otherwise put the entered
|
|
55
|
+
worktree on the branch with `gh pr checkout <number> --repo <owner>/<repo>`
|
|
56
|
+
(or `git fetch origin <headRefName>` then `git switch <headRefName>`). Confirm
|
|
57
|
+
before launching: `git rev-parse --abbrev-ref HEAD` equals the PR's head ref
|
|
58
|
+
and local `HEAD` equals the PR head SHA.
|
|
59
|
+
|
|
46
60
|
3. **Verify the worktree is the PR's repo (strict pre-flight).** Run
|
|
47
61
|
`python "$HOME/.claude/skills/_shared/pr-loop/scripts/preflight_worktree.py" --owner <owner> --repo <repo> --mode strict`.
|
|
48
62
|
It confirms the working directory is a checkout of the PR's own repo and
|
|
@@ -56,6 +70,17 @@ PR's owner.
|
|
|
56
70
|
4. **Grant project permissions.**
|
|
57
71
|
`python "$HOME/.claude/skills/bugteam/scripts/grant_project_claude_permissions.py"`
|
|
58
72
|
|
|
73
|
+
In auto-mode the classifier blocks this grant as an unrequested change to the
|
|
74
|
+
permission allowlist: the `/autoconverge` invocation alone does not meet its
|
|
75
|
+
bar for an explicitly requested permission change. When it is blocked, keep
|
|
76
|
+
the run alive — surface the grant to the user through `AskUserQuestion` with
|
|
77
|
+
the exact command and ask them to approve it or run it themselves with the `!`
|
|
78
|
+
prefix:
|
|
79
|
+
`! python "$HOME/.claude/skills/bugteam/scripts/grant_project_claude_permissions.py"`.
|
|
80
|
+
Continue once the grant lands. A user who wants future runs to skip this
|
|
81
|
+
prompt can add a standing Bash permission allow-rule for that script in their
|
|
82
|
+
settings.
|
|
83
|
+
|
|
59
84
|
## Run the workflow
|
|
60
85
|
|
|
61
86
|
Call the `Workflow` tool against the colocated script:
|
|
@@ -220,37 +220,61 @@ test('the fix flow spawns a code-verifier step between the edit step and the com
|
|
|
220
220
|
);
|
|
221
221
|
});
|
|
222
222
|
|
|
223
|
-
|
|
224
|
-
const
|
|
225
|
-
assert.notEqual(constantStart, -1, `expected ${constantName} to exist`);
|
|
226
|
-
const nextConstantStart = convergeSource.indexOf('\nconst ', constantStart + 1);
|
|
227
|
-
const constantEnd = nextConstantStart === -1 ? convergeSource.length : nextConstantStart;
|
|
228
|
-
return convergeSource.slice(constantStart, constantEnd);
|
|
229
|
-
}
|
|
230
|
-
|
|
231
|
-
test('the shared verdict-fence steps name the binding-hash command and the verdict fence', () => {
|
|
232
|
-
const fenceSteps = constantBody('VERDICT_FENCE_STEPS');
|
|
233
|
-
assert.match(fenceSteps, /--manifest-hash/, 'expected the binding-hash command to be named');
|
|
223
|
+
test('the shared verdict-fence builder names the binding-hash command and the verdict fence', () => {
|
|
224
|
+
const fenceBuilder = lensPromptBody('buildVerdictFenceSteps');
|
|
234
225
|
assert.match(
|
|
235
|
-
|
|
226
|
+
fenceBuilder,
|
|
227
|
+
/--manifest-hash-for-branch/,
|
|
228
|
+
'expected the binding-hash command to use --manifest-hash-for-branch (cwd-immune)',
|
|
229
|
+
);
|
|
230
|
+
assert.doesNotMatch(
|
|
231
|
+
fenceBuilder,
|
|
232
|
+
/--manifest-hash(?!-for-branch)/,
|
|
233
|
+
'expected the old --manifest-hash <REPO> form to be removed in favour of --manifest-hash-for-branch',
|
|
234
|
+
);
|
|
235
|
+
assert.match(
|
|
236
|
+
fenceBuilder,
|
|
236
237
|
/verification_verdict_store\.py/,
|
|
237
238
|
'expected the verdict-store script that computes the binding hash to be named',
|
|
238
239
|
);
|
|
239
|
-
assert.match(
|
|
240
|
-
assert.match(
|
|
240
|
+
assert.match(fenceBuilder, /```verdict/, 'expected the verdict fence to be specified');
|
|
241
|
+
assert.match(fenceBuilder, /manifest_sha256/, 'expected the verdict fence to carry manifest_sha256');
|
|
242
|
+
assert.match(
|
|
243
|
+
fenceBuilder,
|
|
244
|
+
/gh pr view/,
|
|
245
|
+
'expected buildVerdictFenceSteps to resolve the head branch via gh pr view (cwd-immune)',
|
|
246
|
+
);
|
|
247
|
+
assert.match(
|
|
248
|
+
fenceBuilder,
|
|
249
|
+
/headRefName/,
|
|
250
|
+
'expected buildVerdictFenceSteps to extract the headRefName from gh pr view output',
|
|
251
|
+
);
|
|
252
|
+
});
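Reviewer sketch, not part of the package: the negative lookahead used in the contract test above, shown in Python (whose `re` module accepts the same `(?!...)` syntax as JavaScript). The two prompt strings are made-up samples, not text from `converge.mjs`.

```python
import re

# Matches the bare flag only when it is NOT immediately followed by "-for-branch".
OLD_FLAG_REGEX = re.compile(r"--manifest-hash(?!-for-branch)")
# Without the lookahead, the bare flag is a prefix of the new flag and matches both.
ANY_FLAG_REGEX = re.compile(r"--manifest-hash")

new_prompt = 'verification_verdict_store.py --manifest-hash-for-branch "feature"'
old_prompt = 'verification_verdict_store.py --manifest-hash "/repo"'

print(OLD_FLAG_REGEX.search(new_prompt) is None)
print(OLD_FLAG_REGEX.search(old_prompt) is not None)
```

The lookahead is what lets the test assert that the old form is gone without tripping on the new flag that shares its prefix.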
|
|
253
|
+
|
|
254
|
+
test('the verdict-fence binding does not self-resolve a cwd via git rev-parse for the manifest hash', () => {
|
|
255
|
+
const fenceBuilder = lensPromptBody('buildVerdictFenceSteps');
|
|
256
|
+
assert.doesNotMatch(
|
|
257
|
+
fenceBuilder,
|
|
258
|
+
/git rev-parse --show-toplevel/,
|
|
259
|
+
'expected the binding hash to be cwd-immune (no git rev-parse in the binding step)',
|
|
260
|
+
);
|
|
241
261
|
});
|
|
242
262
|
|
|
243
|
-
test('every verify step
|
|
263
|
+
test('every verify step calls buildVerdictFenceSteps, uses code-verifier, and forbids edits', () => {
|
|
244
264
|
for (const verifyFunctionName of [
|
|
245
265
|
'verifyFixesInWorkingTree',
|
|
246
266
|
'verifyRepairChanges',
|
|
247
|
-
'verifyHardeningChanges',
|
|
248
267
|
]) {
|
|
249
268
|
const verifyBody = lensPromptBody(verifyFunctionName);
|
|
250
269
|
assert.match(
|
|
251
270
|
verifyBody,
|
|
252
|
-
/
|
|
253
|
-
`expected ${verifyFunctionName} to
|
|
271
|
+
/buildVerdictFenceSteps\(/,
|
|
272
|
+
`expected ${verifyFunctionName} to call buildVerdictFenceSteps (cwd-immune branch binding)`,
|
|
273
|
+
);
|
|
274
|
+
assert.doesNotMatch(
|
|
275
|
+
verifyBody,
|
|
276
|
+
/VERDICT_FENCE_STEPS(?!\s*\))/,
|
|
277
|
+
`expected ${verifyFunctionName} not to reference the removed VERDICT_FENCE_STEPS constant`,
|
|
254
278
|
);
|
|
255
279
|
assert.match(
|
|
256
280
|
verifyBody,
|
|
@@ -270,6 +294,46 @@ test('every verify step reuses the shared verdict-fence steps, uses code-verifie
|
|
|
270
294
|
}
|
|
271
295
|
});
|
|
272
296
|
|
|
297
|
+
test('verifyHardeningChanges uses --manifest-hash-for-branch with the hardening branch, uses code-verifier, and forbids edits', () => {
|
|
298
|
+
const verifyBody = lensPromptBody('verifyHardeningChanges');
|
|
299
|
+
assert.match(
|
|
300
|
+
verifyBody,
|
|
301
|
+
/--manifest-hash-for-branch/,
|
|
302
|
+
'expected verifyHardeningChanges to bind by hardening branch (cwd-immune)',
|
|
303
|
+
);
|
|
304
|
+
assert.doesNotMatch(
|
|
305
|
+
verifyBody,
|
|
306
|
+
/--manifest-hash(?!-for-branch)/,
|
|
307
|
+
'expected verifyHardeningChanges not to use the old --manifest-hash <REPO> form',
|
|
308
|
+
);
|
|
309
|
+
assert.match(
|
|
310
|
+
verifyBody,
|
|
311
|
+
/agentType:\s*'code-verifier'/,
|
|
312
|
+
'expected verifyHardeningChanges to spawn the code-verifier agent type',
|
|
313
|
+
);
|
|
314
|
+
assert.doesNotMatch(
|
|
315
|
+
verifyBody,
|
|
316
|
+
/schema:/,
|
|
317
|
+
'expected verifyHardeningChanges to pass no schema so its verdict fence stays as assistant text',
|
|
318
|
+
);
|
|
319
|
+
assert.match(
|
|
320
|
+
verifyBody,
|
|
321
|
+
/do no edits|make no edits|not edit|no file edits/i,
|
|
322
|
+
'expected verifyHardeningChanges to be told to make no edits',
|
|
323
|
+
);
|
|
324
|
+
});
|
|
325
|
+
|
|
326
|
+
test('verifyFixesInWorkingTree and verifyRepairChanges pass input.owner, input.repo, input.prNumber to buildVerdictFenceSteps', () => {
|
|
327
|
+
for (const verifyFunctionName of ['verifyFixesInWorkingTree', 'verifyRepairChanges']) {
|
|
328
|
+
const verifyBody = lensPromptBody(verifyFunctionName);
|
|
329
|
+
assert.match(
|
|
330
|
+
verifyBody,
|
|
331
|
+
/buildVerdictFenceSteps\(input\.owner,\s*input\.repo,\s*input\.prNumber\)/,
|
|
332
|
+
`expected ${verifyFunctionName} to pass PR coordinates to buildVerdictFenceSteps`,
|
|
333
|
+
);
|
|
334
|
+
}
|
|
335
|
+
});
|
|
336
|
+
|
|
273
337
|
test('the commit step is instructed to make no further file edits', () => {
|
|
274
338
|
const commitBody = lensPromptBody('commitVerifiedFixes');
|
|
275
339
|
assert.match(
|
|
@@ -141,15 +141,33 @@ const STANDARDS_EDIT_SCHEMA = {
|
|
|
141
141
|
required: ['issueUrl', 'hardeningRepoPath', 'hardeningBranch', 'hardeningEdited', 'summary'],
|
|
142
142
|
}
|
|
143
143
|
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
144
|
+
/**
|
|
145
|
+
* Build the verdict-fence step instructions for a verify agent, binding the
|
|
146
|
+
* surface hash by branch name rather than by a self-resolved cwd. Resolving
|
|
147
|
+
* the branch via `gh pr view` is cwd-immune: it does not matter which worktree
|
|
148
|
+
* the verify agent runs in, so a launcher session whose cwd is a different
|
|
149
|
+
* worktree cannot poison the binding hash.
|
|
150
|
+
* @param {string} prOwner GitHub owner of the repo that holds the branch
|
|
151
|
+
* @param {string} prRepo GitHub repo name
|
|
152
|
+
* @param {number|string} prNumber PR number used to resolve the head branch
|
|
153
|
+
* @returns {string} binding-hash and verdict-fence instructions for a verify prompt
|
|
154
|
+
*/
|
|
155
|
+
function buildVerdictFenceSteps(prOwner, prRepo, prNumber) {
|
|
156
|
+
return (
|
|
157
|
+
`Compute the binding hash for the live surface:\n` +
|
|
158
|
+
` a. Resolve the PR head branch (cwd-immune): run exactly\n` +
|
|
159
|
+
` gh pr view ${prNumber} --repo ${prOwner}/${prRepo} --json headRefName -q .headRefName\n` +
|
|
160
|
+
` Capture the branch name printed on stdout.\n` +
|
|
161
|
+
` b. Run exactly:\n` +
|
|
162
|
+
` "C:\\Python313\\python.exe" "<REPO>/packages/claude-dev-env/hooks/blocking/verification_verdict_store.py" --manifest-hash-for-branch "<that branch>"\n` +
|
|
163
|
+
` (substitute the REPO path you resolved for the script path, and the branch name for <that branch>). That prints a single 64-char hex hash on stdout — capture it.\n` +
|
|
164
|
+
`Then END your message with a fenced verdict block exactly in this shape, on its own, carrying that hash:\n` +
|
|
165
|
+
" ```verdict\n" +
|
|
166
|
+
` {"all_pass": true, "findings": [], "manifest_sha256": "<that hash>"}\n` +
|
|
167
|
+
" ```\n" +
|
|
168
|
+
` When verification fails, set all_pass to false and list the unresolved concerns in findings; still include the manifest_sha256. The verdict fence must be the last thing in your message.`
|
|
169
|
+
)
|
|
170
|
+
}
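Reviewer sketch, not part of the package: one way a consumer might read the trailing verdict fence that these instructions ask the verifier to emit. The regular expression is hypothetical; the package's own `VERDICT_FENCE_PATTERN` is not shown in this diff and may differ. `parse_trailing_verdict` is likewise a hypothetical name. The fence marker is built with `"`" * 3` only to keep literal triple backticks out of the sample.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3
# Hypothetical pattern: a verdict fence that is the last thing in the message.
VERDICT_FENCE_REGEX = re.compile(
    FENCE + r"verdict\s*\n(.*?)\n\s*" + FENCE + r"\s*\Z", re.DOTALL
)


def parse_trailing_verdict(message: str) -> Optional[dict]:
    match = VERDICT_FENCE_REGEX.search(message)
    if match is None:
        return None
    try:
        record = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if not isinstance(record.get("all_pass"), bool):
        return None
    if not isinstance(record.get("findings"), list):
        return None
    return record


SAMPLE_HASH = "ab" * 32  # placeholder 64-char hex string, not a real surface hash
SAMPLE_MESSAGE = (
    "Read the diff and ran the tests.\n"
    "  " + FENCE + "verdict\n"
    '  {"all_pass": true, "findings": [], "manifest_sha256": "' + SAMPLE_HASH + '"}\n'
    "  " + FENCE + "\n"
)

print(parse_trailing_verdict(SAMPLE_MESSAGE))
```

Anchoring the pattern at the end of the message reflects the instruction that the fence be the last thing the verifier writes; any text after the closing fence makes the sketch return `None`.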
|
|
153
171
|
|
|
154
172
|
const CONVERGENCE_SUMMARY_SCHEMA = {
|
|
155
173
|
type: 'object',
|
|
@@ -713,9 +731,9 @@ function verifyFixesInWorkingTree(head, findings, sourceLabel) {
|
|
|
713
731
|
`You are the VERIFY step for ${findings.length} finding(s) (${sourceLabel}) on ${prCoordinates}, HEAD ${head}. The edit step left fixes in the working tree, uncommitted. Do NO edits of any kind — verification only; any edit invalidates the verdict you are about to emit.\n\n` +
|
|
714
732
|
`Findings the working-tree fixes must address:\n${findingsBlock}\n\n` +
|
|
715
733
|
`Steps:\n` +
|
|
716
|
-
`1. Resolve the worktree repo root: REPO=$(git rev-parse --show-toplevel).\n` +
|
|
734
|
+
`1. Resolve the worktree repo root for running tests: REPO=$(git rev-parse --show-toplevel).\n` +
|
|
717
735
|
`2. Verify the uncommitted working-tree changes resolve every finding above: run the relevant tests and the named gates against the working tree. Read the diff (git diff) and confirm each finding is fixed test-first per CODE_RULES.\n` +
|
|
718
|
-
`3. ${
|
|
736
|
+
`3. ${buildVerdictFenceSteps(input.owner, input.repo, input.prNumber)}`,
|
|
719
737
|
{ label: `fix-verify:${sourceLabel}`, phase: 'Converge', agentType: 'code-verifier' },
|
|
720
738
|
)
|
|
721
739
|
}
|
|
@@ -935,9 +953,9 @@ function verifyRepairChanges(head, failures) {
|
|
|
935
953
|
`You are the VERIFY step for the convergence repair on ${prCoordinates}, HEAD ${head}. The edit step left its repair in the working tree (a bot-thread fix uncommitted, and/or a rebase onto origin/main), unpushed. Do NO edits of any kind — verification only; any edit invalidates the verdict you are about to emit.\n\n` +
|
|
936
954
|
`Concerns the working-tree repair must resolve (the gates the convergence check flagged):\n${failureBlock}\n\n` +
|
|
937
955
|
`Steps:\n` +
|
|
938
|
-
`1. Resolve the worktree repo root: REPO=$(git rev-parse --show-toplevel).\n` +
|
|
956
|
+
`1. Resolve the worktree repo root for running tests: REPO=$(git rev-parse --show-toplevel).\n` +
|
|
939
957
|
`2. Verify the working tree against origin/main: any bot-thread code fix is correct test-first per CODE_RULES, and a rebase (if any) left a clean, conflict-free tree. Read the diff (git diff origin/main) and run the relevant tests and named gates.\n` +
|
|
940
|
-
`3. ${
|
|
958
|
+
`3. ${buildVerdictFenceSteps(input.owner, input.repo, input.prNumber)}`,
|
|
941
959
|
{ label: 'repair-verify', phase: 'Finalize', agentType: 'code-verifier' },
|
|
942
960
|
)
|
|
943
961
|
}
|
|
@@ -1044,22 +1062,32 @@ function standardsFollowUpEdit(head, findings, sourceLabel) {
|
|
|
1044
1062
|
/**
|
|
1045
1063
|
* Standards-hardening verify step: a code-verifier confirms the uncommitted
|
|
1046
1064
|
* hooks/rules change staged in the hardening repo blocks the deferred violation
|
|
1047
|
-
* classes, computes the binding surface hash for that repo
|
|
1048
|
-
* verdict fence as plain assistant text (NO schema) — unlocking the
|
|
1065
|
+
* classes, computes the binding surface hash for that repo by branch (cwd-immune),
|
|
1066
|
+
* and ends with a verdict fence as plain assistant text (NO schema) — unlocking the
|
|
1049
1067
|
* verified-commit gate for the cross-repo hardening commit. The verifier makes
|
|
1050
1068
|
* no edits.
|
|
1051
1069
|
* @param {string} hardeningRepoPath absolute path of the hardening repo checkout the edit staged
|
|
1070
|
+
* @param {string} hardeningBranch the branch in the hardening repo that the edit staged the change on
|
|
1052
1071
|
* @param {string} sourceLabel short description of where the findings came from
|
|
1053
1072
|
* @returns {Promise<string>} the verifier transcript carrying the verdict fence
|
|
1054
1073
|
*/
|
|
1055
|
-
function verifyHardeningChanges(hardeningRepoPath, sourceLabel) {
|
|
1074
|
+
function verifyHardeningChanges(hardeningRepoPath, hardeningBranch, sourceLabel) {
|
|
1056
1075
|
return convergeAgent(
|
|
1057
1076
|
`You are the VERIFY step for an environment-hardening change (${sourceLabel}) staged in the working tree of ${hardeningRepoPath}. The edit step left the hooks/rules edits uncommitted there. Do NO edits of any kind — verification only; any edit invalidates the verdict you are about to emit.\n\n` +
|
|
1058
1077
|
`Concern the working-tree change must resolve: the edited hooks/rules block the code-standard violation classes from the deferred round at Write/Edit time, and a hook change carries a passing test per CODE_RULES.\n\n` +
|
|
1059
1078
|
`Steps:\n` +
|
|
1060
1079
|
`1. cd into ${hardeningRepoPath}, then resolve its repo root: REPO=$(git rev-parse --show-toplevel).\n` +
|
|
1061
1080
|
`2. Verify the uncommitted working-tree change in REPO: read the diff (git diff) and run the hook/rule tests in that repo, confirming each violation class is now blocked.\n` +
|
|
1062
|
-
`3.
|
|
1081
|
+
`3. Compute the binding hash for the live surface:\n` +
|
|
1082
|
+
` The hardening branch is: ${hardeningBranch}\n` +
|
|
1083
|
+
` Run exactly:\n` +
|
|
1084
|
+
` "C:\\Python313\\python.exe" "<REPO>/packages/claude-dev-env/hooks/blocking/verification_verdict_store.py" --manifest-hash-for-branch "${hardeningBranch}"\n` +
|
|
1085
|
+
` (substitute the REPO path you resolved for the script path). That prints a single 64-char hex hash on stdout — capture it.\n` +
|
|
1086
|
+
` Then END your message with a fenced verdict block exactly in this shape, on its own, carrying that hash:\n` +
|
|
1087
|
+
" ```verdict\n" +
|
|
1088
|
+
` {"all_pass": true, "findings": [], "manifest_sha256": "<that hash>"}\n` +
|
|
1089
|
+
" ```\n" +
|
|
1090
|
+
` When verification fails, set all_pass to false and list the unresolved concerns in findings; still include the manifest_sha256. The verdict fence must be the last thing in your message.`,
|
|
1063
1091
|
{ label: `standards-verify:${sourceLabel}`, phase: 'Converge', agentType: 'code-verifier' },
|
|
1064
1092
|
)
|
|
1065
1093
|
}
|
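The prompt above asks the verifier to end its message with a verdict fence carrying `all_pass`, `findings`, and a 64-character `manifest_sha256`. The sketch below shows one way a consumer might read such a trailing fence. It is an illustration only: `parse_trailing_verdict` and `verdict_passed` are hypothetical Python names, not the package's `verdictPassed` (whose body this diff does not show), and it assumes the hash is lowercase hex.

```python
import json
import re

# Three backticks, built indirectly so this example can sit inside a fenced block.
FENCE = "`" * 3
OPENER = FENCE + "verdict"
# Assumption: the 64-char hex hash is lowercase, as a SHA-256 hexdigest would be.
HEX_64 = re.compile(r"[0-9a-f]{64}")


def parse_trailing_verdict(transcript):
    """Return the verdict dict when the transcript ends with a verdict fence, else None."""
    text = transcript.rstrip()
    # The fence must be the last thing in the message.
    if not text.endswith(FENCE):
        return None
    opener_index = text.rfind(OPENER)
    if opener_index == -1:
        return None
    body = text[opener_index + len(OPENER):-len(FENCE)]
    try:
        verdict = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(verdict, dict):
        return None
    if not isinstance(verdict.get("all_pass"), bool):
        return None
    if not isinstance(verdict.get("findings"), list):
        return None
    manifest_hash = verdict.get("manifest_sha256")
    if not isinstance(manifest_hash, str) or not HEX_64.fullmatch(manifest_hash):
        return None
    return verdict


def verdict_passed(transcript):
    """True only for a well-formed trailing verdict whose all_pass is true."""
    verdict = parse_trailing_verdict(transcript)
    return verdict is not None and verdict["all_pass"] is True
```

Under these assumptions, a transcript with any text after the closing fence, a malformed JSON body, or a missing hash is treated the same as a failing verdict.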
|
@@ -1123,7 +1151,7 @@ async function spawnStandardsFollowUp(head, findings, sourceLabel) {
|
|
|
1123
1151
|
if (editResult?.hardeningEdited !== true || !editResult?.hardeningRepoPath) {
|
|
1124
1152
|
return { hardeningPrOpened: false }
|
|
1125
1153
|
}
|
|
1126
|
-
const verifyTranscript = await verifyHardeningChanges(editResult.hardeningRepoPath, sourceLabel)
|
|
1154
|
+
const verifyTranscript = await verifyHardeningChanges(editResult.hardeningRepoPath, editResult.hardeningBranch, sourceLabel)
|
|
1127
1155
|
if (!verdictPassed(verifyTranscript)) {
|
|
1128
1156
|
return { hardeningPrOpened: false }
|
|
1129
1157
|
}
|
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: verified-build
|
|
3
|
+
description: >-
|
|
4
|
+
Runs a code task through the two-phase verified workflow: scoped coder
|
|
5
|
+
agents write the changes (consulting the tool-less code-advisor when
|
|
6
|
+
stuck on a decision), then a fresh-context code-verifier agent re-derives
|
|
7
|
+
and runs every check itself. The verifier's fenced verdict is minted by
|
|
8
|
+
the verifier_verdict_minter hook and unlocks the verified_commit_gate for
|
|
9
|
+
git commit/push. Use for feature implementations, refactors, and bug
|
|
10
|
+
fixes that land behind verification. Triggers: 'verified build', 'run
|
|
11
|
+
this verified', 'two-phase build', 'build and verify', 'verified
|
|
12
|
+
implementation'.
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
# Verified Build
|
|
16
|
+
|
|
17
|
+
Two phases, hook-enforced: coders write, a fresh-context verifier grades, and `git commit`/`git push` open only on a clean verdict bound to the live change surface.
|
|
18
|
+
|
|
19
|
+
## Workflow
|
|
20
|
+
|
|
21
|
+
Copy this checklist and check items off as you go:
|
|
22
|
+
|
|
23
|
+
- [ ] **Record baselines.** Before any coder runs: the test command and its exact failure set, plus any other gates the repo names. Scope the test command to the modules the task touches (their test files plus tests importing the changed modules); record a full-suite baseline only when the assignments span multiple modules or multiple coders. The verifier compares against these.
|
|
24
|
+
- [ ] **Scope assignments.** Split the task into file-disjoint assignments; write each as a task text with named checks.
|
|
25
|
+
- [ ] **Spawn coders.** One agent per assignment (`clean-coder` or Sonnet). Tell each: on a decision it can't reasonably solve, consult the tool-less `code-advisor` agent — it returns a plan, a correction, or a stop signal — then resume.
|
|
26
|
+
- [ ] **Settle the tree.** After coders finish: run formatters and any file-rewriting hooks, stage nothing, change nothing more.
|
|
27
|
+
- [ ] **Spawn the verifier last.** Agent tool, `subagent_type: "code-verifier"`, with the task texts, the diff scope, and the recorded baselines. When it stops, the SubagentStop hook mints its verdict.
|
|
28
|
+
- [ ] **Repair only reported findings.** On a failing verdict, spawn repair agents scoped to the findings, then re-spawn the verifier. Repeat until clean.
|
|
29
|
+
- [ ] **Land right away.** One commit, push, draft PR — before anything else touches a file.
|
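The baseline step in the checklist above comes down to a set comparison between the failures recorded before any coder ran and the failures seen afterwards. A minimal sketch, assuming failures are tracked as test identifier strings; `compare_to_baseline` and the test IDs in the usage note are hypothetical and not part of the package:

```python
def compare_to_baseline(baseline_failures, current_failures):
    """Split the current failure set against the recorded pre-change baseline."""
    baseline = set(baseline_failures)
    current = set(current_failures)
    return {
        # Failures that did not exist before the coders ran: real regressions.
        "new_failures": sorted(current - baseline),
        # Baseline failures that no longer occur.
        "fixed": sorted(baseline - current),
        # Pre-existing noise that the change neither caused nor cured.
        "still_failing": sorted(current & baseline),
    }
```

For example, with a baseline of `["tests/test_a.py::test_old"]` and a current run failing `["tests/test_a.py::test_old", "tests/test_b.py::test_new"]`, only `tests/test_b.py::test_new` lands in `new_failures`.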
|
30
|
+
|
|
31
|
+
## Gotchas
|
|
32
|
+
|
|
33
|
+
- Any file change after the verifier stops moves the surface hash and re-locks the gate — formatter rewrites included. Settle the tree first; land right after the clean verdict.
|
|
34
|
+
- The verdict covers the whole branch surface (merge base to work tree, untracked files included). There is no "verify just my part."
|
|
35
|
+
- The verifier must end with a ```` ```verdict ```` fence. No fence means nothing is minted and the gate stays closed.
|
|
36
|
+
- The minter keys on the agent type string `code-verifier` — spawning the same prompt under another agent type mints nothing.
|
|
37
|
+
- A surface whose every change is a docs/image file (by extension), a Python file whose docstring-stripped AST is unchanged (docstring-, comment-, or formatting-only Python edits), or a pytest test file (`test_*.py`, `*_test.py`, `conftest.py`) is exempt automatically; skip the verifier for those. Comment-only edits in non-Python files are not exempt.
|
|
38
|
+
- Record the test baseline before coders start. Without the exact pre-existing failure set, new breakage hides inside old noise.
|
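The exemption for Python files whose docstring-stripped AST is unchanged can be pictured with the standard `ast` module. The sketch below is an assumption about how such a comparison might work, not the hook's actual code; `strip_docstrings` and `same_code_ignoring_docstrings` are hypothetical names.

```python
import ast

# Node types whose first body statement may be a docstring.
DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(tree):
    """Remove the leading docstring expression from every module, class, and function."""
    for node in ast.walk(tree):
        if not isinstance(node, DOCSTRING_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            node.body = body[1:]
    return tree


def same_code_ignoring_docstrings(before_source, after_source):
    """True when two sources differ only in docstrings, comments, or formatting."""
    before_tree = strip_docstrings(ast.parse(before_source))
    after_tree = strip_docstrings(ast.parse(after_source))
    # ast.dump omits line and column attributes by default, and comments never
    # reach the AST, so only structural differences remain.
    return ast.dump(before_tree) == ast.dump(after_tree)
```

Because comments and whitespace never reach the AST, this comparison covers the docstring-, comment-, and formatting-only cases the gotcha names. A comment edit in a non-Python file has no such parse step to lean on.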