npm - @windyroad/risk-scorer - Versions diffs - 0.9.0 → 0.10.0-preview.325 - Mend

@windyroad/risk-scorer 0.9.0 → 0.10.0-preview.325

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/.claude-plugin/plugin.json +1 -1
package/agents/external-comms.md +7 -15
package/agents/pipeline.md +26 -5
package/hooks/external-comms-gate.sh +12 -3
package/hooks/lib/external-comms-key.sh +44 -0
package/hooks/risk-score-mark.sh +35 -8
package/hooks/test/risk-score-mark-external-comms-prompt-parse.bats +94 -0
package/package.json +1 -1
package/scripts/evaluate-graduation.sh +251 -37
package/scripts/test/evaluate-graduation.bats +205 -4
package/skills/assess-external-comms/SKILL.md +18 -6

package/.claude-plugin/plugin.json CHANGED

@@ -1,5 +1,5 @@
 {
   "name": "wr-risk-scorer",
-  "version": "0.9.0",
+  "version": "0.10.0",
   "description": "Pipeline risk scoring, commit/push/release gates for Claude Code"
 }

package/agents/external-comms.md CHANGED

@@ -10,14 +10,14 @@ model: inherit
 You are the External-Comms Risk Reviewer. Your single job: read the draft of an outbound prose tool call (a `gh issue create --body ...`, a PR description, a security-advisory body, a `.changeset/*.md` file, or the README diff that `npm publish` will publish) and return a structured PASS/FAIL verdict against RISK-POLICY.md's Confidential Information classes.
-You are read-only. You do NOT write files, do NOT commit, do NOT modify the draft. Your verdict is consumed by the `risk-score-mark.sh` PostToolUse hook (P064 / ADR-028 amended), which writes the marker that allows the gated tool call to proceed.
+You are read-only. You do NOT write files, do NOT commit, do NOT modify the draft. Your verdict is consumed by the `risk-score-mark.sh` PostToolUse hook (P064 / ADR-028 amended 2026-05-14 + 2026-05-16), which derives the marker key from the prompt structure you receive and writes the marker that allows the gated tool call to proceed.
 ## What you receive
-The invoking skill (`/wr-risk-scorer:assess-external-comms`) or the agent that hit the gate provides:
+The invoking skill (`/wr-risk-scorer:assess-external-comms`) or the agent that hit the gate provides a structured prompt (P166 / ADR-028 amended 2026-05-16):
-- The **draft body** verbatim — the exact prose that would land on the external surface.
-- The **target surface** — one of: `gh-issue-create`, `gh-issue-comment`, `gh-issue-edit`, `gh-pr-create`, `gh-pr-comment`, `gh-pr-edit`, `gh-api-security-advisories`, `gh-api-comments`, `npm-publish`, `changeset-author`.
+- A leading `SURFACE: <name>` line — one of: `gh-issue-create`, `gh-issue-comment`, `gh-issue-edit`, `gh-pr-create`, `gh-pr-comment`, `gh-pr-edit`, `gh-api-security-advisories`, `gh-api-comments`, `npm-publish`, `changeset-author`.
+- The **draft body** verbatim, wrapped in `<draft>...</draft>` markers so the PostToolUse hook can extract it for marker-key derivation.
 - The **destination** when known (e.g. `anthropics/claude-code#52831`).
 Read `RISK-POLICY.md` (project root) to get the authoritative Confidential Information class list. As of P064 it covers:
@@ -42,28 +42,20 @@ The hybrid pre-filter (`packages/*/hooks/lib/leak-detect.sh`) has already caught
 ## Verdict format (MANDATORY)
-End your report with a structured block consumed by `risk-score-mark.sh`. Every field is required.
+End your report with a structured block consumed by `risk-score-mark.sh`:
 ```
 EXTERNAL_COMMS_RISK_VERDICT: PASS
-EXTERNAL_COMMS_RISK_KEY: <sha256 hex string>
 ```
 OR for a failed review:
 ```
 EXTERNAL_COMMS_RISK_VERDICT: FAIL
-EXTERNAL_COMMS_RISK_KEY: <sha256 hex string>
 EXTERNAL_COMMS_RISK_REASON: <one-line description of the leak class + matched fragment>
 ```
-Compute the key as:
-```
-printf '%s\n%s' "<draft body verbatim>" "<surface name>" | shasum -a 256 | cut -d' ' -f1
-```
-The key MUST match the gate's computation exactly — a key mismatch means the marker is written for a different draft and the original gated call will continue to deny.
+You do NOT need to emit `EXTERNAL_COMMS_RISK_KEY`. The PostToolUse hook derives the marker key directly from the `SURFACE:` line and `<draft>...</draft>` block in the prompt you received (P166 / ADR-028 amended 2026-05-16). Single fire per gate cycle.
 ## Grounding (ADR-026)
@@ -82,7 +74,7 @@ Example:
 - You are a reviewer, not an editor — do NOT propose rewrites in the verdict block. (Free prose suggestions outside the verdict block are fine and helpful.)
 - Do NOT score by analogy when the policy names the class.
 - Do NOT write to `/tmp/` or any marker location yourself — the PostToolUse hook owns that.
-- Do NOT skip the `EXTERNAL_COMMS_RISK_KEY` line; without it, the marker hook has no key to write the marker against and the gate will deny again on retry.
+- You do NOT need to emit `EXTERNAL_COMMS_RISK_KEY` — the hook derives the key from the prompt's `SURFACE:` + `<draft>` structure (P166 / ADR-028 amended 2026-05-16). If your prompt lacks that structure (legacy caller), the hook falls back to an emitted KEY line for backward compatibility, but the canonical path is hook-side derivation.
 - When the draft is empty (e.g. `npm publish` with no extractable body fragment), review the staged content the publish would push (README diff, package.json description) instead. If neither is available, FAIL with reason "draft body unresolvable; cannot risk-review without text" so the user can pre-review manually.
 ## Below-Appetite Output Rule (ADR-013 Rule 5)
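
The prompt contract described in this file (a leading `SURFACE:` line, the draft wrapped in `<draft>...</draft>`, and a verdict line in the agent's reply) can be sketched as follows. This is a minimal illustration, not the package's code: `build_review_prompt`, `parse_verdict`, and the `Destination:` label are hypothetical names chosen for the example, and the verdict parsing only mirrors the `grep ... | tail -1` plus whitespace-strip behaviour visible in `risk-score-mark.sh`.

```python
from typing import Optional


def build_review_prompt(surface: str, draft: str, destination: Optional[str] = None) -> str:
    """Assemble a prompt in the SURFACE + <draft> shape the mark hook parses."""
    lines = [f"SURFACE: {surface}", "<draft>", draft, "</draft>"]
    if destination:
        # Hypothetical label; the source only says the destination is passed "when known".
        lines.append(f"Destination: {destination}")
    return "\n".join(lines)


def parse_verdict(agent_output: str) -> str:
    """Return the value of the last EXTERNAL_COMMS_RISK_VERDICT line, or ''."""
    verdict = ""
    for line in agent_output.splitlines():
        if line.startswith("EXTERNAL_COMMS_RISK_VERDICT:"):
            # Drop all whitespace, as the hook's `tr -d '[:space:]'` does.
            verdict = "".join(line.split(":", 1)[1].split())
    return verdict


prompt = build_review_prompt("gh-issue-create", "hello", "anthropics/claude-code#52831")
reply = "no Confidential Information class matched\nEXTERNAL_COMMS_RISK_VERDICT: PASS"
print(prompt)
print(parse_verdict(reply))
```

Because the hook derives the key from the prompt, the draft placed between the markers presumably has to be the same text the gate hashed; any paraphrase would yield a different key.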

package/agents/pipeline.md CHANGED

@@ -269,14 +269,17 @@ This is the symmetric counterpart to ADR-042 Rule 2's move-to-holding contract.
 ### Mechanism — invoke the deterministic graduation evaluator
-The Rule 1a join (changeset → problem ID → ticket Priority) and the Rule 2 VP carve-out detection are deterministic lookups. Invoke the `wr-risk-scorer-evaluate-graduation` shim (ADR-049 `$PATH`-resolved) to read structured candidate lines for each held changeset:
+The Rule 1a join (changeset → problem ID → ticket Priority), the Rule 2 VP carve-out detection, and the Rule 3b cohort grouping are deterministic lookups. Invoke the `wr-risk-scorer-evaluate-graduation` shim (ADR-049 `$PATH`-resolved) to read structured candidate lines for each held changeset:
 ```
 GRADUATION_CANDIDATE: changeset=<filename> | ticket=P<NNN> | priority=<N> | class=3a | status=<resolved|vp-blocked|halt-no-resolution>
+GRADUATION_CANDIDATE: changeset=<filename> | ticket=P<NNN> | priority=<cohort-max-N> | class=3b | cohort=<id> | status=<resolved|vp-blocked|halt-no-resolution>
 GRADUATION_SUMMARY: total=<N> resolved=<N> vp_blocked=<N> halts=<N>
 ```
-The script does NOT compute release-risk and does NOT apply Rule 4 evidence-floor judgement — those are LLM-judgement surfaces you own per ADR-015's pure-scorer contract. The script's job is to emit candidates with their joined Priority; your job is to decide whether each candidate's release-risk + evidence-floor profile justifies emitting a `reinstate-from-holding` remediation line.
+Class 3b lines insert a `cohort=<id>` column between `class` and `status`. The cohort id is derived from the normalised reinstate-trigger prose (first 8 tokens, kebab-sanitised) of the `docs/changesets-holding/README.md` "Currently held" entries that share an identical normalised trigger. Cohort `priority` is `max(Priority)` across all member tickets per ADR-061 Rule 3b; cohort `status` propagates atomically — any halt → cohort halts, any VP-blocked → cohort VP-blocked, otherwise cohort resolved. Single-member "cohorts" are emitted as class=3a (no Phase 2a regression).
+The script does NOT compute release-risk and does NOT apply Rule 4 evidence-floor judgement — those are LLM-judgement surfaces you own per ADR-015's pure-scorer contract. The script's job is to emit candidates with their joined Priority + cohort classification; your job is to decide whether each candidate's release-risk + evidence-floor profile justifies emitting a `reinstate-from-holding` remediation line.
 ### Per-candidate evaluation rules
@@ -308,13 +311,31 @@ For each `status=halt-no-resolution` candidate (Rule 1a terminal — no ticket r
 - **DO NOT auto-graduate**. Surface the unresolved candidate in your report body under an "Unresolvable graduation candidates" section so the caller (orchestrator) sees the join failure and can present it as a user-decision surface per ADR-013 + ADR-044 framework-resolution boundary. Per ADR-061 Rule 1a, join ambiguity is a user-decision surface, not an agent-decision surface.
-### Scope — Phase 2a only
+### Class 3b atomic-cohort evaluation (Phase 2b — ADR-061 Rule 3b)
+When candidate lines emit `class=3b` with a `cohort=<id>` column, ADR-061 Rule 3b applies: **the entire cohort ships atomically or none of it does**. Per-member graduation is not authorised. Evaluate the cohort as a single unit:
+1. **Group candidates by cohort id** — collect all `class=3b` candidates sharing the same `cohort=` column into a single evaluation set.
+2. **Compute cohort release-risk** — re-score the current pipeline as if the **full cohort** were `git mv`'d back to `.changeset/` together (not one at a time). The marginal release-risk delta is computed against the cohort's combined diff surface, not any single member's diff.
+3. **Compare against cohort priority** — the `priority=<cohort-max-N>` column on every cohort-member line already carries `max(Priority)` across all member tickets (deterministic join, Rule 3b math). Apply Rule 1: cohort graduates when `cohort-release-risk ≤ cohort-priority`.
+4. **Verify Rule 4 evidence floor per cohort** — every cohort member must independently satisfy its class-specific evidence shape (PreToolUse:Bash gate / UserPromptSubmit detector / commit-hook-with-auto-fix / SessionStart additionalContext). One floor failure in any member blocks the whole cohort. Per ADR-026 cite + persist + uncertainty: cite the artefact for each member in the audit trail.
+5. **Cohort-level VP carve-out** — if the deterministic evaluator already returned `status=vp-blocked` for the cohort (any member's ticket in Verification Pending), DO NOT emit a reinstate. The carve-out lifts when all member tickets transition out of `.verifying.md`.
+6. **Cohort-level halt-and-prompt** — if the deterministic evaluator returned `status=halt-no-resolution` for the cohort (any member fails Rule 1a join), DO NOT auto-graduate. Surface the cohort in the "Unresolvable graduation candidates" section. Per architect C1 (2026-05-17 P162 Phase 2b review), partial-cohort resolution is NOT authorised — the cohort is atomic.
+7. **Emit one `reinstate-from-holding` line per cohort member** when all six checks pass, all referencing the same cohort id so the consuming orchestrator can apply them as an atomic batch:
+   ```
+   RISK_REMEDIATIONS:
+   - R<N> | reinstate-from-holding <member-1>: cohort <id> release-risk <release-score>/25 ≤ cohort-priority <priority-value>; class 3b; evidence: <member-1 artefact citation> | S | -<release-score-share> | docs/changesets-holding/<member-1>, .changeset/<member-1>
+   - R<N+1> | reinstate-from-holding <member-2>: cohort <id> release-risk <release-score>/25 ≤ cohort-priority <priority-value>; class 3b; evidence: <member-2 artefact citation> | S | -<release-score-share> | docs/changesets-holding/<member-2>, .changeset/<member-2>
+   ```
+   The agent consuming these lines applies them as a single batch — either all members reinstate in one operation or none do. Partial application breaks ADR-061 Rule 3b atomicity.
-This evaluation surface covers **orthogonal-gate class (3a) only** per ADR-061 Rule 3. Atomic-cohort class (3b — RFC-shaped held changesets that graduate as a single atomic unit per ADR-060 finding 12) requires RFC ticket cohort enumeration and is **deferred to Phase 2b**. When the holding-area contains entries that belong to an RFC cohort, the Phase 2a evaluator emits each entry as an independent 3a candidate; treat such candidates conservatively (the symmetric-balance math is identical but the evaluation unit is wrong) and prefer a `RISK_REGISTER_HINT:` over auto-emitting `reinstate-from-holding` until Phase 2b lands the cohort enumeration.
+The cohort id-from-prose detection is the Phase 2b shape per the architect-approved 2026-05-17 design. If cohort grouping false-positives appear (e.g. two unrelated changesets coincidentally sharing trigger prose), ADR-061 Reassessment Triggers ("Manual graduations diverge from criterion verdicts") covers the upgrade to a structured cohort-declaration field.
 ### Audit trail (Rule 6)
-Every emitted `reinstate-from-holding` line MUST cite the resolved problem-ticket ID and Priority value in the description column so the audit trail extends ADR-042 Rule 6. The consuming orchestrator additionally appends to `docs/changesets-holding/README.md` "Recently reinstated" per Rule 6 § 2.
+Every emitted `reinstate-from-holding` line MUST cite the resolved problem-ticket ID and Priority value in the description column so the audit trail extends ADR-042 Rule 6. For Class 3b cohort reinstates, every member line MUST additionally cite the cohort id and the cohort-level priority + release-risk values so the per-member audit row reconstructs the atomic cohort decision. The consuming orchestrator additionally appends to `docs/changesets-holding/README.md` "Recently reinstated" per Rule 6 § 2 with the class (3a or 3b) and, for cohort members, the cohort id.
 ## Confidential Information Disclosure
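
The cohort-level propagation this file describes (cohort `priority` is the maximum across members, cohort `status` propagates atomically, single-member groups stay `class=3a`) can be sketched as below. This is an illustrative reading of the prose, not the evaluator's implementation: `propagate_cohorts`, `format_candidate`, and the record shape are hypothetical, the halt-over-VP-blocked precedence is inferred from the order in which the rule is stated, and the sketch assumes every member carries an integer priority and a cohort id string.

```python
from collections import defaultdict

# Precedence as read from the rule text: any halt wins, then any VP-blocked.
STATUS_RANK = {"halt-no-resolution": 2, "vp-blocked": 1, "resolved": 0}


def propagate_cohorts(members):
    """Apply cohort-level priority and status to each member record."""
    groups = defaultdict(list)
    for member in members:
        groups[member["cohort"]].append(member)

    out = []
    for member in members:
        group = groups[member["cohort"]]
        if len(group) < 2:
            # Single-member "cohort": emitted as class 3a, no cohort column.
            out.append({**member, "class": "3a", "cohort": None})
            continue
        out.append({
            **member,
            "class": "3b",
            "priority": max(g["priority"] for g in group),
            "status": max((g["status"] for g in group), key=STATUS_RANK.__getitem__),
        })
    return out


def format_candidate(c):
    """Render one record in the GRADUATION_CANDIDATE column layout."""
    cols = [
        f"changeset={c['changeset']}",
        f"ticket={c['ticket']}",
        f"priority={c['priority']}",
        f"class={c['class']}",
    ]
    if c["class"] == "3b":
        cols.append(f"cohort={c['cohort']}")
    cols.append(f"status={c['status']}")
    return "GRADUATION_CANDIDATE: " + " | ".join(cols)


members = [
    {"changeset": "a.md", "ticket": "P101", "priority": 6, "status": "resolved", "cohort": "rfc-lands"},
    {"changeset": "b.md", "ticket": "P102", "priority": 12, "status": "vp-blocked", "cohort": "rfc-lands"},
    {"changeset": "c.md", "ticket": "P103", "priority": 4, "status": "resolved", "cohort": "solo"},
]
for candidate in propagate_cohorts(members):
    print(format_candidate(candidate))
```

In this made-up input, both `rfc-lands` members report priority 12 and `vp-blocked`, since one VP-blocked member blocks the whole cohort, while `c.md` stays a plain 3a candidate. The sketch does not model how a member without a resolved ticket reports its priority.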

package/hooks/external-comms-gate.sh CHANGED

@@ -31,7 +31,12 @@
 # Marker location: ${TMPDIR:-/tmp}/claude-risk-${SESSION_ID}/external-comms-<EVALUATOR_ID>-reviewed-<sha256>
 # Marker writer:   PostToolUse:Agent hook in each consumer plugin
 #                  (risk-score-mark.sh or external-comms-mark-reviewed.sh) on
-#                  subagent type wr-<plugin>:external-comms.
+#                  subagent type wr-<plugin>:external-comms. The mark hook
+#                  derives the marker key from the agent's tool_input.prompt
+#                  by parsing the same `SURFACE:` + `<draft>` structure the
+#                  orchestrator was instructed to include (P166 / ADR-028
+#                  amended 2026-05-16). Single fire per gate cycle suffices;
+#                  the agent no longer needs to compute the key itself.
 #
 # Per-evaluator marker scheme (ADR-028 amended 2026-05-14): when both
 # voice-tone and risk-scorer are installed, both gates fire on the same
@@ -234,8 +239,12 @@ if [ -f "$MARKER" ]; then
 fi
 # Marker absent — deny + delegate.
+# P166: instruct the orchestrator to structure the agent prompt with a
+# leading `SURFACE: <name>` line and a `<draft>...</draft>` block so the
+# PostToolUse mark hook can derive the canonical marker key locally
+# (sha256(DRAFT + '\n' + SURFACE)). Single fire per gate cycle.
 VERDICT_PREFIX="${EXTERNAL_COMMS_VERDICT_PREFIX:-EXTERNAL_COMMS_${EXTERNAL_COMMS_EVALUATOR_ID^^}}"
-REASON=$(printf 'BLOCKED (external-comms gate / %s evaluator): %s draft has not been reviewed by %s. Delegate to %s (subagent_type: '"'"'%s'"'"') with the draft body for review. The PostToolUse hook will mark this draft reviewed when the subagent emits %s_VERDICT: PASS. Use %s for an interactive walkthrough. Override only when intentional: BYPASS_RISK_GATE=1.' \
-    "$EXTERNAL_COMMS_EVALUATOR_ID" "$SURFACE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$VERDICT_PREFIX" "$EXTERNAL_COMMS_ASSESS_SKILL")
+REASON=$(printf 'BLOCKED (external-comms gate / %s evaluator): %s draft has not been reviewed by %s. Delegate to %s (subagent_type: '"'"'%s'"'"') with a prompt that starts with the line `SURFACE: %s` and wraps the draft body verbatim inside `<draft>...</draft>` markers. The PostToolUse hook derives the marker key from that structure and marks the draft reviewed when the subagent emits %s_VERDICT: PASS — single fire suffices. Use %s for an interactive walkthrough. Override only when intentional: BYPASS_RISK_GATE=1.' \
+    "$EXTERNAL_COMMS_EVALUATOR_ID" "$SURFACE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$SURFACE" "$VERDICT_PREFIX" "$EXTERNAL_COMMS_ASSESS_SKILL")
 deny_with_reason "$REASON"
 exit 0

package/hooks/lib/external-comms-key.sh ADDED

@@ -0,0 +1,44 @@
+#!/bin/bash
+# Shared helper: derive the external-comms marker key from an agent's
+# tool_input.prompt by extracting the structured `SURFACE: <name>` line
+# and `<draft>...</draft>` block, then computing
+# sha256(DRAFT + '\n' + SURFACE) — the same key shape the gate computes
+# at PreToolUse time (external-comms-gate.sh line 229).
+#
+# P166 + ADR-028 amended 2026-05-16: the PostToolUse:Agent mark hook
+# derives the marker key from observed runtime state instead of trusting
+# an agent-emitted EXTERNAL_COMMS_<EVAL>_KEY line. Removes the
+# double-invocation cost class — single fire per gate cycle suffices.
+#
+# Canonical source: packages/shared/hooks/lib/external-comms-key.sh
+# Synced byte-identically into each consumer plugin's hooks/lib/ via
+# scripts/sync-external-comms-gate.sh (ADR-017 duplicate-script pattern).
+#
+# Returns the 64-char hex sha256 on stdout when both markers are present
+# in the prompt. Returns empty string when either marker is absent — the
+# caller falls back to the agent-emitted KEY for backward compatibility
+# with cached old SKILL.md / agent prompts.
+derive_external_comms_key_from_prompt() {
+    local prompt="$1"
+    [ -n "$prompt" ] || { echo ""; return 0; }
+    printf '%s' "$prompt" | python3 -c "
+import sys, re, hashlib
+text = sys.stdin.read()
+# DRAFT extraction: non-greedy match between <draft>...</draft>.
+# Tolerates an optional newline immediately after <draft> and before </draft>
+# so the body content does not capture wrapping newlines.
+draft_match = re.search(r'<draft>\n?(.*?)\n?</draft>', text, re.DOTALL)
+# SURFACE extraction: must be anchored to line start (MULTILINE) to avoid
+# matching prose like 'context says SURFACE: x'. Surface name is a single
+# token: letter + word/hyphen chars.
+surface_match = re.search(r'^SURFACE:\s*([A-Za-z][\w-]*)', text, re.MULTILINE)
+if not draft_match or not surface_match:
+    print('')
+    sys.exit(0)
+draft = draft_match.group(1)
+surface = surface_match.group(1)
+payload = (draft + '\n' + surface).encode('utf-8')
+print(hashlib.sha256(payload).hexdigest())
+" 2>/dev/null
+}
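
As a standalone check of the key shape this helper documents, the sketch below reuses the two regular expressions from the added file and compares the result against `sha256(draft + '\n' + surface)` computed directly. `derive_key` is a hypothetical name for the example; the prompt text is made up.

```python
import hashlib
import re


def derive_key(prompt: str) -> str:
    """Return the 64-char hex key, or '' when either marker is missing."""
    # Non-greedy body match; one wrapping newline on each side is not captured.
    draft_match = re.search(r"<draft>\n?(.*?)\n?</draft>", prompt, re.DOTALL)
    # Anchored to line start so prose mentioning "SURFACE:" mid-line is ignored.
    surface_match = re.search(r"^SURFACE:\s*([A-Za-z][\w-]*)", prompt, re.MULTILINE)
    if not draft_match or not surface_match:
        return ""
    payload = draft_match.group(1) + "\n" + surface_match.group(1)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


prompt = "SURFACE: changeset-author\n<draft>\nhello world\n</draft>\nReview against RISK-POLICY.md."
expected = hashlib.sha256(b"hello world\nchangeset-author").hexdigest()
print(derive_key(prompt) == expected)
print(repr(derive_key("legacy unstructured prompt")))
```

Two properties follow from the regex as written: a draft that itself contains the literal text `</draft>` is truncated at that point, and only one newline next to each marker is stripped, so extra blank lines inside the markers become part of the hashed body.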

package/hooks/risk-score-mark.sh CHANGED

@@ -11,6 +11,8 @@ set -euo pipefail
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 source "$SCRIPT_DIR/lib/gate-helpers.sh"
+# shellcheck source=lib/external-comms-key.sh
+source "$SCRIPT_DIR/lib/external-comms-key.sh"
 _enable_err_trap
 _parse_input
@@ -204,18 +206,43 @@ if echo "$SUBAGENT" | grep -qE 'risk-scorer.policy'; then
 fi
 # ---------------------------------------------------------------------------
-# External-comms reviewer (P064 / ADR-028 amended 2026-05-14): write
-# per-evaluator marker keyed on sha256(draft + '\n' + surface). Subagent
-# emits the key; this hook trusts and uses it. Marker file:
-# external-comms-risk-reviewed-<key>. The voice-tone evaluator (P038)
-# writes its own peer marker external-comms-voice-tone-reviewed-<key>
-# from packages/voice-tone/hooks/external-comms-mark-reviewed.sh.
+# External-comms reviewer (P064 / ADR-028 amended 2026-05-14, further
+# amended 2026-05-16 P166): write per-evaluator marker keyed on
+# sha256(draft + '\n' + surface). The hook derives the key from the
+# agent's tool_input.prompt (structured `SURFACE:` line + `<draft>`
+# block) instead of trusting an agent-emitted KEY — single fire per
+# gate cycle suffices. Backward-compat fallback to the agent's
+# EXTERNAL_COMMS_RISK_KEY line preserved during the deprecation window
+# (one release cycle).
+# Marker file: external-comms-risk-reviewed-<key>. The voice-tone
+# evaluator (P038) writes its own peer marker
+# external-comms-voice-tone-reviewed-<key> from
+# packages/voice-tone/hooks/external-comms-mark-reviewed.sh.
 # ---------------------------------------------------------------------------
 if echo "$SUBAGENT" | grep -qE 'risk-scorer.external-comms'; then
   VERDICT_LINE=$(echo "$AGENT_OUTPUT" | grep -E '^EXTERNAL_COMMS_RISK_VERDICT:' | tail -1) || true
-  KEY_LINE=$(echo "$AGENT_OUTPUT" | grep -E '^EXTERNAL_COMMS_RISK_KEY:' | tail -1) || true
   VERDICT=$(echo "$VERDICT_LINE" | sed 's/^EXTERNAL_COMMS_RISK_VERDICT:[[:space:]]*//' | tr -d '[:space:]')
-  KEY=$(echo "$KEY_LINE" | sed 's/^EXTERNAL_COMMS_RISK_KEY:[[:space:]]*//' | tr -d '[:space:]')
+  # Read the prompt the orchestrator sent to the agent so we can derive
+  # the canonical key locally. _HOOK_INPUT is set by gate-helpers.sh's
+  # _parse_input upstream of this branch.
+  PROMPT=$(echo "$_HOOK_INPUT" | python3 -c "
+import sys, json
+try:
+    print(json.load(sys.stdin).get('tool_input', {}).get('prompt', ''))
+except Exception:
+    print('')
+" 2>/dev/null || echo "")
+  # Primary: derive from the prompt (P166 single-fire path).
+  KEY=$(derive_external_comms_key_from_prompt "$PROMPT")
+  if [ -z "$KEY" ]; then
+    # Fallback: cached old SKILL.md still instructs the agent to emit
+    # EXTERNAL_COMMS_RISK_KEY. Honour it during the deprecation window.
+    KEY_LINE=$(echo "$AGENT_OUTPUT" | grep -E '^EXTERNAL_COMMS_RISK_KEY:' | tail -1) || true
+    KEY=$(echo "$KEY_LINE" | sed 's/^EXTERNAL_COMMS_RISK_KEY:[[:space:]]*//' | tr -d '[:space:]')
+  fi
   # Validate key: 64 hex chars (sha256 output). Reject anything else.
   if echo "$KEY" | grep -qE '^[0-9a-f]{64}$'; then
     case "$VERDICT" in

package/hooks/test/risk-score-mark-external-comms-prompt-parse.bats ADDED

@@ -0,0 +1,94 @@
+#!/usr/bin/env bats
+# Behavioural tests for risk-score-mark.sh external-comms branch under
+# P166 hook-side key derivation (ADR-028 amended 2026-05-16).
+#
+# Contract: the PostToolUse:Agent hook derives the marker key from
+# tool_input.prompt's `SURFACE: <name>` + `<draft>...</draft>` structure
+# instead of trusting an agent-emitted EXTERNAL_COMMS_RISK_KEY line.
+# On PASS, writes external-comms-risk-reviewed-<KEY> at the derived key.
+# Backward-compat: falls back to agent-emitted KEY when prompt has no
+# structure (one release-cycle window).
+setup() {
+  SCRIPT_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  HOOK="$SCRIPT_DIR/risk-score-mark.sh"
+  ORIG_DIR="$PWD"
+  TEST_DIR=$(mktemp -d)
+  cd "$TEST_DIR"
+  TMPDIR="$TEST_DIR/tmp"
+  export TMPDIR
+  mkdir -p "$TMPDIR"
+  SESSION_ID="test-rs-mark-extcomms-prompt-$$-${BATS_TEST_NUMBER}"
+  RDIR="$TMPDIR/claude-risk-${SESSION_ID}"
+}
+teardown() {
+  cd "$ORIG_DIR"
+  rm -rf "$TEST_DIR"
+}
+gate_key() {
+  local draft="$1" surface="$2"
+  printf '%s\n%s' "$draft" "$surface" | shasum -a 256 | cut -d' ' -f1
+}
+run_hook() {
+  local prompt="$1"
+  local agent_output="$2"
+  python3 -c "
+import json, sys
+print(json.dumps({
+  'tool_name': 'Agent',
+  'session_id': '${SESSION_ID}',
+  'tool_input': {'subagent_type': 'wr-risk-scorer:external-comms', 'prompt': sys.argv[1]},
+  'tool_response': {'content': [{'type': 'text', 'text': sys.argv[2]}]}
+}))" "$prompt" "$agent_output" | bash "$HOOK"
+}
+@test "external-comms PASS with structured prompt: marker lands at hook-derived key" {
+  DRAFT="we observed a leaked secret pattern in the changeset"
+  SURFACE="changeset-author"
+  PROMPT=$'SURFACE: '"$SURFACE"$'\n<draft>\n'"$DRAFT"$'\n</draft>\nReview against RISK-POLICY.md.'
+  AGENT_OUTPUT=$'no Confidential Information class matched\nEXTERNAL_COMMS_RISK_VERDICT: PASS'
+  run_hook "$PROMPT" "$AGENT_OUTPUT"
+  KEY=$(gate_key "$DRAFT" "$SURFACE")
+  [ -f "$RDIR/external-comms-risk-reviewed-${KEY}" ]
+}
+@test "external-comms FAIL with structured prompt: no marker" {
+  DRAFT="client Acme Corp is hitting this"
+  SURFACE="gh-issue-create"
+  PROMPT=$'SURFACE: '"$SURFACE"$'\n<draft>\n'"$DRAFT"$'\n</draft>'
+  AGENT_OUTPUT=$'EXTERNAL_COMMS_RISK_VERDICT: FAIL\nEXTERNAL_COMMS_RISK_REASON: Client names class — "Acme Corp"'
+  run_hook "$PROMPT" "$AGENT_OUTPUT"
+  KEY=$(gate_key "$DRAFT" "$SURFACE")
+  [ ! -f "$RDIR/external-comms-risk-reviewed-${KEY}" ]
+}
+@test "external-comms PASS with structured prompt AND agent-emitted KEY: hook-derived key wins" {
+  DRAFT="hook-derived wins"
+  SURFACE="gh-pr-comment"
+  PROMPT=$'SURFACE: '"$SURFACE"$'\n<draft>\n'"$DRAFT"$'\n</draft>'
+  BOGUS_KEY="0000000000000000000000000000000000000000000000000000000000000000"
+  AGENT_OUTPUT=$'EXTERNAL_COMMS_RISK_VERDICT: PASS\nEXTERNAL_COMMS_RISK_KEY: '"$BOGUS_KEY"
+  run_hook "$PROMPT" "$AGENT_OUTPUT"
+  DERIVED_KEY=$(gate_key "$DRAFT" "$SURFACE")
+  [ -f "$RDIR/external-comms-risk-reviewed-${DERIVED_KEY}" ]
+  [ ! -f "$RDIR/external-comms-risk-reviewed-${BOGUS_KEY}" ]
+}
+@test "external-comms backward-compat: PASS with no structured prompt but agent KEY" {
+  LEGACY_KEY="fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210"
+  PROMPT="legacy unstructured prompt"
+  AGENT_OUTPUT=$'EXTERNAL_COMMS_RISK_VERDICT: PASS\nEXTERNAL_COMMS_RISK_KEY: '"$LEGACY_KEY"
+  run_hook "$PROMPT" "$AGENT_OUTPUT"
+  [ -f "$RDIR/external-comms-risk-reviewed-${LEGACY_KEY}" ]
+}
+@test "external-comms no structured prompt and no agent KEY: no marker" {
+  PROMPT="legacy"
+  AGENT_OUTPUT=$'EXTERNAL_COMMS_RISK_VERDICT: PASS'
+  run_hook "$PROMPT" "$AGENT_OUTPUT"
+  ext_markers=$(find "$RDIR" -maxdepth 1 -name 'external-comms-risk-reviewed-*' 2>/dev/null | wc -l | tr -d ' ')
+  [ "$ext_markers" -eq 0 ]
+}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/risk-scorer",
-  "version": "0.9.0",
+  "version": "0.10.0-preview.325",
   "description": "Pipeline risk scoring, commit/push gates, and secret leak detection",
   "bin": {
     "windyroad-risk-scorer": "./bin/install.mjs"

package/scripts/evaluate-graduation.sh CHANGED

@@ -3,14 +3,29 @@
 #
 # Evaluates held-changeset graduation candidates per ADR-061
 # (Dogfood graduation criteria for held changesets — symmetric risk balance).
-# Phase 2a: orthogonal-gate class only (Class 3a per ADR-061 Rule 3).
-# Atomic-cohort class (3b) requires RFC ticket cohort enumeration and is
-# deferred to Phase 2b per the architect-approved Phase 2a/2b split.
 #
-# This script implements the deterministic Rule 1a join + Rule 2 VP carve-out
-# detection. It does NOT compute release-risk and does NOT apply Rule 4
-# evidence-floor judgement — those are LLM-judgement surfaces owned by the
-# wr-risk-scorer:pipeline agent (per ADR-015 pure-scorer contract).
+# Phase 2a — orthogonal-gate class (Class 3a per ADR-061 Rule 3): deterministic
+# Rule 1a join + Rule 2 VP carve-out detection per changeset, independently.
+#
+# Phase 2b — atomic-cohort class (Class 3b per ADR-061 Rule 3b): parses
+# docs/changesets-holding/README.md "Currently held" section, groups entries
+# by shared reinstate-trigger prose (parenthetical elaborations stripped
+# before grouping), and emits cohort-aware candidates. Cohort priority is
+# max(Priority) across all member tickets; any VP-blocked or halt-no-resolution
+# member propagates atomically to the entire cohort ("entire cohort ships or
+# none does" — symmetric to Rule 2's per-changeset carve-out at cohort grain).
+# Single-member "cohorts" fall back to class=3a (no Phase 2a regression).
+#
+# This script implements deterministic Rule 1a join + Rule 2 VP carve-out
+# detection + Rule 3b cohort grouping. It does NOT compute release-risk and
+# does NOT apply Rule 4 evidence-floor judgement — those are LLM-judgement
+# surfaces owned by the wr-risk-scorer:pipeline agent (per ADR-015 pure-scorer
+# contract).
+#
+# Cohort-id-from-prose is the Phase 2b shape per architect approval 2026-05-17.
+# Reassessment Triggers in ADR-061 ("Manual graduations diverge from criterion
+# verdicts") cover the upgrade to a structured cohort-declaration field if
+# prose-shape brittleness appears in dogfood.
 #
 # Usage:
 #   evaluate-graduation.sh [<project-root>]
@@ -27,12 +42,19 @@
 #       docs/problems/<NNN>-*.md (flat) AND docs/problems/*/<NNN>-*.md (per-state)
 #   - Extracts the Priority value from the ticket's `**Priority**: N (...)` line.
 #   - Detects Rule 2 VP carve-out (ticket file ends in .verifying.md).
+#   - Parses docs/changesets-holding/README.md "Currently held" section and
+#     groups entries by normalised reinstate-trigger prose (Phase 2b).
+#   - Multi-member groups emit class=3b + cohort=<id> with cohort-level
+#     priority/status. Single-member groups emit class=3a unchanged.
 #   - Emits one structured candidate line per held changeset to stdout.
 #
-# Stdout format (one candidate per held changeset, agent-parseable):
+# Stdout format — Class 3a (one candidate per held changeset, agent-parseable):
 #   GRADUATION_CANDIDATE: changeset=<filename> | ticket=P<NNN> | priority=<N> | class=3a | status=<resolved|vp-blocked|halt-no-resolution>
 #
-# Stdout summary line at end:
+# Stdout format — Class 3b (cohort member; cohort= column added between class and status):
+#   GRADUATION_CANDIDATE: changeset=<filename> | ticket=P<NNN> | priority=<cohort-max-N> | class=3b | cohort=<id> | status=<resolved|vp-blocked|halt-no-resolution>
+#
+# Stdout summary line at end (member-level counts; cohorts count individually):
 #   GRADUATION_SUMMARY: total=<N> resolved=<N> vp_blocked=<N> halts=<N>
 #
 # Exit codes:
@@ -42,13 +64,14 @@
 #   1 — no holding-area or empty holding-area (no-op caller signal)
 #   2 — invalid project root (missing docs/)
 #
-# @adr ADR-061 (graduation criteria — Phase 2a Rule 1a join + Rule 2 VP carve-out)
+# @adr ADR-061 (graduation criteria — Phase 2a Rule 1a join + Rule 2 VP carve-out;
+#               Phase 2b Rule 3b atomic-cohort grouping + cohort-level propagation)
 # @adr ADR-049 (resolved via bin/wr-risk-scorer-evaluate-graduation shim)
 # @adr ADR-052 (behavioural-fixture coverage at scripts/test/evaluate-graduation.bats)
-# @adr ADR-015 (pure-scorer contract — script does deterministic join only;
+# @adr ADR-015 (pure-scorer contract — script does deterministic join + grouping;
 #               agent owns release-risk re-computation + evidence-floor judgement)
 # @adr ADR-031 (dual-tolerant problem-ticket layout per RFC-002 migration window)
-# @problem P162 (Phase 2a)
+# @problem P162 (Phase 2a + Phase 2b)
 set -uo pipefail
@@ -83,8 +106,8 @@ if [ "${#HELD_FILES[@]}" -eq 0 ]; then
   exit 1
 fi
-# Delegate the per-candidate join + VP-check to python for re-readable
-# regex + dual-layout glob handling.
+# Delegate the per-candidate join + VP-check + cohort grouping to python for
+# re-readable regex + dual-layout glob handling.
 EVAL_RESULT=$(python3 - "$HOLDING_DIR" "$PROBLEMS_DIR" "${HELD_FILES[@]}" <<'PYEOF'
 import os
 import re
@@ -99,6 +122,18 @@ FILENAME_TICKET_RE = re.compile(r'-p(\d+)-', re.IGNORECASE)
 BODY_TICKET_RE = re.compile(r'\bP(\d+)\b')
 PRIORITY_LINE_RE = re.compile(r'^\*\*Priority\*\*:\s*(\d+)\b')
+# Phase 2b — README "Currently held" bullet parser.
+# Matches `- \`<filename>\` ... **Reinstate trigger**: <trigger-text>`.
+# Captures the filename (group 1) and the trigger text (group 2; rest of line).
+README_BULLET_RE = re.compile(
+    r'^-\s+`([^`]+\.md)`\s+.*?\*\*Reinstate trigger\*\*:\s*(.+?)\s*$'
+)
+# Strip parenthetical elaborations before grouping; nested parens are out of
+# scope for Phase 2b (no observed README entry uses them in the trigger).
+PAREN_RE = re.compile(r'\([^()]*\)')
+# Sanitise cohort-id from normalised trigger prose.
+NON_ID_CHAR_RE = re.compile(r'[^a-z0-9]+')
 def find_ticket_file(ticket_id_padded: str):
     """Dual-tolerant glob per ADR-031 / RFC-002 migration window.
@@ -167,55 +202,234 @@ def resolve_ticket_ids(changeset_path: str):
     return ids
-total = 0
-resolved = 0
-vp_blocked = 0
-halts = 0
+def normalise_trigger(trigger_text: str) -> str:
+    """Normalise reinstate-trigger prose for cohort-key comparison.
+    Strips parenthetical elaborations (Reassessment criterion citations,
+    inline notes), takes the prefix up to the first em-dash separator
+    (typical for "trigger description — review at ..." continuations),
+    strips trailing punctuation, lowercases, and collapses whitespace
+    LAST so paren-strip artefacts (stray spaces before punctuation) do
+    not break equality matching.
+    """
+    # Strip parentheticals; loop in case there are multiple non-nested groups.
+    prior = None
+    cleaned = trigger_text
+    while cleaned != prior:
+        prior = cleaned
+        cleaned = PAREN_RE.sub('', cleaned)
+    # Take prefix up to first em-dash separator (continuations begin here).
+    cleaned = cleaned.split('—', 1)[0]  # em-dash U+2014
+    # Lowercase, strip surrounding whitespace + trailing punctuation; collapse
+    # whitespace LAST so paren-strip leaves no orphaned single spaces before
+    # punctuation that would defeat equality comparison.
+    cleaned = cleaned.lower().strip().rstrip('.,;:').strip()
+    cleaned = ' '.join(cleaned.split())
+    # Strip any trailing punctuation that was previously space-separated.
+    cleaned = cleaned.rstrip('.,;:').strip()
+    return cleaned
+def cohort_id_from_trigger(normalised: str) -> str:
+    """Compute a filename-safe cohort id from normalised trigger prose.
+    Takes the first 8 tokens, replaces non-alphanumeric runs with single
+    dashes, trims surrounding dashes, caps at 60 chars.
+    """
+    tokens = normalised.split()[:8]
+    joined = ' '.join(tokens)
+    slug = NON_ID_CHAR_RE.sub('-', joined).strip('-')
+    return slug[:60] if slug else 'cohort'
+def parse_currently_held_cohorts(holding_dir: str):
+    """Parse docs/changesets-holding/README.md to build a filename→cohort-id map.
+    Reads only entries within the "## Currently held" section (case-insensitive),
+    extracts each bullet's filename + trigger text, normalises triggers, and
+    groups filenames sharing an identical normalised trigger. Cohorts with ≥ 2
+    members are returned as {filename: cohort_id}; single-member groups are
+    omitted so they fall back to class=3a per Phase 2a semantics.
+    Returns {} when README missing OR "Currently held" section absent OR no
+    multi-member groups present.
+    """
+    readme_path = os.path.join(holding_dir, 'README.md')
+    if not os.path.isfile(readme_path):
+        return {}
+    try:
+        with open(readme_path, 'r', encoding='utf-8') as f:
+            lines = f.readlines()
+    except (OSError, IOError):
+        return {}
+    # Walk lines; track whether we're inside the "Currently held" section.
+    in_section = False
+    bullets = []  # list of (filename, normalised_trigger)
+    for line in lines:
+        stripped = line.strip()
+        if stripped.startswith('## '):
+            heading = stripped[3:].strip().lower()
+            in_section = (heading == 'currently held')
+            continue
+        if not in_section:
+            continue
+        match = README_BULLET_RE.match(line.rstrip('\n'))
+        if not match:
+            continue
+        filename = match.group(1)
+        trigger = match.group(2)
+        normalised = normalise_trigger(trigger)
+        if not normalised:
+            continue
+        bullets.append((filename, normalised))
+    # Group bullets by normalised trigger.
+    groups = {}
+    for filename, normalised in bullets:
+        groups.setdefault(normalised, []).append(filename)
+    # Keep only multi-member groups; compute cohort id.
+    cohort_map = {}
+    for normalised, members in groups.items():
+        if len(members) < 2:
+            continue
+        cohort_id = cohort_id_from_trigger(normalised)
+        for filename in members:
+            cohort_map[filename] = cohort_id
+    return cohort_map
+# Per-changeset resolution structure:
+#   {basename: {ticket: 'P<NNN>'|'-', priority: <int>|None, status: <str>,
+#               ticket_ids: [<padded>], chosen_suffix: <str>|None}}
+per_changeset = {}
 for changeset_path in held_files:
-    total += 1
     basename = os.path.basename(changeset_path)
     ticket_ids = resolve_ticket_ids(changeset_path)
     if not ticket_ids:
-        # Rule 1a terminal — halt-and-prompt
-        print(f'GRADUATION_CANDIDATE: changeset={basename} | ticket=- | priority=- | class=3a | status=halt-no-resolution')
-        halts += 1
+        per_changeset[basename] = {
+            'ticket_label': '-',
+            'priority': None,
+            'status': 'halt-no-resolution',
+        }
         continue
-    # Resolve each referenced ticket; collect (ticket_id, priority, status_suffix) triples
     resolutions = []
-    unresolved_ids = []
     for tid in ticket_ids:
         path, suffix = find_ticket_file(tid)
         if path is None:
-            unresolved_ids.append(tid)
             continue
         priority = extract_priority(path)
         if priority is None:
-            unresolved_ids.append(tid)
             continue
         resolutions.append((tid, priority, suffix))
     if not resolutions:
-        # All referenced tickets failed to resolve — halt
-        print(f'GRADUATION_CANDIDATE: changeset={basename} | ticket={",".join(f"P{i}" for i in ticket_ids)} | priority=- | class=3a | status=halt-no-resolution')
-        halts += 1
+        per_changeset[basename] = {
+            'ticket_label': ','.join(f'P{i}' for i in ticket_ids),
+            'priority': None,
+            'status': 'halt-no-resolution',
+        }
         continue
-    # Rule 1a multi-ticket: max(Priority) across the referenced set
-    # Pick the resolution with the highest priority; report its ticket ID.
     resolutions.sort(key=lambda r: r[1], reverse=True)
     chosen_tid, chosen_priority, chosen_suffix = resolutions[0]
-    # Rule 2 VP carve-out
     if chosen_suffix == 'verifying':
-        print(f'GRADUATION_CANDIDATE: changeset={basename} | ticket=P{chosen_tid} | priority={chosen_priority} | class=3a | status=vp-blocked')
-        vp_blocked += 1
+        per_changeset[basename] = {
+            'ticket_label': f'P{chosen_tid}',
+            'priority': chosen_priority,
+            'status': 'vp-blocked',
+        }
         continue
-    print(f'GRADUATION_CANDIDATE: changeset={basename} | ticket=P{chosen_tid} | priority={chosen_priority} | class=3a | status=resolved')
-    resolved += 1
+    per_changeset[basename] = {
+        'ticket_label': f'P{chosen_tid}',
+        'priority': chosen_priority,
+        'status': 'resolved',
+    }
+# Phase 2b — cohort detection.
+cohort_map = parse_currently_held_cohorts(holding_dir)
+# Build inverse: cohort_id → [member basenames].
+cohort_members = {}
+for filename, cohort_id in cohort_map.items():
+    cohort_members.setdefault(cohort_id, []).append(filename)
+# Compute cohort-level rollups (priority + status).
+# Atomic propagation: any halt → cohort halts; else any vp-blocked → cohort
+# vp-blocked; else cohort resolved. Cohort priority = max(member priority)
+# across resolved/vp-blocked members; '-' when all members halted.
+cohort_rollup = {}
+for cohort_id, members in cohort_members.items():
+    statuses = []
+    priorities = []
+    for filename in members:
+        # Only consider members that are actually in the holding-area glob;
+        # README may list entries that no longer exist on disk (stale README).
+        info = per_changeset.get(filename)
+        if info is None:
+            continue
+        statuses.append(info['status'])
+        if info['priority'] is not None:
+            priorities.append(info['priority'])
+    if not statuses:
+        # No cohort members are real held files; skip cohort treatment.
+        continue
+    if 'halt-no-resolution' in statuses:
+        cohort_status = 'halt-no-resolution'
+    elif 'vp-blocked' in statuses:
+        cohort_status = 'vp-blocked'
+    else:
+        cohort_status = 'resolved'
+    cohort_priority = max(priorities) if priorities else None
+    cohort_rollup[cohort_id] = {
+        'status': cohort_status,
+        'priority': cohort_priority,
+    }
+# Emit candidate lines in held_files order.
+total = 0
+resolved = 0
+vp_blocked = 0
+halts = 0
+for changeset_path in held_files:
+    total += 1
+    basename = os.path.basename(changeset_path)
+    info = per_changeset[basename]
+    cohort_id = cohort_map.get(basename)
+    is_cohort = cohort_id is not None and cohort_id in cohort_rollup
+    if is_cohort:
+        rollup = cohort_rollup[cohort_id]
+        # Use cohort-level priority + status; ticket_label remains member-local
+        # so audit trail still cites the specific resolved ticket.
+        priority_str = '-' if rollup['priority'] is None else str(rollup['priority'])
+        ticket_label = info['ticket_label']
+        status = rollup['status']
+        print(
+            f'GRADUATION_CANDIDATE: changeset={basename} | ticket={ticket_label} | '
+            f'priority={priority_str} | class=3b | cohort={cohort_id} | status={status}'
+        )
+    else:
+        priority_str = '-' if info['priority'] is None else str(info['priority'])
+        print(
+            f'GRADUATION_CANDIDATE: changeset={basename} | ticket={info["ticket_label"]} | '
+            f'priority={priority_str} | class=3a | status={info["status"]}'
+        )
+    # Tally member-level counts (cohorts count per-member for backward compat).
+    effective_status = cohort_rollup[cohort_id]['status'] if is_cohort else info['status']
+    if effective_status == 'resolved':
+        resolved += 1
+    elif effective_status == 'vp-blocked':
+        vp_blocked += 1
+    elif effective_status == 'halt-no-resolution':
+        halts += 1
 print(f'GRADUATION_SUMMARY: total={total} resolved={resolved} vp_blocked={vp_blocked} halts={halts}')
 PYEOF
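
Reviewer note: the trigger normalisation added in this hunk is easier to follow in isolation. The sketch below copies `normalise_trigger` and `cohort_id_from_trigger` from the diff above into a standalone script and applies them to the two trigger strings from test case (g.6). The sample variables `plain` and `elaborated` are illustrative names, not part of the package.

```python
import re

# Same patterns as in the hunk above.
PAREN_RE = re.compile(r'\([^()]*\)')
NON_ID_CHAR_RE = re.compile(r'[^a-z0-9]+')


def normalise_trigger(trigger_text: str) -> str:
    """Strip parentheticals, cut at the first em-dash, lowercase, tidy up."""
    prior = None
    cleaned = trigger_text
    while cleaned != prior:  # repeat until no parenthetical group is left
        prior = cleaned
        cleaned = PAREN_RE.sub('', cleaned)
    cleaned = cleaned.split('\u2014', 1)[0]  # U+2014 em-dash separator
    cleaned = cleaned.lower().strip().rstrip('.,;:').strip()
    cleaned = ' '.join(cleaned.split())  # collapse whitespace last
    return cleaned.rstrip('.,;:').strip()


def cohort_id_from_trigger(normalised: str) -> str:
    """First 8 tokens, non-alphanumeric runs to dashes, capped at 60 chars."""
    tokens = normalised.split()[:8]
    slug = NON_ID_CHAR_RE.sub('-', ' '.join(tokens)).strip('-')
    return slug[:60] if slug else 'cohort'


# Illustrative inputs taken from test case (g.6).
plain = 'end-of-chain fires.'
elaborated = ('end-of-chain fires (only the slice 3 dependency remains, '
              'can defer per Reassessment criterion k).')

print(normalise_trigger(plain))       # end-of-chain fires
print(normalise_trigger(elaborated))  # end-of-chain fires
print(cohort_id_from_trigger(normalise_trigger(elaborated)))  # end-of-chain-fires
```

Both strings reduce to the same key, which is why the two README bullets in (g.6) end up in one cohort. The second `rstrip` matters here: removing the parenthetical leaves `fires .`, and the full stop is only dropped once the trailing punctuation and the orphaned space have both been stripped.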

package/scripts/test/evaluate-graduation.bats CHANGED

@@ -2,10 +2,16 @@
 # Behavioural-fixture coverage for packages/risk-scorer/scripts/evaluate-graduation.sh
 # per ADR-052 (behavioural tests default) and ADR-061 (dogfood graduation criteria).
 #
-# Phase 2a coverage — orthogonal-gate class (Class 3a) only. Atomic-cohort
-# class (Class 3b — Rule 3b RFC cohort enumeration) is deferred to Phase 2b.
-# Maps to ADR-061 Confirmation criterion 2 items a-f (item g atomic-cohort
-# lands in Phase 2b alongside the RFC enumeration logic).
+# Phase 2a coverage — orthogonal-gate class (Class 3a). Maps to
+# ADR-061 Confirmation criterion 2 items a-f.
+#
+# Phase 2b coverage — atomic-cohort class (Class 3b — Rule 3b cohort enumeration).
+# Maps to ADR-061 Confirmation criterion 2 item g (full-cohort evaluation,
+# max(Priority) across cohort tickets, atomic VP-blocked + halt propagation).
+# Cohort detection reads docs/changesets-holding/README.md "Currently held"
+# section and groups entries by shared reinstate-trigger prose (parenthetical
+# elaborations stripped before grouping). Single-member "cohorts" fall back
+# to class=3a (no Phase 2a regression).
 setup() {
   REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../.." && pwd)"
@@ -270,3 +276,198 @@ EOF
   # Confirm body-referenced P800 was NOT picked up
   ! echo "$output" | grep -q 'ticket=P800'
 }
+# ----- Phase 2b: ADR-061 Confirmation criterion 2 item (g) — atomic-cohort -----
+# Helper: seed a Currently held entry into docs/changesets-holding/README.md.
+# Cohort detection reads this file and groups entries by shared reinstate-trigger
+# prose (parenthetical elaborations stripped) — see evaluate-graduation.sh.
+seed_holding_readme() {
+  # seed_holding_readme <bullet-line> [<bullet-line>...]
+  local readme="docs/changesets-holding/README.md"
+  if [ ! -f "$readme" ]; then
+    cat > "$readme" <<'EOF'
+# Changesets Holding Area
+## Currently held
+EOF
+  fi
+  for bullet in "$@"; do
+    printf '%s\n' "$bullet" >> "$readme"
+  done
+}
+# Case (g.1) — two members sharing identical reinstate-trigger prose form a cohort
+@test "case (g.1): two members sharing reinstate-trigger form Class 3b cohort" {
+  seed_problem "170" "open" "9"
+  seed_problem "171" "open" "12"
+  seed_changeset "wr-itil-p170-phase4.md"
+  seed_changeset "wr-itil-p171-phase3.md"
+  seed_holding_readme \
+    "- \`wr-itil-p170-phase4.md\` — patch. **Reinstate trigger**: Phase 3 + Phase 4 end-of-chain user verification fires." \
+    "- \`wr-itil-p171-phase3.md\` — minor. **Reinstate trigger**: Phase 3 + Phase 4 end-of-chain user verification fires."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Both members emit class=3b
+  echo "$output" | grep 'changeset=wr-itil-p170-phase4.md' | grep -q 'class=3b'
+  echo "$output" | grep 'changeset=wr-itil-p171-phase3.md' | grep -q 'class=3b'
+  # Both members share the same cohort= column
+  cohort_p170=$(echo "$output" | grep 'changeset=wr-itil-p170-phase4.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  cohort_p171=$(echo "$output" | grep 'changeset=wr-itil-p171-phase3.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  [ -n "$cohort_p170" ]
+  [ "$cohort_p170" = "$cohort_p171" ]
+}
+# Case (g.2) — cohort uses max(Priority) across all member tickets per ADR-061 Rule 3b
+@test "case (g.2): cohort priority is max across member tickets" {
+  seed_problem "172" "open" "6"
+  seed_problem "173" "open" "15"
+  seed_problem "174" "open" "9"
+  seed_changeset "wr-itil-p172-slice-a.md"
+  seed_changeset "wr-itil-p173-slice-b.md"
+  seed_changeset "wr-itil-p174-slice-c.md"
+  seed_holding_readme \
+    "- \`wr-itil-p172-slice-a.md\` — patch. **Reinstate trigger**: RFC-009 end-of-chain verification." \
+    "- \`wr-itil-p173-slice-b.md\` — patch. **Reinstate trigger**: RFC-009 end-of-chain verification." \
+    "- \`wr-itil-p174-slice-c.md\` — patch. **Reinstate trigger**: RFC-009 end-of-chain verification."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Every cohort member carries priority=15 (max across P172/P173/P174)
+  echo "$output" | grep 'changeset=wr-itil-p172-slice-a.md' | grep -q 'priority=15'
+  echo "$output" | grep 'changeset=wr-itil-p173-slice-b.md' | grep -q 'priority=15'
+  echo "$output" | grep 'changeset=wr-itil-p174-slice-c.md' | grep -q 'priority=15'
+}
+# Case (g.3) — one VP-blocked cohort member marks the entire cohort vp-blocked
+@test "case (g.3): VP-blocked member blocks entire cohort (Rule 2 carve-out symmetric)" {
+  seed_problem "175" "open" "9"
+  seed_problem "176" "verifying" "12"
+  seed_changeset "wr-itil-p175-slice-a.md"
+  seed_changeset "wr-itil-p176-slice-b.md"
+  seed_holding_readme \
+    "- \`wr-itil-p175-slice-a.md\` — minor. **Reinstate trigger**: cohort verification fires." \
+    "- \`wr-itil-p176-slice-b.md\` — minor. **Reinstate trigger**: cohort verification fires."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Both members report status=vp-blocked even though only P176 is in verifying state
+  echo "$output" | grep 'changeset=wr-itil-p175-slice-a.md' | grep -q 'status=vp-blocked'
+  echo "$output" | grep 'changeset=wr-itil-p176-slice-b.md' | grep -q 'status=vp-blocked'
+}
+# Case (g.4) — one halt-no-resolution member propagates to entire cohort (architect C1)
+@test "case (g.4): halt-no-resolution member propagates to entire cohort" {
+  seed_problem "177" "open" "9"
+  # P178 deliberately NOT seeded → halt-no-resolution
+  seed_changeset "wr-itil-p177-slice-a.md"
+  seed_changeset "wr-itil-p178-slice-b.md"
+  seed_holding_readme \
+    "- \`wr-itil-p177-slice-a.md\` — patch. **Reinstate trigger**: shared cohort fires." \
+    "- \`wr-itil-p178-slice-b.md\` — patch. **Reinstate trigger**: shared cohort fires."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Both members report status=halt-no-resolution — cohort cannot graduate partially
+  echo "$output" | grep 'changeset=wr-itil-p177-slice-a.md' | grep -q 'status=halt-no-resolution'
+  echo "$output" | grep 'changeset=wr-itil-p178-slice-b.md' | grep -q 'status=halt-no-resolution'
+}
+# Case (g.5) — single-member "cohort" falls back to Class 3a (no Phase 2a regression)
+@test "case (g.5): single-member 'cohort' falls back to class=3a" {
+  seed_problem "179" "open" "9"
+  seed_changeset "wr-itil-p179-solo.md"
+  seed_holding_readme \
+    "- \`wr-itil-p179-solo.md\` — patch. **Reinstate trigger**: nobody else shares this trigger."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep 'changeset=wr-itil-p179-solo.md' | grep -q 'class=3a'
+  ! echo "$output" | grep 'changeset=wr-itil-p179-solo.md' | grep -q 'cohort='
+}
+# Case (g.6) — parenthetical elaborations stripped before grouping
+@test "case (g.6): parenthetical elaborations stripped before cohort grouping" {
+  seed_problem "180" "open" "9"
+  seed_problem "181" "open" "9"
+  seed_changeset "wr-itil-p180-a.md"
+  seed_changeset "wr-itil-p181-b.md"
+  # P180 trigger has no parens; P181 trigger has parenthetical elaboration —
+  # cohort detection must strip the paren content before comparison.
+  seed_holding_readme \
+    "- \`wr-itil-p180-a.md\` — patch. **Reinstate trigger**: end-of-chain fires." \
+    "- \`wr-itil-p181-b.md\` — patch. **Reinstate trigger**: end-of-chain fires (only the slice 3 dependency remains, can defer per Reassessment criterion k)."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Despite different surface prose, the normalised trigger matches → both class=3b
+  echo "$output" | grep 'changeset=wr-itil-p180-a.md' | grep -q 'class=3b'
+  echo "$output" | grep 'changeset=wr-itil-p181-b.md' | grep -q 'class=3b'
+}
+# Case (g.7) — README without "Currently held" section → all entries fall back to class=3a
+@test "case (g.7): README without 'Currently held' section falls back to class=3a (defensive)" {
+  seed_problem "182" "open" "9"
+  seed_problem "183" "open" "12"
+  seed_changeset "wr-itil-p182-a.md"
+  seed_changeset "wr-itil-p183-b.md"
+  # README exists but has no "Currently held" section — cohort detection finds nothing.
+  cat > "docs/changesets-holding/README.md" <<'EOF'
+# Holding Area
+Some unrelated content.
+EOF
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep 'changeset=wr-itil-p182-a.md' | grep -q 'class=3a'
+  echo "$output" | grep 'changeset=wr-itil-p183-b.md' | grep -q 'class=3a'
+}
+# Case (g.8) — README absent entirely → all entries fall back to class=3a (defensive)
+@test "case (g.8): missing README falls back to class=3a" {
+  seed_problem "184" "open" "9"
+  seed_changeset "wr-itil-p184-a.md"
+  # Do NOT create README at all
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep 'changeset=wr-itil-p184-a.md' | grep -q 'class=3a'
+}
+# Case (g.9) — multiple distinct cohorts in the same holding-area resolve independently
+@test "case (g.9): multiple distinct cohorts coexist with distinct cohort= ids" {
+  seed_problem "185" "open" "9"
+  seed_problem "186" "open" "10"
+  seed_problem "187" "open" "12"
+  seed_problem "188" "open" "8"
+  seed_changeset "wr-itil-p185-cohort-a.md"
+  seed_changeset "wr-itil-p186-cohort-a.md"
+  seed_changeset "wr-itil-p187-cohort-b.md"
+  seed_changeset "wr-itil-p188-cohort-b.md"
+  seed_holding_readme \
+    "- \`wr-itil-p185-cohort-a.md\` — minor. **Reinstate trigger**: cohort alpha fires." \
+    "- \`wr-itil-p186-cohort-a.md\` — minor. **Reinstate trigger**: cohort alpha fires." \
+    "- \`wr-itil-p187-cohort-b.md\` — minor. **Reinstate trigger**: cohort beta fires." \
+    "- \`wr-itil-p188-cohort-b.md\` — minor. **Reinstate trigger**: cohort beta fires."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  cohort_a1=$(echo "$output" | grep 'changeset=wr-itil-p185-cohort-a.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  cohort_a2=$(echo "$output" | grep 'changeset=wr-itil-p186-cohort-a.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  cohort_b1=$(echo "$output" | grep 'changeset=wr-itil-p187-cohort-b.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  cohort_b2=$(echo "$output" | grep 'changeset=wr-itil-p188-cohort-b.md' | sed -n 's/.*cohort=\([^ |]*\).*/\1/p')
+  [ -n "$cohort_a1" ] && [ "$cohort_a1" = "$cohort_a2" ]
+  [ -n "$cohort_b1" ] && [ "$cohort_b1" = "$cohort_b2" ]
+  [ "$cohort_a1" != "$cohort_b1" ]
+  # Cohort A priority is max(9,10) = 10; Cohort B priority is max(12,8) = 12
+  echo "$output" | grep 'changeset=wr-itil-p185-cohort-a.md' | grep -q 'priority=10'
+  echo "$output" | grep 'changeset=wr-itil-p187-cohort-b.md' | grep -q 'priority=12'
+}
+# Case (g.10) — cohort detection does NOT regress Phase 2a summary counts
+@test "case (g.10): cohort members still count individually in GRADUATION_SUMMARY" {
+  seed_problem "190" "open" "9"
+  seed_problem "191" "open" "9"
+  seed_changeset "wr-itil-p190-cohort.md"
+  seed_changeset "wr-itil-p191-cohort.md"
+  seed_holding_readme \
+    "- \`wr-itil-p190-cohort.md\` — patch. **Reinstate trigger**: shared cohort." \
+    "- \`wr-itil-p191-cohort.md\` — patch. **Reinstate trigger**: shared cohort."
+  run bash "$SCRIPT" "$WORK_DIR"
+  [ "$status" -eq 0 ]
+  # Phase 2a parsers see total=2 resolved=2 — backwards compatible
+  echo "$output" | grep -q 'GRADUATION_SUMMARY: total=2 resolved=2 vp_blocked=0 halts=0'
+}
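
Reviewer note: cases (g.2) to (g.4) all exercise one precedence rule, which can be sketched on its own. The function below is a minimal restatement of the cohort rollup in `evaluate-graduation.sh`, assuming each member is already reduced to a `(status, priority)` pair and that the cohort has at least one member on disk. The name `roll_up` is hypothetical; the script does this inline.

```python
from typing import List, Optional, Tuple

Member = Tuple[str, Optional[int]]  # (status, priority or None when halted)


def roll_up(members: List[Member]) -> Tuple[str, Optional[int]]:
    """Return the cohort-level (status, priority) for a non-empty cohort."""
    statuses = [status for status, _ in members]
    priorities = [p for _, p in members if p is not None]
    # Precedence: any halt wins, then any vp-blocked, otherwise resolved.
    if 'halt-no-resolution' in statuses:
        status = 'halt-no-resolution'
    elif 'vp-blocked' in statuses:
        status = 'vp-blocked'
    else:
        status = 'resolved'
    # Priority is the max over members that resolved to a priority at all.
    return status, (max(priorities) if priorities else None)


# (g.2): three resolved members, priorities 6 / 15 / 9
print(roll_up([('resolved', 6), ('resolved', 15), ('resolved', 9)]))
# ('resolved', 15)

# (g.3): one member in verifying state blocks the whole cohort
print(roll_up([('resolved', 9), ('vp-blocked', 12)]))
# ('vp-blocked', 12)

# (g.4): one unresolvable member halts the whole cohort
print(roll_up([('resolved', 9), ('halt-no-resolution', None)]))
# ('halt-no-resolution', 9)
```

In the (g.4) shape the halted cohort still reports a numeric priority, because the resolved member contributes one. The `-` placeholder appears only when every member halted. The tests assert on the status column in that case, not on priority.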

package/skills/assess-external-comms/SKILL.md CHANGED

@@ -51,12 +51,24 @@ Do not ask if the surface is obvious from the conversation context.
 ### 3. Construct the review prompt
-Build a self-contained prompt for the `wr-risk-scorer:external-comms` subagent that includes:
+Build a self-contained prompt for the `wr-risk-scorer:external-comms` subagent. The prompt MUST be structured so the PostToolUse hook can derive the marker key locally (P166 / ADR-028 amended 2026-05-16) — single fire per gate cycle suffices:
-- The **draft body** verbatim (between explicit `<draft>...</draft>` markers so the agent's substring extraction is unambiguous).
-- The **target surface** (one of the canonical strings above).
-- The **destination** when known.
-- A reminder to compute `EXTERNAL_COMMS_RISK_KEY = sha256(draft + '\n' + surface)`.
+```
+SURFACE: <surface-name>
+<draft>
+<draft body verbatim>
+</draft>
+Destination: <destination if known>
+Review against RISK-POLICY.md Confidential Information classes.
+```
+Two requirements:
+- A leading line `SURFACE: <surface-name>` where `<surface-name>` is one of the canonical strings (`gh-issue-create`, `gh-pr-comment`, etc.) — anchored to line start, single token.
+- The **draft body** wrapped verbatim inside `<draft>...</draft>` markers — the hook extracts everything between these markers and uses it for `sha256(DRAFT + '\n' + SURFACE)`.
+The orchestrator does NOT pre-compute the key — the hook derives it from the prompt structure. Skip the agent-emitted key entirely.
 ### 4. Delegate to wr-risk-scorer:external-comms
@@ -67,7 +79,7 @@ subagent_type: wr-risk-scorer:external-comms
 prompt: <constructed review prompt from step 3>
 ```
-Wait for the subagent to complete. The subagent will output a structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + `EXTERNAL_COMMS_RISK_KEY: <sha>` + optional `EXTERNAL_COMMS_RISK_REASON: ...`). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads that output and writes the marker automatically.
+Wait for the subagent to complete. The subagent outputs a structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
 **Do not write to `${TMPDIR:-/tmp}/claude-risk-*` yourself.** The hook is the only correct mechanism.
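
Reviewer note: the key derivation that SKILL.md now delegates to the hook can be illustrated with a short sketch. The real logic lives in the shell hook (`risk-score-mark.sh` and `hooks/lib/external-comms-key.sh`), which is not reproduced here, so everything below is an assumption apart from the `SURFACE:` line, the `<draft>...</draft>` markers, and the `sha256(DRAFT + '\n' + SURFACE)` formula quoted in the diff. In particular, the function name `derive_key`, the trimming of one newline next to each marker, and the hex-digest output format are guesses.

```python
import hashlib
import re
from typing import Optional

# Assumed parsing rules: SURFACE is a single token on its own line;
# the draft is everything between the markers, minus one adjacent newline.
SURFACE_RE = re.compile(r'^SURFACE:[ \t]*(\S+)[ \t]*$', re.MULTILINE)
DRAFT_RE = re.compile(r'<draft>\n?(.*?)\n?</draft>', re.DOTALL)


def derive_key(prompt: str) -> Optional[str]:
    """Return sha256(draft + '\\n' + surface) as hex, or None if unparseable."""
    surface = SURFACE_RE.search(prompt)
    draft = DRAFT_RE.search(prompt)
    if surface is None or draft is None:
        return None  # prompt lacks the required structure
    payload = draft.group(1) + '\n' + surface.group(1)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


prompt = (
    'SURFACE: gh-issue-create\n'
    '<draft>\n'
    'Steps to reproduce the crash.\n'
    '</draft>\n'
    'Destination: example/repo\n'
    'Review against RISK-POLICY.md Confidential Information classes.\n'
)
print(derive_key(prompt))            # 64-character hex digest
print(derive_key('free-form text'))  # None
```

Because the key depends only on the prompt text, the orchestrator and the hook never need to exchange it. Any difference in how the draft is delimited or trimmed would change the digest, so the exact byte-level rules should be taken from the hook itself and not from this sketch.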