@windyroad/risk-scorer 0.12.7 → 0.12.8-preview.617

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -310,5 +310,5 @@
310
310
  }
311
311
  },
312
312
  "name": "wr-risk-scorer",
313
- "version": "0.12.7"
313
+ "version": "0.12.8"
314
314
  }
package/README.md CHANGED
@@ -79,6 +79,16 @@ The plugin includes six specialised agents:
79
79
  | `/wr-risk-scorer:bootstrap-catalog` | Bootstrap `docs/risks/` register from existing `.risk-reports/` corpus per ADR-059 — walks reports, dedupes by ADR-056 slug, emits one `R<NNN>-<slug>.active.md` per unique slug. Idempotent. Auto-triggers from `/install-updates` Step 6.5.1 when register is empty + `RISK-POLICY.md` present + `.risk-reports/` non-empty |
80
80
  | `/wr-risk-scorer:update-policy` | Generate or update `RISK-POLICY.md` |
81
81
 
82
+ ### Internal-use wrapper skills
83
+
84
+ These wrappers exist so that consumer skills can invoke the scoring agents through the Skill tool with `skill: wr-risk-scorer:<name>`, matching the literal phrasing of ADR-015's Confirmation section. End users should invoke the `/wr-risk-scorer:assess-*` skills above; the wrappers are internal plumbing.
85
+
86
+ | Wrapper skill | Purpose |
87
+ |---------------|---------|
88
+ | `wr-risk-scorer:pipeline` | Skill-tool wrapper around the `wr-risk-scorer:pipeline` agent (consumer: `/wr-risk-scorer:assess-release`) |
89
+ | `wr-risk-scorer:wip` | Skill-tool wrapper around the `wr-risk-scorer:wip` agent (consumer: `/wr-risk-scorer:assess-wip`) |
90
+ | `wr-risk-scorer:external-comms` | Skill-tool wrapper around the `wr-risk-scorer:external-comms` agent (consumer: `/wr-risk-scorer:assess-external-comms`) |
91
+
82
92
  ## External-comms gate
83
93
 
84
94
  The `external-comms-gate.sh` hook intercepts outbound prose tool calls and the
@@ -36,15 +36,38 @@ fi
36
36
  if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run push:watch(\s|$)'; then
37
37
  if [ -n "$SESSION_ID" ]; then
38
38
  RDIR=$(_risk_dir "$SESSION_ID")
39
- # Risk-reducing/neutral bypass for push
39
+ # Risk-reducing/neutral bypass for push — session-scoped, drift-
40
+ # revalidated (P192). Persists across multiple push attempts while
41
+ # pipeline-state hash matches and TTL is unexpired; consumed on
42
+ # drift or TTL expiry. Symmetric with the commit-gate change above.
40
43
  if [ -f "${RDIR}/reducing-push" ]; then
44
+ NOW=$(date +%s)
45
+ MARK_TIME=$(_mtime "${RDIR}/reducing-push")
46
+ AGE=$(( NOW - MARK_TIME ))
47
+ TTL_SECONDS="${RISK_TTL:-3600}"
48
+ if [ "$AGE" -lt "$TTL_SECONDS" ] && [ -f "${RDIR}/state-hash" ]; then
49
+ STORED_HASH=$(cat "${RDIR}/state-hash")
50
+ CURRENT_HASH=$("$SCRIPT_DIR/lib/pipeline-state.sh" --hash-inputs 2>/dev/null | _hashcmd | cut -d' ' -f1)
51
+ if [ "$STORED_HASH" = "$CURRENT_HASH" ]; then
52
+ exit 0
53
+ fi
54
+ fi
41
55
  rm -f "${RDIR}/reducing-push"
42
- exit 0
43
56
  fi
44
57
  # Clean tree bypass: if no uncommitted changes, pushing existing commits is safe
45
58
  if [ -f "${RDIR}/clean" ]; then
46
59
  exit 0
47
60
  fi
61
+ # CI-status precondition (P208): a within-appetite predicted-risk
62
+ # score is necessary but not sufficient — the lagging CI signal
63
+ # must also be green (or no-history-yet for the documented
64
+ # first-push case). Fail-closed on gh errors. Ordered AFTER the
65
+ # bypass markers (reducing-push, clean) and BEFORE the predicted-risk gate so
66
+ # incident workflows and clean-tree pushes are unaffected.
67
+ if ! check_ci_status "$SESSION_ID" "push"; then
68
+ risk_gate_deny "Push blocked: ${CI_GATE_REASON}"
69
+ exit 0
70
+ fi
48
71
  if ! check_risk_gate "$SESSION_ID" "push"; then
49
72
  if [ "$RISK_GATE_CATEGORY" = "threshold" ]; then
50
73
  risk_gate_deny "Push blocked: Push risk score ${RISK_GATE_SCORE}/25 (Medium or above). To proceed: (1) release first via \`npm run release:watch\`, (2) split the push, or (3) add risk-reducing measures. If risk-neutral or risk-reducing, delegate to wr-risk-scorer:pipeline (subagent_type: 'wr-risk-scorer:pipeline') — it will create a bypass marker."
@@ -83,13 +106,32 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run release:watch(\s|$)'; the
83
106
  # Live-incident bypass: if an incident marker exists, allow release
84
107
  # regardless of risk score. Used when addressing outages, security
85
108
  # incidents, or information disclosure that requires immediate deployment.
109
+ # Per JTBD-201, this MUST short-circuit BEFORE the CI-status check
110
+ # so the hotfix path is unaffected by red CI on master.
86
111
  if [ -f "${RDIR}/incident-release" ]; then
87
112
  rm -f "${RDIR}/incident-release"
88
113
  exit 0
89
114
  fi
90
- # Risk-reducing bypass for release
115
+ # Risk-reducing bypass for release — session-scoped, drift-
116
+ # revalidated (P192). Same lifecycle as reducing-push above.
91
117
  if [ -f "${RDIR}/reducing-release" ]; then
118
+ NOW=$(date +%s)
119
+ MARK_TIME=$(_mtime "${RDIR}/reducing-release")
120
+ AGE=$(( NOW - MARK_TIME ))
121
+ TTL_SECONDS="${RISK_TTL:-3600}"
122
+ if [ "$AGE" -lt "$TTL_SECONDS" ] && [ -f "${RDIR}/state-hash" ]; then
123
+ STORED_HASH=$(cat "${RDIR}/state-hash")
124
+ CURRENT_HASH=$("$SCRIPT_DIR/lib/pipeline-state.sh" --hash-inputs 2>/dev/null | _hashcmd | cut -d' ' -f1)
125
+ if [ "$STORED_HASH" = "$CURRENT_HASH" ]; then
126
+ exit 0
127
+ fi
128
+ fi
92
129
  rm -f "${RDIR}/reducing-release"
130
+ fi
131
+ # CI-status precondition (P208): the latest CI run on the current
132
+ # branch must be green (or absent) before shipping. Fail-closed on gh errors.
133
+ if ! check_ci_status "$SESSION_ID" "release"; then
134
+ risk_gate_deny "Release blocked: ${CI_GATE_REASON}"
93
135
  exit 0
94
136
  fi
95
137
  if ! check_risk_gate "$SESSION_ID" "release"; then
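A minimal sketch of the drift-revalidation rule that the reducing-push and reducing-release hunks above apply, assuming the marker's mtime and both hashes have already been read. `marker_decision` is a hypothetical helper, not part of the package: the real hooks call `_mtime`, `pipeline-state.sh --hash-inputs` and `_hashcmd` inline, and on `consume` they delete the marker and fall through to the remaining checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with wr-risk-scorer.
# Decide whether a session-scoped reducing-* marker may be reused.
# Args: marker_mtime now ttl_seconds stored_hash current_hash
marker_decision() {
  local mark_time="$1" now="$2" ttl="$3" stored="$4" current="$5"
  local age=$(( now - mark_time ))
  # Reuse only while the marker is younger than the TTL AND the stored
  # pipeline-state hash equals the current one. An empty stored hash
  # stands in for a missing state-hash file: no proof of invariance.
  if [ "$age" -lt "$ttl" ] && [ -n "$stored" ] && [ "$stored" = "$current" ]; then
    echo "reuse"
  else
    echo "consume"
  fi
}

marker_decision 1000 1100 3600 abc123 abc123   # prints: reuse
marker_decision 1000 5000 3600 abc123 abc123   # prints: consume (TTL expired)
marker_decision 1000 1100 3600 abc123 def456   # prints: consume (drift)
```

Because the comparison is `-lt`, a marker whose age equals the TTL is already treated as expired.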
package/hooks/hooks.json CHANGED
@@ -16,7 +16,7 @@
16
16
  { "matcher": "Edit|Write", "hooks": [{ "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/wip-risk-mark.sh" }] },
17
17
  { "matcher": "Agent", "hooks": [{ "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/risk-score-mark.sh" }] },
18
18
  { "matcher": "Bash", "hooks": [{ "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/risk-hash-refresh.sh" }] },
19
- { "matcher": "Agent|Bash", "hooks": [{ "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/risk-slide-marker.sh" }] }
19
+ { "matcher": "Agent|Bash|Skill", "hooks": [{ "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/risk-slide-marker.sh" }] }
20
20
  ]
21
21
  }
22
22
  }
@@ -153,6 +153,148 @@ print(('yes' if score > N else 'no') + ' ' + str(N))
153
153
  return 0
154
154
  }
155
155
 
156
+ # Check CI health for the current branch (P208).
157
+ #
158
+ # Returns 0 if push/release may proceed, 1 if denied. Sets CI_GATE_REASON
159
+ # on deny with a human-readable message that names the CI conclusion and
160
+ # the run URL. Sets CI_GATE_CATEGORY ∈ {bypass, no-history, allow, red,
161
+ # pending, gh-error}.
162
+ #
163
+ # Consults `gh run list --branch <current-branch> --limit 1 --json
164
+ # status,conclusion,databaseId,url` for the working branch's most recent
165
+ # CI run.
166
+ #
167
+ # Decision table:
168
+ # - bypass marker present (${RDIR}/ci-bypass-${ACTION}) → allow, consume
169
+ # - gh failure (auth / timeout / API error) → DENY (fail-CLOSED, per
170
+ # P208 safe-high-fix-risk classifier — a buggy harden must NOT
171
+ # degrade to allow)
172
+ # - empty result `[]` → allow (no CI history yet; first push triggers
173
+ # CI naturally)
174
+ # - status ∈ {queued, in_progress, pending, requested, waiting} → deny
175
+ # - conclusion ∈ {failure, cancelled, timed_out, action_required,
176
+ # startup_failure} → deny
177
+ # - conclusion ∈ {success, skipped, neutral} or unknown → allow
178
+ #
179
+ # Usage: check_ci_status "$SESSION_ID" "push" # or "release"
180
+ check_ci_status() {
181
+ local SESSION_ID="$1"
182
+ local ACTION="$2"
183
+ local RDIR
184
+ RDIR=$(_risk_dir "$SESSION_ID")
185
+ local BYPASS_MARKER="${RDIR}/ci-bypass-${ACTION}"
186
+
187
+ CI_GATE_REASON=""
188
+ CI_GATE_CATEGORY=""
189
+
190
+ # One-shot bypass marker, consumed on use like incident-release
191
+ # (reducing-push now persists per P192). Documented override for the
192
+ # legitimate "first push triggers CI" edge case and infra incidents.
193
+ if [ -f "$BYPASS_MARKER" ]; then
194
+ rm -f "$BYPASS_MARKER"
195
+ CI_GATE_CATEGORY="bypass"
196
+ return 0
197
+ fi
198
+
199
+ # Resolve current branch. If we're not in a git repo or HEAD is
200
+ # detached, skip the CI check (the surrounding push/release gate
201
+ # would itself fail at the git layer with a clearer error).
202
+ local BRANCH
203
+ BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
204
+ if [ -z "$BRANCH" ] || [ "$BRANCH" = "HEAD" ]; then
205
+ CI_GATE_CATEGORY="allow"
206
+ return 0
207
+ fi
208
+
209
+ # Query GitHub. When `timeout` is available the call is bounded at 10s
210
+ # wall-clock so a network stall cannot hang push:watch. A default macOS
211
+ # install does not ship GNU `timeout`, so the fallback runs gh unbounded.
212
+ local JSON GH_EXIT
213
+ if command -v timeout >/dev/null 2>&1; then
214
+ JSON=$(timeout 10s gh run list --branch "$BRANCH" --limit 1 \
215
+ --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
216
+ else
217
+ JSON=$(gh run list --branch "$BRANCH" --limit 1 \
218
+ --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
219
+ fi
220
+
221
+ if [ -n "${GH_EXIT:-}" ] && [ "$GH_EXIT" != "0" ]; then
222
+ CI_GATE_CATEGORY="gh-error"
223
+ CI_GATE_REASON="CI status check failed (gh exit ${GH_EXIT}: auth / timeout / API error). Fail-closed per P208 safe-high-fix-risk. Fix the underlying gh failure, or to override for a legitimate first-push-triggers-CI run, create the bypass marker: touch ${BYPASS_MARKER}"
224
+ return 1
225
+ fi
226
+
227
+ # Empty array = no CI history for this branch yet. Natural allow for
228
+ # the documented "first push triggers CI" case — no marker needed.
229
+ local TRIMMED
230
+ TRIMMED=$(printf '%s' "$JSON" | tr -d '[:space:]')
231
+ if [ -z "$TRIMMED" ] || [ "$TRIMMED" = "[]" ]; then
232
+ CI_GATE_CATEGORY="no-history"
233
+ return 0
234
+ fi
235
+
236
+ # Parse status, conclusion, url. Fail-closed on parse error.
237
+ local PARSED
238
+ PARSED=$(echo "$JSON" | python3 -c "
239
+ import sys, json
240
+ try:
241
+ runs = json.load(sys.stdin)
242
+ if not isinstance(runs, list) or not runs:
243
+ print('||')
244
+ sys.exit(0)
245
+ r = runs[0]
246
+ print('{}|{}|{}'.format(r.get('status') or '', r.get('conclusion') or '', r.get('url') or ''))
247
+ except Exception:
248
+ print('PARSE_ERROR||')
249
+ " 2>/dev/null || echo "PARSE_ERROR||")
250
+
251
+ local STATUS CONCLUSION URL
252
+ STATUS="${PARSED%%|*}"
253
+ local REST="${PARSED#*|}"
254
+ CONCLUSION="${REST%%|*}"
255
+ URL="${REST#*|}"
256
+
257
+ if [ "$STATUS" = "PARSE_ERROR" ]; then
258
+ CI_GATE_CATEGORY="gh-error"
259
+ CI_GATE_REASON="CI status check returned unparseable response. Fail-closed per P208 safe-high-fix-risk. To override for a legitimate first-push case, create the bypass marker: touch ${BYPASS_MARKER}"
260
+ return 1
261
+ fi
262
+
263
+ case "$STATUS" in
264
+ queued|in_progress|pending|requested|waiting)
265
+ CI_GATE_CATEGORY="pending"
266
+ CI_GATE_REASON="Latest CI run on branch '${BRANCH}' is still in flight (status: ${STATUS}). Wait for it to settle: ${URL}. To override, create the bypass marker: touch ${BYPASS_MARKER}"
267
+ return 1
268
+ ;;
269
+ completed)
270
+ case "$CONCLUSION" in
271
+ success|skipped|neutral|"")
272
+ CI_GATE_CATEGORY="allow"
273
+ return 0
274
+ ;;
275
+ failure|cancelled|timed_out|action_required|startup_failure)
276
+ CI_GATE_CATEGORY="red"
277
+ CI_GATE_REASON="Latest CI run on branch '${BRANCH}' concluded ${CONCLUSION}: ${URL}. Fix CI before pushing/releasing. To override for a legitimate first-push or infra-incident case, create the bypass marker: touch ${BYPASS_MARKER}"
278
+ return 1
279
+ ;;
280
+ *)
281
+ # Unknown conclusion — allow rather than block on a value we
282
+ # don't recognise. New GitHub conclusion values are infrequent.
283
+ CI_GATE_CATEGORY="allow"
284
+ return 0
285
+ ;;
286
+ esac
287
+ ;;
288
+ *)
289
+ # Unknown status — allow rather than block on a value we don't
290
+ # recognise; the predicted-risk threshold check that runs after this
291
+ # function is relied on to catch the actual risk.
292
+ CI_GATE_CATEGORY="allow"
293
+ return 0
294
+ ;;
295
+ esac
296
+ }
297
+
156
298
  # Emit fail-closed deny JSON for PreToolUse hooks.
157
299
  risk_gate_deny() {
158
300
  local REASON="$1"
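The completed/in-flight part of the decision table in `check_ci_status` can be sketched as a pure function over the `status|conclusion|url` string that its Python helper prints. `classify_ci_run` is hypothetical and covers only that step: the bypass marker, the gh exit-code check and the empty-history allow all happen earlier in the real function and are not modeled here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not shipped with wr-risk-scorer. Prints the
# CI_GATE_CATEGORY that check_ci_status would assign to one parsed
# "status|conclusion|url" record.
classify_ci_run() {
  local parsed="$1"
  # Same parameter-expansion split as check_ci_status: status is the text
  # before the first "|", conclusion the text between the first and second.
  local status="${parsed%%|*}"
  local rest="${parsed#*|}"
  local conclusion="${rest%%|*}"

  case "$status" in
    PARSE_ERROR) echo "gh-error" ;;
    queued|in_progress|pending|requested|waiting) echo "pending" ;;
    completed)
      case "$conclusion" in
        failure|cancelled|timed_out|action_required|startup_failure) echo "red" ;;
        *)
          # success, skipped, neutral, empty, or an unrecognised value
          echo "allow" ;;
      esac
      ;;
    *)
      # Unrecognised status: allow and leave it to the risk-score gate.
      echo "allow" ;;
  esac
}

classify_ci_run "completed|failure|https://github.com/x/y/actions/runs/2"  # prints: red
classify_ci_run "queued||https://github.com/x/y/actions/runs/6"            # prints: pending
classify_ci_run "completed|success|https://github.com/x/y/actions/runs/1"  # prints: allow
```

Only `red`, `pending` and `gh-error` map to a deny (return 1) in the hook; every other category lets the push or release continue to the predicted-risk gate.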
@@ -64,10 +64,25 @@ if [ -f "${RDIR}/clean" ]; then
64
64
  exit 0
65
65
  fi
66
66
 
67
- # Risk-reducing/neutral bypass
67
+ # Risk-reducing/neutral bypass — session-scoped, drift-revalidated (P192).
68
+ # Preserved across multiple commits while the pipeline-state hash matches and
69
+ # the TTL is unexpired; consumed on drift, TTL expiry, or a missing state-hash
70
+ # file, so a genuine risk-profile change forces a fresh wr-risk-scorer:pipeline rescore. Mirrors
71
+ # the clean-marker persist-until-drift precedent (above) — distinct from
72
+ # incident-release / ci-bypass, which remain deliberate one-time overrides.
68
73
  if [ -f "${RDIR}/reducing-commit" ]; then
74
+ NOW=$(date +%s)
75
+ MARK_TIME=$(_mtime "${RDIR}/reducing-commit")
76
+ AGE=$(( NOW - MARK_TIME ))
77
+ TTL_SECONDS="${RISK_TTL:-3600}"
78
+ if [ "$AGE" -lt "$TTL_SECONDS" ] && [ -f "${RDIR}/state-hash" ]; then
79
+ STORED_HASH=$(cat "${RDIR}/state-hash")
80
+ CURRENT_HASH=$("$SCRIPT_DIR/lib/pipeline-state.sh" --hash-inputs 2>/dev/null | _hashcmd | cut -d' ' -f1)
81
+ if [ "$STORED_HASH" = "$CURRENT_HASH" ]; then
82
+ exit 0
83
+ fi
84
+ fi
69
85
  rm -f "${RDIR}/reducing-commit"
70
- exit 0
71
86
  fi
72
87
 
73
88
  # Gate check: existence, TTL, drift, threshold
@@ -1,5 +1,5 @@
1
1
  #!/bin/bash
2
- # Risk Scorer - PostToolUse:Agent|Bash slide-marker hook (P111).
2
+ # Risk Scorer - PostToolUse:Agent|Bash|Skill slide-marker hook (P111 + P213).
3
3
  # Slides the parent session's existing risk score files forward on
4
4
  # subprocess return, treating subprocess wall-clock as continuous parent-
5
5
  # session work for TTL purposes. Only TOUCHES existing score files — never
@@ -0,0 +1,240 @@
1
+ #!/usr/bin/env bats
2
+ # Tests for the CI-status precondition in the push/release gate.
3
+ #
4
+ # Closes P208 (known-error): git-push-gate.sh did not consult CI health
5
+ # before scoring push/release risk, so a push could land on a CI-red
6
+ # master and a release could ship broken code.
7
+ #
8
+ # Contract:
9
+ # - `check_ci_status` queries `gh run list --branch <current-branch>
10
+ # --limit 1 --json status,conclusion,databaseId,url` for the current
11
+ # branch and returns 0 (allow) / 1 (deny).
12
+ # - Deny on conclusion ∈ {failure, cancelled, timed_out, action_required,
13
+ # startup_failure}.
14
+ # - Deny on status ∈ {queued, in_progress, pending, requested, waiting}.
15
+ # - Allow on conclusion ∈ {success, skipped, neutral} or unknown.
16
+ # - Empty array (no CI history yet) → allow. Handles the documented
17
+ # "first push triggers CI" case naturally — no bypass marker required.
18
+ # - `gh` failure (auth/timeout/API error) → DENY (fail-closed per the
19
+ # safe-high-fix-risk classifier on P208).
20
+ # - `${RDIR}/ci-bypass-${ACTION}` one-shot bypass marker — consumed on
21
+ # use, same family as incident-release (reducing-push now persists per P192).
22
+ # - Integration: in git-push-gate.sh, the ordering is bypass-markers →
23
+ # CI status → risk gate. The `incident-release` bypass MUST short-
24
+ # circuit BEFORE the CI check fires (per JTBD-201 + ADR-018).
25
+
26
+ setup() {
27
+ HOOKS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
28
+ source "$HOOKS_DIR/lib/gate-helpers.sh"
29
+ source "$HOOKS_DIR/lib/risk-gate.sh"
30
+
31
+ TEST_SESSION="bats-ci-gate-$$-${BATS_TEST_NUMBER}"
32
+ RDIR=$(_risk_dir "$TEST_SESSION")
33
+ rm -rf "$RDIR"
34
+ mkdir -p "$RDIR"
35
+
36
+ # Stand up a fake git repo so `git rev-parse --abbrev-ref HEAD` resolves.
37
+ TEST_REPO="$(mktemp -d)"
38
+ ( cd "$TEST_REPO" && git init -q -b main && \
39
+ git -c user.email=t@e -c user.name=t commit --allow-empty -q -m "init" )
40
+
41
+ # Stub `gh` on PATH. The stub reads $FAKE_GH_OUTPUT and $FAKE_GH_EXIT
42
+ # for behaviour. PATH ordering: stub dir first.
43
+ STUB_DIR="$(mktemp -d)"
44
+ cat > "$STUB_DIR/gh" <<'STUB'
45
+ #!/bin/bash
46
+ if [ -n "${FAKE_GH_DELAY:-}" ]; then sleep "$FAKE_GH_DELAY"; fi
47
+ if [ -n "${FAKE_GH_OUTPUT:-}" ]; then
48
+ printf '%s' "$FAKE_GH_OUTPUT"
49
+ fi
50
+ exit "${FAKE_GH_EXIT:-0}"
51
+ STUB
52
+ chmod +x "$STUB_DIR/gh"
53
+ # `timeout` may not exist on the PATH on some macOS setups; check_ci_status
54
+ # then calls gh directly, so no passthrough stub is needed. The gh stub honours
55
+ # FAKE_GH_DELAY for tests that specifically exercise timeout behaviour.
56
+ ORIG_PATH="$PATH"
57
+ export PATH="$STUB_DIR:$PATH"
58
+ export TEST_REPO STUB_DIR
59
+ }
60
+
61
+ teardown() {
62
+ rm -rf "$RDIR" "$TEST_REPO" "$STUB_DIR"
63
+ export PATH="$ORIG_PATH"
64
+ unset FAKE_GH_OUTPUT FAKE_GH_EXIT FAKE_GH_DELAY CI_GATE_REASON CI_GATE_CATEGORY 2>/dev/null || true
65
+ }
66
+
67
+ # Run check_ci_status inside the fake repo so branch resolution works.
68
+ _run_check() {
69
+ local action="$1"
70
+ CI_GATE_REASON=""
71
+ CI_GATE_CATEGORY=""
72
+ ( cd "$TEST_REPO" && \
73
+ FAKE_GH_OUTPUT="${FAKE_GH_OUTPUT:-}" FAKE_GH_EXIT="${FAKE_GH_EXIT:-0}" \
74
+ bash -c "source '$HOOKS_DIR/lib/gate-helpers.sh'; source '$HOOKS_DIR/lib/risk-gate.sh'; \
75
+ if check_ci_status '$TEST_SESSION' '$action'; then echo ALLOW; \
76
+ else echo \"DENY: \$CI_GATE_REASON\"; fi" )
77
+ }
78
+
79
+ @test "check_ci_status allows when latest CI run concluded success" {
80
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"success","databaseId":1,"url":"https://github.com/x/y/actions/runs/1"}]'
81
+ result=$(_run_check "push")
82
+ [[ "$result" == "ALLOW" ]]
83
+ }
84
+
85
+ @test "check_ci_status denies when latest CI run concluded failure (names conclusion + URL)" {
86
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":2,"url":"https://github.com/x/y/actions/runs/2"}]'
87
+ result=$(_run_check "push")
88
+ [[ "$result" == DENY:* ]]
89
+ [[ "$result" == *"failure"* ]]
90
+ [[ "$result" == *"https://github.com/x/y/actions/runs/2"* ]]
91
+ }
92
+
93
+ @test "check_ci_status denies when latest CI run concluded cancelled" {
94
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"cancelled","databaseId":3,"url":"https://github.com/x/y/actions/runs/3"}]'
95
+ result=$(_run_check "release")
96
+ [[ "$result" == DENY:* ]]
97
+ [[ "$result" == *"cancelled"* ]]
98
+ }
99
+
100
+ @test "check_ci_status denies when latest CI run concluded timed_out" {
101
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"timed_out","databaseId":4,"url":"https://github.com/x/y/actions/runs/4"}]'
102
+ result=$(_run_check "push")
103
+ [[ "$result" == DENY:* ]]
104
+ [[ "$result" == *"timed_out"* ]]
105
+ }
106
+
107
+ @test "check_ci_status denies when latest CI run status is in_progress" {
108
+ export FAKE_GH_OUTPUT='[{"status":"in_progress","conclusion":null,"databaseId":5,"url":"https://github.com/x/y/actions/runs/5"}]'
109
+ result=$(_run_check "push")
110
+ [[ "$result" == DENY:* ]]
111
+ [[ "$result" == *"in_progress"* ]]
112
+ }
113
+
114
+ @test "check_ci_status denies when latest CI run status is queued" {
115
+ export FAKE_GH_OUTPUT='[{"status":"queued","conclusion":null,"databaseId":6,"url":"https://github.com/x/y/actions/runs/6"}]'
116
+ result=$(_run_check "release")
117
+ [[ "$result" == DENY:* ]]
118
+ [[ "$result" == *"queued"* ]]
119
+ }
120
+
121
+ @test "check_ci_status allows when CI history is empty (first push triggers CI)" {
122
+ export FAKE_GH_OUTPUT='[]'
123
+ result=$(_run_check "push")
124
+ [[ "$result" == "ALLOW" ]]
125
+ }
126
+
127
+ @test "check_ci_status denies when gh exits non-zero (fail-closed, safe-high-fix-risk)" {
128
+ export FAKE_GH_OUTPUT=''
129
+ export FAKE_GH_EXIT=1
130
+ result=$(_run_check "push")
131
+ [[ "$result" == DENY:* ]]
132
+ # Must point at the ci-bypass marker for the documented override path.
133
+ [[ "$result" == *"ci-bypass-push"* ]]
134
+ }
135
+
136
+ @test "check_ci_status allows when ci-bypass marker is present and consumes it" {
137
+ : > "$RDIR/ci-bypass-push"
138
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":7,"url":"https://github.com/x/y/actions/runs/7"}]'
139
+ result=$(_run_check "push")
140
+ [[ "$result" == "ALLOW" ]]
141
+ # The ci-bypass marker is one-shot, same family as incident-release.
142
+ [ ! -f "$RDIR/ci-bypass-push" ]
143
+ }
144
+
145
+ @test "check_ci_status bypass marker is action-scoped (push marker does not bypass release)" {
146
+ : > "$RDIR/ci-bypass-push"
147
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":8,"url":"https://github.com/x/y/actions/runs/8"}]'
148
+ result=$(_run_check "release")
149
+ [[ "$result" == DENY:* ]]
150
+ # push bypass must not have been consumed by a release check
151
+ [ -f "$RDIR/ci-bypass-push" ]
152
+ }
153
+
154
+ @test "check_ci_status allows when conclusion is skipped" {
155
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"skipped","databaseId":9,"url":"https://github.com/x/y/actions/runs/9"}]'
156
+ result=$(_run_check "push")
157
+ [[ "$result" == "ALLOW" ]]
158
+ }
159
+
160
+ @test "check_ci_status allows when conclusion is neutral" {
161
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"neutral","databaseId":10,"url":"https://github.com/x/y/actions/runs/10"}]'
162
+ result=$(_run_check "release")
163
+ [[ "$result" == "ALLOW" ]]
164
+ }
165
+
166
+ # ---------------------------------------------------------------------------
167
+ # Integration: git-push-gate.sh ordering — bypass-markers → CI status → risk
168
+ # gate. JTBD-201 demands the incident-release bypass MUST short-circuit
169
+ # BEFORE the new CI-status check fires.
170
+ # ---------------------------------------------------------------------------
171
+
172
+ # Helper: build a PreToolUse Bash input with a given command
173
+ _build_input() {
174
+ local cmd="$1"
175
+ cat <<JSON
176
+ {
177
+ "session_id": "$TEST_SESSION",
178
+ "tool_name": "Bash",
179
+ "tool_input": {
180
+ "command": "$cmd"
181
+ }
182
+ }
183
+ JSON
184
+ }
185
+
186
+ @test "git-push-gate.sh denies push:watch when CI is red even if risk score is within appetite" {
187
+ # Within-appetite risk score
188
+ echo "1" > "$RDIR/push"
189
+ # Disable drift check (no stored hash file)
190
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":11,"url":"https://github.com/x/y/actions/runs/11"}]'
191
+
192
+ INPUT=$(_build_input "npm run push:watch")
193
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
194
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
195
+ "$HOOKS_DIR/git-push-gate.sh" )
196
+ [[ "$output" == *"permissionDecision"* ]]
197
+ [[ "$output" == *"deny"* ]]
198
+ [[ "$output" == *"failure"* ]]
199
+ }
200
+
201
+ @test "git-push-gate.sh denies release:watch when CI is red even if risk score is within appetite" {
202
+ echo "1" > "$RDIR/release"
203
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":12,"url":"https://github.com/x/y/actions/runs/12"}]'
204
+
205
+ INPUT=$(_build_input "npm run release:watch")
206
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
207
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
208
+ "$HOOKS_DIR/git-push-gate.sh" )
209
+ [[ "$output" == *"permissionDecision"* ]]
210
+ [[ "$output" == *"deny"* ]]
211
+ [[ "$output" == *"failure"* ]]
212
+ }
213
+
214
+ @test "git-push-gate.sh allows release:watch when incident-release bypass is set, even if CI is red (JTBD-201)" {
215
+ echo "9" > "$RDIR/release"
216
+ : > "$RDIR/incident-release"
217
+ # Even with a red CI conclusion, the incident bypass must short-circuit
218
+ # both the CI check and the risk threshold.
219
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":13,"url":"https://github.com/x/y/actions/runs/13"}]'
220
+
221
+ INPUT=$(_build_input "npm run release:watch")
222
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
223
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
224
+ "$HOOKS_DIR/git-push-gate.sh" )
225
+ # No permissionDecision means allow (exit 0 with no JSON).
226
+ [[ "$output" != *"permissionDecision"* ]]
227
+ # incident-release marker is one-shot — must be consumed
228
+ [ ! -f "$RDIR/incident-release" ]
229
+ }
230
+
231
+ @test "git-push-gate.sh allows push:watch when CI history is empty (first push)" {
232
+ echo "1" > "$RDIR/push"
233
+ export FAKE_GH_OUTPUT='[]'
234
+
235
+ INPUT=$(_build_input "npm run push:watch")
236
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
237
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
238
+ "$HOOKS_DIR/git-push-gate.sh" )
239
+ [[ "$output" != *"permissionDecision"* ]]
240
+ }
@@ -0,0 +1,236 @@
1
+ #!/usr/bin/env bats
2
+ # P192: Risk-reducing/within-appetite bypass markers (`reducing-commit`,
3
+ # `reducing-push`, `reducing-release`) must persist across multiple commits/
4
+ # pushes/releases within the standard TTL window AS LONG AS the pipeline-state
5
+ # hash still matches what was scored. This eliminates the per-commit re-mint
6
+ # round-trip that caused three or more rescores per session. Drift or TTL
7
+ # expiry consumes the marker and forces a fresh `wr-risk-scorer:pipeline`
8
+ # rescore. `incident-release` remains single-use (deliberate one-time
9
+ # override).
10
+ #
11
+ # Behavioural contract:
12
+ # (a) reducing-* marker exists + tree hash matches stored state-hash + TTL
13
+ # not expired → gate allows AND marker persists (reusable).
14
+ # (b) reducing-* marker exists + tree hash differs from state-hash → marker
15
+ # consumed, gate falls through to check_risk_gate (which denies on
16
+ # drift or missing score).
17
+ # (c) reducing-* marker exists + TTL expired (relative to marker mtime) →
18
+ # marker consumed, gate falls through.
19
+ # (d) incident-release marker stays single-use (unchanged behaviour) —
20
+ # regression guard.
21
+ #
22
+ # Tests invoke the gate hooks directly (script + stdin JSON), the way the
23
+ # Claude Code hook runtime calls them.
24
+
25
+ setup() {
26
+ HOOKS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
27
+ COMMIT_GATE="$HOOKS_DIR/risk-score-commit-gate.sh"
28
+ PUSH_GATE="$HOOKS_DIR/git-push-gate.sh"
29
+
30
+ TEST_SESSION="bats-p192-$$-${BATS_TEST_NUMBER}"
31
+ RDIR="${TMPDIR:-/tmp}/claude-risk-${TEST_SESSION}"
32
+ rm -rf "$RDIR"
33
+ mkdir -p "$RDIR"
34
+
35
+ # Minimal git repo so pipeline-state.sh --hash-inputs produces a stable
36
+ # tree hash (git stash create needs a real repo).
37
+ TMP_REPO="$(mktemp -d)"
38
+ cd "$TMP_REPO"
39
+ git init -q -b main
40
+ git config user.email "test@example.com"
41
+ git config user.name "Test"
42
+ cat > RISK-POLICY.md <<EOF
43
+ # Risk Policy
44
+
45
+ Last reviewed: $(date -u +%Y-%m-%d)
46
+
47
+ ## Risk Appetite
48
+
49
+ Pipeline gates block when cumulative residual risk exceeds 4.
50
+ EOF
51
+ git add RISK-POLICY.md
52
+ git commit -q -m "initial"
53
+
54
+ # Default short TTL so we can exercise expiry without slow tests.
55
+ export RISK_TTL=5
56
+ }
57
+
58
+ teardown() {
59
+ rm -rf "$RDIR"
60
+ rm -rf "$TMP_REPO"
61
+ unset RISK_TTL 2>/dev/null || true
62
+ }
63
+
64
+ # Compute the current pipeline-state hash the same way the gate does
65
+ _current_hash() {
66
+ bash -c "
67
+ source '$HOOKS_DIR/lib/gate-helpers.sh'
68
+ '$HOOKS_DIR/lib/pipeline-state.sh' --hash-inputs 2>/dev/null | _hashcmd | cut -d' ' -f1
69
+ "
70
+ }
71
+
72
+ # Portable backdate by N seconds
73
+ _backdate() {
74
+ local file="$1" seconds="$2"
75
+ local stamp
76
+ stamp=$(date -v-${seconds}S +%Y%m%d%H%M.%S 2>/dev/null \
77
+ || date -d "${seconds} seconds ago" +%Y%m%d%H%M.%S 2>/dev/null)
78
+ touch -t "$stamp" "$file"
79
+ }
80
+
81
+ invoke_commit_gate() {
82
+ local cmd="$1"
83
+ local input
84
+ input=$(python3 -c "
85
+ import json, sys
86
+ print(json.dumps({
87
+ 'tool_name': 'Bash',
88
+ 'tool_input': {'command': sys.argv[1]},
89
+ 'session_id': sys.argv[2],
90
+ }))
91
+ " "$cmd" "$TEST_SESSION")
92
+ echo "$input" | bash "$COMMIT_GATE"
93
+ }
94
+
95
+ invoke_push_gate() {
96
+ local cmd="$1"
97
+ local input
98
+ input=$(python3 -c "
99
+ import json, sys
100
+ print(json.dumps({
101
+ 'tool_name': 'Bash',
102
+ 'tool_input': {'command': sys.argv[1]},
103
+ 'session_id': sys.argv[2],
104
+ }))
105
+ " "$cmd" "$TEST_SESSION")
106
+ echo "$input" | bash "$PUSH_GATE"
107
+ }
108
+
109
+ # ---------------------------------------------------------------------------
110
+ # Commit gate — reducing-commit persistence
111
+ # ---------------------------------------------------------------------------
112
+
113
+ @test "reducing-commit marker persists when tree hash matches stored state-hash" {
114
+ HASH=$(_current_hash)
115
+ echo "$HASH" > "$RDIR/state-hash"
116
+ touch "$RDIR/reducing-commit"
117
+
118
+ run invoke_commit_gate 'git commit -m "x"'
119
+ [ "$status" -eq 0 ]
120
+ [[ "$output" != *"deny"* ]]
121
+
122
+ # Marker MUST still exist after a successful allow — this is the load-
123
+ # bearing behaviour change.
124
+ [ -f "$RDIR/reducing-commit" ]
125
+ }
126
+
127
+ @test "reducing-commit marker survives back-to-back commits (no rescore round-trip)" {
128
+ HASH=$(_current_hash)
129
+ echo "$HASH" > "$RDIR/state-hash"
130
+ touch "$RDIR/reducing-commit"
131
+
132
+ # Three sequential allows. Under the old single-use contract the marker was
133
+ # consumed on #1, leaving #2 and #3 to fall through to check_risk_gate (which denies on
134
+ # missing score). Under the persistent-within-TTL contract all three pass.
135
+ run invoke_commit_gate 'git commit -m "1"'; [ "$status" -eq 0 ]; [[ "$output" != *"deny"* ]]
136
+ run invoke_commit_gate 'git commit -m "2"'; [ "$status" -eq 0 ]; [[ "$output" != *"deny"* ]]
137
+ run invoke_commit_gate 'git commit -m "3"'; [ "$status" -eq 0 ]; [[ "$output" != *"deny"* ]]
138
+
139
+ [ -f "$RDIR/reducing-commit" ]
140
+ }
141
+
142
+ @test "reducing-commit marker is consumed when tree hash drifts from stored state-hash" {
143
+ echo "stale-hash-value-from-prior-tree" > "$RDIR/state-hash"
144
+ touch "$RDIR/reducing-commit"
145
+
146
+ run invoke_commit_gate 'git commit -m "x"'
147
+ # Marker MUST be consumed when drift detected.
148
+ [ ! -f "$RDIR/reducing-commit" ]
149
+ }
150
+
151
+ @test "reducing-commit marker is consumed when TTL has expired" {
152
+ HASH=$(_current_hash)
153
+ echo "$HASH" > "$RDIR/state-hash"
154
+ touch "$RDIR/reducing-commit"
155
+ _backdate "$RDIR/reducing-commit" 10 # TTL is 5
156
+
157
+ run invoke_commit_gate 'git commit -m "x"'
158
+ # TTL-expired marker MUST be consumed and the gate must NOT silently allow
159
+ # purely on marker presence.
160
+ [ ! -f "$RDIR/reducing-commit" ]
161
+ }
162
+
163
+ @test "reducing-commit marker without state-hash file is consumed (no invariance proof)" {
164
+ rm -f "$RDIR/state-hash"
165
+ touch "$RDIR/reducing-commit"
166
+
167
+ run invoke_commit_gate 'git commit -m "x"'
168
+ # No way to verify tree-invariance → consume the marker rather than ride it.
169
+ [ ! -f "$RDIR/reducing-commit" ]
170
+ }
171
+
172
+ # ---------------------------------------------------------------------------
173
+ # Push gate — reducing-push persistence
174
+ # ---------------------------------------------------------------------------
175
+
176
+ @test "reducing-push marker persists when tree hash matches stored state-hash" {
177
+ HASH=$(_current_hash)
178
+ echo "$HASH" > "$RDIR/state-hash"
179
+ touch "$RDIR/reducing-push"
180
+
181
+ run invoke_push_gate 'npm run push:watch'
182
+ [ "$status" -eq 0 ]
183
+ [[ "$output" != *"deny"* ]]
184
+
185
+ [ -f "$RDIR/reducing-push" ]
186
+ }
187
+
188
+ @test "reducing-push marker is consumed when tree hash drifts" {
189
+ echo "stale-hash" > "$RDIR/state-hash"
190
+ touch "$RDIR/reducing-push"
191
+
192
+ run invoke_push_gate 'npm run push:watch'
193
+ [ ! -f "$RDIR/reducing-push" ]
194
+ }
195
+
196
+ # ---------------------------------------------------------------------------
197
+ # Release gate — reducing-release persistence
198
+ # ---------------------------------------------------------------------------
199
+
200
+ @test "reducing-release marker persists when tree hash matches stored state-hash" {
201
+ HASH=$(_current_hash)
202
+ echo "$HASH" > "$RDIR/state-hash"
203
+ touch "$RDIR/reducing-release"
204
+
205
+ run invoke_push_gate 'npm run release:watch'
206
+ [ "$status" -eq 0 ]
207
+ [[ "$output" != *"deny"* ]]
208
+
209
+ [ -f "$RDIR/reducing-release" ]
210
+ }
211
+
212
+ @test "reducing-release marker is consumed when tree hash drifts" {
213
+ echo "stale-hash" > "$RDIR/state-hash"
214
+ touch "$RDIR/reducing-release"
215
+
216
+ run invoke_push_gate 'npm run release:watch'
217
+ [ ! -f "$RDIR/reducing-release" ]
218
+ }
219
+
220
+ # ---------------------------------------------------------------------------
221
+ # incident-release — single-use regression guard
222
+ # ---------------------------------------------------------------------------
223
+
224
+ @test "incident-release marker REMAINS single-use (regression guard)" {
225
+ HASH=$(_current_hash)
226
+ echo "$HASH" > "$RDIR/state-hash"
227
+ touch "$RDIR/incident-release"
228
+
229
+ run invoke_push_gate 'npm run release:watch'
230
+ [ "$status" -eq 0 ]
231
+ [[ "$output" != *"deny"* ]]
232
+
233
+ # incident bypass is a deliberate one-time override — must be consumed
234
+ # even when tree hash matches.
235
+ [ ! -f "$RDIR/incident-release" ]
236
+ }
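Read together, the tests above pin down a small decision table for the bypass markers. A `reducing-*` marker survives only while the stored `state-hash` matches the current tree hash and the marker is younger than its TTL. Drift, an expired TTL, or a missing `state-hash` all consume it, and `incident-release` is always consumed. The following is a minimal bash sketch of that table, not the gate's actual implementation; the function names, the argument order, and the assumption that the TTL is measured in seconds against the marker's mtime are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the marker rules the bats tests above describe.
# Assumptions (not taken from the package): function names, argument order,
# and a TTL measured in seconds against the marker file's mtime.

# Print a file's mtime as epoch seconds (GNU stat first, then BSD stat).
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# check_reducing_marker MARKER STATE_HASH_FILE CURRENT_HASH TTL_SECONDS
# Returns 0 if the marker is honoured (and left in place for reuse),
# 1 if there is no marker or it was consumed.
check_reducing_marker() {
  local marker="$1" state_file="$2" current_hash="$3" ttl="$4" age
  [ -f "$marker" ] || return 1

  # No stored hash: tree invariance cannot be proven, so consume.
  if [ ! -f "$state_file" ]; then
    rm -f "$marker"
    return 1
  fi

  # The tree changed since the score was recorded: consume.
  if [ "$(cat "$state_file")" != "$current_hash" ]; then
    rm -f "$marker"
    return 1
  fi

  # The marker outlived its TTL: consume.
  age=$(( $(date +%s) - $(file_mtime "$marker") ))
  if [ "$age" -gt "$ttl" ]; then
    rm -f "$marker"
    return 1
  fi

  # Hash matches and TTL is live: honour the marker without consuming it.
  return 0
}

# check_incident_marker MARKER
# incident-release is a one-time override: honoured once, then always
# removed, even when the tree hash still matches.
check_incident_marker() {
  local marker="$1"
  [ -f "$marker" ] || return 1
  rm -f "$marker"
  return 0
}
```

Passing the current hash in as an argument keeps the sketch independent of how the real gate computes it; the tests delegate that to a `_current_hash` helper whose body is not shown here.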
@@ -98,3 +98,21 @@ _backdate() {
98
98
  # Fail-safe: when the hook input cannot be parsed, treat as error and skip
99
99
  [ "$BEFORE" = "$AFTER" ]
100
100
  }
101
+
102
+ @test "slide: triggers correctly on Skill tool_response shape (P213, ADR-009 2026-06-08 amendment)" {
103
+ # hooks.json matcher expansion Agent|Bash → Agent|Bash|Skill (P213 Option D)
104
+ # widens slide-marker coverage to PostToolUse:Skill completions (e.g. the
105
+ # /wr-risk-scorer:assess-* sibling assessor SKILLs run as long subprocesses
106
+ # by the AFK orchestrator). The Skill tool_response shape is identical to
107
+ # Agent|Bash (Claude Code's uniform PostToolUse contract), so the matcher-
108
+ # agnostic helper composes without code changes. This test documents that
109
+ # contract explicitly so a future hook_input shape divergence regression
110
+ # surfaces here rather than at the gate-denial site.
111
+ touch "$MARKER"
112
+ _backdate "$MARKER" 60
113
+ BEFORE=$(_mtime "$MARKER")
114
+ _HOOK_INPUT='{"tool_name":"Skill","tool_response":{"content":[{"type":"text","text":"OK"}]}}'
115
+ slide_marker_on_subprocess_return "$MARKER"
116
+ AFTER=$(_mtime "$MARKER")
117
+ [ "$AFTER" -gt "$BEFORE" ]
118
+ }
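The slide test relies on the helper ignoring `tool_name`, which is why widening the hooks.json matcher to `Agent|Bash|Skill` needs no code change. As a rough illustration only, and assuming "slide" means refreshing the marker's mtime so its TTL window restarts, the hypothetical `slide_marker_sketch` below touches the marker whenever the hook input carries a `tool_response` and leaves it alone otherwise. The substring match is a stand-in for real JSON parsing and is not how `slide_marker_on_subprocess_return` is known to work.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not the package's slide_marker_on_subprocess_return.

# Print a file's mtime as epoch seconds (GNU stat first, then BSD stat).
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# slide_marker_sketch MARKER HOOK_INPUT_JSON
# Refreshes the marker's mtime (assumed to restart its TTL window).
slide_marker_sketch() {
  local marker="$1" hook_input="$2"
  # Nothing to slide if the marker does not exist.
  [ -f "$marker" ] || return 0
  # tool_name is deliberately not inspected, so Agent, Bash, and Skill
  # completions are treated alike. The substring test is a crude stand-in
  # for real JSON parsing.
  case "$hook_input" in
    *'"tool_response"'*) touch "$marker" ;;
    *) return 0 ;;  # fail-safe: unrecognised input leaves the marker as-is
  esac
}
```

The design point the test guards is the absence of a `tool_name` branch: any completion whose input has the shared `tool_response` shape slides the marker in the same way.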
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@windyroad/risk-scorer",
3
- "version": "0.12.7",
3
+ "version": "0.12.8-preview.617",
4
4
  "description": "Pipeline risk scoring, commit/push gates, and secret leak detection",
5
5
  "bin": {
6
6
  "windyroad-risk-scorer": "./bin/install.mjs"
@@ -72,14 +72,14 @@ The orchestrator does NOT pre-compute the key — the hook derives it from the p
72
72
 
73
73
  ### 4. Delegate to wr-risk-scorer:external-comms
74
74
 
75
- Invoke the subagent via the `Skill` tool:
75
+ Invoke the external-comms reviewer via the `Skill` tool. The `wr-risk-scorer:external-comms` SKILL is a thin wrapper around the external-comms agent (per ADR-015 — see `packages/risk-scorer/skills/external-comms/SKILL.md`):
76
76
 
77
77
  ```
78
- subagent_type: wr-risk-scorer:external-comms
78
+ skill: wr-risk-scorer:external-comms
79
79
  prompt: <constructed review prompt from step 3>
80
80
  ```
81
81
 
82
- Wait for the subagent to complete. The subagent outputs a structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
82
+ Wait for the wrapper to return. The wrapper invokes the external-comms agent internally; the agent's structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL) flows back verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
83
83
 
84
84
  **Do not write to `${TMPDIR:-/tmp}/claude-risk-*` yourself.** The hook is the only correct mechanism.
85
85
 
@@ -64,14 +64,14 @@ Build a self-contained prompt for the pipeline subagent that includes:
64
64
 
65
65
  ### 5. Delegate to wr-risk-scorer:pipeline
66
66
 
67
- Invoke the pipeline subagent via the `Skill` tool:
67
+ Invoke the pipeline scorer via the `Skill` tool. The `wr-risk-scorer:pipeline` SKILL is a thin wrapper around the pipeline agent (per ADR-015 — see `packages/risk-scorer/skills/pipeline/SKILL.md`):
68
68
 
69
69
  ```
70
- subagent_type: wr-risk-scorer:pipeline
70
+ skill: wr-risk-scorer:pipeline
71
71
  prompt: <constructed assessment prompt from step 4>
72
72
  ```
73
73
 
74
- Wait for the subagent to complete. The subagent will output a structured `RISK_SCORES:` block. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads that output and writes the bypass marker files automatically.
74
+ Wait for the wrapper to return. The wrapper invokes the pipeline agent internally; the agent's structured `RISK_SCORES:` block flows back through the wrapper verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation and writes the bypass marker files automatically.
75
75
 
76
76
  **Do not write to `$TMPDIR/claude-risk-*` yourself.** The hook is the only correct mechanism.
77
77
 
@@ -0,0 +1,162 @@
1
+ #!/usr/bin/env bats
2
+ # Contract guard: the on-demand assessment SKILLs (assess-release, assess-wip,
3
+ # assess-external-comms) MUST delegate to their scoring agent via the Skill
4
+ # tool — not via the Agent tool — matching ADR-015's Confirmation literal
5
+ # phrasing ("the skill delegates to wr-risk-scorer:<agent> via the Skill
6
+ # tool"). Closes the P205 contradiction, which surfaced as a mismatch
7
+ # between ADR-015's Confirmation criteria and the SKILL.md prose.
8
+ #
9
+ # Structural assertions — Permitted Exception to the source-grep ban
10
+ # (ADR-005 / P011), same framing as risk-scorer-register-hint.bats. SKILL.md
11
+ # prose IS the contract document the orchestrator (Claude) consumes when
12
+ # executing the SKILL; an LLM-output behavioural check is out of scope for
13
+ # bats and is the responsibility of the promptfoo harness (ADR-075).
14
+ #
15
+ # What is asserted (contract, not implementation):
16
+ # 1. Each assess-* SKILL's step 5 (release) / step 3 (wip) / step 4
17
+ # (external-comms) names `skill:` as the delegation tool parameter
18
+ # with the correct wrapper SKILL name as the target.
19
+ # 2. None of the assess-* SKILLs name `subagent_type:` as the delegation
20
+ # tool parameter (the P205 contradiction class).
21
+ # 3. Each wrapper SKILL (`pipeline`, `wip`, `external-comms`) exists at
22
+ # its expected path, is namespaced `wr-risk-scorer:<name>`, and
23
+ # delegates to its sibling agent via `subagent_type:`.
24
+ #
25
+ # Cross-reference:
26
+ # P205: docs/problems/known-error/205-wr-risk-scorer-assess-release-skill-md-step-5-prose-says-skill-tool-but-provides-subagent-type.md
27
+ # ADR-015: docs/decisions/015-on-demand-assessment-skills.proposed.md (Confirmation criteria 189-193)
28
+ # ADR-052: docs/decisions/052-behavioural-tests-default.proposed.md (Permitted Exception)
29
+ # @jtbd JTBD-005 (invoke governance assessments on demand)
30
+ # @jtbd JTBD-101 (extend the suite — plugins expose corresponding skills)
31
+
32
+ setup() {
33
+ SKILLS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/../.." && pwd)"
34
+
35
+ ASSESS_RELEASE="${SKILLS_DIR}/assess-release/SKILL.md"
36
+ ASSESS_WIP="${SKILLS_DIR}/assess-wip/SKILL.md"
37
+ ASSESS_EXTERNAL_COMMS="${SKILLS_DIR}/assess-external-comms/SKILL.md"
38
+
39
+ WRAPPER_PIPELINE="${SKILLS_DIR}/pipeline/SKILL.md"
40
+ WRAPPER_WIP="${SKILLS_DIR}/wip/SKILL.md"
41
+ WRAPPER_EXTERNAL_COMMS="${SKILLS_DIR}/external-comms/SKILL.md"
42
+ }
43
+
44
+ # ──────────────────────────────────────────────────────────────────────────────
45
+ # Consumer SKILLs delegate via Skill tool (skill: parameter)
46
+ # ──────────────────────────────────────────────────────────────────────────────
47
+
48
+ @test "assess-release delegates via skill: wr-risk-scorer:pipeline" {
49
+ [ -f "$ASSESS_RELEASE" ]
50
+ run grep -E "^skill: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
51
+ [ "$status" -eq 0 ]
52
+ }
53
+
54
+ @test "assess-release does NOT use subagent_type: in its delegation block" {
55
+ [ -f "$ASSESS_RELEASE" ]
56
+ # The P205 contradiction was: prose says "Skill tool" but provides
57
+ # subagent_type: wr-risk-scorer:pipeline. After the fix, no
58
+ # `subagent_type: wr-risk-scorer:pipeline` line may appear anywhere in the file.
59
+ run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
60
+ [ "$status" -ne 0 ]
61
+ }
62
+
63
+ @test "assess-wip delegates via skill: wr-risk-scorer:wip" {
64
+ [ -f "$ASSESS_WIP" ]
65
+ run grep -E "^skill: wr-risk-scorer:wip$" "$ASSESS_WIP"
66
+ [ "$status" -eq 0 ]
67
+ }
68
+
69
+ @test "assess-wip does NOT use subagent_type: in its delegation block" {
70
+ [ -f "$ASSESS_WIP" ]
71
+ run grep -E "^subagent_type: wr-risk-scorer:wip$" "$ASSESS_WIP"
72
+ [ "$status" -ne 0 ]
73
+ }
74
+
75
+ @test "assess-external-comms delegates via skill: wr-risk-scorer:external-comms" {
76
+ [ -f "$ASSESS_EXTERNAL_COMMS" ]
77
+ run grep -E "^skill: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
78
+ [ "$status" -eq 0 ]
79
+ }
80
+
81
+ @test "assess-external-comms does NOT use subagent_type: in its delegation block" {
82
+ [ -f "$ASSESS_EXTERNAL_COMMS" ]
83
+ run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
84
+ [ "$status" -ne 0 ]
85
+ }
86
+
87
+ # ──────────────────────────────────────────────────────────────────────────────
88
+ # Wrapper SKILLs exist with correct names and delegate to the agent
89
+ # ──────────────────────────────────────────────────────────────────────────────
90
+
91
+ @test "wrapper SKILL packages/risk-scorer/skills/pipeline/SKILL.md exists" {
92
+ [ -f "$WRAPPER_PIPELINE" ]
93
+ }
94
+
95
+ @test "wrapper SKILL pipeline declares name: wr-risk-scorer:pipeline" {
96
+ [ -f "$WRAPPER_PIPELINE" ]
97
+ run grep -E "^name: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
98
+ [ "$status" -eq 0 ]
99
+ }
100
+
101
+ @test "wrapper SKILL pipeline delegates to the pipeline agent via subagent_type:" {
102
+ [ -f "$WRAPPER_PIPELINE" ]
103
+ run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
104
+ [ "$status" -eq 0 ]
105
+ }
106
+
107
+ @test "wrapper SKILL packages/risk-scorer/skills/wip/SKILL.md exists" {
108
+ [ -f "$WRAPPER_WIP" ]
109
+ }
110
+
111
+ @test "wrapper SKILL wip declares name: wr-risk-scorer:wip" {
112
+ [ -f "$WRAPPER_WIP" ]
113
+ run grep -E "^name: wr-risk-scorer:wip$" "$WRAPPER_WIP"
114
+ [ "$status" -eq 0 ]
115
+ }
116
+
117
+ @test "wrapper SKILL wip delegates to the wip agent via subagent_type:" {
118
+ [ -f "$WRAPPER_WIP" ]
119
+ run grep -E "^subagent_type: wr-risk-scorer:wip$" "$WRAPPER_WIP"
120
+ [ "$status" -eq 0 ]
121
+ }
122
+
123
+ @test "wrapper SKILL packages/risk-scorer/skills/external-comms/SKILL.md exists" {
124
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
125
+ }
126
+
127
+ @test "wrapper SKILL external-comms declares name: wr-risk-scorer:external-comms" {
128
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
129
+ run grep -E "^name: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
130
+ [ "$status" -eq 0 ]
131
+ }
132
+
133
+ @test "wrapper SKILL external-comms delegates to the external-comms agent via subagent_type:" {
134
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
135
+ run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
136
+ [ "$status" -eq 0 ]
137
+ }
138
+
139
+ # ──────────────────────────────────────────────────────────────────────────────
140
+ # Wrapper SKILLs disambiguate from end-user assess-* surfaces
141
+ # ──────────────────────────────────────────────────────────────────────────────
142
+
143
+ @test "wrapper pipeline description names assess-release as the end-user surface" {
144
+ [ -f "$WRAPPER_PIPELINE" ]
145
+ # JTBD-005 persona-fit: a solo developer must not land on the raw wrapper
146
+ # and miss the assess-* gate-satisfaction wrap-up. The description must
147
+ # disambiguate.
148
+ run grep -E "assess-release" "$WRAPPER_PIPELINE"
149
+ [ "$status" -eq 0 ]
150
+ }
151
+
152
+ @test "wrapper wip description names assess-wip as the end-user surface" {
153
+ [ -f "$WRAPPER_WIP" ]
154
+ run grep -E "assess-wip" "$WRAPPER_WIP"
155
+ [ "$status" -eq 0 ]
156
+ }
157
+
158
+ @test "wrapper external-comms description names assess-external-comms as the end-user surface" {
159
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
160
+ run grep -E "assess-external-comms" "$WRAPPER_EXTERNAL_COMMS"
161
+ [ "$status" -eq 0 ]
162
+ }
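Most of the tests above repeat the same four checks across three consumer/wrapper pairs. A table-driven restatement can make that symmetry explicit. The sketch below is hypothetical (the function name and the stderr message format are illustrative) and is not part of the package; it reuses the exact `grep -E` patterns from the bats file.

```bash
#!/usr/bin/env bash
# check_skill_delegation SKILLS_DIR
# Hypothetical table-driven restatement of the bats contract above.
# Prints one line per violation on stderr; returns 0 only if all pairs pass.
check_skill_delegation() {
  local dir="$1" failures=0 pair consumer wrapper c w
  # consumer-skill:wrapper-skill pairs checked by the bats file.
  for pair in assess-release:pipeline assess-wip:wip assess-external-comms:external-comms; do
    consumer="${pair%%:*}"
    wrapper="${pair#*:}"
    c="$dir/$consumer/SKILL.md"
    w="$dir/$wrapper/SKILL.md"

    # 1. Consumer delegates via the Skill tool (skill: parameter).
    if ! grep -qsE "^skill: wr-risk-scorer:${wrapper}\$" "$c"; then
      echo "FAIL: $consumer lacks 'skill: wr-risk-scorer:$wrapper'" >&2
      failures=$((failures + 1))
    fi
    # 2. Consumer never names the agent via subagent_type: (the P205 class).
    if grep -qsE "^subagent_type: wr-risk-scorer:${wrapper}\$" "$c"; then
      echo "FAIL: $consumer still uses subagent_type:" >&2
      failures=$((failures + 1))
    fi
    # 3. Wrapper is namespaced wr-risk-scorer:<name>.
    if ! grep -qsE "^name: wr-risk-scorer:${wrapper}\$" "$w"; then
      echo "FAIL: $wrapper wrapper lacks 'name: wr-risk-scorer:$wrapper'" >&2
      failures=$((failures + 1))
    fi
    # 4. Wrapper delegates to its sibling agent via subagent_type:.
    if ! grep -qsE "^subagent_type: wr-risk-scorer:${wrapper}\$" "$w"; then
      echo "FAIL: $wrapper wrapper does not delegate via subagent_type:" >&2
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Keeping the pairs in one list means a fourth assessor would need one new entry rather than six new `@test` blocks, at the cost of less granular bats failure output.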
@@ -42,14 +42,14 @@ Build a self-contained prompt for the wip subagent that includes:
42
42
 
43
43
  ### 3. Delegate to wr-risk-scorer:wip
44
44
 
45
- Invoke the wip subagent via the `Skill` tool:
45
+ Invoke the WIP scorer via the `Skill` tool. The `wr-risk-scorer:wip` SKILL is a thin wrapper around the wip agent (per ADR-015 — see `packages/risk-scorer/skills/wip/SKILL.md`):
46
46
 
47
47
  ```
48
- subagent_type: wr-risk-scorer:wip
48
+ skill: wr-risk-scorer:wip
49
49
  prompt: <constructed assessment prompt from step 2>
50
50
  ```
51
51
 
52
- Wait for the subagent to complete.
52
+ Wait for the wrapper to return. The wrapper invokes the wip agent internally and returns the agent's verdict verbatim.
53
53
 
54
54
  ### 4. Present results
55
55
 
@@ -0,0 +1,37 @@
1
+ ---
2
+ name: wr-risk-scorer:external-comms
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:external-comms leak-review agent. Delegates to the agent via the Agent tool and returns the agent's structured EXTERNAL_COMMS_RISK_VERDICT. Internal-use plumbing used by `/wr-risk-scorer:assess-external-comms` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-external-comms` instead.
4
+ allowed-tools: Read, Glob, Grep, Bash, Agent
5
+ ---
6
+
7
+ # External-Comms Leak Review Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:external-comms` agent. It exists so consumer SKILLs can invoke the leak reviewer via the **Skill tool** with `skill: wr-risk-scorer:external-comms` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-external-comms` instead. This wrapper is internal-use plumbing — calling it directly returns the raw verdict without the structured AskUserQuestion above-appetite handling (Rewrite / Move to private channel / Override / Cancel) that `/wr-risk-scorer:assess-external-comms` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained leak-review prompt structured per `packages/risk-scorer/agents/external-comms.md` § "What you receive":
16
+ - A leading `SURFACE: <name>` line (one of the canonical surface strings).
17
+ - The draft body wrapped verbatim inside `<draft>...</draft>` markers (the PostToolUse hook derives the marker key from this).
18
+ - The destination when known.
19
+ - **Output**: the agent's verbatim verdict — `EXTERNAL_COMMS_RISK_VERDICT: PASS | FAIL` plus, on FAIL, an `EXTERNAL_COMMS_RISK_REASON:` block naming each Confidential Information class and the substrings that triggered it.
20
+ - **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict and writes the `external-comms-gate.sh` marker on PASS. The wrapper itself writes no files.
21
+
22
+ ## Steps
23
+
24
+ ### 1. Pass-through to the external-comms agent
25
+
26
+ Invoke the external-comms subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim. The `SURFACE:` line and `<draft>...</draft>` markers MUST be preserved exactly — the PostToolUse hook depends on the prompt structure for marker-key derivation:
27
+
28
+ ```
29
+ subagent_type: wr-risk-scorer:external-comms
30
+ prompt: $ARGUMENTS
31
+ ```
32
+
33
+ ### 2. Return the agent report verbatim
34
+
35
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `EXTERNAL_COMMS_RISK_VERDICT:` or `EXTERNAL_COMMS_RISK_REASON:` blocks — the hook parses the verdict and consumer SKILLs surface the reason directly.
36
+
37
+ $ARGUMENTS
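The wrapper's input and output contracts are both line-oriented, so they can be illustrated in plain shell. The sketch below, with hypothetical function names, builds a prompt with a leading `SURFACE:` line and the draft wrapped unchanged in `<draft>...</draft>`, and extracts the last `EXTERNAL_COMMS_RISK_VERDICT:` value from a report. The `DESTINATION:` line format is an assumption; `packages/risk-scorer/agents/external-comms.md` remains the authoritative spec.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the external-comms wrapper's I/O contract.

# build_ec_prompt SURFACE DRAFT [DESTINATION]
build_ec_prompt() {
  local surface="$1" draft="$2" destination="${3:-}"
  printf 'SURFACE: %s\n' "$surface"
  # Assumed line format for the optional destination.
  if [ -n "$destination" ]; then
    printf 'DESTINATION: %s\n' "$destination"
  fi
  # %s inserts the draft unchanged (no escaping or trimming).
  printf '<draft>%s</draft>\n' "$draft"
}

# parse_ec_verdict REPORT
# Prints PASS or FAIL; returns 1 if no well-formed verdict line exists.
parse_ec_verdict() {
  local verdict
  verdict=$(printf '%s\n' "$1" \
    | sed -nE 's/^EXTERNAL_COMMS_RISK_VERDICT:[[:space:]]*(PASS|FAIL)[[:space:]]*$/\1/p' \
    | tail -n 1)
  [ -n "$verdict" ] || return 1
  printf '%s\n' "$verdict"
}
```

Because the parser anchors on the exact `EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` line, a wrapper that paraphrased the verdict would make it return 1, which is one concrete reason the wrapper must pass the report through verbatim.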
@@ -0,0 +1,34 @@
1
+ ---
2
+ name: wr-risk-scorer:pipeline
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:pipeline scoring agent. Delegates to the agent via the Agent tool and returns the agent's structured RISK_SCORES output. Internal-use plumbing used by `/wr-risk-scorer:assess-release` and any other consumer SKILL that needs Skill-tool-shaped invocation of the pipeline scorer per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-release` instead.
4
+ allowed-tools: Read, Glob, Bash, Agent
5
+ ---
6
+
7
+ # Pipeline Scoring Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:pipeline` agent. It exists so consumer SKILLs can invoke the pipeline scorer via the **Skill tool** with `skill: wr-risk-scorer:pipeline` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-release` instead. This wrapper is internal-use plumbing — calling it directly returns raw scoring output without the gate-satisfaction wrap-up, AskUserQuestion above-appetite handling, or release-context resolution that `/wr-risk-scorer:assess-release` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained scoring prompt with pipeline state context. Caller assembles UNCOMMITTED / UNPUSHED / UNRELEASED sections per `packages/risk-scorer/agents/pipeline.md` § Pipeline State.
16
+ - **Output**: the agent's verbatim report, including the structured `RISK_SCORES: commit=N push=N release=N` block, optional `RISK_BYPASS:` line, optional `RISK_REMEDIATIONS:` block, optional `RISK_REGISTER_HINT:` block, and optional `CATALOG_HIT_RATE:` line.
17
+ - **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on this wrapper's inner Agent invocation, reads the agent's output, and writes the bypass marker files to `${TMPDIR}/claude-risk-${SESSION_ID}/`. The wrapper itself writes no files.
18
+
19
+ ## Steps
20
+
21
+ ### 1. Pass-through to the pipeline agent
22
+
23
+ Invoke the pipeline subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
24
+
25
+ ```
26
+ subagent_type: wr-risk-scorer:pipeline
27
+ prompt: $ARGUMENTS
28
+ ```
29
+
30
+ ### 2. Return the agent report verbatim
31
+
32
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the structured output blocks (`RISK_SCORES:`, `RISK_BYPASS:`, `RISK_REMEDIATIONS:`, `RISK_REGISTER_HINT:`, `CATALOG_HIT_RATE:`). The PostToolUse hook parses these blocks and depends on their exact byte sequence.
33
+
34
+ $ARGUMENTS
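To make the verbatim requirement concrete, here is a hypothetical parser for the `RISK_SCORES: commit=N push=N release=N` line. It assumes each `N` is a non-negative integer and that the last matching line wins; neither detail is specified above, and this is not the hook's parser.

```bash
#!/usr/bin/env bash
# parse_risk_scores REPORT
# Hypothetical sketch: prints "COMMIT PUSH RELEASE" from the last
# well-formed RISK_SCORES line; returns 1 if none is found.
parse_risk_scores() {
  local scores
  # sed -n prints only lines the substitution matched; tail keeps the last.
  scores=$(printf '%s\n' "$1" \
    | sed -nE 's/^RISK_SCORES: commit=([0-9]+) push=([0-9]+) release=([0-9]+)[[:space:]]*$/\1 \2 \3/p' \
    | tail -n 1)
  [ -n "$scores" ] || return 1
  printf '%s\n' "$scores"
}
```

A caller could split the result with `read -r commit push release <<<"$scores"`. If a wrapper reflowed the line, for example into a markdown table, the pattern would no longer match and the parser would return 1.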
@@ -0,0 +1,33 @@
1
+ ---
2
+ name: wr-risk-scorer:wip
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:wip nudge agent. Delegates to the agent via the Agent tool and returns the agent's structured WIP risk verdict. Internal-use plumbing used by `/wr-risk-scorer:assess-wip` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-wip` instead.
4
+ allowed-tools: Read, Glob, Bash, Agent
5
+ ---
6
+
7
+ # WIP Scoring Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:wip` agent. It exists so consumer SKILLs can invoke the WIP nudge scorer via the **Skill tool** with `skill: wr-risk-scorer:wip` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-wip` instead. This wrapper is internal-use plumbing — calling it directly returns raw nudge output without the present-results layer that `/wr-risk-scorer:assess-wip` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained WIP-scoring prompt — typically the edited file path(s) plus a `git diff HEAD --stat` summary per `packages/risk-scorer/agents/wip.md`.
16
+ - **Output**: the agent's verbatim report, including the WIP Risk Assessment markdown table, the cumulative pipeline risk picture, and the structured `RISK_VERDICT: CONTINUE | PAUSE | COMMIT` line.
17
+
18
+ ## Steps
19
+
20
+ ### 1. Pass-through to the wip agent
21
+
22
+ Invoke the wip subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
23
+
24
+ ```
25
+ subagent_type: wr-risk-scorer:wip
26
+ prompt: $ARGUMENTS
27
+ ```
28
+
29
+ ### 2. Return the agent report verbatim
30
+
31
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `RISK_VERDICT:`, `RISK_REMEDIATIONS:`, or `RISK_COMMIT_REASON:` blocks — consumer SKILLs parse them directly.
32
+
33
+ $ARGUMENTS
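The WIP verdict is an enumerated value, so a consumer can validate it rather than trust free text. A minimal sketch, assuming the verdict sits on its own `RISK_VERDICT:` line with a single value from `CONTINUE`, `PAUSE`, or `COMMIT` (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# parse_wip_verdict REPORT
# Hypothetical sketch: prints CONTINUE, PAUSE, or COMMIT from the last
# well-formed RISK_VERDICT line; returns 1 otherwise.
parse_wip_verdict() {
  local report="$1" verdict
  # sed -n prints only lines the substitution matched; tail keeps the last.
  verdict=$(printf '%s\n' "$report" \
    | sed -nE 's/^RISK_VERDICT:[[:space:]]*(CONTINUE|PAUSE|COMMIT)[[:space:]]*$/\1/p' \
    | tail -n 1)
  [ -n "$verdict" ] || return 1
  printf '%s\n' "$verdict"
}
```

Anything outside the three-value set, including a paraphrased verdict, is rejected with status 1.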