@windyroad/risk-scorer 0.3.3 → 0.3.4-preview.104

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,5 @@
  {
  "name": "wr-risk-scorer",
- "version": "0.3.0",
+ "version": "0.3.4",
  "description": "Pipeline risk scoring, commit/push/release gates for Claude Code"
- }
+ }
@@ -94,7 +94,36 @@ Commit score >= push score >= release score (risk accumulates upward).
 
  ## Risk-Reducing and Risk-Neutral Bypass
 
- Assess whether each action is risk-reducing, risk-neutral, or risk-increasing. Include `RISK_BYPASS: reducing` in your output for reducing/neutral actions. Do not include it for risk-increasing actions.
+ `RISK_BYPASS: reducing` is reserved for commits that genuinely reduce risk.
+ The 329-report retrospective found this label applied to 97.9% of commits in
+ this repo because the old criteria were too loose — changeset metadata, ADR
+ checkbox ticks, and docs-only edits all earned the bypass. When nearly
+ everything is "reducing", the label provides no discriminating signal. These
+ criteria tighten that.
+
+ Emit `RISK_BYPASS: reducing` ONLY when at least ONE of the following is true:
+ 1. The commit closes a problem ticket (the diff includes a `.known-error.md` →
+ `.closed.md` rename, references "closes P<NNN>" in the commit message, or
+ adds a `## Fix Committed` section to a known-error ticket), OR
+ 2. The commit explicitly remediates a risk item previously flagged by the
+ scorer in a prior report (the diff fixes something a prior risk report
+ called out), OR
+ 3. The commit removes a documented risk (retires a hazardous hook, removes an
+ insecure API, deletes a known-defective code path)
+
+ Ordinary commits that do not meet at least one of these conditions are **risk-neutral, not risk-reducing**. Docs-only edits, test-only additions without a remediation link, and routine refactors are all neutral — do NOT emit the reducing bypass for them.
+
+ When emitting `RISK_BYPASS: reducing`, cite the reason on a companion
+ `RISK_BYPASS_REASON:` line so the bypass is auditable:
+
+ ```
+ RISK_BYPASS: reducing
+ RISK_BYPASS_REASON: closes P043 (tightens reducing-bypass criteria; removes previously-flagged over-application)
+ ```
+
+ Acceptable `RISK_BYPASS_REASON:` values cite the ticket ID closed, the prior
+ risk report remediated, or the removed risk — matching one of the three
+ criteria above.
 
  For live incidents (outage, security, information disclosure), include `RISK_BYPASS: incident`.
 
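A consumer of scorer output could check the companion-line rule mechanically. The following is a minimal sketch, not part of the package: `check_bypass_reason` is a hypothetical helper that reads one report on stdin and, only when `RISK_BYPASS: reducing` is present, requires a non-empty `RISK_BYPASS_REASON:` line. It assumes both labels start at column 0, as in the example above, and it cannot judge whether the cited reason actually satisfies one of the three criteria; it only enforces that a reason exists.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper (not shipped with @windyroad/risk-scorer).
# Reads a scorer report on stdin and prints one of:
#   none    - no reducing bypass was claimed
#   ok      - reducing bypass claimed with a non-empty RISK_BYPASS_REASON line
#   missing - reducing bypass claimed without a reason (exit status 1)
check_bypass_reason() {
  local report
  report="$(cat)"
  # Only the reducing bypass needs a companion reason line.
  if ! grep -qE '^RISK_BYPASS: reducing[[:space:]]*$' <<<"$report"; then
    echo none
    return 0
  fi
  # Require "RISK_BYPASS_REASON:" followed by at least one non-space character.
  if grep -qE '^RISK_BYPASS_REASON:[[:space:]]*[^[:space:]]' <<<"$report"; then
    echo ok
    return 0
  fi
  echo missing
  return 1
}
```

Piping the two-line example report above into `check_bypass_reason` would print `ok`; a report carrying `RISK_BYPASS: incident` is reported as `none`, because the reason rule applies only to the reducing bypass.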
@@ -109,7 +138,18 @@ Do NOT emit: "Suggested Actions", "Your call:", advisory warnings, back-pressure
 
  ## Above-Appetite Remediations
 
- When ANY cumulative score exceeds appetite (> 4), emit a structured `RISK_REMEDIATIONS:` block after the `RISK_SCORES:` line. This gives the calling skill machine-readable input for structured decision prompts.
+ When ANY cumulative score exceeds appetite (> 4), the verbal verdict is **STOP**.
+ The scorer is not the primary decision-maker — the hook gate will block the
+ action — but the scorer's verdict must match the structured score so the agent
+ does not waste tool calls acting on an ambiguous nudge.
+
+ **Do NOT emit** "Proceed", "Proceed with release", "Continue", "You may ship",
+ "OK to commit/push/release", or any similar nudge language when cumulative risk
+ exceeds appetite. The only sanctioned above-appetite output is the Risk Report
+ structure, `RISK_SCORES: ...`, and the structured `RISK_REMEDIATIONS:` block
+ defined below.
+
+ Emit a structured `RISK_REMEDIATIONS:` block after the `RISK_SCORES:` line. This gives the calling skill machine-readable input for structured decision prompts.
 
  Format (5 columns — machine-readable for structured AskUserQuestion prompts in calling skills):
  ```
@@ -144,6 +184,42 @@ Do not rely on a static list. For each control claimed to reduce risk, you MUST:
  3. Ask: "Would this control catch this failure before reaching the user?"
  4. **Name the control**: "Tests pass" is not a control. Name the specific test file and scenario. If you cannot name it, it provides 0 reduction.
 
+ **Monitoring is not a control.** Monitoring, alerting, dashboards, and any other post-release detection activity MUST NOT be credited as a control that reduces residual risk. Post-release detection does NOT reduce pre-release risk — it only shortens the time to notice a failure after it has already reached users. A genuine control exercises the failure
+ scenario BEFORE the change ships: a test, a CI gate, a feature flag, a preview
+ verification, an architect review, an installer dry-run. Monitoring and rollback
+ readiness may be listed separately as "post-release follow-ups" outside the
+ residual risk computation, but MUST NOT appear in a Controls list and MUST NOT
+ reduce any inherent risk score.
+
+ ## User-Stated Preconditions Check
+
+ A technical control list never substitutes for an explicit user warning. Before
+ credit is given to any control, check for **user-stated preconditions** — conditions
+ the user has named in the current conversation, commit messages, changesets, or
+ problem tickets that tie this change to a paired capability (e.g., "A is only safe
+ if B ships alongside", "don't release X until Y is merged").
+
+ For each user-stated precondition:
+ 1. Determine whether the paired capability is released, queued in the unreleased
+ changeset batch, or unmet.
+ 2. If unmet, the precondition is a failed control — credit zero reduction from
+ otherwise-valid controls (tests, CI, architect review) that do not address the
+ precondition itself.
+ 3. Surface the unmet precondition as a standalone **Risk item** with inherent
+ impact and likelihood reflecting the consequence the user warned about.
+ Inherent risk MUST be >= Medium (>= 5), even when the diff's technical risk
+ alone would score Low. This routes the precondition through the existing
+ above-appetite `RISK_REMEDIATIONS:` flow rather than burying it in prose.
+
+ Sources to inspect for stated preconditions:
+ - Recent conversation messages directed to the agent
+ - Open or known-error problem tickets referenced in the diff or recent commits
+ - Commit messages and changeset files on the unreleased queue
+ - CLAUDE.md notes about cross-cutting dependencies
+
+ User warnings reflect domain context the scorer cannot derive from the diff alone.
+ They outrank the technical assessment.
+
  ## Constraints
 
  - You are a scorer, not an editor.
package/agents/plan.md CHANGED
@@ -49,7 +49,13 @@ You are the Risk Scorer in plan review mode. Assess both the plan's own risk AND
 
  End your report with `RISK_VERDICT: PASS` or `RISK_VERDICT: FAIL` on its own line. A PostToolUse hook reads this and writes the marker files — do NOT write files yourself.
 
- On FAIL, emit a structured `RISK_REMEDIATIONS:` block after the verdict (5 columns machine-readable for structured AskUserQuestion prompts in calling skills):
+ On FAIL, the verbal verdict is **STOP**. **Do NOT emit** "Proceed", "Continue",
+ "You may ship", "OK to implement", or any similar nudge language. The plan is
+ not policy-authorised — the only sanctioned FAIL output is the Plan Risk Report,
+ the `RISK_VERDICT: FAIL` marker, and the structured `RISK_REMEDIATIONS:` block
+ defined below.
+
+ Emit a structured `RISK_REMEDIATIONS:` block after the verdict (5 columns — machine-readable for structured AskUserQuestion prompts in calling skills):
  ```
  RISK_REMEDIATIONS:
  - R1 | <description of what the plan must add/change> | <effort S/M/L> | <risk_delta -N> | <affected area>
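The 5-column row format above is pipe-delimited, so a calling skill could split it with standard tools before building its structured prompt. The following is a minimal sketch under stated assumptions: `parse_remediations` is a hypothetical helper, rows are assumed to start with `- ` and to separate columns with ` | ` exactly as in the template, and a description that itself contains ` | ` would break this naive split (such rows are reported as malformed).

```bash
#!/usr/bin/env bash
# Hypothetical parser (not part of the package) for RISK_REMEDIATIONS rows such as:
#   - R1 | <description> | <effort S/M/L> | <risk_delta -N> | <affected area>
# Reads a report on stdin; prints one tab-separated record per row:
#   id, description, effort, risk_delta, affected area
parse_remediations() {
  awk '
    /^[ \t]*RISK_REMEDIATIONS:/ { in_block = 1; next }
    # The block ends at the first line that is not a "- " row.
    in_block && !/^[ \t]*- / { in_block = 0 }
    in_block {
      sub(/^[ \t]*- /, "")                  # drop the list marker
      n = split($0, col, " [|] ")           # naive split on " | "
      if (n != 5) { print "malformed: " $0 > "/dev/stderr"; next }
      for (i = 1; i <= 5; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", col[i]) # trim each column
      }
      printf "%s\t%s\t%s\t%s\t%s\n", col[1], col[2], col[3], col[4], col[5]
    }
  '
}
```

Tab-separated output keeps the fields easy to consume with `cut -f` or `IFS=$'\t' read`, without committing to any particular prompt format on the calling-skill side.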
@@ -68,6 +74,27 @@ For each control claimed to reduce risk:
  2. Name the specific test file/scenario or hook
  3. If you cannot name it, it provides 0 reduction
 
+ **Monitoring is not a control.** Monitoring, alerting, dashboards, and any other post-release detection activity MUST NOT be credited as a control in a plan's residual risk. Post-release detection does NOT reduce pre-release risk — it only shortens the time to notice a failure after it has already reached users. A genuine control exercises the failure scenario before the
+ plan's changes ship: a test, a CI gate, a feature flag, a preview verification,
+ an architect review. Monitoring MUST NOT appear in a Controls list and MUST NOT
+ reduce any inherent risk score.
+
+ ## User-Stated Preconditions Check
+
+ Before crediting any control, check for **user-stated preconditions** — conditions
+ the user has named in the plan, associated problem tickets, commit messages, or
+ CLAUDE.md that tie this plan to a paired capability (e.g., "A is only safe if B
+ ships alongside", "don't release X until Y is merged").
+
+ For each user-stated precondition:
+ 1. Check whether the plan already addresses or queues the paired capability.
+ 2. If the precondition is unmet in the plan, credit zero reduction from controls
+ that do not cover the precondition, and surface the unmet precondition as a **Risk item** with inherent risk >= Medium (>= 5).
+ 3. A plan that ships a change without addressing a user-stated precondition
+ must be FAIL, regardless of the diff's technical score.
+
+ User warnings outrank technical control discovery.
+
  ## Constraints
 
  - You are a scorer, not an editor.
@@ -0,0 +1,80 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: risk-scorer agent prompts must contain an explicit
+ # STOP / do-not-proceed directive in their Above-Appetite sections.
+ #
+ # Structural assertions — Permitted Exception to the source-grep ban (ADR-005 / P011).
+ # These tests assert that the pipeline, wip, and plan scorer prompts forbid
+ # "Proceed", "Continue", or "You may ship" nudges when cumulative risk
+ # exceeds appetite.
+ #
+ # Background: P037 identified that scorer reports could include "Proceed
+ # with release" or similar nudge language even when residual risk exceeded
+ # appetite. The hook gate then correctly blocked the action, but only after
+ # the agent wasted tool calls and tokens acting on the nudge. The scorer
+ # is not the primary decision-maker, but its verbal verdict must match the
+ # structured score — ambiguous "proceed" language undermines this.
+ #
+ # The Below-Appetite Output Rule (ADR-013 Rule 5) already requires silent
+ # policy-authorised release when all scores are within appetite. This guard
+ # enforces the inverse: an explicit STOP directive above appetite.
+ #
+ # Cross-reference:
+ # P037: docs/problems/037-risk-scorer-proceeds-above-appetite.open.md
+ # ADR-013: docs/decisions/013-structured-user-interaction-for-governance-decisions.proposed.md
+ # @jtbd JTBD-001 (enforce governance without slowing down)
+ # @jtbd JTBD-002 (ship with confidence — verbal verdict must match structured score)
+ # @jtbd JTBD-202 (pre-flight governance — structured output is the only sanctioned channel)
+
+ setup() {
+ AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ PIPELINE="${AGENTS_DIR}/pipeline.md"
+ WIP="${AGENTS_DIR}/wip.md"
+ PLAN="${AGENTS_DIR}/plan.md"
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # pipeline.md: Above-Appetite STOP directive
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "pipeline.md Above-Appetite section contains explicit STOP directive" {
+ # Must contain the word STOP (or BLOCKED) as the verdict above appetite.
+ run grep -qE "STOP|BLOCKED" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md Above-Appetite section forbids Proceed nudges" {
+ # Must explicitly forbid emitting "Proceed" / "Continue" nudges
+ # when risk exceeds appetite.
+ run grep -qE "[Dd]o NOT emit.*Proceed|forbid.*Proceed|not emit.*Continue|must not.*proceed" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # wip.md: Above-Appetite STOP directive
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "wip.md Above-Appetite section contains explicit STOP directive" {
+ # PAUSE is the wip-mode verdict equivalent of STOP.
+ run grep -qE "STOP|BLOCKED|PAUSE" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wip.md Above-Appetite section forbids Proceed nudges" {
+ run grep -qE "[Dd]o NOT emit.*Proceed|forbid.*Proceed|not emit.*Continue|must not.*proceed" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # plan.md: FAIL directive (plan-mode equivalent of STOP)
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "plan.md FAIL section contains explicit STOP directive" {
+ # FAIL is the plan-mode verdict; reinforces STOP language.
+ run grep -qE "STOP|BLOCKED|FAIL" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "plan.md FAIL section forbids Proceed nudges" {
+ run grep -qE "[Dd]o NOT emit.*Proceed|forbid.*Proceed|not emit.*Continue|must not.*proceed" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
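Each assertion in this file greps the whole prompt file, so any occurrence of `STOP` or `FAIL` anywhere satisfies it, even though the test names refer to a specific section (plan.md already said `RISK_VERDICT: FAIL` before this change, for example). If tighter scoping is wanted, a helper could extract the named section first. The following is a minimal sketch, assuming sections are introduced by ATX headings (`##`, `###`); `extract_section` is hypothetical and not part of the package, and it does not skip code fences, so a `# ` comment line inside a fence would be mistaken for a heading.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the body of the first markdown section whose
# heading line contains $2 (e.g. "Above-Appetite"), stopping at the next
# heading of the same or a higher level. Nested subsections are included.
extract_section() {
  local file="$1" title="$2"
  awk -v title="$title" '
    /^#+ / {
      match($0, /^#+/)
      level = RLENGTH
      # A same-or-higher-level heading closes the section we are printing.
      if (in_sec && level <= sec_level) { exit }
      # Start printing after the first heading that contains the title.
      if (!in_sec && index($0, title) > 0) { in_sec = 1; sec_level = level; next }
    }
    in_sec { print }
  ' "$file"
}
```

A test could then pipe `extract_section "$PIPELINE" "Above-Appetite"` into the same `grep -qE` patterns used above, so that a `STOP` mentioned elsewhere in the prompt no longer satisfies the assertion.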
@@ -0,0 +1,76 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: risk-scorer agent prompts must explicitly state that
+ # monitoring, alerting, and other post-release detection activities are
+ # NOT controls and MUST NOT be credited against residual risk.
+ #
+ # Structural assertions — Permitted Exception to the source-grep ban (ADR-005 / P011).
+ #
+ # Background: P038 identified that scorer reports were crediting
+ # "monitor for elevated errors", "be ready to rollback", and similar
+ # post-release detection activities as controls that reduced residual
+ # risk. These activities help detect failures after they occur — they
+ # are incident response, not release-gate risk reduction. Crediting
+ # them creates false confidence in risky releases.
+ #
+ # A genuine control exercises the failure scenario BEFORE the change
+ # ships (tests, CI gates, feature flags, preview verification, architect
+ # review). Monitoring shortens detection time; it does not prevent the
+ # failure from reaching users.
+ #
+ # Cross-reference:
+ # P038: docs/problems/038-risk-scorer-suggests-monitoring-as-control.open.md
+ # ADR-013: docs/decisions/013-structured-user-interaction-for-governance-decisions.proposed.md
+ # @jtbd JTBD-001 (enforce governance — control list must reflect actual prevention)
+ # @jtbd JTBD-002 (ship with confidence — no false-confidence releases)
+ # @jtbd JTBD-202 (pre-flight governance — scorer must distinguish prevention from detection)
+
+ setup() {
+ AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ PIPELINE="${AGENTS_DIR}/pipeline.md"
+ WIP="${AGENTS_DIR}/wip.md"
+ PLAN="${AGENTS_DIR}/plan.md"
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # pipeline.md
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "pipeline.md states monitoring is not a control" {
+ run grep -qE "[Mm]onitoring is (not|NOT) a control|[Mm]onitoring.*MUST NOT.*credit" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md forbids crediting post-release detection as risk reduction" {
+ # Post-release detection activities (monitoring, alerting, rollback readiness)
+ # must not reduce residual risk.
+ run grep -qE "post-release.*(not|NOT) (reduce|control|credit)|detection.*(not|NOT) (reduce|prevention)" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # wip.md
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "wip.md states monitoring is not a control" {
+ run grep -qE "[Mm]onitoring is (not|NOT) a control|[Mm]onitoring.*MUST NOT.*credit" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wip.md forbids crediting post-release detection as risk reduction" {
+ run grep -qE "post-release.*(not|NOT) (reduce|control|credit)|detection.*(not|NOT) (reduce|prevention)" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # plan.md
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "plan.md states monitoring is not a control" {
+ run grep -qE "[Mm]onitoring is (not|NOT) a control|[Mm]onitoring.*MUST NOT.*credit" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "plan.md forbids crediting post-release detection as risk reduction" {
+ run grep -qE "post-release.*(not|NOT) (reduce|control|credit)|detection.*(not|NOT) (reduce|prevention)" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
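The tests above only check that the prompts state the rule; they do not inspect what a scorer actually lists as controls. A reviewer could flag suspicious entries with a keyword heuristic. The following is a minimal sketch, assuming each claimed control is available as one line of text: `is_detection_only` is hypothetical, and its keyword list (monitor, alert, dashboard, rollback) is taken from the activities the prompts name. Keyword matching cannot truly separate prevention from detection (a test asserting that an alert fires would be flagged), so treat a hit as "needs a human look", not as a verdict.

```bash
#!/usr/bin/env bash
# Hypothetical lint helper (not part of the package). Returns 0 when a claimed
# control mentions a post-release detection activity that the scorer prompts
# say must not be credited against residual risk; returns 1 otherwise.
is_detection_only() {
  local control="$1"
  # Case-insensitive match at a word start: monitor*, alert*, dashboard*,
  # and "rollback" / "roll back".
  grep -qiE '(^|[^[:alpha:]])(monitor|alert|dashboard|roll ?back)' <<<"$control"
}
```

Entries such as "monitor for elevated errors" or "be ready to rollback" (the examples cited in P038) are flagged, while "CI gate: installer dry-run" is not.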
@@ -0,0 +1,62 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: risk-scorer agent prompts must scope the
+ # `RISK_BYPASS: reducing` label to commits that actually reduce risk.
+ #
+ # Structural assertions — Permitted Exception to the source-grep ban (ADR-005 / P011).
+ #
+ # Background: P043 analysed 329 risk reports across 6 projects and found
+ # `RISK_BYPASS: reducing` applied to 97.9% of commits in this repo and
+ # 79.6% across consumer projects. The scorer treated changeset metadata,
+ # ADR checkbox ticks, docs-only edits, and genuinely risk-reducing fixes
+ # all the same way. When nearly every commit is "reducing", the label
+ # provides no discriminating signal.
+ #
+ # The tightened criteria require the commit to:
+ # 1. Close a problem ticket, OR
+ # 2. Explicitly remediate a previously-flagged risk, OR
+ # 3. Remove a documented risk
+ # Ordinary docs-only or test-only commits that don't meet one of these
+ # conditions are risk-neutral — no bypass label.
+ #
+ # Cross-reference:
+ # P043: docs/problems/043-risk-bypass-reducing-lost-discriminating-power.open.md
+ # ADR-013: docs/decisions/013-structured-user-interaction-for-governance-decisions.proposed.md
+ # @jtbd JTBD-001 (enforce governance — bypass must reflect real risk reduction)
+ # @jtbd JTBD-202 (pre-flight governance — bypass label must be auditable)
+
+ setup() {
+ AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ PIPELINE="${AGENTS_DIR}/pipeline.md"
+ }
+
+ # NOTE: wip.md is intentionally excluded from these assertions — wip-mode emits
+ # RISK_VERDICT: CONTINUE/PAUSE, not RISK_BYPASS labels. Bypass criteria apply
+ # only to the pipeline (commit/push/release) scorer.
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # pipeline.md: tightened reducing criteria
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "pipeline.md reducing bypass requires closing a ticket" {
+ # Must reference ticket closure as a valid trigger for reducing bypass.
+ run grep -qE "[Cc]lose[sd]?.*ticket|[Cc]loses P[0-9]|problem.*close" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md reducing bypass requires remediating a flagged risk" {
+ run grep -qE "remediate.*risk|remediates.*risk|flagged risk" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md reducing bypass excludes docs-only neutral commits" {
+ # Ordinary docs/test commits without ticket closure must NOT earn the bypass.
+ run grep -qE "docs-only.*neutral|test-only.*neutral|ordinary.*neutral|neutral.*no bypass" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md requires audit reason for reducing bypass" {
+ # Audit trail: cite which ticket closed, which risk remediated, etc.
+ run grep -qE "RISK_BYPASS_REASON|cite.*ticket|reason.*bypass|bypass.*reason" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
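The P043 figures are a rate: reports carrying the reducing label divided by reports examined. Re-measuring that rate after the tightened criteria land would show whether the label regained discriminating power. The following is a minimal sketch, assuming each scorer report has been saved as its own file (the package does not define where reports are stored); `bypass_rate` is a hypothetical helper, and no figures from P043 are reproduced by it.

```bash
#!/usr/bin/env bash
# Hypothetical retrospective helper: given report files as arguments, print
# "<hits>/<total> (<percent>%)" for reports that claim the reducing bypass.
bypass_rate() {
  local total=$# hits=0 f
  if [ "$total" -eq 0 ]; then
    echo "0/0 (n/a)"
    return 0
  fi
  for f in "$@"; do
    if grep -qE '^RISK_BYPASS: reducing' "$f"; then
      hits=$((hits + 1))
    fi
  done
  # Bash arithmetic is integer-only, so awk computes the percentage.
  awk -v h="$hits" -v t="$total" 'BEGIN { printf "%d/%d (%.1f%%)\n", h, t, 100 * h / t }'
}
```

For example, `bypass_rate reports/*.md` over a directory of saved reports prints the claimed-bypass count, the total, and the percentage to one decimal place.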
@@ -0,0 +1,89 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: risk-scorer agent prompts must define a User-Stated
+ # Preconditions Check as a sub-rule of Control Discovery.
+ #
+ # Structural assertions — Permitted Exception to the source-grep ban (ADR-005 / P011).
+ # These tests assert that the pipeline, wip, and plan scorer prompts
+ # instruct the scorer to detect user-stated conditional-delivery warnings
+ # and surface unmet preconditions as Risk items.
+ #
+ # Background: P041 identified that the risk scorer evaluated technical
+ # risk of a diff in isolation and missed explicit user-stated warnings
+ # that a change was conditional on a paired capability. Downstream this
+ # caused a breaking change to ship to production despite a twice-stated
+ # user warning. This guard prevents regression of the fix: every scoring
+ # agent must have a User-Stated Preconditions Check.
+ #
+ # Cross-reference:
+ # P041: docs/problems/041-risk-scorer-misses-user-stated-dependencies.known-error.md
+ # ADR-013: structured user interaction for governance decisions
+ # @jtbd JTBD-002 (ship with confidence — user-stated preconditions are honoured)
+ # @jtbd JTBD-202 (pre-flight governance checks surface explicit warnings)
+
+ setup() {
+ AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ PIPELINE="${AGENTS_DIR}/pipeline.md"
+ WIP="${AGENTS_DIR}/wip.md"
+ PLAN="${AGENTS_DIR}/plan.md"
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # pipeline.md: user-stated precondition check
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "pipeline.md defines User-Stated Preconditions Check section" {
+ run grep -q "User-Stated Preconditions" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md precondition check surfaces unmet preconditions as Risk items" {
+ # Unmet preconditions must flow through the existing Risk item structure,
+ # which feeds RISK_REMEDIATIONS above appetite (>= 5).
+ run grep -qE "precondition.*Risk item|Risk item.*precondition" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "pipeline.md precondition check credits zero reduction when paired capability is unmet" {
+ # Aligns with existing Control Discovery rule: if a control cannot be named,
+ # or a stated precondition is unmet, the control provides 0 reduction.
+ run grep -qE "zero reduction|0 reduction" "$PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # wip.md: user-stated precondition check
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "wip.md defines User-Stated Preconditions Check section" {
+ run grep -q "User-Stated Preconditions" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wip.md precondition check surfaces unmet preconditions as Risk items" {
+ run grep -qE "precondition.*Risk item|Risk item.*precondition" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wip.md precondition check credits zero reduction when paired capability is unmet" {
+ run grep -qE "zero reduction|0 reduction" "$WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # plan.md: user-stated precondition check
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "plan.md defines User-Stated Preconditions Check section" {
+ run grep -q "User-Stated Preconditions" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "plan.md precondition check surfaces unmet preconditions as Risk items" {
+ run grep -qE "precondition.*Risk item|Risk item.*precondition" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "plan.md precondition check credits zero reduction when paired capability is unmet" {
+ run grep -qE "zero reduction|0 reduction" "$PLAN"
+ [ "$status" -eq 0 ]
+ }
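Taken together, the precondition rule and the appetite threshold used throughout these prompts (any cumulative score above 4 is above appetite) mean that an unmet user-stated precondition, raised to an inherent risk of at least 5 and credited zero reduction by controls that do not address it, is enough on its own to force a STOP (or PAUSE in wip mode). The following is a minimal sketch of those two rules, not the scorer's actual logic: `precondition_floor`, `pipeline_verdict`, and the `WITHIN_APPETITE` placeholder are hypothetical, and how per-item scores accumulate into the commit, push, and release scores is deliberately left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of two rules stated in the scorer prompts.
APPETITE=4            # a cumulative score above this is above appetite
PRECONDITION_FLOOR=5  # minimum inherent risk for an unmet user-stated precondition

# Raise a Risk item's inherent score to the floor when its precondition is unmet.
# Usage: precondition_floor <score> <unmet: yes|no>
precondition_floor() {
  local score="$1" unmet="$2"
  if [ "$unmet" = "yes" ] && [ "$score" -lt "$PRECONDITION_FLOOR" ]; then
    score=$PRECONDITION_FLOOR
  fi
  echo "$score"
}

# Print STOP if ANY cumulative score exceeds appetite. Below appetite the
# prompts require a silent policy-authorised release, never a "Proceed"
# nudge, so this sketch prints a neutral placeholder instead.
pipeline_verdict() {
  local s
  for s in "$@"; do
    if [ "$s" -gt "$APPETITE" ]; then
      echo STOP
      return 0
    fi
  done
  echo WITHIN_APPETITE
}
```

A diff the scorer would otherwise rate at 2 becomes 5 once an unmet precondition is attached, and `pipeline_verdict` then returns `STOP` as long as no control addresses the precondition itself.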
package/agents/wip.md CHANGED
@@ -51,7 +51,15 @@ If cumulative risk is **within appetite** (< 5): provide the assessment table an
 
  ### Above-Appetite Remediations
 
- If cumulative risk **exceeds appetite** (>= 5): provide the assessment table, then emit a structured `RISK_REMEDIATIONS:` block with specific risk-reducing actions:
+ If cumulative risk **exceeds appetite** (>= 5), the verbal verdict is **PAUSE**
+ (the wip-mode equivalent of STOP).
+
+ **Do NOT emit** "Proceed", "Continue", "OK to edit", "You may commit", or any
+ similar nudge language when cumulative risk exceeds appetite. The only
+ sanctioned above-appetite output is the WIP Risk Assessment table and the
+ structured `RISK_REMEDIATIONS:` block defined below.
+
+ Provide the assessment table, then emit a structured `RISK_REMEDIATIONS:` block with specific risk-reducing actions:
 
  Format (5 columns — machine-readable for structured AskUserQuestion prompts in calling skills):
  ```
@@ -100,6 +108,25 @@ RISK_COMMIT_REASON: <one-line description of the completed governance work detec
 
  For each control claimed to reduce risk, name the specific test file/scenario. If you cannot name it, it provides 0 reduction.
 
+ **Monitoring is not a control.** Monitoring, alerting, dashboards, and any other post-release detection activity MUST NOT be credited or reduce residual risk. Post-release detection does NOT reduce pre-release risk — it only shortens the time to notice a failure after it has already reached users.
+ A genuine control exercises the failure scenario before the change ships: a
+ test, a CI gate, a feature flag, a preview verification. Monitoring MUST NOT
+ appear in a Controls list and MUST NOT reduce any inherent risk score.
+
+ ## User-Stated Preconditions Check
+
+ Before crediting any control, check for **user-stated preconditions** — conditions
+ the user has named in the current conversation, commit messages, changesets, or
+ problem tickets that tie this change to a paired capability (e.g., "A is only safe
+ if B ships alongside").
+
+ If a paired capability is unmet, credit zero reduction from controls that do not
+ address the precondition, and surface the unmet precondition as a **Risk item**
+ with inherent risk >= Medium (>= 5). This routes it into the above-appetite
+ `RISK_REMEDIATIONS:` flow and forces a PAUSE verdict until the precondition is
+ met or the change is revised. User warnings outrank the diff's technical
+ assessment.
+
  ## Constraints
 
  - You are a scorer, not an editor. Do NOT write files — a PostToolUse hook handles that.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@windyroad/risk-scorer",
- "version": "0.3.3",
+ "version": "0.3.4-preview.104",
  "description": "Pipeline risk scoring, commit/push gates, and secret leak detection",
  "bin": {
  "windyroad-risk-scorer": "./bin/install.mjs"