@windyroad/risk-scorer 0.5.0 → 0.6.0-preview.283
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +21 -1
- package/agents/pipeline.md +30 -0
- package/agents/test/risk-scorer-catalog-consumption.bats +138 -0
- package/bin/wr-risk-scorer-extract-risks-from-reports +3 -0
- package/package.json +1 -1
- package/scripts/extract-risks-from-reports.sh +445 -0
- package/scripts/test/drain-register-queue.bats +48 -3
- package/scripts/test/extract-risks-from-reports.bats +267 -0
- package/skills/bootstrap-catalog/SKILL.md +218 -0
- package/skills/bootstrap-catalog/test/bootstrap-catalog.bats +168 -0
- package/skills/create-risk/SKILL.md +47 -4
- package/skills/create-risk/test/create-risk-flag-driven.bats +136 -0
package/README.md
CHANGED
@@ -51,6 +51,7 @@ This creates a `RISK-POLICY.md` tailored to your project, defining impact levels
 | `wip-risk-mark.sh` | After edit | Records WIP risk assessment |
 | `risk-score-mark.sh` | Agent completes | Marks risk review as done; writes external-comms marker on `wr-risk-scorer:external-comms` PASS |
 | `risk-hash-refresh.sh` | After Bash | Refreshes content hashes |
+| `risk-slide-marker.sh` | Agent or Bash | Slides the review marker forward across non-edit operations so an active review session is not invalidated by intervening Bash or sub-agent calls |
 
 ## Agents
 
@@ -72,7 +73,8 @@ The plugin includes six specialised agents:
 | `/wr-risk-scorer:assess-wip` | WIP risk nudge for the current uncommitted diff |
 | `/wr-risk-scorer:assess-release` | Pipeline risk assessment for the unpushed queue (pre-satisfies the commit gate) |
 | `/wr-risk-scorer:assess-external-comms` | External-comms leak review for a draft outbound body (pre-satisfies the external-comms gate) |
-| `/wr-risk-scorer:create-risk` | Create a standing-risk register entry |
+| `/wr-risk-scorer:create-risk` | Create a standing-risk register entry (interactive authoring; orchestrator-driven prefilled invocation via `--slug` / `--prefill` flags per ADR-059) |
+| `/wr-risk-scorer:bootstrap-catalog` | Bootstrap `docs/risks/` register from existing `.risk-reports/` corpus per ADR-059 — walks reports, dedupes by ADR-056 slug, emits one `R<NNN>-<slug>.active.md` per unique slug. Idempotent. Auto-triggers from `/install-updates` Step 6.5.1 when register is empty + `RISK-POLICY.md` present + `.risk-reports/` non-empty |
 | `/wr-risk-scorer:update-policy` | Generate or update `RISK-POLICY.md` |
 
 ## External-comms gate
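The auto-trigger condition stated for `/wr-risk-scorer:bootstrap-catalog` (register empty, `RISK-POLICY.md` present, `.risk-reports/` non-empty) can be sketched as a shell predicate. This is a minimal sketch and not the plugin's implementation: the function name `should_bootstrap_catalog` and its project-root argument are hypothetical, and "register is empty" is assumed to mean that no file under `docs/risks/` matches `R*-*.active.md`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the package): succeeds only when all
# three auto-trigger conditions from the README row hold.
should_bootstrap_catalog() {
  local root="${1:-.}" entry

  # 1. Register is empty: no R<NNN>-<slug>.active.md entry exists.
  #    An unmatched glob stays literal, so -e guards against that case.
  for entry in "$root"/docs/risks/R*-*.active.md; do
    if [ -e "$entry" ]; then
      return 1
    fi
  done

  # 2. The project has a RISK-POLICY.md.
  [ -f "$root/RISK-POLICY.md" ] || return 1

  # 3. The report corpus exists and contains at least one entry.
  [ -n "$(ls -A "$root/.risk-reports" 2>/dev/null)" ]
}
```

A caller might use it as `should_bootstrap_catalog . && echo "register can be bootstrapped"`. Because the first check fails as soon as one active entry exists, a second run after a successful bootstrap is a no-op, in line with the "Idempotent" note in the README row.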
@@ -106,6 +108,24 @@ The canonical hook lives at `packages/shared/hooks/external-comms-gate.sh` and
 is synced into each consumer plugin via `scripts/sync-external-comms-gate.sh`
 per ADR-017 (CI runs `npm run check:external-comms-gate` to detect drift).
 
+## Jobs to be Done
+
+This plugin serves the [Jobs to be Done](../../docs/jtbd/) below. Per [ADR-051](../../docs/decisions/051-jtbd-anchored-readme-with-drift-advisory.proposed.md), the persona-grouped JTBD anchor is the canonical source of truth for the README's value framing.
+
+### Tech lead / consultant
+
+- **[JTBD-202 Run Pre-Flight Governance Checks Before Release or Handover](../../docs/jtbd/tech-lead/JTBD-202-pre-flight-governance-check.proposed.md)** — `/wr-risk-scorer:assess-release` produces a structured release-readiness score (commit, push, release layers) that is attachable to a release note or handover doc.
+
+### Solo developer
+
+- **[JTBD-001 Enforce Governance Without Slowing Down](../../docs/jtbd/solo-developer/JTBD-001-enforce-governance.proposed.md)** — pipeline risk is scored on every edit, commit, and push without manual invocation; secret-leak detection runs in the same gate.
+- **[JTBD-002 Ship AI-Assisted Code with Confidence](../../docs/jtbd/solo-developer/JTBD-002-ship-with-confidence.proposed.md)** — every release passes through ISO 31000-aligned criteria defined in the project's own `RISK-POLICY.md` so the safety bar is the team's, not the agent's.
+- **[JTBD-005 Invoke Governance Assessments On Demand](../../docs/jtbd/solo-developer/JTBD-005-assess-on-demand.proposed.md)** — `/wr-risk-scorer:assess-wip`, `assess-release`, and `assess-external-comms` give an on-demand assessment surface outside the hook gate cycle.
+
+### Plugin user
+
+- **[JTBD-302 Trust That the README Describes the Plugin I Just Installed](../../docs/jtbd/plugin-user/JTBD-302-trust-readme-describes-installed-behaviour.proposed.md)** — this README is anchored on current JTBD job IDs; drift between prose and shipped behaviour is detectable at retro time per ADR-051.
+
 ## Updating and Uninstalling
 
 ```bash
package/agents/pipeline.md
CHANGED
@@ -42,6 +42,32 @@ You receive structured pipeline state context with these sections:
 - **UNRELEASED CHANGES**: Changeset count and cumulative diff
 - **STALE FILES**: Modified files uncommitted for over 24h
 
+## Catalog Consumption Protocol (ADR-059)
+
+Before scoring, READ the standing-risk catalog at `docs/risks/` and filter to risks applicable to THIS action. The catalog is the persistent record of risk classes the project has surfaced; consuming it eliminates the wasted-effort cost of re-deriving risk classes on every assessment AND closes the missed-risk-class hazard (forgetting a class the agent surfaced before because it didn't think of it this time). Per `RISK-POLICY.md` `## Risk Catalog` section.
+
+**Filter mechanism — hybrid (slug-token-match primary, judgement fallback):**
+
+1. **Slug-token-match (primary, deterministic)** — for each `R<NNN>-<slug>.active.md` entry in `docs/risks/`, extract the slug from the filename. Tokenise the slug (split on hyphens). If any token appears in the diff content, commit message, or recent prompt context, the entry is **slug-matched** for this action.
+2. **Free-form judgement (fallback)** — for entries the slug-match path missed, READ the entry's `## Description` section and judge applicability against the diff/commit/prompt context. If the description names a risk shape that THIS action plausibly triggers, the entry is **judgement-matched**.
+3. **Logging** — record the match path on each matched risk-item so the next agent can carry it forward (see Risk Item Format below).
+
+**Residual reconciliation:**
+
+- The catalog entry's residual is the **lifetime baseline** under documented controls (the controls present in the project as a whole).
+- THIS action's residual is the baseline modulated by the controls present (or absent) in this specific change.
+- The pipeline's `RISK_SCORES:` output MUST carry the per-action residual, NOT the catalog's lifetime baseline. Gates fire on per-action thresholds.
+- The catalog's residual is meaningful CONTEXT: log it as `Catalog baseline:` in the risk-item block so reviewers can compare the lifetime baseline against this-action's residual.
+
+**Empty catalog handling:**
+
+- If `docs/risks/` is empty (no `R*-*.active.md` files) BUT `RISK-POLICY.md` is present AND `.risk-reports/` is non-empty, emit a one-line nudge in the report body (NOT the `RISK_SCORES:` line): `"Risk register is empty; run /install-updates or /wr-risk-scorer:bootstrap-catalog to bootstrap from .risk-reports/ corpus."` Do NOT halt; do NOT block; do NOT inflate the per-action residual to compensate.
+- If `docs/risks/` is empty AND `RISK-POLICY.md` is absent, the project hasn't opted into the catalog framing. Silent skip the catalog protocol; proceed with regeneration-from-scratch as before this protocol landed.
+
+**Per-run hit-rate observability:**
+
+After scoring, emit a `CATALOG_HIT_RATE: matched=N missed=M` line to the report (where `matched` counts catalog-matched risks AND `missed` counts risks the agent surfaced this run that weren't in the catalog — those become `RISK_REGISTER_HINT:` candidates per ADR-056). Below ~30% sustained hit rate is a Reassessment signal per ADR-059.
+
 ## Cumulative Risk Report
 
 The report MUST assess risk cumulatively, building up from the release queue:
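The slug-token-match step in the protocol above (extract the slug from an `R<NNN>-<slug>.active.md` filename, split it on hyphens, look for any token in the action's context) can be sketched in shell. This is an illustrative sketch only: in the package the step is an instruction to the agent, not a script. The function names `slug_from_filename` and `slug_matches_context` are hypothetical, and the case-insensitive substring comparison is an assumption, since the protocol only says a token "appears in" the context.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the package).

# Print the slug for a catalog filename such as
# docs/risks/R012-secret-leak.active.md  ->  secret-leak
slug_from_filename() {
  local name
  name="$(basename "$1")"
  name="${name%.active.md}"     # drop the state suffix
  printf '%s\n' "${name#R*-}"   # drop the shortest leading "R<NNN>-" prefix
}

# Succeed when any hyphen-separated token of the slug occurs in the
# context text (diff content, commit message, or prompt context).
slug_matches_context() {
  local slug="$1" context="$2" token
  for token in $(printf '%s' "$slug" | tr '-' ' '); do
    # -F fixed string, -i ignore case (assumption), -q no output
    if printf '%s\n' "$context" | grep -qiF -- "$token"; then
      return 0
    fi
  done
  return 1
}
```

For example, `slug_matches_context "$(slug_from_filename docs/risks/R012-secret-leak.active.md)" "fix: rotate leaked API key"` succeeds because the token `leak` occurs inside `leaked`. Substring matching of this kind is permissive, so short or generic tokens will match often; entries that this path misses still go through the judgement fallback described in step 2.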
@@ -81,11 +107,15 @@ Commit score >= push score >= release score (risk accumulates upward).
 - Inherent impact: N/5 (Label) - [why]
 - Inherent likelihood: N/5 (Label) - [why]
 - Inherent risk: N/25 (Label)
+- Catalog match: [slug-token | judgement | none]
+- Catalog baseline: R<NNN> residual=N/25 (Label) — [if matched, cite the catalog entry's lifetime residual; omit line entirely when match=none]
 - Controls:
 - [Specific test file/scenario or hook name] - reduces [dimension] from N to N because [rationale]
 - **Residual risk: N/25 (Label)**
 ```
 
+The `Catalog match:` and `Catalog baseline:` lines (ADR-059) make the catalog consumption auditable per risk-item. `slug-token` indicates the primary deterministic match; `judgement` indicates the fallback applicability judgement; `none` indicates the risk wasn't in the catalog (and the agent should consider whether to emit a `RISK_REGISTER_HINT:` for it per ADR-056).
+
 ### Score File Values
 
 - Commit score: Layer 3 cumulative (highest)
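The two lines added to the risk-item format imply a small consistency rule: `Catalog match:` takes one of three values, and `Catalog baseline:` is cited when the item matched and omitted when the match is `none`. A reviewer-side check of that rule might look like the sketch below. It is a hypothetical lint and not something the package ships; the function name `check_catalog_lines` and the sample values in the usage note are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical lint (not part of the package): reads one risk-item block
# on stdin and checks the Catalog match / Catalog baseline pairing.
check_catalog_lines() {
  local block match
  block="$(cat)"

  # Take the text after the first "Catalog match:" and trim trailing spaces.
  match="$(printf '%s\n' "$block" \
    | sed -n '/Catalog match:/{s/^.*Catalog match:[[:space:]]*//;s/[[:space:]]*$//;p;}' \
    | head -n 1)"

  case "$match" in
    slug-token|judgement)
      # A matched item should cite the catalog entry's lifetime residual.
      printf '%s\n' "$block" | grep -q 'Catalog baseline:' ;;
    none)
      # An unmatched item omits the baseline line entirely.
      ! printf '%s\n' "$block" | grep -q 'Catalog baseline:' ;;
    *)
      # Missing line or a value outside the three named in the format.
      return 1 ;;
  esac
}
```

Piping a block that contains `- Catalog match: slug-token` and `- Catalog baseline: R012 residual=6/25 (Medium)` into `check_catalog_lines` succeeds, while a block that pairs `- Catalog match: none` with a baseline line fails.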
package/agents/test/risk-scorer-catalog-consumption.bats
ADDED
@@ -0,0 +1,138 @@
+#!/usr/bin/env bats
+# Doc-lint guard: pipeline scorer MUST define the catalog consumption protocol
+# per ADR-059 — read docs/risks/ first, hybrid filter (slug-token primary,
+# judgement fallback), residual reconciliation (per-action residual in
+# RISK_SCORES, catalog lifetime baseline in risk-item block), per-run
+# CATALOG_HIT_RATE observability line.
+#
+# Structural assertions — Permitted Exception to the source-grep ban (ADR-005 / P011).
+# Agent prompts are specification documents; behavioural verification of an LLM's
+# output is out of scope for bats — the contract document is what consuming
+# orchestrators and reviewers rely on. This pattern matches existing tests in
+# this directory (see risk-scorer-register-hint.bats).
+#
+# Cross-reference:
+# ADR-059: docs/decisions/059-pipeline-consume-catalog-and-bootstrap-from-reports.proposed.md
+# ADR-056: docs/decisions/056-risk-register-back-channel-write-contract.proposed.md (slug primitive consumed)
+# ADR-015: docs/decisions/015-on-demand-assessment-skills.proposed.md (pure-scorer contract preserved)
+# ADR-026: docs/decisions/026-agent-output-grounding.proposed.md
+# P168: docs/problems/168-risk-scorer-doesnt-consume-catalog-or-bootstrap.known-error.md
+# P167: docs/problems/167-risk-register-aggregate-reads-as-dont-ship.known-error.md
+# @jtbd JTBD-001 (enforce governance without slowing down — closes missed-risk-class hazard)
+# @jtbd JTBD-202 (pre-flight governance — catalog as ISO 31000/27001 audit-trail artefact)
+
+setup() {
+  AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  PIPELINE="${AGENTS_DIR}/pipeline.md"
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Contract surface: Catalog Consumption Protocol section exists
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md defines Catalog Consumption Protocol section" {
+  run grep -q "## Catalog Consumption Protocol" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md cites ADR-059 in the catalog protocol section" {
+  run grep -q "ADR-059" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md names docs/risks/ as the catalog read source" {
+  run grep -qE "READ.*docs/risks/|read.*standing-risk catalog at .docs/risks/" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Hybrid filter: slug-token-match primary, judgement fallback
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md describes slug-token-match as primary filter path" {
+  run grep -qE "[Ss]lug-token-match.*primary|[Ss]lug-token-match \(primary" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md describes judgement as fallback filter path" {
+  run grep -qE "[Jj]udgement.*fallback|[Ff]ree-form judgement.*fallback" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Risk Item Format: Catalog match + Catalog baseline lines
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md Risk Item Format includes Catalog match line" {
+  run grep -q "Catalog match:" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md Risk Item Format includes Catalog baseline line" {
+  run grep -q "Catalog baseline:" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md names the three Catalog match values" {
+  # slug-token | judgement | none — matches the ADR-059 verdict E3 contract.
+  run grep -qE "slug-token.*judgement.*none|slug-token \| judgement \| none" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Residual reconciliation: per-action residual in RISK_SCORES, baseline contextual
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md names per-action residual as RISK_SCORES output" {
+  run grep -qE "RISK_SCORES.*per-action residual|per-action residual.*RISK_SCORES" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md describes catalog lifetime baseline as context not RISK_SCORES" {
+  run grep -qE "lifetime baseline|Catalog baseline:" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Hit-rate observability: CATALOG_HIT_RATE line emitted per run
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md defines CATALOG_HIT_RATE observability line" {
+  run grep -q "CATALOG_HIT_RATE:" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md names the CATALOG_HIT_RATE matched + missed columns" {
+  run grep -qE "matched=N missed=M|CATALOG_HIT_RATE: matched" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Empty catalog handling: nudge but do NOT halt or inflate residual
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md handles empty catalog with nudge not halt" {
+  run grep -qE "[Ee]mpty catalog|catalog is empty.*nudge|do NOT halt" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+@test "pipeline.md cites bootstrap-catalog skill in empty catalog nudge" {
+  run grep -qE "bootstrap-catalog|/install-updates.*bootstrap" "$PIPELINE"
+  [ "$status" -eq 0 ]
+}
+
+# ──────────────────────────────────────────────────────────────────────────────
+# Pure-scorer contract preserved: no Write tool grant added
+# ──────────────────────────────────────────────────────────────────────────────
+
+@test "pipeline.md preserves pure-scorer contract (Read + Glob only)" {
+  # The agent's tool grant must remain Read + Glob per ADR-015.
+  # Adding Write would break the architectural boundary ADR-059 verdict F2 preserves.
+  run grep -qE "^ - Read$" "$PIPELINE"
+  [ "$status" -eq 0 ]
+  run grep -qE "^ - Glob$" "$PIPELINE"
+  [ "$status" -eq 0 ]
+  # Negative: Write tool MUST NOT appear in tool grant
+  run grep -qE "^ - Write$" "$PIPELINE"
+  [ "$status" -ne 0 ]
+}
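The tests above only assert that pipeline.md documents the `CATALOG_HIT_RATE: matched=N missed=M` line. A consumer of a generated report might turn that line into a percentage to compare against the "~30%" reassessment signal. The sketch below assumes the hit rate means matched / (matched + missed), which the diff does not spell out. The function name `catalog_hit_rate_percent` is hypothetical, and the "sustained" part of the signal (tracking the rate across runs) is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the package): reads a report on stdin
# and prints the integer hit-rate percentage from its CATALOG_HIT_RATE line.
catalog_hit_rate_percent() {
  local line matched missed total

  # Fail if the report carries no well-formed hit-rate line.
  line="$(grep -m 1 'CATALOG_HIT_RATE: matched=[0-9][0-9]* missed=[0-9][0-9]*')" || return 1

  matched="${line#*matched=}"        # text after "matched="
  matched="${matched%% *}"           # keep digits up to the first space
  missed="${line#*missed=}"          # text after "missed="
  missed="${missed%%[!0-9]*}"        # drop anything after the digits

  total=$((matched + missed))
  [ "$total" -gt 0 ] || return 1     # nothing scored, so no rate to report

  # Assumed definition: matched / (matched + missed), truncated to an integer.
  echo $((matched * 100 / total))
}
```

For a report containing `CATALOG_HIT_RATE: matched=3 missed=7` the function prints `30`. A wrapper could log that value per run and flag a catalog whose rate stays below the threshold over several runs.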
package/package.json
CHANGED