npm - @qball-inc/the-bulwark - Versions diffs - 1.0.1 → 1.2.0 - Mend

@qball-inc/the-bulwark 1.0.1 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (232)

package/.claude-plugin/plugin.json +2 -3
package/.gitattributes +48 -0
package/CHANGELOG.md +121 -0
package/LICENSE +21 -0
package/README.md +426 -368
package/agents/bulwark-fix-validator.md +643 -633
package/agents/bulwark-implementer.md +407 -391
package/agents/bulwark-issue-analyzer.md +310 -308
package/agents/bulwark-standards-reviewer.md +305 -221
package/agents/plan-creation-architect.md +325 -323
package/agents/plan-creation-eng-lead.md +354 -352
package/agents/plan-creation-po.md +302 -300
package/agents/plan-creation-qa-critic.md +336 -334
package/agents/product-ideation-competitive-analyzer.md +2 -0
package/agents/product-ideation-idea-validator.md +2 -0
package/agents/product-ideation-market-researcher.md +2 -0
package/agents/product-ideation-pattern-documenter.md +2 -0
package/agents/product-ideation-segment-analyzer.md +2 -0
package/agents/product-ideation-strategist.md +2 -0
package/agents/statusline-setup.md +99 -97
package/hooks/hooks.json +30 -1
package/package.json +6 -5
package/scripts/apply-section.sh +243 -0
package/scripts/hooks/check-template-drift.sh +191 -0
package/scripts/hooks/cleanup-review-registry.sh +106 -0
package/scripts/hooks/cleanup-stale.sh +19 -2
package/scripts/hooks/enforce-quality.sh +72 -23
package/scripts/hooks/lib/coverage_check.py +513 -0
package/scripts/hooks/suggest-pipeline-stop.sh +234 -0
package/scripts/hooks/suggest-pipeline.sh +12 -0
package/scripts/init.sh +64 -0
package/scripts/install-bun.sh +327 -0
package/scripts/install-just.sh +404 -0
package/scripts/toolchain-smoke-run.sh +219 -0
package/scripts/update.sh +342 -0
package/skills/anthropic-validator/SKILL.md +497 -607
package/skills/anthropic-validator/references/agents-checklist.md +144 -131
package/skills/anthropic-validator/references/agents-validation.md +90 -0
package/skills/anthropic-validator/references/commands-checklist.md +102 -102
package/skills/anthropic-validator/references/commands-validation.md +42 -0
package/skills/anthropic-validator/references/hooks-checklist.md +160 -151
package/skills/anthropic-validator/references/hooks-validation.md +82 -0
package/skills/anthropic-validator/references/mcp-checklist.md +136 -136
package/skills/anthropic-validator/references/mcp-validation.md +39 -0
package/skills/anthropic-validator/references/plugins-checklist.md +154 -148
package/skills/anthropic-validator/references/plugins-validation.md +68 -0
package/skills/anthropic-validator/references/skills-checklist.md +105 -85
package/skills/anthropic-validator/references/skills-validation.md +79 -0
package/skills/assertion-patterns/SKILL.md +298 -296
package/skills/bug-magnet-data/SKILL.md +286 -284
package/skills/bug-magnet-data/context/cli-args.md +91 -91
package/skills/bug-magnet-data/context/db-query.md +104 -104
package/skills/bug-magnet-data/context/file-contents.md +103 -103
package/skills/bug-magnet-data/context/http-body.md +91 -91
package/skills/bug-magnet-data/context/process-spawn.md +123 -123
package/skills/bug-magnet-data/data/booleans/boundaries.yaml +143 -143
package/skills/bug-magnet-data/data/collections/arrays.yaml +114 -114
package/skills/bug-magnet-data/data/collections/objects.yaml +123 -123
package/skills/bug-magnet-data/data/concurrency/race-conditions.yaml +118 -118
package/skills/bug-magnet-data/data/concurrency/state-machines.yaml +115 -115
package/skills/bug-magnet-data/data/dates/boundaries.yaml +137 -137
package/skills/bug-magnet-data/data/dates/invalid.yaml +132 -132
package/skills/bug-magnet-data/data/dates/timezone.yaml +118 -118
package/skills/bug-magnet-data/data/encoding/charset.yaml +79 -79
package/skills/bug-magnet-data/data/encoding/normalization.yaml +105 -105
package/skills/bug-magnet-data/data/formats/email.yaml +154 -154
package/skills/bug-magnet-data/data/formats/json.yaml +187 -187
package/skills/bug-magnet-data/data/formats/url.yaml +165 -165
package/skills/bug-magnet-data/data/language-specific/javascript.yaml +182 -182
package/skills/bug-magnet-data/data/language-specific/python.yaml +174 -174
package/skills/bug-magnet-data/data/language-specific/rust.yaml +148 -148
package/skills/bug-magnet-data/data/numbers/boundaries.yaml +161 -161
package/skills/bug-magnet-data/data/numbers/precision.yaml +89 -89
package/skills/bug-magnet-data/data/numbers/special.yaml +69 -69
package/skills/bug-magnet-data/data/strings/boundaries.yaml +109 -109
package/skills/bug-magnet-data/data/strings/injection.yaml +208 -208
package/skills/bug-magnet-data/data/strings/special-chars.yaml +190 -190
package/skills/bug-magnet-data/data/strings/unicode.yaml +139 -139
package/skills/bug-magnet-data/references/external-lists.md +115 -115
package/skills/bulwark-brainstorm/SKILL.md +566 -563
package/skills/bulwark-brainstorm/references/at-teammate-prompts.md +95 -60
package/skills/bulwark-brainstorm/references/role-critical-analyst.md +78 -78
package/skills/bulwark-brainstorm/references/role-development-lead.md +66 -66
package/skills/bulwark-brainstorm/references/role-product-delivery-lead.md +79 -79
package/skills/bulwark-brainstorm/references/role-product-manager.md +62 -62
package/skills/bulwark-brainstorm/references/role-project-sme.md +59 -59
package/skills/bulwark-brainstorm/references/role-technical-architect.md +66 -66
package/skills/bulwark-research/SKILL.md +300 -298
package/skills/bulwark-research/references/viewpoint-contrarian.md +63 -63
package/skills/bulwark-research/references/viewpoint-direct-investigation.md +62 -62
package/skills/bulwark-research/references/viewpoint-first-principles.md +65 -65
package/skills/bulwark-research/references/viewpoint-practitioner.md +62 -62
package/skills/bulwark-research/references/viewpoint-prior-art.md +66 -66
package/skills/bulwark-scaffold/SKILL.md +483 -330
package/skills/bulwark-statusline/SKILL.md +166 -161
package/skills/bulwark-statusline/scripts/statusline.sh +1 -1
package/skills/bulwark-verify/SKILL.md +532 -519
package/skills/code-review/SKILL.md +488 -428
package/skills/code-review/examples/anti-patterns/linting.ts +181 -181
package/skills/code-review/examples/anti-patterns/security.ts +91 -91
package/skills/code-review/examples/anti-patterns/standards.ts +195 -195
package/skills/code-review/examples/anti-patterns/type-safety.ts +108 -108
package/skills/code-review/examples/recommended/linting.ts +195 -195
package/skills/code-review/examples/recommended/security.ts +154 -154
package/skills/code-review/examples/recommended/standards.ts +231 -231
package/skills/code-review/examples/recommended/type-safety.ts +181 -181
package/skills/code-review/frameworks/angular.md +218 -218
package/skills/code-review/frameworks/django.md +235 -235
package/skills/code-review/frameworks/express.md +207 -207
package/skills/code-review/frameworks/fastapi.md +326 -0
package/skills/code-review/frameworks/flask.md +298 -298
package/skills/code-review/frameworks/generic.md +146 -146
package/skills/code-review/frameworks/react.md +152 -152
package/skills/code-review/frameworks/vue.md +244 -244
package/skills/code-review/references/linting-patterns.md +221 -221
package/skills/code-review/references/security-patterns.md +125 -125
package/skills/code-review/references/standards-patterns.md +246 -246
package/skills/code-review/references/type-safety-patterns.md +130 -130
package/skills/component-patterns/SKILL.md +133 -131
package/skills/component-patterns/references/pattern-cli-command.md +118 -118
package/skills/component-patterns/references/pattern-database.md +166 -166
package/skills/component-patterns/references/pattern-external-api.md +139 -139
package/skills/component-patterns/references/pattern-file-parser.md +168 -168
package/skills/component-patterns/references/pattern-http-server.md +162 -162
package/skills/component-patterns/references/pattern-process-spawner.md +133 -133
package/skills/continuous-feedback/SKILL.md +329 -327
package/skills/continuous-feedback/references/collect-instructions.md +81 -81
package/skills/continuous-feedback/references/specialize-code-review.md +82 -82
package/skills/continuous-feedback/references/specialize-general.md +98 -98
package/skills/continuous-feedback/references/specialize-test-audit.md +81 -81
package/skills/create-skill/SKILL.md +550 -359
package/skills/create-skill/agents/skill-eval-comparator.md +158 -0
package/skills/create-skill/agents/skill-eval-grader.md +168 -0
package/skills/create-skill/references/agent-conventions.md +194 -194
package/skills/create-skill/references/agent-template.md +195 -195
package/skills/create-skill/references/content-guidance.md +541 -291
package/skills/create-skill/references/decision-framework.md +232 -124
package/skills/create-skill/references/eval-scaffolding.md +468 -0
package/skills/create-skill/references/eval-shape.md +383 -0
package/skills/create-skill/references/scripts-conventions.md +142 -0
package/skills/create-skill/references/template-generator.md +183 -0
package/skills/create-skill/references/template-inversion.md +269 -0
package/skills/create-skill/references/template-pipeline.md +248 -217
package/skills/create-skill/references/template-research.md +234 -210
package/skills/create-skill/references/template-reviewer.md +231 -0
package/skills/create-skill/references/template-script-driven.md +185 -172
package/skills/create-skill/references/template-tool-wrapper.md +199 -0
package/skills/create-skill/scripts/check-description.ts +238 -0
package/skills/create-skill/scripts/check-skill-size.ts +201 -0
package/skills/create-skill/scripts/grade.ts +855 -0
package/skills/create-skill/scripts/run-loop.ts +297 -0
package/skills/create-subagent/SKILL.md +355 -353
package/skills/create-subagent/references/agent-conventions.md +268 -268
package/skills/create-subagent/references/content-guidance.md +232 -232
package/skills/create-subagent/references/decision-framework.md +134 -134
package/skills/create-subagent/references/template-single-agent.md +194 -192
package/skills/fix-bug/SKILL.md +243 -241
package/skills/governance-protocol/SKILL.md +118 -116
package/skills/init/SKILL.md +519 -341
package/skills/init/references/update-askuser-prompts.md +198 -0
package/skills/init/references/update-mode.md +305 -0
package/skills/init/references/update-section-anchor-diff.md +163 -0
package/skills/issue-debugging/SKILL.md +387 -385
package/skills/issue-debugging/references/anti-patterns.md +245 -245
package/skills/issue-debugging/references/debug-report-schema.md +227 -227
package/skills/mock-detection/SKILL.md +528 -511
package/skills/mock-detection/references/false-positive-prevention.md +402 -402
package/skills/mock-detection/references/stub-patterns.md +236 -236
package/skills/pipeline-templates/SKILL.md +262 -215
package/skills/pipeline-templates/references/code-change-workflow.md +277 -277
package/skills/pipeline-templates/references/code-review.md +348 -336
package/skills/pipeline-templates/references/fix-validation.md +421 -421
package/skills/pipeline-templates/references/new-feature.md +335 -335
package/skills/pipeline-templates/references/research-brainstorm.md +161 -161
package/skills/pipeline-templates/references/research-planning.md +257 -257
package/skills/pipeline-templates/references/test-audit.md +389 -389
package/skills/pipeline-templates/references/test-execution-fix.md +238 -238
package/skills/plan-creation/SKILL.md +531 -497
package/skills/plan-to-tasks/SKILL.md +151 -0
package/skills/plan-to-tasks/references/askuserquestion-prompts.md +75 -0
package/skills/plan-to-tasks/references/transform.md +253 -0
package/skills/product-ideation/SKILL.md +2 -0
package/skills/session-handoff/SKILL.md +167 -139
package/skills/session-handoff/references/examples.md +223 -223
package/skills/setup-lsp/SKILL.md +314 -312
package/skills/setup-lsp/references/server-registry.md +85 -85
package/skills/setup-lsp/references/troubleshooting.md +135 -135
package/skills/spec-drift-check/SKILL.md +287 -0
package/skills/spec-drift-check/evals/evals.json +33 -0
package/skills/spec-drift-check/evals/triggers.json +19 -0
package/skills/spec-drift-check/examples/clean-spec.md +52 -0
package/skills/spec-drift-check/examples/expected-output-clean.yaml +96 -0
package/skills/spec-drift-check/examples/expected-output-high-drift.yaml +78 -0
package/skills/spec-drift-check/examples/expected-output-low-drift.yaml +67 -0
package/skills/spec-drift-check/examples/high-drift-spec.md +49 -0
package/skills/spec-drift-check/examples/low-drift-spec.md +39 -0
package/skills/spec-drift-check/references/anti-patterns.md +65 -0
package/skills/spec-drift-check/references/output-template.md +142 -0
package/skills/spec-drift-check/references/step-1-claim-extraction.md +147 -0
package/skills/spec-drift-check/references/step-2-verification-methods.md +203 -0
package/skills/spec-drift-check/references/step-3-categorization.md +105 -0
package/skills/spec-drift-check/references/step-4-plan-adjustment.md +122 -0
package/skills/spec-drift-check/references/step-5-log-template.md +220 -0
package/skills/spec-drift-check/references/step-6-decision-matrix.md +136 -0
package/skills/subagent-output-templating/SKILL.md +417 -415
package/skills/subagent-output-templating/references/examples.md +440 -440
package/skills/subagent-prompting/SKILL.md +366 -364
package/skills/subagent-prompting/references/examples.md +342 -342
package/skills/test-audit/SKILL.md +545 -531
package/skills/test-audit/references/known-limitations.md +41 -41
package/skills/test-audit/references/priority-classification.md +30 -30
package/skills/test-audit/references/prompts/deep-mode-detection.md +83 -83
package/skills/test-audit/references/prompts/synthesis.md +58 -57
package/skills/test-audit/references/rewrite-instructions.md +46 -46
package/skills/test-audit/references/schemas/audit-output.yaml +131 -100
package/skills/test-audit/references/schemas/diagnostic-output.yaml +56 -49
package/skills/test-audit/references/two-gate-logic.md +43 -0
package/skills/test-audit/scripts/data-flow-analyzer.ts +508 -509
package/skills/test-audit/scripts/integration-mock-detector.ts +462 -462
package/skills/test-audit/scripts/skip-detector.ts +211 -211
package/skills/test-audit/scripts/verification-counter.ts +295 -295
package/skills/test-classification/SKILL.md +326 -310
package/skills/test-fixture-creation/SKILL.md +297 -295
package/Infographics/01_product-ideation.png +0 -0
package/Infographics/02_feature-research.png +0 -0
package/Infographics/03_brainstorm.png +0 -0
package/Infographics/04_plan-creation.png +0 -0
package/Infographics/05_code-review.png +0 -0
package/Infographics/06_test-audit.png +0 -0
package/Infographics/07_fix-bug.png +0 -0
package/skills/create-skill/references/template-reference-heavy.md +0 -111
package/skills/create-skill/references/template-simple.md +0 -80

package/skills/spec-drift-check/references/step-3-categorization.md ADDED

@@ -0,0 +1,105 @@
+# Step 3 — Categorization
+## Purpose
+For every Stage 2 verification result, assign:
+1. A **category** — what kind of drift (or non-drift) is this?
+2. A **severity** — how much does it block?
+3. A **decision implication** — what does the orchestrator do about it?
+Categorization is the bridge from raw verification evidence to the verdict. The category dictates how Stage 4 rewrites the plan; the severity dictates whether Stage 6 emits PROCEED, PROCEED_ADJUSTED, or STOP_USER_APPROVAL.
+---
+## Category Table
+| Category | Definition | Severity | Example | Decision Implication |
+|----------|------------|----------|---------|----------------------|
+| **CONFIRMED** | Claim matches reality (Stage 2 returned exact verbatim match) | n/a | Spec says `coverage_check.py:88`; Read at line 88 shows the asserted content. | No action; carry the original plan forward |
+| **DRIFT — line-ref** | File correct; content has moved to a different line OR a literal value off by a small amount | LOW | Spec says line 88; actual content is at line 91 (file is otherwise unchanged). | Doc fix in implementation comments; PROCEED_ADJUSTED |
+| **DRIFT — wrong file** | File path stale; content lives at a different path | HIGH | Spec says `scripts/hooks/foo.sh`; actual file is `scripts/foo.sh`. | STOP_USER_APPROVAL — wrong-file drifts often signal larger refactors the spec doesn't reflect |
+| **DRIFT — missing scope** | Spec lists deliverable X but X is a no-op (already done, or no longer needed) | MEDIUM | Spec says "add X function"; X already exists at the asserted path with the asserted shape. | Drop the deliverable; PROCEED_ADJUSTED |
+| **DRIFT — undeclared scope** | Spec missing a deliverable that real implementation requires | HIGH | Spec says "modify A"; modifying A safely also requires modifying B (uncovered by spec). | STOP_USER_APPROVAL — scope expansion needs sign-off |
+| **AC re-interpretation** | Acceptance criterion is ambiguous; can resolve against current code with documented choice | MEDIUM | Spec says "update the schema"; current code has two schemas; AC is unclear which one. | Resolve explicitly with rationale; PROCEED_ADJUSTED |
+| **GAP** | Claim references a thing that does not exist anywhere | HIGH | Spec says `parseFooBar()`; no such function in the repo. | STOP_USER_APPROVAL — the spec is referencing fiction |
+---
+## Severity Decision Tree
+Use this tree when a finding is borderline. The tree maps **finding shape** → **severity tier** deterministically.
+```
+Is the claim CONFIRMED (verbatim match)?
+├── YES → category=CONFIRMED, severity=n/a, no decision impact
+└── NO → drift exists; continue
+    │
+    Does the claim reference a file path?
+    ├── YES → does the file exist at the asserted path?
+    │   ├── YES (file ok, content drift only)
+    │   │   └── Is the content at a different line in the SAME file?
+    │   │       ├── YES → DRIFT-line-ref, severity LOW
+    │   │       └── NO  → content gone or wrong → GAP or DRIFT-wrong-file (case below)
+    │   └── NO (file path wrong)
+    │       └── Does similar content exist at a DIFFERENT path?
+    │           ├── YES → DRIFT-wrong-file, severity HIGH
+    │           └── NO  → GAP, severity HIGH
+    │
+    Does the claim reference a deliverable / scope item?
+    ├── Is the deliverable a no-op (already done)?
+    │   └── YES → DRIFT-missing-scope, severity MEDIUM
+    ├── Does completing the spec REQUIRE additional work the spec didn't list?
+    │   └── YES → DRIFT-undeclared-scope, severity HIGH
+    │
+    Is the claim ambiguous (multiple plausible readings against current code)?
+    ├── YES → AC re-interpretation, severity MEDIUM (must document chosen reading)
+    │
+    Does the claim reference a function/symbol/value that does not exist anywhere?
+    └── YES → GAP, severity HIGH
+```
+**Tie-breaker rules**:
+- When in doubt between LOW and MEDIUM, choose **MEDIUM** (you can downgrade in Stage 4 if it's truly trivial; you can't upgrade after the verdict ships).
+- When in doubt between MEDIUM and HIGH, choose **HIGH** (the user should see the finding; over-stopping is recoverable, under-stopping ships bad work).
+- CRITICAL is reserved (the rubric in SKILL.md notes it escalates to HIGH for verdict purposes). Do not use CRITICAL unless an explicit safety/security concern is in play; otherwise, HIGH is the top tier.
+---
+## Severity Tier Reference
+| Severity | Definition | Verdict Implication |
+|----------|------------|---------------------|
+| **CRITICAL** | Reserved — used only for explicit safety/security blockers; escalates to HIGH for verdict purposes | STOP_USER_APPROVAL |
+| **HIGH** | Wrong-file DRIFT, undeclared-scope DRIFT, GAP — scope-changing or block-level | STOP_USER_APPROVAL |
+| **MEDIUM** | Missing-scope DRIFT, AC re-interpretation — adjustable without scope expansion | PROCEED_ADJUSTED |
+| **LOW** | Line-ref DRIFT — file correct, line off | PROCEED_ADJUSTED (doc fix in implementation comments) |
+This tier table is mirrored in SKILL.md. If you change one, change both (cross-file consistency).
+---
+## Output Format Per Finding
+Each finding from Stage 3 carries:
+```yaml
+- id: D-NN                  # sequential within this run
+  claim_id: C-NN            # back-reference to the Stage 1 claim
+  category: confirmed | drift-line-ref | drift-wrong-file | drift-missing-scope | drift-undeclared-scope | ac-reinterpretation | gap
+  severity: LOW | MEDIUM | HIGH      # n/a for CONFIRMED
+  spec_claim: "<verbatim quote from spec>"
+  actual_state: "<verbatim evidence from Stage 2>"
+  resolution: "<doc fix | plan adjustment | scope expansion | ask user | none>"
+```
+Multiple findings can map to a single claim if the claim has compound assertions (e.g., "the function `foo` at `bar.ts:42`" yields one finding for the function name and one for the line ref).
+---
+## Anti-Pattern Reminders for This Stage
+- Do NOT mark a finding CONFIRMED based on recall. Stage 2 evidence must be verbatim.
+- Do NOT skip the severity field. Severity drives the verdict; a finding without a severity is invisible to Stage 6.
+- Do NOT invent new categories. The 7 categories above cover all observed drift shapes; extending the taxonomy mid-run breaks downstream consumers (Stage 6 decision matrix and the YAML schema).
+- Do NOT auto-downgrade HIGH → MEDIUM to "save the user a question". HIGH is HIGH because the orchestrator cannot proceed without a sign-off.
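
Reviewer note (illustrative, not part of the package): the Category Table and Severity Tier Reference in `step-3-categorization.md` can be read as a lookup followed by a worst-severity reduction. The Python sketch below assumes the default severities from the Category Table and the verdict implications from the tier table. `Finding`, `DEFAULT_SEVERITY`, and `verdict()` are hypothetical names, and mapping an all-CONFIRMED run to `PROCEED` is an assumption, since the Stage 6 decision matrix is not shown in this part of the diff.

```python
from dataclasses import dataclass

# Default severity per category, copied from the Category Table.
# CONFIRMED has no severity (the table lists it as n/a).
DEFAULT_SEVERITY = {
    "confirmed": None,
    "drift-line-ref": "LOW",
    "drift-wrong-file": "HIGH",
    "drift-missing-scope": "MEDIUM",
    "drift-undeclared-scope": "HIGH",
    "ac-reinterpretation": "MEDIUM",
    "gap": "HIGH",
}

# Ordering used only to pick the worst severity in a run (hypothetical helper).
SEVERITY_RANK = {None: 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}


@dataclass(frozen=True)
class Finding:
    id: str        # e.g. "D-01"
    claim_id: str  # e.g. "C-01"
    category: str  # one of the 7 categories; no others are allowed

    @property
    def severity(self):
        if self.category not in DEFAULT_SEVERITY:
            # The source forbids inventing categories mid-run.
            raise ValueError(f"unknown category: {self.category!r}")
        return DEFAULT_SEVERITY[self.category]


def verdict(findings):
    """Reduce findings to a verdict using the tier table's implications."""
    worst = max((SEVERITY_RANK[f.severity] for f in findings), default=0)
    if worst >= SEVERITY_RANK["HIGH"]:
        return "STOP_USER_APPROVAL"
    if worst >= SEVERITY_RANK["LOW"]:
        return "PROCEED_ADJUSTED"
    return "PROCEED"  # assumption: nothing but CONFIRMED findings


if __name__ == "__main__":
    # Finding mix from the hypothetical example in step-4-plan-adjustment.md.
    run = [
        Finding("D-01", "C-01", "drift-line-ref"),
        Finding("D-02", "C-02", "drift-line-ref"),
        Finding("D-03", "C-03", "drift-missing-scope"),
        Finding("D-04", "C-04", "drift-undeclared-scope"),
    ]
    print(verdict(run))  # STOP_USER_APPROVAL
```

The sketch deliberately ignores the tie-breaker rules and the reserved CRITICAL tier; a real run may assign a severity above the category default, which this lookup does not model.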

package/skills/spec-drift-check/references/step-4-plan-adjustment.md ADDED

@@ -0,0 +1,122 @@
+# Step 4 — Plan Adjustment
+## Purpose
+Rewrite the spec's implementation-plan section based on the Stage 3 findings. The output of this stage becomes the **adjusted plan** — the source of truth for the rest of the WP. Stage 7 binds it: the implementer follows this plan, not the original spec.
+If Stage 3 produced zero non-CONFIRMED findings, Stage 4 is a no-op (the original plan stands). Otherwise, every finding must be reflected in the rewrite.
+---
+## Procedure
+For each finding category, apply the matching adjustment:
+### DRIFT-missing-scope (deliverable is a no-op)
+**Action**: Drop the deliverable from the plan. Note the drop in the `verification_checklist:` so the implementer doesn't accidentally re-add it.
+**Token-budget impact**: subtract the original estimate for the dropped deliverable.
+### DRIFT-undeclared-scope (deliverable missing from spec)
+**Action**: Add the new deliverable to the plan. If severity is HIGH (which it should be by default), the verdict is STOP_USER_APPROVAL and the user must sign off before this addition is binding.
+**Token-budget impact**: add an estimate for the new deliverable.
+### DRIFT-wrong-file (path target is wrong)
+**Action**: Re-target the deliverable's path in the plan to the correct path. Note the re-target in the `verification_checklist:`.
+**Token-budget impact**: usually zero (same scope, different target).
+### DRIFT-line-ref (line off, file correct)
+**Action**: Update the line reference (or the contextual cue, if the spec used "around line 200" framing) to match current state. This is typically a doc fix; the deliverable itself is unchanged.
+**Token-budget impact**: zero.
+### AC re-interpretation (ambiguous AC)
+**Action**: Resolve the ambiguity explicitly. Document the chosen reading + the rationale in the `ac_reinterpretations:` section of the log. Update any plan deliverables that depend on the resolved AC.
+**Token-budget impact**: usually zero, occasionally negative (a tighter interpretation drops scope).
+### GAP (claim references nothing)
+**Action**: STOP. GAP findings escalate to STOP_USER_APPROVAL; the user must clarify the spec. Do NOT silently drop the deliverable that referenced the GAP — surface it.
+**Token-budget impact**: deferred until the user clarifies.
+### CONFIRMED
+**Action**: None. Carry the original deliverable forward unchanged.
+---
+## Re-Estimate Tokens
+After applying the adjustments above, re-estimate the token budget for the WP. The adjusted budget appears in the log's `adjusted_plan.estimated_token_delta:` field as a signed delta from the original estimate (`+5K`, `-10K`, `0`).
+A non-trivial delta (≥10K in either direction) is itself a signal worth surfacing in the verdict summary, even if the finding mix doesn't otherwise force STOP_USER_APPROVAL.
+---
+## Before / After Example
+### Original spec plan section
+```markdown
+## Implementation Plan
+### Step 1 — Add `parseFollowupEdits()` helper to `coverage_check.py`
+- 80 lines, ~5K tokens
+- File: `scripts/hooks/coverage_check.py:88`
+### Step 2 — Update `code-review` skill schema
+- Add `followup_edits_expected` field to diagnostic YAML template
+- File: `skills/code-review/SKILL.md:391`
+### Step 3 — Test coverage
+- Add 5 test cases to `tests/hooks/test-suggest-pipeline-stop.sh`
+```
+### Stage 3 findings (hypothetical)
+- D-01: `coverage_check.py:88` actually shows the helper at line 91 → DRIFT-line-ref, LOW
+- D-02: `code-review` SKILL.md does not have a line 391; the schema section is at line 416 → DRIFT-line-ref, LOW
+- D-03: `parseFollowupEdits()` already exists at `coverage_check.py:91` from a prior session → DRIFT-missing-scope, MEDIUM (Step 1 is a no-op)
+- D-04: tests file `tests/hooks/test-suggest-pipeline-stop.sh` exists, but adding tests requires the file to be writable and a `set -e` audit per `process_test_harness_set_e_pattern.md` — undeclared in spec → DRIFT-undeclared-scope, HIGH
+### Adjusted plan after Stage 4
+```markdown
+## Adjusted Implementation Plan
+(supersedes original spec; binding per Stage 7)
+### Step 1 — DROPPED — `parseFollowupEdits()` already exists
+- Dropped per finding D-03 (DRIFT-missing-scope, MEDIUM)
+- Token savings: -5K
+### Step 2 — Update `code-review` skill schema
+- Add `followup_edits_expected` field to diagnostic YAML template
+- File: `skills/code-review/SKILL.md` (line ref updated; current location ~line 416 per finding D-02)
+### Step 3 — Test coverage
+- Add 5 test cases to `tests/hooks/test-suggest-pipeline-stop.sh`
+- ALSO: confirm `set -e` is present in the test harness per `process_test_harness_set_e_pattern.md` (undeclared in original spec; finding D-04 surfaces this)
+- Token addition: +1K
+estimated_token_delta: -4K
+```
+If finding D-04 is severity HIGH (undeclared scope), the verdict is STOP_USER_APPROVAL and the user must approve adding the `set -e` audit task before Stage 7 binding takes effect.
+---
+## What NOT To Do at This Stage
+- Do NOT auto-apply LOW findings to the original spec by editing the brief. The skill is read-only. Capture the corrections in the log, not in the source.
+- Do NOT silently merge HIGH findings into the adjusted plan — they require Stage 6 to emit STOP_USER_APPROVAL and the user to confirm.
+- Do NOT carry forward a deliverable that DRIFT-missing-scope flagged as a no-op. Doing so wastes the next session's tokens.
+- Do NOT estimate token deltas with false precision. Round to the nearest 1K; the goal is signal, not accounting.
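
Reviewer note (illustrative, not part of the package): the re-estimate described in `step-4-plan-adjustment.md` amounts to summing signed per-deliverable deltas, rounding to the nearest 1K, and rendering a signed string such as `+5K`, `-10K`, or `0`. The Python sketch below shows one way to do that; `format_token_delta`, `is_notable_delta`, and `NOTABLE_DELTA_TOKENS` are hypothetical names. Python's built-in `round()` rounds exact halves to the nearest even integer, and the source does not say how ties should be handled, so treat that behavior as an assumption.

```python
# Threshold for a "non-trivial" delta, taken from the 10K figure in the source.
NOTABLE_DELTA_TOKENS = 10_000


def format_token_delta(step_deltas):
    """Render summed signed token deltas as '+5K', '-10K', or '0'."""
    rounded_k = round(sum(step_deltas) / 1000)  # nearest 1K, no false precision
    return "0" if rounded_k == 0 else f"{rounded_k:+d}K"


def is_notable_delta(step_deltas):
    """True when the net delta is at least 10K in either direction."""
    return abs(sum(step_deltas)) >= NOTABLE_DELTA_TOKENS


if __name__ == "__main__":
    # Before/after example: Step 1 dropped (-5K), Step 3 gains a task (+1K).
    example = [-5_000, +1_000]
    print(format_token_delta(example))  # -4K
    print(is_notable_delta(example))    # False
```

Keeping the per-deliverable deltas as raw token counts and rounding only once at the end avoids accumulating rounding error across steps.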

package/skills/spec-drift-check/references/step-5-log-template.md ADDED

@@ -0,0 +1,220 @@
+# Step 5 — Verification Log Template
+## Purpose
+Capture the audit's findings, AC re-interpretations, adjusted plan, verification checklist, decision, and ROI estimate in a single canonical log file. The log is the deliverable; downstream sessions and implementers consume it directly.
+**Log path**: `$PROJECT_DIR/logs/spec-verify-{session}-{topic}.md`
+- `{session}` is the current Bulwark session id (e.g., `122`).
+- `{topic}` is a short slug derived from the subject spec filename (e.g., `P10.16` for `P10.16-statusline-lock-cleanup-observability.md`).
+Example: `logs/spec-verify-122-P10.16.md`.
+---
+## Full Template
+```yaml
+# logs/spec-verify-{session}-{topic}.md
+# Verification log for spec-drift-check run.
+# Subject: {path-to-spec}
+# Run timestamp: {ISO-8601}
+# Skill version: 1.0.0
+metadata:
+  reviewer: spec-drift-check
+  subject: {path-to-spec}
+  session: {session-id}
+  timestamp: {ISO-8601}
+  spec_word_count: {N}
+  claims_extracted: {N}
+  claims_verified: {N}
+findings:
+  - id: D-01
+    claim_id: C-01
+    category: drift-line-ref | drift-wrong-file | drift-missing-scope | drift-undeclared-scope | ac-reinterpretation | gap | confirmed
+    severity: LOW | MEDIUM | HIGH        # n/a for confirmed
+    spec_claim: "<verbatim quote from spec>"
+    actual_state: "<what current code/state shows, verbatim>"
+    resolution: "<doc fix | plan adjustment | scope expansion | ask user | none>"
+  # ... D-02, D-03, ...
+ac_reinterpretations:
+  - ac: AC-N
+    ambiguity: "<what's unclear in the original AC>"
+    chosen_interpretation: "<the reading we will execute against>"
+    rationale: "<why this reading; what it implies for the plan>"
+  # ... AC-M, ...
+adjusted_plan:
+  binding_status: "supersedes original spec for rest of WP"
+  deliverables:
+    - id: AD-01
+      description: "<deliverable text>"
+      original_spec_step: <number or null if newly added>
+      change_from_spec: dropped | re-targeted | unchanged | added
+      target_path: <path>
+      token_estimate: "~5K"
+    # ... AD-02, ...
+  estimated_token_delta: "+5K | -10K | 0"
+verification_checklist:
+  - "<item the implementer must confirm at end of WP — typically per finding>"
+  # ...
+proceed_decision: PROCEED | PROCEED_ADJUSTED | STOP_USER_APPROVAL
+decision_rationale: |
+  {1-3 sentences explaining the verdict in terms of the finding mix.}
+roi:
+  spent_tokens_estimate: "~{N}K"
+  estimated_savings: "~{N}K"
+  net: positive | break-even | negative
+  rationale: |
+    {1-3 sentences explaining the savings estimate — what rework was avoided.}
+```
+---
+## How to Fill Each Section
+### `metadata`
+Captured as the run begins. `claims_extracted` is the count from Stage 1; `claims_verified` should equal `claims_extracted` unless a verification command failed.
+**Filled-in example**:
+```yaml
+metadata:
+  reviewer: spec-drift-check
+  subject: plans/task-briefs/P10.16-statusline-lock-cleanup-observability.md
+  session: 122
+  timestamp: 2026-05-09T14:32:00Z
+  spec_word_count: 4521
+  claims_extracted: 23
+  claims_verified: 23
+```
+### `findings`
+At least one entry per claim from Stage 1 (a compound claim can yield several findings). CONFIRMED claims SHOULD be included (with `severity: n/a`) so the log is a complete record of what was checked. Drift findings carry the `actual_state` verbatim.
+**Filled-in example**:
+```yaml
+findings:
+  - id: D-01
+    claim_id: C-03
+    category: drift-line-ref
+    severity: LOW
+    spec_claim: "The recursion bug is at coverage_check.py:88."
+    actual_state: "Line 88 is blank (the helper function moved to line 91 in S121 cleanup; verbatim line 91: `def parse_followup_edits_expected(diagnostic_path):`)."
+    resolution: "doc fix — update spec line ref to 91 in implementation comments"
+  - id: D-02
+    claim_id: C-07
+    category: drift-undeclared-scope
+    severity: HIGH
+    spec_claim: "Add 5 test cases to test-suggest-pipeline-stop.sh."
+    actual_state: "Test harness `tests/hooks/test-suggest-pipeline-stop.sh:1` sets `set -euo pipefail` per process_test_harness_set_e_pattern.md. Adding test cases requires assertion-counter pattern (failures via counter, not exit), which the spec does not mention."
+    resolution: "scope expansion — add `set -e` audit task to plan; ask user"
+```
+### `ac_reinterpretations`
+One entry per acceptance criterion that Stage 3 flagged as ambiguous. The `chosen_interpretation` becomes binding for the WP via Stage 7.
+**Filled-in example**:
+```yaml
+ac_reinterpretations:
+  - ac: AC-3
+    ambiguity: "Spec says 'update the schema' but two schemas exist (diagnostic schema in code-review/SKILL.md and findings schema in templates/findings-output.yaml)."
+    chosen_interpretation: "Update the diagnostic schema (code-review/SKILL.md). The findings template already has the field."
+    rationale: "Stage 2 verification of C-12 showed the findings template already includes `followup_edits_expected`; only the diagnostic surface is missing."
+```
+### `adjusted_plan`
+The binding plan for the rest of the WP. Each deliverable has an explicit `change_from_spec` field so the implementer (and future readers) can audit the rewrite.
+**Filled-in example**:
+```yaml
+adjusted_plan:
+  binding_status: "supersedes original spec for rest of WP"
+  deliverables:
+    - id: AD-01
+      description: "Update diagnostic schema in code-review/SKILL.md to include followup_edits_expected"
+      original_spec_step: 2
+      change_from_spec: re-targeted
+      target_path: skills/code-review/SKILL.md
+      token_estimate: "~3K"
+    - id: AD-02
+      description: "Add 5 test cases to test-suggest-pipeline-stop.sh + audit set -e"
+      original_spec_step: 3
+      change_from_spec: unchanged
+      target_path: tests/hooks/test-suggest-pipeline-stop.sh
+      token_estimate: "~6K"
+  estimated_token_delta: "-4K"
+```
+### `verification_checklist`
+Items the implementer (or a downstream reviewer) must confirm at the end of the WP to verify that the adjusted plan was executed. Typically one item per drift finding that survives into the adjusted plan.
+**Filled-in example**:
+```yaml
+verification_checklist:
+  - "Diagnostic schema in skills/code-review/SKILL.md includes followup_edits_expected (field, not just prose)"
+  - "test-suggest-pipeline-stop.sh has 5 new test cases, all using assertion-counter pattern (no premature exit)"
+  - "AC-3 resolution committed: only diagnostic schema modified; findings template unchanged"
+```
+### `proceed_decision` + `decision_rationale`
+The verdict itself + a 1-3 sentence summary tying the finding mix to the verdict. See `step-6-decision-matrix.md` for the matrix.
+**Filled-in example**:
+```yaml
+proceed_decision: STOP_USER_APPROVAL
+decision_rationale: |
+  Finding D-02 is HIGH (DRIFT-undeclared-scope: set -e audit task missing from spec).
+  Per the decision matrix, any HIGH finding triggers STOP_USER_APPROVAL.
+  Surface the scope expansion to the user via AskUserQuestion before binding the adjusted plan.
+```
+### `roi`
+The cost-vs-savings estimate. Approximate; the goal is signal, not accounting. Round to the nearest 1K. `net: positive` means estimated savings exceed spent tokens.
+**Filled-in example**:
+```yaml
+roi:
+  spent_tokens_estimate: "~6K"
+  estimated_savings: "~20K"
+  net: positive
+  rationale: |
+    Caught D-02 (undeclared scope: set -e audit) before implementation.
+    Without this finding, the implementer would have added tests, hit a fail-fast
+    on the missing audit, debugged for ~15K tokens, then re-implemented. ROI = +14K net.
+```
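The `roi` arithmetic can be sketched in a few lines of Python. This is an illustrative helper only: `parse_k` and `roi_net` are hypothetical names, and the label returned for a non-positive result is an assumption, since the source defines only `net: positive`.

```python
import re
from typing import Tuple


def parse_k(estimate: str) -> int:
    """Convert an estimate such as "~6K" or "-4K" into a token count."""
    match = re.fullmatch(r"~?([+-]?\d+)K", estimate.strip())
    if match is None:
        raise ValueError(f"unrecognized estimate: {estimate!r}")
    return int(match.group(1)) * 1000


def roi_net(spent: str, savings: str) -> Tuple[int, str]:
    """Return (net tokens, label); "positive" means savings exceed spend."""
    net_tokens = parse_k(savings) - parse_k(spent)
    # "not positive" is an assumed label; the source names only "positive".
    label = "positive" if net_tokens > 0 else "not positive"
    return net_tokens, label


net_tokens, label = roi_net("~6K", "~20K")
```

With the figures from the example above (`~6K` spent, `~20K` saved), the net is 14,000 tokens, which matches the `+14K` quoted in the rationale.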
+---
+## Field-Level Validation
+Before writing the log, confirm:
+- Every `findings` entry has BOTH `spec_claim` AND `actual_state` populated verbatim.
+- Every `severity` value is one of LOW / MEDIUM / HIGH (or `n/a` for CONFIRMED).
+- Every `category` value matches the 7-entry taxonomy from `step-3-categorization.md`.
+- `proceed_decision` value matches the matrix in `step-6-decision-matrix.md` (any HIGH → STOP_USER_APPROVAL).
+- `adjusted_plan.deliverables[*].change_from_spec` is set on every entry.
+If any check fails, fix before writing — the log is binding (Stage 7) and downstream consumers parse it.
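The checks above could be automated with something like the sketch below. It is a minimal illustration, not the package's validator: `validate_log` is a hypothetical name, the log is assumed to be already parsed into a dict, and the category taxonomy is passed in by the caller because it lives in `step-3-categorization.md` and is not reproduced here.

```python
from typing import Any, Dict, List, Set

ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "n/a"}


def validate_log(log: Dict[str, Any], allowed_categories: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    errors: List[str] = []
    findings = log.get("findings", [])
    for finding in findings:
        fid = finding.get("id", "<no id>")
        # Both quotes must be present and non-blank.
        for field in ("spec_claim", "actual_state"):
            if not str(finding.get(field) or "").strip():
                errors.append(f"{fid}: {field} is empty")
        if finding.get("severity") not in ALLOWED_SEVERITIES:
            errors.append(f"{fid}: invalid severity {finding.get('severity')!r}")
        if finding.get("category") not in allowed_categories:
            errors.append(f"{fid}: unknown category {finding.get('category')!r}")
    # Any HIGH finding forces STOP_USER_APPROVAL.
    has_high = any(f.get("severity") == "HIGH" for f in findings)
    if has_high and log.get("proceed_decision") != "STOP_USER_APPROVAL":
        errors.append("proceed_decision: a HIGH finding requires STOP_USER_APPROVAL")
    for item in log.get("adjusted_plan", {}).get("deliverables", []):
        if not item.get("change_from_spec"):
            errors.append(f"{item.get('id', '<no id>')}: change_from_spec is not set")
    return errors
```

Returning a list of messages rather than raising on the first failure lets the orchestrator fix every problem in one pass before the log is written.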
+---
+## After Writing the Log
+The log path is the artifact. Surface it to the user. If the verdict is STOP_USER_APPROVAL, immediately follow with the AskUserQuestion flow per `step-6-decision-matrix.md`. The implementer (next session, next command) reads from this log, not from the original spec.

package/skills/spec-drift-check/references/step-6-decision-matrix.md ADDED

@@ -0,0 +1,136 @@
+# Step 6 — Decision Matrix
+## Purpose
+Map the **mix of findings** from Stage 3 to a verdict. The verdict is one of three values:
+- **PROCEED** — original plan stands; no adjustments needed.
+- **PROCEED_ADJUSTED** — adjusted plan in the verification log supersedes the original; flag in summary; no user gate.
+- **STOP_USER_APPROVAL** — surface findings to the user via AskUserQuestion; do NOT bind the adjusted plan until the user signs off.
+---
+## Decision Matrix
+| Finding mix | Verdict | Rationale |
+|-------------|---------|-----------|
+| All CONFIRMED | **PROCEED** | The spec is in alignment with current code; no plan rewrite needed. |
+| Only LOW + MEDIUM findings | **PROCEED_ADJUSTED** | Drift exists but is contained; the adjusted plan resolves it without scope changes. |
+| Any HIGH finding | **STOP_USER_APPROVAL** | Scope changes, wrong-file drift, GAPs, or undeclared-scope additions need explicit user sign-off. |
+Reading order: **HIGH wins**. If the run produces 30 LOW findings and 1 HIGH, the verdict is STOP_USER_APPROVAL. The matrix is not a vote count.
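The matrix reduces to a short precedence check. The sketch below is illustrative only: `decide` is a hypothetical name, and treating an empty finding list as PROCEED is an assumption.

```python
from typing import Iterable


def decide(severities: Iterable[str]) -> str:
    """Map the severities of all findings to a verdict; HIGH wins."""
    levels = set(severities)
    if "HIGH" in levels:
        return "STOP_USER_APPROVAL"
    if levels & {"LOW", "MEDIUM"}:
        return "PROCEED_ADJUSTED"
    # Only CONFIRMED entries (severity "n/a"), or no findings at all.
    return "PROCEED"


verdict = decide(["LOW"] * 30 + ["HIGH"])
```

Because the function works on the set of severities and not their counts, 30 LOW findings plus 1 HIGH still yields STOP_USER_APPROVAL.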
+---
+## Tie-Breaking
+The matrix above resolves cleanly — there are no genuine ties. Two situations that look like ties:
+- **All CONFIRMED + a single AC-reinterpretation MEDIUM** → still PROCEED_ADJUSTED. The ambiguity must be resolved in the log; that resolution is binding via Stage 7.
+- **Mostly CONFIRMED + a borderline finding (LOW vs MEDIUM)** → choose MEDIUM (per `step-3-categorization.md` tie-breaker). Verdict stays PROCEED_ADJUSTED in either case.
+---
+## STOP_USER_APPROVAL Escalation
+When the verdict is STOP_USER_APPROVAL, the orchestrator must:
+1. **Write the verification log first** (Stage 5). The log is the source of truth; the user reviews from it.
+2. **Surface each HIGH finding individually** to the user via AskUserQuestion. Do NOT batch unrelated HIGH findings into a single yes/no; the user must adjudicate each finding (or each logically-coupled group) separately.
+3. **Propose an adjusted scope** for each HIGH finding: drop the deliverable / add the deliverable / re-target the path / clarify the AC.
+4. **Wait for user input** before binding the adjusted plan via Stage 7. Do NOT proceed to implementation.
+### AskUserQuestion Template
+Use this fragment, customized per finding. One AskUserQuestion call per HIGH finding (or per logically-coupled group of findings).
+```text
+question: "spec-drift-check found HIGH-severity drift in {spec-path}. Approve scope adjustment?"
+multiSelect: false
+header: "Spec Drift — HIGH ({finding-id})"
+options:
+  - label: "Approve adjusted scope"
+    description: |
+      Finding {D-NN} ({category}): {short summary}.
+      Spec claim: "{verbatim spec quote, truncated to ~120 chars}"
+      Actual state: "{verbatim actual quote, truncated to ~120 chars}"
+      Proposed adjustment: {proposed adjustment in 1-2 sentences}
+      Token-budget impact: {+NK | -NK | 0}
+  - label: "Reject adjustment — keep original spec"
+    description: |
+      Implementation will follow the original spec verbatim despite the drift.
+      The orchestrator will document the override in the verification log.
+  - label: "Need more detail"
+    description: |
+      Open the verification log at logs/spec-verify-{session}-{topic}.md and
+      review finding {D-NN} in full before deciding.
+```
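Filling the "Approve adjusted scope" description could look like the sketch below. The helper names are hypothetical, and the truncation rule (collapse whitespace, cut to at most 120 characters including a trailing `...`) is one assumed reading of "truncated to ~120 chars".

```python
def truncate(quote: str, limit: int = 120) -> str:
    """Collapse whitespace and cut the quote to at most `limit` characters."""
    flat = " ".join(quote.split())
    if len(flat) <= limit:
        return flat
    return flat[: limit - 3].rstrip() + "..."


def approve_description(finding: dict, summary: str, adjustment: str, impact: str) -> str:
    """Build the description text for the "Approve adjusted scope" option."""
    spec = truncate(finding["spec_claim"])
    actual = truncate(finding["actual_state"])
    return "\n".join([
        f"Finding {finding['id']} ({finding['category']}): {summary}.",
        f'Spec claim: "{spec}"',
        f'Actual state: "{actual}"',
        f"Proposed adjustment: {adjustment}",
        f"Token-budget impact: {impact}",
    ])
```

The quotes are truncated only in the question text. The verification log keeps the full verbatim `spec_claim` and `actual_state`, which is why the "Need more detail" option points back to the log.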
+### Worked AskUserQuestion Example
+For a hypothetical D-02 (DRIFT-undeclared-scope: `set -e` audit task missing from spec):
+```text
+question: "spec-drift-check found HIGH-severity drift in plans/task-briefs/P10.16-...md. Approve scope adjustment?"
+multiSelect: false
+header: "Spec Drift — HIGH (D-02)"
+options:
+  - label: "Approve adjusted scope"
+    description: |
+      Finding D-02 (drift-undeclared-scope): the test plan implicitly requires
+      a set -e audit per process_test_harness_set_e_pattern.md, which the spec
+      does not mention.
+      Spec claim: "Add 5 test cases to test-suggest-pipeline-stop.sh."
+      Actual state: "Harness uses set -euo pipefail; new tests must use the
+      assertion-counter pattern, not naive expect-fail."
+      Proposed adjustment: add a Step 3a "audit set -e + counter pattern"
+      task before adding test cases.
+      Token-budget impact: +1K
+  - label: "Reject adjustment — keep original spec"
+    description: |
+      Implementation will follow the original spec verbatim. The orchestrator
+      will document the override in the verification log.
+  - label: "Need more detail"
+    description: |
+      Open the verification log at logs/spec-verify-122-P10.16.md and
+      review finding D-02 in full before deciding.
+```
+---
+## Procedure: STOP and Surface
+When the verdict is STOP_USER_APPROVAL, follow this exact sequence:
+1. **Stage 5 log written.** Confirm the log file exists and contains every HIGH finding with both `spec_claim` and `actual_state` populated verbatim.
+2. **List each HIGH finding** to the user in the orchestrator's text response. Format: `D-NN ({category}, severity HIGH): {one-line summary}`. This gives the user a quick scan before the AskUserQuestion fires.
+3. **Propose adjusted scope** in the same text response. One bullet per HIGH finding.
+4. **Fire AskUserQuestion** per HIGH finding (in the next assistant message). Use the template above.
+5. **Wait for the response** to each AskUserQuestion before proceeding. Do NOT chain multiple AskUserQuestion calls in parallel — sequential adjudication is intentional.
+6. **Bind the adjusted plan (Stage 7)** only after every HIGH finding has an explicit user decision recorded in the log.
+If the user rejects an adjustment, document the rejection in the log under `decision_rationale:` and either:
+- Carry the original spec deliverable forward unchanged (and accept the implementation risk), OR
+- Defer the entire WP back to the user for spec revision.
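The sequential adjudication in steps 4 to 6 can be sketched as a plain loop. Everything here is hypothetical: `ask` stands in for an AskUserQuestion call, and the short answer keys `"approve"`, `"reject"`, and `"more-detail"` are placeholders for the three option labels.

```python
from typing import Callable, Dict, List

# Answers that count as an explicit decision (assumed keys, not real labels).
DECISIVE = {"approve", "reject"}


def adjudicate(high_ids: List[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask about each HIGH finding in turn, waiting for each answer."""
    decisions: Dict[str, str] = {}
    for finding_id in high_ids:  # strictly one question at a time
        decisions[finding_id] = ask(finding_id)
    return decisions


def can_bind(high_ids: List[str], decisions: Dict[str, str]) -> bool:
    """Stage 7 may run only once every HIGH finding has a decisive answer."""
    return all(decisions.get(fid) in DECISIVE for fid in high_ids)
```

A "more-detail" answer leaves the finding undecided, so `can_bind` stays false until the user returns with an approval or a rejection.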
+---
+## Decision Rationale Field
+Every verdict — even PROCEED — should populate `decision_rationale:` in the log with a 1-3 sentence summary. Examples:
+- **PROCEED**: "All 23 claims CONFIRMED. Spec aligns with current code; no adjustments needed."
+- **PROCEED_ADJUSTED**: "5 LOW findings (line-ref drift), 2 MEDIUM (1 missing-scope, 1 AC-reinterpretation). Adjusted plan drops the no-op deliverable and resolves AC-3 explicitly. No scope changes."
+- **STOP_USER_APPROVAL**: "1 HIGH finding (D-02, drift-undeclared-scope: set -e audit). Surface to user before binding adjusted plan."
+The rationale becomes part of the audit trail; downstream sessions read it to understand why this WP's plan was rewritten.