sequant 2.3.0 → 2.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (101)
  1. package/.claude-plugin/marketplace.json +2 -2
  2. package/.claude-plugin/plugin.json +2 -2
  3. package/README.md +125 -160
  4. package/dist/bin/cli.js +59 -4
  5. package/dist/dashboard/server.js +1 -0
  6. package/dist/marketplace/external_plugins/sequant/.claude-plugin/plugin.json +2 -2
  7. package/dist/marketplace/external_plugins/sequant/README.md +6 -3
  8. package/dist/marketplace/external_plugins/sequant/hooks/post-tool.sh +92 -0
  9. package/dist/marketplace/external_plugins/sequant/hooks/pre-tool.sh +18 -9
  10. package/dist/marketplace/external_plugins/sequant/hooks/relay-check.sh +107 -0
  11. package/dist/marketplace/external_plugins/sequant/skills/_shared/references/behavior-rule-detection.md +205 -0
  12. package/dist/marketplace/external_plugins/sequant/skills/_shared/references/subagent-types.md +21 -8
  13. package/dist/marketplace/external_plugins/sequant/skills/assess/SKILL.md +302 -86
  14. package/dist/marketplace/external_plugins/sequant/skills/assess/references/predicted-collision-detection.md +109 -0
  15. package/dist/marketplace/external_plugins/sequant/skills/docs/SKILL.md +141 -22
  16. package/dist/marketplace/external_plugins/sequant/skills/exec/SKILL.md +83 -78
  17. package/dist/marketplace/external_plugins/sequant/skills/fullsolve/SKILL.md +377 -137
  18. package/dist/marketplace/external_plugins/sequant/skills/loop/SKILL.md +28 -0
  19. package/dist/marketplace/external_plugins/sequant/skills/merger/SKILL.md +621 -0
  20. package/dist/marketplace/external_plugins/sequant/skills/qa/SKILL.md +741 -232
  21. package/dist/marketplace/external_plugins/sequant/skills/qa/scripts/quality-checks.sh +47 -1
  22. package/dist/marketplace/external_plugins/sequant/skills/setup/SKILL.md +12 -6
  23. package/dist/marketplace/external_plugins/sequant/skills/spec/SKILL.md +217 -964
  24. package/dist/marketplace/external_plugins/sequant/skills/spec/references/parallel-groups.md +7 -0
  25. package/dist/marketplace/external_plugins/sequant/skills/spec/references/quality-checklist.md +75 -0
  26. package/dist/marketplace/external_plugins/sequant/skills/spec/references/recommended-workflow.md +4 -2
  27. package/dist/marketplace/external_plugins/sequant/skills/test/SKILL.md +0 -27
  28. package/dist/marketplace/external_plugins/sequant/skills/testgen/SKILL.md +24 -44
  29. package/dist/src/commands/abort.d.ts +36 -0
  30. package/dist/src/commands/abort.js +138 -0
  31. package/dist/src/commands/prompt.d.ts +7 -0
  32. package/dist/src/commands/prompt.js +101 -7
  33. package/dist/src/commands/ready-tui-adapter.d.ts +59 -0
  34. package/dist/src/commands/ready-tui-adapter.js +130 -0
  35. package/dist/src/commands/ready.d.ts +49 -0
  36. package/dist/src/commands/ready.js +243 -0
  37. package/dist/src/commands/run-progress.d.ts +11 -1
  38. package/dist/src/commands/run-progress.js +20 -3
  39. package/dist/src/commands/run.js +12 -2
  40. package/dist/src/commands/status.js +4 -0
  41. package/dist/src/commands/watch.d.ts +2 -0
  42. package/dist/src/commands/watch.js +67 -3
  43. package/dist/src/lib/assess-collision-detect.js +1 -1
  44. package/dist/src/lib/cli-ui/run-renderer-types.d.ts +39 -0
  45. package/dist/src/lib/cli-ui/run-renderer.d.ts +34 -2
  46. package/dist/src/lib/cli-ui/run-renderer.js +250 -33
  47. package/dist/src/lib/cli-ui/scrollback-harness.d.ts +112 -0
  48. package/dist/src/lib/cli-ui/scrollback-harness.js +294 -0
  49. package/dist/src/lib/merge-check/types.js +1 -1
  50. package/dist/src/lib/relay/archive.js +6 -0
  51. package/dist/src/lib/relay/types.d.ts +2 -0
  52. package/dist/src/lib/relay/types.js +9 -0
  53. package/dist/src/lib/settings.d.ts +34 -0
  54. package/dist/src/lib/settings.js +23 -1
  55. package/dist/src/lib/workflow/batch-executor.js +34 -18
  56. package/dist/src/lib/workflow/drivers/agent-driver.d.ts +48 -1
  57. package/dist/src/lib/workflow/drivers/aider.d.ts +7 -1
  58. package/dist/src/lib/workflow/drivers/aider.js +9 -0
  59. package/dist/src/lib/workflow/drivers/claude-code.d.ts +17 -1
  60. package/dist/src/lib/workflow/drivers/claude-code.js +51 -2
  61. package/dist/src/lib/workflow/drivers/index.d.ts +1 -1
  62. package/dist/src/lib/workflow/event-emitter.d.ts +157 -0
  63. package/dist/src/lib/workflow/event-emitter.js +102 -0
  64. package/dist/src/lib/workflow/notice.d.ts +32 -0
  65. package/dist/src/lib/workflow/notice.js +38 -0
  66. package/dist/src/lib/workflow/phase-executor.d.ts +9 -21
  67. package/dist/src/lib/workflow/phase-executor.js +105 -117
  68. package/dist/src/lib/workflow/phase-mapper.d.ts +26 -13
  69. package/dist/src/lib/workflow/phase-mapper.js +55 -33
  70. package/dist/src/lib/workflow/phase-registry.d.ts +127 -0
  71. package/dist/src/lib/workflow/phase-registry.js +233 -0
  72. package/dist/src/lib/workflow/platforms/github.d.ts +6 -0
  73. package/dist/src/lib/workflow/platforms/github.js +17 -0
  74. package/dist/src/lib/workflow/ready-gate.d.ts +155 -0
  75. package/dist/src/lib/workflow/ready-gate.js +374 -0
  76. package/dist/src/lib/workflow/reconcile.js +6 -0
  77. package/dist/src/lib/workflow/run-log-schema.d.ts +5 -55
  78. package/dist/src/lib/workflow/run-orchestrator.d.ts +32 -2
  79. package/dist/src/lib/workflow/run-orchestrator.js +125 -11
  80. package/dist/src/lib/workflow/state-manager.d.ts +19 -1
  81. package/dist/src/lib/workflow/state-manager.js +27 -1
  82. package/dist/src/lib/workflow/state-schema.d.ts +23 -35
  83. package/dist/src/lib/workflow/state-schema.js +29 -3
  84. package/dist/src/lib/workflow/types.d.ts +74 -15
  85. package/dist/src/lib/workflow/types.js +18 -13
  86. package/dist/src/ui/tui/App.js +8 -2
  87. package/dist/src/ui/tui/IssueBox.js +3 -4
  88. package/dist/src/ui/tui/index.d.ts +13 -4
  89. package/dist/src/ui/tui/index.js +19 -5
  90. package/dist/src/ui/tui/row-cap.d.ts +51 -0
  91. package/dist/src/ui/tui/row-cap.js +76 -0
  92. package/dist/src/ui/tui/teardown.d.ts +20 -0
  93. package/dist/src/ui/tui/teardown.js +29 -0
  94. package/dist/src/ui/tui/theme.d.ts +3 -0
  95. package/dist/src/ui/tui/theme.js +3 -0
  96. package/package.json +23 -11
  97. package/templates/hooks/post-tool.sh +81 -0
  98. package/templates/skills/assess/SKILL.md +28 -28
  99. package/templates/skills/assess/references/predicted-collision-detection.md +1 -1
  100. package/templates/skills/qa/SKILL.md +5 -2
  101. package/templates/skills/setup/SKILL.md +6 -6
@@ -48,6 +48,7 @@ When running as part of an orchestrated workflow (e.g., `sequant run` or `/fulls
48
48
  | `SEQUANT_PHASE` | Current phase in the workflow | `qa` |
49
49
  | `SEQUANT_ISSUE` | Issue number being processed | `123` |
50
50
  | `SEQUANT_WORKTREE` | Path to the feature worktree | `/path/to/worktrees/feature/...` |
51
+ | `SEQUANT_FULL_QA` | Force full-weight (standalone) QA pre-flight even under an orchestrator (#683) | `1` |
51
52
 
52
53
  **Behavior when orchestrated (SEQUANT_ORCHESTRATOR is set):**
53
54
 
@@ -57,6 +58,8 @@ When running as part of an orchestrated workflow (e.g., `sequant run` or `/fulls
57
58
  4. **Reduce GitHub comment frequency** - Defer updates to orchestrator
58
59
  5. **Trust git state** - Orchestrator verified branch status
59
60
 
61
+ > **Full-weight override (`SEQUANT_FULL_QA=1`, #683).** When this flag is set (e.g. by `sequant ready`), do NOT take the git-state shortcuts above. Run the **standalone** pre-flight sync check, the stale-branch detection, and the process-state inspection (uncommitted work, divergent/zero-commit branches) even though `SEQUANT_ORCHESTRATOR` is set — this is the deliberate fresh full-weight pass that catches the no-implementation / divergent-branch class. The other orchestrated behaviors (skip issue fetch, reduced GitHub comments) still apply.
62
+
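The override can be read as a two-variable gate: the git-state checks run when QA is standalone, or when it is orchestrated but `SEQUANT_FULL_QA=1`. The sketch below is one possible reading, not code from the package. The function name `git_state_checks_mode` is hypothetical, and it assumes that an empty `SEQUANT_ORCHESTRATOR` counts as "not set".

```bash
# Hypothetical helper (not part of sequant): prints "run" when the pre-flight
# sync, stale-branch, and process-state checks should execute, else "skip".
git_state_checks_mode() {
  # Standalone: SEQUANT_ORCHESTRATOR is unset or empty (assumption: empty == unset).
  if [[ -z "${SEQUANT_ORCHESTRATOR:-}" ]]; then
    echo "run"
  # Orchestrated, but the full-weight override is on.
  elif [[ "${SEQUANT_FULL_QA:-}" == "1" ]]; then
    echo "run"
  else
    echo "skip"
  fi
}
```

Only the git-state checks are gated this way; per the note above, the other orchestrated behaviors (skipped issue fetch, reduced GitHub comments) stay in effect.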
60
63
  **Behavior when standalone (SEQUANT_ORCHESTRATOR is NOT set):**
61
64
 
62
65
  - Perform pre-flight sync check
@@ -107,9 +110,11 @@ COMMIT_SHA=$(git rev-parse HEAD)
107
110
  ```
108
111
 
109
112
  ```markdown
110
- <!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"<ISO-8601>","commitSHA":"<HEAD-SHA>"} -->
113
+ <!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"<ISO-8601>","commitSHA":"<HEAD-SHA>","verdict":"<READY_FOR_MERGE|AC_MET_BUT_NOT_A_PLUS|NEEDS_VERIFICATION>"} -->
111
114
  ```
112
115
 
116
+ **Note:** The `verdict` field is required on `status:"completed"` markers so the Phase 0a short-circuit can surface the prior verdict without re-reading the comment body. Older markers without this field are still accepted — Phase 0a falls back to `(see prior QA comment)`.
117
+
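As an illustration of the extended marker, the sketch below assembles a `status:"completed"` marker with the new `verdict` field. The helper name `emit_qa_completed_marker` and its argument order are assumptions made for this example; only the marker layout and the three verdict values come from the text above.

```bash
# Hypothetical helper (not part of sequant): prints a completed-QA phase marker.
# Usage: emit_qa_completed_marker <commit-sha> <verdict> [timestamp]
emit_qa_completed_marker() {
  local sha="$1" verdict="$2"
  # Default to the current UTC time in ISO-8601 form when no timestamp is given.
  local timestamp="${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"

  # Accept only the verdict values listed for completed markers.
  case "$verdict" in
    READY_FOR_MERGE|AC_MET_BUT_NOT_A_PLUS|NEEDS_VERIFICATION) ;;
    *) echo "unexpected verdict: $verdict" >&2; return 1 ;;
  esac

  printf '<!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"%s","commitSHA":"%s","verdict":"%s"} -->\n' \
    "$timestamp" "$sha" "$verdict"
}
```

A caller would typically pass `$(git rev-parse HEAD)` as the first argument and append the printed line to the QA comment body.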
113
118
  If QA determines AC_NOT_MET, emit:
114
119
  ```markdown
115
120
  <!-- SEQUANT_PHASE: {"phase":"qa","status":"failed","timestamp":"<ISO-8601>","error":"AC_NOT_MET","commitSHA":"<HEAD-SHA>"} -->
@@ -122,9 +127,23 @@ Include this marker in every `gh issue comment` that represents QA completion.
122
127
  Invocation:
123
128
 
124
129
  - `/qa 123`: Treat `123` as the GitHub issue/PR identifier in context.
130
+ - `/qa 123 172`: Treat both as issue numbers — process each sequentially.
125
131
  - `/qa <freeform description>`: Treat the text as context about the change to review.
126
132
  - `/qa 123 --parallel`: Force parallel agent execution (faster, higher token usage).
127
133
  - `/qa 123 --sequential`: Force sequential agent execution (slower, lower token usage).
134
+ - `/qa 123 --force`: Bypass prior-QA short-circuit and force a full re-run even if the last QA covers the current commit.
135
+
136
+ ### Multi-Issue Invocation
137
+
138
+ When multiple issue numbers are provided (e.g., `/qa 167 172`):
139
+
140
+ 1. **Parse all issue numbers** from args
141
+ 2. **Process each issue sequentially** with inline code review — do NOT spawn ad-hoc background agents for the diff reading or AC verification portions
142
+ 3. The built-in `sequant-qa-checker` sub-agents (type safety, scope, security) continue to run per the size gate rules for each issue
143
+ 4. Each issue gets its own full QA cycle: context fetch → diff review → quality checks → verdict → comment
144
+ 5. Post a **separate QA comment** to each issue's GitHub thread
145
+
146
+ **Why sequential with inline review:** Ad-hoc background agents for code review are unreliable — they hallucinate about file existence, misattribute API patterns, and hit permission issues on worktree reads. The narrowly-scoped `sequant-qa-checker` agents work well because they have specific, bounded tasks. The code review portion must stay inline for accuracy.
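Step 1 (parsing issue numbers from args) could look roughly like the sketch below. It assumes that any purely numeric argument is an issue number and that everything else (flags, freeform words) is ignored; the function name `parse_issue_numbers` is hypothetical.

```bash
# Hypothetical helper (not part of sequant): prints each numeric argument on
# its own line, preserving order, and skips flags and freeform words.
parse_issue_numbers() {
  local arg
  for arg in "$@"; do
    if [[ "$arg" =~ ^[0-9]+$ ]]; then
      printf '%s\n' "$arg"
    fi
  done
}
```

For `/qa 167 172 --force` this yields `167` and `172`, which can then be processed one at a time; an empty result suggests the invocation was a freeform description.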
128
147
 
129
148
  ### Agent Execution Mode
130
149
 
@@ -214,7 +233,7 @@ If the cache is corrupted or unreadable:
214
233
 
215
234
  ### Pre-flight Sync Check
216
235
 
217
- **Skip this section if `SEQUANT_ORCHESTRATOR` is set** - the orchestrator has already verified sync status.
236
+ **Skip this section if `SEQUANT_ORCHESTRATOR` is set** (the orchestrator has already verified sync status) — **unless `SEQUANT_FULL_QA=1`**, in which case run this check even under an orchestrator (#683).
218
237
 
219
238
  Before starting QA (standalone mode), verify the local branch is in sync with remote:
220
239
 
@@ -235,7 +254,7 @@ git pull origin main # Or merge origin/main if pull fails
235
254
 
236
255
  ### Stale Branch Detection
237
256
 
238
- **Skip this section if `SEQUANT_ORCHESTRATOR` is set** - the orchestrator handles branch freshness checks.
257
+ **Skip this section if `SEQUANT_ORCHESTRATOR` is set** (the orchestrator handles branch freshness checks) — **unless `SEQUANT_FULL_QA=1`**, in which case run the branch-freshness check even under an orchestrator (#683).
239
258
 
240
259
  **Purpose:** Detect when the feature branch is significantly behind main, which can lead to:
241
260
  - QA cycles wasted reviewing code that won't cleanly merge
@@ -487,6 +506,87 @@ fi
487
506
 
488
507
  ---
489
508
 
509
+ ### Phase 0a: Prior QA Short-Circuit Check
510
+
511
+ **After confirming implementation exists** (Phase 0 passed), check whether a prior QA run already covers the current commit. This avoids re-running the full QA pipeline when nothing has changed.
512
+
513
+ **Skip this check if any of these are true:**
514
+ - `--force` flag is present in the invocation args
515
+ - `--no-cache` flag is present in the invocation args
516
+ - `SEQUANT_ORCHESTRATOR` is set and the orchestrator explicitly requests a fresh run
517
+
518
+ **Detection Logic:**
519
+
520
+ ```bash
521
+ # 1. Get current HEAD SHA
522
+ current_sha=$(git rev-parse HEAD)
523
+
524
+ # 2. Fetch the latest qa:completed or qa:failed phase marker from issue comments
525
+ # NOTE: Use `.comments[].body` (NOT `[.comments[].body]`). The array form JSON-encodes
526
+ # each body, escaping internal quotes (`"phase":"qa"` → `\"phase\":\"qa\"`) and `<` →
527
+ # `\u003c`, which defeats the grep pattern below. The streaming form outputs raw bodies.
528
+ latest_qa_marker=$(gh issue view <issue-number> --json comments --jq '.comments[].body' | \
529
+ grep -o '<!-- SEQUANT_PHASE: {[^}]*"phase":"qa"[^}]*} -->' | \
530
+ tail -1 || true)
531
+
532
+ # 3. Extract status, commitSHA, verdict, and timestamp from the marker
533
+ if [[ -n "$latest_qa_marker" ]]; then
534
+ marker_json=$(echo "$latest_qa_marker" | grep -o '{[^}]*}')
535
+ marker_status=$(echo "$marker_json" | jq -r '.status // empty' 2>/dev/null || true)
536
+ marker_sha=$(echo "$marker_json" | jq -r '.commitSHA // empty' 2>/dev/null || true)
537
+ marker_timestamp=$(echo "$marker_json" | jq -r '.timestamp // empty' 2>/dev/null || true)
538
+ marker_verdict=$(echo "$marker_json" | jq -r '.verdict // empty' 2>/dev/null || true)
539
+ fi
540
+ ```
541
+
542
+ **Short-Circuit Decision Matrix:**
543
+
544
+ | marker_status | marker_sha == HEAD | Action |
545
+ |---------------|-------------------|--------|
546
+ | `completed` | Yes | **Short-circuit** — skip full QA |
547
+ | `completed` | No | Proceed with full QA (new commits since last run) |
548
+ | `failed` | Yes or No | Proceed with full QA (user likely wants re-run after fix) |
549
+ | (not found) | N/A | Proceed with full QA (no prior run) |
550
+
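The decision matrix above reduces to a single condition: short-circuit only when the latest marker is `completed` and its SHA equals the current HEAD. A minimal sketch, assuming the variables extracted in the detection step are passed in as arguments; the function name `qa_short_circuit_decision` and the optional fourth "force" argument are illustrative only.

```bash
# Hypothetical helper (not part of sequant).
# Usage: qa_short_circuit_decision <marker_status> <marker_sha> <current_sha> [force]
# Prints "short-circuit" or "full-qa".
qa_short_circuit_decision() {
  local marker_status="$1" marker_sha="$2" current_sha="$3" force="${4:-}"

  # Any non-empty fourth argument stands in for --force / --no-cache.
  if [[ -n "$force" ]]; then
    echo "full-qa"
    return
  fi

  # Only a completed marker at the same commit is still valid.
  if [[ "$marker_status" == "completed" && -n "$marker_sha" && "$marker_sha" == "$current_sha" ]]; then
    echo "short-circuit"
  else
    # Covers: new commits, failed markers, and no marker at all.
    echo "full-qa"
  fi
}
```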
551
+ **When short-circuiting (status=completed, SHA matches):**
552
+
553
+ 1. **Skip** sub-agent spawning
554
+ 2. **Skip** code review and quality checks
555
+ 3. **Output** the short-circuit summary (template below)
556
+ 4. **Do NOT** post a new GitHub comment (the prior comment is still valid)
557
+
558
+ **Short-Circuit Output Template:**
559
+
560
+ Populate `**Prior Verdict:**` from `$marker_verdict` when non-empty. When empty (legacy marker without the field), substitute the literal string `(see prior QA comment)`.
561
+
562
+ ```markdown
563
+ ## QA Review for Issue #<N>
564
+
565
+ ### Prior QA Still Valid
566
+
567
+ QA already completed at commit `<SHA>` on <timestamp> — no changes since last run.
568
+ Current HEAD (`<current_sha>`) matches the previously reviewed commit.
569
+
570
+ **Prior Verdict:** <$marker_verdict OR "(see prior QA comment)" if empty>
571
+
572
+ To force a full re-run, use: `/qa <N> --force` or `/qa <N> --no-cache`
573
+
574
+ ---
575
+
576
+ *QA short-circuited: prior run at same SHA is still valid*
577
+ ```
578
+
579
+ **Verdict field handling:**
580
+
581
+ | `$marker_verdict` | Action |
582
+ |-------------------|--------|
583
+ | Non-empty (new markers) | Emit literally: `**Prior Verdict:** READY_FOR_MERGE` (etc.) |
584
+ | Empty (legacy markers) | Emit: `**Prior Verdict:** (see prior QA comment)` — the prior comment body contains the full verdict |
585
+
586
+ The short-circuit itself still triggers in both cases — only the displayed verdict text differs.
587
+
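The legacy-marker fallback maps naturally onto bash's default-value expansion. The sketch below shows one way to render the line; `render_prior_verdict` is a hypothetical name, and the input is assumed to be the `$marker_verdict` value extracted earlier (possibly empty).

```bash
# Hypothetical helper (not part of sequant): renders the "Prior Verdict" line.
render_prior_verdict() {
  local marker_verdict="${1:-}"
  # ${var:-default} substitutes the default when the variable is unset or empty.
  printf '**Prior Verdict:** %s\n' "${marker_verdict:-(see prior QA comment)}"
}
```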
588
+ ---
589
+
490
590
  ### Phase 0b: Quality Plan Verification (CONDITIONAL)
491
591
 
492
592
  **When to apply:** If issue has a Feature Quality Planning section in comments (from `/spec`).
@@ -599,131 +699,79 @@ quality_plan_exists=$(gh issue view <issue> --comments --json comments -q '.comm
599
699
 
600
700
  ---
601
701
 
602
- ### Phase 0c: Incremental Re-Run Detection (CONDITIONAL)
702
+ ### Phase 0c: Precheck Findings (CONDITIONAL)
603
703
 
604
- **When to apply:** On QA re-runs (when a prior QA phase marker exists in issue comments).
704
+ **Purpose:** Consume deterministic gap-check output from `scripts/qa/precheck.ts`. The script handles three checks the agent doesn't need to evaluate but pays token cost for if inlined: verbatim motivating-example fixture extraction, cross-file sibling-grep on changed identifiers, and AC literal-id diff between issue body and PR body. Downstream sections (§1 AC Literal Verification, §5 Sibling-site Scan, §6c Step 4 Fixture Verification, §6d Q1 Verbatim Fixtures) consult the precheck output when available and fall back to inline logic on miss/error.
605
705
 
606
- **Purpose:** Optimize QA re-runs by detecting what changed since the last QA run and skipping checks whose inputs haven't changed. This significantly reduces token usage and execution time on iterative QA cycles.
706
+ **Origin:** #609 extract deterministic gap-checks into a pre-QA gate. Backed by #608's signal-to-noise study (e.g. §6c at 0/11 actioned findings, ~1,800 tokens/invoke).
607
707
 
608
- **Detection:**
708
+ **Run (best-effort, exit code is always 0 even on partial failure):**
609
709
 
610
710
  ```bash
611
- # Step 1: Check for prior QA run context in cache
612
- prior_context=$(npx tsx scripts/qa/qa-cache-cli.ts get-run-context 2>/dev/null || true)
613
-
614
- # Step 2: If no cache context found, fall through to full QA run
615
- if [[ -z "$prior_context" ]] || echo "$prior_context" | grep -q "No QA run context"; then
616
- echo "No prior QA context found — running full QA"
617
- INCREMENTAL_MODE=false
618
- else
619
- LAST_QA_SHA=$(echo "$prior_context" | jq -r '.lastQACommitSHA')
620
- LAST_QA_HASH=$(echo "$prior_context" | jq -r '.lastQADiffHash')
621
-
622
- # Step 3: Validate the commit SHA still exists in git history
623
- if ! git cat-file -t "$LAST_QA_SHA" &>/dev/null; then
624
- echo "Warning: Last QA commit SHA ($LAST_QA_SHA) not found in history — running full QA"
625
- INCREMENTAL_MODE=false
626
- else
627
- # Step 4: Get files changed since last QA
628
- changed_files=$(npx tsx scripts/qa/qa-cache-cli.ts changed-since "$LAST_QA_SHA" 2>/dev/null || true)
629
-
630
- if [[ "$changed_files" == "NO_CHANGES" ]]; then
631
- echo "No changes since last QA — all checks can use cached results"
632
- INCREMENTAL_MODE=true
633
- NO_FILE_CHANGES=true
634
- else
635
- echo "Changes detected since last QA ($LAST_QA_SHA):"
636
- echo "$changed_files" | head -20
637
- INCREMENTAL_MODE=true
638
- NO_FILE_CHANGES=false
639
- fi
640
- fi
641
- fi
711
+ issue=<issue-number>
712
+ npx tsx scripts/qa/precheck.ts --issue "$issue" 2>/dev/null || true
642
713
  ```
643
714
 
644
- **Skip Logic (when INCREMENTAL_MODE=true):**
715
+ The script also auto-detects the PR via `gh pr view --json number` and runs `git diff origin/main...HEAD` for the changed-identifier scan.
645
716
 
646
- | Check / Item | Skip Condition | Re-run Condition |
647
- |-------------|----------------|------------------|
648
- | Quality checks (type-safety, security, etc.) | Existing diff-hash cache handles this | Hash mismatch -> re-run |
649
- | Build verification | **Never skip** (always re-run) | Always — cheap and can regress |
650
- | CI status | **Never skip** (always re-run) | Always — external state changes |
651
- | AC items with prior status `met` | Skip if NO_FILE_CHANGES=true | Any file changes since last QA |
652
- | AC items with prior status `not_met` | **Never skip** | Always re-evaluate |
653
- | AC items with prior status `partially_met` | **Never skip** | Always re-evaluate |
654
- | AC items with prior status `pending`/`blocked` | **Never skip** | Always re-evaluate |
717
+ **Output:** `.sequant/gap-precheck.json` (schemaVersion 1):
655
718
 
656
- **AC Re-evaluation Rules:**
719
+ ```json
720
+ {
721
+ "schemaVersion": 1,
722
+ "issue": 609,
723
+ "pr": 999,
724
+ "generatedAt": "...",
725
+ "checks": {
726
+ "fixtures": { "status": "pass | not_applicable | fail", "count": N, "fixtures": [...] },
727
+ "siblingGrep": { "status": "...", "identifiers": [{ "name": "...", "definedIn": "...", "siblingSites": [...] }] },
728
+ "acLiteralDiff": { "status": "...", "issueACs": [...], "prACs": [...], "missingInPR": [...] }
729
+ }
730
+ }
731
+ ```
657
732
 
658
- When `INCREMENTAL_MODE=true`:
733
+ **Consumption rules per downstream section:**
659
734
 
660
- 1. **Load prior AC statuses** from run context:
661
- ```bash
662
- # Extract AC statuses from prior context
663
- ac_statuses=$(echo "$prior_context" | jq -r '.acStatuses | to_entries[] | "\(.key)=\(.value)"')
664
- ```
735
+ | Precheck status | Downstream section behavior |
736
+ |-----------------|-----------------------------|
737
+ | `pass` | Use precheck output as primary input; agent judgment evaluates surfaced candidates (e.g. is each sibling-grep hit a real sibling site?) |
738
+ | `not_applicable` | Treat as section N/A; do NOT inline-re-extract |
739
+ | `fail` | Fall back to inline extraction (the section's pre-existing logic) |
740
+ | Precheck JSON missing / malformed | Fall back to inline extraction; note "precheck unavailable" |
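The consumption rules can be summarized as a small dispatch on the per-check status. The sketch below is illustrative: the function name `precheck_action` and the returned labels are invented for this example, and treating an unrecognized status like `fail` is an assumption rather than documented behavior.

```bash
# Hypothetical helper (not part of sequant).
# Usage: precheck_action <precheck_ok: yes|no> <status>
# Prints "use-precheck", "section-na", or "inline-fallback".
precheck_action() {
  local precheck_ok="$1" status="${2:-}"

  # Missing or malformed JSON: every consumer falls back to inline extraction.
  if [[ "$precheck_ok" != "yes" ]]; then
    echo "inline-fallback"
    return
  fi

  case "$status" in
    pass)           echo "use-precheck" ;;    # precheck output is the primary input
    not_applicable) echo "section-na" ;;      # do not re-extract inline
    *)              echo "inline-fallback" ;; # "fail" and, by assumption, anything unknown
  esac
}
```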
665
741
 
666
- 2. **For each AC item:**
667
- - If prior status is `met` AND `NO_FILE_CHANGES=true`:
668
- - **Skip full re-evaluation** — output "Cached: previously MET, no file changes"
669
- - Mark as `MET (cached)` in output
670
- - If prior status is `met` AND files changed:
671
- - **Re-evaluate** — changes may have caused regression
672
- - If prior status is `not_met` or `partially_met`:
673
- - **Always re-evaluate** — this is the primary purpose of re-runs
674
- - If prior status is `pending` or `blocked`:
675
- - **Always re-evaluate** — status may have changed
742
+ **Fallback (precheck JSON missing / malformed):**
676
743
 
677
- 3. **`--no-cache` flag behavior:**
678
- - When `--no-cache` is passed, set `INCREMENTAL_MODE=false`
679
- - This forces full re-evaluation of ALL checks and AC items
680
- - Run context is still saved at the end for future re-runs
744
+ ```bash
745
+ precheck_path=".sequant/gap-precheck.json"
746
+ precheck_ok="no"
747
+ if [[ -f "$precheck_path" ]]; then
748
+ schema=$(jq -r .schemaVersion "$precheck_path" 2>/dev/null || echo "")
749
+ if [[ "$schema" == "1" ]]; then
750
+ precheck_ok="yes"
751
+ fi
752
+ fi
753
+ # When precheck_ok=no, every downstream consumer falls back to its inline path.
754
+ ```
681
755
 
682
- **Output Format (Incremental QA Summary):**
756
+ Do NOT block the QA run on a missing precheck — the script is best-effort. The fallback path is the pre-#609 behavior, which still produces a correct verdict (just at higher token cost).
683
757
 
684
- When `INCREMENTAL_MODE=true`, prepend this section to the QA output:
758
+ **Output Format:**
685
759
 
686
760
  ```markdown
687
- ### Incremental QA Summary
688
-
689
- **Last QA:** <timestamp> (commit: <sha-short>)
690
- **Changes since last QA:** N files
761
+ ### Precheck Findings
691
762
 
692
- | Check / AC | Status | Re-run? | Reason |
693
- |------------|--------|---------|--------|
694
- | type-safety | PASS | Cached | Diff hash unchanged |
695
- | security | PASS | Cached | Diff hash unchanged |
696
- | build | PASS | Re-run | Always fresh |
697
- | CI status | PASS | Re-run | Always fresh |
698
- | AC-1 | MET | Cached | Previously MET, no file changes |
699
- | AC-2 | MET | Re-evaluated | Was NOT_MET |
700
- | AC-3 | MET | Re-evaluated | Files changed since last QA |
763
+ | Section | Status | Surfaced |
764
+ |---------|--------|----------|
765
+ | Fixtures | pass | 3 motivating-example fixtures (consumed by §6c Step 4 / §6d Q1) |
766
+ | Sibling-grep | pass | 5 changed identifiers (candidate sites surfaced to §5) |
767
+ | AC literal-diff | fail | Issue lists AC-3 / AC-4; PR body omits them |
701
768
 
702
- **Summary:** X checks cached, Y re-evaluated, Z always-fresh
769
+ **Source:** `.sequant/gap-precheck.json` (schemaVersion 1)
703
770
  ```
704
771
 
705
- **Run Context Persistence:**
706
-
707
- After QA completes (regardless of incremental mode), save the run context:
708
-
709
- ```bash
710
- # Get current HEAD SHA
711
- current_sha=$(git rev-parse HEAD)
712
- # Get current diff hash
713
- current_hash=$(npx tsx scripts/qa/qa-cache-cli.ts hash)
772
+ If precheck unavailable, omit the table and emit a single line: `**Precheck Findings:** unavailable — inline fallback used.`
714
773
 
715
- # Build AC statuses JSON from QA results
716
- # Example: {"AC-1":"met","AC-2":"not_met","AC-3":"met"}
717
- ac_json='{"AC-1":"met","AC-2":"not_met"}' # Replace with actual results
718
-
719
- # Save run context
720
- echo "{
721
- \"lastQACommitSHA\": \"$current_sha\",
722
- \"lastQADiffHash\": \"$current_hash\",
723
- \"acStatuses\": $ac_json,
724
- \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%S.000Z)\"
725
- }" | npx tsx scripts/qa/qa-cache-cli.ts set-run-context
726
- ```
774
+ **Verdict impact:** None directly. Phase 0c is plumbing — the downstream sections own the verdict effect when they consume the surfaced candidates.
727
775
 
728
776
  ---
729
777
 
@@ -752,11 +800,13 @@ fi
752
800
  | `FAILURE` | `fail` | `NOT_MET` | Blocks merge |
753
801
  | `CANCELLED` | `fail` | `NOT_MET` | Blocks merge |
754
802
  | `SKIPPED` | `pass` | `N/A` | No impact |
755
- | `PENDING` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
756
- | `QUEUED` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
757
- | `IN_PROGRESS` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
803
+ | `PENDING` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
804
+ | `QUEUED` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
805
+ | `IN_PROGRESS` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
758
806
  | (empty response) | - | `N/A` | No CI configured |
759
807
 
808
+ \* Pending checks may be reclassified as `MET` (informational) when the diff is markdown-only and the check name matches `qa.markdownOnlySafeCiPatterns` — see "Markdown-Only Diff Relaxation" below for the gating-vs-relaxed partitioning rules. Failed checks are never relaxed.
809
+
760
810
  **CI-Related AC Detection:**
761
811
 
762
812
  Identify AC items that depend on CI by matching these patterns:
@@ -824,12 +874,72 @@ No CI checks configured for this repository.
824
874
  **Verdict Integration:**
825
875
 
826
876
  CI status affects the final verdict through the standard verdict algorithm:
827
- - CI `PENDING` → AC item marked `PENDING` → Verdict: `NEEDS_VERIFICATION`
877
+ - CI `PENDING` (gating) → AC item marked `PENDING` → Verdict: `NEEDS_VERIFICATION`
878
+ - CI `PENDING` (relaxed via the markdown-only block below) → AC item marked `MET` → no impact on verdict
828
879
  - CI `failure` → AC item marked `NOT_MET` → Verdict: `AC_NOT_MET`
829
880
  - CI `success` → AC item marked `MET` → No additional impact
830
881
  - No CI → AC item marked `N/A` → No impact on verdict
831
882
 
832
- **Important:** Do NOT give `READY_FOR_MERGE` if any CI check is still pending. The correct verdict is `NEEDS_VERIFICATION` with a note to re-run QA after CI completes.
883
+ **Important:** Do NOT give `READY_FOR_MERGE` if any *gating* CI check is still pending. Pending checks are gating by default; the markdown-only relaxation below reclassifies a narrow allowlist as informational. When any gating-pending check remains, the correct verdict is `NEEDS_VERIFICATION` with a note to re-run QA after CI completes.
884
+
885
+ #### Markdown-Only Diff Relaxation
886
+
887
+ **Purpose:** A diff that touches only `.md` files cannot break the build matrix. Forcing `NEEDS_VERIFICATION` until the build matrix reports back wastes wakeup cycles when `typecheck` (already green) has proven the change is structurally inert. This relaxation reclassifies a small, configurable allowlist of pending checks (default: build matrix + Plugin Structure Validation) as informational for markdown-only diffs.
888
+
889
+ **Scope and limits:**
890
+
891
+ - **Only pending checks are relaxed.** Failed checks always gate, regardless of diff type.
892
+ - **Only the configured allowlist is relaxed.** Other pending checks (`validate-skills`, `Hooks Validation`, `validate-plugin`, etc.) still gate.
893
+ - **Build-affecting files disqualify the diff** even if every other change is `.md`. The detector treats `package.json`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `tsconfig*.json`, `*.config.{js,ts,mjs,cjs}`, and `.github/workflows/**` as non-markdown.
894
+
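To make the scope rules concrete, here is a rough bash approximation of a markdown-only detector. It is not the package's `detectMarkdownOnlyDiff`, whose implementation is not shown in this diff and may differ; the function name `is_markdown_only_diff` is hypothetical, and treating an empty file list as "not markdown-only" is an assumption.

```bash
# Hypothetical approximation (not the package's detector).
# Usage: is_markdown_only_diff <file>...
# Prints "yes" when every changed file counts as markdown, else "no".
is_markdown_only_diff() {
  local file
  # Assumption: an empty diff is not treated as markdown-only.
  if [[ $# -eq 0 ]]; then
    echo "no"
    return
  fi
  for file in "$@"; do
    case "$file" in
      # Workflow files disqualify the diff even when they end in .md.
      .github/workflows/*) echo "no"; return ;;
      *.md) ;;
      # package.json, lockfiles, tsconfig*.json, and *.config.* never end in
      # .md, so this catch-all arm already rejects them.
      *) echo "no"; return ;;
    esac
  done
  echo "yes"
}
```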
895
+ **Settings (`.sequant/settings.json`, `qa` section):**
896
+
897
+ | Key | Default | Effect |
898
+ |-----|---------|--------|
899
+ | `markdownOnlyCiRelaxed` | `true` | Master switch. Set to `false` to restore strict gating for paranoid projects. |
900
+ | `markdownOnlySafeCiPatterns` | `["build (*)", "Plugin Structure Validation"]` | Glob patterns (single `*` wildcard) for CI check names that are safe to ignore when pending on a markdown-only diff. Override per project to match local CI step names. |
901
+
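The pattern matching described for `markdownOnlySafeCiPatterns` can be sketched in plain bash. This assumes each pattern contains at most one `*`, which matches any run of characters; the names `glob_match_single_star` and `is_relaxable_check` are hypothetical, and the package's own `filterRelaxablePending` may behave differently.

```bash
# Hypothetical helpers (not part of sequant).
# glob_match_single_star <name> <pattern>: succeeds when <name> matches
# <pattern>, where <pattern> holds at most one "*" wildcard.
glob_match_single_star() {
  local name="$1" pattern="$2" prefix suffix tail_start
  # No wildcard: require an exact match.
  if [[ "$pattern" != *"*"* ]]; then
    [[ "$name" == "$pattern" ]]
    return
  fi
  prefix=${pattern%%\**}   # text before the first "*"
  suffix=${pattern#*\*}    # text after the first "*"
  tail_start=$(( ${#name} - ${#suffix} ))
  # The name must be long enough to hold prefix and suffix without overlap.
  (( tail_start >= ${#prefix} )) || return 1
  [[ "${name:0:${#prefix}}" == "$prefix" && "${name:tail_start}" == "$suffix" ]]
}

# is_relaxable_check <name> <pattern>...: succeeds when any pattern matches.
is_relaxable_check() {
  local name="$1" pattern
  shift
  for pattern in "$@"; do
    glob_match_single_star "$name" "$pattern" && return 0
  done
  return 1
}
```

Splitting on the wildcard instead of using the shell's own glob matching keeps characters such as `(`, `?`, and `[` literal, which fits check names like `build (20.x)`.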
902
+ **Procedure:**
903
+
904
+ ```bash
905
+ # 1. Read settings — silently fall back to defaults on parse failure.
906
+ relaxed_enabled=$(cat .sequant/settings.json 2>/dev/null \
907
+ | grep -o '"markdownOnlyCiRelaxed"[[:space:]]*:[[:space:]]*\(true\|false\)' \
908
+ | grep -o 'true\|false' || echo "true")
909
+
910
+ # 2. Compute changed-file list against origin/main.
911
+ changed_files=$(git diff origin/main...HEAD --name-only || true)
912
+
913
+ # 3. Run the helpers (single npx call returns the gating-pending bucket).
914
+ result=$(SEQUANT_QA_RELAX_FILES="$changed_files" SEQUANT_QA_RELAX_PENDING="$pending_check_names" \
915
+ npx tsx -e '
916
+ (async () => {
917
+ const m = await import("./src/lib/qa/markdown-only-ci.ts");
918
+ const { getSettings } = await import("./src/lib/settings.ts");
919
+ const files = (process.env.SEQUANT_QA_RELAX_FILES || "").split("\n").filter(Boolean);
920
+ const pending = (process.env.SEQUANT_QA_RELAX_PENDING || "").split("\n").filter(Boolean);
921
+ const settings = await getSettings();
922
+ const isMdOnly = m.detectMarkdownOnlyDiff(files);
923
+ const enabled = settings.qa.markdownOnlyCiRelaxed && isMdOnly;
924
+ const buckets = enabled
925
+ ? m.filterRelaxablePending(pending, settings.qa.markdownOnlySafeCiPatterns)
926
+ : { relaxed: [], gating: pending };
927
+ console.log(JSON.stringify({ isMdOnly, enabled, ...buckets }));
928
+ })();
929
+ ' 2>/dev/null || echo '{"isMdOnly":false,"enabled":false,"relaxed":[],"gating":[]}')
930
+ ```
931
+
932
+ When `enabled === true` (markdown-only diff AND `markdownOnlyCiRelaxed` is `true`), use `gating` for the verdict's `pending_count`; the `relaxed` list is informational and rendered in the output.
933
+
934
+ **Output transparency (REQUIRED when relaxation triggers):**
935
+
936
+ When `enabled === true` AND `relaxed.length > 0`, the `### CI Status` section MUST include this labeled note immediately after the CI summary line, listing the relaxed check names verbatim:
937
+
938
+ ```markdown
939
+ **Markdown-only diff detected — pending build-matrix checks treated as informational. Relaxed: build (20.x), build (22.x), Plugin Structure Validation.**
940
+ ```
941
+
942
+ If `enabled === false` (flag off, or diff includes non-markdown files), do not emit the note — render the CI Status section unchanged.
833
943
 
834
944
  ---
835
945
 
@@ -886,21 +996,21 @@ echo "Size gate: $total_changes lines changed (threshold: $threshold), pkg_chang
886
996
 
887
997
  Run these checks directly (no sub-agents needed):
888
998
 
889
- ```bash
890
- # Type safety: check for 'any' additions
891
- any_count=$(git diff origin/main...HEAD | grep '^\+' | grep -v '^\+\+\+' | grep -cw 'any' || true)
999
+ **IMPORTANT:** Use the Grep tool (not bash `grep`) for pattern matching: bash `grep` uses BSD regex on macOS, which is incompatible with some of the patterns below. The Grep tool uses ripgrep, which works cross-platform.
892
1000
 
1001
+ ```bash
893
1002
  # Deleted tests check
894
1003
  deleted_tests=$(git diff origin/main...HEAD --name-only --diff-filter=D | grep -cE '\.(test|spec)\.' || true)
895
1004
 
896
1005
  # Scope: files changed count
897
1006
  files_changed=$(git diff origin/main...HEAD --name-only | wc -l | tr -d ' ')
1007
+ ```
898
1008
 
899
- # Security scan (lightweight: just check for obvious patterns in added lines)
900
- security_issues=$(git diff origin/main...HEAD | grep '^\+' | grep -v '^\+\+\+' | grep -ciE 'eval\(|innerHTML|dangerouslySetInnerHTML|exec\(|password.*=.*["']|secret.*=.*["']|api.?key.*=.*["']' || true)
1009
+ For type safety and security scans, use the Grep tool instead of bash:
1010
+ - **Type safety:** `Grep(pattern=":\\s*any[,;)\\]]|as any", path="<changed-files>")` on added lines
1011
+ - **Security scan:** `Grep(pattern="eval\\(|innerHTML|dangerouslySetInnerHTML|password.*=.*[\"']|secret.*=.*[\"']", path="<changed-files>")` on added lines
901
1012
 
902
- echo "Inline checks: any=$any_count, deleted_tests=$deleted_tests, files=$files_changed, security_issues=$security_issues"
903
- ```
1013
+ Count results from the Grep tool output to get `any_count` and `security_issues`.
904
1014
 
905
1015
  **After inline checks, skip to the output template** (the sub-agent section below is not executed).
906
1016
 
@@ -966,6 +1076,12 @@ issue_type="${SEQUANT_ISSUE_TYPE:-}"
966
1076
  admin_modified=$(git diff main...HEAD --name-only | grep -E "^app/admin/" | head -1 || true)
967
1077
  ```
968
1078
 
1079
+ **Add skill sync check if skill files modified:**
1080
+ ```bash
1081
+ skill_modified=$(git diff main...HEAD --name-only | grep -E "^\.(claude/skills|skills|templates/skills)/" | head -1 || true)
1082
+ ```
1083
+ If skill files are modified, the quality-checks.sh script automatically runs the three-directory sync check (section 12). If divergence is detected, this blocks `READY_FOR_MERGE` — verdict becomes `AC_MET_BUT_NOT_A_PLUS` with a note to run `npx tsx scripts/check-skill-sync.ts --fix`.
1084
+
969
1085
  See [quality-gates.md](references/quality-gates.md) for detailed verdict synthesis.
970
1086
 
971
1087
  ### Using MCP Tools (Optional)
@@ -1476,43 +1592,6 @@ Provide a sentence or two explaining why.
1476
1592
 
1477
1593
  Do NOT mark MET based on "the general intent is satisfied." The AC text is the contract — verify it literally.
1478
1594
 
1479
- ### 3a. AC Status Persistence — REQUIRED
1480
-
1481
- **After evaluating each AC item**, update the status in workflow state using the state CLI:
1482
-
1483
- ```bash
1484
- # Step 1: Initialize AC items for the issue (run once, before updating statuses)
1485
- npx tsx scripts/state/update.ts init-ac <issue-number> <ac-count>
1486
-
1487
- # Example: Initialize 4 AC items for issue #250
1488
- npx tsx scripts/state/update.ts init-ac 250 4
1489
- ```
1490
-
1491
- ```bash
1492
- # Step 2: Update each AC item's status
1493
- npx tsx scripts/state/update.ts ac <issue-number> <ac-id> <status> "<notes>"
1494
-
1495
- # Examples:
1496
- npx tsx scripts/state/update.ts ac 250 AC-1 met "Verified: tests pass and feature works"
1497
- npx tsx scripts/state/update.ts ac 250 AC-2 not_met "Missing error handling for edge case"
1498
- npx tsx scripts/state/update.ts ac 250 AC-3 blocked "Waiting on upstream dependency"
1499
- ```
1500
-
1501
- **Status mapping:**
1502
- - `MET` → `met`
1503
- - `PARTIALLY_MET` → `not_met` (with notes explaining what's missing)
1504
- - `NOT_MET` → `not_met`
1505
- - `BLOCKED` → `blocked` (external dependency issue)
1506
-
1507
- **Why this matters:** Updating AC status in state enables:
1508
- - Dashboard shows real-time AC progress per issue
1509
- - Cross-skill tracking of which AC items need work
1510
- - Summary badges show "X/Y met" status
1511
-
1512
- **If issue has no stored AC:**
1513
- - Run `init-ac` first to create the AC items
1514
- - Then update each AC status individually
1515
-
1516
1595
  ### 4. Failure Path & Edge Case Testing (REQUIRED)
1517
1596
 
1518
1597
  Before any READY_FOR_MERGE verdict, complete the adversarial thinking checklist:
@@ -1521,42 +1600,36 @@ Before any READY_FOR_MERGE verdict, complete the adversarial thinking checklist:
1521
1600
  2. **"What assumptions am I making?"** - List and validate key assumptions
1522
1601
  3. **"What's the unhappy path?"** - Test invalid inputs, failed dependencies
1523
1602
  4. **"Did I test the feature's PRIMARY PURPOSE?"** - If it handles errors, trigger an error
1603
+ 5. **"Does the same root-cause pattern exist at sibling sites in this file?"** - The literal repro from the issue body is necessary but not sufficient. After the cited bug is fixed, audit other call sites in the same file (and same function/loop) that share the root-cause pattern. Example: if a destructive operation invalidates a resource that subsequent code depends on, scan for other destructive operations on that resource type in the same function/loop; if a wrong null-check is the bug, scan for the same access pattern elsewhere. **Complementary to Section 5's cross-file sibling-site scan: §4's question is intra-file (other lines/functions in the same file with the same root cause); §5 is cross-file (other files in the codebase with the same vulnerability).**
1524
1604
 
1525
1605
  See [testing-requirements.md](references/testing-requirements.md) for edge case checklists.
1526
1606
 
1527
- ### 5. Adversarial Self-Evaluation (REQUIRED)
1607
+ ### 5. Risk Assessment (REQUIRED)
1528
1608
 
1529
- **Before issuing your verdict**, you MUST complete this adversarial self-evaluation to catch issues that automated quality checks miss.
1530
-
1531
- **Why this matters:** QA automation catches type issues, deleted tests, and scope creep - but misses:
1532
- - Features that don't actually work as expected
1533
- - Tests that pass but don't test the right things
1534
- - Edge cases only apparent when actually using the feature
1535
-
1536
- **Answer these questions honestly:**
1537
- 1. "Did the implementation actually work when I reviewed it, or am I assuming it works?"
1538
- 2. "Do the tests actually test the feature's primary purpose, or just pass?"
1539
- 3. "What's the most likely way this feature could break in production?"
1540
- 4. "Am I giving a positive verdict because the code looks clean, or because I verified it works?"
1541
- 5. "Are there 'design choices' I'm excusing that are actually bad practices?" (e.g., no version pinning, leaking secrets to unnecessary env vars, non-portable shell in example code, no input validation). Would I accept this in a code review from a junior developer?
1609
+ **Before issuing your verdict**, state the implementation risks in 2-3 sentences.
1542
1610
 
1543
1611
  **Include this section in your output:**
1544
1612
 
1545
1613
  ```markdown
1546
- ### Self-Evaluation
1614
+ ### Risk Assessment
1547
1615
 
1548
- - **Verified working:** [Yes/No - did you actually verify the feature works, or assume it does?]
1549
- - **Test efficacy:** [High/Medium/Low - do tests catch the feature breaking?]
1550
- - **Likely failure mode:** [What would most likely break this in production?]
1551
- - **Verdict confidence:** [High/Medium/Low - explain any uncertainty]
1616
+ - **Likely failure mode:** [How would this break in production? Be specific.]
1617
+ - **Not tested:** [What gaps exist in test coverage for these changes?]
1618
+ - **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
1619
+ - **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
1552
1620
  ```
1553
1621
 
1554
- **If any answer reveals concerns:**
1555
- - Factor the concerns into your verdict
1556
- - If significant, change verdict to `AC_NOT_MET` or `AC_MET_BUT_NOT_A_PLUS`
1557
- - Document the concerns in the QA comment
1622
+ **If any field reveals significant concerns**, factor them into your verdict. A serious failure mode with no test coverage should downgrade to `AC_MET_BUT_NOT_A_PLUS` or `AC_NOT_MET`.
1623
+
1624
+ #### Sibling-site Scan (Conditional)
1625
+
1626
+ **When to apply:** Focused AC + a localized fix where the same root-cause pattern likely exists in other files in the codebase (≥3 occurrences of the affected pattern across files — e.g. regex blocks repeated across multiple hook scripts). Intra-file sibling sites are covered by §4 Q5; this scan is the cross-file complement.
1627
+
1628
+ **Before declaring AC met**, scan other files in the codebase for sibling code with the same pattern as the bug being fixed. If sibling sites would exhibit the same root cause but weren't part of the literal AC, surface them in the verdict's `Sibling sites considered:` slot — as expanded scope (only when trivial) or follow-up issue suggestion. **Don't widen scope mid-PR; file a follow-up issue instead.** Sibling sites alone do not produce `NEEDS_VERIFICATION`; that verdict is reserved for external/temporal gates (CI pending, manual-test ACs unexecuted).
1558
1629
 
1559
- **Do NOT skip this self-evaluation.** Honest reflection catches issues that code review cannot.
1630
+ **Scope:** orchestrator/inline-review only. `sequant-qa-checker` sub-agents are not asked to do this scan; the orchestrator owns it during verdict synthesis.
1631
+
1632
+ This operationalizes the principle in `feedback_qa_second_look.md` (structured QA biases positive on clean code; an adversarial re-read of core logic surfaces real gaps). Don't automate via grep — false-positive risk; this is a "look at adjacent files" prompt.
1560
1633
 
1561
1634
  #### Skill Change Review (Conditional)
1562
1635
 
@@ -1567,7 +1640,7 @@ See [testing-requirements.md](references/testing-requirements.md) for edge case
1567
1640
  skills_changed=$(git diff main...HEAD --name-only | grep -E "^\.claude/skills/.*\.md$" | wc -l | xargs || true)
1568
1641
  ```
1569
1642
 
1570
- **If skills_changed > 0, add these adversarial prompts:**
1643
+ **If skills_changed > 0, add these verification prompts:**
1571
1644
 
1572
1645
  | Prompt | Why It Matters |
1573
1646
  |--------|----------------|
@@ -1807,6 +1880,293 @@ fi
1807
1880
 
1808
1881
  ---
1809
1882
 
1883
+ ### 6c. Detection Pattern Verification (REQUIRED for skill regex/grep/awk/jq/sed changes)
1884
+
1885
+ **HARD PRECONDITION (REQUIRED — emit nothing when false):**
1886
+
1887
+ ```bash
1888
+ # Skill markdown files whose DIFF HUNKS (added lines) contain
1889
+ # regex/grep/awk/jq/sed literals. Grepping diff hunks rather than current
1890
+ # file content is load-bearing: all 19 sequant SKILL.md files mention these
1891
+ # tokens in unrelated example code, so a content-grep gate fires for ~100%
1892
+ # of skill-md PRs and the cost-saving intent of #608 / #609 evaporates.
1893
+ pattern_files=$(git diff origin/main...HEAD --name-only | \
1894
+ grep -E '^(\.claude/skills|templates/skills|skills)/.*\.md$' | \
1895
+ while read -r f; do
1896
+ if git diff origin/main...HEAD -- "$f" | grep -E '^\+[^+]' | \
1897
+ grep -qE '\b(grep|awk|jq|sed)\b|/[^/]+/[gim]?'; then
1898
+ echo "$f"
1899
+ fi
1900
+ done || true)
1901
+ ```
1902
+
1903
+ If `pattern_files` is empty, **omit the entire §6c block — including its output template row — from the QA comment.** Do NOT emit "Not Required." Per the #608 signal-to-noise study, every one of §6c's 11 prior emissions said exactly "N/A — no skill regex/grep/awk changes" and produced zero substantive findings. The header recitation itself is the cost (~1,800 tokens / invoke). When the precondition is false, treat §6c as not loaded for this run.
1904
+
1905
+ **When precondition is TRUE:** continue with Steps 1–5 below.
1906
+
1907
+ **When to apply:** Diff modifies skill markdown files (`.claude/skills/**/*.md`, `templates/skills/**/*.md`, `skills/**/*.md`) AND adds or modifies regex literals or `grep`/`awk`/`jq`/`sed` commands inside those files.
1908
+
1909
+ **Purpose:** Prompt-only skill changes (regex/grep/awk/jq inside `SKILL.md`) have **no automated test coverage**. Section 6a (Skill Command Verification) checks command syntax — whether `gh pr checks --json conclusion` is a valid field — but does NOT check whether `awk '/^### AC-[0-9]+/'` actually matches real spec headers. A pattern that is syntactically valid but matches the wrong corpus produces a **silent detection failure** — the worst kind of bug, because the pipeline reports success.
1910
+
1911
+ **Origin (PR #547 / issue #529):** Three such bugs shipped in a single PR before adversarial review surfaced them:
1912
+
1913
+ 1. `jq 'select(contains("SEQUANT_PHASE") and contains("spec"))'` matched 5 unrelated comments and returned a QA comment as "the spec plan"
1914
+ 2. `awk '/^### AC-[0-9]+/'` only matched 3-hash headers, missing `#### AC-N` and `**AC-N:**` (~45% of sampled past specs)
1915
+ 3. The grep regex omitted `**Verify:**` as a prefix — even though the issue body's verbatim motivating example used that exact prefix
1916
+
1917
+ Each was a 30-second diagnostic once piped through real corpus. None showed up in static review of the diff against AC text because every pattern was syntactically valid and matched the AC description in the abstract.
1918
+
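Bug 2 above is easy to reproduce on a small body. The sketch below counts AC headers across the three shapes named in the list (`### AC-N`, `#### AC-N`, `**AC-N:**`). The function name `count_ac_headers` and the widened regex are illustrative assumptions, not the fix that shipped.

```bash
# Illustrative only, not part of sequant.
# $1: a comment body (may span multiple lines).
# Prints how many lines look like an AC header in any of the three shapes.
count_ac_headers() {
  # grep -c exits non-zero when the count is 0, so `|| true` keeps the
  # function usable under `set -e`; the count itself is still printed.
  printf '%s\n' "$1" | grep -cE '^(#{3,4} AC-[0-9]+|\*\*AC-[0-9]+:\*\*)' || true
}
```

On a body containing one header of each shape, this counts three lines, whereas the narrower `^### AC-[0-9]+` pattern would match only the three-hash header.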
1919
+ #### Step 1: Detect Pattern Changes
1920
+
1921
+ ```bash
1922
+ # Skill markdown files modified in this PR
1923
+ skill_md_changed=$(git diff origin/main...HEAD --name-only | \
1924
+ grep -E '^(\.claude/skills|templates/skills|skills)/.*\.md$' || true)
1925
+
1926
+ # Among those, ADDED lines that introduce a grep/awk/jq/sed command or a regex literal
1927
+ pattern_changes=""
1928
+ for f in $skill_md_changed; do
1929
+ added=$(git diff origin/main...HEAD -- "$f" | \
1930
+ awk '/^\+[^+]/ { print substr($0, 2) }' | \
1931
+ grep -E '(\b(grep|awk|jq|sed) [-'\''"]|/\^[^/]+/|\(\?[:=!]|\\b[A-Za-z]+\\b)' || true)
1932
+ if [[ -n "$added" ]]; then
1933
+ pattern_changes="${pattern_changes}\n=== $f ===\n${added}"
1934
+ fi
1935
+ done
1936
+
1937
+ if [[ -z "$pattern_changes" ]]; then
1938
+ echo "No pattern changes detected — verification not required"
1939
+ # Set detection_pattern_status = "Not Required"
1940
+ fi
1941
+ ```
1942
+
1943
+ **Manual review fallback:** Heuristic regex misses some pattern shapes — bare `grep "pattern"` with no flag, multi-line `awk` blocks, complex `sed` programs, regex literals embedded in JSON examples, non-anchored character classes (`[A-Z]+`, `(\d+)`). Even when the script reports zero matches, scan the diff for pattern-shaped additions if any of `grep`/`awk`/`jq`/`sed`/`regex`/`pattern` appear in the diff text.
1944
+
1945
+ #### Step 2: Identify Intended Corpus per Pattern
1946
+
1947
+ For each modified pattern, determine WHERE it is supposed to match. Use the surrounding skill prose (section title, preceding paragraph, the bash variable name) to infer the corpus:
1948
+
1949
+ | Pattern Context | Corpus Source | How to Sample |
1950
+ |-----------------|---------------|---------------|
1951
+ | Spec plan parsing | Past `/spec` comments on GitHub issues | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| contains("SEQUANT_PHASE: spec")) \| .body'` |
1952
+ | Assess action parsing | Past `/assess` comments | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| contains("assess:action=")) \| .body'` |
1953
+ | QA verdict parsing | Past `/qa` review comments | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| startswith("## /qa Review")) \| .body'` |
1954
+ | Issue body extraction | Real issue bodies | `gh issue view <N> --json body` for ≥5 representative issues |
1955
+ | AC checkbox detection | Issue/PR bodies with `- [ ] AC-N` | Sample ≥5 issues from current milestone |
1956
+ | Skill markdown | This repo's own `.claude/skills/**/*.md` | Local files (no API call needed) |
1957
+ | Generic markdown | Repo `*.md` fixtures, `docs/`, examples | Local files |
1958
+
1959
+ **Why `gh api` over `gh search issues`:** `gh search` relies on GitHub's full-text search index, which does not reliably cover HTML-comment markers (`<!-- SEQUANT_PHASE: spec -->`) buried in comment bodies. Query strings containing `:` (e.g. `assess:action`) are also parsed as search qualifiers and return empty. `gh api` returns raw JSON that local `jq` filters can match deterministically against the actual marker text.
1960
+
1961
+ If a pattern's corpus cannot be identified from surrounding prose, that itself is a finding — the skill author needs to document where the pattern is meant to match before it ships.
1962
+
1963
+ #### Step 3: Execute Pattern Against ≥5 Real Samples
1964
+
1965
+ For each detected pattern, sample at least 5 real instances from the identified corpus and run the pattern. Record actual matches vs. AC-claimed expected matches.
1966
+
1967
+ ```bash
1968
+ # Example: verifying a new awk header regex against past /spec comments.
1969
+ # Fetch comment bodies via gh api (full-text search of HTML markers is unreliable).
1970
+ spec_bodies=$(gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate \
1971
+ -q '.[] | select(.body | contains("SEQUANT_PHASE: spec")) | .body | @json' || true)
1972
+ sample_count=0
1973
+ while IFS= read -r body; do
1974
+ [[ -z "$body" ]] && continue
1975
+ matches=$(jq -r . <<< "$body" | awk '/^### AC-[0-9]+/' | wc -l | xargs) # @json keeps each body on one line; decode before matching
1976
+ expected_at_least=1 # AC says pattern should match ≥1 AC header per spec
1977
+ status="Passed"
1978
+ [[ "$matches" -lt "$expected_at_least" ]] && status="Failed"
1979
+ sample_count=$((sample_count + 1))
1980
+ echo "sample $sample_count: matched=$matches expected≥$expected_at_least → $status"
1981
+ done <<< "$spec_bodies"
1982
+ if [[ "$sample_count" -lt 5 ]]; then
1983
+ echo "⚠ Only $sample_count samples available — set detection_pattern_status='Insufficient Samples'"
1984
+ fi
1985
+ ```
1986
+
1987
+ **Output table (REQUIRED):**
1988
+
1989
+ | Pattern | Corpus | Samples | Expected | Actual | Status |
1990
+ |---------|--------|---------|----------|--------|--------|
1991
+ | `awk '/^### AC-[0-9]+/'` | past `/spec` comments | 5 | ≥1 match per | 3/5 had 0 matches | ❌ Failed |
1992
+ | `jq 'select(contains("SEQUANT_PHASE"))'` | issue comments | 5 | exactly 1 spec comment | returned 5 | ❌ Failed |
1993
+
1994
+ #### Step 4: Motivating-Example Fixture Verification (REQUIRED)
1995
+
1996
+ Snippets quoted in the **issue body** as motivating examples or AC verification targets are **mandatory test fixtures**. The new pattern MUST produce the AC-claimed result on each. This is a belt-and-suspenders complement to Step 3: Step 3 catches general corpus drift; Step 4 catches the specific case the AC was written to address.
1997
+
1998
+ **What counts as a motivating-example fixture:**
1999
+
2000
+ - Blockquoted text (lines starting with `>`)
2001
+ - Fenced code blocks (` ``` `) under non-Setup/non-Install headings
2002
+ - Lines prefixed with `**Verify:**`, `**Verbatim:**`, `**Example:**`, `**AC verification:**`
2003
+ - Verbatim spec excerpts referenced in the AC text (e.g. "the issue body's verbatim motivating example used that exact prefix")
2004
+
2005
+ **What is excluded:**
2006
+
2007
+ - Code blocks under headings named `## Setup`, `## Install`, `## Prerequisites`, `## How to install` (these are environment commands, not fixtures)
2008
+ - Generic shell session transcripts unrelated to the pattern under test
2009
+
2010
+ **Extraction:**
2011
+
2012
+ Prefer the Phase 0c precheck output when available — fixture extraction is
2013
+ deterministic and #609 moved it out of the QA prompt:
2014
+
2015
+ ```bash
2016
+ precheck=".sequant/gap-precheck.json"
2017
+ if [[ -f "$precheck" ]] && [[ "$(jq -r .schemaVersion "$precheck" 2>/dev/null)" == "1" ]]; then
2018
+ jq -r '.checks.fixtures.fixtures[] | "[\(.kind)\(if .label then ":" + .label else "" end) line \(.line)] \(.content)"' "$precheck"
2019
+ else
2020
+ # Inline fallback (pre-#609 behavior)
2021
+ issue_body=$(gh issue view <issue-number> --json body -q '.body')
2022
+ echo "$issue_body" | grep -E '^>' || true
2023
+ echo "$issue_body" | awk '
2024
+ /^## (Setup|Install|Prerequisites|How to install)/ { skip=1; next }
2025
+ /^## / && skip { skip=0 }
2026
+ /^```/ { in_block=!in_block; next }
2027
+ in_block && !skip { print }
2028
+ '
2029
+ echo "$issue_body" | grep -E '\*\*(Verify|Verbatim|Example|AC verification|Repro):\*\*' || true
2030
+ fi
2031
+ ```
2032
+
2033
+ **Run the new pattern against each extracted fixture.** For every fixture the AC says should match, the pattern MUST produce a match. **A 0-match result on a verbatim AC fixture is automatically `Failed`.**
2034
+
2035
+ #### Step 5: Detection Pattern Verification Status
2036
+
2037
+ | Status | Meaning |
2038
+ |--------|---------|
2039
+ | **Passed** | All patterns matched their corpora at the AC-claimed rate (≥5 samples each); all motivating-example fixtures matched |
2040
+ | **Failed** | At least one pattern produced 0 matches against input the AC says should match (corpus or fixture) |
2041
+ | **Insufficient Samples** | Corpus exists but fewer than 5 representative instances were available; reviewer must record the actual count |
2042
+ | **Skipped** | Patterns identified but corpus fully unavailable (e.g. `gh` offline / unauth / rate-limited); document the specific reason |
2043
+ | **Not Required** | No pattern changes in skill markdown |
2044
+
2045
+ **CORPUS-UNAVAILABLE RULE:** If `gh` is unauthenticated, offline, or rate-limited, mark `Skipped` with the specific reason. Do **NOT** silently mark `Passed`.
2046
+
2047
+ **SPARSE-CORPUS RULE:** If corpus is reachable but fewer than 5 samples are found (and all matched), mark `Insufficient Samples` with the actual count. Do **NOT** silently mark `Passed` — AC-2 requires ≥5 samples.
2048
+
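The status table and the two rules above can be read as a small decision procedure. The sketch below is one possible reading, assuming `Failed` takes precedence over `Skipped` (a fixture miss can be observed even when the remote corpus is unreachable). The function name and argument layout are hypothetical, and `Not Required` is left out because the precondition handles it before this point.

```bash
# Hypothetical helper, not part of sequant: derive the Step 5 status.
# $1: "yes" if the corpus was reachable, anything else otherwise
# $2: number of real samples the pattern was run against
# $3: number of samples or fixtures where the pattern produced 0 matches
#     on input the AC says should match
detection_pattern_status() {
  if [ "$3" -gt 0 ]; then
    echo "Failed"                 # any miss on expected input fails
  elif [ "$1" != "yes" ]; then
    echo "Skipped"                # corpus unavailable: never "Passed"
  elif [ "$2" -lt 5 ]; then
    echo "Insufficient Samples"   # reachable but sparse corpus
  else
    echo "Passed"
  fi
}
```

For instance, `detection_pattern_status yes 3 0` prints `Insufficient Samples` rather than `Passed`, which is the behavior the sparse-corpus rule asks for.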
2049
+ #### Verdict Gating (STRICTER than Section 6a)
2050
+
2051
+ | Verification Status | Maximum Verdict |
2052
+ |---------------------|-----------------|
2053
+ | Passed | READY_FOR_MERGE |
2054
+ | Insufficient Samples | AC_MET_BUT_NOT_A_PLUS (sparse corpus — record actual count) |
2055
+ | Skipped | AC_MET_BUT_NOT_A_PLUS (note unverified corpus) |
2056
+ | **Failed** | **`AC_NOT_MET`** (silent detection failures are worse than wrong CLI flags — block merge) |
2057
+ | Not Required | READY_FOR_MERGE |
2058
+
2059
+ **Rationale for stricter gate vs Section 6a:** Section 6a catches CLI commands that error out at runtime (loud failures). Detection patterns silently match the wrong thing (quiet failures the pipeline reports as success). The 3 bugs in PR #547 all passed Section 6a-style review and only surfaced when patterns were piped through real input.
2060
+
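The gating table above maps each verification status to a verdict ceiling. A minimal sketch of that lookup, with a hypothetical function name that is not part of sequant:

```bash
# Hypothetical helper, not part of sequant.
# $1: detection pattern verification status, spelled as in the table above.
# Prints the maximum verdict that status allows.
max_verdict_for_detection_status() {
  case "$1" in
    "Passed" | "Not Required") echo "READY_FOR_MERGE" ;;
    "Insufficient Samples" | "Skipped") echo "AC_MET_BUT_NOT_A_PLUS" ;;
    "Failed") echo "AC_NOT_MET" ;;
    *) echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```

This is only a ceiling: other gates in the verdict algorithm may still lower the final verdict further.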
2061
+ **Output Format:**
2062
+
2063
+ ```markdown
2064
+ ### Detection Pattern Verification
2065
+
2066
+ **Skill markdown files with pattern changes:** N
2067
+
2068
+ | Pattern | Corpus | Samples | Expected | Actual | Status |
2069
+ |---------|--------|---------|----------|--------|--------|
2070
+ | `awk '/^### AC-[0-9]+/'` | past `/spec` comments | 5 | ≥1 per | 3/5 = 0 | ❌ Failed |
2071
+ | `grep -E '\*\*Verify:\*\*'` | issue #551 body | 1 (verbatim fixture) | matched | matched | ✅ Passed |
2072
+
2073
+ **Motivating-example fixtures from issue body:** M (all run against the new pattern)
2074
+
2075
+ **Verification Status:** Failed (1 pattern misses ~45% of corpus)
2076
+ ```
2077
+
2078
+ ---
2079
+
2080
+ ### 6d. Adversarial Re-Read (REQUIRED for Standard QA before READY_FOR_MERGE)
2081
+
2082
+ **Purpose:** Catch what the structured pipeline doesn't gate on. Operationalizes `feedback_qa_second_look.md` — structured QA biases positive on clean code; an adversarial re-read of core logic surfaces real gaps.
2083
+
2084
+ **When to apply:** Required for non-Simple-Fix verdicts before issuing `READY_FOR_MERGE`. Omitted entirely for Simple Fix mode (`SMALL_DIFF=true`).
2085
+
2086
+ **How to perform:** Before declaring READY_FOR_MERGE, walk through the diff once more adversarially and surface anything the structured pipeline didn't gate on. In particular: (1) run the implementation against every verbatim motivating-example fixture from the issue body — Phase 0c precheck surfaces these in `.checks.fixtures.fixtures`; if precheck unavailable, extract inline per `feedback_motivating_example_regression.md`; (2) flag any "evidence" claim that is actually a pre-fix bug repro rather than a post-fix validation; (3) inspect process state the pipeline normalizes away (uncommitted work, divergent branches, stashed changes, orchestrator state); (4) cite sibling sites explicitly — §5 (cross-file) and §4 Q5 (intra-file); do not hand-wave with "N/A"; (5) surface any Non-Goals from the issue body that have silently expanded into scope. A bare "No gaps" without specific reasoning fails output verification — name what you scanned, ran, or traced.
2087
+
2088
+ **Status outcomes:** **Clean** = walked the 5 checks above, surfaced no gaps. **Gaps Found** = surfaced gaps that map to recommendations or follow-up issues but no missing AC fixture. **Severe Gap** = surfaced (a) a verbatim motivating-example fixture not run, OR (b) an evidence claim that's actually a bug repro not a validation, OR (c) an AC marked MET on code review alone without the runtime / corpus check the AC's text required.
2089
+
2090
+ **Verdict gating:**
2091
+
2092
+ | Status | Maximum Verdict |
2093
+ |--------|-----------------|
2094
+ | Clean | READY_FOR_MERGE |
2095
+ | Gaps Found | AC_MET_BUT_NOT_A_PLUS |
2096
+ | Severe Gap | AC_NOT_MET |
2097
+
2098
+ **Output Format:**
2099
+
2100
+ ```markdown
2101
+ ### Adversarial Re-Read
2102
+
2103
+ **Findings:** [Concrete enumeration of gaps surfaced, OR "No gaps found because: <specific reason citing what was scanned/run/traced — fixtures consulted, evidence claims audited, process state inspected, sibling sites cited, Non-Goals checked>"]
2104
+
2105
+ **Status:** Clean / Gaps Found / Severe Gap
2106
+ ```
2107
+
2108
+ **Origin:** This section was promoted to a structured 5-sub-prompt table in #582. #608's signal-to-noise study found 9/14 emits surfaced findings but 0 were actioned — the structure produced visibility without action. #609 trims back to a single-paragraph prompt while preserving the safety net and verdict gating.
2109
+
2110
+ ### 6e. Behavior-Rule Survival Check (REQUIRED for behavior-rule ACs)
2111
+
2112
+ **Purpose:** When an AC asserts a behavior rule (e.g. "default becomes X", "always include Y", "never skip Z"), verify the OLD-rule's implementation has been removed from every touchpoint inside the diff blast radius. Catches the #533-class miss where the SKILL.md was updated but the runtime CLI's short-circuit (`BUG_LABELS`/`DOCS_LABELS`) survived. See [behavior-rule-detection.md](../_shared/references/behavior-rule-detection.md).
2113
+
2114
+ **When to apply:** Run on every AC for which the behavior-rule heuristic triggers (>= 2 distinct keywords from `default | always | never | rule | behavior | skip` OR explicit pattern like `always X unless Y`). Skip entirely when no AC triggers across the issue (cheap short-circuit per the reference doc's Performance budget).
2115
+
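The keyword half of the trigger can be approximated with a short counter. This is a rough sketch only: it assumes case-insensitive whole-word matching, which may differ from what `behavior-rule-detector.ts` actually does, and it does not cover the explicit-pattern branch (`always X unless Y`). The function name is hypothetical.

```bash
# Hypothetical approximation, not the sequant detector.
# $1: AC text. Prints how many distinct heuristic keywords it contains.
count_behavior_keywords() {
  local text count kw
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  count=0
  for kw in default always never rule behavior skip; do
    # -w matches whole words only, so "defaults" does not count as "default"
    if printf '%s\n' "$text" | grep -qw "$kw"; then
      count=$((count + 1))
    fi
  done
  printf '%s\n' "$count"
}
```

Under these assumptions an AC would trigger the keyword branch when the printed count is 2 or more, for example "Always skip the build" (two keywords).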
2116
+ **How to perform:**
2117
+
2118
+ ```bash
2119
+ # Per-AC survival check. Run once per behavior-rule AC.
2120
+ QA_AC_ID="AC-1" \
2121
+ QA_AC_TEXT="<verbatim AC description>" \
2122
+ QA_DIFF_PATHS="$(git diff main...HEAD --name-only | tr '\n' '|')" \
2123
+ npx tsx -e '
2124
+ (async () => {
2125
+ const m = await import("./src/lib/heuristics/behavior-rule-detector.ts");
2126
+ const ac = {
2127
+ id: process.env.QA_AC_ID,
2128
+ description: process.env.QA_AC_TEXT,
2129
+ verificationMethod: "manual",
2130
+ status: "pending",
2131
+ };
2132
+ const detection = m.detectBehaviorRule(ac);
2133
+ if (!detection.triggered) { console.log(JSON.stringify({ triggered: false })); return; }
2134
+ const diffPaths = (process.env.QA_DIFF_PATHS || "").split("|").filter(Boolean);
2135
+ const survivors = m.findSurvivingInverseSymbols(ac, process.cwd(), diffPaths);
2136
+ console.log(JSON.stringify({ triggered: true, survivors }));
2137
+ })();
2138
+ '
2139
+ ```
2140
+
2141
+ **Status outcomes:**
2142
+
2143
+ | Status | Criteria |
2144
+ |--------|----------|
2145
+ | **Clean** | Detector triggered, `survivors` is empty across diff blast radius |
2146
+ | **Survivors Found** | One or more inverse symbols / inverse-keyword lines survive in the diff blast radius |
2147
+ | **N/A** | No AC triggers the behavior-rule detector (skip section entirely) |
2148
+
2149
+ **AC marking and verdict gating:**
2150
+
2151
+ - Survival inside the diff blast radius -> the corresponding AC is marked `NOT_MET` with the `path:line` list in the AC explanation (per AC-2 of #552).
2152
+ - Survivors -> verdict floors at `AC_NOT_MET` via the §7 algorithm's `behavior_rule_survival_status` gate.
2153
+
2154
+ **Output Format:**
2155
+
2156
+ ```markdown
2157
+ ### Behavior-Rule Survival Check
2158
+
2159
+ | AC | Triggered? | Survivors | Status |
2160
+ |----|-----------|-----------|--------|
2161
+ | AC-N | Yes | path/to/file.ts:LINE — `<snippet>` | Survivors Found |
2162
+ | AC-M | No | — | N/A |
2163
+
2164
+ **Status:** Clean / Survivors Found / N/A
2165
+ ```
2166
+
2167
+ ---
2168
+
2169
+
1810
2170
  ### 7. A+ Status Verdict
1811
2171
 
1812
2172
  Provide an overall verdict:
@@ -1833,6 +2193,11 @@ Provide an overall verdict:
1833
2193
  - execution_evidence = status from Section 6 (Complete/Incomplete/Waived/Not Required)
1834
2194
  - quality_plan_status = status from Phase 0b (Complete/Partial/Not Addressed/N/A)
1835
2195
  - smoke_test_status = status from Section 6b (Complete/Partial/Not Required)
2196
+ - detection_pattern_status = status from Section 6c (Passed/Failed/Insufficient Samples/Skipped/Not Required)
2197
+ - adversarial_reread_status = status from Section 6d (Clean/Gaps Found/Severe Gap) — REQUIRED for Standard QA, omitted for Simple Fix
2198
+ - behavior_rule_survival_status = status from Section 6e (Clean/Survivors Found/N/A) — REQUIRED when any AC triggers the behavior-rule heuristic, omitted otherwise
2199
+ - changelog_required = true IFF Section 10a's `CHANGELOG.md` exists AND Section 10a's `user_facing` count is >0 (single source of truth — see §10a for the conventional-commit detection regex, which accepts unscoped, scoped, and breaking variants of `feat`/`fix`/`perf`/`refactor`/`docs`); false otherwise
2200
+ - changelog_missing = true IFF `changelog_required` AND Section 10a's `[Unreleased]` entry check finds no entry for the issue/PR; false otherwise
1836
2201
 
1837
2202
  3. Browser testing enforcement check:
1838
2203
  - Check if any .tsx files were changed: git diff main...HEAD --name-only | grep '\.tsx$' || true
@@ -1841,13 +2206,30 @@ Provide an overall verdict:
1841
2206
  - IF .tsx files changed AND /test did NOT run AND no 'no-browser-test' label:
1842
2207
  → Set browser_test_missing = true
1843
2208
 
2209
+ 3a. Manual test AC enforcement check:
2210
+ - Scan spec plan comment for ACs with **Verification:** Manual Test (or freeform: try X confirm Y, verify by, test that)
2211
+ - For each detected manual-test AC:
2212
+ - IF runtime test was executed → AC status from test result (MET/NOT_MET)
2213
+ - IF approved override documented → AC status = MET
2214
+ - ELSE → AC status = PENDING (this increments pending_count)
2215
+ - NOTE: No new verdict branch needed — PENDING manual-test ACs flow through
2216
+ the existing pending_count > 0 → NEEDS_VERIFICATION path in step 4
2217
+
1844
2218
  4. Determine verdict (in order):
1845
2219
  - IF not_met_count > 0 OR partial_count > 0:
1846
2220
  → AC_NOT_MET (block merge)
2221
+ - ELSE IF detection_pattern_status == "Failed":
2222
+ → AC_NOT_MET (silent detection failures - block merge; STRICTER than skill_verification because pattern bugs report success but match the wrong corpus)
2223
+ - ELSE IF behavior_rule_survival_status == "Survivors Found":
2224
+ → AC_NOT_MET (OLD-rule symbol survived inside the diff blast radius — see #533 motivating miss in Section 6e and references/behavior-rule-detection.md)
2225
+ - ELSE IF adversarial_reread_status == "Severe Gap":
2226
+ → AC_NOT_MET (verbatim motivating-example fixture not run / evidence claim is bug reproduction not validation / AC marked MET without runtime or corpus check the AC text required)
1847
2227
  - ELSE IF skill_verification == "Failed":
1848
2228
  → AC_MET_BUT_NOT_A_PLUS (skill commands have issues - cannot be READY_FOR_MERGE)
1849
2229
  - ELSE IF execution_evidence == "Incomplete":
1850
2230
  → AC_MET_BUT_NOT_A_PLUS (scripts not verified - cannot be READY_FOR_MERGE)
2231
+ - ELSE IF changelog_required AND changelog_missing:
2232
+ → AC_MET_BUT_NOT_A_PLUS (CHANGELOG entry required for user-facing changes - see Section 10a for remediation)
1851
2233
  - ELSE IF quality_plan_status == "Not Addressed" AND quality_plan_exists:
1852
2234
  → AC_MET_BUT_NOT_A_PLUS (quality dimensions not addressed - flag for review)
1853
2235
  - ELSE IF browser_test_missing (from step 3):
@@ -1860,6 +2242,12 @@ Provide an overall verdict:
1860
2242
  → AC_MET_BUT_NOT_A_PLUS (some quality dimensions incomplete - can merge with notes)
1861
2243
  - ELSE IF smoke_test_status == "Partial":
1862
2244
  → AC_MET_BUT_NOT_A_PLUS (smoke tests incomplete - document gaps before merge)
2245
+ - ELSE IF detection_pattern_status == "Insufficient Samples":
2246
+ → AC_MET_BUT_NOT_A_PLUS (sparse corpus - record actual sample count)
2247
+ - ELSE IF detection_pattern_status == "Skipped":
2248
+ → AC_MET_BUT_NOT_A_PLUS (corpus unavailable for pattern verification - document reason)
2249
+ - ELSE IF adversarial_reread_status == "Gaps Found":
2250
+ → AC_MET_BUT_NOT_A_PLUS (adversarial re-read surfaced non-blocking gaps; address as follow-up or improvement suggestions)
1863
2251
  - ELSE IF improvement_suggestions.length > 0:
1864
2252
  → AC_MET_BUT_NOT_A_PLUS (can merge with notes)
1865
2253
  - ELSE:
@@ -1895,10 +2283,111 @@ fi
1895
2283
  | `.tsx` changed + no `/test` + no opt-out | Force `AC_MET_BUT_NOT_A_PLUS` |
1896
2284
  | No `.tsx` changed | Normal verdict |
1897
2285
 
2286
+ **Manual Test AC Enforcement:**
2287
+
2288
+ Before finalizing the verdict, check whether any ACs in the `/spec` plan require manual (runtime) verification:
2289
+
2290
+ ```bash
2291
+ # 1. Extract spec plan comment from issue
2292
+ spec_comment=$(gh issue view <issue-number> --json comments --jq \
2293
+ '[.comments[].body | select(contains("\"phase\":\"spec\""))] | last' || true)
2294
+
2295
+ # 2. Detect ACs with manual-test verification methods
2296
+ # Matches: "**Verification:** Manual Test", "**Verify:** ...", "try X, confirm Y", "verify by", "test that"
2297
+ manual_test_acs=$(echo "$spec_comment" | \
2298
+ grep -iE '(\*\*Verification:\*\*\s*Manual Test|\*\*Verify:\*\*\s*|try .*, confirm|verify by|test that|verify:?\s*manual)' || true)
2299
+
2300
+ # 3. Extract AC IDs associated with manual-test lines
2301
+ # Scan forward, remembering the most recent AC-N header, and print it for each manual-test match (IGNORECASE is gawk-only)
2302
+ manual_ac_ids=$(echo "$spec_comment" | \
2303
+ awk 'BEGIN{IGNORECASE=1} /^(#+ AC-[0-9]+|\*\*AC-[0-9]+)/{ac=$0} /Manual Test|\*\*Verify:\*\*|try .*, confirm|verify by|test that/{print ac}' | \
2304
+ grep -oE 'AC-[0-9]+' | sort -u || true)
2305
+ ```
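The AC-ID extraction step can be tried offline against a small sample. The sketch below is an illustration, not the skill's own code: the `spec_comment` text and the `extract_manual_ac_ids` helper are hypothetical, and it lowercases each line with `tolower()` rather than relying on `IGNORECASE`, which only gawk honors.

```shell
#!/usr/bin/env bash
# Hypothetical sample; the real input comes from `gh issue view`.
spec_comment='### AC-1: Form rejects empty rows
**Verification:** Manual Test
### AC-2: Types compile
**Verification:** Unit Test
### AC-3: Toast appears on error
Try submitting, confirm the toast shows'

# Remember the latest AC header, then print it whenever a line
# looks like a manual-test verification method.
extract_manual_ac_ids() {
  printf '%s\n' "$1" | awk '
    /^(#+ AC-[0-9]+|\*\*AC-[0-9]+)/ { ac = $0 }
    {
      line = tolower($0)   # portable case-insensitive match
      if (line ~ /manual test|\*\*verify:\*\*|try .*, confirm|verify by|test that/) print ac
    }' | grep -oE 'AC-[0-9]+' | sort -u
}

extract_manual_ac_ids "$spec_comment"
```

With this sample the helper prints `AC-1` and `AC-3`; `AC-2` is skipped because its verification method is a unit test.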
2306
+
2307
+ **If manual-test ACs are detected**, include this section in QA output:
2308
+
2309
+ ```markdown
2310
+ ### Manual Test ACs Detected
2311
+
2312
+ | AC | Verification Method | Runtime Test Status |
2313
+ |----|--------------------|--------------------|
2314
+ | AC-N | Manual Test | ✅ Executed / ⚠️ PENDING / 🔄 Overridden |
2315
+ ```
2316
+
2317
+ **Enforcement Rules:**
2318
+
2319
+ For each detected manual-test AC, QA must do ONE of:
2320
+
2321
+ 1. **Execute the test** using available tools (chrome-devtools MCP, dev server, CLI invocation) and record pass/fail evidence → mark AC `MET` or `NOT_MET` based on result
2322
+ 2. **Mark AC `PENDING`** with note: `⚠️ Manual verification required — runtime test not executed` → flows through `pending_count > 0 → NEEDS_VERIFICATION` verdict path
2323
+ 3. **Override** with approved justification (see Manual Test Override below) → mark AC `MET`
2324
+
2325
+ **Key Rule:** A manual-test AC CANNOT be marked `MET` from static code review alone. QA must either execute the runtime test, provide an approved override, or mark `PENDING`.
2326
+
2327
+ | Scenario | AC Status | Verdict Impact |
2328
+ |----------|-----------|----------------|
2329
+ | Runtime test executed and passed | `MET` | Normal verdict |
2330
+ | Runtime test executed and failed | `NOT_MET` | → `AC_NOT_MET` |
2331
+ | Runtime test not executed, no override | `PENDING` | → `NEEDS_VERIFICATION` |
2332
+ | Override with approved justification | `MET` | Normal verdict |
2333
+ | Override with unapproved justification | `PENDING` | → `NEEDS_VERIFICATION` |
2334
+
2335
+ ### Manual Test Override
2336
+
2337
+ In some cases, runtime verification can be safely skipped for manual-test ACs when the verification target has no runtime surface or is covered by equivalent automated tests. **Overrides require explicit justification and risk assessment.**
2338
+
2339
+ **Override Format (REQUIRED when skipping manual-test execution):**
2340
+
2341
+ ```markdown
2342
+ ### Manual Test Override
2343
+
2344
+ **AC:** AC-N
2345
+ **Requirement:** Runtime verification for manual-test AC
2346
+ **Override:** Yes
2347
+ **Justification:** [One of the approved categories below]
2348
+ **Risk Assessment:** [None/Low]
2349
+ ```
2350
+
2351
+ **Approved Override Categories:**
2352
+
2353
+ | Category | Example | Risk |
2354
+ |----------|---------|------|
2355
+ | No runtime surface | Pure type definitions, config schema validation | None |
2356
+ | Equivalent unit test coverage | Automated test covers the exact same code path the manual test would exercise | Low |
2357
+ | Tested in sibling issue | Cross-reference to another issue where the same runtime behavior was verified | Low |
2358
+
2359
+ **NOT Approved for Override (always require runtime test):**
2360
+
2361
+ | Category | Example | Why |
2362
+ |----------|---------|-----|
2363
+ | Logic changes with UI surface | Modified form validation, new user flows | Runtime behavior may diverge from code review expectations |
2364
+ | New user-facing features | Added pages, new interactions | Must verify actual user experience |
2365
+ | Integration points | API calls, database writes, auth flows | Runtime dependencies may behave differently |
2366
+ | Error handling with user feedback | Toast messages, error pages, redirects | Presentation layer needs runtime check |
2367
+
2368
+ **Risk Assessment Definitions:**
2369
+
2370
+ | Level | Meaning | Criteria |
2371
+ |-------|---------|----------|
2372
+ | **None** | Zero runtime impact | Change has no executable runtime surface (types, config) |
2373
+ | **Low** | Negligible runtime impact | Automated tests cover the same path; manual test would be redundant |
2374
+ | **Medium** | Possible runtime impact | **Should NOT be overridden** — run the manual test |
2375
+
2376
+ **Override Decision Flow:**
2377
+
2378
+ 1. Check if change matches an approved category → If no, runtime test is required
2379
+ 2. Assess risk level → If Medium or higher, runtime test is required
2380
+ 3. Document override using the format above in the QA output
2381
+ 4. Include override in the GitHub issue comment for audit trail
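The first two steps of the decision flow could be encoded roughly as follows. This is a sketch under assumptions: `override_ac_status` is a hypothetical helper, the category strings are copied from the approved-categories table, and anything unapproved or above Low risk is mapped to `PENDING` on the assumption that no runtime test was run.

```shell
#!/usr/bin/env bash
# $1 = justification category, $2 = risk level (None/Low/Medium)
override_ac_status() {
  # Step 1: the category must be one of the approved ones.
  case "$1" in
    "No runtime surface"|"Equivalent unit test coverage"|"Tested in sibling issue") ;;
    *) echo "PENDING"; return 0 ;;
  esac
  # Step 2: only None or Low risk may be overridden.
  case "$2" in
    None|Low) echo "MET" ;;
    *) echo "PENDING" ;;
  esac
}

override_ac_status "No runtime surface" "None"
override_ac_status "Integration points" "Low"
```

The first call prints `MET`; the second prints `PENDING` because integration points are on the not-approved list regardless of the stated risk.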
2382
+
2383
+ **CRITICAL:** When in doubt, execute the manual test. Overrides are for clear-cut cases only. The motivation for this gate (issue #529) was a real bug that passed QA because `minRows: 1` appeared correct in code review but did not work at runtime.
2384
+
1898
2385
  **CRITICAL:** `PARTIALLY_MET` is NOT sufficient for merge. It MUST be treated as `NOT_MET` for verdict purposes.
1899
2386
 
1900
2387
  **CRITICAL:** If skill command verification = "Failed", verdict CANNOT be `READY_FOR_MERGE`. This prevents shipping skills with broken commands (like issue #178's `conclusion` field).
1901
2388
 
2389
+ **CRITICAL:** If detection pattern verification = "Failed" (Section 6c), verdict MUST be `AC_NOT_MET` — stricter than the skill_verification gate. Silent detection failures (regex/grep/awk/jq matching the wrong corpus while reporting success) are worse than wrong CLI flags because the pipeline reports success.
2390
+
1902
2391
  See [quality-gates.md](references/quality-gates.md) for detailed verdict criteria.
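The precedence of the new blocking branches in step 4 may be easier to see as code. The sketch below models only the leading branches visible in this diff; `verdict_prefix` and the `UNDECIDED` marker are hypothetical names, and the later branches (including the `pending_count` path) are deliberately left out because their position is not shown here.

```shell
#!/usr/bin/env bash
# Reads the step-2 variables from the environment; unset means "not flagged".
verdict_prefix() {
  if [ "${not_met_count:-0}" -gt 0 ] || [ "${partial_count:-0}" -gt 0 ]; then
    echo "AC_NOT_MET"
  elif [ "${detection_pattern_status:-}" = "Failed" ]; then
    echo "AC_NOT_MET"
  elif [ "${behavior_rule_survival_status:-}" = "Survivors Found" ]; then
    echo "AC_NOT_MET"
  elif [ "${adversarial_reread_status:-}" = "Severe Gap" ]; then
    echo "AC_NOT_MET"
  elif [ "${skill_verification:-}" = "Failed" ]; then
    echo "AC_MET_BUT_NOT_A_PLUS"
  elif [ "${execution_evidence:-}" = "Incomplete" ]; then
    echo "AC_MET_BUT_NOT_A_PLUS"
  elif [ "${changelog_required:-false}" = "true" ] && [ "${changelog_missing:-false}" = "true" ]; then
    echo "AC_MET_BUT_NOT_A_PLUS"
  else
    echo "UNDECIDED"   # hypothetical: fall through to branches not modeled here
  fi
}

# A failed detection pattern outranks a failed skill verification.
( detection_pattern_status="Failed"; skill_verification="Failed"; verdict_prefix )
```

The subshell call prints `AC_NOT_MET`, which is the behavior the "stricter than skill_verification" note describes.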
1903
2392
 
1904
2393
  ---
@@ -1963,6 +2452,10 @@ If verdict is `READY_FOR_MERGE` or `AC_MET_BUT_NOT_A_PLUS`:
1963
2452
 
1964
2453
  **Purpose:** Verify user-facing changes have corresponding CHANGELOG entries before `READY_FOR_MERGE`.
1965
2454
 
2455
+ **Wired into §7 verdict algorithm:** This gate is enforced via the `changelog_required AND changelog_missing` branch in §7: when both conditions are true, the verdict is demoted from `READY_FOR_MERGE` to `AC_MET_BUT_NOT_A_PLUS`. The branch is a no-op when `CHANGELOG.md` is absent or no user-facing commit prefix is detected.
2456
+
2457
+ **Caveat — conventional-commit dependency:** Detection requires conventional-commit prefixes — `feat`, `fix`, `perf`, `refactor`, `docs`, with optional scope (`(...)`) and breaking marker (`!`) — in `git log main..HEAD`. Projects whose commits don't follow this pattern silently skip this gate (failsafe-off). Acceptable for sequant's typical user base; document in your project's contributing guide if you rely on this gate.
2458
+
1966
2459
  **Detection:**
1967
2460
 
1968
2461
  ```bash
@@ -1977,7 +2470,7 @@ unreleased_entries=$(sed -n '/^## \[Unreleased\]/,/^## \[/p' CHANGELOG.md | grep
1977
2470
 
1978
2471
  # Determine if change is user-facing (new features, bug fixes, etc.)
1979
2472
  # Look at commit messages or file changes
1980
- user_facing=$(git log main..HEAD --oneline | grep -iE '^[a-f0-9]+ (feat|fix|perf|refactor|docs):' | wc -l | xargs || true)
2473
+ user_facing=$(git log main..HEAD --oneline | grep -iE '^[a-f0-9]+ (feat|fix|perf|refactor|docs)(\([^)]*\))?!?:' | wc -l | xargs || true)
1981
2474
  ```
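To see what the widened pattern accepts, it can be run against a few made-up `git log --oneline` lines. The commit hashes and subjects below are invented for illustration, and `count_user_facing` is a hypothetical wrapper around the same `grep | wc -l | xargs` pipeline.

```shell
#!/usr/bin/env bash
# Counts lines on stdin that look like user-facing conventional commits.
count_user_facing() {
  grep -iE '^[a-f0-9]+ (feat|fix|perf|refactor|docs)(\([^)]*\))?!?:' | wc -l | xargs
}

# Invented sample: unscoped, scoped, scoped+breaking, then two non-matches.
printf '%s\n' \
  'a1b2c3d feat: add merger skill' \
  'b2c3d4e fix(qa): handle empty corpus' \
  'c3d4e5f refactor(cli)!: drop legacy flag' \
  'd4e5f6a chore: bump deps' \
  'e5f6a7b Merge branch main' | count_user_facing
```

This prints `3`. Under the old pattern only the unscoped `feat:` line would have counted, so scoped or breaking commits could leave `user_facing` at 0 and skip the gate.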
1982
2475
 
1983
2476
  **Verification Logic:**
@@ -2125,34 +2618,6 @@ In some cases, `/verify` execution can be safely skipped when script changes are
2125
2618
 
2126
2619
  ---
2127
2620
 
2128
- ## State Tracking
2129
-
2130
- **IMPORTANT:** Update workflow state when running standalone (not orchestrated).
2131
-
2132
- ### State Updates (Standalone Only)
2133
-
2134
- When NOT orchestrated (`SEQUANT_ORCHESTRATOR` is not set):
2135
-
2136
- **At skill start:**
2137
- ```bash
2138
- npx tsx scripts/state/update.ts start <issue-number> qa
2139
- ```
2140
-
2141
- **On successful completion (READY_FOR_MERGE or AC_MET_BUT_NOT_A_PLUS):**
2142
- ```bash
2143
- npx tsx scripts/state/update.ts complete <issue-number> qa
2144
- npx tsx scripts/state/update.ts status <issue-number> ready_for_merge
2145
- ```
2146
-
2147
- **On failure (AC_NOT_MET):**
2148
- ```bash
2149
- npx tsx scripts/state/update.ts fail <issue-number> qa "AC not met"
2150
- ```
2151
-
2152
- **Why this matters:** State tracking enables dashboard visibility, resume capability, and workflow orchestration. Skills update state when standalone; orchestrators handle state when running workflows.
2153
-
2154
- ---
2155
-
2156
2621
  ## Output Verification
2157
2622
 
2158
2623
  **Before responding, verify your output includes ALL of these:**
@@ -2162,14 +2627,15 @@ npx tsx scripts/state/update.ts fail <issue-number> qa "AC not met"
2162
2627
  When the size gate determined `SMALL_DIFF=true`, use the **simplified output template**. The following sections are **omitted** (not marked N/A — completely absent):
2163
2628
 
2164
2629
  - Quality Plan Verification
2165
- - Incremental QA Summary
2166
2630
  - Call-Site Review
2167
2631
  - Product Review
2168
2632
  - Smoke Test
2169
2633
  - CLI Registration Verification
2170
2634
  - Skill Command Verification
2635
+ - Detection Pattern Verification
2171
2636
  - Script Verification Override
2172
2637
  - Skill Change Review
2638
+ - Adversarial Re-Read
2173
2639
 
2174
2640
  **Required sections for simple fix mode:**
2175
2641
 
@@ -2179,14 +2645,15 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
2179
2645
  - [ ] **Code Review Findings** - Strengths, issues, suggestions
2180
2646
  - [ ] **Test Coverage Analysis** - Changed files with/without tests, critical paths flagged
2181
2647
  - [ ] **Anti-Pattern Detection** - Code patterns check (lightweight)
2182
- - [ ] **Self-Evaluation Completed** - Adversarial self-evaluation section included
2648
+ - [ ] **Risk Assessment** - Likely failure mode and coverage gaps stated
2183
2649
  - [ ] **Verdict** - One of: READY_FOR_MERGE, AC_MET_BUT_NOT_A_PLUS, NEEDS_VERIFICATION, AC_NOT_MET
2184
2650
  - [ ] **Documentation Check** - README/docs updated if feature adds new functionality
2185
2651
  - [ ] **Next Steps** - Clear, actionable recommendations
2652
+ - [ ] Adversarial re-read of core logic — list anything the structured pipeline didn't surface
2186
2653
 
2187
2654
  ### Standard QA (Implementation Exists, `SMALL_DIFF=false`)
2188
2655
 
2189
- - [ ] **Self-Evaluation Completed** - Adversarial self-evaluation section included in output
2656
+ - [ ] **Risk Assessment** - Likely failure mode and coverage gaps stated in output
2190
2657
  - [ ] **AC Coverage** - Each AC item marked as MET, PARTIALLY_MET, NOT_MET, PENDING, or N/A
2191
2658
  - [ ] **Quality Plan Verification** - Included if quality plan exists (or marked N/A if no quality plan)
2192
2659
  - [ ] **CI Status** - Included if PR exists (or marked "No PR" / "No CI configured")
@@ -2202,9 +2669,12 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
2202
2669
  - [ ] **Execution Evidence** - Included if scripts/CLI modified (or marked N/A)
2203
2670
  - [ ] **Script Verification Override** - Included if scripts/CLI modified AND /verify was skipped (with justification and risk assessment)
2204
2671
  - [ ] **Skill Command Verification** - Included if `.claude/skills/**/*.md` modified (or marked N/A)
2205
- - [ ] **Skill Change Review** - Skill-specific adversarial prompts included if skills changed
2672
+ - [ ] **Detection Pattern Verification** - Included if skill markdown adds new `grep`/`awk`/`jq`/`sed`/regex (or marked N/A)
2673
+ - [ ] **Skill Change Review** - Skill-specific verification prompts included if skills changed
2206
2674
  - [ ] **Smoke Test** - Included if workflow-affecting changes (skills, scripts, CLI), or marked "Not Required"
2675
+ - [ ] **Manual Test AC Enforcement** - Included if spec plan has Manual Test ACs (or marked N/A if no manual-test ACs detected)
2207
2676
  - [ ] **CHANGELOG Verification** - User-facing changes have `[Unreleased]` entry (or marked N/A)
2677
+ - [ ] **Adversarial Re-Read** - Required structured section: all 5 sub-prompts answered with concrete content; "Findings:" and "Status:" lines populated; bare "No gaps" without specific reasoning fails verification (see Section 6d)
2208
2678
  - [ ] **Documentation Check** - README/docs updated if feature adds new functionality
2209
2679
  - [ ] **Next Steps** - Clear, actionable recommendations
2210
2680
 
@@ -2291,12 +2761,12 @@ When the size gate triggers simple fix mode, use this shorter template:
2291
2761
 
2292
2762
  ---
2293
2763
 
2294
- ### Self-Evaluation
2764
+ ### Risk Assessment
2295
2765
 
2296
- - **Verified working:** [Yes/No]
2297
- - **Test efficacy:** [High/Medium/Low]
2298
- - **Likely failure mode:** [description]
2299
- - **Verdict confidence:** [High/Medium/Low]
2766
+ - **Likely failure mode:** [How would this break in production?]
2767
+ - **Not tested:** [What gaps exist in test coverage?]
2768
+ - **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
2769
+ - **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
2300
2770
 
2301
2771
  ---
2302
2772
 
@@ -2357,18 +2827,17 @@ You MUST include these sections:
2357
2827
 
2358
2828
  ---
2359
2829
 
2360
- ### Incremental QA Summary
2361
-
2362
- [Include if INCREMENTAL_MODE=true from Phase 0c, otherwise: "N/A - First QA run"]
2830
+ ### Precheck Findings
2363
2831
 
2364
- **Last QA:** <timestamp> (commit: <sha-short>)
2365
- **Changes since last QA:** N files
2832
+ [Include if `.sequant/gap-precheck.json` (schemaVersion 1) is present. Otherwise emit one line: "Precheck Findings: unavailable — inline fallback used." and omit the table.]
2366
2833
 
2367
- | Check / AC | Status | Re-run? | Reason |
2368
- |------------|--------|---------|--------|
2369
- | [check/AC] | [status] | Cached / Re-run / Re-evaluated | [reason] |
2834
+ | Section | Status | Surfaced |
2835
+ |---------|--------|----------|
2836
+ | Fixtures | pass / not_applicable / fail | [N fixtures, consumed by §6c Step 4 / §6d Q1] |
2837
+ | Sibling-grep | pass / not_applicable / fail | [N identifiers, candidate sites surfaced to §5] |
2838
+ | AC literal-diff | pass / not_applicable / fail | [IDs missing from PR body, if any] |
2370
2839
 
2371
- **Summary:** X checks cached, Y re-evaluated, Z always-fresh
2840
+ **Source:** `.sequant/gap-precheck.json` (schemaVersion 1)
2372
2841
 
2373
2842
  ---
2374
2843
 
@@ -2554,6 +3023,24 @@ You MUST include these sections:
2554
3023
 
2555
3024
  ---
2556
3025
 
3026
+ ### Detection Pattern Verification
3027
+
3028
+ [Include if skill markdown adds/modifies `grep`/`awk`/`jq`/`sed` commands or regex literals, otherwise: "N/A - No pattern changes in skill markdown"]
3029
+
3030
+ **Skill markdown files with pattern changes:** X
3031
+
3032
+ | Pattern | Corpus | Samples | Expected | Actual | Status |
3033
+ |---------|--------|---------|----------|--------|--------|
3034
+ | `[pattern]` | [corpus source] | [N samples] | [AC-claimed result] | [observed] | ✅ Passed / ❌ Failed |
3035
+
3036
+ **Motivating-example fixtures from issue body:** Y (run against the new pattern)
3037
+
3038
+ **Verification Status:** Passed / Failed / Insufficient Samples / Skipped / Not Required
3039
+
3040
+ **Verdict impact (stricter than 6a):** Failed → `AC_NOT_MET` (silent detection failures block merge)
3041
+
3042
+ ---
3043
+
2557
3044
  ### CLI Registration Verification
2558
3045
 
2559
3046
  [Include if option interfaces or CLI file modified, otherwise: "N/A - No option interface changes"]
@@ -2596,12 +3083,34 @@ You MUST include these sections:
2596
3083
 
2597
3084
  ---
2598
3085
 
2599
- ### Self-Evaluation
3086
+ ### Manual Test ACs
3087
+
3088
+ [Include if spec plan has ACs with **Verification:** Manual Test, otherwise: "N/A - No manual-test ACs detected"]
3089
+
3090
+ | AC | Verification Method | Runtime Test Status | Evidence |
3091
+ |----|--------------------|--------------------|----------|
3092
+ | AC-N | Manual Test | ✅ Executed / ⚠️ PENDING / 🔄 Overridden | [result or override justification] |
3093
+
3094
+ **Manual Test Enforcement:** X/Y manual-test ACs verified at runtime
3095
+
3096
+ [If any overrides applied, include Manual Test Override block per Section 7]
3097
+
3098
+ ---
3099
+
3100
+ ### Risk Assessment
3101
+
3102
+ - **Likely failure mode:** [How would this break in production? Be specific.]
3103
+ - **Not tested:** [What gaps exist in test coverage for these changes?]
3104
+ - **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
3105
+ - **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
3106
+
3107
+ ---
3108
+
3109
+ ### Adversarial Re-Read
3110
+
3111
+ **Findings:** [Concrete enumeration of gaps surfaced, OR "No gaps found because: <specific reason citing what was scanned/run/traced — fixtures consulted, evidence claims audited, process state inspected, sibling sites cited, Non-Goals checked>"]
2600
3112
 
2601
- - **Verified working:** [Yes/No - did you actually verify the feature works?]
2602
- - **Test efficacy:** [High/Medium/Low - do tests catch the feature breaking?]
2603
- - **Likely failure mode:** [What would most likely break this in production?]
2604
- - **Verdict confidence:** [High/Medium/Low - explain any uncertainty]
3113
+ **Status:** Clean / Gaps Found / Severe Gap
2605
3114
 
2606
3115
  ---
2607
3116