sequant 2.1.2 → 2.3.0
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +73 -0
- package/dist/bin/cli.js +95 -9
- package/dist/src/commands/doctor.d.ts +25 -0
- package/dist/src/commands/doctor.js +36 -1
- package/dist/src/commands/init.d.ts +1 -0
- package/dist/src/commands/init.js +118 -0
- package/dist/src/commands/locks.d.ts +67 -0
- package/dist/src/commands/locks.js +290 -0
- package/dist/src/commands/merge.js +11 -0
- package/dist/src/commands/prompt.d.ts +39 -0
- package/dist/src/commands/prompt.js +179 -0
- package/dist/src/commands/run-display.d.ts +26 -0
- package/dist/src/commands/run-display.js +150 -0
- package/dist/src/commands/run-progress.d.ts +32 -0
- package/dist/src/commands/run-progress.js +76 -0
- package/dist/src/commands/run.js +83 -73
- package/dist/src/commands/stats.d.ts +2 -0
- package/dist/src/commands/stats.js +94 -8
- package/dist/src/commands/status.js +27 -1
- package/dist/src/commands/watch.d.ts +16 -0
- package/dist/src/commands/watch.js +147 -0
- package/dist/src/lib/ac-linter.d.ts +1 -1
- package/dist/src/lib/ac-linter.js +81 -0
- package/dist/src/lib/assess-collision-detect.d.ts +91 -0
- package/dist/src/lib/assess-collision-detect.js +217 -0
- package/dist/src/lib/assess-comment-parser.d.ts +59 -1
- package/dist/src/lib/assess-comment-parser.js +124 -2
- package/dist/src/lib/cli-ui/format.d.ts +19 -0
- package/dist/src/lib/cli-ui/format.js +34 -0
- package/dist/src/lib/cli-ui/run-renderer-types.d.ts +181 -0
- package/dist/src/lib/cli-ui/run-renderer-types.js +7 -0
- package/dist/src/lib/cli-ui/run-renderer.d.ts +239 -0
- package/dist/src/lib/cli-ui/run-renderer.js +1173 -0
- package/dist/src/lib/heuristics/behavior-rule-detector.d.ts +94 -0
- package/dist/src/lib/heuristics/behavior-rule-detector.js +467 -0
- package/dist/src/lib/locks/index.d.ts +7 -0
- package/dist/src/lib/locks/index.js +5 -0
- package/dist/src/lib/locks/lock-manager.d.ts +168 -0
- package/dist/src/lib/locks/lock-manager.js +433 -0
- package/dist/src/lib/locks/types.d.ts +59 -0
- package/dist/src/lib/locks/types.js +31 -0
- package/dist/src/lib/qa/markdown-only-ci.d.ts +46 -0
- package/dist/src/lib/qa/markdown-only-ci.js +74 -0
- package/dist/src/lib/relay/activation.d.ts +60 -0
- package/dist/src/lib/relay/activation.js +122 -0
- package/dist/src/lib/relay/archive.d.ts +34 -0
- package/dist/src/lib/relay/archive.js +106 -0
- package/dist/src/lib/relay/frame.d.ts +20 -0
- package/dist/src/lib/relay/frame.js +76 -0
- package/dist/src/lib/relay/index.d.ts +13 -0
- package/dist/src/lib/relay/index.js +13 -0
- package/dist/src/lib/relay/paths.d.ts +43 -0
- package/dist/src/lib/relay/paths.js +59 -0
- package/dist/src/lib/relay/pid.d.ts +34 -0
- package/dist/src/lib/relay/pid.js +72 -0
- package/dist/src/lib/relay/reader.d.ts +35 -0
- package/dist/src/lib/relay/reader.js +115 -0
- package/dist/src/lib/relay/types.d.ts +68 -0
- package/dist/src/lib/relay/types.js +76 -0
- package/dist/src/lib/relay/writer.d.ts +48 -0
- package/dist/src/lib/relay/writer.js +113 -0
- package/dist/src/lib/settings.d.ts +31 -1
- package/dist/src/lib/settings.js +18 -3
- package/dist/src/lib/skill-version.d.ts +19 -0
- package/dist/src/lib/skill-version.js +68 -0
- package/dist/src/lib/templates.d.ts +1 -0
- package/dist/src/lib/templates.js +1 -1
- package/dist/src/lib/version-check.d.ts +60 -5
- package/dist/src/lib/version-check.js +97 -9
- package/dist/src/lib/workflow/batch-executor.d.ts +20 -1
- package/dist/src/lib/workflow/batch-executor.js +249 -176
- package/dist/src/lib/workflow/config-resolver.js +4 -0
- package/dist/src/lib/workflow/heartbeat.d.ts +71 -0
- package/dist/src/lib/workflow/heartbeat.js +194 -0
- package/dist/src/lib/workflow/phase-executor.d.ts +88 -3
- package/dist/src/lib/workflow/phase-executor.js +276 -52
- package/dist/src/lib/workflow/phase-mapper.d.ts +3 -2
- package/dist/src/lib/workflow/phase-mapper.js +17 -20
- package/dist/src/lib/workflow/platforms/github.d.ts +1 -1
- package/dist/src/lib/workflow/platforms/github.js +20 -3
- package/dist/src/lib/workflow/pr-status.d.ts +18 -2
- package/dist/src/lib/workflow/pr-status.js +41 -9
- package/dist/src/lib/workflow/qa-stagnation.d.ts +117 -0
- package/dist/src/lib/workflow/qa-stagnation.js +179 -0
- package/dist/src/lib/workflow/run-orchestrator.d.ts +76 -0
- package/dist/src/lib/workflow/run-orchestrator.js +382 -29
- package/dist/src/lib/workflow/run-reflect.js +1 -1
- package/dist/src/lib/workflow/run-state.d.ts +71 -0
- package/dist/src/lib/workflow/run-state.js +14 -0
- package/dist/src/lib/workflow/state-cleanup.d.ts +13 -5
- package/dist/src/lib/workflow/state-cleanup.js +17 -5
- package/dist/src/lib/workflow/state-manager.d.ts +12 -1
- package/dist/src/lib/workflow/state-manager.js +37 -0
- package/dist/src/lib/workflow/state-schema.d.ts +62 -0
- package/dist/src/lib/workflow/state-schema.js +35 -1
- package/dist/src/lib/workflow/types.d.ts +74 -1
- package/dist/src/lib/workflow/worktree-manager.d.ts +12 -4
- package/dist/src/lib/workflow/worktree-manager.js +76 -17
- package/dist/src/mcp/tools/run.d.ts +44 -0
- package/dist/src/mcp/tools/run.js +104 -13
- package/dist/src/ui/tui/App.d.ts +14 -0
- package/dist/src/ui/tui/App.js +41 -0
- package/dist/src/ui/tui/ElapsedTimer.d.ts +10 -0
- package/dist/src/ui/tui/ElapsedTimer.js +31 -0
- package/dist/src/ui/tui/Header.d.ts +6 -0
- package/dist/src/ui/tui/Header.js +15 -0
- package/dist/src/ui/tui/IssueBox.d.ts +16 -0
- package/dist/src/ui/tui/IssueBox.js +68 -0
- package/dist/src/ui/tui/Spinner.d.ts +9 -0
- package/dist/src/ui/tui/Spinner.js +18 -0
- package/dist/src/ui/tui/index.d.ts +15 -0
- package/dist/src/ui/tui/index.js +29 -0
- package/dist/src/ui/tui/theme.d.ts +29 -0
- package/dist/src/ui/tui/theme.js +52 -0
- package/dist/src/ui/tui/truncate.d.ts +11 -0
- package/dist/src/ui/tui/truncate.js +31 -0
- package/package.json +10 -3
- package/templates/agents/sequant-explorer.md +1 -0
- package/templates/agents/sequant-qa-checker.md +2 -1
- package/templates/agents/sequant-testgen.md +1 -0
- package/templates/hooks/post-tool.sh +11 -0
- package/templates/hooks/pre-tool.sh +18 -9
- package/templates/hooks/relay-check.sh +107 -0
- package/templates/relay/frame.txt +11 -0
- package/templates/scripts/cleanup-worktree.sh +25 -3
- package/templates/scripts/new-feature.sh +6 -0
- package/templates/skills/_shared/references/behavior-rule-detection.md +205 -0
- package/templates/skills/_shared/references/subagent-types.md +21 -8
- package/templates/skills/assess/SKILL.md +261 -94
- package/templates/skills/assess/references/predicted-collision-detection.md +109 -0
- package/templates/skills/docs/SKILL.md +141 -22
- package/templates/skills/exec/SKILL.md +10 -49
- package/templates/skills/fullsolve/SKILL.md +80 -32
- package/templates/skills/loop/SKILL.md +28 -0
- package/templates/skills/merger/SKILL.md +621 -0
- package/templates/skills/qa/SKILL.md +746 -8
- package/templates/skills/qa/scripts/quality-checks.sh +47 -1
- package/templates/skills/setup/SKILL.md +6 -0
- package/templates/skills/spec/SKILL.md +217 -964
- package/templates/skills/spec/references/parallel-groups.md +7 -0
- package/templates/skills/spec/references/quality-checklist.md +75 -0
- package/templates/skills/spec/references/recommended-workflow.md +4 -2
- package/templates/skills/test/SKILL.md +0 -27
- package/templates/skills/testgen/SKILL.md +24 -44
|
@@ -107,9 +107,11 @@ COMMIT_SHA=$(git rev-parse HEAD)
|
|
|
107
107
|
```
|
|
108
108
|
|
|
109
109
|
```markdown
|
|
110
|
-
<!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"<ISO-8601>","commitSHA":"<HEAD-SHA>"} -->
|
|
110
|
+
<!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"<ISO-8601>","commitSHA":"<HEAD-SHA>","verdict":"<READY_FOR_MERGE|AC_MET_BUT_NOT_A_PLUS|NEEDS_VERIFICATION>"} -->
|
|
111
111
|
```
|
|
112
112
|
|
|
113
|
+
**Note:** The `verdict` field is required on `status:"completed"` markers so the Phase 0a short-circuit can surface the prior verdict without re-reading the comment body. Older markers without this field are still accepted — Phase 0a falls back to `(see prior QA comment)`.
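A minimal sketch of how a completed-QA marker carrying the `verdict` field might be assembled in shell. The helper name `build_qa_marker` and its argument order are assumptions made for illustration and are not part of sequant; only the JSON keys and the three verdict values come from the marker shown above.

```bash
# Hypothetical helper (not part of sequant): print a SEQUANT_PHASE marker
# for a completed QA run.
# Arguments: $1 = verdict, $2 = commit SHA, $3 = ISO-8601 timestamp.
build_qa_marker() {
  case "$1" in
    READY_FOR_MERGE|AC_MET_BUT_NOT_A_PLUS|NEEDS_VERIFICATION) ;;
    *) echo "unsupported verdict for a completed marker: $1" >&2; return 1 ;;
  esac
  printf '<!-- SEQUANT_PHASE: {"phase":"qa","status":"completed","timestamp":"%s","commitSHA":"%s","verdict":"%s"} -->\n' \
    "$3" "$2" "$1"
}
```

A call such as `build_qa_marker READY_FOR_MERGE "$(git rev-parse HEAD)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"` would produce one marker line that can be appended to the QA comment body.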
|
|
114
|
+
|
|
113
115
|
If QA determines AC_NOT_MET, emit:
|
|
114
116
|
```markdown
|
|
115
117
|
<!-- SEQUANT_PHASE: {"phase":"qa","status":"failed","timestamp":"<ISO-8601>","error":"AC_NOT_MET","commitSHA":"<HEAD-SHA>"} -->
|
|
@@ -122,9 +124,23 @@ Include this marker in every `gh issue comment` that represents QA completion.
|
|
|
122
124
|
Invocation:
|
|
123
125
|
|
|
124
126
|
- `/qa 123`: Treat `123` as the GitHub issue/PR identifier in context.
|
|
127
|
+
- `/qa 123 172`: Treat both as issue numbers — process each sequentially.
|
|
125
128
|
- `/qa <freeform description>`: Treat the text as context about the change to review.
|
|
126
129
|
- `/qa 123 --parallel`: Force parallel agent execution (faster, higher token usage).
|
|
127
130
|
- `/qa 123 --sequential`: Force sequential agent execution (slower, lower token usage).
|
|
131
|
+
- `/qa 123 --force`: Bypass the prior-QA short-circuit (Phase 0a) and force a full re-run even if the last QA run covers the current commit. `--no-cache` has the same effect.
|
|
132
|
+
|
|
133
|
+
### Multi-Issue Invocation
|
|
134
|
+
|
|
135
|
+
When multiple issue numbers are provided (e.g., `/qa 167 172`):
|
|
136
|
+
|
|
137
|
+
1. **Parse all issue numbers** from args
|
|
138
|
+
2. **Process each issue sequentially** with inline code review — do NOT spawn ad-hoc background agents for the diff reading or AC verification portions
|
|
139
|
+
3. The built-in `sequant-qa-checker` sub-agents (type safety, scope, security) continue to run per the size gate rules for each issue
|
|
140
|
+
4. Each issue gets its own full QA cycle: context fetch → diff review → quality checks → verdict → comment
|
|
141
|
+
5. Post a **separate QA comment** to each issue's GitHub thread
|
|
142
|
+
|
|
143
|
+
**Why sequential with inline review:** Ad-hoc background agents for code review are unreliable — they hallucinate about file existence, misattribute API patterns, and hit permission issues on worktree reads. The narrowly-scoped `sequant-qa-checker` agents work well because they have specific, bounded tasks. The code review portion must stay inline for accuracy.
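Step 1 of the list above (parsing issue numbers out of the invocation args) could be sketched roughly as follows. The function name `parse_issue_numbers` is hypothetical, and treating every purely numeric argument as an issue number is an assumption; the skill itself does not prescribe a parser.

```bash
# Hypothetical helper (not part of sequant): print each issue number found
# in the /qa arguments, one per line. Flags and freeform text are skipped.
parse_issue_numbers() {
  for arg in "$@"; do
    case "$arg" in
      --*) ;;                      # flags such as --force or --parallel
      ''|*[!0-9]*) ;;              # empty or not purely numeric: skip
      *) printf '%s\n' "$arg" ;;   # purely numeric: treat as an issue number
    esac
  done
}
```

A sequential driver can then loop over the result, for example `for issue in $(parse_issue_numbers "$@"); do ...; done`, running one full QA cycle per iteration.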
|
|
128
144
|
|
|
129
145
|
### Agent Execution Mode
|
|
130
146
|
|
|
@@ -487,6 +503,87 @@ fi
|
|
|
487
503
|
|
|
488
504
|
---
|
|
489
505
|
|
|
506
|
+
### Phase 0a: Prior QA Short-Circuit Check
|
|
507
|
+
|
|
508
|
+
**After confirming implementation exists** (Phase 0 passed), check whether a prior QA run already covers the current commit. This avoids re-running the full QA pipeline when nothing has changed.
|
|
509
|
+
|
|
510
|
+
**Skip this check if any of these are true:**
|
|
511
|
+
- `--force` flag is present in the invocation args
|
|
512
|
+
- `--no-cache` flag is present in the invocation args
|
|
513
|
+
- `SEQUANT_ORCHESTRATOR` is set and the orchestrator explicitly requests a fresh run
|
|
514
|
+
|
|
515
|
+
**Detection Logic:**
|
|
516
|
+
|
|
517
|
+
```bash
|
|
518
|
+
# 1. Get current HEAD SHA
|
|
519
|
+
current_sha=$(git rev-parse HEAD)
|
|
520
|
+
|
|
521
|
+
# 2. Fetch the latest qa:completed or qa:failed phase marker from issue comments
|
|
522
|
+
# NOTE: Use `.comments[].body` (NOT `[.comments[].body]`). The array form JSON-encodes
|
|
523
|
+
# each body, escaping internal quotes (`"phase":"qa"` → `\"phase\":\"qa\"`) and `<` →
|
|
524
|
+
# `\u003c`, which defeats the grep pattern below. The streaming form outputs raw bodies.
|
|
525
|
+
latest_qa_marker=$(gh issue view <issue-number> --json comments --jq '.comments[].body' | \
|
|
526
|
+
grep -o '<!-- SEQUANT_PHASE: {[^}]*"phase":"qa"[^}]*} -->' | \
|
|
527
|
+
tail -1 || true)
|
|
528
|
+
|
|
529
|
+
# 3. Extract status, commitSHA, verdict, and timestamp from the marker
|
|
530
|
+
if [[ -n "$latest_qa_marker" ]]; then
|
|
531
|
+
marker_json=$(echo "$latest_qa_marker" | grep -o '{[^}]*}')
|
|
532
|
+
marker_status=$(echo "$marker_json" | jq -r '.status // empty' 2>/dev/null || true)
|
|
533
|
+
marker_sha=$(echo "$marker_json" | jq -r '.commitSHA // empty' 2>/dev/null || true)
|
|
534
|
+
marker_timestamp=$(echo "$marker_json" | jq -r '.timestamp // empty' 2>/dev/null || true)
|
|
535
|
+
marker_verdict=$(echo "$marker_json" | jq -r '.verdict // empty' 2>/dev/null || true)
|
|
536
|
+
fi
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
**Short-Circuit Decision Matrix:**
|
|
540
|
+
|
|
541
|
+
| marker_status | marker_sha == HEAD | Action |
|
|
542
|
+
|---------------|-------------------|--------|
|
|
543
|
+
| `completed` | Yes | **Short-circuit** — skip full QA |
|
|
544
|
+
| `completed` | No | Proceed with full QA (new commits since last run) |
|
|
545
|
+
| `failed` | Yes or No | Proceed with full QA (user likely wants re-run after fix) |
|
|
546
|
+
| (not found) | N/A | Proceed with full QA (no prior run) |
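The matrix above reduces to a single condition, which might be expressed as the sketch below. The function name and the two output labels are assumptions chosen for illustration; the `--force` / `--no-cache` skip conditions are expected to be handled before this point.

```bash
# Hypothetical helper (not part of sequant): apply the short-circuit matrix.
# Arguments: $1 = marker status, $2 = marker commit SHA, $3 = current HEAD SHA.
# Prints "short-circuit" or "full-qa".
qa_short_circuit_decision() {
  if [ "$1" = "completed" ] && [ -n "$2" ] && [ "$2" = "$3" ]; then
    echo "short-circuit"
  else
    # failed markers, SHA mismatches, and missing markers all re-run QA
    echo "full-qa"
  fi
}
```

With the variables from the detection snippet, the call would look like `qa_short_circuit_decision "$marker_status" "$marker_sha" "$current_sha"`.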
|
|
547
|
+
|
|
548
|
+
**When short-circuiting (status=completed, SHA matches):**
|
|
549
|
+
|
|
550
|
+
1. **Skip** sub-agent spawning
|
|
551
|
+
2. **Skip** code review and quality checks
|
|
552
|
+
3. **Output** the short-circuit summary (template below)
|
|
553
|
+
4. **Do NOT** post a new GitHub comment (the prior comment is still valid)
|
|
554
|
+
|
|
555
|
+
**Short-Circuit Output Template:**
|
|
556
|
+
|
|
557
|
+
Populate `**Prior Verdict:**` from `$marker_verdict` when non-empty. When empty (legacy marker without the field), substitute the literal string `(see prior QA comment)`.
|
|
558
|
+
|
|
559
|
+
```markdown
|
|
560
|
+
## QA Review for Issue #<N>
|
|
561
|
+
|
|
562
|
+
### Prior QA Still Valid
|
|
563
|
+
|
|
564
|
+
QA already completed at commit `<SHA>` on <timestamp> — no changes since last run.
|
|
565
|
+
Current HEAD (`<current_sha>`) matches the previously reviewed commit.
|
|
566
|
+
|
|
567
|
+
**Prior Verdict:** <$marker_verdict OR "(see prior QA comment)" if empty>
|
|
568
|
+
|
|
569
|
+
To force a full re-run, use: `/qa <N> --force` or `/qa <N> --no-cache`
|
|
570
|
+
|
|
571
|
+
---
|
|
572
|
+
|
|
573
|
+
*QA short-circuited: prior run at same SHA is still valid*
|
|
574
|
+
```
|
|
575
|
+
|
|
576
|
+
**Verdict field handling:**
|
|
577
|
+
|
|
578
|
+
| `$marker_verdict` | Action |
|
|
579
|
+
|-------------------|--------|
|
|
580
|
+
| Non-empty (new markers) | Emit literally: `**Prior Verdict:** READY_FOR_MERGE` (etc.) |
|
|
581
|
+
| Empty (legacy markers) | Emit: `**Prior Verdict:** (see prior QA comment)` — the prior comment body contains the full verdict |
|
|
582
|
+
|
|
583
|
+
The short-circuit itself still triggers in both cases — only the displayed verdict text differs.
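As an illustration of the verdict fallback described above, the displayed line could be produced like this. The helper name `prior_verdict_line` is hypothetical; the two output strings are taken from the table.

```bash
# Hypothetical helper (not part of sequant): render the "Prior Verdict" line.
# Argument: $1 = verdict extracted from the marker (may be empty for legacy markers).
prior_verdict_line() {
  if [ -n "$1" ]; then
    printf '**Prior Verdict:** %s\n' "$1"
  else
    printf '**Prior Verdict:** (see prior QA comment)\n'
  fi
}
```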
|
|
584
|
+
|
|
585
|
+
---
|
|
586
|
+
|
|
490
587
|
### Phase 0b: Quality Plan Verification (CONDITIONAL)
|
|
491
588
|
|
|
492
589
|
**When to apply:** If issue has a Feature Quality Planning section in comments (from `/spec`).
|
|
@@ -599,6 +696,82 @@ quality_plan_exists=$(gh issue view <issue> --comments --json comments -q '.comm
|
|
|
599
696
|
|
|
600
697
|
---
|
|
601
698
|
|
|
699
|
+
### Phase 0c: Precheck Findings (CONDITIONAL)
|
|
700
|
+
|
|
701
|
+
**Purpose:** Consume deterministic gap-check output from `scripts/qa/precheck.ts`. The script handles three checks that the agent does not need to run itself, and that cost tokens when inlined: verbatim motivating-example fixture extraction, cross-file sibling-grep on changed identifiers, and AC literal-id diff between the issue body and the PR body. Downstream sections (§1 AC Literal Verification, §5 Sibling-site Scan, §6c Step 4 Fixture Verification, §6d Q1 Verbatim Fixtures) consult the precheck output when it is available and fall back to inline logic when it is missing or errored.
|
|
702
|
+
|
|
703
|
+
**Origin:** #609 — extract deterministic gap-checks into a pre-QA gate. Backed by #608's signal-to-noise study (e.g. §6c at 0/11 actioned findings, ~1,800 tokens/invoke).
|
|
704
|
+
|
|
705
|
+
**Run (best-effort, exit code is always 0 even on partial failure):**
|
|
706
|
+
|
|
707
|
+
```bash
|
|
708
|
+
issue=<issue-number>
|
|
709
|
+
npx tsx scripts/qa/precheck.ts --issue "$issue" 2>/dev/null || true
|
|
710
|
+
```
|
|
711
|
+
|
|
712
|
+
The script also auto-detects the PR via `gh pr view --json number` and runs `git diff origin/main...HEAD` for the changed-identifier scan.
|
|
713
|
+
|
|
714
|
+
**Output:** `.sequant/gap-precheck.json` (schemaVersion 1):
|
|
715
|
+
|
|
716
|
+
```json
|
|
717
|
+
{
|
|
718
|
+
"schemaVersion": 1,
|
|
719
|
+
"issue": 609,
|
|
720
|
+
"pr": 999,
|
|
721
|
+
"generatedAt": "...",
|
|
722
|
+
"checks": {
|
|
723
|
+
"fixtures": { "status": "pass | not_applicable | fail", "count": N, "fixtures": [...] },
|
|
724
|
+
"siblingGrep": { "status": "...", "identifiers": [{ "name": "...", "definedIn": "...", "siblingSites": [...] }] },
|
|
725
|
+
"acLiteralDiff": { "status": "...", "issueACs": [...], "prACs": [...], "missingInPR": [...] }
|
|
726
|
+
}
|
|
727
|
+
}
|
|
728
|
+
```
|
|
729
|
+
|
|
730
|
+
**Consumption rules per downstream section:**
|
|
731
|
+
|
|
732
|
+
| Precheck status | Downstream section behavior |
|
|
733
|
+
|-----------------|-----------------------------|
|
|
734
|
+
| `pass` | Use precheck output as primary input; agent judgment evaluates surfaced candidates (e.g. is each sibling-grep hit a real sibling site?) |
|
|
735
|
+
| `not_applicable` | Treat as section N/A; do NOT inline-re-extract |
|
|
736
|
+
| `fail` | Fall back to inline extraction (the section's pre-existing logic) |
|
|
737
|
+
| Precheck JSON missing / malformed | Fall back to inline extraction; note "precheck unavailable" |
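The consumption rules in the table above amount to a four-way dispatch on the check status. A minimal sketch, assuming a hypothetical helper `precheck_action` and made-up action labels (neither exists in sequant):

```bash
# Hypothetical helper (not part of sequant): map a precheck status to the
# behavior a downstream section should take. An empty or unrecognized status
# is treated like a missing or malformed precheck file.
precheck_action() {
  case "$1" in
    pass)           echo "use-precheck" ;;                # precheck output is the primary input
    not_applicable) echo "skip-section" ;;                # section is N/A, no inline re-extraction
    fail)           echo "inline-fallback" ;;             # run the section's pre-existing logic
    *)              echo "inline-fallback-unavailable" ;; # also note "precheck unavailable"
  esac
}
```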
|
|
738
|
+
|
|
739
|
+
**Fallback (precheck JSON missing / malformed):**
|
|
740
|
+
|
|
741
|
+
```bash
|
|
742
|
+
precheck_path=".sequant/gap-precheck.json"
|
|
743
|
+
precheck_ok="no"
|
|
744
|
+
if [[ -f "$precheck_path" ]]; then
|
|
745
|
+
schema=$(jq -r .schemaVersion "$precheck_path" 2>/dev/null || echo "")
|
|
746
|
+
if [[ "$schema" == "1" ]]; then
|
|
747
|
+
precheck_ok="yes"
|
|
748
|
+
fi
|
|
749
|
+
fi
|
|
750
|
+
# When precheck_ok=no, every downstream consumer falls back to its inline path.
|
|
751
|
+
```
|
|
752
|
+
|
|
753
|
+
Do NOT block the QA run on a missing precheck — the script is best-effort. The fallback path is the pre-#609 behavior, which still produces a correct verdict (just at higher token cost).
|
|
754
|
+
|
|
755
|
+
**Output Format:**
|
|
756
|
+
|
|
757
|
+
```markdown
|
|
758
|
+
### Precheck Findings
|
|
759
|
+
|
|
760
|
+
| Section | Status | Surfaced |
|
|
761
|
+
|---------|--------|----------|
|
|
762
|
+
| Fixtures | pass | 3 motivating-example fixtures (consumed by §6c Step 4 / §6d Q1) |
|
|
763
|
+
| Sibling-grep | pass | 5 changed identifiers (candidate sites surfaced to §5) |
|
|
764
|
+
| AC literal-diff | fail | Issue lists AC-3 / AC-4; PR body omits them |
|
|
765
|
+
|
|
766
|
+
**Source:** `.sequant/gap-precheck.json` (schemaVersion 1)
|
|
767
|
+
```
|
|
768
|
+
|
|
769
|
+
If precheck unavailable, omit the table and emit a single line: `**Precheck Findings:** unavailable — inline fallback used.`
|
|
770
|
+
|
|
771
|
+
**Verdict impact:** None directly. Phase 0c is plumbing — the downstream sections own the verdict effect when they consume the surfaced candidates.
|
|
772
|
+
|
|
773
|
+
---
|
|
774
|
+
|
|
602
775
|
### Phase 1: CI Status Check — REQUIRED
|
|
603
776
|
|
|
604
777
|
**Purpose:** Check GitHub CI status before finalizing verdict. CI-dependent AC items (e.g., "Tests pass in CI") should reflect actual CI status, not just local test results.
|
|
@@ -624,11 +797,13 @@ fi
|
|
|
624
797
|
| `FAILURE` | `fail` | `NOT_MET` | Blocks merge |
|
|
625
798
|
| `CANCELLED` | `fail` | `NOT_MET` | Blocks merge |
|
|
626
799
|
| `SKIPPED` | `pass` | `N/A` | No impact |
|
|
627
|
-
| `PENDING` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
|
|
628
|
-
| `QUEUED` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
|
|
629
|
-
| `IN_PROGRESS` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
|
|
800
|
+
| `PENDING` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
|
|
801
|
+
| `QUEUED` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
|
|
802
|
+
| `IN_PROGRESS` | `pending` | `PENDING` * | → `NEEDS_VERIFICATION` * |
|
|
630
803
|
| (empty response) | - | `N/A` | No CI configured |
|
|
631
804
|
|
|
805
|
+
\* Pending checks may be reclassified as `MET` (informational) when the diff is markdown-only and the check name matches `qa.markdownOnlySafeCiPatterns` — see "Markdown-Only Diff Relaxation" below for the gating-vs-relaxed partitioning rules. Failed checks are never relaxed.
|
|
806
|
+
|
|
632
807
|
**CI-Related AC Detection:**
|
|
633
808
|
|
|
634
809
|
Identify AC items that depend on CI by matching these patterns:
|
|
@@ -696,12 +871,72 @@ No CI checks configured for this repository.
|
|
|
696
871
|
**Verdict Integration:**
|
|
697
872
|
|
|
698
873
|
CI status affects the final verdict through the standard verdict algorithm:
|
|
699
|
-
- CI `PENDING` → AC item marked `PENDING` → Verdict: `NEEDS_VERIFICATION`
|
|
874
|
+
- CI `PENDING` (gating) → AC item marked `PENDING` → Verdict: `NEEDS_VERIFICATION`
|
|
875
|
+
- CI `PENDING` (relaxed via the markdown-only block below) → AC item marked `MET` → no impact on verdict
|
|
700
876
|
- CI `failure` → AC item marked `NOT_MET` → Verdict: `AC_NOT_MET`
|
|
701
877
|
- CI `success` → AC item marked `MET` → No additional impact
|
|
702
878
|
- No CI → AC item marked `N/A` → No impact on verdict
|
|
703
879
|
|
|
704
|
-
**Important:** Do NOT give `READY_FOR_MERGE` if any CI check is still pending.
|
|
880
|
+
**Important:** Do NOT give `READY_FOR_MERGE` if any *gating* CI check is still pending. Pending checks are gating by default; the markdown-only relaxation below reclassifies a narrow allowlist as informational. When any gating-pending check remains, the correct verdict is `NEEDS_VERIFICATION` with a note to re-run QA after CI completes.
|
|
881
|
+
|
|
882
|
+
#### Markdown-Only Diff Relaxation
|
|
883
|
+
|
|
884
|
+
**Purpose:** A diff that touches only `.md` files cannot break the build matrix. Forcing `NEEDS_VERIFICATION` until the build matrix reports back wastes wakeup cycles when `typecheck` (already green) has proven the change is structurally inert. This relaxation reclassifies a small, configurable allowlist of pending checks (default: build matrix + Plugin Structure Validation) as informational for markdown-only diffs.
|
|
885
|
+
|
|
886
|
+
**Scope and limits:**
|
|
887
|
+
|
|
888
|
+
- **Only pending checks are relaxed.** Failed checks always gate, regardless of diff type.
|
|
889
|
+
- **Only the configured allowlist is relaxed.** Other pending checks (`validate-skills`, `Hooks Validation`, `validate-plugin`, etc.) still gate.
|
|
890
|
+
- **Build-affecting files disqualify the diff** even if every other change is `.md`. The detector treats `package.json`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `tsconfig*.json`, `*.config.{js,ts,mjs,cjs}`, and `.github/workflows/**` as non-markdown.
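The disqualification rules above can be approximated in shell as shown below. This is a sketch of the documented behavior, not the packaged detector: the function name is hypothetical, matching build-affecting names against the basename is an assumption, and so is treating an empty file list as not markdown-only.

```bash
# Hypothetical sketch (not the packaged detector): read changed paths from
# stdin, one per line. Exit 0 only when every path is a .md file and none is
# build-affecting. An empty list exits 1 (assumption).
is_markdown_only_diff() {
  seen=0
  while IFS= read -r changed; do
    [ -n "$changed" ] || continue
    seen=1
    case "$changed" in
      .github/workflows/*) return 1 ;;   # workflow dir disqualifies, even for .md files
    esac
    # Build-affecting names are listed explicitly to mirror the documented
    # rules; the final fallback branch would reject them anyway.
    case "${changed##*/}" in
      package.json|package-lock.json|yarn.lock|pnpm-lock.yaml) return 1 ;;
      tsconfig*.json) return 1 ;;
      *.config.js|*.config.ts|*.config.mjs|*.config.cjs) return 1 ;;
      *.md) ;;
      *) return 1 ;;
    esac
  done
  [ "$seen" -eq 1 ]
}
```

It could be fed from the changed-file list, for example `git diff origin/main...HEAD --name-only | is_markdown_only_diff`.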
|
|
891
|
+
|
|
892
|
+
**Settings (`.sequant/settings.json`, `qa` section):**
|
|
893
|
+
|
|
894
|
+
| Key | Default | Effect |
|
|
895
|
+
|-----|---------|--------|
|
|
896
|
+
| `markdownOnlyCiRelaxed` | `true` | Master switch. Set to `false` to restore strict gating for paranoid projects. |
|
|
897
|
+
| `markdownOnlySafeCiPatterns` | `["build (*)", "Plugin Structure Validation"]` | Glob patterns (single `*` wildcard) for CI check names that are safe to ignore when pending on a markdown-only diff. Override per project to match local CI step names. |
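To make the pattern semantics concrete, here is a rough shell sketch of matching check names against the allowlist and splitting them into relaxed and gating buckets. Both function names are hypothetical, and the sketch assumes each pattern contains at most one `*`; the packaged helper may behave differently.

```bash
# Hypothetical matcher (not part of sequant): does check name $1 match pattern $2?
# Assumption: a pattern holds at most one "*", which matches any run of characters.
matches_ci_pattern() {
  name=$1
  pattern=$2
  case "$pattern" in
    *'*'*)
      prefix=${pattern%%\**}   # text before the "*"
      suffix=${pattern#*\*}    # text after the "*"
      rest=${name#"$prefix"}   # name with the literal prefix removed, if present
      [ "$rest" != "$name" ] || [ -z "$prefix" ] || return 1
      case "$rest" in
        *"$suffix") return 0 ;;
        *) return 1 ;;
      esac
      ;;
    *) [ "$name" = "$pattern" ] ;;
  esac
}

# Read pending check names from stdin (one per line); patterns are the arguments.
# Prints "relaxed<TAB>name" or "gating<TAB>name" for each check.
partition_pending() {
  while IFS= read -r check; do
    [ -n "$check" ] || continue
    bucket=gating
    for pat in "$@"; do
      if matches_ci_pattern "$check" "$pat"; then
        bucket=relaxed
        break
      fi
    done
    printf '%s\t%s\n' "$bucket" "$check"
  done
}
```

With the default patterns, `build (20.x)` and `Plugin Structure Validation` would land in the relaxed bucket while `validate-skills` stays gating.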
|
|
898
|
+
|
|
899
|
+
**Procedure:**
|
|
900
|
+
|
|
901
|
+
```bash
|
|
902
|
+
# 1. Read settings — silently fall back to defaults on parse failure.
|
|
903
|
+
relaxed_enabled=$(cat .sequant/settings.json 2>/dev/null \
|
|
904
|
+
| grep -o '"markdownOnlyCiRelaxed"[[:space:]]*:[[:space:]]*\(true\|false\)' \
|
|
905
|
+
| grep -o 'true\|false' || echo "true")
|
|
906
|
+
|
|
907
|
+
# 2. Compute changed-file list against origin/main.
|
|
908
|
+
changed_files=$(git diff origin/main...HEAD --name-only || true)
|
|
909
|
+
|
|
910
|
+
# 3. Run the helpers (single npx call returns the gating-pending bucket).
|
|
911
|
+
result=$(SEQUANT_QA_RELAX_FILES="$changed_files" SEQUANT_QA_RELAX_PENDING="$pending_check_names" \
|
|
912
|
+
npx tsx -e '
|
|
913
|
+
(async () => {
|
|
914
|
+
const m = await import("./src/lib/qa/markdown-only-ci.ts");
|
|
915
|
+
const { getSettings } = await import("./src/lib/settings.ts");
|
|
916
|
+
const files = (process.env.SEQUANT_QA_RELAX_FILES || "").split("\n").filter(Boolean);
|
|
917
|
+
const pending = (process.env.SEQUANT_QA_RELAX_PENDING || "").split("\n").filter(Boolean);
|
|
918
|
+
const settings = await getSettings();
|
|
919
|
+
const isMdOnly = m.detectMarkdownOnlyDiff(files);
|
|
920
|
+
const enabled = settings.qa.markdownOnlyCiRelaxed && isMdOnly;
|
|
921
|
+
const buckets = enabled
|
|
922
|
+
? m.filterRelaxablePending(pending, settings.qa.markdownOnlySafeCiPatterns)
|
|
923
|
+
: { relaxed: [], gating: pending };
|
|
924
|
+
console.log(JSON.stringify({ isMdOnly, enabled, ...buckets }));
|
|
925
|
+
})();
|
|
926
|
+
' 2>/dev/null || echo '{"isMdOnly":false,"enabled":false,"relaxed":[],"gating":[]}')
|
|
927
|
+
```
|
|
928
|
+
|
|
929
|
+
When `enabled === true` (markdown-only diff AND `markdownOnlyCiRelaxed` is `true`), use `gating` for the verdict's `pending_count`; the `relaxed` list is informational and rendered in the output.
|
|
930
|
+
|
|
931
|
+
**Output transparency (REQUIRED when relaxation triggers):**
|
|
932
|
+
|
|
933
|
+
When `enabled === true` AND `relaxed.length > 0`, the `### CI Status` section MUST include this labeled note immediately after the CI summary line, listing the relaxed check names verbatim:
|
|
934
|
+
|
|
935
|
+
```markdown
|
|
936
|
+
**Markdown-only diff detected — pending build-matrix checks treated as informational. Relaxed: build (20.x), build (22.x), Plugin Structure Validation.**
|
|
937
|
+
```
|
|
938
|
+
|
|
939
|
+
If `enabled === false` (flag off, or diff includes non-markdown files), do not emit the note — render the CI Status section unchanged.
|
|
705
940
|
|
|
706
941
|
---
|
|
707
942
|
|
|
@@ -838,6 +1073,12 @@ issue_type="${SEQUANT_ISSUE_TYPE:-}"
|
|
|
838
1073
|
admin_modified=$(git diff main...HEAD --name-only | grep -E "^app/admin/" | head -1 || true)
|
|
839
1074
|
```
|
|
840
1075
|
|
|
1076
|
+
**Add a skill sync check if skill files are modified:**
|
|
1077
|
+
```bash
|
|
1078
|
+
skill_modified=$(git diff main...HEAD --name-only | grep -E "^(\.claude/skills|skills|templates/skills)/" | head -1 || true)
|
|
1079
|
+
```
|
|
1080
|
+
If skill files are modified, the quality-checks.sh script automatically runs the three-directory sync check (section 12). If divergence is detected, this blocks `READY_FOR_MERGE` — verdict becomes `AC_MET_BUT_NOT_A_PLUS` with a note to run `npx tsx scripts/check-skill-sync.ts --fix`.
|
|
1081
|
+
|
|
841
1082
|
See [quality-gates.md](references/quality-gates.md) for detailed verdict synthesis.
|
|
842
1083
|
|
|
843
1084
|
### Using MCP Tools (Optional)
|
|
@@ -1356,10 +1597,11 @@ Before any READY_FOR_MERGE verdict, complete the adversarial thinking checklist:
|
|
|
1356
1597
|
2. **"What assumptions am I making?"** - List and validate key assumptions
|
|
1357
1598
|
3. **"What's the unhappy path?"** - Test invalid inputs, failed dependencies
|
|
1358
1599
|
4. **"Did I test the feature's PRIMARY PURPOSE?"** - If it handles errors, trigger an error
|
|
1600
|
+
5. **"Does the same root-cause pattern exist at sibling sites in this file?"** - The literal repro from the issue body is necessary but not sufficient. After the cited bug is fixed, audit other call sites in the same file (and same function/loop) that share the root-cause pattern. Example: if a destructive operation invalidates a resource that subsequent code depends on, scan for other destructive operations on that resource type in the same function/loop; if a wrong null-check is the bug, scan for the same access pattern elsewhere. **Complementary to Section 5's cross-file sibling-site scan: §4's question is intra-file (other lines/functions in the same file with the same root cause); §5 is cross-file (other files in the codebase with the same vulnerability).**
|
|
1359
1601
|
|
|
1360
1602
|
See [testing-requirements.md](references/testing-requirements.md) for edge case checklists.
|
|
1361
1603
|
|
|
1362
|
-
### 5. Risk Assessment (REQUIRED
|
|
1604
|
+
### 5. Risk Assessment (REQUIRED)
|
|
1363
1605
|
|
|
1364
1606
|
**Before issuing your verdict**, state the implementation risks in 2-3 sentences.
|
|
1365
1607
|
|
|
@@ -1370,10 +1612,22 @@ See [testing-requirements.md](references/testing-requirements.md) for edge case
|
|
|
1370
1612
|
|
|
1371
1613
|
- **Likely failure mode:** [How would this break in production? Be specific.]
|
|
1372
1614
|
- **Not tested:** [What gaps exist in test coverage for these changes?]
|
|
1615
|
+
- **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
|
|
1616
|
+
- **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
|
|
1373
1617
|
```
|
|
1374
1618
|
|
|
1375
1619
|
**If any field reveals significant concerns**, factor them into your verdict. A serious failure mode with no test coverage should downgrade to `AC_MET_BUT_NOT_A_PLUS` or `AC_NOT_MET`.
|
|
1376
1620
|
|
|
1621
|
+
#### Sibling-site Scan (Conditional)
|
|
1622
|
+
|
|
1623
|
+
**When to apply:** Focused AC + a localized fix where the same root-cause pattern likely exists in other files in the codebase (≥3 occurrences of the affected pattern across files — e.g. regex blocks repeated across multiple hook scripts). Intra-file sibling sites are covered by §4 Q5; this scan is the cross-file complement.
|
|
1624
|
+
|
|
1625
|
+
**Before declaring AC met**, scan other files in the codebase for sibling code with the same pattern as the bug being fixed. If sibling sites would exhibit the same root cause but weren't part of the literal AC, surface them in the verdict's `Sibling sites considered:` slot — as expanded scope (only when trivial) or follow-up issue suggestion. **Don't widen scope mid-PR; file a follow-up issue instead.** Sibling sites alone do not produce `NEEDS_VERIFICATION`; that verdict is reserved for external/temporal gates (CI pending, manual-test ACs unexecuted).
|
|
1626
|
+
|
|
1627
|
+
**Scope:** orchestrator/inline-review only — `sequant-qa-checker` sub-agents are not asked to do this scan; the orchestrator owns it during verdict synthesis.
|
|
1628
|
+
|
|
1629
|
+
This operationalizes the principle in `feedback_qa_second_look.md` (structured QA biases positive on clean code; an adversarial re-read of core logic surfaces real gaps). Don't automate via grep — false-positive risk; this is a "look at adjacent files" prompt.
|
|
1630
|
+
|
|
1377
1631
|
#### Skill Change Review (Conditional)
|
|
1378
1632
|
|
|
1379
1633
|
**When to apply:** `.claude/skills/**/*.md` files were modified.
|
|
@@ -1623,6 +1877,293 @@ fi
|
|
|
1623
1877
|
|
|
1624
1878
|
---
|
|
1625
1879
|
|
|
1880
|
+
### 6c. Detection Pattern Verification (REQUIRED for skill regex/grep/awk/jq/sed changes)
|
|
1881
|
+
|
|
1882
|
+
**HARD PRECONDITION (REQUIRED — emit nothing when false):**
|
|
1883
|
+
|
|
1884
|
+
```bash
# Skill markdown files whose DIFF HUNKS (added lines) contain
# regex/grep/awk/jq/sed literals. Grepping diff hunks rather than current
# file content is load-bearing: all 19 sequant SKILL.md files mention these
# tokens in unrelated example code, so a content-grep gate fires for ~100%
# of skill-md PRs and the cost-saving intent of #608 / #609 evaporates.
pattern_files=$(git diff origin/main...HEAD --name-only | \
  grep -E '^(\.claude/skills|templates/skills|skills)/.*\.md$' | \
  while read -r f; do
    if git diff origin/main...HEAD -- "$f" | grep -E '^\+[^+]' | \
       grep -qE '\b(grep|awk|jq|sed)\b|/[^/]+/[gim]?'; then
      echo "$f"
    fi
  done || true)
```
|
|
1899
|
+
|
|
1900
|
+
If `pattern_files` is empty, **omit the entire §6c block — including its output template row — from the QA comment.** Do NOT emit "Not Required." Per the #608 signal-to-noise study, every one of §6c's 11 prior emissions said exactly "N/A — no skill regex/grep/awk changes" and produced zero substantive findings. The header recitation itself is the cost (~1,800 tokens / invoke). When the precondition is false, treat §6c as not loaded for this run.
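
As a rough illustration of why the gate inspects added diff lines rather than file content, the sketch below applies the same idea to diff text on stdin. The function name `added_lines_mention_pattern_tool` and both sample diffs are hypothetical. The word-boundary check uses POSIX character classes instead of `\b`, and the regex-literal branch is left out, so this approximates the precondition above rather than reproducing it.

```bash
# Hypothetical helper: exit 0 when any ADDED line of a unified diff (stdin)
# mentions grep/awk/jq/sed as a whole word. Context lines and +++ headers are ignored.
added_lines_mention_pattern_tool() {
  grep -E '^\+[^+]' |
    grep -qE '(^|[^[:alnum:]_])(grep|awk|jq|sed)([^[:alnum:]_]|$)'
}

# Illustrative diff text: the tool name appears only on a context line.
context_only='@@ -1,2 +1,3 @@
 example: grep -E "^AC-" spec.md
+Reworded the introduction paragraph.'

# Illustrative diff text: the tool name appears on an added line.
added_literal='@@ -1,2 +1,3 @@
 intro text
+matches=$(awk "/^### AC-[0-9]+/" spec.md)'

if printf '%s\n' "$context_only" | added_lines_mention_pattern_tool; then
  echo "context_only: gate fires"
else
  echo "context_only: gate stays silent"
fi

if printf '%s\n' "$added_literal" | added_lines_mention_pattern_tool; then
  echo "added_literal: gate fires"
else
  echo "added_literal: gate stays silent"
fi
```

A content-based grep would fire on both samples, which is the false-positive behaviour the precondition is meant to avoid.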
|
|
1901
|
+
|
|
1902
|
+
**When precondition is TRUE:** continue with Steps 1–5 below.
|
|
1903
|
+
|
|
1904
|
+
**When to apply:** Diff modifies skill markdown files (`.claude/skills/**/*.md`, `templates/skills/**/*.md`, `skills/**/*.md`) AND adds or modifies regex literals or `grep`/`awk`/`jq`/`sed` commands inside those files.
|
|
1905
|
+
|
|
1906
|
+
**Purpose:** Prompt-only skill changes (regex/grep/awk/jq/sed inside `SKILL.md`) have **no automated test coverage**. Section 6a (Skill Command Verification) checks command syntax, for example whether `gh pr checks --json conclusion` names a valid field, but does NOT check whether `awk '/^### AC-[0-9]+/'` actually matches real spec headers. A pattern that is syntactically valid but matches the wrong corpus produces a **silent detection failure**, the worst kind of bug because the pipeline reports success.
|
|
1907
|
+
|
|
1908
|
+
**Origin (PR #547 / issue #529):** Three such bugs shipped in a single PR before adversarial review surfaced them:
|
|
1909
|
+
|
|
1910
|
+
1. `jq 'select(contains("SEQUANT_PHASE") and contains("spec"))'` matched 5 unrelated comments and returned a QA comment as "the spec plan"
|
|
1911
|
+
2. `awk '/^### AC-[0-9]+/'` only matched 3-hash headers, missing `#### AC-N` and `**AC-N:**` (~45% of sampled past specs)
|
|
1912
|
+
3. The grep regex omitted `**Verify:**` as a prefix — even though the issue body's verbatim motivating example used that exact prefix
|
|
1913
|
+
|
|
1914
|
+
Each took about 30 seconds to diagnose once the pattern was piped through a real corpus. None showed up in static review of the diff against the AC text, because every pattern was syntactically valid and matched the AC description in the abstract.
|
|
1915
|
+
|
|
1916
|
+
#### Step 1: Detect Pattern Changes
|
|
1917
|
+
|
|
1918
|
+
```bash
# Skill markdown files modified in this PR
skill_md_changed=$(git diff origin/main...HEAD --name-only | \
  grep -E '^(\.claude/skills|templates/skills|skills)/.*\.md$' || true)

# Among those, ADDED lines that introduce a grep/awk/jq/sed command or a regex literal
pattern_changes=""
for f in $skill_md_changed; do
  # The bracket expression lists -, ' and " literally; \x27 is not a hex escape
  # inside a POSIX bracket expression, so the single quote is spliced in with '\''.
  added=$(git diff origin/main...HEAD -- "$f" | \
    awk '/^\+[^+]/ { print substr($0, 2) }' | \
    grep -E '(\b(grep|awk|jq|sed) [-'\''"]|/\^[^/]+/|\(\?[:=!]|\\b[A-Za-z]+\\b)' || true)
  if [[ -n "$added" ]]; then
    # $'\n' is a real newline; "\n" inside double quotes would stay a literal backslash-n
    pattern_changes+=$'\n'"=== $f ==="$'\n'"${added}"
  fi
done

if [[ -z "$pattern_changes" ]]; then
  echo "No pattern changes detected; verification not required"
  # Set detection_pattern_status = "Not Required"
fi
```
|
|
1939
|
+
|
|
1940
|
+
**Manual review fallback:** Heuristic regex misses some pattern shapes — bare `grep "pattern"` with no flag, multi-line `awk` blocks, complex `sed` programs, regex literals embedded in JSON examples, non-anchored character classes (`[A-Z]+`, `(\d+)`). Even when the script reports zero matches, scan the diff for pattern-shaped additions if any of `grep`/`awk`/`jq`/`sed`/`regex`/`pattern` appear in the diff text.
|
|
1941
|
+
|
|
1942
|
+
#### Step 2: Identify Intended Corpus per Pattern
|
|
1943
|
+
|
|
1944
|
+
For each modified pattern, determine WHERE it is supposed to match. Use the surrounding skill prose (section title, preceding paragraph, the bash variable name) to infer the corpus:
|
|
1945
|
+
|
|
1946
|
+
| Pattern Context | Corpus Source | How to Sample |
|
|
1947
|
+
|-----------------|---------------|---------------|
|
|
1948
|
+
| Spec plan parsing | Past `/spec` comments on GitHub issues | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| contains("SEQUANT_PHASE: spec")) \| .body'` |
|
|
1949
|
+
| Assess action parsing | Past `/assess` comments | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| contains("assess:action=")) \| .body'` |
|
|
1950
|
+
| QA verdict parsing | Past `/qa` review comments | `gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate -q '.[] \| select(.body \| startswith("## /qa Review")) \| .body'` |
|
|
1951
|
+
| Issue body extraction | Real issue bodies | `gh issue view <N> --json body` for ≥5 representative issues |
|
|
1952
|
+
| AC checkbox detection | Issue/PR bodies with `- [ ] AC-N` | Sample ≥5 issues from current milestone |
|
|
1953
|
+
| Skill markdown | This repo's own `.claude/skills/**/*.md` | Local files (no API call needed) |
|
|
1954
|
+
| Generic markdown | Repo `*.md` fixtures, `docs/`, examples | Local files |
|
|
1955
|
+
|
|
1956
|
+
**Why `gh api` over `gh search issues`:** `gh search` relies on GitHub's full-text search index, which does not reliably cover HTML-comment markers (`<!-- SEQUANT_PHASE: spec -->`) buried in comment bodies. Query strings containing `:` (e.g. `assess:action`) are also parsed as search qualifiers and return empty. `gh api` returns raw JSON that local `jq` filters can match deterministically against the actual marker text.
|
|
1957
|
+
|
|
1958
|
+
If a pattern's corpus cannot be identified from surrounding prose, that itself is a finding — the skill author needs to document where the pattern is meant to match before it ships.
|
|
1959
|
+
|
|
1960
|
+
#### Step 3: Execute Pattern Against ≥5 Real Samples
|
|
1961
|
+
|
|
1962
|
+
For each detected pattern, sample at least 5 real instances from the identified corpus and run the pattern. Record actual matches vs. AC-claimed expected matches.
|
|
1963
|
+
|
|
1964
|
+
```bash
# Example: verifying a new awk header regex against past /spec comments.
# Fetch comment bodies via gh api (full-text search of HTML markers is unreliable).
# Comment bodies are multi-line, so each one is emitted as a single-line JSON
# string (tojson) and decoded per sample. Reading the raw bodies line by line
# would count lines, not comments.
spec_bodies=$(gh api 'repos/{owner}/{repo}/issues/comments?per_page=100' --paginate \
  -q '.[] | select(.body | contains("SEQUANT_PHASE: spec")) | .body | tojson' || true)
sample_count=0
while IFS= read -r encoded; do
  [[ -z "$encoded" ]] && continue
  body=$(jq -r '.' <<< "$encoded")
  matches=$(echo "$body" | awk '/^### AC-[0-9]+/' | wc -l | xargs)
  expected_at_least=1  # AC says pattern should match ≥1 AC header per spec
  status="Passed"
  [[ "$matches" -lt "$expected_at_least" ]] && status="Failed"
  sample_count=$((sample_count + 1))
  echo "sample $sample_count: matched=$matches expected≥$expected_at_least → $status"
done <<< "$spec_bodies"
if [[ "$sample_count" -lt 5 ]]; then
  echo "⚠ Only $sample_count samples available; set detection_pattern_status='Insufficient Samples'"
fi
```
|
|
1983
|
+
|
|
1984
|
+
**Output table (REQUIRED):**
|
|
1985
|
+
|
|
1986
|
+
| Pattern | Corpus | Samples | Expected | Actual | Status |
|
|
1987
|
+
|---------|--------|---------|----------|--------|--------|
|
|
1988
|
+
| `awk '/^### AC-[0-9]+/'` | past `/spec` comments | 5 | ≥1 match per | 3/5 had 0 matches | ❌ Failed |
|
|
1989
|
+
| `jq 'select(contains("SEQUANT_PHASE"))'` | issue comments | 5 | exactly 1 spec comment | returned 5 | ❌ Failed |
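
To make the Step 3 comparison concrete, here is a minimal sketch that runs two candidate header patterns over one made-up spec sample and prints the match counts. The helper name `count_matching_lines`, the sample text, and the `broad` pattern are all assumptions for illustration; the broad pattern is not claimed to be the fix that shipped.

```bash
# Hypothetical helper: count lines of one sample (stdin) matching an ERE pattern.
count_matching_lines() {
  grep -cE "$1" || true   # grep -c prints 0 and exits 1 when nothing matches
}

# Made-up sample mixing the three header shapes named in the origin notes.
sample_spec='## Plan
#### AC-1: parse four-hash headers
**AC-2:** parse bold headers
### AC-3: parse three-hash headers'

narrow='^### AC-[0-9]+'
broad='^(#{3,4} AC-[0-9]+|\*\*AC-[0-9]+:\*\*)'

printf 'narrow=%s broad=%s\n' \
  "$(printf '%s\n' "$sample_spec" | count_matching_lines "$narrow")" \
  "$(printf '%s\n' "$sample_spec" | count_matching_lines "$broad")"
# prints: narrow=1 broad=3
```

Recording both counts per sample gives the Expected and Actual columns of the output table directly.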
|
|
1990
|
+
|
|
1991
|
+
#### Step 4: Motivating-Example Fixture Verification (REQUIRED)
|
|
1992
|
+
|
|
1993
|
+
Snippets quoted in the **issue body** as motivating examples or AC verification targets are **mandatory test fixtures**. The new pattern MUST produce the AC-claimed result on each. This is a belt-and-suspenders complement to Step 3: Step 3 catches general corpus drift; Step 4 catches the specific case the AC was written to address.
|
|
1994
|
+
|
|
1995
|
+
**What counts as a motivating-example fixture:**
|
|
1996
|
+
|
|
1997
|
+
- Blockquoted text (lines starting with `>`)
|
|
1998
|
+
- Fenced code blocks (` ``` `) under non-Setup/non-Install headings
|
|
1999
|
+
- Lines prefixed with `**Verify:**`, `**Verbatim:**`, `**Example:**`, `**AC verification:**`, `**Repro:**` (the same set the inline extraction fallback greps for)
|
|
2000
|
+
- Verbatim spec excerpts referenced in the AC text (e.g. "the issue body's verbatim motivating example used that exact prefix")
|
|
2001
|
+
|
|
2002
|
+
**What is excluded:**
|
|
2003
|
+
|
|
2004
|
+
- Code blocks under headings named `## Setup`, `## Install`, `## Prerequisites`, `## How to install` (these are environment commands, not fixtures)
|
|
2005
|
+
- Generic shell session transcripts unrelated to the pattern under test
|
|
2006
|
+
|
|
2007
|
+
**Extraction:**
|
|
2008
|
+
|
|
2009
|
+
Prefer the Phase 0c precheck output when available — fixture extraction is
|
|
2010
|
+
deterministic and #609 moved it out of the QA prompt:
|
|
2011
|
+
|
|
2012
|
+
```bash
precheck=".sequant/gap-precheck.json"
if [[ -f "$precheck" ]] && [[ "$(jq -r .schemaVersion "$precheck" 2>/dev/null)" == "1" ]]; then
  jq -r '.checks.fixtures.fixtures[] | "[\(.kind)\(if .label then ":" + .label else "" end) line \(.line)] \(.content)"' "$precheck"
else
  # Inline fallback (pre-#609 behavior)
  issue_body=$(gh issue view <issue-number> --json body -q '.body')
  echo "$issue_body" | grep -E '^>' || true
  echo "$issue_body" | awk '
    /^## (Setup|Install|Prerequisites|How to install)/ { skip=1; next }
    /^## / && skip { skip=0 }
    /^```/ { in_block=!in_block; next }
    in_block && !skip { print }
  '
  echo "$issue_body" | grep -E '\*\*(Verify|Verbatim|Example|AC verification|Repro):\*\*' || true
fi
```
|
|
2029
|
+
|
|
2030
|
+
**Run the new pattern against each extracted fixture.** For every fixture the AC says should match, the pattern MUST produce a match. **A 0-match result on a verbatim AC fixture is automatically `Failed`.**
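
The fixture rule above can be sketched as a small check. The helper name `fixture_status` and the fixture text are hypothetical; the fixture only mimics the `**Verify:**` shape described in the origin notes.

```bash
# Hypothetical helper: report whether an ERE pattern ($1) matches a verbatim fixture ($2).
# A zero-match result maps straight to Failed, mirroring the Step 4 rule.
fixture_status() {
  if printf '%s\n' "$2" | grep -qE "$1"; then
    echo "Passed"
  else
    echo "Failed"
  fi
}

# Made-up fixture using a **Verify:** prefix.
fixture='**Verify:** try X, confirm Y'

fixture_status '\*\*(Example|Verbatim):\*\*' "$fixture"          # prints Failed
fixture_status '\*\*(Verify|Example|Verbatim):\*\*' "$fixture"   # prints Passed
```

The first call reproduces the class of miss where a pattern omits a prefix that the issue body itself uses.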
|
|
2031
|
+
|
|
2032
|
+
#### Step 5: Detection Pattern Verification Status
|
|
2033
|
+
|
|
2034
|
+
| Status | Meaning |
|--------|---------|
| **Passed** | All patterns matched their corpora at the AC-claimed rate (≥5 samples each); all motivating-example fixtures matched |
| **Failed** | At least one pattern produced 0 matches against input the AC says should match (corpus or fixture) |
| **Insufficient Samples** | Corpus exists but fewer than 5 representative instances were available; reviewer must record the actual count |
| **Skipped** | Patterns identified but corpus fully unavailable (e.g. `gh` offline / unauth / rate-limited); document the specific reason |
| **Not Required** | No pattern changes in skill markdown |
|
|
2041
|
+
|
|
2042
|
+
**CORPUS-UNAVAILABLE RULE:** If `gh` is unauthenticated, offline, or rate-limited, mark `Skipped` with the specific reason. Do **NOT** silently mark `Passed`.
|
|
2043
|
+
|
|
2044
|
+
**SPARSE-CORPUS RULE:** If corpus is reachable but fewer than 5 samples are found (and all matched), mark `Insufficient Samples` with the actual count. Do **NOT** silently mark `Passed` — AC-2 requires ≥5 samples.
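
One way to read the status table together with the two rules above is as an ordered decision. The sketch below is an assumption about that ordering, not the skill's implementation: the helper name and its three arguments are hypothetical, and fixture results and the `Not Required` case (handled by the precondition) are not modeled.

```bash
# Hypothetical helper. Arguments:
#   $1 corpus reachable? (yes/no)
#   $2 number of samples run
#   $3 number of samples where an expected match came back empty
detection_pattern_status() {
  if [ "$1" != "yes" ]; then
    echo "Skipped"                 # corpus unavailable: never report Passed
  elif [ "$3" -gt 0 ]; then
    echo "Failed"                  # at least one expected match was missing
  elif [ "$2" -lt 5 ]; then
    echo "Insufficient Samples"    # everything matched, but fewer than 5 samples
  else
    echo "Passed"
  fi
}

detection_pattern_status no 0 0    # prints Skipped
detection_pattern_status yes 5 3   # prints Failed
detection_pattern_status yes 3 0   # prints Insufficient Samples
detection_pattern_status yes 5 0   # prints Passed
```

Checking `Failed` before the sample count keeps a sparse corpus from masking a real miss.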
|
|
2045
|
+
|
|
2046
|
+
#### Verdict Gating (STRICTER than Section 6a)
|
|
2047
|
+
|
|
2048
|
+
| Verification Status | Maximum Verdict |
|---------------------|-----------------|
| Passed | READY_FOR_MERGE |
| Insufficient Samples | AC_MET_BUT_NOT_A_PLUS (sparse corpus; record actual count) |
| Skipped | AC_MET_BUT_NOT_A_PLUS (note unverified corpus) |
| **Failed** | **`AC_NOT_MET`** (silent detection failures are worse than wrong CLI flags; block merge) |
| Not Required | READY_FOR_MERGE |
|
|
2055
|
+
|
|
2056
|
+
**Rationale for stricter gate vs Section 6a:** Section 6a catches CLI commands that error out at runtime (loud failures). Detection patterns silently match the wrong thing (quiet failures the pipeline reports as success). The 3 bugs in PR #547 all passed Section 6a-style review and only surfaced when patterns were piped through real input.
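
The gating table for this section amounts to a lookup from verification status to the best verdict still reachable. A minimal sketch, with a hypothetical function name, covering only the Section 6c cap and none of the other §7 branches:

```bash
# Hypothetical helper: map a Section 6c verification status ($1) to its maximum verdict.
max_verdict_for_detection_status() {
  case "$1" in
    "Passed"|"Not Required")           echo "READY_FOR_MERGE" ;;
    "Insufficient Samples"|"Skipped")  echo "AC_MET_BUT_NOT_A_PLUS" ;;
    "Failed")                          echo "AC_NOT_MET" ;;
    *)                                 echo "unknown status: $1" >&2; return 1 ;;
  esac
}

max_verdict_for_detection_status "Skipped"   # prints AC_MET_BUT_NOT_A_PLUS
max_verdict_for_detection_status "Failed"    # prints AC_NOT_MET
```

Returning a non-zero status for an unrecognized value avoids silently treating a typo as a pass.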
|
|
2057
|
+
|
|
2058
|
+
**Output Format:**
|
|
2059
|
+
|
|
2060
|
+
```markdown
|
|
2061
|
+
### Detection Pattern Verification
|
|
2062
|
+
|
|
2063
|
+
**Skill markdown files with pattern changes:** N
|
|
2064
|
+
|
|
2065
|
+
| Pattern | Corpus | Samples | Expected | Actual | Status |
|
|
2066
|
+
|---------|--------|---------|----------|--------|--------|
|
|
2067
|
+
| `awk '/^### AC-[0-9]+/'` | past `/spec` comments | 5 | ≥1 per | 3/5 = 0 | ❌ Failed |
|
|
2068
|
+
| `grep -E '\*\*Verify:\*\*'` | issue #551 body | 1 (verbatim fixture) | matched | matched | ✅ Passed |
|
|
2069
|
+
|
|
2070
|
+
**Motivating-example fixtures from issue body:** M (all run against the new pattern)
|
|
2071
|
+
|
|
2072
|
+
**Verification Status:** Failed (1 pattern misses ~45% of corpus)
|
|
2073
|
+
```
|
|
2074
|
+
|
|
2075
|
+
---
|
|
2076
|
+
|
|
2077
|
+
### 6d. Adversarial Re-Read (REQUIRED for Standard QA before READY_FOR_MERGE)
|
|
2078
|
+
|
|
2079
|
+
**Purpose:** Catch what the structured pipeline doesn't gate on. Operationalizes `feedback_qa_second_look.md` — structured QA biases positive on clean code; an adversarial re-read of core logic surfaces real gaps.
|
|
2080
|
+
|
|
2081
|
+
**When to apply:** Required for non-Simple-Fix verdicts before issuing `READY_FOR_MERGE`. Omitted entirely for Simple Fix mode (`SMALL_DIFF=true`).
|
|
2082
|
+
|
|
2083
|
+
**How to perform:** Before declaring READY_FOR_MERGE, walk through the diff once more adversarially and surface anything the structured pipeline didn't gate on. In particular: (1) run the implementation against every verbatim motivating-example fixture from the issue body — Phase 0c precheck surfaces these in `.checks.fixtures.fixtures`; if precheck unavailable, extract inline per `feedback_motivating_example_regression.md`; (2) flag any "evidence" claim that is actually a pre-fix bug repro rather than a post-fix validation; (3) inspect process state the pipeline normalizes away (uncommitted work, divergent branches, stashed changes, orchestrator state); (4) cite sibling sites explicitly — §5 (cross-file) and §4 Q5 (intra-file); do not hand-wave with "N/A"; (5) surface any Non-Goals from the issue body that have silently expanded into scope. A bare "No gaps" without specific reasoning fails output verification — name what you scanned, ran, or traced.
|
|
2084
|
+
|
|
2085
|
+
**Status outcomes:** **Clean** = walked the 5 checks above, surfaced no gaps. **Gaps Found** = surfaced gaps that map to recommendations or follow-up issues but no missing AC fixture. **Severe Gap** = surfaced (a) a verbatim motivating-example fixture not run, OR (b) an evidence claim that's actually a bug repro not a validation, OR (c) an AC marked MET on code review alone without the runtime / corpus check the AC's text required.
|
|
2086
|
+
|
|
2087
|
+
**Verdict gating:**
|
|
2088
|
+
|
|
2089
|
+
| Status | Maximum Verdict |
|
|
2090
|
+
|--------|-----------------|
|
|
2091
|
+
| Clean | READY_FOR_MERGE |
|
|
2092
|
+
| Gaps Found | AC_MET_BUT_NOT_A_PLUS |
|
|
2093
|
+
| Severe Gap | AC_NOT_MET |
|
|
2094
|
+
|
|
2095
|
+
**Output Format:**
|
|
2096
|
+
|
|
2097
|
+
```markdown
|
|
2098
|
+
### Adversarial Re-Read
|
|
2099
|
+
|
|
2100
|
+
**Findings:** [Concrete enumeration of gaps surfaced, OR "No gaps found because: <specific reason citing what was scanned/run/traced — fixtures consulted, evidence claims audited, process state inspected, sibling sites cited, Non-Goals checked>"]
|
|
2101
|
+
|
|
2102
|
+
**Status:** Clean / Gaps Found / Severe Gap
|
|
2103
|
+
```
|
|
2104
|
+
|
|
2105
|
+
**Origin:** This section was promoted to a structured 5-sub-prompt table in #582. #608's signal-to-noise study found that 9 of 14 emissions surfaced findings but none were actioned: the structure produced visibility without action. #609 trims it back to a single-paragraph prompt while preserving the safety net and verdict gating.
|
|
2106
|
+
|
|
2107
|
+
### 6e. Behavior-Rule Survival Check (REQUIRED for behavior-rule ACs)
|
|
2108
|
+
|
|
2109
|
+
**Purpose:** When an AC asserts a behavior rule (e.g. "default becomes X", "always include Y", "never skip Z"), verify the OLD-rule's implementation has been removed from every touchpoint inside the diff blast radius. Catches the #533-class miss where the SKILL.md was updated but the runtime CLI's short-circuit (`BUG_LABELS`/`DOCS_LABELS`) survived. See [behavior-rule-detection.md](../_shared/references/behavior-rule-detection.md).
|
|
2110
|
+
|
|
2111
|
+
**When to apply:** Run on every AC for which the behavior-rule heuristic triggers (>= 2 distinct keywords from `default | always | never | rule | behavior | skip` OR explicit pattern like `always X unless Y`). Skip entirely when no AC triggers across the issue (cheap short-circuit per the reference doc's Performance budget).
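
For intuition about the keyword branch of the trigger rule, the sketch below counts distinct heuristic keywords in an AC description. It is an approximation under stated assumptions: the function name is hypothetical, only exact whole words are counted (so `skips` does not count as `skip`), and the explicit `always X unless Y` branch is not covered. The real detector lives in `behavior-rule-detector.ts` and may behave differently.

```bash
# Hypothetical helper: print how many distinct heuristic keywords appear
# as whole words in the AC description passed as $1.
behavior_rule_keyword_count() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |          # case-insensitive comparison
    tr -cs '[:alnum:]' '\n' |             # one word per line
    grep -xE 'default|always|never|rule|behavior|skip' |
    sort -u | wc -l | tr -d ' '           # distinct count, padding stripped
}

behavior_rule_keyword_count 'Default becomes X; never skip Z'   # prints 3
behavior_rule_keyword_count 'Add a --json flag to stats'        # prints 0
```

Under this reading, a count of 2 or more would trigger the survival check for that AC.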
|
|
2112
|
+
|
|
2113
|
+
**How to perform:**
|
|
2114
|
+
|
|
2115
|
+
```bash
# Per-AC survival check. Run once per behavior-rule AC.
QA_AC_ID="AC-1" \
QA_AC_TEXT="<verbatim AC description>" \
QA_DIFF_PATHS="$(git diff main...HEAD --name-only | tr '\n' '|')" \
npx tsx -e '
(async () => {
  const m = await import("./src/lib/heuristics/behavior-rule-detector.ts");
  const ac = {
    id: process.env.QA_AC_ID,
    description: process.env.QA_AC_TEXT,
    verificationMethod: "manual",
    status: "pending",
  };
  const detection = m.detectBehaviorRule(ac);
  if (!detection.triggered) { console.log(JSON.stringify({ triggered: false })); return; }
  const diffPaths = (process.env.QA_DIFF_PATHS || "").split("|").filter(Boolean);
  const survivors = m.findSurvivingInverseSymbols(ac, process.cwd(), diffPaths);
  console.log(JSON.stringify({ triggered: true, survivors }));
})();
'
```
|
|
2137
|
+
|
|
2138
|
+
**Status outcomes:**
|
|
2139
|
+
|
|
2140
|
+
| Status | Criteria |
|
|
2141
|
+
|--------|----------|
|
|
2142
|
+
| **Clean** | Detector triggered, `survivors` is empty across diff blast radius |
|
|
2143
|
+
| **Survivors Found** | One or more inverse symbols / inverse-keyword lines survive in the diff blast radius |
|
|
2144
|
+
| **N/A** | No AC triggers the behavior-rule detector (skip section entirely) |
|
|
2145
|
+
|
|
2146
|
+
**AC marking and verdict gating:**
|
|
2147
|
+
|
|
2148
|
+
- Survival inside the diff blast radius -> the corresponding AC is marked `NOT_MET` with the `path:line` list in the AC explanation (per AC-2 of #552).
|
|
2149
|
+
- Survivors -> verdict floors at `AC_NOT_MET` via the §7 algorithm's `behavior_rule_survival_status` gate.
|
|
2150
|
+
|
|
2151
|
+
**Output Format:**
|
|
2152
|
+
|
|
2153
|
+
```markdown
|
|
2154
|
+
### Behavior-Rule Survival Check
|
|
2155
|
+
|
|
2156
|
+
| AC | Triggered? | Survivors | Status |
|
|
2157
|
+
|----|-----------|-----------|--------|
|
|
2158
|
+
| AC-N | Yes | path/to/file.ts:LINE — `<snippet>` | Survivors Found |
|
|
2159
|
+
| AC-M | No | — | N/A |
|
|
2160
|
+
|
|
2161
|
+
**Status:** Clean / Survivors Found / N/A
|
|
2162
|
+
```
|
|
2163
|
+
|
|
2164
|
+
---
|
|
2165
|
+
|
|
2166
|
+
|
|
1626
2167
|
### 7. A+ Status Verdict
|
|
1627
2168
|
|
|
1628
2169
|
Provide an overall verdict:
|
|
@@ -1649,6 +2190,11 @@ Provide an overall verdict:
|
|
|
1649
2190
|
- execution_evidence = status from Section 6 (Complete/Incomplete/Waived/Not Required)
|
|
1650
2191
|
- quality_plan_status = status from Phase 0b (Complete/Partial/Not Addressed/N/A)
|
|
1651
2192
|
- smoke_test_status = status from Section 6b (Complete/Partial/Not Required)
|
|
2193
|
+
- detection_pattern_status = status from Section 6c (Passed/Failed/Insufficient Samples/Skipped/Not Required)
|
|
2194
|
+
- adversarial_reread_status = status from Section 6d (Clean/Gaps Found/Severe Gap) — REQUIRED for Standard QA, omitted for Simple Fix
|
|
2195
|
+
- behavior_rule_survival_status = status from Section 6e (Clean/Survivors Found/N/A) — REQUIRED when any AC triggers the behavior-rule heuristic, omitted otherwise
|
|
2196
|
+
- changelog_required = true IFF Section 10a's `CHANGELOG.md` exists AND Section 10a's `user_facing` count is >0 (single source of truth — see §10a for the conventional-commit detection regex, which accepts unscoped, scoped, and breaking variants of `feat`/`fix`/`perf`/`refactor`/`docs`); false otherwise
|
|
2197
|
+
- changelog_missing = true IFF `changelog_required` AND Section 10a's `[Unreleased]` entry check finds no entry for the issue/PR; false otherwise
|
|
1652
2198
|
|
|
1653
2199
|
3. Browser testing enforcement check:
|
|
1654
2200
|
- Check if any .tsx files were changed: git diff main...HEAD --name-only | grep '\.tsx$' || true
|
|
@@ -1657,13 +2203,30 @@ Provide an overall verdict:
|
|
|
1657
2203
|
- IF .tsx files changed AND /test did NOT run AND no 'no-browser-test' label:
|
|
1658
2204
|
→ Set browser_test_missing = true
|
|
1659
2205
|
|
|
2206
|
+
3a. Manual test AC enforcement check:
|
|
2207
|
+
- Scan spec plan comment for ACs with **Verification:** Manual Test (or freeform: try X confirm Y, verify by, test that)
|
|
2208
|
+
- For each detected manual-test AC:
|
|
2209
|
+
- IF runtime test was executed → AC status from test result (MET/NOT_MET)
|
|
2210
|
+
- IF approved override documented → AC status = MET
|
|
2211
|
+
- ELSE → AC status = PENDING (this increments pending_count)
|
|
2212
|
+
- NOTE: No new verdict branch needed — PENDING manual-test ACs flow through
|
|
2213
|
+
the existing pending_count > 0 → NEEDS_VERIFICATION path in step 4
|
|
2214
|
+
|
|
1660
2215
|
4. Determine verdict (in order):
|
|
1661
2216
|
- IF not_met_count > 0 OR partial_count > 0:
|
|
1662
2217
|
→ AC_NOT_MET (block merge)
|
|
2218
|
+
- ELSE IF detection_pattern_status == "Failed":
|
|
2219
|
+
→ AC_NOT_MET (silent detection failures - block merge; STRICTER than skill_verification because pattern bugs report success but match the wrong corpus)
|
|
2220
|
+
- ELSE IF behavior_rule_survival_status == "Survivors Found":
|
|
2221
|
+
→ AC_NOT_MET (OLD-rule symbol survived inside the diff blast radius — see #533 motivating miss in Section 6e and references/behavior-rule-detection.md)
|
|
2222
|
+
- ELSE IF adversarial_reread_status == "Severe Gap":
|
|
2223
|
+
→ AC_NOT_MET (verbatim motivating-example fixture not run / evidence claim is bug reproduction not validation / AC marked MET without runtime or corpus check the AC text required)
|
|
1663
2224
|
- ELSE IF skill_verification == "Failed":
|
|
1664
2225
|
→ AC_MET_BUT_NOT_A_PLUS (skill commands have issues - cannot be READY_FOR_MERGE)
|
|
1665
2226
|
- ELSE IF execution_evidence == "Incomplete":
|
|
1666
2227
|
→ AC_MET_BUT_NOT_A_PLUS (scripts not verified - cannot be READY_FOR_MERGE)
|
|
2228
|
+
- ELSE IF changelog_required AND changelog_missing:
|
|
2229
|
+
→ AC_MET_BUT_NOT_A_PLUS (CHANGELOG entry required for user-facing changes - see Section 10a for remediation)
|
|
1667
2230
|
- ELSE IF quality_plan_status == "Not Addressed" AND quality_plan_exists:
|
|
1668
2231
|
→ AC_MET_BUT_NOT_A_PLUS (quality dimensions not addressed - flag for review)
|
|
1669
2232
|
- ELSE IF browser_test_missing (from step 3):
|
|
@@ -1676,6 +2239,12 @@ Provide an overall verdict:
|
|
|
1676
2239
|
→ AC_MET_BUT_NOT_A_PLUS (some quality dimensions incomplete - can merge with notes)
|
|
1677
2240
|
- ELSE IF smoke_test_status == "Partial":
|
|
1678
2241
|
→ AC_MET_BUT_NOT_A_PLUS (smoke tests incomplete - document gaps before merge)
|
|
2242
|
+
- ELSE IF detection_pattern_status == "Insufficient Samples":
|
|
2243
|
+
→ AC_MET_BUT_NOT_A_PLUS (sparse corpus - record actual sample count)
|
|
2244
|
+
- ELSE IF detection_pattern_status == "Skipped":
|
|
2245
|
+
→ AC_MET_BUT_NOT_A_PLUS (corpus unavailable for pattern verification - document reason)
|
|
2246
|
+
- ELSE IF adversarial_reread_status == "Gaps Found":
|
|
2247
|
+
→ AC_MET_BUT_NOT_A_PLUS (adversarial re-read surfaced non-blocking gaps; address as follow-up or improvement suggestions)
|
|
1679
2248
|
- ELSE IF improvement_suggestions.length > 0:
|
|
1680
2249
|
→ AC_MET_BUT_NOT_A_PLUS (can merge with notes)
|
|
1681
2250
|
- ELSE:
|
|
@@ -1711,10 +2280,111 @@ fi
|
|
|
1711
2280
|
| `.tsx` changed + no `/test` + no opt-out | Force `AC_MET_BUT_NOT_A_PLUS` |
|
|
1712
2281
|
| No `.tsx` changed | Normal verdict |
|
|
1713
2282
|
|
|
2283
|
+
**Manual Test AC Enforcement:**
|
|
2284
|
+
|
|
2285
|
+
Before finalizing the verdict, check if any ACs require manual (runtime) verification that was specified in the `/spec` plan:
|
|
2286
|
+
|
|
2287
|
+
```bash
# 1. Extract spec plan comment from issue
spec_comment=$(gh issue view <issue-number> --json comments --jq \
  '[.comments[].body | select(contains("\"phase\":\"spec\""))] | last' || true)

# 2. Detect ACs with manual-test verification methods
#    Matches: "**Verification:** Manual Test", "**Verify:** ...", "try X, confirm Y", "verify by", "test that"
#    [[:space:]] is used instead of \s so the pattern also works with non-GNU grep.
manual_test_acs=$(echo "$spec_comment" | \
  grep -iE '(\*\*Verification:\*\*[[:space:]]*Manual Test|\*\*Verify:\*\*|try .*, confirm|verify by|test that|verify:?[[:space:]]*manual)' || true)

# 3. Extract AC IDs associated with manual-test lines
#    Remember the most recent AC-N header while scanning forward and print it
#    for every manual-test line. tolower() replaces IGNORECASE, which only gawk honors.
manual_ac_ids=$(echo "$spec_comment" | \
  awk '/^(#+ AC-[0-9]+|\*\*AC-[0-9]+)/ { ac = $0 }
       { line = tolower($0) }
       line ~ /manual test|\*\*verify:\*\*|try .*, confirm|verify by|test that/ { print ac }' | \
  grep -oE 'AC-[0-9]+' | sort -u || true)
```
|
|
2303
|
+
|
|
2304
|
+
**If manual-test ACs are detected**, include this section in QA output:
|
|
2305
|
+
|
|
2306
|
+
```markdown
|
|
2307
|
+
### Manual Test ACs Detected
|
|
2308
|
+
|
|
2309
|
+
| AC | Verification Method | Runtime Test Status |
|
|
2310
|
+
|----|--------------------|--------------------|
|
|
2311
|
+
| AC-N | Manual Test | ✅ Executed / ⚠️ PENDING / 🔄 Overridden |
|
|
2312
|
+
```
|
|
2313
|
+
|
|
2314
|
+
**Enforcement Rules:**
|
|
2315
|
+
|
|
2316
|
+
For each detected manual-test AC, QA must do ONE of:
|
|
2317
|
+
|
|
2318
|
+
1. **Execute the test** using available tools (chrome-devtools MCP, dev server, CLI invocation) and record pass/fail evidence → mark AC `MET` or `NOT_MET` based on result
|
|
2319
|
+
2. **Mark AC `PENDING`** with note: `⚠️ Manual verification required — runtime test not executed` → flows through `pending_count > 0 → NEEDS_VERIFICATION` verdict path
|
|
2320
|
+
3. **Override** with approved justification (see Manual Test Override below) → mark AC `MET`
|
|
2321
|
+
|
|
2322
|
+
**Key Rule:** A manual-test AC CANNOT be marked `MET` from static code review alone. QA must either execute the runtime test, provide an approved override, or mark `PENDING`.
|
|
2323
|
+
|
|
2324
|
+
| Scenario | AC Status | Verdict Impact |
|
|
2325
|
+
|----------|-----------|----------------|
|
|
2326
|
+
| Runtime test executed and passed | `MET` | Normal verdict |
|
|
2327
|
+
| Runtime test executed and failed | `NOT_MET` | → `AC_NOT_MET` |
|
|
2328
|
+
| Runtime test not executed, no override | `PENDING` | → `NEEDS_VERIFICATION` |
|
|
2329
|
+
| Override with approved justification | `MET` | Normal verdict |
|
|
2330
|
+
| Override with unapproved justification | `PENDING` | → `NEEDS_VERIFICATION` |
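
The scenario table can be read as a two-input decision. A minimal sketch, assuming hypothetical argument values (`passed`/`failed`/`not-run` and `approved`/`unapproved`/`none`) that are not part of the skill itself:

```bash
# Hypothetical helper. Arguments:
#   $1 runtime test result: passed | failed | not-run
#   $2 override: approved | unapproved | none
manual_test_ac_status() {
  case "$1" in
    passed) echo "MET" ;;
    failed) echo "NOT_MET" ;;
    *)
      # No runtime evidence: only an approved override can yield MET.
      if [ "$2" = "approved" ]; then echo "MET"; else echo "PENDING"; fi ;;
  esac
}

manual_test_ac_status passed none          # prints MET
manual_test_ac_status failed none          # prints NOT_MET
manual_test_ac_status not-run none         # prints PENDING
manual_test_ac_status not-run approved     # prints MET
manual_test_ac_status not-run unapproved   # prints PENDING
```

There is no path from static code review alone to `MET`, which is the key rule stated above.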
|
|
2331
|
+
|
|
2332
|
+
### Manual Test Override
|
|
2333
|
+
|
|
2334
|
+
In some cases, runtime verification can be safely skipped for manual-test ACs when the verification target has no runtime surface or is covered by equivalent automated tests. **Overrides require explicit justification and risk assessment.**
|
|
2335
|
+
|
|
2336
|
+
**Override Format (REQUIRED when skipping manual-test execution):**
|
|
2337
|
+
|
|
2338
|
+
```markdown
|
|
2339
|
+
### Manual Test Override
|
|
2340
|
+
|
|
2341
|
+
**AC:** AC-N
|
|
2342
|
+
**Requirement:** Runtime verification for manual-test AC
|
|
2343
|
+
**Override:** Yes
|
|
2344
|
+
**Justification:** [One of the approved categories below]
|
|
2345
|
+
**Risk Assessment:** [None/Low]
|
|
2346
|
+
```
|
|
2347
|
+
|
|
2348
|
+
**Approved Override Categories:**
|
|
2349
|
+
|
|
2350
|
+
| Category | Example | Risk |
|
|
2351
|
+
|----------|---------|------|
|
|
2352
|
+
| No runtime surface | Pure type definitions, config schema validation | None |
|
|
2353
|
+
| Equivalent unit test coverage | Automated test covers the exact same code path the manual test would exercise | Low |
|
|
2354
|
+
| Tested in sibling issue | Cross-reference to another issue where the same runtime behavior was verified | Low |
|
|
2355
|
+
|
|
2356
|
+
**NOT Approved for Override (always require runtime test):**
|
|
2357
|
+
|
|
2358
|
+
| Category | Example | Why |
|
|
2359
|
+
|----------|---------|-----|
|
|
2360
|
+
| Logic changes with UI surface | Modified form validation, new user flows | Runtime behavior may diverge from code review expectations |
|
|
2361
|
+
| New user-facing features | Added pages, new interactions | Must verify actual user experience |
|
|
2362
|
+
| Integration points | API calls, database writes, auth flows | Runtime dependencies may behave differently |
|
|
2363
|
+
| Error handling with user feedback | Toast messages, error pages, redirects | Presentation layer needs runtime check |
|
|
2364
|
+
|
|
2365
|
+
**Risk Assessment Definitions:**
|
|
2366
|
+
|
|
2367
|
+
| Level | Meaning | Criteria |
|
|
2368
|
+
|-------|---------|----------|
|
|
2369
|
+
| **None** | Zero runtime impact | Change has no executable runtime surface (types, config) |
|
|
2370
|
+
| **Low** | Negligible runtime impact | Automated tests cover the same path; manual test would be redundant |
|
|
2371
|
+
| **Medium** | Possible runtime impact | **Should NOT be overridden** — run the manual test |
|
|
2372
|
+
|
|
2373
|
+
**Override Decision Flow:**
|
|
2374
|
+
|
|
2375
|
+
1. Check if change matches an approved category → If no, runtime test is required
|
|
2376
|
+
2. Assess risk level → If Medium or higher, runtime test is required
|
|
2377
|
+
3. Document override using the format above in the QA output
|
|
2378
|
+
4. Include override in the GitHub issue comment for audit trail
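
The first two steps of the decision flow can be sketched as a guard. The category slugs below are invented shorthand for the three approved categories, and the function name is hypothetical; documenting the override (steps 3 and 4) remains a manual action.

```bash
# Hypothetical helper. $1: override category slug, $2: risk level (None/Low/Medium).
# Prints "yes" only for an approved category with None or Low risk.
override_allowed() {
  case "$1" in
    no-runtime-surface|equivalent-unit-test|tested-in-sibling-issue) ;;
    *) echo "no"; return 0 ;;     # not an approved category: runtime test required
  esac
  case "$2" in
    None|Low) echo "yes" ;;
    *)        echo "no" ;;        # Medium or anything else: run the manual test
  esac
}

override_allowed no-runtime-surface None      # prints yes
override_allowed equivalent-unit-test Medium  # prints no
override_allowed new-user-facing-feature Low  # prints no
```

Defaulting to `no` for unknown inputs matches the "when in doubt, execute the manual test" guidance.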
|
|
2379
|
+
|
|
2380
|
+
**CRITICAL:** When in doubt, execute the manual test. Overrides are for clear-cut cases only. The motivation for this gate (issue #529) was a real bug that passed QA because `minRows: 1` appeared correct in code review but did not work at runtime.
|
|
2381
|
+
|
|
1714
2382
|
**CRITICAL:** `PARTIALLY_MET` is NOT sufficient for merge. It MUST be treated as `NOT_MET` for verdict purposes.
|
|
1715
2383
|
|
|
1716
2384
|
**CRITICAL:** If skill command verification = "Failed", verdict CANNOT be `READY_FOR_MERGE`. This prevents shipping skills with broken commands (like issue #178's `conclusion` field).
|
|
1717
2385
|
|
|
2386
|
+
**CRITICAL:** If detection pattern verification = "Failed" (Section 6c), verdict MUST be `AC_NOT_MET`, which is stricter than the skill_verification gate. Silent detection failures (regex/grep/awk/jq matching the wrong corpus) are worse than wrong CLI flags because the pipeline still reports success.
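The three CRITICAL rules above can be read as constraints on the final verdict. The sketch below is one way to express them, assuming hypothetical names: `verdict_constraint` and its `force:`/`block:` output labels are illustrative and do not appear in sequant. It does not decide which verdict replaces a blocked `READY_FOR_MERGE`, because the rules above do not state that.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sequant).
# Usage: verdict_constraint <ac_status> <skill_verification> <detection_verification>
# Prints one of:
#   force:AC_NOT_MET       detection pattern verification failed
#   block:READY_FOR_MERGE  verdict may not be READY_FOR_MERGE
#   none                   these rules impose no constraint
verdict_constraint() {
  local ac_status="$1" skill_verification="$2" detection_verification="$3"

  # PARTIALLY_MET is treated as NOT_MET for verdict purposes.
  if [ "$ac_status" = "PARTIALLY_MET" ]; then
    ac_status="NOT_MET"
  fi

  if [ "$detection_verification" = "Failed" ]; then
    # Strictest gate: a failed detection pattern forces AC_NOT_MET.
    echo "force:AC_NOT_MET"
  elif [ "$ac_status" = "NOT_MET" ] || [ "$skill_verification" = "Failed" ]; then
    echo "block:READY_FOR_MERGE"
  else
    echo "none"
  fi
}
```

Checking the detection gate first reflects that it is described as stricter than the skill command gate.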
|
|
2387
|
+
|
|
1718
2388
|
See [quality-gates.md](references/quality-gates.md) for detailed verdict criteria.
|
|
1719
2389
|
|
|
1720
2390
|
---
|
|
@@ -1779,6 +2449,10 @@ If verdict is `READY_FOR_MERGE` or `AC_MET_BUT_NOT_A_PLUS`:
|
|
|
1779
2449
|
|
|
1780
2450
|
**Purpose:** Verify user-facing changes have corresponding CHANGELOG entries before `READY_FOR_MERGE`.
|
|
1781
2451
|
|
|
2452
|
+
**Wired into §7 verdict algorithm:** This gate is enforced via the `changelog_required AND changelog_missing` branch in §7: when both conditions are true, the verdict is demoted from `READY_FOR_MERGE` to `AC_MET_BUT_NOT_A_PLUS`. The branch is a no-op when `CHANGELOG.md` is absent or no user-facing commit prefix is detected.
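The demotion branch described above can be sketched as follows. This is an illustration only: the function name `apply_changelog_gate` and the `true`/`false` string arguments are assumptions, and the real §7 algorithm may compute `changelog_required` and `changelog_missing` differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sequant).
# Usage: apply_changelog_gate <verdict> <changelog_required: true|false> <changelog_missing: true|false>
apply_changelog_gate() {
  local verdict="$1" changelog_required="$2" changelog_missing="$3"

  # Demote only READY_FOR_MERGE, and only when both conditions hold.
  if [ "$verdict" = "READY_FOR_MERGE" ] \
     && [ "$changelog_required" = "true" ] \
     && [ "$changelog_missing" = "true" ]; then
    echo "AC_MET_BUT_NOT_A_PLUS"
  else
    # Otherwise the branch is a no-op and the verdict passes through unchanged.
    echo "$verdict"
  fi
}
```

For example, `apply_changelog_gate READY_FOR_MERGE true true` prints `AC_MET_BUT_NOT_A_PLUS`, while any other combination prints the verdict it was given.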
|
|
2453
|
+
|
|
2454
|
+
**Caveat (conventional-commit dependency):** Detection requires conventional-commit prefixes (`feat`, `fix`, `perf`, `refactor`, `docs`, with optional scope `(...)` and breaking marker `!`) in `git log main..HEAD`. Projects whose commits don't follow this pattern silently skip this gate (it fails open). This is acceptable for sequant's typical user base; document the convention in your project's contributing guide if you rely on this gate.
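To see which commit subjects the gate recognizes, the detection regex can be run against sample `git log --oneline` lines. The sketch below reuses the pattern from the detection snippet; the sample commit lines and the `count_user_facing` wrapper are made up for illustration, and it counts with `grep -c` instead of the snippet's `wc -l | xargs`.

```bash
#!/usr/bin/env bash
# Same extended regex as the detection snippet: short hash, a user-facing
# prefix, an optional (scope), an optional breaking marker "!", then ":".
pattern='^[a-f0-9]+ (feat|fix|perf|refactor|docs)(\([^)]*\))?!?:'

# Hypothetical wrapper: reads `git log --oneline`-style lines on stdin and
# prints how many match. grep exits non-zero when nothing matches, so
# `|| true` keeps the function from failing under `set -e`.
count_user_facing() {
  grep -icE "$pattern" || true
}

# Made-up sample log: two lines use a recognized prefix, two do not.
printf '%s\n' \
  'a1b2c3d feat(cli): add watch command' \
  'b2c3d4e fix!: handle missing lock file' \
  'c3d4e5f chore: bump deps' \
  'd4e5f6a Update README' \
  | count_user_facing
```

The sample prints `2`: the `chore:` commit and the free-form `Update README` subject are not counted, which is how a project without conventional commits ends up skipping the gate.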
|
|
2455
|
+
|
|
1782
2456
|
**Detection:**
|
|
1783
2457
|
|
|
1784
2458
|
```bash
|
|
@@ -1793,7 +2467,7 @@ unreleased_entries=$(sed -n '/^## \[Unreleased\]/,/^## \[/p' CHANGELOG.md | grep
|
|
|
1793
2467
|
|
|
1794
2468
|
# Determine if change is user-facing (new features, bug fixes, etc.)
|
|
1795
2469
|
# Look at commit messages or file changes
|
|
1796
|
-
user_facing=$(git log main..HEAD --oneline | grep -iE '^[a-f0-9]+ (feat|fix|perf|refactor|docs)
|
|
2470
|
+
user_facing=$(git log main..HEAD --oneline | grep -iE '^[a-f0-9]+ (feat|fix|perf|refactor|docs)(\([^)]*\))?!?:' | wc -l | xargs || true)
|
|
1797
2471
|
```
|
|
1798
2472
|
|
|
1799
2473
|
**Verification Logic:**
|
|
@@ -1955,8 +2629,10 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
|
|
|
1955
2629
|
- Smoke Test
|
|
1956
2630
|
- CLI Registration Verification
|
|
1957
2631
|
- Skill Command Verification
|
|
2632
|
+
- Detection Pattern Verification
|
|
1958
2633
|
- Script Verification Override
|
|
1959
2634
|
- Skill Change Review
|
|
2635
|
+
- Adversarial Re-Read
|
|
1960
2636
|
|
|
1961
2637
|
**Required sections for simple fix mode:**
|
|
1962
2638
|
|
|
@@ -1970,6 +2646,7 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
|
|
|
1970
2646
|
- [ ] **Verdict** - One of: READY_FOR_MERGE, AC_MET_BUT_NOT_A_PLUS, NEEDS_VERIFICATION, AC_NOT_MET
|
|
1971
2647
|
- [ ] **Documentation Check** - README/docs updated if feature adds new functionality
|
|
1972
2648
|
- [ ] **Next Steps** - Clear, actionable recommendations
|
|
2649
|
+
- [ ] **Adversarial Re-Read** - Re-read the core logic and list anything the structured pipeline didn't surface
|
|
1973
2650
|
|
|
1974
2651
|
### Standard QA (Implementation Exists, `SMALL_DIFF=false`)
|
|
1975
2652
|
|
|
@@ -1989,9 +2666,12 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
|
|
|
1989
2666
|
- [ ] **Execution Evidence** - Included if scripts/CLI modified (or marked N/A)
|
|
1990
2667
|
- [ ] **Script Verification Override** - Included if scripts/CLI modified AND /verify was skipped (with justification and risk assessment)
|
|
1991
2668
|
- [ ] **Skill Command Verification** - Included if `.claude/skills/**/*.md` modified (or marked N/A)
|
|
2669
|
+
- [ ] **Detection Pattern Verification** - Included if skill markdown adds new `grep`/`awk`/`jq`/`sed`/regex (or marked N/A)
|
|
1992
2670
|
- [ ] **Skill Change Review** - Skill-specific verification prompts included if skills changed
|
|
1993
2671
|
- [ ] **Smoke Test** - Included if workflow-affecting changes (skills, scripts, CLI), or marked "Not Required"
|
|
2672
|
+
- [ ] **Manual Test AC Enforcement** - Included if spec plan has Manual Test ACs (or marked N/A if no manual-test ACs detected)
|
|
1994
2673
|
- [ ] **CHANGELOG Verification** - User-facing changes have `[Unreleased]` entry (or marked N/A)
|
|
2674
|
+
- [ ] **Adversarial Re-Read** - Required structured section: all 5 sub-prompts answered with concrete content; "Findings:" and "Status:" lines populated; bare "No gaps" without specific reasoning fails verification (see Section 6d)
|
|
1995
2675
|
- [ ] **Documentation Check** - README/docs updated if feature adds new functionality
|
|
1996
2676
|
- [ ] **Next Steps** - Clear, actionable recommendations
|
|
1997
2677
|
|
|
@@ -2082,6 +2762,8 @@ When the size gate triggers simple fix mode, use this shorter template:
|
|
|
2082
2762
|
|
|
2083
2763
|
- **Likely failure mode:** [How would this break in production?]
|
|
2084
2764
|
- **Not tested:** [What gaps exist in test coverage?]
|
|
2765
|
+
- **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
|
|
2766
|
+
- **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
|
|
2085
2767
|
|
|
2086
2768
|
---
|
|
2087
2769
|
|
|
@@ -2142,6 +2824,20 @@ You MUST include these sections:
|
|
|
2142
2824
|
|
|
2143
2825
|
---
|
|
2144
2826
|
|
|
2827
|
+
### Precheck Findings
|
|
2828
|
+
|
|
2829
|
+
[Include if `.sequant/gap-precheck.json` (schemaVersion 1) is present. Otherwise emit one line: "Precheck Findings: unavailable — inline fallback used." and omit the table.]
|
|
2830
|
+
|
|
2831
|
+
| Section | Status | Surfaced |
|
|
2832
|
+
|---------|--------|----------|
|
|
2833
|
+
| Fixtures | pass / not_applicable / fail | [N fixtures, consumed by §6c Step 4 / §6d Q1] |
|
|
2834
|
+
| Sibling-grep | pass / not_applicable / fail | [N identifiers, candidate sites surfaced to §5] |
|
|
2835
|
+
| AC literal-diff | pass / not_applicable / fail | [IDs missing from PR body, if any] |
|
|
2836
|
+
|
|
2837
|
+
**Source:** `.sequant/gap-precheck.json` (schemaVersion 1)
|
|
2838
|
+
|
|
2839
|
+
---
|
|
2840
|
+
|
|
2145
2841
|
### CI Status
|
|
2146
2842
|
|
|
2147
2843
|
[Include if PR exists, otherwise: "No PR exists yet" or "No CI configured"]
|
|
@@ -2324,6 +3020,24 @@ You MUST include these sections:
|
|
|
2324
3020
|
|
|
2325
3021
|
---
|
|
2326
3022
|
|
|
3023
|
+
### Detection Pattern Verification
|
|
3024
|
+
|
|
3025
|
+
[Include if skill markdown adds/modifies `grep`/`awk`/`jq`/`sed` commands or regex literals, otherwise: "N/A - No pattern changes in skill markdown"]
|
|
3026
|
+
|
|
3027
|
+
**Skill markdown files with pattern changes:** X
|
|
3028
|
+
|
|
3029
|
+
| Pattern | Corpus | Samples | Expected | Actual | Status |
|
|
3030
|
+
|---------|--------|---------|----------|--------|--------|
|
|
3031
|
+
| `[pattern]` | [corpus source] | [N samples] | [AC-claimed result] | [observed] | ✅ Passed / ❌ Failed |
|
|
3032
|
+
|
|
3033
|
+
**Motivating-example fixtures from issue body:** Y (run against the new pattern)
|
|
3034
|
+
|
|
3035
|
+
**Verification Status:** Passed / Failed / Insufficient Samples / Skipped / Not Required
|
|
3036
|
+
|
|
3037
|
+
**Verdict impact (stricter than 6a):** Failed → `AC_NOT_MET` (silent detection failures block merge)
|
|
3038
|
+
|
|
3039
|
+
---
|
|
3040
|
+
|
|
2327
3041
|
### CLI Registration Verification
|
|
2328
3042
|
|
|
2329
3043
|
[Include if option interfaces or CLI file modified, otherwise: "N/A - No option interface changes"]
|
|
@@ -2366,10 +3080,34 @@ You MUST include these sections:
|
|
|
2366
3080
|
|
|
2367
3081
|
---
|
|
2368
3082
|
|
|
3083
|
+
### Manual Test ACs
|
|
3084
|
+
|
|
3085
|
+
[Include if spec plan has ACs with **Verification:** Manual Test, otherwise: "N/A - No manual-test ACs detected"]
|
|
3086
|
+
|
|
3087
|
+
| AC | Verification Method | Runtime Test Status | Evidence |
|
|
3088
|
+
|----|--------------------|--------------------|----------|
|
|
3089
|
+
| AC-N | Manual Test | ✅ Executed / ⚠️ PENDING / 🔄 Overridden | [result or override justification] |
|
|
3090
|
+
|
|
3091
|
+
**Manual Test Enforcement:** X/Y manual-test ACs verified at runtime
|
|
3092
|
+
|
|
3093
|
+
[If any overrides applied, include Manual Test Override block per Section 7]
|
|
3094
|
+
|
|
3095
|
+
---
|
|
3096
|
+
|
|
2369
3097
|
### Risk Assessment
|
|
2370
3098
|
|
|
2371
3099
|
- **Likely failure mode:** [How would this break in production? Be specific.]
|
|
2372
3100
|
- **Not tested:** [What gaps exist in test coverage for these changes?]
|
|
3101
|
+
- **Sibling sites considered:** [List sibling code in other files in the codebase with the same root cause, or "none — no cross-file siblings" / "N/A — cross-file sibling-site scan does not apply"]
|
|
3102
|
+
- **Sibling-line audit:** [Adjacent call sites in the same file/function audited with the same root-cause pattern, OR "none — single-call-site fix"]
|
|
3103
|
+
|
|
3104
|
+
---
|
|
3105
|
+
|
|
3106
|
+
### Adversarial Re-Read
|
|
3107
|
+
|
|
3108
|
+
**Findings:** [Concrete enumeration of gaps surfaced, OR "No gaps found because: <specific reason citing what was scanned/run/traced — fixtures consulted, evidence claims audited, process state inspected, sibling sites cited, Non-Goals checked>"]
|
|
3109
|
+
|
|
3110
|
+
**Status:** Clean / Gaps Found / Severe Gap
|
|
2373
3111
|
|
|
2374
3112
|
---
|
|
2375
3113
|
|