sequant 2.1.1 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/.claude-plugin/marketplace.json +1 -1
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/dist/bin/cli.js +1 -0
  4. package/dist/src/commands/init.d.ts +1 -0
  5. package/dist/src/commands/init.js +122 -3
  6. package/dist/src/commands/run-compat.d.ts +14 -0
  7. package/dist/src/commands/run-compat.js +12 -0
  8. package/dist/src/commands/run-display.d.ts +17 -0
  9. package/dist/src/commands/run-display.js +116 -0
  10. package/dist/src/commands/run.d.ts +4 -26
  11. package/dist/src/commands/run.js +47 -772
  12. package/dist/src/commands/status.js +24 -1
  13. package/dist/src/index.d.ts +11 -0
  14. package/dist/src/index.js +9 -0
  15. package/dist/src/lib/errors.d.ts +93 -0
  16. package/dist/src/lib/errors.js +97 -0
  17. package/dist/src/lib/settings.d.ts +236 -0
  18. package/dist/src/lib/settings.js +482 -37
  19. package/dist/src/lib/skill-version.d.ts +19 -0
  20. package/dist/src/lib/skill-version.js +68 -0
  21. package/dist/src/lib/templates.d.ts +1 -0
  22. package/dist/src/lib/templates.js +1 -1
  23. package/dist/src/lib/workflow/batch-executor.js +13 -5
  24. package/dist/src/lib/workflow/config-resolver.d.ts +50 -0
  25. package/dist/src/lib/workflow/config-resolver.js +167 -0
  26. package/dist/src/lib/workflow/error-classifier.d.ts +17 -7
  27. package/dist/src/lib/workflow/error-classifier.js +113 -15
  28. package/dist/src/lib/workflow/phase-executor.d.ts +31 -0
  29. package/dist/src/lib/workflow/phase-executor.js +143 -48
  30. package/dist/src/lib/workflow/run-log-schema.d.ts +12 -0
  31. package/dist/src/lib/workflow/run-log-schema.js +7 -1
  32. package/dist/src/lib/workflow/run-orchestrator.d.ts +161 -0
  33. package/dist/src/lib/workflow/run-orchestrator.js +510 -0
  34. package/dist/src/lib/workflow/worktree-manager.d.ts +4 -3
  35. package/dist/src/lib/workflow/worktree-manager.js +61 -11
  36. package/package.json +1 -1
  37. package/templates/skills/assess/SKILL.md +239 -77
  38. package/templates/skills/exec/SKILL.md +7 -68
  39. package/templates/skills/fullsolve/SKILL.md +303 -137
  40. package/templates/skills/qa/SKILL.md +42 -46
  41. package/templates/skills/qa/scripts/quality-checks.sh +47 -1
  42. package/templates/skills/spec/SKILL.md +183 -982
  43. package/templates/skills/spec/references/quality-checklist.md +75 -0
  44. package/templates/skills/test/SKILL.md +0 -27
  45. package/templates/skills/testgen/SKILL.md +0 -27

package/templates/skills/qa/SKILL.md
@@ -122,10 +122,23 @@ Include this marker in every `gh issue comment` that represents QA completion.
  Invocation:
 
  - `/qa 123`: Treat `123` as the GitHub issue/PR identifier in context.
+ - `/qa 123 172`: Treat both as issue numbers — process each sequentially.
  - `/qa <freeform description>`: Treat the text as context about the change to review.
  - `/qa 123 --parallel`: Force parallel agent execution (faster, higher token usage).
  - `/qa 123 --sequential`: Force sequential agent execution (slower, lower token usage).
 
+ ### Multi-Issue Invocation
+
+ When multiple issue numbers are provided (e.g., `/qa 167 172`):
+
+ 1. **Parse all issue numbers** from args
+ 2. **Process each issue sequentially** with inline code review — do NOT spawn ad-hoc background agents for the diff reading or AC verification portions
+ 3. The built-in `sequant-qa-checker` sub-agents (type safety, scope, security) continue to run per the size gate rules for each issue
+ 4. Each issue gets its own full QA cycle: context fetch → diff review → quality checks → verdict → comment
+ 5. Post a **separate QA comment** to each issue's GitHub thread
+
+ **Why sequential with inline review:** Ad-hoc background agents for code review are unreliable — they hallucinate about file existence, misattribute API patterns, and hit permission issues on worktree reads. The narrowly-scoped `sequant-qa-checker` agents work well because they have specific, bounded tasks. The code review portion must stay inline for accuracy.
+
  ### Agent Execution Mode
 
  Before spawning quality check agents, determine the execution mode:
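
The multi-issue rule reduces to two steps: collect every bare issue number from the arguments, then run one complete QA cycle per issue, one after another. A minimal bash sketch under that reading; `qa_issue_numbers`, `qa_cycle` and `qa_multi` are hypothetical names (not sequant APIs), and `qa_cycle` only stands in for the real context fetch, diff review, quality checks, verdict and comment sequence.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the multi-issue rule; not sequant code.

# Print each bare issue number from the arguments, one per line.
# Flags such as --parallel/--sequential and free-form words are skipped.
qa_issue_numbers() {
  for arg in "$@"; do
    case "$arg" in
      ''|*[!0-9]*) ;;                 # empty or contains a non-digit: skip
      *) printf '%s\n' "$arg" ;;
    esac
  done
}

# Placeholder for one full QA cycle on a single issue (hypothetical).
qa_cycle() {
  printf 'QA cycle for #%s\n' "$1"
}

# Process issues strictly in sequence: no background jobs (&) are started,
# so each cycle finishes before the next begins.
qa_multi() {
  for issue in $(qa_issue_numbers "$@"); do
    qa_cycle "$issue"
  done
}
```

For example, `qa_multi 167 172 --parallel` prints the cycle line for #167 and then for #172; the flag is left for the separate agent-execution-mode decision.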
@@ -758,21 +771,21 @@ echo "Size gate: $total_changes lines changed (threshold: $threshold), pkg_chang
 
  Run these checks directly (no sub-agents needed):
 
- ```bash
- # Type safety: check for 'any' additions
- any_count=$(git diff origin/main...HEAD | grep '^\+' | grep -v '^\+\+\+' | grep -cw 'any' || true)
+ **IMPORTANT:** Use the Grep tool (not bash `grep`) for pattern matching — bash grep uses BSD regex on macOS which is incompatible with some patterns below. The Grep tool uses ripgrep which works cross-platform.
 
+ ```bash
  # Deleted tests check
  deleted_tests=$(git diff origin/main...HEAD --name-only --diff-filter=D | grep -cE '\.(test|spec)\.' || true)
 
  # Scope: files changed count
  files_changed=$(git diff origin/main...HEAD --name-only | wc -l | tr -d ' ')
+ ```
 
- # Security scan (lightweight just check for obvious patterns in added lines)
- security_issues=$(git diff origin/main...HEAD | grep '^\+' | grep -v '^\+\+\+' | grep -ciE 'eval\(|innerHTML|dangerouslySetInnerHTML|exec\(|password.*=.*["']|secret.*=.*["']|api.?key.*=.*["']' || true)
+ For type safety and security scans, use the Grep tool instead of bash:
+ - **Type safety:** `Grep(pattern=":\\s*any[,;)\\]]|as any", path="<changed-files>")` on added lines
+ - **Security scan:** `Grep(pattern="eval\\(|innerHTML|dangerouslySetInnerHTML|password.*=.*[\"']|secret.*=.*[\"']", path="<changed-files>")` on added lines
 
- echo "Inline checks: any=$any_count, deleted_tests=$deleted_tests, files=$files_changed, security_issues=$security_issues"
- ```
+ Count results from the Grep tool output to get `any_count` and `security_issues`.
 
  **After inline checks, skip to the output template** (the sub-agent section below is not executed).
 
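
Where a shell pipeline is still preferred over the Grep tool, the portability problem can be sidestepped by staying inside POSIX extended regular expressions: `\s` is a GNU extension, so `[[:space:]]` is used instead, and because POSIX bracket expressions treat a backslash literally, a literal `]` is listed first in the brackets. A minimal sketch that reads a unified diff on stdin and counts matching added lines; `count_added_matches` and `TYPE_ANY_RE` are hypothetical names, not part of sequant.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of sequant.
# count_added_matches PATTERN < diff
# Counts the added lines of a unified diff that match a POSIX ERE.
count_added_matches() {
  pattern="$1"
  # 1. keep added lines ("+..."), 2. drop "+++ b/file" headers,
  # 3. count matches. grep -c prints 0 but exits 1 when nothing matches,
  #    so "|| true" keeps the pipeline from failing under `set -e`.
  grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -cE "$pattern" || true
}

# POSIX-only spelling of the type-safety pattern: [[:space:]] instead of \s,
# and "]" listed first so the bracket expression needs no backslash.
TYPE_ANY_RE=':[[:space:]]*any[],;)]|as any'

# Usage inside a repository:
#   git diff origin/main...HEAD | count_added_matches "$TYPE_ANY_RE"
```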
@@ -838,6 +851,12 @@ issue_type="${SEQUANT_ISSUE_TYPE:-}"
  admin_modified=$(git diff main...HEAD --name-only | grep -E "^app/admin/" | head -1 || true)
  ```
 
+ **Add skill sync check if skill files modified:**
+ ```bash
+ skill_modified=$(git diff main...HEAD --name-only | grep -E "^(\.claude/skills|skills|templates/skills)/" | head -1 || true)
+ ```
+ If skill files are modified, the quality-checks.sh script automatically runs the three-directory sync check (section 11.5). If divergence is detected, this blocks `READY_FOR_MERGE` — verdict becomes `AC_MET_BUT_NOT_A_PLUS` with a note to run `npx tsx scripts/check-skill-sync.ts --fix`.
+
  See [quality-gates.md](references/quality-gates.md) for detailed verdict synthesis.
 
  ### Using MCP Tools (Optional)
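
The skill-file filter has to anchor all three roots at the start of the path, and only `.claude` carries a leading dot, so the escaped dot belongs inside the first alternative rather than in front of the group. A minimal sketch of such a filter, plus a helper that strips the root so the same file can be looked up in the other two directories; `SKILL_PATH_RE`, `filter_skill_paths` and `skill_rel_path` are hypothetical names, not sequant code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not sequant code.

# Matches .claude/skills/, skills/ and templates/skills/ at the start of a path.
SKILL_PATH_RE='^(\.claude/skills|skills|templates/skills)/'

# Read paths on stdin (e.g. from `git diff --name-only`) and print only
# those under one of the three skill roots. "|| true" keeps an empty
# result from failing under `set -e`.
filter_skill_paths() {
  grep -E "$SKILL_PATH_RE" || true
}

# Print a skill path relative to its root; return 1 for non-skill paths.
skill_rel_path() {
  case "$1" in
    .claude/skills/*)   printf '%s\n' "${1#.claude/skills/}" ;;
    templates/skills/*) printf '%s\n' "${1#templates/skills/}" ;;
    skills/*)           printf '%s\n' "${1#skills/}" ;;
    *) return 1 ;;
  esac
}
```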
@@ -1359,39 +1378,20 @@ Before any READY_FOR_MERGE verdict, complete the adversarial thinking checklist:
 
  See [testing-requirements.md](references/testing-requirements.md) for edge case checklists.
 
- ### 5. Adversarial Self-Evaluation (REQUIRED)
+ ### 5. Risk Assessment (REQUIRED unless SMALL_DIFF)
 
- **Before issuing your verdict**, you MUST complete this adversarial self-evaluation to catch issues that automated quality checks miss.
-
- **Why this matters:** QA automation catches type issues, deleted tests, and scope creep - but misses:
- - Features that don't actually work as expected
- - Tests that pass but don't test the right things
- - Edge cases only apparent when actually using the feature
-
- **Answer these questions honestly:**
- 1. "Did the implementation actually work when I reviewed it, or am I assuming it works?"
- 2. "Do the tests actually test the feature's primary purpose, or just pass?"
- 3. "What's the most likely way this feature could break in production?"
- 4. "Am I giving a positive verdict because the code looks clean, or because I verified it works?"
- 5. "Are there 'design choices' I'm excusing that are actually bad practices?" (e.g., no version pinning, leaking secrets to unnecessary env vars, non-portable shell in example code, no input validation). Would I accept this in a code review from a junior developer?
+ **Before issuing your verdict**, state the implementation risks in 2-3 sentences.
 
  **Include this section in your output:**
 
  ```markdown
- ### Self-Evaluation
+ ### Risk Assessment
 
- - **Verified working:** [Yes/No - did you actually verify the feature works, or assume it does?]
- - **Test efficacy:** [High/Medium/Low - do tests catch the feature breaking?]
- - **Likely failure mode:** [What would most likely break this in production?]
- - **Verdict confidence:** [High/Medium/Low - explain any uncertainty]
+ - **Likely failure mode:** [How would this break in production? Be specific.]
+ - **Not tested:** [What gaps exist in test coverage for these changes?]
  ```
 
- **If any answer reveals concerns:**
- - Factor the concerns into your verdict
- - If significant, change verdict to `AC_NOT_MET` or `AC_MET_BUT_NOT_A_PLUS`
- - Document the concerns in the QA comment
-
- **Do NOT skip this self-evaluation.** Honest reflection catches issues that code review cannot.
+ **If either field reveals significant concerns**, factor them into your verdict. A serious failure mode with no test coverage should downgrade to `AC_MET_BUT_NOT_A_PLUS` or `AC_NOT_MET`.
 
  #### Skill Change Review (Conditional)
 
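
The new Risk Assessment section has a fixed shape: a `### Risk Assessment` heading followed by a **Likely failure mode** field and a **Not tested** field. A small sketch that checks a drafted QA comment for that shape before it is posted; `has_risk_assessment` is a hypothetical helper, and the check is purely textual, so it cannot judge whether the answers are specific enough.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not sequant code. The labels come from the template.
# has_risk_assessment < comment.md
# Succeeds only if the heading and both required fields are present.
has_risk_assessment() {
  local comment
  comment=$(cat)
  printf '%s\n' "$comment" | grep -q '^### Risk Assessment' &&
    printf '%s\n' "$comment" | grep -qF -- '- **Likely failure mode:**' &&
    printf '%s\n' "$comment" | grep -qF -- '- **Not tested:**'
}
```

A comment that still uses the old `### Self-Evaluation` layout fails the check, since neither the heading nor the **Not tested** field is present.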
@@ -1402,7 +1402,7 @@ See [testing-requirements.md](references/testing-requirements.md) for edge case
  skills_changed=$(git diff main...HEAD --name-only | grep -E "^\.claude/skills/.*\.md$" | wc -l | xargs || true)
  ```
 
- **If skills_changed > 0, add these adversarial prompts:**
+ **If skills_changed > 0, add these verification prompts:**
 
  | Prompt | Why It Matters |
  |--------|----------------|
@@ -1985,14 +1985,14 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
  - [ ] **Code Review Findings** - Strengths, issues, suggestions
  - [ ] **Test Coverage Analysis** - Changed files with/without tests, critical paths flagged
  - [ ] **Anti-Pattern Detection** - Code patterns check (lightweight)
- - [ ] **Self-Evaluation Completed** - Adversarial self-evaluation section included
+ - [ ] **Risk Assessment** - Likely failure mode and coverage gaps stated
  - [ ] **Verdict** - One of: READY_FOR_MERGE, AC_MET_BUT_NOT_A_PLUS, NEEDS_VERIFICATION, AC_NOT_MET
  - [ ] **Documentation Check** - README/docs updated if feature adds new functionality
  - [ ] **Next Steps** - Clear, actionable recommendations
 
  ### Standard QA (Implementation Exists, `SMALL_DIFF=false`)
 
- - [ ] **Self-Evaluation Completed** - Adversarial self-evaluation section included in output
+ - [ ] **Risk Assessment** - Likely failure mode and coverage gaps stated in output
  - [ ] **AC Coverage** - Each AC item marked as MET, PARTIALLY_MET, NOT_MET, PENDING, or N/A
  - [ ] **Quality Plan Verification** - Included if quality plan exists (or marked N/A if no quality plan)
  - [ ] **CI Status** - Included if PR exists (or marked "No PR" / "No CI configured")
@@ -2008,7 +2008,7 @@ When the size gate determined `SMALL_DIFF=true`, use the **simplified output tem
  - [ ] **Execution Evidence** - Included if scripts/CLI modified (or marked N/A)
  - [ ] **Script Verification Override** - Included if scripts/CLI modified AND /verify was skipped (with justification and risk assessment)
  - [ ] **Skill Command Verification** - Included if `.claude/skills/**/*.md` modified (or marked N/A)
- - [ ] **Skill Change Review** - Skill-specific adversarial prompts included if skills changed
+ - [ ] **Skill Change Review** - Skill-specific verification prompts included if skills changed
  - [ ] **Smoke Test** - Included if workflow-affecting changes (skills, scripts, CLI), or marked "Not Required"
  - [ ] **CHANGELOG Verification** - User-facing changes have `[Unreleased]` entry (or marked N/A)
  - [ ] **Documentation Check** - README/docs updated if feature adds new functionality
@@ -2097,12 +2097,10 @@ When the size gate triggers simple fix mode, use this shorter template:
 
  ---
 
- ### Self-Evaluation
+ ### Risk Assessment
 
- - **Verified working:** [Yes/No]
- - **Test efficacy:** [High/Medium/Low]
- - **Likely failure mode:** [description]
- - **Verdict confidence:** [High/Medium/Low]
+ - **Likely failure mode:** [How would this break in production?]
+ - **Not tested:** [What gaps exist in test coverage?]
 
  ---
 
@@ -2387,12 +2385,10 @@ You MUST include these sections:
 
  ---
 
- ### Self-Evaluation
+ ### Risk Assessment
 
- - **Verified working:** [Yes/No - did you actually verify the feature works?]
- - **Test efficacy:** [High/Medium/Low - do tests catch the feature breaking?]
- - **Likely failure mode:** [What would most likely break this in production?]
- - **Verdict confidence:** [High/Medium/Low - explain any uncertainty]
+ - **Likely failure mode:** [How would this break in production? Be specific.]
+ - **Not tested:** [What gaps exist in test coverage for these changes?]
 
  ---
 

package/templates/skills/qa/scripts/quality-checks.sh
@@ -385,7 +385,53 @@ else
  fi
 
  # =============================================================================
- # 11. Build Verification (cacheable - expensive operation)
+ # =============================================================================
+ # 11.5. Skill Sync Check (when skill files modified)
+ # =============================================================================
+ echo ""
+ skill_files_changed=$(git diff main...HEAD --name-only | grep -E '^(\.claude/skills|skills|templates/skills)/' || true)
+ if [[ -n "$skill_files_changed" ]]; then
+   echo "🔍 Checking three-directory skill sync..."
+   if [[ -f "scripts/check-skill-sync.ts" ]]; then
+     sync_exit=0
+     sync_output=$(npx tsx scripts/check-skill-sync.ts 2>&1) || sync_exit=$?
+     sync_summary=$(echo "$sync_output" | grep "^Summary:" || true)
+     if [[ $sync_exit -ne 0 ]]; then
+       echo "⚠️ Skill sync: DIVERGENCE DETECTED"
+       echo "$sync_summary"
+       echo " Run: npx tsx scripts/check-skill-sync.ts --fix"
+     else
+       echo "✅ Skill sync: All files synced across 3 directories"
+     fi
+   else
+     echo " (scripts/check-skill-sync.ts not found — using inline diff)"
+     diverged=0
+     for f in $skill_files_changed; do
+       if [[ "$f" == .claude/skills/* ]]; then
+         rel="${f#.claude/skills/}"
+         for mirror in "templates/skills" "skills"; do
+           if [[ -f "${mirror}/${rel}" ]]; then
+             if ! diff -q ".claude/skills/${rel}" "${mirror}/${rel}" > /dev/null 2>&1; then
+               echo " ⚠️ DIVERGED: ${rel} (.claude/skills vs ${mirror})"
+               diverged=$((diverged + 1))
+             fi
+           fi
+         done
+       fi
+     done
+     if [[ $diverged -eq 0 ]]; then
+       echo "✅ Skill sync: Changed skill files are synced"
+     else
+       echo "⚠️ Skill sync: ${diverged} file(s) diverged"
+       echo " Fix: copy from .claude/skills/ to templates/skills/ and skills/"
+     fi
+   fi
+ else
+   echo "🔍 Skill sync: No skill files changed (skipped)"
+ fi
+
+ # =============================================================================
+ # 12. Build Verification (cacheable - expensive operation)
  # =============================================================================
 
  verify_build_against_main() {
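
A script that branches on a checker's exit status has to read that status before anything masks it. In bash, `out=$(cmd || true)` followed by `status=$?` always yields 0, because the assignment takes the status of the substitution, and `cmd || true` always succeeds. A minimal sketch of the capture pattern; `run_and_capture` and `masked_status` are hypothetical helpers, not part of sequant or quality-checks.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of sequant.

# Run a command, keep its combined stdout/stderr in CAPTURED_OUTPUT and its
# exit status in CAPTURED_STATUS. The `||` list suppresses errexit, so a
# failing command does not abort a `set -e` script.
run_and_capture() {
  CAPTURED_STATUS=0
  CAPTURED_OUTPUT=$("$@" 2>&1) || CAPTURED_STATUS=$?
}

# Anti-pattern for comparison: `|| true` runs inside the substitution,
# so the status read afterwards is always 0.
masked_status() {
  local out status
  out=$("$@" 2>&1 || true)
  status=$?
  printf '%s\n' "$status"
}
```

With these definitions, `run_and_capture sh -c 'echo hi; exit 3'` leaves `hi` in `CAPTURED_OUTPUT` and `3` in `CAPTURED_STATUS`, while `masked_status false` prints `0`.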