sequant 2.7.0 → 2.8.0

Files changed (59)
  1. package/.claude-plugin/marketplace.json +1 -1
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/README.md +9 -1
  4. package/dist/bin/cli.d.ts +1 -1
  5. package/dist/bin/cli.js +10 -1
  6. package/dist/bin/preflight.d.ts +21 -0
  7. package/dist/bin/preflight.js +45 -0
  8. package/dist/marketplace/external_plugins/sequant/.claude-plugin/plugin.json +1 -1
  9. package/dist/marketplace/external_plugins/sequant/skills/_shared/references/force-push.md +34 -0
  10. package/dist/marketplace/external_plugins/sequant/skills/assess/SKILL.md +24 -7
  11. package/dist/marketplace/external_plugins/sequant/skills/exec/SKILL.md +29 -0
  12. package/dist/marketplace/external_plugins/sequant/skills/loop/SKILL.md +100 -2
  13. package/dist/marketplace/external_plugins/sequant/skills/qa/SKILL.md +24 -0
  14. package/dist/marketplace/external_plugins/sequant/skills/qa/references/anti-pattern-detection.md +285 -0
  15. package/dist/marketplace/external_plugins/sequant/skills/qa/references/call-site-review.md +202 -0
  16. package/dist/marketplace/external_plugins/sequant/skills/qa/references/quality-gates.md +287 -0
  17. package/dist/marketplace/external_plugins/sequant/skills/qa/references/test-quality-checklist.md +272 -0
  18. package/dist/marketplace/external_plugins/sequant/skills/qa/references/testing-requirements.md +40 -0
  19. package/dist/marketplace/external_plugins/sequant/skills/qa/scripts/quality-checks.sh +95 -11
  20. package/dist/marketplace/external_plugins/sequant/skills/references/shared/framework-gotchas.md +186 -0
  21. package/dist/marketplace/external_plugins/sequant/skills/release/SKILL.md +661 -0
  22. package/dist/marketplace/external_plugins/sequant/skills/test/references/browser-testing-patterns.md +423 -0
  23. package/dist/marketplace/external_plugins/sequant/skills/upstream/SKILL.md +419 -0
  24. package/dist/src/lib/errors.d.ts +85 -0
  25. package/dist/src/lib/errors.js +111 -0
  26. package/dist/src/lib/version-check.d.ts +19 -0
  27. package/dist/src/lib/version-check.js +44 -0
  28. package/dist/src/lib/workflow/batch-executor.js +61 -6
  29. package/dist/src/lib/workflow/drivers/agent-driver.d.ts +17 -0
  30. package/dist/src/lib/workflow/drivers/claude-code.d.ts +22 -0
  31. package/dist/src/lib/workflow/drivers/claude-code.js +111 -7
  32. package/dist/src/lib/workflow/log-writer.d.ts +1 -1
  33. package/dist/src/lib/workflow/phase-executor.d.ts +18 -0
  34. package/dist/src/lib/workflow/phase-executor.js +76 -14
  35. package/dist/src/lib/workflow/run-log-schema.d.ts +3 -0
  36. package/dist/src/lib/workflow/run-log-schema.js +7 -0
  37. package/dist/src/lib/workflow/state-manager.d.ts +1 -0
  38. package/dist/src/lib/workflow/state-manager.js +6 -0
  39. package/dist/src/lib/workflow/state-schema.d.ts +3 -0
  40. package/dist/src/lib/workflow/state-schema.js +7 -0
  41. package/dist/src/lib/workflow/types.d.ts +17 -0
  42. package/dist/src/ui/tui/theme.d.ts +18 -4
  43. package/dist/src/ui/tui/theme.js +18 -4
  44. package/package.json +4 -3
  45. package/templates/skills/_shared/references/force-push.md +34 -0
  46. package/templates/skills/assess/SKILL.md +24 -7
  47. package/templates/skills/exec/SKILL.md +29 -0
  48. package/templates/skills/loop/SKILL.md +100 -2
  49. package/templates/skills/qa/SKILL.md +24 -0
  50. package/templates/skills/qa/references/anti-pattern-detection.md +285 -0
  51. package/templates/skills/qa/references/call-site-review.md +202 -0
  52. package/templates/skills/qa/references/quality-gates.md +287 -0
  53. package/templates/skills/qa/references/test-quality-checklist.md +272 -0
  54. package/templates/skills/qa/references/testing-requirements.md +40 -0
  55. package/templates/skills/qa/scripts/quality-checks.sh +95 -11
  56. package/templates/skills/references/shared/framework-gotchas.md +186 -0
  57. package/templates/skills/release/SKILL.md +661 -0
  58. package/templates/skills/test/references/browser-testing-patterns.md +423 -0
  59. package/templates/skills/upstream/SKILL.md +419 -0
@@ -10,14 +10,67 @@ Combine agent outputs into a unified quality assessment:
  | Scope/Size Checker | Files changed, LOC, assessment | Medium - warning if very large |
  | Security Scanner | Critical/warning/info counts | High - blocking if criticals > 0 |
  | Semgrep Static Analysis | Critical/warning findings | High - blocking if criticals > 0 |
+ | Test Tautology Detector | Tautological test count, percentage | High - blocking if >50% tautological |
  | RLS Checker (conditional) | Violations found | High - blocking if violations |
 
  **Synthesis Rules:**
  - **Any FAIL verdict** → Flag as blocker in manual review
  - **Security criticals (including Semgrep)** → Block merge, require fix before proceeding
+ - **Build regression detected** → Block merge, require fix before proceeding
+ - **Test tautology >50%** → Block merge, tests provide no regression protection
  - **All PASS** → Proceed with confidence to manual review
  - **WARN verdicts** → Note in review, verify manually
 
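The synthesis rules can be read as a small decision function over the agent outputs. The TypeScript below is a minimal sketch, not sequant's implementation: the `AgentResult` shape and its field names (`securityCriticals`, `buildRegression`, `tautologyPercent`) are hypothetical and chosen only to mirror the table.

```typescript
// Hypothetical per-agent result; sequant's real types may differ.
type AgentVerdict = "PASS" | "WARN" | "FAIL";

interface AgentResult {
  agent: string;
  verdict: AgentVerdict;
  securityCriticals?: number; // Security Scanner or Semgrep critical count
  buildRegression?: boolean; // outcome of build verification
  tautologyPercent?: number; // 0-100, from the tautology detector
}

interface Synthesis {
  blockMerge: boolean; // must be fixed before merge
  blockers: string[]; // flagged as blockers in manual review
  notes: string[]; // WARN verdicts to verify manually
}

function synthesize(results: AgentResult[]): Synthesis {
  const out: Synthesis = { blockMerge: false, blockers: [], notes: [] };
  for (const r of results) {
    if (r.verdict === "FAIL") out.blockers.push(`${r.agent}: FAIL`);
    if (r.verdict === "WARN") out.notes.push(`${r.agent}: WARN, verify manually`);
    const criticals = r.securityCriticals ?? 0;
    if (criticals > 0) {
      out.blockMerge = true;
      out.blockers.push(`${r.agent}: ${criticals} security critical(s)`);
    }
    if (r.buildRegression === true) {
      out.blockMerge = true;
      out.blockers.push(`${r.agent}: build regression`);
    }
    const tautology = r.tautologyPercent ?? 0;
    if (tautology > 50) {
      out.blockMerge = true;
      out.blockers.push(`${r.agent}: ${tautology}% tautological tests`);
    }
  }
  return out;
}
```

In this sketch a plain FAIL verdict is collected as a blocker for manual review without setting `blockMerge`, because the rules reserve an outright merge block for security criticals, build regressions, and tautology above 50%.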
+ ## Build Verification
+
+ When `npm run build` fails on the feature branch, QA must verify whether the failure is a regression (new) or pre-existing (already on main).
+
+ ### Verification Logic
+
+ | Feature Build | Main Build | Error Match | Classification |
+ |---------------|------------|-------------|----------------|
+ | ❌ Fail | ✅ Pass | N/A | **Regression** - failure introduced by PR |
+ | ❌ Fail | ❌ Fail | Same error | **Pre-existing** - not blocking |
+ | ❌ Fail | ❌ Fail | Different | **Unknown** - requires manual review |
+ | ✅ Pass | * | N/A | N/A - no verification needed |
+
+ ### Verdict Mapping
+
+ | Build Verification Result | QA Verdict Impact |
+ |---------------------------|-------------------|
+ | **Regression detected** | **BLOCKING** - `AC_NOT_MET`, must fix before merge |
+ | **Pre-existing failure** | Non-blocking - document and proceed |
+ | **Unknown (different errors)** | `AC_MET_BUT_NOT_A_PLUS` - manual review recommended |
+ | **Build passes** | No impact |
+
+ ### Output Format
+
+ ```markdown
+ ### Build Verification
+
+ | Check | Status |
+ |-------|--------|
+ | Feature branch build | ❌ Failed |
+ | Main branch build | ❌ Failed |
+ | Error match | ✅ Same error |
+ | Regression | **No** (pre-existing) |
+
+ **Note:** Build failure is pre-existing on main branch. Not blocking this PR.
+ ```
+
+ ### Implementation
+
+ The `quality-checks.sh` script includes:
+ - `run_build_with_verification()` - runs build and triggers verification on failure
+ - `verify_build_against_main()` - compares build results against main branch
+
+ **How it works:**
+ 1. Run `npm run build` on feature branch
+ 2. If build fails, capture exit code and error output
+ 3. Run build on main branch (via main repo directory, not checkout)
+ 4. Compare exit codes and first error line
+ 5. Output Build Verification table with classification
+
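The classification table and the "How it works" steps reduce to a comparison of two captured build results. The sketch below assumes each build has already been run and captured as an exit code plus combined output; `BuildResult`, `classifyBuild`, and the rule that the first line containing "error" serves as the error signature are hypothetical stand-ins for whatever `verify_build_against_main()` does in shell.

```typescript
// Hypothetical capture of one `npm run build` run.
interface BuildResult {
  exitCode: number;
  output: string; // combined stdout and stderr
}

type BuildClassification = "pass" | "regression" | "pre-existing" | "unknown";

// Assumption: the first line mentioning "error" is a good enough signature.
function firstErrorLine(output: string): string {
  const line = output.split("\n").find((l) => /error/i.test(l));
  return line === undefined ? "" : line.trim();
}

function classifyBuild(feature: BuildResult, main: BuildResult): BuildClassification {
  if (feature.exitCode === 0) return "pass"; // no verification needed
  if (main.exitCode === 0) return "regression"; // fails only with the PR applied
  const featureError = firstErrorLine(feature.output);
  const mainError = firstErrorLine(main.output);
  // Same exit code and the same non-empty first error line: already broken on main.
  const same =
    feature.exitCode === main.exitCode && featureError !== "" && featureError === mainError;
  return same ? "pre-existing" : "unknown";
}

// Verdict impact per the Verdict Mapping table.
const BUILD_VERDICT_IMPACT: Record<BuildClassification, string> = {
  regression: "BLOCKING - AC_NOT_MET",
  "pre-existing": "Non-blocking - document and proceed",
  unknown: "AC_MET_BUT_NOT_A_PLUS - manual review recommended",
  pass: "No impact",
};
```

Treating two failures with no recognizable error line as `unknown` rather than `pre-existing` is a deliberately cautious choice in this sketch: without a signature to compare, the two failures cannot be shown to be the same.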
  ## Semgrep Integration
 
  Semgrep provides static analysis for security vulnerabilities and anti-patterns.
@@ -59,6 +112,71 @@ Semgrep uses stack-specific rulesets for targeted analysis:
 
  Projects can add custom rules in `.sequant/semgrep-rules.yaml`. These are loaded alongside stack rules automatically.
 
+ ## Test Tautology Detection
+
+ Tautological tests are tests that pass but don't call any production code. They provide zero regression protection as they only assert on local values.
+
+ ### Detection Logic
+
+ A test block is flagged as tautological if:
+ 1. It's an `it()` or `test()` block
+ 2. It contains zero calls to functions imported from source modules
+ 3. Source modules are relative imports (`./`, `../`) excluding mocks/fixtures/test libraries
+
+ ### Verdict Mapping
+
+ | Tautology Result | QA Verdict Impact |
+ |------------------|-------------------|
+ | >50% of test blocks tautological | **BLOCKING** - `AC_NOT_MET` |
+ | 1-50% of test blocks tautological | Warning - `AC_MET_BUT_NOT_A_PLUS` |
+ | 0% tautological | No impact |
+ | No test blocks in diff | No impact (skipped) |
+
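The verdict thresholds translate directly into code. In this minimal sketch `tautologyImpact` and `tautologyRow` are hypothetical helpers; reading the "1-50%" band as "at least one tautological block, up to and including 50%" is an assumption about how fractional percentages are treated.

```typescript
type TautologyImpact = "BLOCKING" | "WARNING" | "NO_IMPACT" | "SKIPPED";

// Maps detector counts onto the Verdict Mapping table.
function tautologyImpact(tautological: number, total: number): TautologyImpact {
  if (total === 0) return "SKIPPED"; // no test blocks in diff
  if ((tautological / total) * 100 > 50) return "BLOCKING"; // AC_NOT_MET
  if (tautological > 0) return "WARNING"; // AC_MET_BUT_NOT_A_PLUS
  return "NO_IMPACT";
}

// Renders the Tautology Check row in the style of the Output Format example.
function tautologyRow(tautological: number, total: number): string {
  const impact = tautologyImpact(tautological, total);
  if (impact === "SKIPPED") return "| Tautology Check | SKIPPED | No test blocks in diff |";
  if (impact === "NO_IMPACT") return "| Tautology Check | ✅ OK | All tests call production code |";
  const status = impact === "BLOCKING" ? "❌ FAIL" : "⚠️ WARN";
  const percent = Math.round((tautological / total) * 100);
  return `| Tautology Check | ${status} | ${tautological} tautological test blocks found (${percent}%) |`;
}
```

With this reading, exactly 50% is still a warning; only a strict majority of tautological blocks blocks the merge.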
+ ### Output Format
+
+ ```markdown
+ ### Test Quality Review
+
+ | Category | Status | Notes |
+ |----------|--------|-------|
+ | Tautology Check | ⚠️ WARN | 2 tautological test blocks found (25%) |
+
+ **Tautological Tests Found:**
+ - `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+ - `src/lib/bar.test.ts:12` - `test("validates")` - No production function calls
+ ```
+
+ ### Example - Tautological vs Real Test
+
+ ```typescript
+ // TAUTOLOGICAL — flags as warning/blocker
+ import { executePhaseWithRetry } from "./run.js";
+ it("should retry", () => {
+   const retry = true;
+   expect(retry).toBe(true); // No production function called!
+ });
+
+ // REAL — this is fine
+ import { executePhaseWithRetry } from "./run.js";
+ it("should retry", async () => {
+   const result = await executePhaseWithRetry(123, "exec", config, ...);
+   expect(result.success).toBe(true); // Calls production function
+ });
+ ```
+
+ ### Implementation
+
+ The `quality-checks.sh` script includes:
+ - Tautology detector CLI: `scripts/qa/tautology-detector-cli.ts`
+ - Detection library: `src/lib/test-tautology-detector.ts`
+
+ **How it works:**
+ 1. Get test files from `git diff main...HEAD`
+ 2. For each test file, extract imports and test blocks
+ 3. Check if any imported production function is called within each test block
+ 4. Report tautological tests with file:line references
+ 5. Block if >50% of test blocks are tautological
+
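Steps 2 and 3 can be approximated with regular expressions. This is a rough, hypothetical sketch and not the contents of `src/lib/test-tautology-detector.ts`: it works on one file's source text, treats only relative, non-mock imports as production code, ignores namespace imports, and assumes block-bodied test callbacks. A real detector would more likely walk a TypeScript syntax tree.

```typescript
// Rough, regex-based sketch of the tautology heuristic (not an AST parser).
interface TautologyReport {
  total: number; // it()/test() blocks found
  findings: { line: number; name: string }[]; // blocks with no production call
}

const RELATIVE = /^\.\.?\//; // "./" or "../"
const EXCLUDED = /mock|fixture|stub/i; // mock/fixture/stub modules

function escapeRe(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Local names bound by relative, non-mock imports. Bare specifiers such as
// "vitest" or "@testing-library/react" are skipped because they are not relative.
// Namespace imports (`import * as ns`) are ignored in this sketch.
function productionImports(source: string): string[] {
  const names: string[] = [];
  const importRe = /import\s+([^'"]*?)\s+from\s+["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = importRe.exec(source)) !== null) {
    const clause = m[1];
    const path = m[2];
    if (!RELATIVE.test(path) || EXCLUDED.test(path)) continue;
    const def = clause.match(/^([A-Za-z_$][\w$]*)/); // default import
    if (def && def[1] !== "type") names.push(def[1]);
    const named = clause.match(/\{([^}]*)\}/); // `{ a, b as c }` binds a and c
    if (named) {
      for (const part of named[1].split(",")) {
        const local = part.trim().split(/\s+as\s+/).pop();
        if (local) names.push(local.trim());
      }
    }
  }
  return names;
}

// Index just past the brace matching the one at `open` (ignores strings/comments).
function closingBrace(src: string, open: number): number {
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    if (src[i] === "{") depth++;
    else if (src[i] === "}" && --depth === 0) return i + 1;
  }
  return src.length;
}

// Assumes block-bodied callbacks: it("name", () => { ... }).
function detectTautologies(source: string): TautologyReport {
  const imported = productionImports(source);
  const report: TautologyReport = { total: 0, findings: [] };
  const testRe = /\b(?:it|test)\s*\(\s*(["'`])(.*?)\1/g;
  let m: RegExpExecArray | null;
  while ((m = testRe.exec(source)) !== null) {
    const open = source.indexOf("{", m.index + m[0].length);
    if (open === -1) continue;
    report.total++;
    const body = source.slice(open, closingBrace(source, open));
    // A call is `name(` not preceded by an identifier character or a dot.
    const callsProduction = imported.some((name) =>
      new RegExp(`(?<![\\w$.])${escapeRe(name)}\\s*\\(`).test(body),
    );
    if (!callsProduction) {
      const line = source.slice(0, m.index).split("\n").length;
      report.findings.push({ line, name: m[2] });
    }
  }
  return report;
}
```

Dividing the number of findings by `total`, summed over every test file in the diff, gives the percentage that drives the >50% blocking rule.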
  ## Verdict Criteria
 
  ### `READY_FOR_MERGE`
@@ -73,10 +191,12 @@ Must meet ALL of:
  - ✅ Reversibility Test: Clean revert possible
  - ✅ **Adversarial Test: Failure path tested**
  - ✅ **Edge Case Test: At least 1 edge case per AC tested**
+ - ✅ **Execution Evidence: Complete or waived** (see below)
 
  ### `AC_MET_BUT_NOT_A_PLUS`
 
  AC met, but one or more issues:
+
  - ⚠️ Minor scope creep (1-2 extra files)
  - ⚠️ Over-engineering (abstraction not required)
  - ⚠️ Size larger than expected but justified
@@ -99,6 +219,7 @@ All AC items are `MET`, but one or more items have `PENDING` status requiring ex
  ### `AC_NOT_MET`
 
  Any of:
+
  - ❌ One or more AC items `NOT_MET` or `PARTIALLY_MET`
  - ❌ Deleted tests without justification
  - ❌ Major scope creep (many unrelated files)
@@ -139,6 +260,65 @@ Any of:
 
  **Important:** `PARTIALLY_MET` is NOT sufficient for merge. It must be treated as `NOT_MET` for verdict purposes.
 
+ ## CI Status Impact on Verdict
+
+ **Purpose:** CI status directly affects verdict when AC items depend on CI (e.g., "Tests pass in CI").
+
+ ### CI Status Mapping
+
+ | State | Bucket | AC Status | Verdict Impact |
+ |-------|--------|-----------|----------------|
+ | `SUCCESS` | `pass` | `MET` | No impact |
+ | `FAILURE` | `fail` | `NOT_MET` | Blocks merge |
+ | `CANCELLED` | `fail` | `NOT_MET` | Blocks merge |
+ | `SKIPPED` | `pass` | `N/A` | No impact |
+ | `PENDING` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+ | `QUEUED` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+ | `IN_PROGRESS` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+ | (no checks) | - | `N/A` | No CI configured |
+
+ ### CI-Related AC Detection
+
+ Identify AC items that depend on CI by matching patterns:
+ - "Tests pass in CI"
+ - "CI passes"
+ - "Build succeeds in CI"
+ - "GitHub Actions pass"
+ - "Pipeline passes"
+ - "Workflow passes"
+ - "Checks pass"
+ - "Actions succeed"
+ - "CI/CD passes"
+
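Matching can be as simple as testing each AC description against the listed phrases. This sketch assumes case-insensitive substring matching; `CI_AC_PATTERNS` and `isCiRelatedAc` are hypothetical names. Broad phrases such as "Checks pass" will also match AC text like "Type checks pass", so a match is a hint for the reviewer rather than proof that the AC depends on CI.

```typescript
// Phrases from the list above; matching is case-insensitive and substring-based.
const CI_AC_PATTERNS: RegExp[] = [
  /tests pass in ci/i,
  /\bci passes/i,
  /build succeeds in ci/i,
  /github actions pass/i,
  /pipeline passes/i,
  /workflow passes/i,
  /checks pass/i,
  /actions succeed/i,
  /ci\/cd passes/i,
];

// True when an AC description looks like it depends on CI results.
function isCiRelatedAc(acText: string): boolean {
  return CI_AC_PATTERNS.some((re) => re.test(acText));
}
```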
+ ### Error Handling
+
+ If `gh pr checks` fails:
+ - **`gh` not installed** → Skip CI section: "CI status unavailable (gh CLI not found)"
+ - **`gh` not authenticated** → Skip CI section: "CI status unavailable (gh auth required)"
+ - **Network/auth error** → Treat as N/A: "CI status unavailable"
+ - **No PR exists** → Skip CI check entirely
+ - **Empty response** → No CI configured (not an error)
+
+ **Portability:** CI detection requires GitHub (`gh` CLI). GitLab, Bitbucket, Azure DevOps not supported.
+
+ ### CI Verdict Rules
+
+ 1. **CI failure → AC_NOT_MET:** Any failed CI check that maps to an AC item means that AC is NOT_MET
+ 2. **CI pending → NEEDS_VERIFICATION:** If CI is still running for a CI-related AC, verdict is NEEDS_VERIFICATION
+ 3. **No CI configured → N/A:** Mark CI-related AC items as N/A, don't block on missing CI
+ 4. **CI success → MET:** CI-related AC items are MET when all relevant checks pass
+
+ **Example Scenario:**
+
+ ```markdown
+ AC-1: "Feature implemented" → MET (code review)
+ AC-2: "Tests pass locally" → MET (npm test passed)
+ AC-3: "Tests pass in CI" → PENDING (CI in progress)
+ AC-4: "Docs updated" → MET (README updated)
+
+ Verdict: NEEDS_VERIFICATION (due to AC-3 PENDING)
+ ```
+
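The mapping table and the four verdict rules combine into two small functions. This is a hedged sketch: it assumes check states have already been fetched (for example via `gh pr checks`) and normalized to the uppercase names in the table. `ciAcStatus`, `ciVerdict`, and the `NO_CI_IMPACT` result (meaning CI does not force the verdict) are hypothetical. Applied to the scenario above, AC-3 with an in-progress check yields `PENDING`, and the overall result is `NEEDS_VERIFICATION`.

```typescript
// Check states from the CI Status Mapping table.
type CheckState =
  | "SUCCESS"
  | "FAILURE"
  | "CANCELLED"
  | "SKIPPED"
  | "PENDING"
  | "QUEUED"
  | "IN_PROGRESS";

type AcStatus = "MET" | "NOT_MET" | "PENDING" | "N/A";

const STATE_TO_AC: Record<CheckState, AcStatus> = {
  SUCCESS: "MET",
  FAILURE: "NOT_MET",
  CANCELLED: "NOT_MET",
  SKIPPED: "N/A",
  PENDING: "PENDING",
  QUEUED: "PENDING",
  IN_PROGRESS: "PENDING",
};

// Status of one CI-related AC item from the states of its relevant checks.
function ciAcStatus(states: CheckState[]): AcStatus {
  if (states.length === 0) return "N/A"; // rule 3: no CI configured, don't block
  const statuses = states.map((s) => STATE_TO_AC[s]);
  if (statuses.some((s) => s === "NOT_MET")) return "NOT_MET"; // rule 1
  if (statuses.some((s) => s === "PENDING")) return "PENDING"; // rule 2
  if (statuses.some((s) => s === "MET")) return "MET"; // rule 4, skipped checks ignored
  return "N/A"; // every relevant check was skipped
}

type CiVerdict = "AC_NOT_MET" | "NEEDS_VERIFICATION" | "NO_CI_IMPACT";

// Verdict contribution across all AC items (CI-related or not).
function ciVerdict(acStatuses: AcStatus[]): CiVerdict {
  if (acStatuses.some((s) => s === "NOT_MET")) return "AC_NOT_MET";
  if (acStatuses.some((s) => s === "PENDING")) return "NEEDS_VERIFICATION";
  return "NO_CI_IMPACT"; // the remaining verdict gates decide
}
```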
  ## Code Review Decision Framework
 
  ### 1. Purpose Test
@@ -177,3 +357,110 @@ Any of:
  - **Unintegrated exports**: ⚠️ Warning only
  - **Security criticals** > 0: ❌ Blocker
  - **Security warnings** > 0: ⚠️ Review each case
+
+ ---
+
+ ## Execution Evidence Requirements
+
+ ### Purpose
+
+ QA must actually execute code for scripts/CLI changes, not just review it. Analysis of 34 run logs shows zero `/loop` phases triggered - QA passes every time without catching runtime issues.
+
+ ### Change Type Detection
+
+ Determine execution requirement based on what files were changed:
+
+ ```bash
+ # Detect change type
+ scripts_changed=$(git diff main...HEAD --name-only | grep -E "^scripts/" | wc -l | xargs)
+ cli_changed=$(git diff main...HEAD --name-only | grep -E "(cli|commands?)" | wc -l | xargs)
+ ui_changed=$(git diff main...HEAD --name-only | grep -E "^(app|components|pages)/" | wc -l | xargs)
+ types_only=$(git diff main...HEAD --name-only | grep -E "\.d\.ts$|^types/" | wc -l | xargs)
+ tests_only=$(git diff main...HEAD --name-only | grep -E "\.test\.|\.spec\.|__tests__" | wc -l | xargs)
+ ```
+
+ ### Execution Matrix
+
+ | Change Type | QA Must Execute | Example Command |
+ |-------------|-----------------|-----------------|
+ | `scripts/` files | ✅ Required | `npx tsx scripts/foo.ts --help` |
+ | CLI commands | ✅ Required | `npx sequant <cmd> --help` or dry-run |
+ | UI components | ⚠️ Via `/test` | Browser testing required |
+ | Types/config only | ❌ Waiver OK | Note: "Types-only change, execution waived" |
+ | Tests only | ✅ Run tests | `npm test -- --grep "feature"` |
+
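The grep patterns and the Execution Matrix can be combined into one classifier over the changed file list (for example, the output of `git diff main...HEAD --name-only`). The TypeScript sketch below reuses the same patterns; the order in which rows are checked, the extra `.md`/`tsconfig.json`/`.eslintrc` waiver patterns (taken from the Waiver Criteria table), and the `unclassified` result for files the matrix does not cover are assumptions. As in the bash version, `(cli|commands?)` also matches paths such as `src/client/`.

```typescript
// Execution requirement for a change set, following the Execution Matrix rows.
type ExecutionRequirement =
  | "required" // scripts/ or CLI changes: run a smoke command
  | "via-test" // UI components: browser testing via /test
  | "run-tests" // tests only: run the test suite
  | "waiver-ok" // types/config/docs only: execution may be waived
  | "unclassified" // not covered by the matrix: use judgment
  | "none"; // nothing changed

const SCRIPTS = /^scripts\//;
const CLI = /(cli|commands?)/; // same pattern as the bash snippet; also matches "client"
const UI = /^(app|components|pages)\//;
const TESTS = /\.test\.|\.spec\.|__tests__/;
const WAIVABLE = /\.d\.ts$|^types\/|\.md$|^tsconfig\.json$|^\.eslintrc/;

function executionRequirement(files: string[]): ExecutionRequirement {
  if (files.length === 0) return "none";
  const some = (re: RegExp) => files.some((f) => re.test(f));
  const every = (re: RegExp) => files.every((f) => re.test(f));
  if (some(SCRIPTS) || some(CLI)) return "required";
  if (some(UI)) return "via-test";
  if (every(TESTS)) return "run-tests";
  if (every(WAIVABLE)) return "waiver-ok";
  return "unclassified";
}
```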
+ ### Evidence Collection
+
+ For each executable change, QA must:
+
+ 1. **Identify a safe smoke command:**
+    - Prefer `--help`, `--dry-run`, or `--version` flags
+    - For scripts: pass minimal safe arguments
+    - Never execute destructive operations
+
+ 2. **Execute and capture:**
+    ```bash
+    # Example for a script
+    npx tsx scripts/analytics.ts --help 2>&1
+    echo "Exit code: $?"
+    ```
+
+ 3. **Record in output:**
+    ```markdown
+    ### Execution Evidence
+
+    | Test Type | Command | Exit Code | Result |
+    |-----------|---------|-----------|--------|
+    | Smoke test | `npx tsx scripts/analytics.ts --help` | 0 | Usage info displayed ✓ |
+    | Dry run | `npx tsx scripts/migrate.ts --dry-run` | 0 | Plan shown, no changes ✓ |
+
+    **Evidence status:** Complete
+    ```
+
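Steps 2 and 3 can also be scripted. This sketch uses Node's `child_process.spawnSync` to run one smoke command and format an Execution Evidence row; `runSmoke` and `toMarkdownRow` are hypothetical helpers, the 60-second timeout is an arbitrary choice, and picking a command that is actually safe (step 1) remains a judgment the code does not make.

```typescript
import { spawnSync } from "node:child_process";

interface EvidenceRow {
  testType: string;
  command: string;
  exitCode: number | null; // null if the process was killed or never started
  result: string;
}

// Runs one smoke command and records the same facts as the bash example:
// combined output and exit code. It does not judge whether a command is safe.
function runSmoke(testType: string, cmd: string, args: string[]): EvidenceRow {
  const res = spawnSync(cmd, args, { encoding: "utf8", timeout: 60_000 });
  const output = `${res.stdout ?? ""}${res.stderr ?? ""}`.trim();
  const firstLine = output.split("\n")[0];
  const ok = res.status === 0;
  return {
    testType,
    command: [cmd, ...args].join(" "),
    exitCode: res.status,
    result: ok ? `${firstLine} ✓` : `Failed: ${res.error ? res.error.message : firstLine}`,
  };
}

// Formats a row for the Execution Evidence table.
function toMarkdownRow(row: EvidenceRow): string {
  const code = row.exitCode === null ? "n/a" : String(row.exitCode);
  return `| ${row.testType} | \`${row.command}\` | ${code} | ${row.result} |`;
}
```

For the script in the example, `toMarkdownRow(runSmoke("Smoke test", "npx", ["tsx", "scripts/analytics.ts", "--help"]))` would produce one table row; the first line of output stands in for the hand-written "Usage info displayed" note.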
+ ### Evidence Status Definitions
+
+ | Status | Definition | Verdict Eligibility |
+ |--------|------------|---------------------|
+ | **Complete** | All required commands executed successfully | `READY_FOR_MERGE` eligible |
+ | **Incomplete** | Some commands not run or failed | `AC_MET_BUT_NOT_A_PLUS` max |
+ | **Waived** | Explicit reason documented | `READY_FOR_MERGE` eligible |
+ | **Not Required** | No executable changes | `READY_FOR_MERGE` eligible |
+
+ ### Waiver Criteria
+
+ Execution can be waived with documented reason:
+
+ | Waiver Reason | Example |
+ |---------------|---------|
+ | Types-only change | "Only `.d.ts` files modified" |
+ | Config-only change | "Only `tsconfig.json` or `.eslintrc` modified" |
+ | Documentation-only | "Only `.md` files modified" |
+ | Test-only change | "Only test files modified, tests run via `npm test`" |
+
+ **Waiver format:**
+ ```markdown
+ ### Execution Evidence
+
+ **Status:** Waived
+ **Reason:** Types-only change - only `.d.ts` files modified
+ ```
+
+ ### Verdict Gating
+
+ | Verdict | Evidence Requirement |
+ |---------|---------------------|
+ | `READY_FOR_MERGE` | Evidence: Complete OR Waived (with reason) OR Not Required |
+ | `AC_MET_BUT_NOT_A_PLUS` | Evidence: Incomplete + note explaining gap |
+ | `AC_NOT_MET` | N/A (AC issues take precedence) |
+
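The Evidence Status Definitions and the Verdict Gating table together act as a cap on the verdict. A minimal sketch, assuming the AC review has already proposed a verdict; `gateVerdict` and `Evidence` are hypothetical, and treating a waiver without a documented reason like incomplete evidence is an assumption drawn from the "with reason" wording.

```typescript
type EvidenceStatus = "Complete" | "Incomplete" | "Waived" | "Not Required";
type QaVerdict =
  | "READY_FOR_MERGE"
  | "AC_MET_BUT_NOT_A_PLUS"
  | "NEEDS_VERIFICATION"
  | "AC_NOT_MET";

interface Evidence {
  status: EvidenceStatus;
  reason?: string; // documented waiver reason
}

// Applies the Verdict Gating table to a verdict proposed from the AC review.
function gateVerdict(proposed: QaVerdict, evidence: Evidence): QaVerdict {
  // Only READY_FOR_MERGE is gated; AC issues take precedence otherwise.
  if (proposed !== "READY_FOR_MERGE") return proposed;
  const documentedWaiver =
    evidence.status === "Waived" && (evidence.reason ?? "").trim() !== "";
  if (evidence.status === "Complete" || evidence.status === "Not Required" || documentedWaiver) {
    return "READY_FOR_MERGE";
  }
  // Incomplete evidence, or a waiver with no reason, caps the verdict.
  return "AC_MET_BUT_NOT_A_PLUS";
}
```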
+ ### Integration with /verify
+
+ For complex CLI features, `/verify` provides more comprehensive execution testing:
+
+ 1. QA detects `scripts/` changes
+ 2. Basic smoke test in QA (--help, --dry-run)
+ 3. For full verification: recommend `/verify <issue> --command "..."`
+ 4. `/verify` posts evidence to issue
+ 5. Re-run QA to see verification evidence
+
+ See [/verify skill](../../verify/SKILL.md) for detailed execution verification.
@@ -0,0 +1,272 @@
+ # Test Quality Checklist
+
+ ## Purpose
+
+ This checklist helps QA evaluate the quality of tests added or modified during implementation. Tests that pass but don't actually validate behavior create false confidence.
+
+ ## When to Apply
+
+ Apply this checklist when:
+ - New test files are added
+ - Existing test files are modified
+ - AC items specifically mention testing requirements
+
+ **Skip if:** No test files were added or modified.
+
+ ## Checklist Sections
+
+ ### 1. Behavior vs Implementation
+
+ Tests should assert on **observable outputs**, not internal state.
+
+ | Check | Pass | Fail |
+ |-------|------|------|
+ | Tests assert on return values, rendered output, or API responses | ✅ | ❌ Asserts on private variables or internal state |
+ | Refactoring internals wouldn't require test changes | ✅ | ❌ Tests break when implementation changes but behavior doesn't |
+ | Tests describe "what" not "how" | ✅ | ❌ Test names describe implementation details |
+
+ **Example - Good:**
+ ```typescript
+ it('returns user profile when authenticated', async () => {
+   const result = await getProfile(validToken);
+   expect(result.name).toBe('John');
+ });
+ ```
+
+ **Example - Bad:**
+ ```typescript
+ it('calls internal _fetchUser method', async () => {
+   const spy = jest.spyOn(service, '_fetchUser');
+   await getProfile(validToken);
+   expect(spy).toHaveBeenCalled(); // Testing implementation, not behavior
+ });
+ ```
+
+ ### 2. Coverage Depth
+
+ Tests should cover more than the happy path.
+
+ | Check | Pass | Fail |
+ |-------|------|------|
+ | Error paths tested (what happens when things fail?) | ✅ | ❌ Only success scenarios |
+ | Boundary conditions tested (empty, null, max values) | ✅ | ❌ Only typical inputs |
+ | Edge cases identified and tested | ✅ | ❌ Assumes inputs are always valid |
+
+ **Required error path tests:**
+ - [ ] Empty input handling
+ - [ ] Null/undefined handling
+ - [ ] Invalid format handling
+ - [ ] Network/API failure handling (if applicable)
+ - [ ] Permission denied handling (if applicable)
+
+ ### 3. Mock Hygiene
+
+ Mocks should be minimal and purposeful.
+
+ | Check | Pass | Fail |
+ |-------|------|------|
+ | Only external dependencies mocked (APIs, DB, file system) | ✅ | ❌ Internal modules mocked |
+ | Not mocking the thing being tested | ✅ | ❌ Subject under test is partially mocked |
+ | Mock return values match real API contracts | ✅ | ❌ Mocks return impossible data |
+ | Mocks cleaned up after tests | ✅ | ❌ Mocks leak between tests |
+
+ **Over-mocking indicators:**
+ - More than 3 modules mocked in a single test file
+ - Mock setup is longer than the test itself
+ - Tests pass but feature doesn't work in production
+
+ **Example - Over-mocked (bad):**
+ ```typescript
+ jest.mock('../utils');
+ jest.mock('../helpers');
+ jest.mock('../validators');
+ jest.mock('../formatters');
+ // 4 mocks for a simple unit test = over-mocking
+ ```
+
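The first over-mocking indicator can be checked mechanically. This sketch counts distinct module paths passed to `jest.mock()` or `vi.mock()` in one test file and applies the more-than-3 threshold; `mockedModules` and `isOverMocked` are hypothetical helpers, and whether a mocked module is an external dependency or an internal one still needs a human read.

```typescript
// Distinct module paths registered with jest.mock() or vi.mock() in one file.
function mockedModules(source: string): string[] {
  const mockRe = /\b(?:jest|vi)\.mock\(\s*["'`]([^"'`]+)["'`]/g;
  const modules = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = mockRe.exec(source)) !== null) modules.add(m[1]);
  return Array.from(modules);
}

// "More than 3 modules mocked in a single test file" from the indicators above.
function isOverMocked(source: string, maxModules = 3): boolean {
  return mockedModules(source).length > maxModules;
}
```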
+ ### 4. Test Reliability
+
+ Tests should be deterministic and independent.
+
+ | Check | Pass | Fail |
+ |-------|------|------|
+ | No timing-dependent assertions | ✅ | ❌ Uses setTimeout, expects specific timing |
+ | Tests are deterministic (same result every run) | ✅ | ❌ Flaky tests that sometimes fail |
+ | Tests are independent (order doesn't matter) | ✅ | ❌ Tests depend on previous test state |
+ | Async operations properly awaited | ✅ | ❌ Fire-and-forget async calls |
+
+ **Flaky test indicators:**
+ - Tests that pass locally but fail in CI
+ - Tests that fail intermittently
+ - Tests with `setTimeout` or `sleep` calls
+ - Tests that depend on system time
+
+ **Use instead:**
+ ```typescript
+ // Bad: setTimeout
+ await new Promise(resolve => setTimeout(resolve, 1000));
+
+ // Good: waitFor
+ await waitFor(() => expect(element).toBeVisible());
+ ```
+
+ ## Common Anti-Patterns
+
+ ### 1. Snapshot Abuse
+
+ **Problem:** Snapshots used for complex objects instead of specific assertions.
+
+ **Detection:**
+ Use the Glob tool to count snapshot and test files:
+ ```
+ # Count snapshot files
+ Glob(pattern="**/*.snap") # Count results
+
+ # Count test files
+ Glob(pattern="**/*.test.*") # Count results
+
+ # Ratio > 0.5 may indicate overuse
+ ```
+
+ **Flag if:**
+ - Snapshots contain >50 lines
+ - Snapshot changes are approved without review
+ - Tests only use `toMatchSnapshot()` with no other assertions
+
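Both snapshot checks fit in a few lines. `snapshotRatio` takes the two file lists that the Glob calls would return, and `longSnapshots` flags entries over 50 lines in a `.snap` file, assuming the usual Jest/Vitest layout of `` exports[`name`] = `body`; `` entries. Both helpers are hypothetical, and the regex does not handle escaped backticks inside snapshot names or bodies.

```typescript
// Ratio of snapshot files to test files; the text above treats > 0.5 as a hint of overuse.
function snapshotRatio(snapFiles: string[], testFiles: string[]): number {
  return testFiles.length === 0 ? 0 : snapFiles.length / testFiles.length;
}

// Names of snapshots longer than maxLines in a .snap file. Assumes entries of the
// form exports[`name`] = `body`; and does not handle escaped backticks.
function longSnapshots(snapSource: string, maxLines = 50): string[] {
  const entryRe = /^exports\[`([^`]+)`\] = `([\s\S]*?)`;$/gm;
  const long: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = entryRe.exec(snapSource)) !== null) {
    // Bodies usually start and end with a newline; trim so those don't count as lines.
    if (m[2].trim().split("\n").length > maxLines) long.push(m[1]);
  }
  return long;
}
```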
+ ### 2. Test Data Coupling
+
+ **Problem:** Tests share mutable state or depend on database seeding order.
+
+ **Detection:**
+ - Look for `beforeAll` that sets up shared state
+ - Tests that fail when run in isolation (`it.only`)
+
+ ### 3. Implementation Mirroring
+
+ **Problem:** Tests that duplicate the implementation logic.
+
+ **Example - Bad:**
+ ```typescript
+ it('calculates total', () => {
+   const items = [{price: 10}, {price: 20}];
+   // This mirrors the implementation exactly
+   const expected = items.reduce((sum, i) => sum + i.price, 0);
+   expect(calculateTotal(items)).toBe(expected);
+ });
+ ```
+
+ **Better:**
+ ```typescript
+ it('calculates total', () => {
+   const items = [{price: 10}, {price: 20}];
+   expect(calculateTotal(items)).toBe(30); // Known correct value
+ });
+ ```
+
+ ### 4. Tautological Tests (Automated Detection)
+
+ **Problem:** Tests that pass but don't call any production code. These tests provide zero regression protection.
+
+ **Detection:** Automated via `tautology-detector-cli.ts` — runs during QA quality checks.
+
+ **Example - Tautological (BAD):**
+ ```typescript
+ import { processData } from './processor';
+
+ it('should process correctly', () => {
+   const result = true; // Never calls processData!
+   expect(result).toBe(true);
+ });
+ ```
+
+ **Example - Real Test (GOOD):**
+ ```typescript
+ import { processData } from './processor';
+
+ it('should process correctly', () => {
+   const result = processData({ value: 42 }); // Actually calls production code
+   expect(result.success).toBe(true);
+ });
+ ```
+
+ **Detection Heuristic:**
+ 1. Extract imports from source modules (relative paths like `./`, `../`)
+ 2. Exclude test libraries (`vitest`, `jest`, `@testing-library`, etc.)
+ 3. Exclude mock/fixture imports (paths containing `mock`, `fixture`, `stub`, etc.)
+ 4. For each `it()` / `test()` block, check if any imported function is called
+ 5. Flag blocks where zero production functions are invoked
+
+ **Blocking Threshold:** If >50% of test blocks in the diff are tautological, merge is blocked.
+
+ ## Output Format
+
+ Include this section in QA output when test files are modified:
+
+ ```markdown
+ ### Test Quality Review
+
+ | Category | Status | Notes |
+ |----------|--------|-------|
+ | Tautology Check | ✅ OK | All tests call production code |
+ | Behavior vs Implementation | ✅ OK | Tests assert on outputs |
+ | Coverage Depth | ⚠️ WARN | Missing error path tests |
+ | Mock Hygiene | ✅ OK | Minimal mocking |
+ | Test Reliability | ✅ OK | No timing dependencies |
+
+ **Issues Found:**
+ - `auth.test.ts:45` - Missing error path for invalid token
+ - `utils.test.ts` - 4 modules mocked (over-mocking)
+
+ **Suggestions:**
+ 1. Add test for invalid token scenario
+ 2. Reduce mocks in utils.test.ts to external dependencies only
+ ```
+
+ ### Tautology Check Output (Automated)
+
+ When tautological tests are detected:
+
+ ```markdown
+ ### Test Quality Review
+
+ | Category | Status | Notes |
+ |----------|--------|-------|
+ | Tautology Check | ⚠️ WARN | 2 tautological test blocks found (25%) |
+
+ **Tautological Tests Found:**
+ - `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+ - `src/lib/bar.test.ts:12` - `test("validates input")` - No production function calls
+
+ **Verdict Impact:** 25% tautological — warning only, review tests before merge
+ ```
+
+ When >50% are tautological (blocking):
+
+ ```markdown
+ ### Test Quality Review
+
+ | Category | Status | Notes |
+ |----------|--------|-------|
+ | Tautology Check | ❌ FAIL | 3 tautological test blocks found (75%) |
+
+ **Tautological Tests Found:**
+ - `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+ - `src/lib/foo.test.ts:52` - `it("should handle input")` - No production function calls
+ - `src/lib/bar.test.ts:12` - `test("validates")` - No production function calls
+
+ **Verdict Impact:** >50% tautological — blocks `READY_FOR_MERGE`
+ ```
+
+ ## Verdict Impact
+
+ | Test Quality | Verdict Impact |
+ |--------------|----------------|
+ | All checks pass | No impact |
+ | 1-2 warnings | Note in QA, no verdict change |
+ | Over-mocking (4+ mocks) | `AC_MET_BUT_NOT_A_PLUS` |
+ | No error path tests | `AC_MET_BUT_NOT_A_PLUS` |
+ | Tests mirror implementation | `AC_MET_BUT_NOT_A_PLUS` |
+ | Tautological tests (1-50%) | `AC_MET_BUT_NOT_A_PLUS` |
+ | **Tautological tests (>50%)** | `AC_NOT_MET` (blocker) |
+ | Flaky tests introduced | `AC_NOT_MET` (blocker) |
+ | Tests deleted without justification | `AC_NOT_MET` (blocker) |
@@ -1,5 +1,45 @@
  # Testing Requirements
 
+ ## Test Quality Guidelines
+
+ **The goal is NOT test quantity — it's transparency about what's actually being tested.**
+
+ ### A Quality Test:
+ - **Tests behavior, not implementation details** - Assert on outputs, not internal state
+ - **Covers primary use case + at least 1 failure path** - Happy path alone is insufficient
+ - **Fails when the feature breaks, passes when it works** - Actually validates the feature
+ - **Uses realistic inputs** - Not contrived data that never occurs in production
+
+ ### Avoid:
+ - ❌ Tests that mock everything (tests the mocks, not the code)
+ - ❌ Tests that only cover happy path (miss real failures)
+ - ❌ Tests written just to hit coverage numbers (low value)
+ - ❌ Snapshot tests over 50 lines (too brittle, hard to review)
+ - ❌ Tests that mirror implementation (break with any refactor)
+
+ ### Test Value Hierarchy
+
+ | Test Type | Value | When to Use |
+ |-----------|-------|-------------|
+ | **Integration tests** | High | Critical paths, user flows |
+ | **Unit tests (behavior)** | Medium-High | Business logic, utilities |
+ | **Unit tests (implementation)** | Low | Avoid - too brittle |
+ | **Snapshot tests** | Low | UI components only, small snapshots |
+
+ ### Test-to-Code Ratio Guidelines
+
+ Don't chase coverage percentages. Instead:
+
+ | Change Type | Recommended Approach |
+ |-------------|---------------------|
+ | **Critical path** (auth, payments) | Test thoroughly - multiple scenarios |
+ | **Business logic** | Test primary behavior + 1-2 edge cases |
+ | **Simple utilities** | Single test covering main use case |
+ | **UI tweaks** | Manual verification often sufficient |
+ | **Types/config** | No tests needed |
+
+ ---
+
  ## Adversarial Thinking Checklist
 
  **STOP and ask these questions before any READY_FOR_MERGE verdict:**