sequant 2.7.0 → 2.8.0 (npm package version diff)

Files changed (59)

package/templates/skills/qa/references/quality-gates.md CHANGED

@@ -10,14 +10,67 @@ Combine agent outputs into a unified quality assessment:
 | Scope/Size Checker | Files changed, LOC, assessment | Medium - warning if very large |
 | Security Scanner | Critical/warning/info counts | High - blocking if criticals > 0 |
 | Semgrep Static Analysis | Critical/warning findings | High - blocking if criticals > 0 |
+| Test Tautology Detector | Tautological test count, percentage | High - blocking if >50% tautological |
 | RLS Checker (conditional) | Violations found | High - blocking if violations |
 **Synthesis Rules:**
 - **Any FAIL verdict** → Flag as blocker in manual review
 - **Security criticals (including Semgrep)** → Block merge, require fix before proceeding
+- **Build regression detected** → Block merge, require fix before proceeding
+- **Test tautology >50%** → Block merge, tests provide no regression protection
 - **All PASS** → Proceed with confidence to manual review
 - **WARN verdicts** → Note in review, verify manually
+## Build Verification
+When `npm run build` fails on the feature branch, QA must verify whether the failure is a regression (new) or pre-existing (already on main).
+### Verification Logic
+| Feature Build | Main Build | Error Match | Classification |
+|---------------|------------|-------------|----------------|
+| ❌ Fail | ✅ Pass | N/A | **Regression** - failure introduced by PR |
+| ❌ Fail | ❌ Fail | Same error | **Pre-existing** - not blocking |
+| ❌ Fail | ❌ Fail | Different | **Unknown** - requires manual review |
+| ✅ Pass | * | N/A | N/A - no verification needed |
+### Verdict Mapping
+| Build Verification Result | QA Verdict Impact |
+|---------------------------|-------------------|
+| **Regression detected** | **BLOCKING** - `AC_NOT_MET`, must fix before merge |
+| **Pre-existing failure** | Non-blocking - document and proceed |
+| **Unknown (different errors)** | `AC_MET_BUT_NOT_A_PLUS` - manual review recommended |
+| **Build passes** | No impact |
+### Output Format
+```markdown
+### Build Verification
+| Check | Status |
+|-------|--------|
+| Feature branch build | ❌ Failed |
+| Main branch build | ❌ Failed |
+| Error match | ✅ Same error |
+| Regression | **No** (pre-existing) |
+**Note:** Build failure is pre-existing on main branch. Not blocking this PR.
+```
+### Implementation
+The `quality-checks.sh` script includes:
+- `run_build_with_verification()` - runs build and triggers verification on failure
+- `verify_build_against_main()` - compares build results against main branch
+**How it works:**
+1. Run `npm run build` on feature branch
+2. If build fails, capture exit code and error output
+3. Run build on main branch (via main repo directory, not checkout)
+4. Compare exit codes and first error line
+5. Output Build Verification table with classification
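Reviewer note: the Verification Logic and Verdict Mapping tables in this hunk can be sketched in TypeScript as below. This is a minimal illustration, not the logic in `quality-checks.sh`. The names `BuildResult`, `classifyBuild`, and `verdictImpact` are hypothetical, and treating a missing error line as "unknown" is an assumption the diff does not state.

```typescript
// Hypothetical types; only the classification rules come from the tables in the diff.
type BuildClassification = "regression" | "pre-existing" | "unknown" | "not-needed";

interface BuildResult {
  passed: boolean;
  // First error line of the build output when the build failed.
  firstErrorLine?: string;
}

function classifyBuild(feature: BuildResult, main: BuildResult): BuildClassification {
  // Feature build passes: no verification needed, whatever main does.
  if (feature.passed) return "not-needed";
  // Feature fails while main passes: the PR introduced the failure.
  if (main.passed) return "regression";
  // Both fail: compare the first error line.
  // Assumption: if either error line is missing, the match cannot be confirmed.
  const sameError =
    feature.firstErrorLine !== undefined &&
    feature.firstErrorLine === main.firstErrorLine;
  return sameError ? "pre-existing" : "unknown";
}

// Maps a classification to the QA verdict named in the Verdict Mapping table.
// Returns null where the table says the verdict is not affected.
function verdictImpact(classification: BuildClassification): string | null {
  switch (classification) {
    case "regression":
      return "AC_NOT_MET";
    case "unknown":
      return "AC_MET_BUT_NOT_A_PLUS";
    default:
      return null;
  }
}
```

Comparing only the first error line keeps the check cheap, but two different failures that share a first line would be classified as pre-existing.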
 ## Semgrep Integration
 Semgrep provides static analysis for security vulnerabilities and anti-patterns.
@@ -59,6 +112,71 @@ Semgrep uses stack-specific rulesets for targeted analysis:
 Projects can add custom rules in `.sequant/semgrep-rules.yaml`. These are loaded alongside stack rules automatically.
+## Test Tautology Detection
+Tautological tests are tests that pass but don't call any production code. They provide zero regression protection as they only assert on local values.
+### Detection Logic
+A test block is flagged as tautological if:
+1. It's an `it()` or `test()` block
+2. It contains zero calls to functions imported from source modules
+3. Source modules are relative imports (`./`, `../`) excluding mocks/fixtures/test libraries
+### Verdict Mapping
+| Tautology Result | QA Verdict Impact |
+|------------------|-------------------|
+| >50% of test blocks tautological | **BLOCKING** - `AC_NOT_MET` |
+| 1-50% of test blocks tautological | Warning - `AC_MET_BUT_NOT_A_PLUS` |
+| 0% tautological | No impact |
+| No test blocks in diff | No impact (skipped) |
+### Output Format
+```markdown
+### Test Quality Review
+| Category | Status | Notes |
+|----------|--------|-------|
+| Tautology Check | ⚠️ WARN | 2 tautological test blocks found (25%) |
+**Tautological Tests Found:**
+- `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+- `src/lib/bar.test.ts:12` - `test("validates")` - No production function calls
+```
+### Example - Tautological vs Real Test
+```typescript
+// TAUTOLOGICAL — flags as warning/blocker
+import { executePhaseWithRetry } from "./run.js";
+it("should retry", () => {
+  const retry = true;
+  expect(retry).toBe(true);  // No production function called!
+});
+// REAL — this is fine
+import { executePhaseWithRetry } from "./run.js";
+it("should retry", async () => {
+  const result = await executePhaseWithRetry(123, "exec", config, ...);
+  expect(result.success).toBe(true);  // Calls production function
+});
+```
+### Implementation
+The `quality-checks.sh` script includes:
+- Tautology detector CLI: `scripts/qa/tautology-detector-cli.ts`
+- Detection library: `src/lib/test-tautology-detector.ts`
+**How it works:**
+1. Get test files from `git diff main...HEAD`
+2. For each test file, extract imports and test blocks
+3. Check if any imported production function is called within each test block
+4. Report tautological tests with file:line references
+5. Block if >50% of test blocks are tautological
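Reviewer note: the tautology Verdict Mapping table reduces to a threshold check on two counts. The sketch below is hypothetical (`tautologyImpact` is not a name from the package). It also assumes that any non-zero share up to and including 50% is a warning, since the table lists "1-50%" and does not say how fractions below 1% are handled.

```typescript
type TautologyImpact = "AC_NOT_MET" | "AC_MET_BUT_NOT_A_PLUS" | "none";

// totalBlocks: number of it()/test() blocks found in the diff.
// tautologicalBlocks: how many of those call no production function.
function tautologyImpact(totalBlocks: number, tautologicalBlocks: number): TautologyImpact {
  // No test blocks in the diff (skipped) or nothing flagged: no impact.
  if (totalBlocks === 0 || tautologicalBlocks === 0) return "none";
  const ratio = tautologicalBlocks / totalBlocks;
  // Strictly more than 50% blocks; exactly 50% is still only a warning.
  return ratio > 0.5 ? "AC_NOT_MET" : "AC_MET_BUT_NOT_A_PLUS";
}
```

With the figures from the sample outputs in the diff, 2 flagged blocks out of 8 (25%) gives a warning, and 3 out of 4 (75%) blocks the merge.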
 ## Verdict Criteria
 ### `READY_FOR_MERGE`
@@ -73,10 +191,12 @@ Must meet ALL of:
 - ✅ Reversibility Test: Clean revert possible
 - ✅ **Adversarial Test: Failure path tested**
 - ✅ **Edge Case Test: At least 1 edge case per AC tested**
+- ✅ **Execution Evidence: Complete or waived** (see below)
 ### `AC_MET_BUT_NOT_A_PLUS`
 AC met, but one or more issues:
 - ⚠️ Minor scope creep (1-2 extra files)
 - ⚠️ Over-engineering (abstraction not required)
 - ⚠️ Size larger than expected but justified
@@ -99,6 +219,7 @@ All AC items are `MET`, but one or more items have `PENDING` status requiring ex
 ### `AC_NOT_MET`
 Any of:
 - ❌ One or more AC items `NOT_MET` or `PARTIALLY_MET`
 - ❌ Deleted tests without justification
 - ❌ Major scope creep (many unrelated files)
@@ -139,6 +260,65 @@ Any of:
 **Important:** `PARTIALLY_MET` is NOT sufficient for merge. It must be treated as `NOT_MET` for verdict purposes.
+## CI Status Impact on Verdict
+**Purpose:** CI status directly affects verdict when AC items depend on CI (e.g., "Tests pass in CI").
+### CI Status Mapping
+| State | Bucket | AC Status | Verdict Impact |
+|-------|--------|-----------|----------------|
+| `SUCCESS` | `pass` | `MET` | No impact |
+| `FAILURE` | `fail` | `NOT_MET` | Blocks merge |
+| `CANCELLED` | `fail` | `NOT_MET` | Blocks merge |
+| `SKIPPED` | `pass` | `N/A` | No impact |
+| `PENDING` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+| `QUEUED` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+| `IN_PROGRESS` | `pending` | `PENDING` | → `NEEDS_VERIFICATION` |
+| (no checks) | - | `N/A` | No CI configured |
+### CI-Related AC Detection
+Identify AC items that depend on CI by matching patterns:
+- "Tests pass in CI"
+- "CI passes"
+- "Build succeeds in CI"
+- "GitHub Actions pass"
+- "Pipeline passes"
+- "Workflow passes"
+- "Checks pass"
+- "Actions succeed"
+- "CI/CD passes"
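Reviewer note: the diff lists the phrases but not how they are matched. One plausible reading is a case-insensitive substring check, sketched below. The names `CI_AC_PHRASES` and `isCiRelatedAc` are hypothetical, and substring matching is an assumption.

```typescript
// Phrases copied from the list in the diff, lowercased for comparison.
const CI_AC_PHRASES: string[] = [
  "tests pass in ci",
  "ci passes",
  "build succeeds in ci",
  "github actions pass",
  "pipeline passes",
  "workflow passes",
  "checks pass",
  "actions succeed",
  "ci/cd passes",
];

// Returns true when an AC description contains any of the listed phrases.
function isCiRelatedAc(description: string): boolean {
  const text = description.toLowerCase();
  return CI_AC_PHRASES.some((phrase) => text.includes(phrase));
}
```

Under this reading, "Tests pass in CI" is CI-related while "Tests pass locally" is not, which matches the example scenario in the same section.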
+### Error Handling
+If `gh pr checks` fails:
+- **`gh` not installed** → Skip CI section: "CI status unavailable (gh CLI not found)"
+- **`gh` not authenticated** → Skip CI section: "CI status unavailable (gh auth required)"
+- **Network/auth error** → Treat as N/A: "CI status unavailable"
+- **No PR exists** → Skip CI check entirely
+- **Empty response** → No CI configured (not an error)
+**Portability:** CI detection requires GitHub (`gh` CLI). GitLab, Bitbucket, Azure DevOps not supported.
+### CI Verdict Rules
+1. **CI failure → AC_NOT_MET:** Any failed CI check that maps to an AC item means that AC is NOT_MET
+2. **CI pending → NEEDS_VERIFICATION:** If CI is still running for a CI-related AC, verdict is NEEDS_VERIFICATION
+3. **No CI configured → N/A:** Mark CI-related AC items as N/A, don't block on missing CI
+4. **CI success → MET:** CI-related AC items are MET when all relevant checks pass
+**Example Scenario:**
+```markdown
+AC-1: "Feature implemented" → MET (code review)
+AC-2: "Tests pass locally" → MET (npm test passed)
+AC-3: "Tests pass in CI" → PENDING (CI in progress)
+AC-4: "Docs updated" → MET (README updated)
+Verdict: NEEDS_VERIFICATION (due to AC-3 PENDING)
+```
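Reviewer note: the CI Status Mapping table and the four verdict rules can be combined into one function that turns a list of check states into a single AC status. The sketch below is hypothetical (`CI_STATE_TO_AC` and `ciAcStatus` are not names from the package). Letting a failure take precedence over a pending check is an assumption based on rule 1 ("Any failed CI check").

```typescript
type CiState =
  | "SUCCESS" | "FAILURE" | "CANCELLED" | "SKIPPED"
  | "PENDING" | "QUEUED" | "IN_PROGRESS";
type AcStatus = "MET" | "NOT_MET" | "PENDING" | "N/A";

// Per-check mapping, copied from the CI Status Mapping table.
const CI_STATE_TO_AC: Record<CiState, AcStatus> = {
  SUCCESS: "MET",
  FAILURE: "NOT_MET",
  CANCELLED: "NOT_MET",
  SKIPPED: "N/A",
  PENDING: "PENDING",
  QUEUED: "PENDING",
  IN_PROGRESS: "PENDING",
};

// Aggregates all checks relevant to one CI-related AC item.
function ciAcStatus(states: CiState[]): AcStatus {
  // Empty list: no CI configured, so the AC item is N/A.
  if (states.length === 0) return "N/A";
  const statuses = states.map((state) => CI_STATE_TO_AC[state]);
  // Assumption: a failure outranks pending, and pending outranks success.
  if (statuses.some((s) => s === "NOT_MET")) return "NOT_MET";
  if (statuses.some((s) => s === "PENDING")) return "PENDING";
  if (statuses.some((s) => s === "MET")) return "MET";
  // Only skipped checks remain.
  return "N/A";
}
```

A `PENDING` result corresponds to the `NEEDS_VERIFICATION` verdict, as in the AC-3 line of the example scenario.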
 ## Code Review Decision Framework
 ### 1. Purpose Test
@@ -177,3 +357,110 @@ Any of:
 - **Unintegrated exports**: ⚠️ Warning only
 - **Security criticals** > 0: ❌ Blocker
 - **Security warnings** > 0: ⚠️ Review each case
+---
+## Execution Evidence Requirements
+### Purpose
+QA must actually execute code for scripts/CLI changes, not just review it. Analysis of 34 run logs shows zero `/loop` phases triggered - QA passes every time without catching runtime issues.
+### Change Type Detection
+Determine execution requirement based on what files were changed:
+```bash
+# Detect change type
+scripts_changed=$(git diff main...HEAD --name-only | grep -E "^scripts/" | wc -l | xargs)
+cli_changed=$(git diff main...HEAD --name-only | grep -E "(cli|commands?)" | wc -l | xargs)
+ui_changed=$(git diff main...HEAD --name-only | grep -E "^(app|components|pages)/" | wc -l | xargs)
+types_only=$(git diff main...HEAD --name-only | grep -E "\.d\.ts$|^types/" | wc -l | xargs)
+tests_only=$(git diff main...HEAD --name-only | grep -E "\.test\.|\.spec\.|__tests__" | wc -l | xargs)
+```
+### Execution Matrix
+| Change Type | QA Must Execute | Example Command |
+|-------------|-----------------|-----------------|
+| `scripts/` files | ✅ Required | `npx tsx scripts/foo.ts --help` |
+| CLI commands | ✅ Required | `npx sequant <cmd> --help` or dry-run |
+| UI components | ⚠️ Via `/test` | Browser testing required |
+| Types/config only | ❌ Waiver OK | Note: "Types-only change, execution waived" |
+| Tests only | ✅ Run tests | `npm test -- --grep "feature"` |
+### Evidence Collection
+For each executable change, QA must:
+1. **Identify a safe smoke command:**
+   - Prefer `--help`, `--dry-run`, or `--version` flags
+   - For scripts: pass minimal safe arguments
+   - Never execute destructive operations
+2. **Execute and capture:**
+   ```bash
+   # Example for a script
+   npx tsx scripts/analytics.ts --help 2>&1
+   echo "Exit code: $?"
+   ```
+3. **Record in output:**
+   ```markdown
+   ### Execution Evidence
+   | Test Type | Command | Exit Code | Result |
+   |-----------|---------|-----------|--------|
+   | Smoke test | `npx tsx scripts/analytics.ts --help` | 0 | Usage info displayed ✓ |
+   | Dry run | `npx tsx scripts/migrate.ts --dry-run` | 0 | Plan shown, no changes ✓ |
+   **Evidence status:** Complete
+   ```
+### Evidence Status Definitions
+| Status | Definition | Verdict Eligibility |
+|--------|------------|---------------------|
+| **Complete** | All required commands executed successfully | `READY_FOR_MERGE` eligible |
+| **Incomplete** | Some commands not run or failed | `AC_MET_BUT_NOT_A_PLUS` max |
+| **Waived** | Explicit reason documented | `READY_FOR_MERGE` eligible |
+| **Not Required** | No executable changes | `READY_FOR_MERGE` eligible |
+### Waiver Criteria
+Execution can be waived with documented reason:
+| Waiver Reason | Example |
+|---------------|---------|
+| Types-only change | "Only `.d.ts` files modified" |
+| Config-only change | "Only `tsconfig.json` or `.eslintrc` modified" |
+| Documentation-only | "Only `.md` files modified" |
+| Test-only change | "Only test files modified, tests run via `npm test`" |
+**Waiver format:**
+```markdown
+### Execution Evidence
+**Status:** Waived
+**Reason:** Types-only change - only `.d.ts` files modified
+```
+### Verdict Gating
+| Verdict | Evidence Requirement |
+|---------|---------------------|
+| `READY_FOR_MERGE` | Evidence: Complete OR Waived (with reason) OR Not Required |
+| `AC_MET_BUT_NOT_A_PLUS` | Evidence: Incomplete + note explaining gap |
+| `AC_NOT_MET` | N/A (AC issues take precedence) |
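Reviewer note: the Verdict Gating table acts as a cap on whatever verdict QA would otherwise give. A minimal sketch follows, with the hypothetical name `gateVerdict`. Treating a waiver without a documented reason like incomplete evidence is an assumption drawn from the "Waived (with reason)" wording.

```typescript
type EvidenceStatus = "Complete" | "Incomplete" | "Waived" | "Not Required";
type Verdict = "READY_FOR_MERGE" | "AC_MET_BUT_NOT_A_PLUS" | "AC_NOT_MET";

// proposed: the verdict QA reached before looking at execution evidence.
function gateVerdict(
  proposed: Verdict,
  evidence: EvidenceStatus,
  waiverReason?: string,
): Verdict {
  // AC issues take precedence over evidence status.
  if (proposed === "AC_NOT_MET") return proposed;
  // Incomplete evidence caps the verdict below READY_FOR_MERGE.
  if (evidence === "Incomplete") return "AC_MET_BUT_NOT_A_PLUS";
  // Assumption: a waiver with no reason does not unlock READY_FOR_MERGE.
  if (evidence === "Waived" && !waiverReason?.trim()) return "AC_MET_BUT_NOT_A_PLUS";
  // Complete, properly waived, or not required: leave the verdict unchanged.
  return proposed;
}
```

For example, `gateVerdict("READY_FOR_MERGE", "Waived", "Types-only change")` leaves the verdict unchanged, while the same call without a reason is capped at `AC_MET_BUT_NOT_A_PLUS`.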
+### Integration with /verify
+For complex CLI features, `/verify` provides more comprehensive execution testing:
+1. QA detects `scripts/` changes
+2. Basic smoke test in QA (--help, --dry-run)
+3. For full verification: recommend `/verify <issue> --command "..."`
+4. `/verify` posts evidence to issue
+5. Re-run QA to see verification evidence
+See [/verify skill](../../verify/SKILL.md) for detailed execution verification.

package/templates/skills/qa/references/test-quality-checklist.md ADDED


@@ -0,0 +1,272 @@
+# Test Quality Checklist
+## Purpose
+This checklist helps QA evaluate the quality of tests added or modified during implementation. Tests that pass but don't actually validate behavior create false confidence.
+## When to Apply
+Apply this checklist when:
+- New test files are added
+- Existing test files are modified
+- AC items specifically mention testing requirements
+**Skip if:** No test files were added or modified.
+## Checklist Sections
+### 1. Behavior vs Implementation
+Tests should assert on **observable outputs**, not internal state.
+| Check | Pass | Fail |
+|-------|------|------|
+| Tests assert on return values, rendered output, or API responses | ✅ | ❌ Asserts on private variables or internal state |
+| Refactoring internals wouldn't require test changes | ✅ | ❌ Tests break when implementation changes but behavior doesn't |
+| Tests describe "what" not "how" | ✅ | ❌ Test names describe implementation details |
+**Example - Good:**
+```typescript
+it('returns user profile when authenticated', async () => {
+  const result = await getProfile(validToken);
+  expect(result.name).toBe('John');
+});
+```
+**Example - Bad:**
+```typescript
+it('calls internal _fetchUser method', async () => {
+  const spy = jest.spyOn(service, '_fetchUser');
+  await getProfile(validToken);
+  expect(spy).toHaveBeenCalled(); // Testing implementation, not behavior
+});
+```
+### 2. Coverage Depth
+Tests should cover more than the happy path.
+| Check | Pass | Fail |
+|-------|------|------|
+| Error paths tested (what happens when things fail?) | ✅ | ❌ Only success scenarios |
+| Boundary conditions tested (empty, null, max values) | ✅ | ❌ Only typical inputs |
+| Edge cases identified and tested | ✅ | ❌ Assumes inputs are always valid |
+**Required error path tests:**
+- [ ] Empty input handling
+- [ ] Null/undefined handling
+- [ ] Invalid format handling
+- [ ] Network/API failure handling (if applicable)
+- [ ] Permission denied handling (if applicable)
+### 3. Mock Hygiene
+Mocks should be minimal and purposeful.
+| Check | Pass | Fail |
+|-------|------|------|
+| Only external dependencies mocked (APIs, DB, file system) | ✅ | ❌ Internal modules mocked |
+| Not mocking the thing being tested | ✅ | ❌ Subject under test is partially mocked |
+| Mock return values match real API contracts | ✅ | ❌ Mocks return impossible data |
+| Mocks cleaned up after tests | ✅ | ❌ Mocks leak between tests |
+**Over-mocking indicators:**
+- More than 3 modules mocked in a single test file
+- Mock setup is longer than the test itself
+- Tests pass but feature doesn't work in production
+**Example - Over-mocked (bad):**
+```typescript
+jest.mock('../utils');
+jest.mock('../helpers');
+jest.mock('../validators');
+jest.mock('../formatters');
+// 4 mocks for a simple unit test = over-mocking
+```
+### 4. Test Reliability
+Tests should be deterministic and independent.
+| Check | Pass | Fail |
+|-------|------|------|
+| No timing-dependent assertions | ✅ | ❌ Uses setTimeout, expects specific timing |
+| Tests are deterministic (same result every run) | ✅ | ❌ Flaky tests that sometimes fail |
+| Tests are independent (order doesn't matter) | ✅ | ❌ Tests depend on previous test state |
+| Async operations properly awaited | ✅ | ❌ Fire-and-forget async calls |
+**Flaky test indicators:**
+- Tests that pass locally but fail in CI
+- Tests that fail intermittently
+- Tests with `setTimeout` or `sleep` calls
+- Tests that depend on system time
+**Use instead:**
+```typescript
+// Bad: setTimeout
+await new Promise(resolve => setTimeout(resolve, 1000));
+// Good: waitFor
+await waitFor(() => expect(element).toBeVisible());
+```
+## Common Anti-Patterns
+### 1. Snapshot Abuse
+**Problem:** Snapshots used for complex objects instead of specific assertions.
+**Detection:**
+Use the Glob tool to count snapshot and test files:
+```
+# Count snapshot files
+Glob(pattern="**/*.snap")  # Count results
+# Count test files
+Glob(pattern="**/*.test.*")  # Count results
+# Ratio > 0.5 may indicate overuse
+```
+**Flag if:**
+- Snapshots contain >50 lines
+- Snapshot changes are approved without review
+- Tests only use `toMatchSnapshot()` with no other assertions
+### 2. Test Data Coupling
+**Problem:** Tests share mutable state or depend on database seeding order.
+**Detection:**
+- Look for `beforeAll` that sets up shared state
+- Tests that fail when run in isolation (`it.only`)
+### 3. Implementation Mirroring
+**Problem:** Tests that duplicate the implementation logic.
+**Example - Bad:**
+```typescript
+it('calculates total', () => {
+  const items = [{price: 10}, {price: 20}];
+  // This mirrors the implementation exactly
+  const expected = items.reduce((sum, i) => sum + i.price, 0);
+  expect(calculateTotal(items)).toBe(expected);
+});
+```
+**Better:**
+```typescript
+it('calculates total', () => {
+  const items = [{price: 10}, {price: 20}];
+  expect(calculateTotal(items)).toBe(30); // Known correct value
+});
+```
+### 4. Tautological Tests (Automated Detection)
+**Problem:** Tests that pass but don't call any production code. These tests provide zero regression protection.
+**Detection:** Automated via `tautology-detector-cli.ts` — runs during QA quality checks.
+**Example - Tautological (BAD):**
+```typescript
+import { processData } from './processor';
+it('should process correctly', () => {
+  const result = true;  // Never calls processData!
+  expect(result).toBe(true);
+});
+```
+**Example - Real Test (GOOD):**
+```typescript
+import { processData } from './processor';
+it('should process correctly', () => {
+  const result = processData({ value: 42 });  // Actually calls production code
+  expect(result.success).toBe(true);
+});
+```
+**Detection Heuristic:**
+1. Extract imports from source modules (relative paths like `./`, `../`)
+2. Exclude test libraries (`vitest`, `jest`, `@testing-library`, etc.)
+3. Exclude mock/fixture imports (paths containing `mock`, `fixture`, `stub`, etc.)
+4. For each `it()` / `test()` block, check if any imported function is called
+5. Flag blocks where zero production functions are invoked
+**Blocking Threshold:** If >50% of test blocks in the diff are tautological, merge is blocked.
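Reviewer note: the detection heuristic above can be approximated with regular expressions. The sketch below is not the code in `src/lib/test-tautology-detector.ts`, which this diff does not show. The function names are hypothetical, only named imports (`import { a, b } from "./x"`) are handled, and identifiers containing `$` are not supported.

```typescript
// Collects local names of bindings imported from relative, non-mock modules.
function productionImports(source: string): string[] {
  const names: string[] = [];
  // Matches: import { a, b as c } from "./path" or "../path"
  const importRe = /import\s*\{([^}]*)\}\s*from\s*["'](\.{1,2}\/[^"']*)["']/g;
  let match: RegExpExecArray | null;
  while ((match = importRe.exec(source)) !== null) {
    const bindings = match[1];
    const path = match[2];
    // Skip mock, fixture, and stub modules.
    if (/mock|fixture|stub/i.test(path)) continue;
    for (const binding of bindings.split(",")) {
      // For "foo as bar" the name used in the test file is "bar".
      const local = binding.trim().split(/\s+as\s+/).pop();
      if (local) names.push(local);
    }
  }
  return names;
}

// True when the test body calls at least one of the given names.
function callsProductionCode(testBody: string, names: string[]): boolean {
  return names.some((name) => new RegExp(`\\b${name}\\s*\\(`).test(testBody));
}
```

A text match like this cannot see indirect calls, for example through a helper defined elsewhere in the test file, so a real detector would likely need to parse the file rather than scan it.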
+## Output Format
+Include this section in QA output when test files are modified:
+```markdown
+### Test Quality Review
+| Category | Status | Notes |
+|----------|--------|-------|
+| Tautology Check | ✅ OK | All tests call production code |
+| Behavior vs Implementation | ✅ OK | Tests assert on outputs |
+| Coverage Depth | ⚠️ WARN | Missing error path tests |
+| Mock Hygiene | ✅ OK | Minimal mocking |
+| Test Reliability | ✅ OK | No timing dependencies |
+**Issues Found:**
+- `auth.test.ts:45` - Missing error path for invalid token
+- `utils.test.ts` - 4 modules mocked (over-mocking)
+**Suggestions:**
+1. Add test for invalid token scenario
+2. Reduce mocks in utils.test.ts to external dependencies only
+```
+### Tautology Check Output (Automated)
+When tautological tests are detected:
+```markdown
+### Test Quality Review
+| Category | Status | Notes |
+|----------|--------|-------|
+| Tautology Check | ⚠️ WARN | 2 tautological test blocks found (25%) |
+**Tautological Tests Found:**
+- `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+- `src/lib/bar.test.ts:12` - `test("validates input")` - No production function calls
+**Verdict Impact:** 25% tautological — warning only, review tests before merge
+```
+When >50% are tautological (blocking):
+```markdown
+### Test Quality Review
+| Category | Status | Notes |
+|----------|--------|-------|
+| Tautology Check | ❌ FAIL | 3 tautological test blocks found (75%) |
+**Tautological Tests Found:**
+- `src/lib/foo.test.ts:45` - `it("should work")` - No production function calls
+- `src/lib/foo.test.ts:52` - `it("should handle input")` - No production function calls
+- `src/lib/bar.test.ts:12` - `test("validates")` - No production function calls
+**Verdict Impact:** >50% tautological — blocks `READY_FOR_MERGE`
+```
+## Verdict Impact
+| Test Quality | Verdict Impact |
+|--------------|----------------|
+| All checks pass | No impact |
+| 1-2 warnings | Note in QA, no verdict change |
+| Over-mocking (4+ mocks) | `AC_MET_BUT_NOT_A_PLUS` |
+| No error path tests | `AC_MET_BUT_NOT_A_PLUS` |
+| Tests mirror implementation | `AC_MET_BUT_NOT_A_PLUS` |
+| Tautological tests (1-50%) | `AC_MET_BUT_NOT_A_PLUS` |
+| **Tautological tests (>50%)** | `AC_NOT_MET` (blocker) |
+| Flaky tests introduced | `AC_NOT_MET` (blocker) |
+| Tests deleted without justification | `AC_NOT_MET` (blocker) |

package/templates/skills/qa/references/testing-requirements.md CHANGED

@@ -1,5 +1,45 @@
 # Testing Requirements
+## Test Quality Guidelines
+**The goal is NOT test quantity — it's transparency about what's actually being tested.**
+### A Quality Test:
+- **Tests behavior, not implementation details** - Assert on outputs, not internal state
+- **Covers primary use case + at least 1 failure path** - Happy path alone is insufficient
+- **Fails when the feature breaks, passes when it works** - Actually validates the feature
+- **Uses realistic inputs** - Not contrived data that never occurs in production
+### Avoid:
+- ❌ Tests that mock everything (tests the mocks, not the code)
+- ❌ Tests that only cover happy path (miss real failures)
+- ❌ Tests written just to hit coverage numbers (low value)
+- ❌ Snapshot tests over 50 lines (too brittle, hard to review)
+- ❌ Tests that mirror implementation (break with any refactor)
+### Test Value Hierarchy
+| Test Type | Value | When to Use |
+|-----------|-------|-------------|
+| **Integration tests** | High | Critical paths, user flows |
+| **Unit tests (behavior)** | Medium-High | Business logic, utilities |
+| **Unit tests (implementation)** | Low | Avoid - too brittle |
+| **Snapshot tests** | Low | UI components only, small snapshots |
+### Test-to-Code Ratio Guidelines
+Don't chase coverage percentages. Instead:
+| Change Type | Recommended Approach |
+|-------------|---------------------|
+| **Critical path** (auth, payments) | Test thoroughly - multiple scenarios |
+| **Business logic** | Test primary behavior + 1-2 edge cases |
+| **Simple utilities** | Single test covering main use case |
+| **UI tweaks** | Manual verification often sufficient |
+| **Types/config** | No tests needed |
+---
 ## Adversarial Thinking Checklist
 **STOP and ask these questions before any READY_FOR_MERGE verdict:**