npm - @brunosps00/dev-workflow - Versions diffs - 0.10.0 → 0.13.0 - Mend

Files changed (120)

package/scaffold/skills/dw-testing-discipline/SKILL.md ADDED

@@ -0,0 +1,148 @@
+---
+name: dw-testing-discipline
+description: Use when authoring, reviewing, or debugging tests — enforces Six Iron Laws (behavior over mocks, push to lowest layer, fix prod first on red, real systems gate merge), 25 anti-patterns, 7 AI agent gates, and flaky-test discipline so tests reveal bugs instead of decorating CI.
+---
+# Testing Discipline
+> **Inspired by** [`pedronauck/skills/testing-boss`](https://github.com/pedronauck/skills/tree/main/skills/mine/testing-boss) (MIT). Six Iron Laws, positive/anti-pattern catalogs, AI agent gates, and flaky-test taxonomy adapted from Pedro Nauck's work. The browser security-boundary and three-workflow-patterns references additionally cite [`addyosmani/agent-skills/browser-devtools`](https://github.com/addyosmani/agent-skills) (MIT), and Playwright recipes carry over from earlier dev-workflow work.
+## Cardinal Premise
+> Tests exist to expose defects, not to keep CI green.
+> A test that fails has done its job.
+> A test that passes for the wrong reason is worse than no test.
+## Six Iron Laws
+```
+1. Test the behavior, never the mock.
+2. Push every test to the lowest layer that can detect the failure.
+3. When a test fails, fix production first — change the test only after writing why.
+4. Real systems gate the merge. Mocks isolate; they do not validate.
+5. Coverage is a flashlight. Mutation score is a quality probe. Neither is a target.
+6. No test-only methods, branches, or flags leak into production code.
+```
+Each law has nuance — read `references/iron-laws.md` for the full version with examples.
+## Required Reading Router
+| Task | MUST read |
+|------|-----------|
+| Deciding where a test belongs | `references/iron-laws.md` (Law 2 deep-dive) |
+| Writing new tests | `references/positive-patterns.md` |
+| Reviewing / debugging tests | `references/anti-patterns.md` |
+| Test authored by an AI agent | `references/ai-agent-gates.md` + `references/anti-patterns.md` |
+| Flaky tests appeared | `references/flaky-discipline.md` |
+| Browser-based E2E with Playwright | `references/playwright-recipes.md` |
+| Browser security boundary testing | `references/security-boundary.md` |
+| Picking the right test workflow (UI vs network vs perf) | `references/three-workflow-patterns.md` |
+## Twelve positive patterns (one-liners, full version in references/positive-patterns.md)
+1. Query by behavior and accessible role; never CSS selectors or DOM indices.
+2. Selector hierarchy: role → label → text → test-id → structural (stop at highest rung that disambiguates).
+3. Wait on observable conditions; never wall-clock sleeps.
+4. Each test independent and order-free; setup over teardown.
+5. One behavior per test; as many assertions as that behavior needs.
+6. Names read like specifications: `should <outcome> when <condition> given <state>`.
+7. Table-driven / parameterized when inputs vary.
+8. Build test data via factories; literal blobs only for fields under test.
+9. Mock at boundaries you don't control; real wiring for owned systems.
+10. Real systems gate final merge; contract tests bridge unit and E2E.
+11. Mutation score, not coverage percentage, measures suite strength.
+12. Page Object Model is a tool, not a religion — collapse for small suites.
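Patterns 6 and 7 can be combined in one small table. The sketch below is only an illustration in plain Node: `shippingCost` and its pricing rules are hypothetical, and `runCases` stands in for whatever parameterized API the test runner offers.

```javascript
// Hypothetical production function, included only so the sketch runs.
function shippingCost(subtotal, { isMember }) {
  if (subtotal < 0) throw new RangeError("subtotal must be non-negative");
  if (isMember || subtotal >= 50) return 0;
  return 5;
}

// One row per case; each name follows
// `should <outcome> when <condition> given <state>`.
const cases = [
  { name: "should be free when subtotal is 50 or more given a guest", subtotal: 50, isMember: false, expected: 0 },
  { name: "should charge 5 when subtotal is under 50 given a guest", subtotal: 49.99, isMember: false, expected: 5 },
  { name: "should be free when subtotal is under 50 given a member", subtotal: 10, isMember: true, expected: 0 },
];

// Minimal stand-in for a runner's parameterized-test helper.
function runCases(rows) {
  return rows.map(({ name, subtotal, isMember, expected }) => ({
    name,
    passed: shippingCost(subtotal, { isMember }) === expected,
  }));
}

const results = runCases(cases);
```

Adding a new input then means adding a row, not copying a test body.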
+## Five anti-pattern families (25 total, full catalog in references/anti-patterns.md)
+**Brittleness** — tests bound to internals:
+- Implementation-detail selectors, internal-structure assertions, testing private methods, snapshot-as-test, vague existence assertions, action-without-assertion.
+**Flakiness** — tests randomizing verdicts:
+- Static sleeps, test order dependency, non-deterministic inputs (clock, RNG, locale).
+**Mock misuse** — tests testing the test setup:
+- Asserting the mock exists, mock drift, over-mocking children, incomplete mocks, mocking wrong level.
+**Process** — team and suite pathologies:
+- Coverage-as-vanity, happy-path-only, eternal `beforeAll`, cleanup in `afterEach`, magic strings, testing third-party sites, quarantine-as-cemetery, retry-as-fix, duplicate tests across layers, weakening tests to make them pass, mock-driven confidence.
+**AI-specific** — agent failure modes:
+- The seven failure modes that gates in `ai-agent-gates.md` block.
+## Seven AI agent gates (mandatory when an agent writes tests)
+These are mandatory pre-conditions whenever an LLM produces test code. Each gate is a forcing function against a specific LLM tendency:
+1. **Invariant first** — agent prints `INVARIANT: …`, `OWNING_LAYER: …`, `EXISTING_SUITE: …` before any code.
+2. **Owning layer** — extend an existing suite; reject new files without a named invariant.
+3. **Real execution** — every test runs against real DB / real route / real external integration at least once before merging.
+4. **Failure → fix production** — on a red test, the next move reads production code, NOT the test. Document the analysis before changing either.
+5. **No snapshot without contract** — classify the artifact as `PRODUCT_CONTRACT` or `IMPLEMENTATION_DETAIL`. The latter forbids snapshots.
+6. **No assertion on self-set mock** — cannot assert on values the same test body wrote into the mock.
+7. **Negative companion** — every positive assertion ships with a negative test for invalid input or failure mode.
+Full prompt blocks and verification recipes in `references/ai-agent-gates.md`.
+## Placement doctrine (tripwires)
+Before writing test code:
+- Name the invariant in **one sentence**. Fuzzy language signals unclear requirements — stop and clarify.
+- Place the test at the **lowest layer** capable of detecting the failure when the invariant breaks.
+- Reject tests where `(likelihood × blast-radius)` falls below the ten-minute-maintenance threshold (the test is more expensive to maintain than the bug would be to fix).
+## Flaky discipline (tripwires)
+- Quarantine flaky tests within ONE HOUR of detection. Assign a named owner within 24 hours with a fix-by date.
+- Track `flaky_rate` as a first-class metric: SLO under 1–2%; alert at >5%.
+- Real systems at the final gate: mock at unit; contract-test boundaries; real DB/queue/route at integration; near-zero mocks at E2E.
+Full taxonomy in `references/flaky-discipline.md`.
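One possible way to turn the `flaky_rate` tripwire into a check is sketched below. The text gives only the thresholds; the definition of a flaky run (`failedThenPassed`), the record shape, and placing the SLO cut at the upper 2% bound are assumptions.

```javascript
// Assumption: a run is flaky when it failed and then passed on retry
// with no code change. The text does not define the measurement.
function flakyRate(runs) {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((run) => run.failedThenPassed).length;
  return flaky / runs.length;
}

// Thresholds from the text: SLO under 1-2%, alert above 5%.
// Using 2% as the SLO boundary is an assumption.
function classifyFlakyRate(rate) {
  if (rate > 0.05) return "alert";
  if (rate > 0.02) return "slo-breach";
  return "ok";
}
```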
+## Cross-cutting red flags
+Any of these in a PR triggers REJECTED in `/dw-code-review`:
+- Mock setup larger than test logic.
+- Test breaks when an internal method is renamed (not the public contract).
+- Removing the assertion body leaves the test green.
+- Test fails when run with `.only` in isolation.
+- `sleep`, `Thread.sleep`, or `cy.wait(<number>)` appears.
+- Selector contains CSS class, index, or `xpath`.
+- Test asserts a third-party site is reachable.
+- Snapshot diffs accepted without reading.
+- Coverage percentage is the only metric quoted.
+- Failing tests auto-retried until green; no investigation.
+- Skipped/quarantined tests without named owner and fix-by date.
+- Test depends on `new Date()`, `Math.random()`, or system locale.
+- `afterEach` resets database (move to `beforeEach`).
+- AI-written test has 6+ assertions and zero edge cases.
+- Phrase "I'll mock this to be safe" appears in the diff.
+## When NOT to use this skill
+- General code review unrelated to tests.
+- Library-specific debugging where the test is just a reproduction.
+- Non-testing CI pipeline design (deploys, artifacts, secrets).
+- Production observability and alerting.
+- Single-line typo fixes in existing tests.
+## Integration with dev-workflow commands
+- `/dw-create-tasks` uses the placement doctrine — each test-adding task must name the invariant.
+- `/dw-run-task` applies the 7 AI gates when generating tests as part of implementation.
+- `/dw-code-review` runs the anti-pattern checks on diff hunks under test paths.
+- `/dw-fix-qa` runs flaky-discipline taxonomy when retesting bugs.
+- `/dw-run-qa` (UI mode) references `playwright-recipes.md` for concrete recipes.
+## Why this skill exists
+The previous bundled skill (`webapp-testing`) mixed Playwright recipes with two discipline references (`security-boundary`, `three-workflow-patterns`) added later. The discipline references were buried in a tactical skill that the agent didn't reach for as doctrine.
+This skill consolidates: doctrine at the top, Playwright recipes as one reference, security and workflow patterns as their own references. One skill, coherent voice, doctrine-first.
+## Bottom line
+> A test that cannot fail is decorative. A test that fails for the wrong reason is misleading. Build tests that fail for exactly one reason — the reason the invariant was violated — and trust them when they do. Mocks isolate. Real systems validate. Coverage shines a light. Mutation score grades the suite. Agents will reach for the mock and the snapshot; the gates here make them put both down. Tests reveal bugs, not just pass.

package/scaffold/skills/dw-testing-discipline/references/ai-agent-gates.md ADDED

@@ -0,0 +1,170 @@
+# Seven AI agent gates — mandatory when an LLM writes tests
+LLMs have characteristic failure modes when authoring tests. These gates are forcing functions for the seven most common.
+Every test produced by an agent (via `/dw-run-task`, `/dw-bugfix`, `/dw-autopilot`, or any other code-generating flow) must pass all seven gates BEFORE the diff is presented for review.
+## Gate 1: Invariant first
+**The failure mode it blocks:** Agent writes 200 lines of test code without articulating what the test is supposed to prove.
+**The gate:**
+Before writing any test code, the agent prints:
+```
+INVARIANT: <one sentence: what behavior is true that the test verifies>
+OWNING_LAYER: <unit | integration | contract | e2e>
+EXISTING_SUITE: <path to existing test file the new test joins>
+```
+**Why it works:**
+- "Invariant" forces specific behavior naming.
+- "Owning layer" forces Law 2 (lowest detectable layer).
+- "Existing suite" forces extending coverage rather than spawning new files.
+**Verification:** In `/dw-code-review`, look for this 3-line preamble in the PR description or the commit body. Missing = REJECTED.
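A reviewer script could check for the preamble mechanically. The following is a minimal sketch, not the check `/dw-code-review` actually runs; the function name and the returned shape are hypothetical.

```javascript
// Layers allowed by the OWNING_LAYER line of the preamble.
const LAYERS = ["unit", "integration", "contract", "e2e"];

// Hypothetical helper: validates the three-line Gate 1 preamble found in a
// PR description or commit body.
function checkGate1Preamble(text) {
  const field = (key) => {
    const match = text.match(new RegExp(`^${key}:[ \\t]*(.+)$`, "m"));
    return match ? match[1].trim() : "";
  };
  const errors = [];
  const layer = field("OWNING_LAYER");
  if (!field("INVARIANT")) errors.push("missing INVARIANT");
  if (!layer) errors.push("missing OWNING_LAYER");
  else if (!LAYERS.includes(layer)) errors.push(`unknown OWNING_LAYER: ${layer}`);
  if (!field("EXISTING_SUITE")) errors.push("missing EXISTING_SUITE");
  return { ok: errors.length === 0, errors };
}
```

A script like this can only confirm that the lines exist; whether the invariant is meaningful remains a human judgment.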
+## Gate 2: Owning layer
+**The failure mode it blocks:** Agent creates a new test file every time, scattering coverage across orphan files. Or, agent writes E2E tests for things unit could prove.
+**The gate:**
+The agent must:
+1. Identify the existing test suite that owns the module under test.
+2. Extend that suite, OR document why a new suite is needed (genuinely new module, new test pyramid layer).
+3. Map the test to the right layer per Law 2.
+**Verification:**
+- New test file in PR but existing file covers the same module? REJECTED.
+- E2E test for pure-logic invariant? REJECTED unless documented.
+- Unit test for cross-service flow? REJECTED — push to integration/E2E.
+## Gate 3: Real execution
+**The failure mode it blocks:** Agent writes tests that mock everything. They pass green forever and validate nothing.
+**The gate:**
+Every test path the agent writes must, at SOME layer, run against real systems before the diff merges:
+- Pure logic: unit only is fine.
+- Code that touches DB: must have at least one integration test running real DB (testcontainers, ephemeral container, dedicated test DB).
+- Code that calls external services: must have a contract test OR a sandbox-account smoke test.
+- UI interactions: must have at least one E2E run on a real preview environment.
+**Verification:** PR description must list the real-system runs that exercise the touched code. If no real-system path covers the change, REJECTED.
+## Gate 4: Failure → fix production
+**The failure mode it blocks:** Agent sees test red, modifies the test until green. Bug ships.
+**The gate:**
+When the agent encounters a failing test (its own or pre-existing):
+1. Print: `INVESTIGATING FAILURE: <test name>`
+2. Read production code in the path that produces the observed value.
+3. Print: `ANALYSIS: <2-3 sentences on whether production is wrong, test is wrong, or invariant changed>`
+4. Decide:
+   - Production wrong → fix production.
+   - Test wrong → fix test AND document the change in the commit body.
+   - Invariant changed → update the test AND open an ADR if the change is a public contract change.
+**Verification:** Every commit that changes a previously-green test must have an `ANALYSIS:` line in the commit body explaining the decision. Missing = REJECTED.
+## Gate 5: No snapshot without contract
+**The failure mode it blocks:** Agent reaches for `toMatchSnapshot()` whenever it doesn't know what to assert. Snapshot becomes the assertion. Drift goes unnoticed.
+**The gate:**
+Before adding a snapshot assertion, the agent classifies the artifact:
+- **PRODUCT_CONTRACT**: a stable contract worth pinning (e.g., serialized output of a public API, schema of a stored record). Snapshot is appropriate. Document the classification.
+- **IMPLEMENTATION_DETAIL**: HTML structure, internal representation, component tree shape. Snapshot is FORBIDDEN. Write specific assertions instead.
+**Verification:** Snapshots in the diff without a classification comment = REJECTED. Snapshots classified as IMPLEMENTATION_DETAIL = REJECTED.
+## Gate 6: No assertion on self-set mock
+**The failure mode it blocks:** Agent writes `mockFn.mockReturnValue('X')`, then asserts `expect(mockFn()).toBe('X')`. Proves nothing.
+**The gate:**
+The agent cannot assert on values it directly fed into a mock. Assertions must be on:
+- The OUTPUT of production code that consumed the mock.
+- The SIDE EFFECTS (DB state, network calls, event emissions) caused by production code.
+- The VISIBLE behavior (UI change, log line, response) the user/caller observes.
+**Verification:** Diff analysis flags pairs where a literal value appears in BOTH a mock setup AND an assertion. Flagged = REJECTED unless the agent can show the value passed through production code.
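The diff analysis described above might start from something as simple as the sketch below. It is a naive regex pass under stated assumptions: it only sees quoted string literals, only the Jest/Vitest-style `mockReturnValue` / `mockResolvedValue` setters and `toBe` / `toEqual` / `toStrictEqual` matchers, and the function name is hypothetical.

```javascript
// String literals fed into a mock setter.
const MOCK_SETUP = /\.mock(?:Return|Resolved)Value(?:Once)?\(\s*(['"])(.*?)\1\s*\)/g;
// String literals used as the expected value of an equality matcher.
const ASSERTION = /\.(?:toBe|toEqual|toStrictEqual)\(\s*(['"])(.*?)\1\s*\)/g;

// Returns the literals that appear in BOTH a mock setup and an assertion.
function findSelfSetMockAssertions(testSource) {
  const fed = new Set([...testSource.matchAll(MOCK_SETUP)].map((m) => m[2]));
  const asserted = [...testSource.matchAll(ASSERTION)].map((m) => m[2]);
  return [...new Set(asserted.filter((value) => fed.has(value)))];
}
```

A hit is a prompt for review, not a verdict: the value may legitimately have passed through production code, which is the exception the gate allows.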
+## Gate 7: Negative companion
+**The failure mode it blocks:** Agent writes happy-path-only tests. Edge cases, error paths, boundaries uncovered.
+**The gate:**
+Every positive assertion the agent writes ships WITH at least one negative companion:
+- Asserting `createUser(validInput)` succeeds → also assert `createUser(invalidInput)` fails with a specific error.
+- Asserting `parseDate(validString)` returns a Date → also assert `parseDate(invalidString)` throws/returns null.
+- Asserting `transferFunds(...)` succeeds with sufficient balance → also assert it fails with insufficient balance.
+**Verification:** A PR adding N positive assertions must add ≥1 negative assertion per public path. Imbalance >3:1 (positive:negative) on a public path = REJECTED.
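The ratio rule is simple arithmetic and can be sketched as a small function. How assertions get counted and grouped per public path is left open by the text, so the input shape here is an assumption.

```javascript
// `positive` and `negative` are assertion counts for ONE public path.
// How those counts are gathered is outside this sketch.
function checkNegativeCompanion({ positive, negative }) {
  if (positive > 0 && negative === 0) {
    return { ok: false, reason: "no negative companion" };
  }
  if (negative > 0 && positive / negative > 3) {
    return { ok: false, reason: "imbalance above 3:1" };
  }
  return { ok: true, reason: null };
}
```

For example, three positive assertions with one negative sit exactly at 3:1 and pass, while four with one do not.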
+## How the gates compose
+Together, the seven gates produce tests that:
+1. State what they prove (invariant first).
+2. Live at the right layer (owning layer).
+3. Exercise reality somewhere (real execution).
+4. Reveal bugs when red (failure → fix production).
+5. Assert specifically, not via snapshots (no snapshot w/o contract).
+6. Assert outputs, not setup (no self-mock assertion).
+7. Cover failures, not just success (negative companion).
+A test passing all seven is a test worth running. A test missing any one is more likely to mislead than help.
+## Override procedure
+If an agent (or user) wants to skip a gate, they must:
+1. State which gate is being skipped.
+2. State why (one sentence).
+3. Add a `// SKIP-GATE-N: <reason>` comment in the test.
+4. Open a follow-up issue tracking the gap.
+Without all four, the gate is enforced.
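Step 3 is the only part of the override that lives in the test source, so it is the part a script can verify. A minimal sketch follows; the function name is hypothetical, and steps 1 and 4 (stating the skip, opening the issue) are outside its reach.

```javascript
// Matches `// SKIP-GATE-N: <reason>` where N is 1-7 and the reason is
// non-empty. A marker with no reason is deliberately not matched.
const SKIP_GATE = /\/\/[ \t]*SKIP-GATE-([1-7]):[ \t]*(\S.*)$/gm;

function findGateSkips(source) {
  return [...source.matchAll(SKIP_GATE)].map((match) => ({
    gate: Number(match[1]),
    reason: match[2].trim(),
  }));
}
```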
+## Prompt block to include when invoking the agent
+```
+You are about to write tests. Before producing test code, complete the
+seven-gate preamble:
+INVARIANT: ___
+OWNING_LAYER: ___
+EXISTING_SUITE: ___
+If you cannot complete these three lines, STOP and ask the user for
+the requirement (do not invent an invariant).
+Then, while writing tests:
+- Real execution: name the real-system path covering this code.
+- On red: investigate production before changing tests; print ANALYSIS.
+- Snapshots: classify as PRODUCT_CONTRACT or IMPLEMENTATION_DETAIL.
+- Assertions: never assert on values you fed into a mock.
+- Coverage: every positive assertion needs a negative companion.
+Tests that violate gates without explicit SKIP-GATE-N comments will be
+REJECTED at review.
+```
+`/dw-run-task` and `/dw-bugfix` inject this prompt before generating test code.
+## Why these seven and not more
+These are the seven LLM failure modes empirically observed in test generation across multiple projects (per pedronauck/skills/testing-boss, MIT, plus dev-workflow internal observation). Other tendencies exist; they're either covered by the positive patterns (e.g., wall-clock waits) or have lower hit-rate.
+If a NEW LLM failure mode appears that none of the seven catches, add a gate AND document the failure mode that motivated it. Don't add gates speculatively.

package/scaffold/skills/dw-testing-discipline/references/anti-patterns.md ADDED

@@ -0,0 +1,336 @@
+# Anti-patterns — 25 patterns across 5 families
+Each anti-pattern names the smell, shows the violation in pseudo-code, gives the fix, and notes how `/dw-code-review` detects it.
+---
+## Family 1: Brittleness (tests bound to internals)
+### A1. Implementation-detail selectors
+**Violation:**
+```javascript
+await page.click('.btn.btn-primary.checkout-button');
+```
+**Fix:** Use `getByRole('button', { name: 'Checkout' })`.
+**Detection:** Grep for `.click(`, `.querySelector(`, `cy.get('.`, `getByTestId(` with a class-flavored argument.
+### A2. Testing internal structure vs observable behavior
+**Violation:**
+```javascript
+expect(component.state.cart.items.length).toBe(3);
+```
+**Fix:** Assert what the user sees: `await expect(page.getByText('3 items in cart')).toBeVisible()`.
+**Detection:** Tests that import/inspect internal state, refs, or private fields. Class-based component tests that read `.state` directly.
+### A3. Testing private methods directly
+**Violation:**
+```javascript
+expect(orderService._calculateTax(...)).toBe(8.5);
+```
+**Fix:** Test the public method that uses tax calculation; verify the result. If the private method is independently complex, extract it to a module and test that module's public API.
+**Detection:** Identifiers starting with `_` accessed from tests.
+### A4. Snapshot-as-test (snapshot replacing assertion)
+**Violation:**
+```javascript
+expect(rendered).toMatchSnapshot();  // ← only assertion in test
+```
+**Fix:** Either:
+1. Write specific assertions about what the component renders, OR
+2. Use a snapshot AS A SECONDARY check after specific assertions, with a comment classifying the snapshot as `PRODUCT_CONTRACT` (UI contract worth pinning) — never `IMPLEMENTATION_DETAIL`.
+**Detection:** Tests where the only assertion is `toMatchSnapshot` or equivalent.
+### A5. Vague existence assertions
+**Violation:**
+```javascript
+expect(result).toBeTruthy();
+expect(element).toBeDefined();
+cy.get('button').should('exist');
+```
+**Fix:** Assert what you actually want: `expect(result.status).toBe('success')`, `expect(button).toBeEnabled()`, `expect(button).toHaveText('Continue')`.
+**Detection:** Tests asserting only existence/truthiness without follow-up semantic check.
+### A6. Action without assertion
+**Violation:**
+```javascript
+test('clicking save works', async () => {
+  await page.getByRole('button', { name: 'Save' }).click();
+  // ← no assertion. What did "works" mean?
+});
+```
+**Fix:** Define what "works" means. Assert the observable outcome: URL changed, modal closed, success message visible, data persisted.
+**Detection:** Tests with `await x.click()` or `await x.type()` followed by no `expect(...)`.
+---
+## Family 2: Flakiness (tests randomizing verdicts)
+### A7. Static sleeps / fixed-timeout waits
+**Violation:**
+```javascript
+await page.waitForTimeout(2000);
+```
+**Fix:** `await expect(page.getByText(/order confirmed/i)).toBeVisible({ timeout: 5000 })` — wait on the actual condition.
+**Detection:** Grep for `waitForTimeout`, `cy.wait(<number>)`, `sleep(`, `Thread.sleep`, `time.sleep` in test files.
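The grep above can be approximated in a few lines of Node. This is a sketch of the idea only; the function name is hypothetical and the patterns are deliberately loose, so expect occasional false positives (for example, a helper that happens to be called `sleep`).

```javascript
const SLEEP_PATTERNS = [
  /\bwaitForTimeout\s*\(/, // Playwright fixed wait
  /\bcy\.wait\(\s*\d+/,    // Cypress numeric wait; alias waits like '@getOrders' are not matched
  /\bsleep\s*\(/,          // also matches Thread.sleep( and time.sleep(
];

// Returns one entry per source line that contains a static sleep.
function findStaticSleeps(source) {
  return source.split("\n").flatMap((text, index) =>
    SLEEP_PATTERNS.some((pattern) => pattern.test(text))
      ? [{ line: index + 1, text: text.trim() }]
      : []
  );
}
```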
+### A8. Test order dependency / hidden shared state
+**Violation:** Test B passes only after Test A has run because A populates a shared cache or DB row.
+**Fix:** Each test sets up its own state in `beforeEach`. Verify by running tests with `--shuffle` or `--randomize`.
+**Detection:** Tests fail when run with `.only`. Tests fail with `--shuffle`. Setup in `beforeAll` instead of `beforeEach`.
+### A9. Non-deterministic inputs (clock, RNG, locale)
+**Violation:**
+```javascript
+test('today is Monday', () => {
+  expect(new Date().getDay()).toBe(1);  // fails 6 days a week
+});
+```
+**Fix:** Mock the clock (`vi.useFakeTimers()`, `jest.useFakeTimers()`, `freezegun` in Python). Seed RNG. Pin locale.
+**Detection:** Tests using `new Date()`, `Date.now()`, `Math.random()`, `Intl.DateTimeFormat` without fakes.
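Fake timers need a test runner. A runner-free alternative, shown here as a swapped-in technique and not as the fix the text prescribes, is to inject the clock. `isWeekend` is a hypothetical function, and UTC is used so the result does not depend on the machine's time zone.

```javascript
// The clock is a parameter with a real-time default, so production callers
// pass nothing and tests pass a fixed instant.
function isWeekend(now = () => new Date()) {
  const day = now().getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6;
}

// Fixed instants for tests: 2024-01-13 is a Saturday, 2024-01-15 a Monday.
const saturday = () => new Date("2024-01-13T12:00:00Z");
const monday = () => new Date("2024-01-15T12:00:00Z");
```

With the instant pinned, `isWeekend(saturday)` and `isWeekend(monday)` give the same answer on any day the suite runs.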
+---
+## Family 3: Mock misuse (tests testing the test setup)
+### A10. Asserting the mock exists
+**Violation:**
+```javascript
+expect(mockFn).toBeDefined();
+```
+**Fix:** Don't assert on mock setup. If the mock is wrong, the behavior assertion downstream will fail naturally.
+**Detection:** Mock functions referenced in assertions without `toHaveBeenCalled` semantics.
+### A11. Mock drift
+**Violation:** Mocked API response set up 6 months ago still returns `{ status: 'OK' }` while the real API now returns `{ ok: true }`.
+**Fix:** Contract testing (Pact, schemathesis) or periodic recording (msw + real-traffic capture). Re-validate mocks against real APIs quarterly.
+**Detection:** Tests with mocks that haven't been touched in >90 days against APIs that have changed. Hard to detect in CI; needs explicit contract checks.
+### A12. Over-mocking child components
+**Violation:**
+```javascript
+vi.mock('./UserAvatar');
+vi.mock('./UserMenu');
+vi.mock('./UserBanner');
+// ... testing nothing real
+```
+**Fix:** Mock at boundaries (HTTP, DB, third-party SDKs). Render real children unless they're genuinely expensive or test-irrelevant.
+**Detection:** Test files with >3 module mocks of internal modules.
+### A13. Incomplete mocks (missing fields the code reads)
+**Violation:**
+```javascript
+const mockUser = { id: 1 };  // missing email, but code reads user.email
+```
+**Fix:** Use a factory that supplies sensible defaults for ALL fields the type/contract declares.
+**Detection:** Runtime errors like `Cannot read property 'X' of undefined` inside production code under test.
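A factory along the lines of the fix might look like the sketch below. The `User` fields, `buildUser`, and `emailDomain` are all hypothetical; the point is only that every field the code reads has a default, and a test overrides just the field it cares about.

```javascript
let nextId = 1;

// Supplies defaults for every field; overrides win.
function buildUser(overrides = {}) {
  const id = overrides.id ?? nextId++;
  return {
    id,
    email: `user${id}@example.test`,
    name: "Test User",
    role: "member",
    ...overrides,
  };
}

// Hypothetical production code that reads `user.email`.
function emailDomain(user) {
  return user.email.split("@")[1].toLowerCase();
}
```

Calling `emailDomain({ id: 1 })` throws a `TypeError`, which is the runtime symptom the detection note describes; `emailDomain(buildUser())` does not.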
+### A14. Mocking wrong level (mocking methods the logic depends on)
+**Violation:**
+```javascript
+// Testing OrderService, but mocking its private calculate() method
+const service = new OrderService();
+vi.spyOn(service, 'calculate').mockReturnValue(100);
+expect(service.processOrder(...)).toBe(/* uses mocked 100 */);
+```
+You've tested the SCAFFOLD, not the logic.
+**Fix:** Mock at the EDGE (DB call, HTTP call, time). Let internal logic run.
+**Detection:** Spies on methods of the System Under Test itself.
+---
+## Family 4: Process (team and suite pathologies)
+### A15. Coverage as vanity metric
+**Violation:** PR comments demanding "you need to hit 90% coverage" with no discussion of what the coverage means.
+**Fix:** Coverage is a flashlight. Use it to FIND blind spots. Don't optimize for the number.
+**Detection:** Cultural; visible in PR templates that gate on coverage percentage.
+### A16. Happy-path-only coverage
+**Violation:** Every test exercises the success case. Edge cases, error paths, boundary values uncovered.
+**Fix:** For each unit, write at minimum: happy path + 1 boundary + 1 invalid input + 1 failure path.
+**Detection:** Tests where every assertion is positive (`toBe`, `toEqual`) and none exercises a failure (`toThrow`, `rejects`).
+### A17. Eternal `beforeAll` / shared setup hiding dependencies
+**Violation:**
+```javascript
+beforeAll(async () => {
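+  // 100 users and 500 orders shared by every test in the file (row shapes are illustrative)
+  await db.users.create(Array.from({ length: 100 }, (_, i) => ({ id: i })));
+  await db.orders.create(Array.from({ length: 500 }, (_, i) => ({ id: i, userId: i % 100 })));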
+});
+```
+Tests now SHARE state. Order matters. Cleanup is fragile.
+**Fix:** `beforeEach` with minimal setup specific to each test.
+**Detection:** `beforeAll` blocks creating data (vs `beforeAll` blocks doing one-time framework setup like spinning testcontainers).
+### A18. Cleanup in `afterEach` (use `beforeEach` instead)
+**Violation:**
+```javascript
+afterEach(async () => {
+  await db.users.deleteAll();
+});
+```
+If a test fails mid-run, cleanup might not run; next test starts dirty.
+**Fix:** `beforeEach` with explicit setup-from-clean (truncate + seed). Reliable regardless of previous test outcome.
+**Detection:** `afterEach` blocks doing state reset.
+### A19. Magic strings and logic in tests
+**Violation:**
+```javascript
+const TIMESTAMP = '2024-01-15T10:30:00Z'; // why?
+expect(formatted).toBe('a long string with embedded specifics');
+```
+When the test fails, what was the test's INTENT?
+**Fix:** Use factories with named defaults. Extract magic values to constants with documenting names. Use snapshot testing for legitimate snapshot cases (with classification).
+**Detection:** Test files with ≥10 string literals not bound to a named variable.
+### A20. Testing against third-party sites you don't control
+**Violation:**
+```javascript
+test('Google homepage loads', async ({ page }) => {
+  await page.goto('https://google.com');
+  expect(await page.title()).toContain('Google');
+});
+```
+You're testing Google's availability, not your code.
+**Fix:** Mock the third party. Use WireMock or msw to fake their responses. If you must call them, do it in a separate "external dependencies up" smoke test, not unit/integration.
+**Detection:** External URLs in test code outside designated smoke tests.
+### A21. Quarantine-as-cemetery
+**Violation:**
+```javascript
+test.skip('flaky on CI sometimes', () => { /* ... */ });
+// commented 8 months ago, no owner, no fix-by date
+```
+**Fix:** Every skip/quarantine has a NAMED OWNER and a FIX-BY DATE. Tracking issue exists. PR that introduces the skip says exactly when the test gets fixed.
+**Detection:** Skipped tests without comments/labels naming owner and date.
+### A22. Retry-as-fix (auto-retry hiding real bugs)
+**Violation:**
+```javascript
+// playwright.config (`retries`); the Jest counterpart is `jest.retryTimes(3)`
+retries: 3,
+```
+A flaky test is a SIGNAL. Retrying until green hides it.
+**Fix:** When a test is flaky, FIX IT (probably a race condition or non-deterministic input). Quarantine if you can't fix immediately. Never just retry.
+**Detection:** CI config with retry counts. Test runners showing "1 retry succeeded" badges.
+### A23. Duplicate tests across pyramid layers
+**Violation:** Same scenario tested at unit, integration, AND E2E. Triple maintenance, no triple value.
+**Fix:** Apply Law 2 — lowest layer wins. Drop higher-layer duplicates.
+**Detection:** Search for the same scenario name across `tests/unit`, `tests/integration`, `tests/e2e`.
+### A24. Weakening tests to make them pass
+**Violation:**
+```diff
+- expect(orders.length).toBe(5);
++ expect(orders.length).toBeGreaterThan(0);
+```
+The "fix" makes the test useless.
+**Fix:** Read Law 3. Fix production OR document WHY the assertion is weaker.
+**Detection:** PR diff shows assertion relaxation without commit body explanation.
+### A25. Mock-driven confidence (test asserts on its own setup)
+**Violation:**
+```javascript
+const mock = vi.fn().mockReturnValue('hello');
+expect(mock()).toBe('hello');
+```
+You wrote `hello` in the mock. You asserted `hello`. You proved nothing.
+**Fix:** Assert on the OUTPUT of the production code that consumed the mock — not on the mock itself.
+**Detection:** Tests asserting equality between a value the test body created and a value the test body retrieved.
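For contrast, here is a sketch of the fixed shape in plain Node. `createSpy` is a hand-rolled stand-in for `vi.fn()` and `greetUser` is hypothetical production code; the assertions target what `greetUser` produced and how it called its dependency, never the stubbed value on its own.

```javascript
// Minimal spy: records calls and returns a canned value.
function createSpy(returnValue) {
  const calls = [];
  const spy = (...args) => {
    calls.push(args);
    return returnValue;
  };
  spy.calls = calls;
  return spy;
}

// Hypothetical production code that consumes the dependency.
function greetUser(findName, userId) {
  const name = findName(userId);
  if (!name) throw new Error(`unknown user ${userId}`);
  return `Hello, ${name}!`;
}

const findName = createSpy("Ada");
const greeting = greetUser(findName, 42);
// Assert on `greeting` (production output) and `findName.calls` (interaction).
```

The string `"Ada"` still appears in both setup and expectation, but only after passing through `greetUser`, so breaking the formatting or the lookup breaks the test.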
+---
+## How `/dw-code-review` uses this catalog
+For each diff hunk under a test path:
+1. Run regex scans for the patterns flagged "Detection" above.
+2. Each hit becomes a finding with severity from this skill's `dw-review-rigor` integration.
+3. Hits classified as Brittleness/Flakiness/Mock-misuse → severity `high`.
+4. Hits classified as Process → severity `medium`.
+5. Hits where the SAME test has multiple patterns → severity `critical` (suite-health smell, not just one test).
+A PR with ≥1 `high` test anti-pattern that lacks ADR justification gets REJECTED.
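The severity rules above can be sketched as a small scoring pass. The hit shape (`test`, `pattern`, `family`) and function names are hypothetical, and two points are assumptions the text does not settle: `critical` is treated as blocking like `high`, and the AI-specific family has no severity listed, so it is left unmapped.

```javascript
// Family-to-severity mapping from the list above.
const FAMILY_SEVERITY = {
  brittleness: "high",
  flakiness: "high",
  "mock-misuse": "high",
  process: "medium",
};

// A test that shows more than one distinct pattern escalates to critical.
function scoreFindings(hits) {
  const patternsByTest = new Map();
  for (const { test, pattern } of hits) {
    if (!patternsByTest.has(test)) patternsByTest.set(test, new Set());
    patternsByTest.get(test).add(pattern);
  }
  return hits.map((hit) => ({
    ...hit,
    severity:
      patternsByTest.get(hit.test).size > 1 ? "critical" : FAMILY_SEVERITY[hit.family],
  }));
}

// Assumption: critical blocks the PR the same way high does.
function isRejected(findings, hasAdrJustification) {
  const blocking = findings.some(
    (finding) => finding.severity === "high" || finding.severity === "critical"
  );
  return blocking && !hasAdrJustification;
}
```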