@brunosps00/dev-workflow 0.11.0 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (111)
  1. package/README.md +48 -5
  2. package/lib/constants.js +20 -20
  3. package/lib/init.js +24 -1
  4. package/lib/migrate-skills.js +129 -0
  5. package/lib/removed-bundled-skills.js +16 -0
  6. package/lib/uninstall.js +6 -2
  7. package/lib/utils.js +43 -1
  8. package/package.json +1 -1
  9. package/scaffold/en/agent-instructions.md +68 -0
  10. package/scaffold/en/commands/dw-autopilot.md +1 -1
  11. package/scaffold/en/commands/dw-brainstorm.md +1 -1
  12. package/scaffold/en/commands/dw-bugfix.md +3 -3
  13. package/scaffold/en/commands/dw-create-techspec.md +1 -1
  14. package/scaffold/en/commands/dw-deps-audit.md +1 -1
  15. package/scaffold/en/commands/dw-fix-qa.md +1 -1
  16. package/scaffold/en/commands/dw-functional-doc.md +2 -2
  17. package/scaffold/en/commands/dw-help.md +1 -1
  18. package/scaffold/en/commands/dw-redesign-ui.md +7 -7
  19. package/scaffold/en/commands/dw-run-qa.md +4 -4
  20. package/scaffold/en/commands/dw-run-task.md +2 -2
  21. package/scaffold/en/templates/constitution-template.md +1 -1
  22. package/scaffold/pt-br/agent-instructions.md +68 -0
  23. package/scaffold/pt-br/commands/dw-autopilot.md +1 -1
  24. package/scaffold/pt-br/commands/dw-brainstorm.md +1 -1
  25. package/scaffold/pt-br/commands/dw-bugfix.md +3 -3
  26. package/scaffold/pt-br/commands/dw-create-techspec.md +1 -1
  27. package/scaffold/pt-br/commands/dw-deps-audit.md +1 -1
  28. package/scaffold/pt-br/commands/dw-fix-qa.md +1 -1
  29. package/scaffold/pt-br/commands/dw-functional-doc.md +2 -2
  30. package/scaffold/pt-br/commands/dw-help.md +1 -1
  31. package/scaffold/pt-br/commands/dw-redesign-ui.md +7 -7
  32. package/scaffold/pt-br/commands/dw-run-qa.md +4 -4
  33. package/scaffold/pt-br/commands/dw-run-task.md +2 -2
  34. package/scaffold/pt-br/templates/constitution-template.md +1 -1
  35. package/scaffold/skills/dw-council/SKILL.md +1 -1
  36. package/scaffold/skills/dw-testing-discipline/SKILL.md +148 -0
  37. package/scaffold/skills/dw-testing-discipline/references/ai-agent-gates.md +170 -0
  38. package/scaffold/skills/dw-testing-discipline/references/anti-patterns.md +336 -0
  39. package/scaffold/skills/dw-testing-discipline/references/flaky-discipline.md +163 -0
  40. package/scaffold/skills/dw-testing-discipline/references/iron-laws.md +128 -0
  41. package/scaffold/skills/dw-testing-discipline/references/playwright-recipes.md +282 -0
  42. package/scaffold/skills/dw-testing-discipline/references/positive-patterns.md +241 -0
  43. package/scaffold/skills/{webapp-testing → dw-testing-discipline}/references/security-boundary.md +1 -1
  44. package/scaffold/skills/dw-ui-discipline/SKILL.md +128 -0
  45. package/scaffold/skills/dw-ui-discipline/references/accessibility-floor.md +225 -0
  46. package/scaffold/skills/dw-ui-discipline/references/anti-slop.md +162 -0
  47. package/scaffold/skills/dw-ui-discipline/references/curated-defaults.md +195 -0
  48. package/scaffold/skills/dw-ui-discipline/references/hard-gate.md +142 -0
  49. package/scaffold/skills/dw-ui-discipline/references/state-matrix.md +101 -0
  50. package/scaffold/skills/ui-ux-pro-max/LICENSE +0 -21
  51. package/scaffold/skills/ui-ux-pro-max/SKILL.md +0 -659
  52. package/scaffold/skills/ui-ux-pro-max/data/_sync_all.py +0 -414
  53. package/scaffold/skills/ui-ux-pro-max/data/app-interface.csv +0 -31
  54. package/scaffold/skills/ui-ux-pro-max/data/charts.csv +0 -26
  55. package/scaffold/skills/ui-ux-pro-max/data/colors.csv +0 -162
  56. package/scaffold/skills/ui-ux-pro-max/data/design.csv +0 -1776
  57. package/scaffold/skills/ui-ux-pro-max/data/draft.csv +0 -1779
  58. package/scaffold/skills/ui-ux-pro-max/data/google-fonts.csv +0 -1924
  59. package/scaffold/skills/ui-ux-pro-max/data/icons.csv +0 -106
  60. package/scaffold/skills/ui-ux-pro-max/data/landing.csv +0 -35
  61. package/scaffold/skills/ui-ux-pro-max/data/products.csv +0 -162
  62. package/scaffold/skills/ui-ux-pro-max/data/react-performance.csv +0 -45
  63. package/scaffold/skills/ui-ux-pro-max/data/stacks/angular.csv +0 -51
  64. package/scaffold/skills/ui-ux-pro-max/data/stacks/astro.csv +0 -54
  65. package/scaffold/skills/ui-ux-pro-max/data/stacks/flutter.csv +0 -53
  66. package/scaffold/skills/ui-ux-pro-max/data/stacks/html-tailwind.csv +0 -56
  67. package/scaffold/skills/ui-ux-pro-max/data/stacks/jetpack-compose.csv +0 -53
  68. package/scaffold/skills/ui-ux-pro-max/data/stacks/laravel.csv +0 -51
  69. package/scaffold/skills/ui-ux-pro-max/data/stacks/nextjs.csv +0 -53
  70. package/scaffold/skills/ui-ux-pro-max/data/stacks/nuxt-ui.csv +0 -51
  71. package/scaffold/skills/ui-ux-pro-max/data/stacks/nuxtjs.csv +0 -59
  72. package/scaffold/skills/ui-ux-pro-max/data/stacks/react-native.csv +0 -52
  73. package/scaffold/skills/ui-ux-pro-max/data/stacks/react.csv +0 -54
  74. package/scaffold/skills/ui-ux-pro-max/data/stacks/shadcn.csv +0 -61
  75. package/scaffold/skills/ui-ux-pro-max/data/stacks/svelte.csv +0 -54
  76. package/scaffold/skills/ui-ux-pro-max/data/stacks/swiftui.csv +0 -51
  77. package/scaffold/skills/ui-ux-pro-max/data/stacks/threejs.csv +0 -54
  78. package/scaffold/skills/ui-ux-pro-max/data/stacks/vue.csv +0 -50
  79. package/scaffold/skills/ui-ux-pro-max/data/styles.csv +0 -85
  80. package/scaffold/skills/ui-ux-pro-max/data/typography.csv +0 -74
  81. package/scaffold/skills/ui-ux-pro-max/data/ui-reasoning.csv +0 -162
  82. package/scaffold/skills/ui-ux-pro-max/data/ux-guidelines.csv +0 -100
  83. package/scaffold/skills/ui-ux-pro-max/scripts/core.py +0 -262
  84. package/scaffold/skills/ui-ux-pro-max/scripts/design_system.py +0 -1148
  85. package/scaffold/skills/ui-ux-pro-max/scripts/search.py +0 -114
  86. package/scaffold/skills/ui-ux-pro-max/skills/brand/SKILL.md +0 -97
  87. package/scaffold/skills/ui-ux-pro-max/skills/design/SKILL.md +0 -302
  88. package/scaffold/skills/ui-ux-pro-max/skills/design-system/SKILL.md +0 -244
  89. package/scaffold/skills/ui-ux-pro-max/templates/base/quick-reference.md +0 -297
  90. package/scaffold/skills/ui-ux-pro-max/templates/base/skill-content.md +0 -358
  91. package/scaffold/skills/ui-ux-pro-max/templates/platforms/agent.json +0 -21
  92. package/scaffold/skills/ui-ux-pro-max/templates/platforms/augment.json +0 -18
  93. package/scaffold/skills/ui-ux-pro-max/templates/platforms/claude.json +0 -21
  94. package/scaffold/skills/ui-ux-pro-max/templates/platforms/codebuddy.json +0 -21
  95. package/scaffold/skills/ui-ux-pro-max/templates/platforms/codex.json +0 -21
  96. package/scaffold/skills/ui-ux-pro-max/templates/platforms/continue.json +0 -21
  97. package/scaffold/skills/ui-ux-pro-max/templates/platforms/copilot.json +0 -21
  98. package/scaffold/skills/ui-ux-pro-max/templates/platforms/cursor.json +0 -21
  99. package/scaffold/skills/ui-ux-pro-max/templates/platforms/droid.json +0 -21
  100. package/scaffold/skills/ui-ux-pro-max/templates/platforms/gemini.json +0 -21
  101. package/scaffold/skills/ui-ux-pro-max/templates/platforms/kilocode.json +0 -21
  102. package/scaffold/skills/ui-ux-pro-max/templates/platforms/kiro.json +0 -21
  103. package/scaffold/skills/ui-ux-pro-max/templates/platforms/opencode.json +0 -21
  104. package/scaffold/skills/ui-ux-pro-max/templates/platforms/qoder.json +0 -21
  105. package/scaffold/skills/ui-ux-pro-max/templates/platforms/roocode.json +0 -21
  106. package/scaffold/skills/ui-ux-pro-max/templates/platforms/trae.json +0 -21
  107. package/scaffold/skills/ui-ux-pro-max/templates/platforms/warp.json +0 -18
  108. package/scaffold/skills/ui-ux-pro-max/templates/platforms/windsurf.json +0 -21
  109. package/scaffold/skills/webapp-testing/SKILL.md +0 -138
  110. package/scaffold/skills/webapp-testing/assets/test-helper.js +0 -56
  111. /package/scaffold/skills/{webapp-testing → dw-testing-discipline}/references/three-workflow-patterns.md +0 -0
@@ -0,0 +1,170 @@
# Seven AI agent gates: mandatory when an LLM writes tests

LLMs have characteristic failure modes when authoring tests. These gates are forcing functions for the seven most common ones.

Every test produced by an agent (via `/dw-run-task`, `/dw-bugfix`, `/dw-autopilot`, or any other code-generating flow) must pass all seven gates BEFORE the diff is presented for review.

## Gate 1: Invariant first

**The failure mode it blocks:** Agent writes 200 lines of test code without articulating what the test is supposed to prove.

**The gate:**

Before writing any test code, the agent prints:

```
INVARIANT: <one sentence: what behavior is true that the test verifies>
OWNING_LAYER: <unit | integration | contract | e2e>
EXISTING_SUITE: <path to existing test file the new test joins>
```

**Why it works:**
- "Invariant" forces the agent to name a specific behavior.
- "Owning layer" enforces Law 2 (lowest detectable layer).
- "Existing suite" forces extending coverage rather than spawning new files.

**Verification:** In `/dw-code-review`, look for this 3-line preamble in the PR description or the commit body. Missing = REJECTED.
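A review bot could check the Gate 1 preamble mechanically. The following is a minimal sketch under assumptions: `parsePreamble` is a hypothetical helper, not part of dev-workflow. It assumes the three `KEY: value` lines appear somewhere in the PR description or commit body, and it treats empty values, `___` blanks, and unfilled `<...>` placeholders as missing.

```javascript
// Hypothetical Gate 1 check; not part of dev-workflow itself.
const FIELDS = ["INVARIANT", "OWNING_LAYER", "EXISTING_SUITE"];
const LAYERS = new Set(["unit", "integration", "contract", "e2e"]);

// Empty values, "___" blanks, and unfilled "<...>" placeholders all count as missing.
function isBlank(value) {
  return !value || value === "___" || value.startsWith("<");
}

function parsePreamble(text) {
  const fields = {};
  for (const line of text.split("\n")) {
    // Accept "KEY: value" lines for the three known keys; ignore everything else.
    const match = /^([A-Z_]+):\s*(.*)$/.exec(line.trim());
    if (match && FIELDS.includes(match[1])) fields[match[1]] = match[2].trim();
  }
  const problems = FIELDS.filter((name) => isBlank(fields[name])).map((name) => `${name} missing`);
  if (!isBlank(fields.OWNING_LAYER) && !LAYERS.has(fields.OWNING_LAYER)) {
    problems.push(`OWNING_LAYER must be one of: ${[...LAYERS].join(", ")}`);
  }
  return { ok: problems.length === 0, fields, problems };
}

const good = parsePreamble(
  [
    "INVARIANT: A paid order cannot be cancelled without a refund record",
    "OWNING_LAYER: integration",
    "EXISTING_SUITE: tests/integration/orders.test.js",
  ].join("\n"),
);
const bad = parsePreamble("INVARIANT: ___\nOWNING_LAYER: smoke");
console.log(good.ok); // true
console.log(bad.problems);
```

Rejecting unfilled placeholders matters as much as rejecting missing lines. An agent that copies the template verbatim has not stated an invariant.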

## Gate 2: Owning layer

**The failure mode it blocks:** Agent creates a new test file every time, scattering coverage across orphan files. Or the agent writes E2E tests for behavior a unit test could prove.

**The gate:**

The agent must:
1. Identify the existing test suite that owns the module under test.
2. Extend that suite, OR document why a new suite is needed (a genuinely new module or a new test pyramid layer).
3. Map the test to the right layer per Law 2.

**Verification:**
- New test file in the PR, but an existing file covers the same module? REJECTED.
- E2E test for a pure-logic invariant? REJECTED unless documented.
- Unit test for a cross-service flow? REJECTED: push it to integration/E2E.

## Gate 3: Real execution

**The failure mode it blocks:** Agent writes tests that mock everything. They stay green forever and validate nothing.

**The gate:**

Every test path the agent writes must, at SOME layer, run against real systems before the diff merges:

- Pure logic: unit tests alone are fine.
- Code that touches the DB: must have at least one integration test running against a real DB (testcontainers, an ephemeral container, or a dedicated test DB).
- Code that calls external services: must have a contract test OR a sandbox-account smoke test.
- UI interactions: must have at least one E2E run against a real preview environment.

**Verification:** The PR description must list the real-system runs that exercise the touched code. If no real-system path covers the change, REJECTED.

## Gate 4: Failure → fix production

**The failure mode it blocks:** Agent sees a red test and modifies the test until it turns green. The bug ships.

**The gate:**

When the agent encounters a failing test (its own or a pre-existing one):

1. Print: `INVESTIGATING FAILURE: <test name>`
2. Read the production code in the path that produces the observed value.
3. Print: `ANALYSIS: <2-3 sentences on whether production is wrong, the test is wrong, or the invariant changed>`
4. Decide:
   - Production wrong → fix production.
   - Test wrong → fix the test AND document the change in the commit body.
   - Invariant changed → update the test AND open an ADR if the change affects a public contract.

**Verification:** Every commit that changes a previously green test must have an `ANALYSIS:` line in the commit body explaining the decision. Missing = REJECTED.

## Gate 5: No snapshot without contract

**The failure mode it blocks:** Agent reaches for `toMatchSnapshot()` whenever it doesn't know what to assert. The snapshot becomes the assertion, and drift goes unnoticed.

**The gate:**

Before adding a snapshot assertion, the agent classifies the artifact:

- **PRODUCT_CONTRACT**: a stable contract worth pinning (e.g., serialized output of a public API, schema of a stored record). A snapshot is appropriate. Document the classification.
- **IMPLEMENTATION_DETAIL**: HTML structure, internal representation, component tree shape. A snapshot is FORBIDDEN. Write specific assertions instead.

**Verification:** Snapshots in the diff without a classification comment = REJECTED. Snapshots classified as IMPLEMENTATION_DETAIL = REJECTED.

## Gate 6: No assertion on self-set mock

**The failure mode it blocks:** Agent writes `mockFn.mockReturnValue('X')`, then asserts `expect(mockFn()).toBe('X')`. This proves nothing.

**The gate:**

The agent cannot assert on values it fed directly into a mock. Assertions must be on:
- The OUTPUT of production code that consumed the mock.
- The SIDE EFFECTS (DB state, network calls, event emissions) caused by production code.
- The VISIBLE behavior (UI change, log line, response) the user/caller observes.

**Verification:** Diff analysis flags pairs where a literal value appears in BOTH a mock setup AND an assertion. Flagged = REJECTED unless the agent can show the value passed through production code.
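Two regular expressions can approximate the Gate 6 diff analysis. This is a hypothetical, line-based heuristic. `findSelfSetMockAssertions` is an illustrative name, not a dev-workflow API. It only recognizes literals passed to Jest/Vitest `mockReturnValue`-style setters and compared with `toBe`/`toEqual`/`toStrictEqual`. As a result, it misses multi-line code, and it will flag legitimate pass-through cases that the gate's exception then has to clear.

```javascript
// Hypothetical Gate 6 scanner: a rough regex sketch, not a real JavaScript parser.
const LITERAL = String.raw`('[^']*'|"[^"]*"|-?\d+(?:\.\d+)?)`;
// Literals fed to mockReturnValue / mockResolvedValue / mockRejectedValue (and *Once variants).
const MOCK_SETUP = new RegExp(String.raw`\.mock(?:Return|Resolved|Rejected)Value(?:Once)?\(\s*` + LITERAL + String.raw`\s*\)`, "g");
// Equality assertions against a literal.
const ASSERTION = new RegExp(String.raw`expect\((.*)\)\.(?:toBe|toEqual|toStrictEqual)\(\s*` + LITERAL + String.raw`\s*\)`);

// Treat 'X' and "X" as the same literal.
function normalize(literal) {
  return literal.replace(/^['"]|['"]$/g, "");
}

function findSelfSetMockAssertions(source) {
  const lines = source.split("\n");
  const mockedLiterals = new Set();
  for (const line of lines) {
    for (const match of line.matchAll(MOCK_SETUP)) mockedLiterals.add(normalize(match[1]));
  }
  const findings = [];
  lines.forEach((line, index) => {
    const match = ASSERTION.exec(line);
    if (match && mockedLiterals.has(normalize(match[2]))) {
      findings.push({ line: index + 1, literal: normalize(match[2]), text: line.trim() });
    }
  });
  return findings;
}

const sample = [
  "const mock = vi.fn().mockReturnValue('hello');",
  "expect(mock()).toBe('hello');",
  "expect(greet(mock)).toBe('hello, world');",
].join("\n");
console.log(findSelfSetMockAssertions(sample)); // one finding, on line 2
```

The third line is not flagged because `'hello, world'` was produced by `greet`, not written into the mock. That is exactly the distinction the gate draws between asserting on setup and asserting on production output.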

## Gate 7: Negative companion

**The failure mode it blocks:** Agent writes happy-path-only tests, leaving edge cases, error paths, and boundaries uncovered.

**The gate:**

Every positive assertion the agent writes ships WITH at least one negative companion:

- Asserting `createUser(validInput)` succeeds → also assert `createUser(invalidInput)` fails with a specific error.
- Asserting `parseDate(validString)` returns a Date → also assert `parseDate(invalidString)` throws/returns null.
- Asserting `transferFunds(...)` succeeds with sufficient balance → also assert it fails with insufficient balance.

**Verification:** A PR adding N positive assertions must add ≥1 negative assertion per public path. An imbalance greater than 3:1 (positive:negative) on a public path = REJECTED.
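The following is a rough version of the 3:1 check. It rests on several assumptions. A "negative" assertion is any `expect(` line containing `toThrow`, `.rejects`, or `toBeNull`, and every other `expect(` line counts as positive. The whole input is also treated as a single public path; a real implementation would need to group assertions by path. `assertionBalance` is a hypothetical name.

```javascript
// Hypothetical Gate 7 counter; the negative-assertion patterns are assumptions.
const NEGATIVE = [/\.toThrow\w*\(/, /\.rejects\./, /\.toBeNull\(/];

function assertionBalance(source) {
  let positive = 0;
  let negative = 0;
  for (const line of source.split("\n")) {
    if (!/\bexpect\(/.test(line)) continue; // only lines that contain an assertion
    if (NEGATIVE.some((pattern) => pattern.test(line))) negative += 1;
    else positive += 1;
  }
  // Zero negatives always fails (">= 1 negative per public path"); otherwise apply the 3:1 cap.
  const rejected = negative === 0 ? positive > 0 : positive / negative > 3;
  return { positive, negative, rejected };
}

const testFile = [
  "expect(createUser(valid)).toEqual({ id: 1 });",
  "expect(parseDate('2026-01-01')).toBeInstanceOf(Date);",
  "expect(() => createUser(invalid)).toThrow('email is required');",
].join("\n");
console.log(assertionBalance(testFile)); // { positive: 2, negative: 1, rejected: false }
```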

## How the gates compose

Together, the seven gates produce tests that:
1. State what they prove (invariant first).
2. Live at the right layer (owning layer).
3. Exercise reality somewhere (real execution).
4. Reveal bugs when red (failure → fix production).
5. Assert specifically, not via snapshots (no snapshot w/o contract).
6. Assert outputs, not setup (no self-mock assertion).
7. Cover failures, not just success (negative companion).

A test passing all seven is a test worth running. A test missing any one is more likely to mislead than help.

## Override procedure

If an agent (or user) wants to skip a gate, they must:
1. State which gate is being skipped.
2. State why (one sentence).
3. Add a `// SKIP-GATE-N: <reason>` comment in the test.
4. Open a follow-up issue tracking the gap.

Without all four, the gate is enforced.
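Step 3 of the override procedure is the only one a linter can verify from the test file alone. Steps 1, 2, and 4 need a human or another tool. The checker below is hypothetical (`parseSkipGates` is an illustrative name). It assumes one `// SKIP-GATE-N: <reason>` comment per line and gate numbers 1 to 7.

```javascript
// Hypothetical checker for "// SKIP-GATE-N: <reason>" override comments.
const SKIP_COMMENT = /\/\/\s*SKIP-GATE-(\d+):\s*(.*)$/;

function parseSkipGates(source) {
  const valid = [];
  const invalid = [];
  source.split("\n").forEach((line, index) => {
    const match = SKIP_COMMENT.exec(line);
    if (!match) return;
    const entry = { line: index + 1, gate: Number(match[1]), reason: match[2].trim() };
    if (entry.gate < 1 || entry.gate > 7) invalid.push({ ...entry, problem: "gate must be 1-7" });
    else if (entry.reason === "") invalid.push({ ...entry, problem: "missing reason" });
    else valid.push(entry);
  });
  return { valid, invalid };
}

const example = [
  "// SKIP-GATE-3: payment sandbox is down until the vendor restores it",
  "// SKIP-GATE-9: typo in gate number",
  "// SKIP-GATE-5:",
].join("\n");
console.log(parseSkipGates(example));
```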

## Prompt block to include when invoking the agent

```
You are about to write tests. Before producing test code, complete the
seven-gate preamble:

INVARIANT: ___
OWNING_LAYER: ___
EXISTING_SUITE: ___

If you cannot complete these three lines, STOP and ask the user for
the requirement (do not invent an invariant).

Then, while writing tests:
- Real execution: name the real-system path covering this code.
- On red: investigate production before changing tests; print ANALYSIS.
- Snapshots: classify as PRODUCT_CONTRACT or IMPLEMENTATION_DETAIL.
- Assertions: never assert on values you fed into a mock.
- Coverage: every positive assertion needs a negative companion.

Tests that violate gates without explicit SKIP-GATE-N comments will be
REJECTED at review.
```

`/dw-run-task` and `/dw-bugfix` inject this prompt before generating test code.

## Why these seven and not more

These are the seven LLM failure modes empirically observed in test generation across multiple projects (per pedronauck/skills/testing-boss, MIT, plus dev-workflow internal observation). Other tendencies exist; they are either covered by the positive patterns (e.g., wall-clock waits) or have a lower hit rate.

If a NEW LLM failure mode appears that none of the seven catches, add a gate AND document the failure mode that motivated it. Don't add gates speculatively.
@@ -0,0 +1,336 @@
# Anti-patterns: 25 patterns across 4 families

Each anti-pattern names the smell, shows the violation in pseudo-code, gives the fix, and notes how `/dw-code-review` detects it.

---

## Family 1: Brittleness (tests bound to internals)

### A1. Implementation-detail selectors

**Violation:**
```javascript
await page.click('.btn.btn-primary.checkout-button');
```

**Fix:** Use `getByRole('button', { name: 'Checkout' })`.

**Detection:** Grep for `.click(`, `.querySelector(`, `cy.get('.`, `getByTestId(` with a class-flavored argument.

### A2. Testing internal structure vs observable behavior

**Violation:**
```javascript
expect(component.state.cart.items.length).toBe(3);
```

**Fix:** Assert what the user sees: `await expect(page.getByText('3 items in cart')).toBeVisible()`.

**Detection:** Tests that import/inspect internal state, refs, or private fields. Class-based component tests that read `.state` directly.

### A3. Testing private methods directly

**Violation:**
```javascript
expect(orderService._calculateTax(/* ... */)).toBe(8.5);
```

**Fix:** Test the public method that uses the tax calculation and verify its result. If the private method is independently complex, extract it to a module and test that module's public API.

**Detection:** Identifiers starting with `_` accessed from tests.

### A4. Snapshot-as-test (snapshot replacing assertion)

**Violation:**
```javascript
expect(rendered).toMatchSnapshot(); // ← only assertion in test
```

**Fix:** Either:
1. Write specific assertions about what the component renders, OR
2. Use a snapshot AS A SECONDARY check after specific assertions, with a comment classifying the snapshot as `PRODUCT_CONTRACT` (a UI contract worth pinning), never `IMPLEMENTATION_DETAIL`.

**Detection:** Tests where the only assertion is `toMatchSnapshot` or equivalent.

### A5. Vague existence assertions

**Violation:**
```javascript
expect(result).toBeTruthy();
expect(element).toBeDefined();
cy.get('button').should('exist'); // Cypress
```

**Fix:** Assert what you actually want: `expect(result.status).toBe('success')`, `expect(button).toBeEnabled()`, `expect(button).toHaveText('Continue')`.

**Detection:** Tests asserting only existence/truthiness without a follow-up semantic check.

### A6. Action without assertion

**Violation:**
```javascript
test('clicking save works', async ({ page }) => {
  await page.getByRole('button', { name: 'Save' }).click();
  // ← no assertion. What did "works" mean?
});
```

**Fix:** Define what "works" means. Assert the observable outcome: URL changed, modal closed, success message visible, data persisted.

**Detection:** Tests with `await x.click()` or `await x.type()` followed by no `expect(...)`.

---

## Family 2: Flakiness (tests randomizing verdicts)

### A7. Static sleeps / fixed-timeout waits

**Violation:**
```javascript
await page.waitForTimeout(2000);
```

**Fix:** Wait on the actual condition: `await expect(page.getByText(/order confirmed/i)).toBeVisible({ timeout: 5000 })`.

**Detection:** Grep for `waitForTimeout`, `cy.wait(<number>)`, `sleep(`, `Thread.sleep`, `time.sleep` in test files.
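The difference between A7's violation and its fix is polling a condition instead of sleeping for a fixed time. Playwright's web-first assertions and `expect.poll` already do this. The framework-free `pollUntil` sketch here is a hypothetical helper, not a library API, and only makes the mechanics visible. It returns as soon as the condition is truthy and fails with a clear message at the deadline.

```javascript
// Hypothetical helper; prefer your framework's built-in condition waits.
async function pollUntil(condition, { timeout = 5000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  while (Date.now() <= deadline) {
    try {
      const value = await condition();
      if (value) return value; // return as soon as the condition holds
    } catch (error) {
      lastError = error; // e.g. the element is not rendered yet; keep polling
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  const detail = lastError ? `: ${lastError.message}` : "";
  throw new Error(`Condition not met within ${timeout} ms${detail}`);
}

// Simulated app state that becomes ready after an unpredictable delay.
let orderConfirmed = false;
setTimeout(() => {
  orderConfirmed = true;
}, 120);

pollUntil(() => orderConfirmed, { timeout: 2000 }).then(() => console.log("order confirmed"));
```

The timeout acts only as an upper bound. A fast run finishes in about 120 ms instead of always paying a fixed sleep, and a slow CI machine gets up to 2 s instead of failing at an arbitrary cutoff.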

### A8. Test order dependency / hidden shared state

**Violation:** Test B passes only after Test A has run because A populates a shared cache or DB row.

**Fix:** Each test sets up its own state in `beforeEach`. Verify by running tests with `--shuffle` or `--randomize`.

**Detection:** Tests fail when run with `.only`. Tests fail with `--shuffle`. Setup in `beforeAll` instead of `beforeEach`.
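What `--shuffle` exposes can be shown in miniature. This hypothetical probe runs a tiny suite in every possible order and reports the orders that fail. Exhaustive ordering is only practical for a handful of tests; real runners sample random orders instead. The suite whose tests share one state object fails in one order, while the suite whose tests each build their own state passes in all of them.

```javascript
// Hypothetical order-dependency probe (exhaustive, so only for tiny suites).
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest]),
  );
}

// Runs every ordering with one shared state object per run (beforeAll-style setup).
function failingOrders(tests, makeState) {
  const failures = [];
  for (const order of permutations(Object.keys(tests))) {
    const state = makeState();
    for (const name of order) {
      try {
        tests[name](state);
      } catch {
        failures.push(order.join(" -> "));
        break; // one failure is enough to flag this ordering
      }
    }
  }
  return failures;
}

// "B" silently depends on "A" having filled the shared cart first.
const sharedSuite = {
  A: (state) => state.cart.push("book"),
  B: (state) => {
    if (state.cart.length !== 1) throw new Error("expected 1 item in cart");
  },
};

// Each test builds the state it needs (the beforeEach fix), so order is irrelevant.
const isolatedSuite = {
  A: () => {
    const cart = [];
    cart.push("book");
    if (cart.length !== 1) throw new Error("expected 1 item after adding");
  },
  B: () => {
    const cart = ["book"];
    if (cart.length !== 1) throw new Error("expected 1 item in cart");
  },
};

console.log(failingOrders(sharedSuite, () => ({ cart: [] }))); // [ 'B -> A' ]
console.log(failingOrders(isolatedSuite, () => ({ cart: [] }))); // []
```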

### A9. Non-deterministic inputs (clock, RNG, locale)

**Violation:**
```javascript
test('today is Monday', () => {
  expect(new Date().getDay()).toBe(1); // fails 6 days a week
});
```

**Fix:** Mock the clock (`vi.useFakeTimers()`, `jest.useFakeTimers()`, `freezegun` in Python). Seed RNG. Pin locale.

**Detection:** Tests using `new Date()`, `Date.now()`, `Math.random()`, `Intl.DateTimeFormat` without fakes.
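Besides fake timers (`vi.useFakeTimers()` plus `vi.setSystemTime()`, or the Jest equivalents), the clock can be passed in as a dependency. The sketch below uses a hypothetical `isWeekend` function. It also reads the UTC day, so the result does not depend on the CI host's timezone.

```javascript
// Hypothetical production code: takes the clock as a parameter instead of reading it.
function isWeekend(now = () => new Date()) {
  const day = now().getUTCDay(); // 0 = Sunday, 6 = Saturday; UTC avoids host-timezone drift
  return day === 0 || day === 6;
}

// Tests pin the clock, so the verdict is the same on every day of the week.
const fixedClock = (iso) => () => new Date(iso);
console.log(isWeekend(fixedClock("2024-01-13T12:00:00Z"))); // true (a Saturday)
console.log(isWeekend(fixedClock("2024-01-15T10:30:00Z"))); // false (a Monday)
```

Production callers keep the default `now`. Only tests pass a fixed clock, so no global state needs to be restored afterwards.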

---

## Family 3: Mock misuse (tests testing the test setup)

### A10. Asserting the mock exists

**Violation:**
```javascript
expect(mockFn).toBeDefined();
```

**Fix:** Don't assert on mock setup. If the mock is wrong, the behavior assertion downstream will fail naturally.

**Detection:** Mock functions referenced in assertions without `toHaveBeenCalled` semantics.

### A11. Mock drift

**Violation:** A mocked API response set up 6 months ago still returns `{ status: 'OK' }`, while the real API now returns `{ ok: true }`.

**Fix:** Contract testing (Pact, Schemathesis) or periodic recording (MSW + real-traffic capture). Re-validate mocks against the real APIs quarterly.

**Detection:** Tests with mocks that haven't been touched in >90 days against APIs that have changed. Hard to detect in CI; needs explicit contract checks.

### A12. Over-mocking child components

**Violation:**
```javascript
vi.mock('./UserAvatar');
vi.mock('./UserMenu');
vi.mock('./UserBanner');
// ... testing nothing real
```

**Fix:** Mock at boundaries (HTTP, DB, third-party SDKs). Render real children unless they're genuinely expensive or irrelevant to the test.

**Detection:** Test files with >3 module mocks of internal modules.

### A13. Incomplete mocks (missing fields the code reads)

**Violation:**
```javascript
const mockUser = { id: 1 }; // missing email, but code reads user.email
```

**Fix:** Use a factory that supplies sensible defaults for ALL fields the type/contract declares.

**Detection:** Runtime errors like `Cannot read property 'X' of undefined` inside production code under test.
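A factory fixes A13 by making the complete object the default. The `buildUser` factory and its fields are hypothetical; in a real codebase the defaults should mirror whatever the User type or API contract declares. The A13-style partial object still fails loudly, which is the signal the detection rule relies on.

```javascript
// Hypothetical test-data factory: every declared field gets a sensible default,
// and each test overrides only what it cares about.
let nextId = 1;

function buildUser(overrides = {}) {
  const id = overrides.id ?? nextId++;
  return {
    id,
    email: `user${id}@example.test`,
    name: `Test User ${id}`,
    role: "member",
    createdAt: new Date("2024-01-01T00:00:00Z"), // fixed, so tests stay deterministic
    ...overrides,
  };
}

// Production-style code that reads the field the A13 mock forgot.
function greetingFor(user) {
  return `Hello, ${user.name} <${user.email.toLowerCase()}>`;
}

console.log(greetingFor(buildUser({ name: "Ada" }))); // Hello, Ada <user1@example.test>
```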

### A14. Mocking at the wrong level (mocking methods the logic depends on)

**Violation:**
```javascript
// Testing OrderService, but mocking its private calculate() method
const service = new OrderService();
vi.spyOn(service, 'calculate').mockReturnValue(100);
expect(service.processOrder(/* ... */)).toBe(/* uses mocked 100 */);
```

You've tested the SCAFFOLD, not the logic.

**Fix:** Mock at the EDGE (DB call, HTTP call, time). Let internal logic run.

**Detection:** Spies on methods of the System Under Test itself.
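Here is the A14 fix as a sketch, using a hypothetical `OrderService` whose only collaborator is an injected `priceLookup` function standing in for the database. The test replaces that edge with a fixed price table and lets `calculate()` run for real. A bug in the arithmetic (here, an assumed flat 10% tax) would therefore make the assertion fail.

```javascript
// Hypothetical service: the injected priceLookup is the "edge" (DB or HTTP in real code).
class OrderService {
  constructor(priceLookup) {
    this.priceLookup = priceLookup;
  }

  calculate(lines) {
    // Internal logic under test: subtotal plus an assumed flat 10% tax, rounded to cents.
    const subtotal = lines.reduce((sum, line) => sum + this.priceLookup(line.sku) * line.qty, 0);
    return Math.round(subtotal * 1.1 * 100) / 100;
  }

  processOrder(lines) {
    if (lines.length === 0) throw new Error("empty order");
    return { total: this.calculate(lines), status: "accepted" };
  }
}

// Fake only the edge: a fixed price table instead of a database.
const prices = { BOOK: 20, PEN: 2.5 };
const service = new OrderService((sku) => prices[sku]);
console.log(service.processOrder([{ sku: "BOOK", qty: 2 }, { sku: "PEN", qty: 4 }])); // { total: 55, status: 'accepted' }
```

Nothing on `service` itself is spied on, so the detection rule ("spies on methods of the System Under Test") has nothing to flag.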

---

## Family 4: Process (team and suite pathologies)

### A15. Coverage as vanity metric

**Violation:** PR comments demanding "you need to hit 90% coverage" with no discussion of what the coverage means.

**Fix:** Coverage is a flashlight. Use it to FIND blind spots. Don't optimize for the number.

**Detection:** Cultural; visible in PR templates that gate on coverage percentage.

### A16. Happy-path-only coverage

**Violation:** Every test exercises the success case. Edge cases, error paths, and boundary values go uncovered.

**Fix:** For each unit, write at minimum: happy path + 1 boundary + 1 invalid input + 1 failure path.

**Detection:** Tests where every assertion is positive (`toBe`, `toEqual`) and none is negative (`toThrow`, `.rejects`).

### A17. Eternal `beforeAll` / shared setup hiding dependencies

**Violation:**
```javascript
beforeAll(async () => {
  await db.users.create(/* 100 users */);
  await db.orders.create(/* 500 orders */);
});
```

Tests now SHARE state. Order matters. Cleanup is fragile.

**Fix:** `beforeEach` with minimal setup specific to each test.

**Detection:** `beforeAll` blocks creating data (as opposed to `beforeAll` blocks doing one-time framework setup, like spinning up testcontainers).

### A18. Cleanup in `afterEach` (use `beforeEach` instead)

**Violation:**
```javascript
afterEach(async () => {
  await db.users.deleteAll();
});
```

If a test fails mid-run, cleanup might not run, and the next test starts dirty.

**Fix:** `beforeEach` with explicit setup-from-clean (truncate + seed). Reliable regardless of the previous test's outcome.

**Detection:** `afterEach` blocks doing state reset.

### A19. Magic strings and logic in tests

**Violation:**
```javascript
const TIMESTAMP = '2024-01-15T10:30:00Z'; // why?
expect(formatted).toBe('a long string with embedded specifics');
```

When the test fails, what was the test's INTENT?

**Fix:** Use factories with named defaults. Extract magic values to constants with documenting names. Use snapshot testing for legitimate snapshot cases (with classification).

**Detection:** Test files with ≥10 string literals not bound to a named variable.

### A20. Testing against third-party sites you don't control

**Violation:**
```javascript
test('Google homepage loads', async ({ page }) => {
  await page.goto('https://google.com');
  expect(await page.title()).toContain('Google');
});
```

You're testing Google's availability, not your code.

**Fix:** Mock the third party: use WireMock or MSW to fake its responses. If you must call it, do so in a separate "external dependencies up" smoke test, not in unit/integration tests.

**Detection:** External URLs in test code outside designated smoke tests.

### A21. Quarantine-as-cemetery

**Violation:**
```javascript
test.skip('flaky on CI sometimes', () => { /* ... */ });
// skipped 8 months ago, no owner, no fix-by date
```

**Fix:** Every skip/quarantine has a NAMED OWNER and a FIX-BY DATE, and a tracking issue exists. The PR that introduces the skip states exactly when the test will be fixed.

**Detection:** Skipped tests without comments/labels naming an owner and a date.

### A22. Retry-as-fix (auto-retry hiding real bugs)

**Violation:**
```javascript
// playwright.config.js (in Jest, the equivalent is jest.retryTimes(3))
retries: 3,
```

A flaky test is a SIGNAL. Retrying until green hides it.

**Fix:** When a test is flaky, FIX IT (the cause is usually a race condition or a non-deterministic input). Quarantine it if you can't fix it immediately. Never just retry.

**Detection:** CI config with retry counts. Test runners showing "1 retry succeeded" badges.

### A23. Duplicate tests across pyramid layers

**Violation:** The same scenario tested at unit, integration, AND E2E. Triple the maintenance, not triple the value.

**Fix:** Apply Law 2: the lowest layer wins. Drop higher-layer duplicates.

**Detection:** Search for the same scenario name across `tests/unit`, `tests/integration`, `tests/e2e`.

### A24. Weakening tests to make them pass

**Violation:**
```diff
- expect(orders.length).toBe(5);
+ expect(orders.length).toBeGreaterThan(0);
```

The "fix" makes the test useless.

**Fix:** Read Law 3. Fix production OR document WHY the assertion is weaker.

**Detection:** PR diff shows assertion relaxation without a commit-body explanation.

### A25. Mock-driven confidence (test asserts on its own setup)

**Violation:**
```javascript
const mock = vi.fn().mockReturnValue('hello');
expect(mock()).toBe('hello');
```

You wrote `hello` in the mock. You asserted `hello`. You proved nothing.

**Fix:** Assert on the OUTPUT of the production code that consumed the mock, not on the mock itself.

**Detection:** Tests asserting equality between a value the test body created and a value the test body retrieved.

---

## How `/dw-code-review` uses this catalog

For each diff hunk under a test path:
1. Run regex scans for the patterns flagged "Detection" above.
2. Each hit becomes a finding with severity from this skill's `dw-review-rigor` integration.
3. Hits classified as Brittleness/Flakiness/Mock-misuse → severity `high`.
4. Hits classified as Process → severity `medium`.
5. Hits where the SAME test has multiple patterns → severity `critical` (suite-health smell, not just one test).

A PR with ≥1 `high` test anti-pattern that lacks ADR justification gets REJECTED.
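The severity rules above can be expressed as a small classifier. This is a sketch under several assumptions. The finding shape (`testId`, `pattern`, `family`) is invented for illustration. "Multiple patterns" is read as two or more distinct anti-patterns in one test. `critical` findings are treated as at least as blocking as `high` ones, which the catalog implies but does not state.

```javascript
// Hypothetical classifier; family names and finding shape are illustrative.
const HIGH_FAMILIES = new Set(["brittleness", "flakiness", "mock-misuse"]);

function classifyFindings(hits) {
  // Distinct anti-patterns per test, so two hits of the same pattern do not escalate.
  const patternsPerTest = new Map();
  for (const hit of hits) {
    if (!patternsPerTest.has(hit.testId)) patternsPerTest.set(hit.testId, new Set());
    patternsPerTest.get(hit.testId).add(hit.pattern);
  }
  return hits.map((hit) => {
    const multiple = patternsPerTest.get(hit.testId).size > 1;
    const base = HIGH_FAMILIES.has(hit.family) ? "high" : "medium"; // process -> medium
    return { ...hit, severity: multiple ? "critical" : base };
  });
}

// REJECTED when any high (or, by assumption, critical) finding lacks an ADR justification.
function isRejected(findings, hasAdrJustification) {
  const blocking = findings.some((f) => f.severity === "high" || f.severity === "critical");
  return blocking && !hasAdrJustification;
}

const findings = classifyFindings([
  { testId: "checkout.spec.js > pays", pattern: "A7", family: "flakiness" },
  { testId: "checkout.spec.js > pays", pattern: "A5", family: "brittleness" },
  { testId: "users.test.js > creates", pattern: "A19", family: "process" },
]);
console.log(findings.map((f) => `${f.pattern}:${f.severity}`)); // [ 'A7:critical', 'A5:critical', 'A19:medium' ]
console.log(isRejected(findings, false)); // true
```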
@@ -0,0 +1,163 @@
# Flaky discipline: taxonomy, quarantine, SLOs

A flaky test is one that produces different verdicts (pass/fail) on the same code across runs. Flaky tests corrode trust in the suite faster than any other category of test debt.

## The four root causes (in order of frequency)

### Cause 1: Race conditions (concurrency)

**Tells:**
- Test passes locally, fails in CI (or vice versa).
- Failure rate correlates with CI machine load.
- Adding `await page.waitForTimeout(100)` "fixes" it.

**Common scenarios:**
- Async operation completes after the test moves on (missing `await`).
- Two requests are sent simultaneously, and the test depends on response order.
- DOM update happens after the assertion runs.
- Database write not yet committed when the read fires.

**Fix:**
- Replace wall-clock waits with condition-based waits (`waitFor`, `toBeVisible`, `expect.poll`).
- Add proper `await` on every async operation.
- Use transaction boundaries explicitly when a test reads its own write.

### Cause 2: Test order dependency

**Tells:**
- Test passes when the suite runs in order, fails with `--shuffle`.
- Test fails when run in isolation with `.only`.
- Failures cluster on the first run after a CI restart but not afterwards.

**Common scenarios:**
- `beforeAll` populates shared state; a second test mutates it; a third test fails.
- Test A creates a global mock; Test B inherits it unexpectedly.
- A database row persists across tests because cleanup is in `afterEach` but a test threw mid-execution.

**Fix:**
- Move state creation from `beforeAll` to `beforeEach`.
- Reset shared state in `beforeEach` (clean slate for every test).
- Avoid global mocks; scope mocks to the test that needs them.
- Run with `--shuffle` in CI to catch new order dependencies.
42
+
43
+ ### Cause 3: Non-deterministic inputs
44
+
45
+ **Tells:**
46
+ - Test fails at month boundary, year boundary, DST change.
47
+ - Test fails based on hostname, locale, timezone.
48
+ - Test fails when an unseeded RNG happens to produce edge values.
49
+
50
+ **Common scenarios:**
51
+ - `new Date()` in production code, tested without clock fake.
52
+ - `Math.random()` for IDs, tested without seed.
53
+ - `Intl.DateTimeFormat` rendering based on system locale.
54
+ - File paths with timestamps, hash IDs based on time.
55
+
56
+ **Fix:**
57
+ - Mock the clock (`vi.useFakeTimers`, `freezegun`).
58
+ - Make the RNG deterministic in tests: stub it (`Math.random = () => 0.5`, restored after the test) or inject a seeded generator via DI.
59
+ - Pin locale and timezone in CI environment AND in test setup.
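One way to apply these fixes is dependency injection: production code receives its clock and random source as parameters (with real defaults), and tests pass fixed ones. The sketch assumes a hypothetical `makeOrderId` and `formatReceiptDate`; the formatter passes an explicit locale and `timeZone` to `Intl.DateTimeFormat` so its output does not depend on the machine it runs on.

```javascript
// Hypothetical production function: clock and RNG are injectable, defaulting
// to the real ones, so tests can substitute deterministic versions.
function makeOrderId({ now = () => new Date(), random = Math.random } = {}) {
  const stamp = now().toISOString().slice(0, 10).replace(/-/g, ""); // YYYYMMDD (UTC)
  const suffix = Math.floor(random() * 36 ** 4).toString(36).padStart(4, "0");
  return `ORD-${stamp}-${suffix}`;
}

// Explicit locale and time zone instead of the system defaults.
function formatReceiptDate(date) {
  return new Intl.DateTimeFormat("en-US", {
    timeZone: "UTC",
    year: "numeric",
    month: "short",
    day: "numeric",
  }).format(date);
}

// In a test: a fixed instant and a stubbed random source.
const fixedNow = () => new Date("2026-01-31T23:30:00Z");
const fixedRandom = () => 0.5;
console.log(makeOrderId({ now: fixedNow, random: fixedRandom })); // ORD-20260131-i000
console.log(formatReceiptDate(fixedNow())); // Jan 31, 2026
```

The fixed instant is deliberately near a boundary: 23:30 UTC on January 31 is already February 1 in any zone east of UTC, which is exactly the month-boundary failure that pinning `timeZone` prevents.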
60
+
61
+ ### Cause 4: External dependencies
62
+
63
+ **Tells:**
64
+ - Test fails when a third-party service has an outage.
65
+ - Test fails when CI runs against a real API and hits rate limits.
66
+ - Test fails differently for different geographic CI runners.
67
+
68
+ **Common scenarios:**
69
+ - Direct call to external API in unit tests.
70
+ - DNS lookup baked into test execution path.
71
+ - CDN-hosted resources in E2E tests.
72
+
73
+ **Fix:**
74
+ - Mock external services at unit/integration layers.
75
+ - Use contract tests instead of live calls.
76
+ - For E2E, use a sandbox account / dedicated test environment.
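At the unit layer, "mock external services" usually means depending on a narrow interface and substituting a fake in tests. A sketch, assuming a hypothetical `checkoutTotal` that converts currency through a rates client; `FakeRatesClient` implements the same `getRate` method a real HTTP client would, so no network call ever happens.

```javascript
// Hypothetical domain function: depends on any object with getRate(from, to).
async function checkoutTotal(amountCents, currency, ratesClient) {
  if (currency === "USD") return amountCents;
  const rate = await ratesClient.getRate(currency, "USD");
  return Math.round(amountCents * rate);
}

// In-memory fake with the same shape as the real client. It records calls so
// tests can assert on the interaction without touching the network.
class FakeRatesClient {
  constructor(rates) {
    this.rates = rates;
    this.calls = [];
  }

  async getRate(from, to) {
    this.calls.push([from, to]);
    const rate = this.rates[`${from}->${to}`];
    if (rate === undefined) throw new Error(`no rate for ${from}->${to}`);
    return rate;
  }
}
```

The fake keeps unit tests fast and deterministic, but it only stays honest if something checks it against the real API, which is the job of the contract tests above.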
77
+
78
+ ## Quarantine workflow
79
+
80
+ When a test flakes:
81
+
82
+ ### Within 1 hour of detection
83
+
84
+ 1. **Quarantine the test.** Add `.skip` or equivalent, and record the failure mode, owner, and fix-by date in the test title or an adjacent comment:
85
+
86
+ ```javascript
87
+ test.skip('FLAKY-2026-05-12: race condition in checkout flow — owner: bruno, fix-by: 2026-05-19', () => {
88
+ // ...
89
+ });
90
+ ```
91
+
92
+ 2. **File a tracking issue.** Title: `FLAKY: <test name>`. Body includes:
93
+ - Test name and file
94
+ - Failure mode observed (race? order-dependency? non-determinism?)
95
+ - First detection: CI run URL, timestamp
96
+ - Hypothesis (if any)
97
+ - Owner and fix-by date
98
+
99
+ 3. **Note in CI.** The next CI run shows "1 quarantined" — make this visible on the dashboard.
100
+
101
+ ### Within 24 hours
102
+
103
+ 1. **Named owner assigned.** Not "team X" — a person.
104
+ 2. **Fix-by date set.** Default 5 business days. Major flake (production-path test): 2 days.
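Computing the fix-by date is simple arithmetic, but weekends make it easy to get wrong by hand. A sketch with hypothetical helpers, assuming business days mean Monday to Friday, public holidays are ignored, and the 2-day window for production-path flakes is also counted in business days.

```javascript
// Adds `n` business days (Mon-Fri) to a YYYY-MM-DD date. Works in UTC so the
// result does not depend on the machine's time zone. Holidays are not handled.
function addBusinessDays(isoDate, n) {
  const date = new Date(`${isoDate}T00:00:00Z`);
  let added = 0;
  while (added < n) {
    date.setUTCDate(date.getUTCDate() + 1);
    const day = date.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (day !== 0 && day !== 6) added++;
  }
  return date.toISOString().slice(0, 10);
}

// Default window: 5 business days; production-path flake: 2.
function fixByDate(quarantinedOn, { productionPath = false } = {}) {
  return addBusinessDays(quarantinedOn, productionPath ? 2 : 5);
}

console.log(fixByDate("2026-05-12")); // 2026-05-19
console.log(fixByDate("2026-05-12", { productionPath: true })); // 2026-05-14
```

For a test quarantined on Tuesday 2026-05-12, the default window lands on 2026-05-19, the same fix-by date used in the marker example above.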
105
+
106
+ ### When fix-by passes without fix
107
+
108
+ Escalate:
109
+ - Pair the owner with someone for a debug session.
110
+ - If still unfixed after 2× the fix-by window, the test is removed (not skipped). A failing un-skipped test is better than a perpetually skipped test.
111
+
112
+ ## SLOs
113
+
114
+ ### `flaky_rate` (first-class metric)
115
+
116
+ - Definition: `(tests that pass on retry but fail on first run) / (total test runs)`.
117
+ - Target: < 1–2% per week.
118
+ - Alert at: > 5% on any given day.
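Given per-execution records from CI, the metric is a plain ratio. A sketch assuming a hypothetical record shape: one entry per test execution, with the outcome of each attempt in order (`["fail", "pass"]` means it failed first and passed on retry). The real shape depends on your CI's test report format.

```javascript
// A flaky execution failed on its first attempt but passed on a later retry.
function isFlaky(record) {
  const [first, ...retries] = record.attempts;
  return first === "fail" && retries.includes("pass");
}

// flaky_rate = flaky executions / total executions in the chosen window.
function flakyRate(records) {
  if (records.length === 0) return 0;
  return records.filter(isFlaky).length / records.length;
}

// Daily alert threshold from the SLO above (> 5%).
function shouldAlert(dailyRecords) {
  return flakyRate(dailyRecords) > 0.05;
}

const day = [
  { attempts: ["pass"] },
  { attempts: ["fail", "pass"] }, // flaky
  { attempts: ["fail", "fail"] }, // real failure, not counted as flaky
  { attempts: ["pass"] },
];
console.log(flakyRate(day)); // 0.25
```

A test that fails on every attempt is deliberately excluded: that is a real failure, and counting it would blur the two signals.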
119
+
120
+ ### `time-to-fix-flaky`
121
+
122
+ - Definition: hours from quarantine to fix-merged.
123
+ - Target: median < 24 hours; p95 < 7 days.
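The median and p95 over the hours-to-fix samples can be computed with the nearest-rank method, which always returns an observed value. A sketch with hypothetical helper names; other percentile definitions (e.g. linear interpolation) give slightly different results on small samples, and nearest-rank takes the lower middle value as the median of an even-sized sample.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it. `p` is in (0, 100].
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hours from quarantine to fix-merged, checked against the SLO targets.
function timeToFixReport(hours) {
  const median = percentile(hours, 50);
  const p95 = percentile(hours, 95);
  return { median, p95, medianOk: median < 24, p95Ok: p95 < 7 * 24 };
}

console.log(timeToFixReport([2, 5, 8, 12, 20, 30, 40, 60, 100, 200]));
// { median: 20, p95: 200, medianOk: true, p95Ok: false }
```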
124
+
125
+ ### `quarantine inventory`
126
+
127
+ - Definition: count of currently-skipped tests with `FLAKY-*` markers.
128
+ - Target: < 10 at any time.
129
+ - Alert at: > 25 (the quarantine has become a cemetery — emergency cleanup).
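Because the marker format is regular, the inventory can be computed by scanning test titles. A sketch assuming markers follow the `FLAKY-YYYY-MM-DD: ... owner: <name>, fix-by: YYYY-MM-DD` convention from the quarantine example; the regex and `quarantineInventory` are illustrative, and collecting the titles (from source files or a test report) is left out. Markers missing an owner or fix-by date are reported separately, since an unowned skip is itself a problem.

```javascript
// Captures quarantine date, owner, and fix-by date from a marker.
const MARKER = /FLAKY-(\d{4}-\d{2}-\d{2}):.*?owner:\s*([^,]+?)\s*,\s*fix-by:\s*(\d{4}-\d{2}-\d{2})/;

function quarantineInventory(titles, today) {
  const entries = [];
  const malformed = [];
  for (const title of titles) {
    if (!title.includes("FLAKY-")) continue;
    const m = MARKER.exec(title);
    if (!m) {
      malformed.push(title); // e.g. no owner or no fix-by date
      continue;
    }
    const [, quarantinedOn, owner, fixBy] = m;
    // ISO dates compare correctly as plain strings.
    entries.push({ title, quarantinedOn, owner, fixBy, overdue: fixBy < today });
  }
  const count = entries.length + malformed.length;
  const status = count > 25 ? "alert" : count >= 10 ? "over-target" : "ok";
  return { count, status, overdue: entries.filter((e) => e.overdue), malformed };
}

const report = quarantineInventory(
  [
    "FLAKY-2026-05-12: race condition in checkout flow, owner: bruno, fix-by: 2026-05-19",
    "FLAKY-2026-05-01: order dependency in cart, owner: ana, fix-by: 2026-05-08",
    "FLAKY-2026-05-10: timezone bug",
    "renders header",
  ],
  "2026-05-13"
);
console.log(report.count, report.status); // 3 ok
```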
130
+
131
+ ## What NOT to do
132
+
133
+ - **Auto-retry as fix.** `retries: 3` in CI config hides flakes; it does not fix them. A pass on the 4th attempt validates nothing.
134
+ - **Increase timeouts indefinitely.** A timeout that grows from 5s to 30s "to make CI pass" means the test isn't waiting on the right condition.
135
+ - **Remove the test without investigation.** "It's been flaky forever, delete it" — sometimes correct, but make sure the underlying invariant is captured elsewhere.
136
+ - **Mark skip without owner.** A skip is a debt. An unowned debt is a perpetual liability.
137
+
138
+ ## When a test should be permanently removed
139
+
140
+ A flaky test should be DELETED (not just skipped) when:
141
+
142
+ 1. The invariant it tests is covered elsewhere (duplicate per A23).
143
+ 2. The invariant it tests is no longer a real requirement.
144
+ 3. The test was always probabilistic by design and never had value.
145
+
146
+ Deletion is acceptable; abandonment-by-skip is not.
147
+
148
+ ## Real-systems-at-final-gate principle
149
+
150
+ Many flakes come from mocks drifting from reality. The defense:
151
+
152
+ - **Unit:** mock the world; fast feedback; flake budget tiny here.
153
+ - **Integration:** real DB (testcontainers); mock external services with contract validation.
154
+ - **Contract:** Pact / Schemathesis verifying producer-consumer agreement.
155
+ - **E2E:** real services in a preview environment; near-zero mocks.
156
+
157
+ When CI is wired this way, a flake at the unit layer usually points to a race or an order dependency (fixable). A flake at E2E usually points to a real environment issue (fix the environment, not the test).
158
+
159
+ ## Integration with dev-workflow
160
+
161
+ - `/dw-fix-qa` uses this taxonomy when retest cycles produce inconsistent results: classify the flake, apply the right fix, document.
162
+ - `/dw-code-review` flags tests being modified that have a `FLAKY-*` marker — review must verify the flake is now actually fixed, not just made less likely.
163
+ - `/dw-run-qa` weekly summary includes the `flaky_rate` metric.