@brunosps00/dev-workflow 0.13.0 → 0.15.0

Files changed (46)
  1. package/README.md +9 -3
  2. package/package.json +1 -1
  3. package/scaffold/en/commands/dw-bugfix.md +2 -1
  4. package/scaffold/en/commands/dw-code-review.md +1 -0
  5. package/scaffold/en/commands/dw-create-tasks.md +6 -0
  6. package/scaffold/en/commands/dw-deps-audit.md +1 -1
  7. package/scaffold/en/commands/dw-fix-qa.md +1 -1
  8. package/scaffold/en/commands/dw-functional-doc.md +1 -1
  9. package/scaffold/en/commands/dw-help.md +1 -1
  10. package/scaffold/en/commands/dw-redesign-ui.md +1 -1
  11. package/scaffold/en/commands/dw-run-qa.md +2 -1
  12. package/scaffold/en/commands/dw-run-task.md +1 -1
  13. package/scaffold/pt-br/commands/dw-bugfix.md +2 -1
  14. package/scaffold/pt-br/commands/dw-code-review.md +1 -0
  15. package/scaffold/pt-br/commands/dw-create-tasks.md +6 -0
  16. package/scaffold/pt-br/commands/dw-deps-audit.md +1 -1
  17. package/scaffold/pt-br/commands/dw-fix-qa.md +1 -1
  18. package/scaffold/pt-br/commands/dw-functional-doc.md +1 -1
  19. package/scaffold/pt-br/commands/dw-help.md +1 -1
  20. package/scaffold/pt-br/commands/dw-redesign-ui.md +1 -1
  21. package/scaffold/pt-br/commands/dw-run-qa.md +2 -1
  22. package/scaffold/pt-br/commands/dw-run-task.md +1 -1
  23. package/scaffold/skills/dw-incident-response/SKILL.md +164 -0
  24. package/scaffold/skills/dw-incident-response/references/blameless-discipline.md +126 -0
  25. package/scaffold/skills/dw-incident-response/references/communication-templates.md +107 -0
  26. package/scaffold/skills/dw-incident-response/references/postmortem-template.md +133 -0
  27. package/scaffold/skills/dw-incident-response/references/runbook-templates.md +169 -0
  28. package/scaffold/skills/dw-incident-response/references/severity-and-triage.md +186 -0
  29. package/scaffold/skills/dw-llm-eval/SKILL.md +148 -0
  30. package/scaffold/skills/dw-llm-eval/references/agent-eval.md +252 -0
  31. package/scaffold/skills/dw-llm-eval/references/judge-calibration.md +169 -0
  32. package/scaffold/skills/dw-llm-eval/references/oracle-ladder.md +171 -0
  33. package/scaffold/skills/dw-llm-eval/references/rag-metrics.md +186 -0
  34. package/scaffold/skills/dw-llm-eval/references/reference-dataset.md +190 -0
  35. package/scaffold/skills/dw-testing-discipline/SKILL.md +99 -76
  36. package/scaffold/skills/dw-testing-discipline/references/agent-guardrails.md +170 -0
  37. package/scaffold/skills/dw-testing-discipline/references/anti-patterns.md +6 -6
  38. package/scaffold/skills/dw-testing-discipline/references/core-rules.md +128 -0
  39. package/scaffold/skills/dw-testing-discipline/references/playwright-recipes.md +2 -2
  40. package/scaffold/skills/dw-ui-discipline/SKILL.md +101 -79
  41. package/scaffold/skills/dw-ui-discipline/references/hard-gate.md +93 -73
  42. package/scaffold/skills/dw-ui-discipline/references/visual-slop.md +152 -0
  43. package/scaffold/skills/dw-testing-discipline/references/ai-agent-gates.md +0 -170
  44. package/scaffold/skills/dw-testing-discipline/references/iron-laws.md +0 -128
  45. package/scaffold/skills/dw-ui-discipline/references/anti-slop.md +0 -162
  46. package/scaffold/skills/dw-testing-discipline/references/{positive-patterns.md → patterns.md} +0 -0
package/scaffold/skills/dw-ui-discipline/references/visual-slop.md
@@ -0,0 +1,152 @@
1
+ # Visual slop — 14 patterns + 17 default values to avoid
2
+
3
+ Two parts:
4
+ 1. **Fourteen patterns** an ungrounded UI agent produces.
5
+ 2. **Seventeen specific values** that signal "no thought went into this."
6
+
7
+ Used by `/dw-code-review` against UI diffs and by `/dw-redesign-ui` as a self-check during proposal.
8
+
9
+ ## The 14 patterns
10
+
11
+ ### 1. Uniform-section flatness
12
+
13
+ Every section uses the same card style, same padding, same text size, same emphasis weight. The eye finds no anchor.
14
+
15
+ - **Why it happens:** Default of "consistent = good" without realizing hierarchy needs deliberate variation.
16
+ - **Fix:** One primary section per scroll height. Differentiate by size, weight, color saturation, or whitespace by ≥30%. Everything else recedes.
17
+ - **Example violation:** Dashboard with 6 identical metric cards.
18
+ - **Example fix:** One hero metric (largest, top); 3 supporting metrics; 2 minor metrics in a different visual treatment.
19
+
20
+ ### 2. Soft hierarchy
21
+
22
+ Headings barely larger than body. Primary CTA same color as secondary. The user can't tell what to look at first.
23
+
24
+ - **Why it happens:** "Elegant restraint" applied without ensuring guidance still works.
25
+ - **Fix:** Squint at the design (literally). What jumps out? If nothing jumps out, increase contrast in size, weight, or color for the primary element.
26
+
27
+ ### 3. Decorative hover
28
+
29
+ Hover effects on elements that have no click handler. Cards that fade slightly but don't link anywhere.
30
+
31
+ - **Why it happens:** Default "apply hover to anything card-shaped."
32
+ - **Fix:** Hover effect lives only on elements with an on-click. Non-interactive shapes get `cursor: default`. If it's hoverable, it must do something.
33
+
34
+ ### 4. Emoji as ornament
35
+
36
+ Emojis in headers and section labels where they add no information: 🎯 Goals · 🚀 Launch · ✨ Features · 📊 Analytics · 🔥 Trending.
37
+
38
+ - **Why it happens:** Training data has many "emoji-in-headers = engaging" patterns.
39
+ - **Fix:** Use icons (lucide, heroicons, tabler) for semantic meaning. Reserve emojis for genuinely emotive contexts (celebrations, errors needing empathy). If removing the emoji preserves the meaning, remove it.
40
+
41
+ ### 5. Gradient cover
42
+
43
+ Hero with diagonal purple-to-pink gradient. Buttons with subtle gradient. Card backgrounds with mesh gradients. Gradient as visual fallback for weak composition.
44
+
45
+ - **Why it happens:** AI-art aesthetics leak into UI; gradients hide compositional weakness.
46
+ - **Fix:** A gradient must earn its place — usually for hero zones with poetic copy. Solid colors with strong hierarchy beat gradients in utility surfaces.
47
+
48
+ ### 6. Glass-on-everything
49
+
50
+ Frosted-glass effect on modals, cards, dropdowns, side panels — anywhere a surface can be layered. Including on top of plain backgrounds where the blur effect has nothing to blur.
51
+
52
+ - **Why it happens:** macOS aesthetic. Looks premium without effort.
53
+ - **Fix:** Glass only when there's meaningful content visible behind the surface. Glass over plain backgrounds adds visual noise without semantic gain.
54
+
55
+ ### 7. Center-aligned by default
56
+
57
+ Body paragraphs center-aligned. Headlines centered. Forms with labels centered above inputs. Tabular data centered instead of column-aligned.
58
+
59
+ - **Why it happens:** Marketing-page training data biases toward center.
60
+ - **Fix:** Center for hero headlines and small CTA labels only. Body text and forms read better left-aligned in LTR scripts. Tabular data reads in columns.
61
+
62
+ ### 8. Grayscale wash
63
+
64
+ Neutral gray palette everywhere — `slate-50`, `gray-100`, `zinc-200` — for backgrounds, borders, text, accents. No accent color, no character.
65
+
66
+ - **Why it happens:** "Neutral = safe" plus shadcn/ui's neutral starting point.
67
+ - **Fix:** Establish ONE accent color (from brand or curated defaults). Use it intentionally on the primary CTA, the active state, the one place the user looks first. Gray is the canvas, not the painting.
68
+
69
+ ### 9. Verb-less CTAs
70
+
71
+ "Get Started" · "Learn More" · "Click Here" · "Submit" · "OK" buttons. Generic verbs that say nothing.
72
+
73
+ - **Why it happens:** Default LLM verb library.
74
+ - **Fix:** Use the verb the user is actually doing. "Approve refund" not "Submit". "Start free trial" not "Get Started". "Schedule a call" not "Contact us".
75
+
76
+ ### 10. Stock-illustration hero
77
+
78
+ Figure-with-laptop hero art. Diverse-team-around-table illustration. Abstract floating shapes. Generic figures from illustration kits.
79
+
80
+ - **Why it happens:** "Illustration = friendly" default. Cheap to produce.
81
+ - **Fix:** Use product screenshots (real screens, real data, sanitized) or skip illustration entirely. A clean hero with strong typography beats generic illustration.
82
+
83
+ ### 11. Shadow soup
84
+
85
+ Cards with shadow. Buttons with shadow. Inputs with shadow. Tooltips with shadow on shadows. Borders AND shadows AND gradients on one element.
86
+
87
+ - **Why it happens:** Material Design leftover; depth as decoration.
88
+ - **Fix:** Pick one depth mechanism per layer. If cards have shadow, buttons inside should not. If you use elevation systematically (Material 3), enforce the elevation hierarchy.
89
+
90
+ ### 12. Generic spinner
91
+
92
+ Spinner overlay for every async operation, regardless of duration or context.
93
+
94
+ - **Why it happens:** Default fallback in every UI library.
95
+ - **Fix granularity:**
96
+ - <300ms: show nothing (spinner appearing then vanishing is flicker).
97
+ - 300ms–2s: skeleton loader matching content shape.
98
+ - 2s–10s: spinner + status text ("Loading orders...").
99
+ - 10s+: progress bar or step indicator + cancel button.
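The four thresholds above can be sketched as a small lookup. This is an illustrative helper, not part of dev-workflow: the name `loadingIndicatorFor` and the returned labels are assumptions, and since the list leaves the exact 2s and 10s boundaries open, the sketch assigns each boundary value to the slower bucket.

```javascript
// Hypothetical helper: maps an expected wait (in ms) to the feedback style
// suggested by the thresholds above. Names and labels are illustrative.
function loadingIndicatorFor(expectedMs) {
  if (expectedMs < 300) return 'none';              // under 300ms: avoid flicker
  if (expectedMs < 2000) return 'skeleton';         // 300ms up to 2s
  if (expectedMs < 10000) return 'spinner+status';  // 2s up to 10s
  return 'progress+cancel';                         // 10s and longer
}
```

A caller would typically pass an estimate (for example a rolling average of past request times) rather than a measured value, since the choice has to be made before the operation finishes.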
100
+
101
+ ### 13. Silent empty state
102
+
103
+ "No items found." Centered. Nothing else. User has no idea what to do.
104
+
105
+ - **Why it happens:** Empty state treated as edge case, not as a real screen.
106
+ - **Fix:** Every empty state answers two questions: WHY is it empty (no data yet vs filter excluded everything vs error)? WHAT should the user do (CTA, like "Create your first invoice")?
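One way to make the WHY/WHAT pair hard to forget is to model it as data. The sketch below is a minimal illustration under assumptions: the reason keys, the copy, and the action labels are placeholders invented for this example, not values shipped by the package.

```javascript
// Hypothetical mapping from the reason a list is empty to what the user sees.
// Copy and action labels are placeholders, not dev-workflow output.
const EMPTY_STATES = {
  'no-data': { why: 'No invoices yet.', action: 'Create your first invoice' },
  'filtered-out': { why: 'No invoices match the current filters.', action: 'Clear filters' },
  'error': { why: 'Invoices could not be loaded.', action: 'Retry' },
};

function emptyStateFor(reason) {
  const state = EMPTY_STATES[reason];
  // Failing loudly keeps a new, unhandled reason from rendering a silent screen.
  if (!state) throw new Error(`Unknown empty-state reason: ${reason}`);
  return state;
}
```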
107
+
108
+ ### 14. Toast spam
109
+
110
+ Every UI event becomes a toast. Save successful → toast. Validation error → toast. Network slow → toast. Five stack up and the user reads none.
111
+
112
+ - **Why it happens:** Toast is the default feedback mechanism in component libraries.
113
+ - **Fix:** Toasts only for actions that need confirmation AWAY from the originating surface (background save, undo-able deletion). Inline feedback for form validation. Modal/banner for blocking errors. Cap at 2 stacked toasts.
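The routing rule and the cap can be sketched as two small functions. Everything named here is hypothetical: the event kinds, the channel labels, and the choice to drop the oldest toast when the cap is reached are assumptions made for illustration.

```javascript
// Hypothetical routing of UI feedback, following the Fix above.
// Event kinds and channel names are illustrative assumptions.
function feedbackChannelFor(kind) {
  switch (kind) {
    case 'validation-error':
      return 'inline';
    case 'blocking-error':
      return 'banner';
    case 'background-save':
    case 'undoable-delete':
      return 'toast';
    default:
      return 'inline'; // assumption: unknown events stay on the originating surface
  }
}

// Keep at most `cap` toasts. Assumption: the oldest toast is dropped first.
function pushToast(stack, toast, cap = 2) {
  const next = [...stack, toast];
  return next.slice(Math.max(0, next.length - cap));
}
```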
114
+
115
+ ## The 17 anti-default values
116
+
117
+ Specific values that signal "no thought went into this." Avoid unless you can articulate WHY you picked exactly this one:
118
+
119
+ | Anti-default | Tell |
120
+ |--------------|------|
121
+ | `#3B82F6` (Tailwind blue-500) | The internet's default blue |
122
+ | `rounded-lg` everywhere | Universal default; no surface character |
123
+ | `shadow-md` on every card | Universal default; no depth hierarchy |
124
+ | `bg-gradient-to-br from-purple-500 to-pink-500` | "AI startup landing page" gradient |
125
+ | Inter as the only font choice | Default font of ~60% of new SaaS |
126
+ | `font-bold` for every emphasis | Bold is one tool, not the only tool |
127
+ | Lucide icons exclusively | One icon family is fine; signature is none |
128
+ | Generic "happy team" hero illustration | Placeholder energy |
129
+ | "Get Started" / "Learn More" CTA copy | Verb-less; says nothing |
130
+ | 4 / 8 / 12 / 16 spacing exclusively | The default 4-step scale; no rhythm |
131
+ | `border-gray-200` for every divider | Visual whisper; no intent |
132
+ | Sans-serif headlines + sans-serif body | No typographic contrast |
133
+ | Center-aligned everything | See pattern #7 |
134
+ | Animated CSS confetti on success | Cheesy; mismatches most brands |
135
+ | `bg-white dark:bg-gray-900` only | No real dark-mode design pass |
136
+ | Single-column form on a wide screen | Vertical scroll where horizontal fits |
137
+ | Modal for every interaction | Most modals should be inline editing |
138
+
139
+ ## How to apply this catalog
140
+
141
+ In `/dw-redesign-ui` step 4 (propose) — before presenting design directions, self-check against this list. If you're using a pattern, say why explicitly ("gradient crutch — intentional for marketing hero"). Sometimes the pattern IS the right call; the discipline is awareness, not absolutism.
142
+
143
+ In `/dw-code-review` UI section — grep the diff for the anti-default values and the patterns. Each hit becomes a finding under `dw-review-rigor`:
144
+ - Pattern on a NEW surface → `medium` severity.
145
+ - Pattern propagating EXISTING slop further → `low` severity (consistency wins).
146
+ - Pattern on a redesign that was supposed to fix slop → `high` severity (regression).
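The grep-and-grade step could look roughly like the sketch below. It is not how `/dw-code-review` is implemented (the source does not show that); the function name, the context labels `new` / `existing` / `redesign`, and the token list (a subset of the table above) are assumptions. It only catches literal values, so the fourteen patterns still need a human or agent read.

```javascript
// Hypothetical reviewer helper. The token list is a subset of the table above;
// the context labels ('new', 'existing', 'redesign') are illustrative.
const ANTI_DEFAULTS = [
  '#3B82F6',
  'shadow-md',
  'rounded-lg',
  'border-gray-200',
  'from-purple-500 to-pink-500',
];

const SEVERITY = { new: 'medium', existing: 'low', redesign: 'high' };

function findAntiDefaults(unifiedDiff, context) {
  const findings = [];
  unifiedDiff.split('\n').forEach((line, index) => {
    // Only added lines count; skip the '+++' file header.
    if (!line.startsWith('+') || line.startsWith('+++')) return;
    for (const token of ANTI_DEFAULTS) {
      if (line.toLowerCase().includes(token.toLowerCase())) {
        findings.push({ line: index + 1, token, severity: SEVERITY[context] });
      }
    }
  });
  return findings;
}
```

Each finding still needs the "can you articulate WHY" question from the catalog; a brand that really is `#3B82F6` would be a false positive here.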
147
+
148
+ ## When the patterns bend
149
+
150
+ - **Marketing pages** can use gradients and emojis with more freedom — different surface job.
151
+ - **Brand-mandated** values override this list (if your brand IS `#3B82F6`, use it).
152
+ - **Component libraries** like shadcn ship neutral defaults — the discipline is to ADD character on top, not remove the neutrality.
package/scaffold/skills/dw-testing-discipline/references/ai-agent-gates.md
@@ -1,170 +0,0 @@
1
- # Seven AI agent gates — mandatory when an LLM writes tests
2
-
3
- LLMs have characteristic failure modes when authoring tests. These gates are forcing functions for the seven most common.
4
-
5
- Every test produced by an agent (via `/dw-run-task`, `/dw-bugfix`, `/dw-autopilot`, or any other code-generating flow) must pass all seven gates BEFORE the diff is presented for review.
6
-
7
- ## Gate 1: Invariant first
8
-
9
- **The failure mode it blocks:** Agent writes 200 lines of test code without articulating what the test is supposed to prove.
10
-
11
- **The gate:**
12
-
13
- Before writing any test code, the agent prints:
14
-
15
- ```
16
- INVARIANT: <one sentence: what behavior is true that the test verifies>
17
- OWNING_LAYER: <unit | integration | contract | e2e>
18
- EXISTING_SUITE: <path to existing test file the new test joins>
19
- ```
20
-
21
- **Why it works:**
22
- - "Invariant" forces specific behavior naming.
23
- - "Owning layer" forces Law 2 (lowest detectable layer).
24
- - "Existing suite" forces extending coverage rather than spawning new files.
25
-
26
- **Verification:** In `/dw-code-review`, look for this 3-line preamble in the PR description or the commit body. Missing = REJECTED.
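The preamble check lends itself to a mechanical sketch. The function below is a hypothetical illustration, not the package's reviewer: it assumes each key sits on its own line followed by a colon, and it treats an `OWNING_LAYER` outside the four listed values as missing.

```javascript
// Hypothetical check for the three-line preamble in a PR description or commit body.
const REQUIRED_KEYS = ['INVARIANT', 'OWNING_LAYER', 'EXISTING_SUITE'];
const LAYERS = ['unit', 'integration', 'contract', 'e2e'];

function missingPreambleKeys(text) {
  const values = {};
  for (const line of text.split('\n')) {
    const match = line.match(/^\s*(INVARIANT|OWNING_LAYER|EXISTING_SUITE):\s*(\S.*)$/);
    if (match) values[match[1]] = match[2].trim();
  }
  const missing = REQUIRED_KEYS.filter((key) => !values[key]);
  // A layer outside the four allowed values counts as missing.
  if (values.OWNING_LAYER && !LAYERS.includes(values.OWNING_LAYER)) {
    missing.push('OWNING_LAYER');
  }
  return missing;
}
```

An empty result means the preamble is present; it says nothing about whether the stated invariant is a good one, which remains a review judgment.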
27
-
28
- ## Gate 2: Owning layer
29
-
30
- **The failure mode it blocks:** Agent creates a new test file every time, scattering coverage across orphan files. Or, agent writes E2E tests for things unit could prove.
31
-
32
- **The gate:**
33
-
34
- The agent must:
35
- 1. Identify the existing test suite that owns the module under test.
36
- 2. Extend that suite, OR document why a new suite is needed (genuinely new module, new test pyramid layer).
37
- 3. Map the test to the right layer per Law 2.
38
-
39
- **Verification:**
40
- - New test file in PR but existing file covers the same module? REJECTED.
41
- - E2E test for pure-logic invariant? REJECTED unless documented.
42
- - Unit test for cross-service flow? REJECTED — push to integration/E2E.
43
-
44
- ## Gate 3: Real execution
45
-
46
- **The failure mode it blocks:** Agent writes tests that mock everything. They pass green forever and validate nothing.
47
-
48
- **The gate:**
49
-
50
- Every test path the agent writes must, at SOME layer, run against real systems before the diff merges:
51
-
52
- - Pure logic: unit only is fine.
53
- - Code that touches DB: must have at least one integration test running real DB (testcontainers, ephemeral container, dedicated test DB).
54
- - Code that calls external services: must have a contract test OR a sandbox-account smoke test.
55
- - UI interactions: must have at least one E2E run on a real preview environment.
56
-
57
- **Verification:** PR description must list the real-system runs that exercise the touched code. If no real-system path covers the change, REJECTED.
58
-
59
- ## Gate 4: Failure → fix production
60
-
61
- **The failure mode it blocks:** Agent sees test red, modifies the test until green. Bug ships.
62
-
63
- **The gate:**
64
-
65
- When the agent encounters a failing test (its own or pre-existing):
66
-
67
- 1. Print: `INVESTIGATING FAILURE: <test name>`
68
- 2. Read production code in the path that produces the observed value.
69
- 3. Print: `ANALYSIS: <2-3 sentences on whether production is wrong, test is wrong, or invariant changed>`
70
- 4. Decide:
71
- - Production wrong → fix production.
72
- - Test wrong → fix test AND document the change in the commit body.
73
- - Invariant changed → update the test AND open an ADR if the change is a public contract change.
74
-
75
- **Verification:** Every commit that changes a previously-green test must have an `ANALYSIS:` line in the commit body explaining the decision. Missing = REJECTED.
76
-
77
- ## Gate 5: No snapshot without contract
78
-
79
- **The failure mode it blocks:** Agent reaches for `toMatchSnapshot()` whenever it doesn't know what to assert. Snapshot becomes the assertion. Drift goes unnoticed.
80
-
81
- **The gate:**
82
-
83
- Before adding a snapshot assertion, the agent classifies the artifact:
84
-
85
- - **PRODUCT_CONTRACT**: a stable contract worth pinning (e.g., serialized output of a public API, schema of a stored record). Snapshot is appropriate. Document the classification.
86
- - **IMPLEMENTATION_DETAIL**: HTML structure, internal representation, component tree shape. Snapshot is FORBIDDEN. Write specific assertions instead.
87
-
88
- **Verification:** Snapshots in the diff without a classification comment = REJECTED. Snapshots classified as IMPLEMENTATION_DETAIL = REJECTED.
89
-
90
- ## Gate 6: No assertion on self-set mock
91
-
92
- **The failure mode it blocks:** Agent writes `mockFn.mockReturnValue('X')`, then asserts `expect(mockFn()).toBe('X')`. Proves nothing.
93
-
94
- **The gate:**
95
-
96
- The agent cannot assert on values it directly fed into a mock. Assertions must be on:
97
- - The OUTPUT of production code that consumed the mock.
98
- - The SIDE EFFECTS (DB state, network calls, event emissions) caused by production code.
99
- - The VISIBLE behavior (UI change, log line, response) the user/caller observes.
100
-
101
- **Verification:** Diff analysis flags pairs where a literal value appears in BOTH a mock setup AND an assertion. Flagged = REJECTED unless the agent can show the value passed through production code.
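A rough version of that diff analysis can be sketched with two regular expressions. This is an assumption-laden heuristic, not the tool the package uses: it only recognizes quoted string literals, only the Jest/Vitest calls `mockReturnValue`, `mockResolvedValue` (and their `Once` variants) on the setup side, and only `toBe`, `toEqual`, `toStrictEqual` on the assertion side.

```javascript
// Hypothetical heuristic: flags string literals that appear both in a mock
// setup and in an assertion. Regex-based, so it only catches simple cases.
function selfSetMockLiterals(testSource) {
  const literalsIn = (pattern) =>
    new Set([...testSource.matchAll(pattern)].map((m) => m[2]));

  const mocked = literalsIn(
    /\.mock(?:Return|Resolved)Value(?:Once)?\(\s*(['"`])(.*?)\1/g
  );
  const asserted = literalsIn(
    /\.(?:toBe|toEqual|toStrictEqual)\(\s*(['"`])(.*?)\1/g
  );

  return [...mocked].filter((value) => asserted.has(value));
}
```

A hit is a prompt for a closer look rather than a verdict: as the gate says, the value may have legitimately passed through production code before being asserted.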
102
-
103
- ## Gate 7: Negative companion
104
-
105
- **The failure mode it blocks:** Agent writes happy-path-only tests. Edge cases, error paths, boundaries uncovered.
106
-
107
- **The gate:**
108
-
109
- Every positive assertion the agent writes ships WITH at least one negative companion:
110
-
111
- - Asserting `createUser(validInput)` succeeds → also assert `createUser(invalidInput)` fails with a specific error.
112
- - Asserting `parseDate(validString)` returns a Date → also assert `parseDate(invalidString)` throws/returns null.
113
- - Asserting `transferFunds(...)` succeeds with sufficient balance → also assert it fails with insufficient balance.
114
-
115
- **Verification:** A PR adding N positive assertions must add ≥1 negative assertion per public path. Imbalance >3:1 (positive:negative) on a public path = REJECTED.
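The two conditions in that rule (at least one negative, and no more than a 3:1 imbalance) can be written down directly. The function name and the 'OK' / 'REJECTED' return values are illustrative; how assertions get counted per public path is left out and would be the harder part in practice.

```javascript
// Hypothetical check of the ratio rule above for one public path.
function negativeCompanionVerdict(positiveCount, negativeCount) {
  if (positiveCount === 0) return 'OK';        // nothing positive was added
  if (negativeCount === 0) return 'REJECTED';  // needs at least one negative
  return positiveCount / negativeCount > 3 ? 'REJECTED' : 'OK';
}
```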
116
-
117
- ## How the gates compose
118
-
119
- Together, the seven gates produce tests that:
120
- 1. State what they prove (invariant first).
121
- 2. Live at the right layer (owning layer).
122
- 3. Exercise reality somewhere (real execution).
123
- 4. Reveal bugs when red (failure → fix production).
124
- 5. Assert specifically, not via snapshots (no snapshot w/o contract).
125
- 6. Assert outputs, not setup (no self-mock assertion).
126
- 7. Cover failures, not just success (negative companion).
127
-
128
- A test passing all seven is a test worth running. A test missing any one is more likely to mislead than help.
129
-
130
- ## Override procedure
131
-
132
- If an agent (or user) wants to skip a gate, they must:
133
- 1. State which gate is being skipped.
134
- 2. State why (one sentence).
135
- 3. Add a `// SKIP-GATE-N: <reason>` comment in the test.
136
- 4. Open a follow-up issue tracking the gap.
137
-
138
- Without all four, the gate is enforced.
139
-
140
- ## Prompt block to include when invoking the agent
141
-
142
- ```
143
- You are about to write tests. Before producing test code, complete the
144
- seven-gate preamble:
145
-
146
- INVARIANT: ___
147
- OWNING_LAYER: ___
148
- EXISTING_SUITE: ___
149
-
150
- If you cannot complete these three lines, STOP and ask the user for
151
- the requirement (do not invent an invariant).
152
-
153
- Then, while writing tests:
154
- - Real execution: name the real-system path covering this code.
155
- - On red: investigate production before changing tests; print ANALYSIS.
156
- - Snapshots: classify as PRODUCT_CONTRACT or IMPLEMENTATION_DETAIL.
157
- - Assertions: never assert on values you fed into a mock.
158
- - Coverage: every positive assertion needs a negative companion.
159
-
160
- Tests that violate gates without explicit SKIP-GATE-N comments will be
161
- REJECTED at review.
162
- ```
163
-
164
- `/dw-run-task` and `/dw-bugfix` inject this prompt before generating test code.
165
-
166
- ## Why these seven and not more
167
-
168
- These are the seven LLM failure modes empirically observed in test generation across multiple projects (per pedronauck/skills/testing-boss, MIT, plus dev-workflow internal observation). Other tendencies exist; they're either covered by the positive patterns (e.g., wall-clock waits) or have lower hit-rate.
169
-
170
- If a NEW LLM failure mode appears that none of the seven catches, add a gate AND document the failure mode that motivated it. Don't add gates speculatively.
package/scaffold/skills/dw-testing-discipline/references/iron-laws.md
@@ -1,128 +0,0 @@
1
- # Six Iron Laws — expanded with examples
2
-
3
- The laws are short for memorization. Each carries nuance that matters in practice.
4
-
5
- ## Law 1: Test the behavior, never the mock
6
-
7
- **What it means:** Your test asserts what the system DOES from the caller's perspective. It does not assert that internal call X was made with internal argument Y.
8
-
9
- **Why it matters:** A test bound to internal calls breaks the day you refactor — even when behavior didn't change. The "test is red, behavior is fine" experience erodes trust in the suite. Soon no one runs the suite.
10
-
11
- **Violation example:**
12
-
13
- ```javascript
14
- // BAD — asserting on mock internals
15
- test('createOrder calls inventory.reserve', () => {
16
- const inventory = { reserve: vi.fn() };
17
- createOrder({ items: [...] }, inventory);
18
- expect(inventory.reserve).toHaveBeenCalledWith(items, 'reserve');
19
- });
20
- ```
21
-
22
- You've asserted that `createOrder` USES the inventory adapter in a specific way. Now the refactor that consolidates `reserve` into `commit-with-reservation` breaks this test even though the order still gets created.
23
-
24
- **Correct version:**
25
-
26
- ```javascript
27
- // GOOD — asserting behavior
28
- test('createOrder reserves inventory before confirming', async () => {
29
- const result = await createOrder({ items: [...] });
30
- expect(result.status).toBe('confirmed');
31
- expect(await getInventoryFor(items[0].sku)).toBe(originalStock - 1);
32
- });
33
- ```
34
-
35
- Now the test cares about the OUTCOME (inventory decremented, order confirmed), not the path.
36
-
37
- ## Law 2: Push every test to the lowest layer that can detect the failure
38
-
39
- **What it means:** If a unit test can catch a bug, use it. If only an integration test can catch it, integration. If only an end-to-end run can catch it, E2E. Don't write E2E for what a unit can prove.
40
-
41
- **Why it matters:** Tests at lower layers run faster, fail more precisely, isolate the cause better. A bug in pure logic caught at unit takes 50ms and tells you the exact function. The same bug caught at E2E takes 30 seconds and tells you "checkout failed."
42
-
43
- **The pyramid resolved:**
44
-
45
- | Layer | Catches | Speed | Cost |
46
- |-------|---------|-------|------|
47
- | Unit | Pure logic, math, parsing, formatters | <100ms | low |
48
- | Integration | Module composition, DB queries, HTTP handlers | 500ms–5s | medium |
49
- | Contract | Producer/consumer agreement at API boundary | 1–10s | medium |
50
- | E2E | User journey across multiple services | 10s–60s | high |
51
-
52
- **Rule of thumb:**
53
- - If you can write a unit test for it, do so.
54
- - If unit can't reach it (needs DB, queue, real HTTP), write integration.
55
- - E2E only for journeys that NO lower layer can detect (browser-renders-correctly, third-party-callback-arrives, multi-step session state).
56
-
57
- ## Law 3: When a test fails, fix production first — change the test only after writing why
58
-
59
- **What it means:** A red test is a signal. The first question is "what's wrong with production?" Not "why is the test wrong?"
60
-
61
- **Why it matters:** Tests are weakened to pass FAR more often than they should be. "The behavior is fine; the test is too strict" is the slippery slope that leaves you with a green suite full of meaningless assertions.
62
-
63
- **Process when a test goes red:**
64
-
65
- 1. **Read the failure message.** What invariant did the test claim, and what did it observe?
66
- 2. **Read production code** in the path that produces the observation.
67
- 3. **Decide which is wrong.** If production violates the invariant, fix production. If the test mis-states the invariant, document WHY before relaxing.
68
- 4. **Commit the analysis** in the test's commit message or PR body. "Relaxed assertion from X to Y because <reason>" is auditable; "fix test" is not.
69
-
70
- **Anti-pattern:** Re-run the test until green. Auto-retry on flake. Add `.only` to skip the rest.
71
-
72
- ## Law 4: Real systems gate the merge. Mocks isolate; they do not validate.
73
-
74
- **What it means:** Before code merges to main, at least ONE test path exercised real systems (real DB, real route, real external integration in a sandbox or test account). Mocks are fine for fast unit feedback; they cannot decide "safe to ship."
75
-
76
- **Why it matters:** Mock drift is real. The mocked HTTP response from 3 months ago no longer matches the actual API. Tests pass; production fails on first real call.
77
-
78
- **Practical pattern:**
79
-
80
- - Unit tests: mock the world; run on every keystroke / on every commit.
81
- - Integration tests: real local DB (testcontainers, in-memory if equivalent); run on every PR.
82
- - Contract tests: real producer/consumer agreement check; run on every PR.
83
- - E2E: real preview environment with real services; run on PRs before merge to main.
84
-
85
- The discipline: no merge without a green E2E (or equivalent real-system check) for the touched path.
86
-
87
- ## Law 5: Coverage is a flashlight. Mutation score is a quality probe. Neither is a target.
88
-
89
- **What it means:**
90
- - **Coverage** tells you what lines executed. Useful as a NEGATIVE signal: 30% coverage = lots of dark code. Useless as a positive signal: 95% coverage with weak assertions is decorative.
91
- - **Mutation score** introduces small bugs (mutations) and measures whether tests catch them. A high mutation score means tests are actually probing behavior, not just executing lines.
92
- - Neither should be a number you optimize for. They're diagnostics.
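The difference between the two signals is easiest to see with a hand-made mutant. The sketch below is illustrative only: real mutation tools generate mutants automatically, and the function names are invented for this example.

```javascript
// Illustrative only: a hand-made "mutant" shows why executing a line is not
// the same as probing it. Real tools generate mutants automatically.
const isAdult = (age) => age >= 18;       // original
const isAdultMutant = (age) => age > 18;  // mutation: >= changed to >

// Weak check: runs the line (full coverage) but cannot tell the two apart.
const weakCheck = (fn) => fn(30) === true;

// Boundary check: passes on the original and fails on the mutant.
const boundaryCheck = (fn) => fn(18) === true && fn(17) === false;

// Mutation score = killed mutants / total mutants; there is one mutant here.
const killedBy = (check) => (check(isAdult) && !check(isAdultMutant) ? 1 : 0);
```

Both checks give the same line coverage for `isAdult`, yet only the boundary check kills the mutant, which is the gap the law describes.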
93
-
94
- **Anti-pattern:** "We need 90% coverage to merge." Coverage as a gate produces tests written to pass the gate, not to find bugs.
95
-
96
- **Healthier framing:** "What lines in the touched diff are NOT covered? Why?" Sometimes the answer is "we don't care, it's logging." Sometimes it's "actually that's a critical branch, add a test."
97
-
98
- ## Law 6: No test-only methods, branches, or flags leak into production code
99
-
100
- **What it means:** Production code should not have `if (process.env.NODE_ENV === 'test') { ... }` branches. Should not have `// for testing only` methods exposed on classes. Should not export internals just for assertions.
101
-
102
- **Why it matters:** Production code carrying test-only logic is testing decorations leaking into the artifact users run. Bug surface grows; the test environment diverges from production.
103
-
104
- **Correct patterns:**
105
-
106
- - Need to inject a dependency for testing? Use constructor injection / dependency injection.
107
- - Need to assert on internal state? Add a logging hook or event emission that production also benefits from.
108
- - Need to bypass auth in tests? Use a dedicated test environment with test credentials, not a backdoor flag.
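The first pattern, constructor injection, can be sketched as follows. The class, its option names, and the 14-day default are hypothetical; the point is only that the clock arrives as a dependency, so neither a `NODE_ENV` branch nor a test-only method is needed.

```javascript
// Illustrative sketch of constructor injection: the clock is a dependency,
// so tests need no NODE_ENV branch or test-only method. Names are hypothetical.
class TrialPolicy {
  constructor({ now = () => Date.now(), trialDays = 14 } = {}) {
    this.now = now;
    this.trialMs = trialDays * 24 * 60 * 60 * 1000;
  }

  isExpired(startedAtMs) {
    return this.now() - startedAtMs > this.trialMs;
  }
}

// In a test, pass a fixed clock; production code relies on the default.
const fixedNow = Date.UTC(2024, 0, 31);
const policy = new TrialPolicy({ now: () => fixedNow });
```

The same constructor serves production and tests, so the artifact users run carries no test-specific code path.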
109
-
110
- **Tell tales:**
111
- - `// only used in tests` comments.
112
- - `*ForTesting` suffix on methods.
113
- - `vi.spyOn(module, '_internal')` accessing things prefixed with underscore.
114
- - `process.env.E2E_MODE` reaching into production runtime decisions.
115
-
116
- If you see these, the test design is wrong. Refactor production to be testable, don't add backdoors.
117
-
118
- ## Putting the laws together
119
-
120
- A healthy test:
121
- 1. Asserts behavior visible to a caller (Law 1).
122
- 2. Sits at the lowest layer that can prove that behavior (Law 2).
123
- 3. When red, sends you to read production code (Law 3).
124
- 4. Has a sibling that exercises real systems somewhere in the pipeline (Law 4).
125
- 5. Survives a mutation in the code it claims to cover (Law 5).
126
- 6. Has zero footprint in production code (Law 6).
127
-
128
- Any test that fails ≥2 of these is technical debt accumulating. `/dw-code-review` flags them.
package/scaffold/skills/dw-ui-discipline/references/anti-slop.md
@@ -1,162 +0,0 @@
1
- # Anti-slop catalog — 14 patterns + 17 anti-defaults
2
-
3
- Every pattern below is a category of slop that LLMs produce when ungrounded. Detection happens in `/dw-code-review` (UI diffs) and design proposals from `/dw-redesign-ui`.
4
-
5
- ## The 14 patterns
6
-
7
- ### 1. Visual sameness
8
-
9
- **What it looks like:** Every section of the page uses the same card style, same padding, same text size, same emphasis weight. The eye finds no anchor.
10
-
11
- **Why it happens:** LLM defaults to "consistent = good," missing that hierarchy needs deliberate variation.
12
-
13
- **Fix:** Establish one primary section per scroll-height. Use size, weight, color saturation, or whitespace differential to anchor the eye. Everything else recedes.
14
-
15
- **Example violation:** Dashboard with 6 identical-looking metric cards. **Fix:** One hero metric (largest, brightest, top), 3 supporting metrics, 2 minor metrics in a different visual treatment.
16
-
17
- ### 2. Weak hierarchy
18
-
19
- **What it looks like:** Headings barely larger than body. Important CTAs same color as secondary. The user can't tell what to look at first.
20
-
21
- **Why it happens:** Defaults to "elegant restraint" without ensuring guidance still works.
22
-
23
- **Fix:** Verify hierarchy by squinting at the design (literally) — what jumps out? If nothing jumps out, hierarchy is failing. Increase contrast in size, weight, or color for the primary element by at least 30%.
24
-
25
- ### 3. Fake interactivity
26
-
27
- **What it looks like:** Hover states that change opacity but the click does nothing meaningful. Buttons that look interactive but don't have a job. Cards with subtle hover but no on-click handler.
28
-
29
- **Why it happens:** LLM applies hover styles to anything that looks card-shaped.
30
-
31
- **Fix:** Hover state ONLY on elements that have an on-click. If it can't be clicked, don't suggest it can. Use cursor: default explicitly on non-interactive shapes.
32
-
33
- ### 4. Emoji spam
34
-
35
- **What it looks like:** 🎯 Goals · 🚀 Launch · ✨ Features · 📊 Analytics · 🔥 Trending — emojis as decoration in headers, CTAs, and section labels where they add no information.
36
-
37
- **Why it happens:** LLM training data has tons of "emoji in headers = engaging" patterns.
38
-
39
- **Fix:** Use icons (lucide, heroicons, tabler) for semantic meaning; reserve emojis for genuinely emotive contexts (celebrations, errors that need empathy). If you can remove the emoji and the meaning survives, remove it.
-
- ### 5. Gradient crutch
-
- **What it looks like:** Hero with diagonal purple-to-pink gradient. Buttons with subtle gradient. Card backgrounds with mesh gradients. Every empty space gets a gradient.
-
- **Why it happens:** Stable Diffusion / Midjourney aesthetics leaked into UI; gradients hide weak composition.
-
- **Fix:** A gradient must earn its place — usually for hero zones with poetic copy, never for utility surfaces. Solid colors + good hierarchy beat gradients 9 times out of 10.
-
- ### 6. Glass everything
-
- **What it looks like:** Frosted-glass effect on modals, cards, dropdowns, side panels — anywhere a surface can be layered.
-
- **Why it happens:** Apple's macOS aesthetic. Looks "premium" without effort.
-
- **Fix:** Glass only when there's meaningful content visible behind the surface (a hero image, a busy dashboard). Glass on top of plain backgrounds adds blur for no reason.
-
- ### 7. Centered all the things
-
- **What it looks like:** Body paragraphs center-aligned. Headlines centered. Forms with labels centered above inputs. Every text block reads center.
-
- **Why it happens:** Marketing-page training data biases toward center-aligned.
-
- **Fix:** Center-align for hero headlines and small CTA labels only. Body text and forms read better left-aligned (in LTR scripts). Tabular data reads in columns, not centered.
-
- ### 8. AI gray washing
-
- **What it looks like:** Neutral gray palette everywhere. `slate-50`, `gray-100`, `zinc-200` for backgrounds, borders, text, accents. Nothing has color personality.
-
- **Why it happens:** "Neutral = safe" default, plus shadcn/ui's neutral start.
-
- **Fix:** Establish ONE accent color from the curated defaults or brand. Use it intentionally — primary CTAs, active states, the one place the user looks first. Gray is the canvas, not the painting.
-
- ### 9. Generic CTAs
-
- **What it looks like:** "Get Started" · "Learn More" · "Click Here" · "Submit" · "OK" buttons.
-
- **Why it happens:** Default LLM verb library.
-
- **Fix:** Use the verb the user is actually doing. "Approve refund" not "Submit". "Start free trial" not "Get Started". "Schedule a call" not "Contact us". Generic verbs are tells.
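The generic labels listed under "What it looks like" are literal strings, so a lint-style check can flag them. This is a minimal sketch under that assumption; `GENERIC_CTAS` and `isGenericCta` are hypothetical names, and the list only mirrors the examples given in this pattern.

```typescript
// Labels taken from this pattern's examples; extend per project.
const GENERIC_CTAS: string[] = ["get started", "learn more", "click here", "submit", "ok"];

// Returns true when a button label is one of the generic verbs above.
function isGenericCta(label: string): boolean {
  return GENERIC_CTAS.indexOf(label.trim().toLowerCase()) !== -1;
}
```

With this check, `isGenericCta("Get Started")` is flagged, while a task-specific label such as "Approve refund" passes.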
-
- ### 10. Stock illustration
-
- **What it looks like:** Figure-with-laptop hero art. Diverse-team-around-table illustration. Abstract floating shapes.
-
- **Why it happens:** "Illustration = friendly" default. Cheap to produce, generic to consume.
-
- **Fix:** Either use product screenshots (real screens, real data — sanitized) or skip illustration entirely. A clean hero with strong typography beats a generic illustration every time.
-
- ### 11. Drop shadow soup
-
- **What it looks like:** Cards with shadow. Buttons with shadow. Inputs with shadow. Tooltips with shadow on shadows. Borders AND shadows AND gradients on one element.
-
- **Why it happens:** Material Design leftover; depth as decoration.
-
- **Fix:** Pick ONE depth mechanism per layer. If cards have shadow, buttons inside should not. If you're using elevation systematically (Material 3), enforce the elevation hierarchy.
-
- ### 12. Loading spinner default
-
- **What it looks like:** Spinner overlay for every async operation, regardless of duration or context.
-
- **Why it happens:** Default fallback in every UI library.
-
- **Fix:**
- - <300ms: don't show anything. Spinner appearing then disappearing instantly is flicker.
- - 300ms-2s: skeleton loader (shape of incoming content).
- - 2s-10s: spinner + status text ("Loading orders...").
- - 10s+: progress bar or step indicator + cancel button.
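The duration tiers above can be expressed as a small lookup. This is a minimal sketch: `LoadingIndicator` and `pickLoadingIndicator` are hypothetical names, and treating each tier's lower bound as inclusive is an assumption, since the list does not say which tier owns the exact boundaries.

```typescript
type LoadingIndicator =
  | "none"
  | "skeleton"
  | "spinner-with-status"
  | "progress-with-cancel";

// Maps an expected (or elapsed) duration in milliseconds to a tier.
// Assumption: lower bounds are inclusive (300 -> skeleton, 2000 -> spinner).
function pickLoadingIndicator(durationMs: number): LoadingIndicator {
  if (durationMs < 300) return "none";
  if (durationMs < 2000) return "skeleton";
  if (durationMs < 10000) return "spinner-with-status";
  return "progress-with-cancel";
}
```

When the duration is not known up front, the same function can be re-evaluated against elapsed time so the indicator escalates as the wait grows.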
-
- ### 13. Empty state void
-
- **What it looks like:** "No items found." Centered. Nothing else. User has no idea what to do.
-
- **Why it happens:** Empty state treated as edge case, not a real screen.
-
- **Fix:** Every empty state must answer two questions: WHY is it empty (no data yet vs. filter excluded everything vs. error), and WHAT should the user do next (CTA: "Create your first invoice"). Show an example or illustration if it helps onboarding.
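The three reasons can be modeled explicitly so that each one is forced to carry its own message and next action. This is a minimal sketch; `EmptyReason`, `EmptyState`, and `describeEmptyState` are hypothetical names, and the copy for the filter and error cases is illustrative only.

```typescript
type EmptyReason = "no-data-yet" | "filtered-out" | "error";

interface EmptyState {
  reason: EmptyReason; // WHY the screen is empty
  message: string;
  ctaLabel: string; // WHAT the user should do next
}

function describeEmptyState(reason: EmptyReason, noun: string): EmptyState {
  if (reason === "no-data-yet") {
    return { reason, message: "No " + noun + "s yet.", ctaLabel: "Create your first " + noun };
  }
  if (reason === "filtered-out") {
    return { reason, message: "No " + noun + "s match the current filters.", ctaLabel: "Clear filters" };
  }
  // Remaining case: "error".
  return { reason, message: "We couldn't load your " + noun + "s.", ctaLabel: "Try again" };
}
```

Calling `describeEmptyState("no-data-yet", "invoice")` yields the "Create your first invoice" CTA used as the example in this pattern.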
-
- ### 14. Notification-soup tray
-
- **What it looks like:** Every UI event becomes a toast. Save successful → toast. Validation error → toast. Network slow → toast. Now there are 5 stacked toasts and the user can't read any.
-
- **Why it happens:** Toast is the default feedback mechanism in component libraries.
-
- **Fix:** Reserve toasts for actions that need confirmation AWAY from the originating surface (background save completed, deletion can be undone). Inline feedback for form validation. Modal/banner for blocking errors. NEVER stack >2 toasts at once.
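The "never stack more than two" rule can be enforced in the toast store itself rather than left to each caller. This is a minimal sketch; `ToastQueue` is a hypothetical class, and holding overflow messages in a pending queue (instead of dropping them) is an assumption rather than something this catalog prescribes.

```typescript
const MAX_VISIBLE_TOASTS = 2;

class ToastQueue {
  private visible: string[] = [];
  private pending: string[] = [];

  // Show the toast if there is room; otherwise hold it back.
  push(message: string): void {
    if (this.visible.length < MAX_VISIBLE_TOASTS) {
      this.visible.push(message);
    } else {
      this.pending.push(message);
    }
  }

  // Remove a visible toast and promote the oldest pending one, if any.
  dismiss(message: string): void {
    const index = this.visible.indexOf(message);
    if (index === -1) return;
    this.visible.splice(index, 1);
    const next = this.pending.shift();
    if (next !== undefined) this.visible.push(next);
  }

  // Copy of the currently visible toasts, oldest first.
  shown(): string[] {
    return this.visible.slice();
  }
}
```

Pushing three messages leaves two visible; dismissing one promotes the third, so the visible count never exceeds the cap.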
-
- ## The 17 anti-defaults
-
- Specific values that signal "no thought was put in." Avoid unless you can articulate WHY you picked exactly this:
-
- | Anti-default | Why it's a tell |
- |--------------|-----------------|
- | `#3B82F6` (Tailwind blue-500) | The internet's default blue |
- | `rounded-lg` everywhere | Universal default; no surface character |
- | `shadow-md` on every card | Universal default; no depth hierarchy |
- | `bg-gradient-to-br from-purple-500 to-pink-500` | The "AI startup landing page" gradient |
- | Inter font as the only choice | Default font of 60% of new SaaS |
- | `font-bold` for everything emphasized | Bold is a tool, not the only tool |
- | Lucide icons exclusively | One icon family is fine; the stock set alone adds no signature |
- | Stock "happy team" hero illustration | Generic placeholder energy |
- | "Get Started" / "Learn More" CTA copy | Generic verbs; says nothing about the action |
- | 4px / 8px / 12px / 16px spacing exclusively | The default 4-step scale; no rhythm |
- | `border-gray-200` for every divider | Visual whisper; no intentionality |
- | Sans-serif headlines + sans-serif body | No typographic contrast |
- | Center-aligned everything | See pattern #7 |
- | Animated CSS confetti on success | Cheesy; never matches the brand |
- | `bg-white dark:bg-gray-900` only | No real dark-mode design pass |
- | Single-column form on a wide screen | Vertical scroll where horizontal fits |
- | Modal-for-everything | Most modals should be inline editing |
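Several rows in the table above are literal strings, so a review step could search the added lines of a unified diff for them. This is a minimal sketch, not the package's actual implementation; `ANTI_DEFAULTS`, `Hit`, and `scanAddedLines` are hypothetical names, and only the greppable rows are included.

```typescript
interface AntiDefault {
  token: string;
  why: string;
}

// Subset of the table: only rows that are literal, searchable values.
const ANTI_DEFAULTS: AntiDefault[] = [
  { token: "#3B82F6", why: "The internet's default blue" },
  { token: "rounded-lg", why: "Universal default; no surface character" },
  { token: "shadow-md", why: "Universal default; no depth hierarchy" },
  { token: "from-purple-500 to-pink-500", why: "The \"AI startup landing page\" gradient" },
  { token: "border-gray-200", why: "Visual whisper; no intentionality" },
];

interface Hit {
  line: number; // 1-based position within the diff text, not the source file
  token: string;
  why: string;
}

function scanAddedLines(diff: string): Hit[] {
  const hits: Hit[] = [];
  diff.split("\n").forEach((text, i) => {
    // Only added lines: leading "+", excluding the "+++" file header.
    if (text.charAt(0) !== "+" || text.slice(0, 3) === "+++") return;
    const lower = text.toLowerCase();
    for (const entry of ANTI_DEFAULTS) {
      if (lower.indexOf(entry.token.toLowerCase()) !== -1) {
        hits.push({ line: i + 1, token: entry.token, why: entry.why });
      }
    }
  });
  return hits;
}
```

Matching is case-insensitive so `#3b82f6` is caught as well. Rows such as "Center-aligned everything" need human judgment and are deliberately left out of the scan.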
-
- ## How to use this catalog
-
- In `/dw-redesign-ui` step 4 (propose) — before presenting design directions, self-check against this list. Flag any pattern you're consciously using AND say why (sometimes it's the right call; "gradient crutch" can be intentional for a marketing hero).
-
- In `/dw-code-review` UI section: grep the diff for the anti-defaults table values and the patterns. Each hit becomes a finding with `dw-review-rigor` severity:
- - Pattern violations on a NEW surface → `medium` severity.
- - Pattern violations that spread an EXISTING surface's slop further → `low` severity (consistency wins).
- - Pattern violations on a redesign that was supposed to fix slop → `high` severity (regression).
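The three severity rules above reduce to a lookup keyed by where the violation lands. This is a minimal sketch; `SurfaceContext`, `SEVERITY_BY_CONTEXT`, and `severityFor` are hypothetical names, while the severity labels are the ones listed above.

```typescript
type Severity = "low" | "medium" | "high";
type SurfaceContext = "new-surface" | "existing-surface" | "redesign";

// Direct encoding of the three rules.
const SEVERITY_BY_CONTEXT: Record<SurfaceContext, Severity> = {
  "new-surface": "medium", // new slop on a fresh surface
  "existing-surface": "low", // consistency with existing slop wins
  "redesign": "high", // regression: the redesign was meant to fix this
};

function severityFor(context: SurfaceContext): Severity {
  return SEVERITY_BY_CONTEXT[context];
}
```

Typing the table as `Record<SurfaceContext, Severity>` means a newly added context fails to compile until it is assigned a severity.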
-
- ## When the rules bend
-
- - **Marketing pages** can use gradients and emojis with more freedom; the surface has a different job.
- - **Brand-mandated** anti-defaults override this list (if the brand IS `#3B82F6`, use it).
- - **Component libraries** like shadcn ship with neutral defaults; the discipline is to ADD character on top, not remove their neutrality.