safeword 0.2.3 → 0.2.4

Files changed (235)
  1. package/.claude/commands/arch-review.md +32 -0
  2. package/.claude/commands/lint.md +6 -0
  3. package/.claude/commands/quality-review.md +13 -0
  4. package/.claude/commands/setup-linting.md +6 -0
  5. package/.claude/hooks/auto-lint.sh +6 -0
  6. package/.claude/hooks/auto-quality-review.sh +170 -0
  7. package/.claude/hooks/check-linting-sync.sh +17 -0
  8. package/.claude/hooks/inject-timestamp.sh +6 -0
  9. package/.claude/hooks/question-protocol.sh +12 -0
  10. package/.claude/hooks/run-linters.sh +8 -0
  11. package/.claude/hooks/run-quality-review.sh +76 -0
  12. package/.claude/hooks/version-check.sh +10 -0
  13. package/.claude/mcp/README.md +96 -0
  14. package/.claude/mcp/arcade.sample.json +9 -0
  15. package/.claude/mcp/context7.sample.json +7 -0
  16. package/.claude/mcp/playwright.sample.json +7 -0
  17. package/.claude/settings.json +62 -0
  18. package/.claude/skills/quality-reviewer/SKILL.md +190 -0
  19. package/.claude/skills/safeword-quality-reviewer/SKILL.md +13 -0
  20. package/.env.arcade.example +4 -0
  21. package/.env.example +11 -0
  22. package/.gitmodules +4 -0
  23. package/.safeword/SAFEWORD.md +33 -0
  24. package/.safeword/eslint/eslint-base.mjs +101 -0
  25. package/.safeword/guides/architecture-guide.md +404 -0
  26. package/.safeword/guides/code-philosophy.md +174 -0
  27. package/.safeword/guides/context-files-guide.md +405 -0
  28. package/.safeword/guides/data-architecture-guide.md +183 -0
  29. package/.safeword/guides/design-doc-guide.md +165 -0
  30. package/.safeword/guides/learning-extraction.md +515 -0
  31. package/.safeword/guides/llm-instruction-design.md +239 -0
  32. package/.safeword/guides/llm-prompting.md +95 -0
  33. package/.safeword/guides/tdd-best-practices.md +570 -0
  34. package/.safeword/guides/test-definitions-guide.md +243 -0
  35. package/.safeword/guides/testing-methodology.md +573 -0
  36. package/.safeword/guides/user-story-guide.md +237 -0
  37. package/.safeword/guides/zombie-process-cleanup.md +214 -0
  38. package/{templates → .safeword}/hooks/agents-md-check.sh +0 -0
  39. package/{templates → .safeword}/hooks/post-tool.sh +0 -0
  40. package/{templates → .safeword}/hooks/pre-commit.sh +0 -0
  41. package/.safeword/planning/002-user-story-quality-evaluation.md +1840 -0
  42. package/.safeword/planning/003-langsmith-eval-setup-prompt.md +363 -0
  43. package/.safeword/planning/004-llm-eval-test-cases.md +3226 -0
  44. package/.safeword/planning/005-architecture-enforcement-system.md +169 -0
  45. package/.safeword/planning/006-reactive-fix-prevention-research.md +135 -0
  46. package/.safeword/planning/011-cli-ux-vision.md +330 -0
  47. package/.safeword/planning/012-project-structure-cleanup.md +154 -0
  48. package/.safeword/planning/README.md +39 -0
  49. package/.safeword/planning/automation-plan-v2.md +1225 -0
  50. package/.safeword/planning/automation-plan-v3.md +1291 -0
  51. package/.safeword/planning/automation-plan.md +3058 -0
  52. package/.safeword/planning/design/005-cli-implementation.md +343 -0
  53. package/.safeword/planning/design/013-cli-self-contained-templates.md +596 -0
  54. package/.safeword/planning/design/013a-eslint-plugin-suite.md +256 -0
  55. package/.safeword/planning/design/013b-implementation-snippets.md +385 -0
  56. package/.safeword/planning/design/013c-config-isolation-strategy.md +242 -0
  57. package/.safeword/planning/design/code-philosophy-improvements.md +60 -0
  58. package/.safeword/planning/mcp-analysis.md +545 -0
  59. package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +451 -0
  60. package/.safeword/planning/settings-improvements.md +970 -0
  61. package/.safeword/planning/test-definitions/005-cli-implementation.md +1301 -0
  62. package/.safeword/planning/test-definitions/cli-self-contained-templates.md +205 -0
  63. package/.safeword/planning/user-stories/001-guides-review-user-stories.md +1381 -0
  64. package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +132 -0
  65. package/.safeword/planning/user-stories/004-technical-constraints.md +86 -0
  66. package/.safeword/planning/user-stories/005-cli-implementation.md +311 -0
  67. package/.safeword/planning/user-stories/cli-self-contained-templates.md +172 -0
  68. package/.safeword/planning/versioned-distribution.md +740 -0
  69. package/.safeword/prompts/arch-review.md +43 -0
  70. package/.safeword/prompts/quality-review.md +11 -0
  71. package/.safeword/scripts/arch-review.sh +235 -0
  72. package/.safeword/scripts/check-linting-sync.sh +58 -0
  73. package/.safeword/scripts/setup-linting.sh +559 -0
  74. package/.safeword/templates/architecture-template.md +136 -0
  75. package/.safeword/templates/ci/architecture-check.yml +79 -0
  76. package/.safeword/templates/design-doc-template.md +127 -0
  77. package/.safeword/templates/test-definitions-feature.md +100 -0
  78. package/.safeword/templates/ticket-template.md +74 -0
  79. package/.safeword/templates/user-stories-template.md +82 -0
  80. package/.safeword/tickets/001-guides-review-user-stories.md +83 -0
  81. package/.safeword/tickets/002-architecture-enforcement.md +211 -0
  82. package/.safeword/tickets/003-reactive-fix-prevention.md +57 -0
  83. package/.safeword/tickets/004-technical-constraints-in-user-stories.md +39 -0
  84. package/.safeword/tickets/005-cli-implementation.md +248 -0
  85. package/.safeword/tickets/006-flesh-out-skills.md +43 -0
  86. package/.safeword/tickets/007-flesh-out-questioning.md +44 -0
  87. package/.safeword/tickets/008-upgrade-questioning.md +58 -0
  88. package/.safeword/tickets/009-naming-conventions.md +41 -0
  89. package/.safeword/tickets/010-safeword-md-cleanup.md +34 -0
  90. package/.safeword/tickets/011-cursor-setup.md +86 -0
  91. package/.safeword/tickets/README.md +73 -0
  92. package/.safeword/version +1 -0
  93. package/AGENTS.md +59 -0
  94. package/CLAUDE.md +12 -0
  95. package/README.md +347 -0
  96. package/docs/001-cli-implementation-plan.md +856 -0
  97. package/docs/elite-dx-implementation-plan.md +1034 -0
  98. package/framework/README.md +131 -0
  99. package/framework/mcp/README.md +96 -0
  100. package/framework/mcp/arcade.sample.json +8 -0
  101. package/framework/mcp/context7.sample.json +6 -0
  102. package/framework/mcp/playwright.sample.json +6 -0
  103. package/framework/scripts/arch-review.sh +235 -0
  104. package/framework/scripts/check-linting-sync.sh +58 -0
  105. package/framework/scripts/load-env.sh +49 -0
  106. package/framework/scripts/setup-claude.sh +223 -0
  107. package/framework/scripts/setup-linting.sh +559 -0
  108. package/framework/scripts/setup-quality.sh +477 -0
  109. package/framework/scripts/setup-safeword.sh +550 -0
  110. package/framework/templates/ci/architecture-check.yml +78 -0
  111. package/learnings/ai-sdk-v5-breaking-changes.md +178 -0
  112. package/learnings/e2e-test-zombie-processes.md +231 -0
  113. package/learnings/milkdown-crepe-editor-property.md +96 -0
  114. package/learnings/prosemirror-fragment-traversal.md +119 -0
  115. package/package.json +19 -43
  116. package/packages/cli/AGENTS.md +1 -0
  117. package/packages/cli/ARCHITECTURE.md +279 -0
  118. package/packages/cli/package.json +51 -0
  119. package/packages/cli/src/cli.ts +63 -0
  120. package/packages/cli/src/commands/check.ts +166 -0
  121. package/packages/cli/src/commands/diff.ts +209 -0
  122. package/packages/cli/src/commands/reset.ts +190 -0
  123. package/packages/cli/src/commands/setup.ts +325 -0
  124. package/packages/cli/src/commands/upgrade.ts +163 -0
  125. package/packages/cli/src/index.ts +3 -0
  126. package/packages/cli/src/templates/config.ts +58 -0
  127. package/packages/cli/src/templates/content.ts +18 -0
  128. package/packages/cli/src/templates/index.ts +12 -0
  129. package/packages/cli/src/utils/agents-md.ts +66 -0
  130. package/packages/cli/src/utils/fs.ts +179 -0
  131. package/packages/cli/src/utils/git.ts +124 -0
  132. package/packages/cli/src/utils/hooks.ts +29 -0
  133. package/packages/cli/src/utils/output.ts +60 -0
  134. package/packages/cli/src/utils/project-detector.test.ts +185 -0
  135. package/packages/cli/src/utils/project-detector.ts +44 -0
  136. package/packages/cli/src/utils/version.ts +28 -0
  137. package/packages/cli/src/version.ts +6 -0
  138. package/packages/cli/templates/SAFEWORD.md +776 -0
  139. package/packages/cli/templates/doc-templates/architecture-template.md +136 -0
  140. package/packages/cli/templates/doc-templates/design-doc-template.md +134 -0
  141. package/packages/cli/templates/doc-templates/test-definitions-feature.md +131 -0
  142. package/packages/cli/templates/doc-templates/ticket-template.md +82 -0
  143. package/packages/cli/templates/doc-templates/user-stories-template.md +92 -0
  144. package/packages/cli/templates/guides/architecture-guide.md +423 -0
  145. package/packages/cli/templates/guides/code-philosophy.md +195 -0
  146. package/packages/cli/templates/guides/context-files-guide.md +457 -0
  147. package/packages/cli/templates/guides/data-architecture-guide.md +200 -0
  148. package/packages/cli/templates/guides/design-doc-guide.md +171 -0
  149. package/packages/cli/templates/guides/learning-extraction.md +552 -0
  150. package/packages/cli/templates/guides/llm-instruction-design.md +248 -0
  151. package/packages/cli/templates/guides/llm-prompting.md +102 -0
  152. package/packages/cli/templates/guides/tdd-best-practices.md +615 -0
  153. package/packages/cli/templates/guides/test-definitions-guide.md +334 -0
  154. package/packages/cli/templates/guides/testing-methodology.md +618 -0
  155. package/packages/cli/templates/guides/user-story-guide.md +256 -0
  156. package/packages/cli/templates/guides/zombie-process-cleanup.md +219 -0
  157. package/packages/cli/templates/hooks/agents-md-check.sh +27 -0
  158. package/packages/cli/templates/hooks/post-tool.sh +4 -0
  159. package/packages/cli/templates/hooks/pre-commit.sh +10 -0
  160. package/packages/cli/templates/prompts/arch-review.md +43 -0
  161. package/packages/cli/templates/prompts/quality-review.md +10 -0
  162. package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
  163. package/packages/cli/tests/commands/check.test.ts +129 -0
  164. package/packages/cli/tests/commands/cli.test.ts +89 -0
  165. package/packages/cli/tests/commands/diff.test.ts +115 -0
  166. package/packages/cli/tests/commands/reset.test.ts +310 -0
  167. package/packages/cli/tests/commands/self-healing.test.ts +170 -0
  168. package/packages/cli/tests/commands/setup-blocking.test.ts +71 -0
  169. package/packages/cli/tests/commands/setup-core.test.ts +135 -0
  170. package/packages/cli/tests/commands/setup-git.test.ts +139 -0
  171. package/packages/cli/tests/commands/setup-hooks.test.ts +334 -0
  172. package/packages/cli/tests/commands/setup-linting.test.ts +189 -0
  173. package/packages/cli/tests/commands/setup-noninteractive.test.ts +80 -0
  174. package/packages/cli/tests/commands/setup-templates.test.ts +181 -0
  175. package/packages/cli/tests/commands/upgrade.test.ts +215 -0
  176. package/packages/cli/tests/helpers.ts +243 -0
  177. package/packages/cli/tests/npm-package.test.ts +83 -0
  178. package/packages/cli/tests/technical-constraints.test.ts +96 -0
  179. package/packages/cli/tsconfig.json +25 -0
  180. package/packages/cli/tsup.config.ts +11 -0
  181. package/packages/cli/vitest.config.ts +23 -0
  182. package/promptfoo.yaml +3270 -0
  183. package/dist/check-3NGQ4NR5.js +0 -129
  184. package/dist/check-3NGQ4NR5.js.map +0 -1
  185. package/dist/chunk-2XWIUEQK.js +0 -190
  186. package/dist/chunk-2XWIUEQK.js.map +0 -1
  187. package/dist/chunk-GZRQL3SX.js +0 -146
  188. package/dist/chunk-GZRQL3SX.js.map +0 -1
  189. package/dist/chunk-ORQHKDT2.js +0 -10
  190. package/dist/chunk-ORQHKDT2.js.map +0 -1
  191. package/dist/chunk-W66Z3C5H.js +0 -21
  192. package/dist/chunk-W66Z3C5H.js.map +0 -1
  193. package/dist/cli.d.ts +0 -1
  194. package/dist/cli.js +0 -34
  195. package/dist/cli.js.map +0 -1
  196. package/dist/diff-Y6QTAW4O.js +0 -166
  197. package/dist/diff-Y6QTAW4O.js.map +0 -1
  198. package/dist/index.d.ts +0 -11
  199. package/dist/index.js +0 -7
  200. package/dist/index.js.map +0 -1
  201. package/dist/reset-3ACTIYYE.js +0 -143
  202. package/dist/reset-3ACTIYYE.js.map +0 -1
  203. package/dist/setup-RR4M334C.js +0 -266
  204. package/dist/setup-RR4M334C.js.map +0 -1
  205. package/dist/upgrade-6AR3DHUV.js +0 -134
  206. package/dist/upgrade-6AR3DHUV.js.map +0 -1
  207. /package/{templates → framework}/SAFEWORD.md +0 -0
  208. /package/{templates → framework}/guides/architecture-guide.md +0 -0
  209. /package/{templates → framework}/guides/code-philosophy.md +0 -0
  210. /package/{templates → framework}/guides/context-files-guide.md +0 -0
  211. /package/{templates → framework}/guides/data-architecture-guide.md +0 -0
  212. /package/{templates → framework}/guides/design-doc-guide.md +0 -0
  213. /package/{templates → framework}/guides/learning-extraction.md +0 -0
  214. /package/{templates → framework}/guides/llm-instruction-design.md +0 -0
  215. /package/{templates → framework}/guides/llm-prompting.md +0 -0
  216. /package/{templates → framework}/guides/tdd-best-practices.md +0 -0
  217. /package/{templates → framework}/guides/test-definitions-guide.md +0 -0
  218. /package/{templates → framework}/guides/testing-methodology.md +0 -0
  219. /package/{templates → framework}/guides/user-story-guide.md +0 -0
  220. /package/{templates → framework}/guides/zombie-process-cleanup.md +0 -0
  221. /package/{templates → framework}/prompts/arch-review.md +0 -0
  222. /package/{templates → framework}/prompts/quality-review.md +0 -0
  223. /package/{templates/skills/safeword-quality-reviewer → framework/skills/quality-reviewer}/SKILL.md +0 -0
  224. /package/{templates/doc-templates → framework/templates}/architecture-template.md +0 -0
  225. /package/{templates/doc-templates → framework/templates}/design-doc-template.md +0 -0
  226. /package/{templates/doc-templates → framework/templates}/test-definitions-feature.md +0 -0
  227. /package/{templates/doc-templates → framework/templates}/ticket-template.md +0 -0
  228. /package/{templates/doc-templates → framework/templates}/user-stories-template.md +0 -0
  229. /package/{templates → packages/cli/templates}/commands/arch-review.md +0 -0
  230. /package/{templates → packages/cli/templates}/commands/lint.md +0 -0
  231. /package/{templates → packages/cli/templates}/commands/quality-review.md +0 -0
  232. /package/{templates → packages/cli/templates}/hooks/inject-timestamp.sh +0 -0
  233. /package/{templates → packages/cli/templates}/lib/common.sh +0 -0
  234. /package/{templates → packages/cli/templates}/lib/jq-fallback.sh +0 -0
  235. /package/{templates → packages/cli/templates}/markdownlint.jsonc +0 -0
@@ -0,0 +1,573 @@
1
+ # Testing Methodology
2
+
3
+ ---
4
+
5
+ ## Test Philosophy
6
+
7
+ **Test what matters** - Focus on user experience and delivered features, not implementation details.
8
+
9
+ **Always test what you build** - Run tests yourself before completion. Don't ask the user to verify.
10
+
11
+ ---
12
+
13
+ ## Test Integrity (CRITICAL)
14
+
15
+ **NEVER modify, skip, or delete tests without explicit human approval.**
16
+
17
+ Tests are the specification. When a test fails, the implementation is wrong—not the test.
18
+
19
+ ### Forbidden Actions (Require Approval)
20
+
21
+ | Action | Why It's Forbidden |
22
+ |--------|-------------------|
23
+ | Changing assertions to match broken code | Hides bugs instead of fixing them |
24
+ | Adding `.skip()`, `.only()`, `xit()`, `.todo()` | Makes failures invisible |
25
+ | Deleting tests you can't get passing | Removes coverage for edge cases |
26
+ | Weakening assertions (`toBe` → `toBeTruthy`) | Reduces test precision |
27
+ | Commenting out test code | Same as skipping |
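To illustrate why weakening an assertion hides bugs, here is a minimal sketch that needs no test framework. `isExactly` and `isTruthy` are hypothetical stand-ins for a strict matcher such as `toBe` (which compares with `Object.is`) and a weak matcher such as `toBeTruthy`; `buggyDiscount` is an invented implementation with a deliberate bug.

```typescript
// Hypothetical stand-ins for a strict matcher (like toBe) and a weak one (like toBeTruthy).
const isExactly = (actual: unknown, expected: unknown): boolean => Object.is(actual, expected)
const isTruthy = (actual: unknown): boolean => Boolean(actual)

// Hypothetical buggy implementation: takes 25% off instead of the intended 20%.
function buggyDiscount(price: number): number {
  return price * 0.75
}

const result = buggyDiscount(100) // 75, but the specification says 80

const strictCheckPasses = isExactly(result, 80) // false: the bug is caught
const weakCheckPasses = isTruthy(result) // true: the bug slips through
```

The weak check passes for any non-zero number, so it no longer specifies the behavior the original assertion protected.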
28
+
29
+ ### What To Do Instead
30
+
31
+ 1. **Test fails?** → Fix the implementation, not the test
32
+ 2. **Test seems wrong?** → Explain why and ask: "This test expects X but I think it should expect Y because [reason]. Can I update it?"
33
+ 3. **Requirements changed?** → Explain the change and ask before updating tests to match new requirements
34
+ 4. **Test is flaky?** → Fix the flakiness (usually async issues), don't skip it
35
+ 5. **Test blocks progress?** → Ask for guidance, don't work around it
36
+
37
+ ---
38
+
39
+ ## Testing Principles
40
+
41
+ **Goal:** Catch bugs quickly and cheaply with fast feedback loops.
42
+
43
+ **Optimization rule:** Test with the fastest test type that can catch the bug.
44
+
45
+ **Tie-breaking rule:** If multiple test types apply, choose the faster one.
46
+
47
+ ### Test Speed Hierarchy (Fast → Slow)
48
+
49
+ ```
50
+ Unit (milliseconds) ← Pure functions, no I/O
51
+
52
+ Integration (seconds) ← Multiple modules, database, API calls
53
+
54
+ LLM Eval (seconds) ← AI judgment, costs $0.01-0.30 per run
55
+
56
+ E2E (seconds-minutes) ← Full browser, user flows
57
+ ```
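The optimization and tie-breaking rules can be sketched as a small helper. This is an illustration only: `TestType`, `SPEED_ORDER`, and `pickFastest` are hypothetical names, and the ordering simply mirrors the hierarchy above.

```typescript
// Test types ordered fastest to slowest, mirroring the speed hierarchy in this guide.
type TestType = 'unit' | 'integration' | 'llm-eval' | 'e2e'

const SPEED_ORDER: TestType[] = ['unit', 'integration', 'llm-eval', 'e2e']

// Hypothetical helper: given the test types that can catch a bug, return the fastest one.
function pickFastest(candidates: TestType[]): TestType {
  if (candidates.length === 0) {
    throw new Error('No test type can catch this bug; re-evaluate what is being tested')
  }
  return candidates.reduce((best, current) =>
    SPEED_ORDER.indexOf(current) < SPEED_ORDER.indexOf(best) ? current : best,
  )
}

const forCalculationBug = pickFastest(['e2e', 'unit', 'integration']) // 'unit'
const forQueryBug = pickFastest(['e2e', 'integration']) // 'integration'
```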
58
+
59
+ ### Anti-Patterns: Testing at the Wrong Level
60
+
61
+ ❌ **Testing business logic with E2E tests**
62
+ ```typescript
63
+ // BAD: Launching browser to test a calculation
64
+ test('discount calculation', async ({ page }) => {
65
+ await page.goto('/checkout')
66
+ await page.fill('[name="price"]', '100')
67
+ await expect(page.locator('.total')).toContainText('80')
68
+ })
69
+
70
+ // GOOD: Unit test (runs in milliseconds)
71
+ it('applies 20% discount', () => {
72
+ expect(calculateDiscount(100, 0.20)).toBe(80)
73
+ })
74
+ ```
75
+
76
+ ❌ **Testing UI components at the wrong level**
77
+ ```typescript
78
+ // BAD: Heavy mocking in unit test (brittle, tests implementation details)
79
+ it('renders header', () => {
80
+ const mockProps = { /* 50 lines of mocks */ }
81
+ render(<Header {...mockProps} />)
82
+ expect(mockProps.onLogout).toHaveBeenCalled() // Testing implementation
83
+ })
84
+
85
+ // BETTER: Integration test (fast, tests behavior with real data)
86
+ it('renders header with username', () => {
87
+ render(<Header user={{ name: 'Alex' }} />)
88
+ expect(screen.getByRole('banner')).toHaveTextContent('Alex')
89
+ })
90
+
91
+ // BEST for testing full user flow: E2E test (only when needed for multi-page flows)
92
+ test('user sees header after login', async ({ page }) => {
93
+ await page.goto('/login')
94
+ await page.fill('[name="email"]', 'alex@example.com')
95
+ await page.click('button:has-text("Login")')
96
+ await expect(page.getByRole('banner')).toContainText('Alex')
97
+ })
98
+ ```
99
+
100
+ **Principle:** Use integration tests for component behavior, E2E tests for multi-page user flows.
101
+
102
+ ### Target Distribution (Guideline, Not Rule)
103
+
104
+ **Focus on speed, not strict ratios:**
105
+ - Write as many **fast tests** (unit + integration) as possible
106
+ - Write **E2E tests** only for critical user paths that require a browser
107
+ - Write **LLM evals** only for AI features requiring quality judgment
108
+
109
+ **Common patterns by architecture:**
110
+ - **Microservices:** More integration tests needed (test service contracts, API interactions)
111
+ - **UI-heavy apps:** More E2E tests needed (test multi-page flows, visual interactions)
112
+ - **Pure libraries:** Mostly unit tests (pure functions, no external dependencies)
113
+ - **AI-powered apps:** Add LLM evals (test prompt quality, reasoning accuracy)
114
+
115
+ **Red flag:** If you have more E2E tests than integration tests, your test suite is too slow.
116
+
117
+ ---
118
+
119
+ ## TDD Workflow (RED → GREEN → REFACTOR)
120
+
121
+ **Test-Driven Development** - Write tests BEFORE implementation. Tests define expected behavior, code makes them pass.
122
+
123
+ ### Phase 1: RED (Write Failing Tests)
124
+
125
+ **Steps:**
126
+ 1. Write test based on expected input/output
127
+ 2. **CRITICAL:** Run test and confirm it fails for the right reason
128
+ 3. **DO NOT write any implementation code yet**
129
+ 4. Commit the test when satisfied
130
+
131
+ **Critical warnings:**
132
+ - ⚠️ **No mock implementations** - State explicitly that you are doing TDD, so that no placeholder or mock implementation gets written for functionality that doesn't exist yet
133
+ - ⚠️ **Verify failure** - Test must fail before implementation (proves test works)
134
+ - ⚠️ **Performance** - Run single tests, not whole suite (`npm test -- path/to/file.test.ts`)
135
+
136
+ **Example:**
137
+ ```typescript
138
+ // RED: Write failing test
139
+ it('calculates total with tax', () => {
140
+ expect(calculateTotal(100, 0.08)).toBe(108) // FAILS - function doesn't exist
141
+ })
142
+ ```
143
+
144
+ ### Phase 2: GREEN (Make Tests Pass)
145
+
146
+ **Steps:**
147
+ 1. Write **minimum** code to make test pass
148
+ 2. Run test - verify it passes
149
+ 3. No extra features (YAGNI - You Ain't Gonna Need It)
150
+
151
+ **Example:**
152
+ ```typescript
153
+ // GREEN: Minimal implementation
154
+ function calculateTotal(amount: number, taxRate: number): number {
155
+ return amount + (amount * taxRate)
156
+ }
157
+ ```
158
+
159
+ ### Phase 3: REFACTOR (Clean Up)
160
+
161
+ **Steps:**
162
+ 1. Improve code quality without changing behavior
163
+ 2. Run tests - verify they still pass
164
+ 3. Remove duplication, improve naming
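As a sketch of what a refactor step might look like for the `calculateTotal` example, the version below extracts the tax calculation into a named helper. `calculateTotalBefore` and `calculateTax` are hypothetical names introduced for illustration; the point is only that behavior stays identical.

```typescript
// Before the refactor: the GREEN implementation from this guide.
function calculateTotalBefore(amount: number, taxRate: number): number {
  return amount + (amount * taxRate)
}

// After the refactor: the tax calculation gets a name; the arithmetic is unchanged.
function calculateTax(amount: number, taxRate: number): number {
  return amount * taxRate
}

function calculateTotal(amount: number, taxRate: number): number {
  return amount + calculateTax(amount, taxRate)
}

// Behavior check: both versions agree on the same inputs.
const sameBehavior = calculateTotal(100, 0.08) === calculateTotalBefore(100, 0.08)
```

Re-running the existing test after a change like this confirms that the refactor did not alter behavior.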
165
+
166
+ **Optional: Subagent Validation**
167
+ - Use independent AI instance to verify implementation isn't overfitting to tests
168
+ - Ask: "Does this implementation handle edge cases beyond the test scenarios?"
169
+
170
+ ---
171
+
172
+ ## When to Use Each Test Type
173
+
174
+ ### Decision Tree
175
+
176
+ Answer these questions in order to choose the test type, and stop at the first match. If more than one seems to apply, use the tie-breaking rule from "Testing Principles": choose the faster one.
177
+
178
+ ```
179
+ 1. Does this test AI-generated content quality (tone, reasoning, creativity)?
180
+ └─ YES → LLM Evaluation
181
+ Examples: Narrative quality, prompt effectiveness, conversational naturalness
182
+ └─ NO → Continue to question 2
183
+
184
+ 2. Does this test require a real browser (Playwright/Cypress)?
185
+ └─ YES → E2E test
186
+ Examples: Multi-page navigation, browser-specific behavior (localStorage, cookies), visual regression, drag-and-drop
187
+ Note: React Testing Library does NOT require a browser - that's integration testing
188
+ └─ NO → Continue to question 3
189
+
190
+ 3. Does this test interactions between multiple components/services?
191
+ └─ YES → Integration test
192
+ Examples: API + database, React component + state store, service + external API
193
+ └─ NO → Continue to question 4
194
+
195
+ 4. Does this test a pure function (input → output, no I/O or side effects)?
196
+ └─ YES → Unit test
197
+ Examples: Calculations, formatters, validators, pure algorithms
198
+ └─ NO → Re-evaluate: What are you actually testing?
199
+ ```
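The decision tree can also be read as a function that checks each question in order and returns at the first match. This is a sketch under assumptions: `TestTraits`, its field names, and `chooseTestType` are hypothetical and not part of any library.

```typescript
type TestKind = 'LLM Evaluation' | 'E2E' | 'Integration' | 'Unit' | 'Re-evaluate'

// Hypothetical description of what a test needs; field names are illustrative.
interface TestTraits {
  judgesAiContentQuality: boolean
  requiresRealBrowser: boolean
  crossesComponentsOrServices: boolean
  isPureFunction: boolean
}

// Walks the four questions in order and stops at the first match.
function chooseTestType(traits: TestTraits): TestKind {
  if (traits.judgesAiContentQuality) return 'LLM Evaluation'
  if (traits.requiresRealBrowser) return 'E2E'
  if (traits.crossesComponentsOrServices) return 'Integration'
  if (traits.isPureFunction) return 'Unit'
  return 'Re-evaluate'
}

const none: TestTraits = {
  judgesAiContentQuality: false,
  requiresRealBrowser: false,
  crossesComponentsOrServices: false,
  isPureFunction: false,
}

const forFormatter = chooseTestType({ ...none, isPureFunction: true }) // 'Unit'
const forDragAndDrop = chooseTestType({ ...none, requiresRealBrowser: true }) // 'E2E'
```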
200
+
201
+ **Edge cases:**
202
+ - **Non-deterministic functions** (Math.random(), Date.now(), UUID generation) → Unit test with mocked randomness/time
203
+ - **Functions with environment dependencies** (process.env, window.location) → Integration test
204
+ - **Mixed pure + I/O logic** → Extract pure logic into separate function → Unit test pure part, integration test I/O
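One way to unit test a non-deterministic function is to inject the clock and the random source. Note that this uses dependency injection rather than a mocking library's fake timers. `Deps`, `createSessionId`, and the ID format are hypothetical, chosen only to keep the sketch self-contained.

```typescript
// Hypothetical dependencies: time and randomness are passed in so a test can pin them.
interface Deps {
  now: () => number
  random: () => number
}

const realDeps: Deps = {
  now: () => Date.now(),
  random: () => Math.random(),
}

// Hypothetical function under test: builds an ID from the current time and a random suffix.
function createSessionId(deps: Deps = realDeps): string {
  const timestamp = deps.now().toString(36)
  const suffix = String(Math.floor(deps.random() * 1000))
  return `${timestamp}-${suffix}`
}

// In a unit test, fixed fakes make the output deterministic.
const fixedDeps: Deps = { now: () => 0, random: () => 0.5 }
const id = createSessionId(fixedDeps) // '0-500'
```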
205
+
206
+ **Re-evaluation guide:**
207
+ If testing behavior that doesn't fit the four categories:
208
+ 1. **Break it down:** Separate pure logic from I/O/UI concerns
209
+ 2. **Test each piece separately:** Pure logic → Unit, I/O → Integration, Multi-page flow → E2E
210
+ 3. **Example:** Login validation
211
+ - Pure: `isValidEmail(email)` → Unit test
212
+ - I/O: `checkUserExists(email)` → Integration test (hits database)
213
+ - Full flow: Login form → Dashboard → E2E test (multi-page)
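The login breakdown above might look like this in code. It is a sketch, not a prescribed design: the email pattern is deliberately simplistic, and `CheckUserExists` and `canLogIn` are hypothetical names. Only `isValidEmail` and `checkUserExists` come from the example.

```typescript
// Pure: no I/O, so a unit test covers it. The pattern is intentionally simple.
function isValidEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)
}

// I/O boundary: in production this would hit the database (integration test).
type CheckUserExists = (email: string) => Promise<boolean>

// Hypothetical orchestration: validate first (pure), then look up the user (I/O).
async function canLogIn(email: string, checkUserExists: CheckUserExists): Promise<boolean> {
  if (!isValidEmail(email)) return false
  return checkUserExists(email)
}

const validResult = isValidEmail('alex@example.com') // true
const invalidResult = isValidEmail('not-an-email') // false
```

Keeping the lookup behind a function type lets the pure part stay in fast unit tests while the database-backed implementation is exercised in integration tests.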
214
+
215
+ ### What Bugs Can Each Test Type Catch?
216
+
217
+ Understanding which test type catches which bugs helps you choose the fastest effective test.
218
+
219
+ | Bug Type | Can Unit Test Catch? | Can Integration Test Catch? | Can E2E Test Catch? | Best Choice |
220
+ |----------|---------------------|----------------------------|-------------------|-------------|
221
+ | Calculation error | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
222
+ | Invalid input handling | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
223
+ | Database query returning wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
224
+ | API endpoint contract violation | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
225
+ | Race condition between services | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
226
+ | State management bug (Zustand, Redux) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
227
+ | React component rendering wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
228
+ | CSS layout broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
229
+ | Multi-page navigation broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
230
+ | Browser-specific rendering | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
231
+ | Form validation logic (isValidEmail) | ✅ YES | ✅ YES | ✅ YES | Unit (fastest, test pure logic) |
232
+ | Form validation UI (shows error message) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
233
+ | Form validation UX (multi-field flow) | ❌ NO | ❌ NO | ✅ YES | E2E (only option for full flow) |
234
+ | AI prompt quality degradation | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
235
+ | AI reasoning accuracy | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
236
+
237
+ **Key principle:** If multiple test types can catch the bug, choose the fastest one.
238
+
239
+ ---
240
+
241
+ ## Test Type Examples
242
+
243
+ ### 1. Unit Tests
244
+
245
+ **Note:** If your business logic needs a database, API, or file system, use an integration test instead.
246
+
247
+ **Example:**
248
+ ```typescript
249
+ // ✅ GOOD - Pure function
250
+ it('applies 20% discount for VIP users', () => {
251
+ expect(calculateDiscount(100, { tier: 'VIP' })).toBe(80)
252
+ })
253
+
254
+ // ❌ BAD - Testing implementation details
255
+ it('calls setState with correct value', () => {
256
+ expect(setState).toHaveBeenCalledWith({ count: 1 })
257
+ })
258
+ ```
259
+
260
+ ### 2. Integration Tests
261
+
262
+ **Key distinction:** Integration tests can render UI components but don't require a real browser. They run in Node.js with jsdom (simulated browser environment).
263
+
264
+ **Example:**
265
+ ```typescript
266
+ // ✅ GOOD - Tests agent + state integration
267
+ describe('Agent + State Integration', () => {
268
+ it('updates character state after agent processes action', async () => {
269
+ const agent = new GameAgent()
270
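+ await agent.processAction('attack guard')
271
+
272
+ // Read state after the action; with Zustand-style stores, getState() returns a snapshot that goes stale after updates
273
+ const store = useGameStore.getState()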
274
+ expect(store.character.stress).toBeGreaterThan(0)
275
+ expect(store.messages).toHaveLength(2) // player + AI response
276
+ })
277
+ })
278
+ ```
279
+
280
+ ### 3. E2E Tests
281
+
282
+ **Example:**
283
+ ```typescript
284
+ // ✅ GOOD - Tests complete user flow
285
+ test('user creates account and first item', async ({ page }) => {
286
+ await page.goto('/signup')
287
+ await page.fill('[name="email"]', 'test@example.com')
288
+ await page.fill('[name="password"]', 'secure123')
289
+ await page.click('button:has-text("Sign Up")')
290
+ await expect(page).toHaveURL('/dashboard')
291
+
292
+ await page.click('text=New Item')
293
+ await page.fill('[name="title"]', 'My First Item')
294
+ await page.click('text=Save')
295
+ await expect(page.getByText('My First Item')).toBeVisible()
296
+ })
297
+ ```
298
+
299
+ ### E2E Testing with Persistent Dev Servers
300
+
301
+ When using Playwright for E2E tests, isolate persistent dev instances from test instances to avoid port conflicts and zombie processes.
302
+
303
+ **Port Isolation Strategy:**
304
+ - **Dev instance**: Project's configured port (e.g., 3000, 8080) - runs persistently for manual testing
305
+ - **Test instances**: `devPort + 1000` (e.g., 4000, 9080) - managed by Playwright
306
+ - **Fallback**: Ephemeral OS-assigned port if offset port is busy
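The port strategy above can be sketched as a pure function. `resolveTestPort` and the `isBusy` callback are hypothetical. The sketch returns `0` for the fallback case because passing port `0` to Node's `server.listen()` asks the OS to assign a free port.

```typescript
const TEST_PORT_OFFSET = 1000
const MAX_PORT = 65535

// Returns devPort + 1000, or 0 (OS-assigned) when that port is busy or out of range.
// isBusy is a hypothetical callback; a real check would try to bind the port.
function resolveTestPort(devPort: number, isBusy: (port: number) => boolean): number {
  const offsetPort = devPort + TEST_PORT_OFFSET
  if (offsetPort > MAX_PORT || isBusy(offsetPort)) return 0
  return offsetPort
}

const viteTestPort = resolveTestPort(5173, () => false) // 6173
const fallbackPort = resolveTestPort(8080, (port) => port === 9080) // 0
```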
307
+
308
+ **Process Management:**
309
+ - Dev instance runs persistently (started manually, survives test runs)
310
+ - Test instances spawn/cleanup per test run (Playwright manages lifecycle)
311
+ - Never kill processes on dev port range
312
+
313
+ **Playwright Configuration** (example uses 3000/4000 - adjust to your project's ports):
314
+ ```typescript
315
+ // playwright.config.ts
316
+ import { defineConfig } from '@playwright/test';
317
+
318
+ export default defineConfig({
319
+ webServer: {
320
+ command: 'npm run dev:test', // Test script with isolated port
321
+ port: 4000, // devPort + 1000 (e.g., 5173→6173)
322
+ reuseExistingServer: !process.env.CI, // Reuse locally, fresh in CI
323
+ timeout: 120000,
324
+ },
325
+ use: {
326
+ baseURL: 'http://localhost:4000', // Test against test instance
327
+ }
328
+ });
329
+ ```
330
+
331
+ **Package.json Scripts** (example uses 3000/4000 - adjust to your project's ports). `dev` is the persistent instance for manual testing; `dev:test` is the test instance that Playwright manages:
332
+ ```json
333
+ {
334
+ "scripts": {
335
+ "dev": "vite --port 3000",
336
+ "dev:test": "vite --port 4000",
337
+ "test:e2e": "playwright test"
338
+ }
339
+ }
340
+ ```
341
+
342
+ **Why this pattern:**
343
+ - ✅ Manual testing on stable URL (dev instance always 3000)
344
+ - ✅ Automated tests isolated (Playwright controls lifecycle)
345
+ - ✅ No zombie processes (Playwright cleanup automatic)
346
+ - ✅ No port conflicts (predictable offset)
347
+ - ✅ Works in CI (fresh test instance every run)
348
+
349
+ **Alternative patterns:**
350
+ - Different projects use different ports (Next.js: 3000, Laravel: 8000, Rails: 3000)
351
+ - Dynamic offset adapts: `8000` → `9000`, `5173` → `6173`
352
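+ - If the offset port is busy, the test script can fall back to an ephemeral port (49152-65535), provided the same port is passed to Playwright's `webServer` and `baseURL` settings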
353
+
354
+ **Cleanup:** For killing zombie dev/test servers, see `zombie-process-cleanup.md` → "Port-Based Cleanup"
355
+
356
+ ### 4. LLM Evaluations
357
+
358
+ **Cost:** ~$0.01-0.30 per test run (depends on prompt size, caching)
359
+
360
+ **Example:**
361
+ ```yaml
362
+ - description: "Infer user intent from casual input"
363
+ vars:
364
+ input: "I want to order a large pepperoni"
365
+ assert:
366
+ - type: javascript
367
+ value: JSON.parse(output).intent === 'order_pizza'
368
+ - type: llm-rubric
369
+ value: |
370
+ EXCELLENT: Confirms pizza type/size, asks for delivery details
371
+ POOR: Generic response or wrong intent
372
+ ```
373
+
374
+ **Assertion types:**
375
+
376
+ **Programmatic** (fast, deterministic):
377
+ - JSON schema validation
378
+ - Required fields present
379
+ - Values in valid ranges
380
+ - Output format compliance
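A programmatic assertion of this kind might be sketched as follows. The output shape is an assumption: `IntentOutput`, its `confidence` field, and `checkIntentOutput` are hypothetical and not part of any eval framework.

```typescript
// Hypothetical output shape for an intent classifier; field names are illustrative.
interface IntentOutput {
  intent: string
  confidence: number
}

// Returns a list of problems; an empty list means the output passed every check.
function checkIntentOutput(raw: string): string[] {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return ['output is not valid JSON']
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return ['output is not a JSON object']
  }

  const candidate = parsed as Partial<IntentOutput>
  const problems: string[] = []
  if (typeof candidate.intent !== 'string') {
    problems.push('missing required field: intent')
  }
  const confidence = candidate.confidence
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    problems.push('confidence must be a number between 0 and 1')
  }
  return problems
}

const passing = checkIntentOutput('{"intent":"order_pizza","confidence":0.9}') // []
```

Checks like these are deterministic and cost nothing per run, which is why they come before any LLM-as-judge assertion.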
381
+
382
+ **LLM-as-Judge** (nuanced, contextual):
383
+ - Reasoning quality
384
+ - Tone/style adherence
385
+ - Factual accuracy
386
+ - Conversational naturalness
387
+ - Domain expertise demonstration
388
+
389
+ **When to skip LLM evals:**
390
+ - Structured output validation (use programmatic tests)
391
+ - Simple classification tasks (unit tests sufficient)
392
+ - Non-AI features
393
+
394
+ ---
395
+
396
+ ## Cost Considerations
397
+
398
+ **LLM eval costs:** $0.01-0.30 per run depending on prompt size. **Prompt caching can reduce costs by up to roughly 90%** (e.g., 30 scenarios: $0.30 → $0.03 after the first run).
399
+
400
+ **Cost reduction strategies:**
401
+ - Cache static content (system prompts, examples, rules)
402
+ - Batch multiple scenarios in one run
403
+ - Run full evals on PR/schedule, not every commit
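The arithmetic behind the "30 scenarios" figure can be worked through in a short sketch. It assumes $0.01 per scenario and a flat 90% saving on cached runs, both simplifications; `evalRunCostCents` is a hypothetical helper. Amounts are kept in whole cents and whole percent to avoid floating-point drift.

```typescript
// Cost of one eval run in cents, with an optional flat caching discount in percent.
function evalRunCostCents(
  scenarios: number,
  centsPerScenario: number,
  cacheSavingsPercent = 0,
): number {
  const fullCost = scenarios * centsPerScenario
  return (fullCost * (100 - cacheSavingsPercent)) / 100
}

const firstRunCents = evalRunCostCents(30, 1) // 30 cents = $0.30
const cachedRunCents = evalRunCostCents(30, 1, 90) // 3 cents = $0.03
```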
404
+
405
+ **ROI:** Catching one bad prompt change before production >> eval costs
406
+
407
+ ---
408
+
409
+ ## Test Coverage Goals
410
+
411
+ - **Unit tests:** 80%+ coverage of pure functions
412
+ - **Integration tests:** All critical paths covered (see definition below)
413
+ - **E2E tests:** All critical multi-page user flows have at least one E2E test
414
+ - **LLM evals:** All AI features have evaluation scenarios
415
+
416
+ **What are "critical paths"?**
417
+ - **Always critical:** Authentication, payment/checkout, data loss scenarios (delete, overwrite)
418
+ - **Usually critical:** Core user workflows (create → edit → publish), primary feature flows
419
+ - **Rarely critical:** UI polish (button colors, layout tweaks), admin-only features with low usage
420
+ - **Rule of thumb:** If it breaks, would users notice immediately and be unable to complete their main task?
421
+
422
+ ---
423
+
424
+ ## Writing Effective Tests
425
+
426
+ ### AAA Pattern (Arrange-Act-Assert)
427
+
428
+ Structure tests clearly: Setup data (Arrange) → Execute behavior (Act) → Verify expectations (Assert).
429
+
430
+ ```typescript
431
+ it('applies discount to VIP users', () => {
432
+   const user = { tier: 'VIP' }, cart = { total: 100 }  // Arrange
433
+   const result = applyDiscount(user, cart)              // Act
434
+   expect(result.total).toBe(80)                         // Assert
435
+ })
436
+ ```
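
The test above calls `applyDiscount` without showing it. A minimal version that would satisfy the test could look like the sketch below; the 20% VIP rate is an assumption inferred from the expected total (100 → 80), and the `User` and `Cart` types are hypothetical.

```typescript
// Hypothetical types and discount rule, inferred from the test's expectations.
type Tier = "VIP" | "standard";

interface User {
  tier: Tier;
}

interface Cart {
  total: number;
}

// Assumption: VIP users get 20% off, everyone else pays full price.
function applyDiscount(user: User, cart: Cart): Cart {
  const percentOff = user.tier === "VIP" ? 20 : 0;
  // Integer percentages avoid floating-point drift for whole-number totals.
  return { total: (cart.total * (100 - percentOff)) / 100 };
}
```

Returning a new cart instead of mutating the argument keeps the Arrange data untouched, so the Assert step compares against a value the Act step alone produced.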
437
+
438
+ ### Test Naming
439
+
440
+ Be descriptive and specific, not vague or implementation-focused.
441
+
442
+ ```typescript
443
+ // ✅ GOOD
444
+ it('returns 401 when API key is missing')
445
+ it('preserves user input after validation error')
446
+
447
+ // ❌ BAD
448
+ it('works correctly')
449
+ it('should call setState')
450
+ ```
451
+
452
+ ### Test Independence
453
+
454
+ **Each test should:**
455
+ - Run in any order
456
+ - Not depend on other tests
457
+ - Clean up its own state
458
+ - Use fresh fixtures/data
459
+
460
+ ```typescript
461
+ // ✅ GOOD - Fresh state per test
462
+ beforeEach(() => { gameState = createFreshGameState() })
463
+
464
+ // ❌ BAD - Shared state (test B depends on test A)
465
+ let sharedUser = createUser()
466
+ it('test A', () => { sharedUser.name = 'Alice' })
467
+ it('test B', () => { expect(sharedUser.name).toBe('Alice') })
468
+ ```
469
+
470
+ ### Async Testing
471
+
472
+ **NEVER use arbitrary timeouts** - They make tests slow and non-deterministic.
473
+
474
+ ```typescript
475
+ // ❌ BAD - Arbitrary timeout
476
+ await page.waitForTimeout(3000) // What if it takes 3.1 seconds?
477
+ await sleep(500) // Flaky test
478
+
479
+ // ✅ GOOD - Poll until condition is met
480
+ await expect.poll(() => getStatus()).toBe('ready')
481
+ await page.waitForSelector('[data-testid="loaded"]')
482
+ await waitFor(() => expect(screen.getByText('Success')).toBeVisible())
483
+ ```
484
+
485
+ **Why:** Polling is deterministic (passes when condition is met) and faster (no unnecessary waiting).
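
To make the idea concrete, here is a minimal sketch of a polling helper. It is not the implementation behind `expect.poll` or `waitFor`; `pollUntil` and its option names are hypothetical, and the default interval and timeout are arbitrary choices.

```typescript
// Hypothetical polling helper: re-reads a value until a predicate accepts it
// or an upper time bound is exceeded.
async function pollUntil<T>(
  read: () => T | Promise<T>,
  isDone: (value: T) => boolean,
  options: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<T> {
  const intervalMs = options.intervalMs ?? 50;
  const timeoutMs = options.timeoutMs ?? 5000;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (isDone(value)) return value; // resolves as soon as the condition holds
    if (Date.now() >= deadline) {
      throw new Error("condition not met within " + timeoutMs + " ms");
    }
    // Short pause between attempts so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The timeout here is only an upper bound for failure: a passing test returns the moment the condition is met, which is the difference from a fixed `sleep`.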
486
+
487
+ ---
488
+
489
+ ## What Not to Test
490
+
491
+ ❌ **Implementation details** - Private methods, CSS classes, internal state (test what users see, not how it is produced)
492
+ ❌ **Third-party libraries** - Assume React/Axios work, test YOUR code
493
+ ❌ **Trivial code** - Getters/setters with no logic, pass-through functions
494
+ ❌ **UI copy** - Exact text or specific wording (match loosely with a regex like `/submit/i`; test that an error is shown, not its exact message)
495
+
496
+ ---
497
+
498
+ ## CI/CD Integration
499
+
500
+ Run unit and integration tests on every commit (fast feedback), E2E tests on every PR, and LLM evals on a schedule (for example weekly, to catch regressions without per-commit cost).
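
One way to picture this policy is as a lookup from CI trigger to test suites. The sketch below is illustrative only: the trigger and suite names are hypothetical, and it assumes PRs also run the unit and integration suites, which the text does not state explicitly.

```typescript
// Hypothetical trigger and suite names for illustration.
type Trigger = "commit" | "pull_request" | "schedule";
type Suite = "unit" | "integration" | "e2e" | "llm-eval";

// Map a CI trigger to the suites that should run for it.
function suitesFor(trigger: Trigger): Suite[] {
  switch (trigger) {
    case "commit":
      return ["unit", "integration"]; // fast feedback
    case "pull_request":
      // Assumption: PRs repeat the fast suites and add E2E.
      return ["unit", "integration", "e2e"];
    case "schedule":
      return ["llm-eval"]; // paid evals, kept off the per-commit path
  }
}
```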
501
+
502
+ ---
503
+
504
+ ## Quick Reference
505
+
506
+ | Need to test... | Test type | Technology | Speed | Cost |
507
+ |----------------|-----------|------------|-------|------|
508
+ | Pure function | Unit | Vitest | Fast | Free |
509
+ | Service integration | Integration | Vitest | Medium | Free |
510
+ | Full user flow | E2E | Playwright | Slow | Free |
511
+ | AI reasoning quality | LLM eval | Promptfoo | Slow | $0.01-0.30 per run |
512
+
513
+ ---
514
+
515
+ ## Project-Specific Testing Documentation
516
+
517
+ **Location:** `tests/SAFEWORD.md` (may be nested like `packages/web/tests/SAFEWORD.md` in monorepos)
518
+
519
+ **Purpose:** Document project-specific testing stack, commands, and setup. Supplements global methodology.
520
+
521
+ **What to include:**
522
+ - **Tech stack:** Testing frameworks (Vitest/Jest, Playwright/Cypress, Promptfoo)
523
+ - **Test commands:** How to run tests, including single-file execution for performance
524
+ - **Setup requirements:** API keys, build steps, database setup, browser installation
525
+ - **File structure:** Where tests live and naming conventions
526
+ - **Project patterns:** Custom helpers, fixtures, mocks, assertion styles
527
+ - **TDD guidance:** Project-specific workflow expectations (write tests first, commit tests before implementation)
528
+ - **Coverage requirements:** Minimum coverage thresholds or critical paths
529
+ - **PR requirements:** Test passage requirements before merge
530
+
531
+ **Example:**
532
+ ```markdown
533
+ # Testing
534
+
535
+ ## Tech Stack
536
+ - Unit/Integration: Vitest
537
+ - E2E: Playwright
538
+ - LLM Evals: Promptfoo
539
+
540
+ ## Commands
541
+ npm test # All tests
542
+ npm test -- path/to/file.test.ts # Single file (performance)
543
+ npm run test:coverage # With coverage report
544
+ npm run test:e2e # E2E tests only
545
+
546
+ ## TDD Workflow
547
+ 1. Write failing tests first (RED phase)
548
+ 2. Confirm tests fail: `npm test -- path/to/file.test.ts`
549
+ 3. Commit tests before implementation
550
+ 4. Implement minimum code to pass (GREEN phase)
551
+ 5. Refactor while keeping tests green
552
+
553
+ ## Setup
554
+ 1. Install: `npm install`
555
+ 2. Browsers: `npx playwright install`
556
+ 3. API keys: `export ANTHROPIC_API_KEY=sk-ant-...`
557
+ 4. Build before testing: `npm run build`
558
+
559
+ ## Coverage Requirements
560
+ - Unit tests: 80%+ for business logic
561
+ - E2E tests: All critical user paths
562
+
563
+ ## PR Requirements
564
+ - All tests must pass
565
+ - No skipped tests without justification
566
+ - Coverage thresholds met
567
+ ```
568
+
569
+ **If not found:** Ask user "Where are the testing docs?"
570
+
571
+ **Cascading precedence:**
572
+ 1. **Global** (`~/.claude/testing-methodology.md`) - Universal methodology (test type selection, TDD workflow)
573
+ 2. **Project** (`tests/SAFEWORD.md`) - Specific stack, commands, patterns