safeword 0.2.4 → 0.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (235)
  1. package/dist/check-3NGQ4NR5.js +129 -0
  2. package/dist/check-3NGQ4NR5.js.map +1 -0
  3. package/dist/chunk-2XWIUEQK.js +190 -0
  4. package/dist/chunk-2XWIUEQK.js.map +1 -0
  5. package/dist/chunk-GZRQL3SX.js +146 -0
  6. package/dist/chunk-GZRQL3SX.js.map +1 -0
  7. package/dist/chunk-ORQHKDT2.js +10 -0
  8. package/dist/chunk-ORQHKDT2.js.map +1 -0
  9. package/dist/chunk-W66Z3C5H.js +21 -0
  10. package/dist/chunk-W66Z3C5H.js.map +1 -0
  11. package/dist/cli.d.ts +1 -0
  12. package/dist/cli.js +34 -0
  13. package/dist/cli.js.map +1 -0
  14. package/dist/diff-Y6QTAW4O.js +166 -0
  15. package/dist/diff-Y6QTAW4O.js.map +1 -0
  16. package/dist/index.d.ts +11 -0
  17. package/dist/index.js +7 -0
  18. package/dist/index.js.map +1 -0
  19. package/dist/reset-3ACTIYYE.js +143 -0
  20. package/dist/reset-3ACTIYYE.js.map +1 -0
  21. package/dist/setup-RR4M334C.js +266 -0
  22. package/dist/setup-RR4M334C.js.map +1 -0
  23. package/dist/upgrade-6AR3DHUV.js +134 -0
  24. package/dist/upgrade-6AR3DHUV.js.map +1 -0
  25. package/package.json +44 -19
  26. package/{.safeword → templates}/hooks/agents-md-check.sh +0 -0
  27. package/{.safeword → templates}/hooks/post-tool.sh +0 -0
  28. package/{.safeword → templates}/hooks/pre-commit.sh +0 -0
  29. package/.claude/commands/arch-review.md +0 -32
  30. package/.claude/commands/lint.md +0 -6
  31. package/.claude/commands/quality-review.md +0 -13
  32. package/.claude/commands/setup-linting.md +0 -6
  33. package/.claude/hooks/auto-lint.sh +0 -6
  34. package/.claude/hooks/auto-quality-review.sh +0 -170
  35. package/.claude/hooks/check-linting-sync.sh +0 -17
  36. package/.claude/hooks/inject-timestamp.sh +0 -6
  37. package/.claude/hooks/question-protocol.sh +0 -12
  38. package/.claude/hooks/run-linters.sh +0 -8
  39. package/.claude/hooks/run-quality-review.sh +0 -76
  40. package/.claude/hooks/version-check.sh +0 -10
  41. package/.claude/mcp/README.md +0 -96
  42. package/.claude/mcp/arcade.sample.json +0 -9
  43. package/.claude/mcp/context7.sample.json +0 -7
  44. package/.claude/mcp/playwright.sample.json +0 -7
  45. package/.claude/settings.json +0 -62
  46. package/.claude/skills/quality-reviewer/SKILL.md +0 -190
  47. package/.claude/skills/safeword-quality-reviewer/SKILL.md +0 -13
  48. package/.env.arcade.example +0 -4
  49. package/.env.example +0 -11
  50. package/.gitmodules +0 -4
  51. package/.safeword/SAFEWORD.md +0 -33
  52. package/.safeword/eslint/eslint-base.mjs +0 -101
  53. package/.safeword/guides/architecture-guide.md +0 -404
  54. package/.safeword/guides/code-philosophy.md +0 -174
  55. package/.safeword/guides/context-files-guide.md +0 -405
  56. package/.safeword/guides/data-architecture-guide.md +0 -183
  57. package/.safeword/guides/design-doc-guide.md +0 -165
  58. package/.safeword/guides/learning-extraction.md +0 -515
  59. package/.safeword/guides/llm-instruction-design.md +0 -239
  60. package/.safeword/guides/llm-prompting.md +0 -95
  61. package/.safeword/guides/tdd-best-practices.md +0 -570
  62. package/.safeword/guides/test-definitions-guide.md +0 -243
  63. package/.safeword/guides/testing-methodology.md +0 -573
  64. package/.safeword/guides/user-story-guide.md +0 -237
  65. package/.safeword/guides/zombie-process-cleanup.md +0 -214
  66. package/.safeword/planning/002-user-story-quality-evaluation.md +0 -1840
  67. package/.safeword/planning/003-langsmith-eval-setup-prompt.md +0 -363
  68. package/.safeword/planning/004-llm-eval-test-cases.md +0 -3226
  69. package/.safeword/planning/005-architecture-enforcement-system.md +0 -169
  70. package/.safeword/planning/006-reactive-fix-prevention-research.md +0 -135
  71. package/.safeword/planning/011-cli-ux-vision.md +0 -330
  72. package/.safeword/planning/012-project-structure-cleanup.md +0 -154
  73. package/.safeword/planning/README.md +0 -39
  74. package/.safeword/planning/automation-plan-v2.md +0 -1225
  75. package/.safeword/planning/automation-plan-v3.md +0 -1291
  76. package/.safeword/planning/automation-plan.md +0 -3058
  77. package/.safeword/planning/design/005-cli-implementation.md +0 -343
  78. package/.safeword/planning/design/013-cli-self-contained-templates.md +0 -596
  79. package/.safeword/planning/design/013a-eslint-plugin-suite.md +0 -256
  80. package/.safeword/planning/design/013b-implementation-snippets.md +0 -385
  81. package/.safeword/planning/design/013c-config-isolation-strategy.md +0 -242
  82. package/.safeword/planning/design/code-philosophy-improvements.md +0 -60
  83. package/.safeword/planning/mcp-analysis.md +0 -545
  84. package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +0 -451
  85. package/.safeword/planning/settings-improvements.md +0 -970
  86. package/.safeword/planning/test-definitions/005-cli-implementation.md +0 -1301
  87. package/.safeword/planning/test-definitions/cli-self-contained-templates.md +0 -205
  88. package/.safeword/planning/user-stories/001-guides-review-user-stories.md +0 -1381
  89. package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +0 -132
  90. package/.safeword/planning/user-stories/004-technical-constraints.md +0 -86
  91. package/.safeword/planning/user-stories/005-cli-implementation.md +0 -311
  92. package/.safeword/planning/user-stories/cli-self-contained-templates.md +0 -172
  93. package/.safeword/planning/versioned-distribution.md +0 -740
  94. package/.safeword/prompts/arch-review.md +0 -43
  95. package/.safeword/prompts/quality-review.md +0 -11
  96. package/.safeword/scripts/arch-review.sh +0 -235
  97. package/.safeword/scripts/check-linting-sync.sh +0 -58
  98. package/.safeword/scripts/setup-linting.sh +0 -559
  99. package/.safeword/templates/architecture-template.md +0 -136
  100. package/.safeword/templates/ci/architecture-check.yml +0 -79
  101. package/.safeword/templates/design-doc-template.md +0 -127
  102. package/.safeword/templates/test-definitions-feature.md +0 -100
  103. package/.safeword/templates/ticket-template.md +0 -74
  104. package/.safeword/templates/user-stories-template.md +0 -82
  105. package/.safeword/tickets/001-guides-review-user-stories.md +0 -83
  106. package/.safeword/tickets/002-architecture-enforcement.md +0 -211
  107. package/.safeword/tickets/003-reactive-fix-prevention.md +0 -57
  108. package/.safeword/tickets/004-technical-constraints-in-user-stories.md +0 -39
  109. package/.safeword/tickets/005-cli-implementation.md +0 -248
  110. package/.safeword/tickets/006-flesh-out-skills.md +0 -43
  111. package/.safeword/tickets/007-flesh-out-questioning.md +0 -44
  112. package/.safeword/tickets/008-upgrade-questioning.md +0 -58
  113. package/.safeword/tickets/009-naming-conventions.md +0 -41
  114. package/.safeword/tickets/010-safeword-md-cleanup.md +0 -34
  115. package/.safeword/tickets/011-cursor-setup.md +0 -86
  116. package/.safeword/tickets/README.md +0 -73
  117. package/.safeword/version +0 -1
  118. package/AGENTS.md +0 -59
  119. package/CLAUDE.md +0 -12
  120. package/README.md +0 -347
  121. package/docs/001-cli-implementation-plan.md +0 -856
  122. package/docs/elite-dx-implementation-plan.md +0 -1034
  123. package/framework/README.md +0 -131
  124. package/framework/mcp/README.md +0 -96
  125. package/framework/mcp/arcade.sample.json +0 -8
  126. package/framework/mcp/context7.sample.json +0 -6
  127. package/framework/mcp/playwright.sample.json +0 -6
  128. package/framework/scripts/arch-review.sh +0 -235
  129. package/framework/scripts/check-linting-sync.sh +0 -58
  130. package/framework/scripts/load-env.sh +0 -49
  131. package/framework/scripts/setup-claude.sh +0 -223
  132. package/framework/scripts/setup-linting.sh +0 -559
  133. package/framework/scripts/setup-quality.sh +0 -477
  134. package/framework/scripts/setup-safeword.sh +0 -550
  135. package/framework/templates/ci/architecture-check.yml +0 -78
  136. package/learnings/ai-sdk-v5-breaking-changes.md +0 -178
  137. package/learnings/e2e-test-zombie-processes.md +0 -231
  138. package/learnings/milkdown-crepe-editor-property.md +0 -96
  139. package/learnings/prosemirror-fragment-traversal.md +0 -119
  140. package/packages/cli/AGENTS.md +0 -1
  141. package/packages/cli/ARCHITECTURE.md +0 -279
  142. package/packages/cli/package.json +0 -51
  143. package/packages/cli/src/cli.ts +0 -63
  144. package/packages/cli/src/commands/check.ts +0 -166
  145. package/packages/cli/src/commands/diff.ts +0 -209
  146. package/packages/cli/src/commands/reset.ts +0 -190
  147. package/packages/cli/src/commands/setup.ts +0 -325
  148. package/packages/cli/src/commands/upgrade.ts +0 -163
  149. package/packages/cli/src/index.ts +0 -3
  150. package/packages/cli/src/templates/config.ts +0 -58
  151. package/packages/cli/src/templates/content.ts +0 -18
  152. package/packages/cli/src/templates/index.ts +0 -12
  153. package/packages/cli/src/utils/agents-md.ts +0 -66
  154. package/packages/cli/src/utils/fs.ts +0 -179
  155. package/packages/cli/src/utils/git.ts +0 -124
  156. package/packages/cli/src/utils/hooks.ts +0 -29
  157. package/packages/cli/src/utils/output.ts +0 -60
  158. package/packages/cli/src/utils/project-detector.test.ts +0 -185
  159. package/packages/cli/src/utils/project-detector.ts +0 -44
  160. package/packages/cli/src/utils/version.ts +0 -28
  161. package/packages/cli/src/version.ts +0 -6
  162. package/packages/cli/templates/SAFEWORD.md +0 -776
  163. package/packages/cli/templates/doc-templates/architecture-template.md +0 -136
  164. package/packages/cli/templates/doc-templates/design-doc-template.md +0 -134
  165. package/packages/cli/templates/doc-templates/test-definitions-feature.md +0 -131
  166. package/packages/cli/templates/doc-templates/ticket-template.md +0 -82
  167. package/packages/cli/templates/doc-templates/user-stories-template.md +0 -92
  168. package/packages/cli/templates/guides/architecture-guide.md +0 -423
  169. package/packages/cli/templates/guides/code-philosophy.md +0 -195
  170. package/packages/cli/templates/guides/context-files-guide.md +0 -457
  171. package/packages/cli/templates/guides/data-architecture-guide.md +0 -200
  172. package/packages/cli/templates/guides/design-doc-guide.md +0 -171
  173. package/packages/cli/templates/guides/learning-extraction.md +0 -552
  174. package/packages/cli/templates/guides/llm-instruction-design.md +0 -248
  175. package/packages/cli/templates/guides/llm-prompting.md +0 -102
  176. package/packages/cli/templates/guides/tdd-best-practices.md +0 -615
  177. package/packages/cli/templates/guides/test-definitions-guide.md +0 -334
  178. package/packages/cli/templates/guides/testing-methodology.md +0 -618
  179. package/packages/cli/templates/guides/user-story-guide.md +0 -256
  180. package/packages/cli/templates/guides/zombie-process-cleanup.md +0 -219
  181. package/packages/cli/templates/hooks/agents-md-check.sh +0 -27
  182. package/packages/cli/templates/hooks/post-tool.sh +0 -4
  183. package/packages/cli/templates/hooks/pre-commit.sh +0 -10
  184. package/packages/cli/templates/prompts/arch-review.md +0 -43
  185. package/packages/cli/templates/prompts/quality-review.md +0 -10
  186. package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +0 -207
  187. package/packages/cli/tests/commands/check.test.ts +0 -129
  188. package/packages/cli/tests/commands/cli.test.ts +0 -89
  189. package/packages/cli/tests/commands/diff.test.ts +0 -115
  190. package/packages/cli/tests/commands/reset.test.ts +0 -310
  191. package/packages/cli/tests/commands/self-healing.test.ts +0 -170
  192. package/packages/cli/tests/commands/setup-blocking.test.ts +0 -71
  193. package/packages/cli/tests/commands/setup-core.test.ts +0 -135
  194. package/packages/cli/tests/commands/setup-git.test.ts +0 -139
  195. package/packages/cli/tests/commands/setup-hooks.test.ts +0 -334
  196. package/packages/cli/tests/commands/setup-linting.test.ts +0 -189
  197. package/packages/cli/tests/commands/setup-noninteractive.test.ts +0 -80
  198. package/packages/cli/tests/commands/setup-templates.test.ts +0 -181
  199. package/packages/cli/tests/commands/upgrade.test.ts +0 -215
  200. package/packages/cli/tests/helpers.ts +0 -243
  201. package/packages/cli/tests/npm-package.test.ts +0 -83
  202. package/packages/cli/tests/technical-constraints.test.ts +0 -96
  203. package/packages/cli/tsconfig.json +0 -25
  204. package/packages/cli/tsup.config.ts +0 -11
  205. package/packages/cli/vitest.config.ts +0 -23
  206. package/promptfoo.yaml +0 -3270
  207. package/{framework → templates}/SAFEWORD.md +0 -0
  208. package/{packages/cli/templates → templates}/commands/arch-review.md +0 -0
  209. package/{packages/cli/templates → templates}/commands/lint.md +0 -0
  210. package/{packages/cli/templates → templates}/commands/quality-review.md +0 -0
  211. package/{framework/templates → templates/doc-templates}/architecture-template.md +0 -0
  212. package/{framework/templates → templates/doc-templates}/design-doc-template.md +0 -0
  213. package/{framework/templates → templates/doc-templates}/test-definitions-feature.md +0 -0
  214. package/{framework/templates → templates/doc-templates}/ticket-template.md +0 -0
  215. package/{framework/templates → templates/doc-templates}/user-stories-template.md +0 -0
  216. package/{framework → templates}/guides/architecture-guide.md +0 -0
  217. package/{framework → templates}/guides/code-philosophy.md +0 -0
  218. package/{framework → templates}/guides/context-files-guide.md +0 -0
  219. package/{framework → templates}/guides/data-architecture-guide.md +0 -0
  220. package/{framework → templates}/guides/design-doc-guide.md +0 -0
  221. package/{framework → templates}/guides/learning-extraction.md +0 -0
  222. package/{framework → templates}/guides/llm-instruction-design.md +0 -0
  223. package/{framework → templates}/guides/llm-prompting.md +0 -0
  224. package/{framework → templates}/guides/tdd-best-practices.md +0 -0
  225. package/{framework → templates}/guides/test-definitions-guide.md +0 -0
  226. package/{framework → templates}/guides/testing-methodology.md +0 -0
  227. package/{framework → templates}/guides/user-story-guide.md +0 -0
  228. package/{framework → templates}/guides/zombie-process-cleanup.md +0 -0
  229. package/{packages/cli/templates → templates}/hooks/inject-timestamp.sh +0 -0
  230. package/{packages/cli/templates → templates}/lib/common.sh +0 -0
  231. package/{packages/cli/templates → templates}/lib/jq-fallback.sh +0 -0
  232. package/{packages/cli/templates → templates}/markdownlint.jsonc +0 -0
  233. package/{framework → templates}/prompts/arch-review.md +0 -0
  234. package/{framework → templates}/prompts/quality-review.md +0 -0
  235. package/{framework/skills/quality-reviewer → templates/skills/safeword-quality-reviewer}/SKILL.md +0 -0
@@ -1,618 +0,0 @@
1
- # Testing Methodology
2
-
3
- ---
4
-
5
- ## Test Philosophy
6
-
7
- **Test what matters** - Focus on user experience and delivered features, not implementation details.
8
-
9
- **Always test what you build** - Run tests yourself before completion. Don't ask the user to verify.
10
-
11
- ---
12
-
13
- ## Test Integrity (CRITICAL)
14
-
15
- **NEVER modify, skip, or delete tests without explicit human approval.**
16
-
17
- Tests are the specification. When a test fails, the implementation is wrong—not the test.
18
-
19
- ### Forbidden Actions (Require Approval)
20
-
21
- | Action | Why It's Forbidden |
22
- | ----------------------------------------------- | --------------------------------- |
23
- | Changing assertions to match broken code | Hides bugs instead of fixing them |
24
- | Adding `.skip()`, `.only()`, `xit()`, `.todo()` | Makes failures invisible |
25
- | Deleting tests you can't get passing | Removes coverage for edge cases |
26
- | Weakening assertions (`toBe` → `toBeTruthy`) | Reduces test precision |
27
- | Commenting out test code | Same as skipping |
28
-
29
- ### What To Do Instead
30
-
31
- 1. **Test fails?** → Fix the implementation, not the test
32
- 2. **Test seems wrong?** → Explain why and ask: "This test expects X but I think it should expect Y because [reason]. Can I update it?"
33
- 3. **Requirements changed?** → Explain the change and ask before updating tests to match new requirements
34
- 4. **Test is flaky?** → Fix the flakiness (usually async issues), don't skip it
35
- 5. **Test blocks progress?** → Ask for guidance, don't work around it
36
-
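Since flakiness usually comes from asserting before async work finishes, here is a minimal sketch of the pattern and its fix. `saveDraft` is a hypothetical operation with random latency standing in for real I/O; the flaky version guesses a sleep duration, while the reliable version awaits the operation itself.

```typescript
// Hypothetical async operation with variable latency (stands in for real I/O)
async function saveDraft(store: Map<string, string>, text: string): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, Math.random() * 20));
  store.set('draft', text);
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// ❌ Flaky: fires the call without awaiting, then guesses how long it takes
async function flakyCheck(): Promise<boolean> {
  const store = new Map<string, string>();
  void saveDraft(store, 'hello');
  await sleep(10); // passes only when the random delay happens to be under 10ms
  return store.get('draft') === 'hello';
}

// ✅ Deterministic: wait for the operation, not for a timer
async function reliableCheck(): Promise<boolean> {
  const store = new Map<string, string>();
  await saveDraft(store, 'hello');
  return store.get('draft') === 'hello';
}
```

Fixing the test this way keeps the assertion exactly as strict as before; only the synchronization changes.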
37
- ---
38
-
39
- ## Testing Principles
40
-
41
- **Goal:** Catch bugs quickly and cheaply with fast feedback loops.
42
-
43
- **Optimization rule:** Test with the fastest test type that can catch the bug.
44
-
45
- **Tie-breaking rule:** If multiple test types apply, choose the faster one.
46
-
47
- ### Test Speed Hierarchy (Fast → Slow)
48
-
49
- ```
50
- Unit (milliseconds) ← Pure functions, no I/O
51
-
52
- Integration (seconds) ← Multiple modules, database, API calls
53
-
54
- LLM Eval (seconds) ← AI judgment, costs $0.01-0.30 per run
55
-
56
- E2E (seconds-minutes) ← Full browser, user flows
57
- ```
58
-
59
- ### Anti-Patterns: Testing at the Wrong Level
60
-
61
- ❌ **Testing business logic with E2E tests**
62
-
63
- ```typescript
64
- // BAD: Launching browser to test a calculation
65
- test('discount calculation', async ({ page }) => {
66
- await page.goto('/checkout');
67
- await page.fill('[name="price"]', '100');
68
- await expect(page.locator('.total')).toContainText('80');
69
- });
70
-
71
- // GOOD: Unit test (runs in milliseconds)
72
- it('applies 20% discount', () => {
73
- expect(calculateDiscount(100, 0.2)).toBe(80);
74
- });
75
- ```
76
-
77
- ❌ **Testing UI components at the wrong level**
78
-
79
- ```tsx
80
- // BAD: Heavy mocking in unit test (brittle, tests implementation details)
81
- it('renders header', () => {
82
- const mockProps = { /* 50 lines of mocks */ }
83
- render(<Header {...mockProps} />)
84
- expect(mockProps.onLogout).toHaveBeenCalled() // Testing implementation
85
- })
86
-
87
- // BETTER: Integration test (fast, tests behavior with real data)
88
- it('renders header with username', () => {
89
- render(<Header user={{ name: 'Alex' }} />)
90
- expect(screen.getByRole('banner')).toHaveTextContent('Alex')
91
- })
92
-
93
- // BEST for testing full user flow: E2E test (only when needed for multi-page flows)
94
- test('user sees header after login', async ({ page }) => {
95
- await page.goto('/login')
96
- await page.fill('[name="email"]', 'alex@example.com')
97
- await page.click('button:has-text("Login")')
98
- await expect(page.getByRole('banner')).toContainText('Alex')
99
- })
100
- ```
101
-
102
- **Principle:** Use integration tests for component behavior, E2E tests for multi-page user flows.
103
-
104
- ### Target Distribution (Guideline, Not Rule)
105
-
106
- **Focus on speed, not strict ratios:**
107
-
108
- - Write as many **fast tests** (unit + integration) as possible
109
- - Write **E2E tests** only for critical user paths that require a browser
110
- - Write **LLM evals** only for AI features requiring quality judgment
111
-
112
- **Common patterns by architecture:**
113
-
114
- - **Microservices:** More integration tests needed (test service contracts, API interactions)
115
- - **UI-heavy apps:** More E2E tests needed (test multi-page flows, visual interactions)
116
- - **Pure libraries:** Mostly unit tests (pure functions, no external dependencies)
117
- - **AI-powered apps:** Add LLM evals (test prompt quality, reasoning accuracy)
118
-
119
- **Red flag:** If you have more E2E tests than integration tests, your test suite is likely slower than it needs to be.
120
-
121
- ---
122
-
123
- ## TDD Workflow (RED → GREEN → REFACTOR)
124
-
125
- **Test-Driven Development** - Write tests BEFORE implementation. Tests define expected behavior, code makes them pass.
126
-
127
- ### Phase 1: RED (Write Failing Tests)
128
-
129
- **Steps:**
130
-
131
- 1. Write test based on expected input/output
132
- 2. **CRITICAL:** Run test and confirm it fails for the right reason
133
- 3. **DO NOT write any implementation code yet**
134
- 4. Commit the test when satisfied
135
-
136
- **Critical warnings:**
137
-
138
- - ⚠️ **No mock implementations** - Don't write stub or placeholder code for functionality that doesn't exist yet; when working with an AI assistant, state explicitly that you're doing TDD so it doesn't generate one
139
- - ⚠️ **Verify failure** - Test must fail before implementation (proves test works)
140
- - ⚠️ **Performance** - Run single tests, not whole suite (`npm test -- path/to/file.test.ts`)
141
-
142
- **Example:**
143
-
144
- ```typescript
145
- // RED: Write failing test
146
- it('calculates total with tax', () => {
147
- expect(calculateTotal(100, 0.08)).toBe(108); // FAILS - function doesn't exist
148
- });
149
- ```
150
-
151
- ### Phase 2: GREEN (Make Tests Pass)
152
-
153
- **Steps:**
154
-
155
- 1. Write **minimum** code to make test pass
156
- 2. Run test - verify it passes
157
- 3. No extra features (YAGNI - You Ain't Gonna Need It)
158
-
159
- **Example:**
160
-
161
- ```typescript
162
- // GREEN: Minimal implementation
163
- function calculateTotal(amount: number, taxRate: number): number {
164
- return amount + amount * taxRate;
165
- }
166
- ```
167
-
168
- ### Phase 3: REFACTOR (Clean Up)
169
-
170
- **Steps:**
171
-
172
- 1. Improve code quality without changing behavior
173
- 2. Run tests - verify they still pass
174
- 3. Remove duplication, improve naming
175
-
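A minimal sketch of the REFACTOR step on the `calculateTotal` example: the tax computation is pulled into a hypothetical `taxFor` helper for clearer naming, and the same cases that passed in GREEN are rerun against both versions. Identical results are what show the refactor preserved behavior.

```typescript
// GREEN version from Phase 2
function calculateTotalGreen(amount: number, taxRate: number): number {
  return amount + amount * taxRate;
}

// REFACTOR: same arithmetic, clearer naming (taxFor is a hypothetical extracted helper)
function taxFor(amount: number, taxRate: number): number {
  return amount * taxRate;
}

function calculateTotal(amount: number, taxRate: number): number {
  return amount + taxFor(amount, taxRate);
}

// The same cases must pass before and after the refactor: [amount, taxRate, expected]
const cases: Array<[number, number, number]> = [
  [100, 0.08, 108],
  [0, 0.08, 0],
  [50, 0, 50],
];

function runCases(fn: (amount: number, taxRate: number) => number): boolean {
  return cases.every(([amount, rate, expected]) => fn(amount, rate) === expected);
}
```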
176
- **Optional: Subagent Validation**
177
-
178
- - Use an independent AI instance to verify the implementation isn't overfitting to the tests
179
- - Ask: "Does this implementation handle edge cases beyond the test scenarios?"
180
-
181
- ---
182
-
183
- ## When to Use Each Test Type
184
-
185
- ### Decision Tree
186
-
187
- Answer these questions in order and stop at the first YES. If more than one seems to apply, use the tie-breaking rule from Testing Principles: choose the faster test type.
188
-
189
- ```
190
- 1. Does this test AI-generated content quality (tone, reasoning, creativity)?
191
- └─ YES → LLM Evaluation
192
- Examples: Narrative quality, prompt effectiveness, conversational naturalness
193
- └─ NO → Continue to question 2
194
-
195
- 2. Does this test require a real browser (Playwright/Cypress)?
196
- └─ YES → E2E test
197
- Examples: Multi-page navigation, browser-specific behavior (localStorage, cookies), visual regression, drag-and-drop
198
- Note: React Testing Library does NOT require a browser - that's integration testing
199
- └─ NO → Continue to question 3
200
-
201
- 3. Does this test interactions between multiple components/services?
202
- └─ YES → Integration test
203
- Examples: API + database, React component + state store, service + external API
204
- └─ NO → Continue to question 4
205
-
206
- 4. Does this test a pure function (input → output, no I/O or side effects)?
207
- └─ YES → Unit test
208
- Examples: Calculations, formatters, validators, pure algorithms
209
- └─ NO → Re-evaluate: What are you actually testing?
210
- ```
211
-
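The decision tree can be read as a function that checks the questions in order and returns at the first YES. A sketch, where the field names are hypothetical and chosen only to mirror the four questions:

```typescript
type TestType = 'llm-eval' | 'e2e' | 'integration' | 'unit' | 're-evaluate';

// Answers to the four decision-tree questions (hypothetical field names)
interface TestTarget {
  judgesAiContentQuality: boolean; // Q1: tone, reasoning, creativity
  needsRealBrowser: boolean; // Q2: Playwright/Cypress (React Testing Library does not count)
  spansComponentsOrServices: boolean; // Q3: API + database, component + store
  isPureFunction: boolean; // Q4: input → output, no I/O or side effects
}

// Early returns implement "stop at the first YES"
function chooseTestType(t: TestTarget): TestType {
  if (t.judgesAiContentQuality) return 'llm-eval';
  if (t.needsRealBrowser) return 'e2e';
  if (t.spansComponentsOrServices) return 'integration';
  if (t.isPureFunction) return 'unit';
  return 're-evaluate';
}
```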
212
- **Edge cases:**
213
-
214
- - **Non-deterministic functions** (Math.random(), Date.now(), UUID generation) → Unit test with mocked randomness/time
215
- - **Functions with environment dependencies** (process.env, window.location) → Integration test
216
- - **Mixed pure + I/O logic** → Extract pure logic into separate function → Unit test pure part, integration test I/O
217
-
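For non-deterministic functions, besides framework fake timers (`vi.useFakeTimers()` in Vitest, `jest.useFakeTimers()` in Jest), you can pass the clock or random source in as a parameter so the function stays pure. A minimal sketch using hypothetical `isTokenExpired` and `rollDie` helpers:

```typescript
// `now` defaults to the real clock but can be injected (hypothetical helper)
function isTokenExpired(expiresAtMs: number, now: () => number = Date.now): boolean {
  return now() >= expiresAtMs;
}

// Same idea for randomness: inject the random source (hypothetical helper)
function rollDie(random: () => number = Math.random): number {
  return Math.floor(random() * 6) + 1;
}

// In a unit test, fixed values replace the real clock and RNG
const fixedNow = () => 1_700_000_000_000;
console.log(isTokenExpired(1_700_000_000_000, fixedNow)); // true (expiry equals now)
console.log(isTokenExpired(1_700_000_000_001, fixedNow)); // false
console.log(rollDie(() => 0)); // 1
console.log(rollDie(() => 0.999)); // 6
```

Production callers omit the second argument and get real time or randomness; tests stay millisecond-fast and repeatable.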
218
- **Re-evaluation guide:**
219
- If testing behavior that doesn't fit the four categories:
220
-
221
- 1. **Break it down:** Separate pure logic from I/O/UI concerns
222
- 2. **Test each piece separately:** Pure logic → Unit, I/O → Integration, Multi-page flow → E2E
223
- 3. **Example:** Login validation
224
- - Pure: `isValidEmail(email)` → Unit test
225
- - I/O: `checkUserExists(email)` → Integration test (hits database)
226
- - Full flow: Login form → Dashboard → E2E test (multi-page)
227
-
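A sketch of that split for login validation. The `UserRepository` interface is a hypothetical stand-in for a real database client: in the integration test it would be backed by a real test database, and the in-memory version in the usage below exists only to make the sketch runnable. The email regex is deliberately simple, not a full RFC 5322 validator.

```typescript
// Pure: unit-testable with no setup
function isValidEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

// I/O boundary: hypothetical interface a real database client would implement
interface UserRepository {
  findByEmail(email: string): Promise<{ id: string } | null>;
}

// I/O: covered by an integration test against a real database
async function checkUserExists(repo: UserRepository, email: string): Promise<boolean> {
  return (await repo.findByEmail(email)) !== null;
}

// Orchestration: run the cheap pure check first, hit I/O only for valid input
async function canAttemptLogin(repo: UserRepository, email: string): Promise<boolean> {
  if (!isValidEmail(email)) return false;
  return checkUserExists(repo, email);
}
```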
228
- ### What Bugs Can Each Test Type Catch?
229
-
230
- Understanding which test type catches which bugs helps you choose the fastest effective test.
231
-
232
- | Bug Type | Can Unit Test Catch? | Can Integration Test Catch? | Can E2E Test Catch? | Best Choice |
233
- | ---------------------------------------- | -------------------- | --------------------------- | ------------------- | ------------------------------- |
234
- | Calculation error | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
235
- | Invalid input handling | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
236
- | Database query returning wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
237
- | API endpoint contract violation | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
238
- | Race condition between services | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
239
- | State management bug (Zustand, Redux) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
240
- | React component rendering wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
241
- | CSS layout broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
242
- | Multi-page navigation broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
243
- | Browser-specific rendering | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
244
- | Form validation logic (isValidEmail) | ✅ YES | ✅ YES | ✅ YES | Unit (fastest, test pure logic) |
245
- | Form validation UI (shows error message) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
246
- | Form validation UX (multi-field flow) | ❌ NO | ❌ NO | ✅ YES | E2E (only option for full flow) |
247
- | AI prompt quality degradation | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
248
- | AI reasoning accuracy | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
249
-
250
- **Key principle:** If multiple test types can catch the bug, choose the fastest one.
251
-
252
- ---
253
-
254
- ## Test Type Examples
255
-
256
- ### 1. Unit Tests
257
-
258
- **Note:** If your business logic needs a database, API, or file system, use an integration test instead.
259
-
260
- **Example:**
261
-
262
- ```typescript
263
- // ✅ GOOD - Pure function
264
- it('applies 20% discount for VIP users', () => {
265
- expect(calculateDiscount(100, { tier: 'VIP' })).toBe(80);
266
- });
267
-
268
- // ❌ BAD - Testing implementation details
269
- it('calls setState with correct value', () => {
270
- expect(setState).toHaveBeenCalledWith({ count: 1 });
271
- });
272
- ```
273
-
274
- ### 2. Integration Tests
275
-
276
- **Key distinction:** Integration tests can render UI components but don't require a real browser. They run in Node.js with jsdom (simulated browser environment).
277
-
278
- **Example:**
279
-
280
- ```typescript
281
- // ✅ GOOD - Tests agent + state integration
282
- describe('Agent + State Integration', () => {
283
- it('updates character state after agent processes action', async () => {
284
- const agent = new GameAgent();
285
- await agent.processAction('attack guard');
286
-
287
- // Read state AFTER the action: getState() returns the current snapshot, not a live view
288
- const store = useGameStore.getState();
289
- expect(store.character.stress).toBeGreaterThan(0);
290
- expect(store.messages).toHaveLength(2); // player + AI response
291
- });
292
- });
293
- ```
294
-
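Zustand's `getState()` returns the state object as it is at call time, and updates replace that object instead of mutating it. A simplified hand-rolled store with the same immutable-update behavior (a stand-in for illustration, not Zustand itself) shows why an integration test should read state after the action:

```typescript
// Simplified stand-in for a Zustand-style store: each update replaces the state object
function createStore<T extends object>(initial: T) {
  let state = initial;
  return {
    getState: (): T => state,
    setState: (partial: Partial<T>): void => {
      state = { ...state, ...partial }; // new object; earlier snapshots are untouched
    },
  };
}

const store = createStore({ stress: 0, messages: [] as string[] });

const before = store.getState(); // snapshot taken BEFORE the action
store.setState({ stress: 1, messages: ['attack guard', 'The guard blocks.'] });
const after = store.getState(); // snapshot taken AFTER the action

console.log(before.stress); // 0 (stale)
console.log(after.stress); // 1
console.log(after.messages.length); // 2
```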
295
- ### 3. E2E Tests
296
-
297
- **Example:**
298
-
299
- ```typescript
300
- // ✅ GOOD - Tests complete user flow
301
- test('user creates account and first item', async ({ page }) => {
302
- await page.goto('/signup');
303
- await page.fill('[name="email"]', 'test@example.com');
304
- await page.fill('[name="password"]', 'secure123');
305
- await page.click('button:has-text("Sign Up")');
306
- await expect(page).toHaveURL('/dashboard');
307
-
308
- await page.click('text=New Item');
309
- await page.fill('[name="title"]', 'My First Item');
310
- await page.click('text=Save');
311
- await expect(page.getByText('My First Item')).toBeVisible();
312
- });
313
- ```
314
-
315
- ### E2E Testing with Persistent Dev Servers
316
-
317
- When using Playwright for E2E tests, isolate persistent dev instances from test instances to avoid port conflicts and zombie processes.
318
-
319
- **Port Isolation Strategy:**
320
-
321
- - **Dev instance**: Project's configured port (e.g., 3000, 8080) - runs persistently for manual testing
322
- - **Test instances**: `devPort + 1000` (e.g., 4000, 9080) - managed by Playwright
323
- - **Fallback**: A free OS-assigned (ephemeral) port, chosen in your own config, if the offset port is busy
324
-
325
- **Process Management:**
326
-
327
- - Dev instance runs persistently (started manually, survives test runs)
328
- - Test instances spawn/cleanup per test run (Playwright manages lifecycle)
329
- - Never kill processes on dev port range
330
-
331
- **Playwright Configuration** (example uses 3000/4000 - adjust to your project's ports):
332
-
333
- ```typescript
334
- // playwright.config.ts
335
- import { defineConfig } from '@playwright/test';
336
-
337
- export default defineConfig({
338
- webServer: {
339
- command: 'npm run dev:test', // Test script with isolated port
340
- port: 4000, // devPort + 1000 (e.g., 5173→6173)
341
- reuseExistingServer: !process.env.CI, // Reuse locally, fresh in CI
342
- timeout: 120000,
343
- },
344
- use: {
345
- baseURL: 'http://localhost:4000', // Test against test instance
346
- },
347
- });
348
- ```
349
-
350
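- **Package.json Scripts** (example uses 3000/4000 - adjust to your project's ports). `dev` is the persistent instance for manual testing and `dev:test` is the one Playwright manages; `--strictPort` makes Vite exit if 4000 is taken instead of silently moving to a port Playwright isn't waiting on:
351
-
352
- ```json
353
- {
354
- "scripts": {
355
- "dev": "vite --port 3000",
356
- "dev:test": "vite --port 4000 --strictPort",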
357
- "test:e2e": "playwright test"
358
- }
359
- }
360
- ```
361
-
362
- **Why this pattern:**
363
-
364
- - ✅ Manual testing on stable URL (dev instance always 3000)
365
- - ✅ Automated tests isolated (Playwright controls lifecycle)
366
- - ✅ No zombie processes (Playwright cleanup automatic)
367
- - ✅ No port conflicts (predictable offset)
368
- - ✅ Works in CI (fresh test instance every run)
369
-
370
- **Alternative patterns:**
371
-
372
- - Different projects use different ports (Next.js: 3000, Laravel: 8000, Rails: 3000)
373
- - Dynamic offset adapts: `8000` → `9000`, `5173` → `6173`
374
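- - Playwright does not switch ports on its own when the offset port is busy; pick a free ephemeral port yourself (IANA dynamic range 49152-65535; actual OS ranges vary) and use it for both `webServer.port` and `baseURL`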
375
-
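A minimal sketch of choosing the test port with Node's `net` module: prefer `devPort + 1000`, otherwise ask the OS for a free port by listening on port 0. Assumptions: availability is checked on `127.0.0.1` only, and `pickTestPort` is a hypothetical helper; how you feed its result into `webServer.port` and `baseURL` (for example via an environment variable set by a wrapper script) depends on your setup. The check and the later bind are not atomic, so another process can still grab the port in between.

```typescript
import * as net from 'node:net';

// Resolves true if the port can be bound right now
function isPortFree(port: number, host = '127.0.0.1'): Promise<boolean> {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once('error', () => resolve(false)); // e.g. EADDRINUSE
    server.once('listening', () => server.close(() => resolve(true)));
    server.listen(port, host);
  });
}

// Listening on port 0 makes the OS assign a free ephemeral port
function getEphemeralPort(host = '127.0.0.1'): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    server.listen(0, host, () => {
      const address = server.address();
      const port = typeof address === 'object' && address ? address.port : 0;
      server.close(() => resolve(port));
    });
  });
}

// devPort + 1000 when available, otherwise an OS-assigned port
async function pickTestPort(devPort: number): Promise<number> {
  const preferred = devPort + 1000;
  if (preferred <= 65535 && (await isPortFree(preferred))) return preferred;
  return getEphemeralPort();
}
```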
376
- **Cleanup:** For killing zombie dev/test servers, see `zombie-process-cleanup.md` → "Port-Based Cleanup"
377
-
378
- ### 4. LLM Evaluations
379
-
380
- **Cost:** ~$0.01-0.30 per test run (depends on prompt size, caching)
381
-
382
- **Example:**
383
-
384
- ```yaml
385
- - description: 'Infer user intent from casual input'
386
- vars:
387
- input: 'I want to order a large pepperoni'
388
- assert:
389
- - type: javascript
390
- value: JSON.parse(output).intent === 'order_pizza'
391
- - type: llm-rubric
392
- value: |
393
- EXCELLENT: Confirms pizza type/size, asks for delivery details
394
- POOR: Generic response or wrong intent
395
- ```
396
-
397
- **Assertion types:**
398
-
399
- **Programmatic** (fast, deterministic):
400
-
401
- - JSON schema validation
402
- - Required fields present
403
- - Values in valid ranges
404
- - Output format compliance
405
-
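The programmatic checks above run as plain code with no model call. A minimal sketch, assuming the model returns JSON with hypothetical `intent` and `confidence` fields and a hypothetical list of allowed intents; logic like this could sit behind the `type: javascript` assertion in the eval config.

```typescript
// Hypothetical output shape for the intent-inference example
interface IntentOutput {
  intent: string;
  confidence: number;
}

const KNOWN_INTENTS = new Set(['order_pizza', 'track_order', 'cancel_order']); // hypothetical

// Returns a list of problems; an empty list means the output passes
function checkIntentOutput(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ['output is not valid JSON']; // format compliance
  }
  if (typeof parsed !== 'object' || parsed === null) return ['output is not a JSON object'];

  const { intent, confidence } = parsed as Partial<IntentOutput>;
  const problems: string[] = [];
  // Required fields present, with values from the allowed set
  if (typeof intent !== 'string') problems.push('missing field: intent');
  else if (!KNOWN_INTENTS.has(intent)) problems.push(`unknown intent: ${intent}`);
  // Values in valid ranges
  if (typeof confidence !== 'number') problems.push('missing field: confidence');
  else if (confidence < 0 || confidence > 1) problems.push(`confidence out of range: ${confidence}`);
  return problems;
}
```

Returning every problem instead of stopping at the first one makes a failing eval report more useful.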
406
- **LLM-as-Judge** (nuanced, contextual):
407
-
408
- - Reasoning quality
409
- - Tone/style adherence
410
- - Factual accuracy
411
- - Conversational naturalness
412
- - Domain expertise demonstration
413
-
414
- **When to skip LLM evals:**
415
-
416
- - Structured output validation (use programmatic tests)
417
- - Simple classification tasks (unit tests sufficient)
418
- - Non-AI features
419
-
420
- ---
421
-
422
- ## Cost Considerations
423
-
424
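- **LLM eval costs:** $0.01-0.30 per run depending on prompt size. **Prompt caching can cut costs sharply**: some providers bill cached input tokens at up to ~90% less, so a run whose input is mostly cached approaches that saving (e.g., 30 scenarios: $0.30 → $0.03 after the first run).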
425
-
426
- **Cost reduction strategies:**
427
-
428
- - Cache static content (system prompts, examples, rules)
429
- - Batch multiple scenarios in one run
430
- - Run full evals on PR/schedule, not every commit
431
-
432
- **ROI:** Catching one bad prompt change before production >> eval costs
433
-
434
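Caching discounts only the cached part of the input, and output tokens are never cached, so real savings depend on how much of each prompt is static. A back-of-the-envelope sketch, assuming cache reads are billed at 10% of the base input rate (Anthropic's published multiplier at the time of writing) and using placeholder per-million-token prices; the `runCostUSD` helper and all numbers are illustrative.

```typescript
// Token counts for one eval run.
interface RunTokens {
  cachedInput: number; // input tokens served from the prompt cache
  uncachedInput: number; // input tokens billed at the full rate
  output: number; // output tokens (never cached)
}

// Placeholder pricing in USD per million tokens, not real list prices.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadMultiplier: number; // assumed 0.1: cache reads cost 10% of input
}

// Cost of a run. The first (cold) run also pays a cache-write premium,
// which this sketch does not model.
function runCostUSD(t: RunTokens, p: Pricing): number {
  const inputCost =
    t.cachedInput * p.inputPerMTok * p.cacheReadMultiplier + t.uncachedInput * p.inputPerMTok;
  const outputCost = t.output * p.outputPerMTok;
  return (inputCost + outputCost) / 1_000_000;
}

const pricing: Pricing = { inputPerMTok: 3, outputPerMTok: 15, cacheReadMultiplier: 0.1 };

// 30 scenarios, each with a 3,000-token static prefix, 100 dynamic input tokens,
// and 200 output tokens.
const cold = runCostUSD({ cachedInput: 0, uncachedInput: 30 * 3_100, output: 30 * 200 }, pricing);
const warm = runCostUSD({ cachedInput: 30 * 3_000, uncachedInput: 30 * 100, output: 30 * 200 }, pricing);
```

With these placeholder numbers the warm run costs about a third of the cold one rather than a tenth, because output tokens and the dynamic part of each prompt are still billed at full price. The 90% figure is the ceiling reached when nearly all tokens are cached input.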
- ---
-
- ## Test Coverage Goals
-
- - **Unit tests:** 80%+ coverage of pure functions
- - **Integration tests:** All critical paths covered (see definition below)
- - **E2E tests:** All critical multi-page user flows have at least one E2E test
- - **LLM evals:** All AI features have evaluation scenarios
-
- **What are "critical paths"?**
-
- - **Always critical:** Authentication, payment/checkout, data loss scenarios (delete, overwrite)
- - **Usually critical:** Core user workflows (create → edit → publish), primary feature flows
- - **Rarely critical:** UI polish (button colors, layout tweaks), admin-only features with low usage
- - **Rule of thumb:** If it breaks, would users notice immediately and be unable to complete their main task?
-
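The 80% unit-test target can be enforced by the test runner instead of checked by hand. A config sketch for Vitest, assuming Vitest 1.x or later (where limits live under `coverage.thresholds`) with the `@vitest/coverage-v8` provider installed; the `include` glob is a hypothetical location for pure business logic.

```typescript
// vitest.config.ts (sketch): fail the run if coverage drops below 80%.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      // Hypothetical path: point this at the modules holding pure functions.
      include: ['src/lib/**/*.ts'],
      thresholds: {
        lines: 80,
        functions: 80,
        branches: 80,
        statements: 80,
      },
    },
  },
});
```

Run it with `vitest run --coverage`. Line coverage cannot tell which paths are critical, so integration and E2E coverage still needs the judgment described above.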
- ---
-
- ## Writing Effective Tests
-
- ### AAA Pattern (Arrange-Act-Assert)
-
- Structure each test in three steps: set up data (Arrange) → execute the behavior (Act) → verify expectations (Assert).
-
- ```typescript
- it('applies discount to VIP users', () => {
-   // Arrange
-   const user = { tier: 'VIP' };
-   const cart = { total: 100 };
-   // Act
-   const result = applyDiscount(user, cart);
-   // Assert
-   expect(result.total).toBe(80);
- });
- ```
-
- ### Test Naming
-
- Be descriptive and specific, not vague or implementation-focused.
-
- ```typescript
- // ✅ GOOD
- it('returns 401 when API key is missing');
- it('preserves user input after validation error');
-
- // ❌ BAD
- it('works correctly');
- it('should call setState');
- ```
-
- ### Test Independence
-
- **Each test should:**
-
- - Run in any order
- - Not depend on other tests
- - Clean up its own state
- - Use fresh fixtures/data
-
- ```typescript
- // ✅ GOOD - Fresh state per test
- let gameState: ReturnType<typeof createFreshGameState>;
- beforeEach(() => {
-   gameState = createFreshGameState();
- });
-
- // ❌ BAD - Shared state (test B depends on test A)
- let sharedUser = createUser();
- it('test A', () => {
-   sharedUser.name = 'Alice';
- });
- it('test B', () => {
-   expect(sharedUser.name).toBe('Alice');
- });
- ```
-
- ### Async Testing
-
- **Never use arbitrary timeouts.** They make tests slow and non-deterministic.
-
- ```typescript
- // ❌ BAD - Arbitrary timeout
- await page.waitForTimeout(3000); // What if it takes 3.1 seconds?
- await sleep(500); // Flaky test
-
- // ✅ GOOD - Poll until condition is met
- await expect.poll(() => getStatus()).toBe('ready');
- await page.waitForSelector('[data-testid="loaded"]');
- await waitFor(() => expect(screen.getByText('Success')).toBeVisible());
- ```
-
- **Why:** Polling is deterministic (it passes as soon as the condition is met) and faster (no unnecessary waiting).
-
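Helpers like `expect.poll` and `waitFor` re-check a condition on a short interval and give up only after an upper bound. A framework-free sketch of that idea; the name `pollUntil` and its default interval and timeout are assumptions, not any library's API.

```typescript
// Hypothetical helper: re-check `read()` until `done(value)` holds or the deadline passes.
// The timeout is only an upper bound; the promise resolves as soon as the condition holds.
async function pollUntil<T>(
  read: () => T | Promise<T>,
  done: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (done(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms (last value: ${String(value)})`);
    }
    // Short pause between checks, not a fixed wait for the whole operation.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test, `await pollUntil(() => getStatus(), (s) => s === 'ready')` behaves much like `expect.poll`, but the framework helpers produce richer failure messages, so prefer them when they are available.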
- ---
-
- ## What Not to Test
-
- - ❌ **Implementation details** - Private methods, CSS classes, internal state (test what users see, not how it is produced)
- - ❌ **Third-party libraries** - Assume React/Axios work; test YOUR code
- - ❌ **Trivial code** - Getters/setters with no logic, pass-through functions
- - ❌ **UI copy** - Exact text or wording (match loosely with a regex such as `/submit/i`, and assert that an error is shown rather than its exact message)
-
- ---
-
- ## CI/CD Integration
-
- Run unit and integration tests on every commit (fast feedback), E2E tests on every PR, and LLM evals on a schedule (e.g. weekly, to catch regressions without per-commit cost).
-
- ---
-
- ## Quick Reference
-
- | Need to test... | Test type | Technology | Speed | Cost |
- | -------------------- | ----------- | ---------- | ------ | ---------- |
- | Pure function | Unit | Vitest | Fast | Free |
- | Service integration | Integration | Vitest | Medium | Free |
- | Full user flow | E2E | Playwright | Slow | Free |
- | AI reasoning quality | LLM eval | Promptfoo | Slow | $0.01-0.30 |
-
- ---
-
- ## Project-Specific Testing Documentation
-
- **Location:** `tests/SAFEWORD.md` (may be nested, e.g. `packages/web/tests/SAFEWORD.md` in monorepos)
-
- **Purpose:** Documents the project-specific testing stack, commands, and setup, supplementing the global methodology.
-
- **What to include:**
-
- - **Tech stack:** Testing frameworks (Vitest/Jest, Playwright/Cypress, Promptfoo)
- - **Test commands:** How to run tests, including single-file execution for performance
- - **Setup requirements:** API keys, build steps, database setup, browser installation
- - **File structure:** Where tests live and naming conventions
- - **Project patterns:** Custom helpers, fixtures, mocks, assertion styles
- - **TDD guidance:** Project-specific workflow expectations (write tests first, commit tests before implementation)
- - **Coverage requirements:** Minimum coverage thresholds or critical paths
- - **PR requirements:** Test passage requirements before merge
-
- **Example:**
-
- ```markdown
- # Testing
-
- ## Tech Stack
-
- - Unit/Integration: Vitest
- - E2E: Playwright
- - LLM Evals: Promptfoo
-
- ## Commands
-
-     npm test                          # All tests
-     npm test -- path/to/file.test.ts  # Single file (performance)
-     npm run test:coverage             # With coverage report
-     npm run test:e2e                  # E2E tests only
-
- ## TDD Workflow
-
- 1. Write failing tests first (RED phase)
- 2. Confirm tests fail: `npm test -- path/to/file.test.ts`
- 3. Commit tests before implementation
- 4. Implement minimum code to pass (GREEN phase)
- 5. Refactor while keeping tests green
-
- ## Setup
-
- 1. Install: `npm install`
- 2. Browsers: `npx playwright install`
- 3. API keys: `export ANTHROPIC_API_KEY=sk-ant-...`
- 4. Build before testing: `npm run build`
-
- ## Coverage Requirements
-
- - Unit tests: 80%+ for business logic
- - E2E tests: All critical user paths
-
- ## PR Requirements
-
- - All tests must pass
- - No skipped tests without justification
- - Coverage thresholds met
- ```
-
-
613
- **If not found:** Ask user "Where are the testing docs?"
614
-
615
- **Cascading precedence:**
616
-
617
- 1. **Global** (`~/.claude/testing-methodology.md`) - Universal methodology (test type selection, TDD workflow)
618
- 2. **Project** (`tests/SAFEWORD.md`) - Specific stack, commands, patterns