safeword 0.1.0 → 0.2.0
- package/dist/{check-J6DFVBCE.js → check-FHNTC46G.js} +2 -2
- package/dist/check-FHNTC46G.js.map +1 -0
- package/dist/chunk-2XWIUEQK.js +190 -0
- package/dist/chunk-2XWIUEQK.js.map +1 -0
- package/dist/{chunk-UQMQ64CB.js → chunk-GZRQL3SX.js} +41 -2
- package/dist/chunk-GZRQL3SX.js.map +1 -0
- package/dist/cli.js +5 -5
- package/dist/{diff-U4IELWRL.js → diff-SPJ7BJBG.js} +31 -28
- package/dist/diff-SPJ7BJBG.js.map +1 -0
- package/dist/{reset-XETOHTCK.js → reset-3ACTIYYE.js} +44 -27
- package/dist/reset-3ACTIYYE.js.map +1 -0
- package/dist/{setup-CLDCHROZ.js → setup-ANC3NUOC.js} +76 -47
- package/dist/setup-ANC3NUOC.js.map +1 -0
- package/dist/{upgrade-DOKWRK7J.js → upgrade-3KVLLNDF.js} +37 -49
- package/dist/upgrade-3KVLLNDF.js.map +1 -0
- package/package.json +1 -1
- package/templates/SAFEWORD.md +776 -0
- package/templates/commands/arch-review.md +24 -0
- package/templates/commands/lint.md +11 -0
- package/templates/commands/quality-review.md +23 -0
- package/templates/doc-templates/architecture-template.md +136 -0
- package/templates/doc-templates/design-doc-template.md +134 -0
- package/templates/doc-templates/test-definitions-feature.md +131 -0
- package/templates/doc-templates/user-stories-template.md +92 -0
- package/templates/guides/architecture-guide.md +423 -0
- package/templates/guides/code-philosophy.md +195 -0
- package/templates/guides/context-files-guide.md +457 -0
- package/templates/guides/data-architecture-guide.md +200 -0
- package/templates/guides/design-doc-guide.md +171 -0
- package/templates/guides/learning-extraction.md +552 -0
- package/templates/guides/llm-instruction-design.md +248 -0
- package/templates/guides/llm-prompting.md +102 -0
- package/templates/guides/tdd-best-practices.md +615 -0
- package/templates/guides/test-definitions-guide.md +334 -0
- package/templates/guides/testing-methodology.md +618 -0
- package/templates/guides/user-story-guide.md +256 -0
- package/templates/guides/zombie-process-cleanup.md +219 -0
- package/templates/hooks/agents-md-check.sh +27 -0
- package/templates/hooks/inject-timestamp.sh +2 -3
- package/templates/hooks/post-tool.sh +4 -0
- package/templates/hooks/pre-commit.sh +10 -0
- package/templates/lib/common.sh +26 -0
- package/templates/lib/jq-fallback.sh +20 -0
- package/templates/markdownlint.jsonc +25 -0
- package/templates/prompts/arch-review.md +43 -0
- package/templates/prompts/quality-review.md +10 -0
- package/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
- package/dist/check-J6DFVBCE.js.map +0 -1
- package/dist/chunk-24OB57NJ.js +0 -78
- package/dist/chunk-24OB57NJ.js.map +0 -1
- package/dist/chunk-DB4CMUFD.js +0 -157
- package/dist/chunk-DB4CMUFD.js.map +0 -1
- package/dist/chunk-UQMQ64CB.js.map +0 -1
- package/dist/diff-U4IELWRL.js.map +0 -1
- package/dist/reset-XETOHTCK.js.map +0 -1
- package/dist/setup-CLDCHROZ.js.map +0 -1
- package/dist/upgrade-DOKWRK7J.js.map +0 -1
|
@@ -0,0 +1,618 @@
|
|
|
1
|
+
# Testing Methodology
|
|
2
|
+
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
## Test Philosophy
|
|
6
|
+
|
|
7
|
+
**Test what matters** - Focus on user experience and delivered features, not implementation details.
|
|
8
|
+
|
|
9
|
+
**Always test what you build** - Run tests yourself before completion. Don't ask the user to verify.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Test Integrity (CRITICAL)
|
|
14
|
+
|
|
15
|
+
**NEVER modify, skip, or delete tests without explicit human approval.**
|
|
16
|
+
|
|
17
|
+
Tests are the specification. When a test fails, the implementation is wrong—not the test.
|
|
18
|
+
|
|
19
|
+
### Forbidden Actions (Require Approval)
|
|
20
|
+
|
|
21
|
+
| Action | Why It's Forbidden |
|
|
22
|
+
| ----------------------------------------------- | --------------------------------- |
|
|
23
|
+
| Changing assertions to match broken code | Hides bugs instead of fixing them |
|
|
24
|
+
| Adding `.skip()`, `.only()`, `xit()`, `.todo()` | Makes failures invisible |
|
|
25
|
+
| Deleting tests you can't get passing | Removes coverage for edge cases |
|
|
26
|
+
| Weakening assertions (`toBe` → `toBeTruthy`) | Reduces test precision |
|
|
27
|
+
| Commenting out test code | Same as skipping |
|
|
28
|
+
|
|
29
|
+
### What To Do Instead
|
|
30
|
+
|
|
31
|
+
1. **Test fails?** → Fix the implementation, not the test
|
|
32
|
+
2. **Test seems wrong?** → Explain why and ask: "This test expects X but I think it should expect Y because [reason]. Can I update it?"
|
|
33
|
+
3. **Requirements changed?** → Explain the change and ask before updating tests to match new requirements
|
|
34
|
+
4. **Test is flaky?** → Fix the flakiness (usually async issues), don't skip it
|
|
35
|
+
5. **Test blocks progress?** → Ask for guidance, don't work around it
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Testing Principles
|
|
40
|
+
|
|
41
|
+
**Goal:** Catch bugs quickly and cheaply with fast feedback loops.
|
|
42
|
+
|
|
43
|
+
**Optimization rule:** Test with the fastest test type that can catch the bug.
|
|
44
|
+
|
|
45
|
+
**Tie-breaking rule:** If multiple test types apply, choose the faster one.
|
|
46
|
+
|
|
47
|
+
### Test Speed Hierarchy (Fast → Slow)
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
Unit (milliseconds) ← Pure functions, no I/O
|
|
51
|
+
↓
|
|
52
|
+
Integration (seconds) ← Multiple modules, database, API calls
|
|
53
|
+
↓
|
|
54
|
+
LLM Eval (seconds) ← AI judgment, costs $0.01-0.30 per run
|
|
55
|
+
↓
|
|
56
|
+
E2E (seconds-minutes) ← Full browser, user flows
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
### Anti-Patterns: Testing at the Wrong Level
|
|
60
|
+
|
|
61
|
+
❌ **Testing business logic with E2E tests**
|
|
62
|
+
|
|
63
|
+
```typescript
|
|
64
|
+
// BAD: Launching browser to test a calculation
|
|
65
|
+
test('discount calculation', async ({ page }) => {
|
|
66
|
+
await page.goto('/checkout');
|
|
67
|
+
await page.fill('[name="price"]', '100');
|
|
68
|
+
await expect(page.locator('.total')).toContainText('80');
|
|
69
|
+
});
|
|
70
|
+
|
|
71
|
+
// GOOD: Unit test (runs in milliseconds)
|
|
72
|
+
it('applies 20% discount', () => {
|
|
73
|
+
expect(calculateDiscount(100, 0.2)).toBe(80);
|
|
74
|
+
});
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
❌ **Testing UI components at the wrong level**
|
|
78
|
+
|
|
79
|
+
```typescript
|
|
80
|
+
// BAD: Heavy mocking in unit test (brittle, tests implementation details)
|
|
81
|
+
it('renders header', () => {
|
|
82
|
+
const mockProps = { /* 50 lines of mocks */ }
|
|
83
|
+
render(<Header {...mockProps} />)
|
|
84
|
+
expect(mockProps.onLogout).toHaveBeenCalled() // Testing implementation
|
|
85
|
+
})
|
|
86
|
+
|
|
87
|
+
// BETTER: Integration test (fast, tests behavior with real data)
|
|
88
|
+
it('renders header with username', () => {
|
|
89
|
+
render(<Header user={{ name: 'Alex' }} />)
|
|
90
|
+
expect(screen.getByRole('banner')).toHaveTextContent('Alex')
|
|
91
|
+
})
|
|
92
|
+
|
|
93
|
+
// BEST for testing full user flow: E2E test (only when needed for multi-page flows)
|
|
94
|
+
test('user sees header after login', async ({ page }) => {
|
|
95
|
+
await page.goto('/login')
|
|
96
|
+
await page.fill('[name="email"]', 'alex@example.com')
|
|
97
|
+
await page.click('button:has-text("Login")')
|
|
98
|
+
await expect(page.getByRole('banner')).toContainText('Alex')
|
|
99
|
+
})
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
**Principle:** Use integration tests for component behavior, E2E tests for multi-page user flows.
|
|
103
|
+
|
|
104
|
+
### Target Distribution (Guideline, Not Rule)
|
|
105
|
+
|
|
106
|
+
**Focus on speed, not strict ratios:**
|
|
107
|
+
|
|
108
|
+
- Write as many **fast tests** (unit + integration) as possible
|
|
109
|
+
- Write **E2E tests** only for critical user paths that require a browser
|
|
110
|
+
- Write **LLM evals** only for AI features requiring quality judgment
|
|
111
|
+
|
|
112
|
+
**Common patterns by architecture:**
|
|
113
|
+
|
|
114
|
+
- **Microservices:** More integration tests needed (test service contracts, API interactions)
|
|
115
|
+
- **UI-heavy apps:** More E2E tests needed (test multi-page flows, visual interactions)
|
|
116
|
+
- **Pure libraries:** Mostly unit tests (pure functions, no external dependencies)
|
|
117
|
+
- **AI-powered apps:** Add LLM evals (test prompt quality, reasoning accuracy)
|
|
118
|
+
|
|
119
|
+
**Red flag:** If you have more E2E tests than integration tests, your test suite is too slow.
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## TDD Workflow (RED → GREEN → REFACTOR)
|
|
124
|
+
|
|
125
|
+
**Test-Driven Development** - Write tests BEFORE implementation. Tests define expected behavior, code makes them pass.
|
|
126
|
+
|
|
127
|
+
### Phase 1: RED (Write Failing Tests)
|
|
128
|
+
|
|
129
|
+
**Steps:**
|
|
130
|
+
|
|
131
|
+
1. Write test based on expected input/output
|
|
132
|
+
2. **CRITICAL:** Run test and confirm it fails for the right reason
|
|
133
|
+
3. **DO NOT write any implementation code yet**
|
|
134
|
+
4. Commit the test when satisfied
|
|
135
|
+
|
|
136
|
+
**Critical warnings:**
|
|
137
|
+
|
|
138
|
+
- ⚠️ **No mock implementations** - State explicitly that you are doing TDD, so that no placeholder code gets written for functionality that doesn't exist yet
|
|
139
|
+
- ⚠️ **Verify failure** - Test must fail before implementation (proves test works)
|
|
140
|
+
- ⚠️ **Performance** - Run single tests, not whole suite (`npm test -- path/to/file.test.ts`)
|
|
141
|
+
|
|
142
|
+
**Example:**
|
|
143
|
+
|
|
144
|
+
```typescript
|
|
145
|
+
// RED: Write failing test
|
|
146
|
+
it('calculates total with tax', () => {
|
|
147
|
+
expect(calculateTotal(100, 0.08)).toBe(108); // FAILS - function doesn't exist
|
|
148
|
+
});
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
### Phase 2: GREEN (Make Tests Pass)
|
|
152
|
+
|
|
153
|
+
**Steps:**
|
|
154
|
+
|
|
155
|
+
1. Write **minimum** code to make test pass
|
|
156
|
+
2. Run test - verify it passes
|
|
157
|
+
3. No extra features (YAGNI - You Ain't Gonna Need It)
|
|
158
|
+
|
|
159
|
+
**Example:**
|
|
160
|
+
|
|
161
|
+
```typescript
|
|
162
|
+
// GREEN: Minimal implementation
|
|
163
|
+
function calculateTotal(amount: number, taxRate: number): number {
|
|
164
|
+
return amount + amount * taxRate;
|
|
165
|
+
}
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
### Phase 3: REFACTOR (Clean Up)
|
|
169
|
+
|
|
170
|
+
**Steps:**
|
|
171
|
+
|
|
172
|
+
1. Improve code quality without changing behavior
|
|
173
|
+
2. Run tests - verify they still pass
|
|
174
|
+
3. Remove duplication, improve naming
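
**Example (sketch):** One possible refactor of the `calculateTotal` function from the GREEN phase. The helper name `taxOn` is an assumption introduced for illustration; the point is only that the structure changes while the result stays the same.

```typescript
// REFACTOR: same behavior as the GREEN version, clearer structure.
// `taxOn` is a hypothetical helper introduced only for this sketch.
function taxOn(amount: number, taxRate: number): number {
  return amount * taxRate;
}

function calculateTotal(amount: number, taxRate: number): number {
  return amount + taxOn(amount, taxRate);
}
```

Re-run the test from the RED phase after a change like this; it should still pass without being edited.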
|
|
175
|
+
|
|
176
|
+
**Optional: Subagent Validation**
|
|
177
|
+
|
|
178
|
+
- Use independent AI instance to verify implementation isn't overfitting to tests
|
|
179
|
+
- Ask: "Does this implementation handle edge cases beyond the test scenarios?"
|
|
180
|
+
|
|
181
|
+
---
|
|
182
|
+
|
|
183
|
+
## When to Use Each Test Type
|
|
184
|
+
|
|
185
|
+
### Decision Tree
|
|
186
|
+
|
|
187
|
+
Answer these questions in order to choose the test type. The questions are checked in sequence, so stop at the first match. If more than one seems to apply, use the tie-breaking rule from Testing Principles: choose the faster test type.
|
|
188
|
+
|
|
189
|
+
```
|
|
190
|
+
1. Does this test AI-generated content quality (tone, reasoning, creativity)?
|
|
191
|
+
└─ YES → LLM Evaluation
|
|
192
|
+
Examples: Narrative quality, prompt effectiveness, conversational naturalness
|
|
193
|
+
└─ NO → Continue to question 2
|
|
194
|
+
|
|
195
|
+
2. Does this test require a real browser (Playwright/Cypress)?
|
|
196
|
+
└─ YES → E2E test
|
|
197
|
+
Examples: Multi-page navigation, browser-specific behavior (localStorage, cookies), visual regression, drag-and-drop
|
|
198
|
+
Note: React Testing Library does NOT require a browser - that's integration testing
|
|
199
|
+
└─ NO → Continue to question 3
|
|
200
|
+
|
|
201
|
+
3. Does this test interactions between multiple components/services?
|
|
202
|
+
└─ YES → Integration test
|
|
203
|
+
Examples: API + database, React component + state store, service + external API
|
|
204
|
+
└─ NO → Continue to question 4
|
|
205
|
+
|
|
206
|
+
4. Does this test a pure function (input → output, no I/O or side effects)?
|
|
207
|
+
└─ YES → Unit test
|
|
208
|
+
Examples: Calculations, formatters, validators, pure algorithms
|
|
209
|
+
└─ NO → Re-evaluate: What are you actually testing?
|
|
210
|
+
```
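
The decision tree can also be read as a first-match function. This is a minimal sketch, assuming the four questions are answered as booleans; the type and field names are made up for illustration and are not part of any library.

```typescript
type TestType = 'llm-eval' | 'e2e' | 'integration' | 'unit' | 're-evaluate';

// Hypothetical answers to the four questions, in the order they are asked.
interface Answers {
  judgesAiContentQuality: boolean;
  needsRealBrowser: boolean;
  crossesComponentsOrServices: boolean;
  isPureFunction: boolean;
}

// Checks the questions in order and stops at the first match.
function chooseTestType(a: Answers): TestType {
  if (a.judgesAiContentQuality) return 'llm-eval';
  if (a.needsRealBrowser) return 'e2e';
  if (a.crossesComponentsOrServices) return 'integration';
  if (a.isPureFunction) return 'unit';
  return 're-evaluate'; // none matched: rethink what is being tested
}
```

Because the checks run in order, a test that both needs a browser and touches several services comes out as E2E, which matches "stop at the first match".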
|
|
211
|
+
|
|
212
|
+
**Edge cases:**
|
|
213
|
+
|
|
214
|
+
- **Non-deterministic functions** (Math.random(), Date.now(), UUID generation) → Unit test with mocked randomness/time
|
|
215
|
+
- **Functions with environment dependencies** (process.env, window.location) → Integration test
|
|
216
|
+
- **Mixed pure + I/O logic** → Extract pure logic into separate function → Unit test pure part, integration test I/O
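
**Example (sketch):** One way to make a non-deterministic function unit-testable is to pass the clock and random source in as parameters; framework-level mocks are another option. `makeOrderId` and its ID format are hypothetical, chosen only to illustrate the idea.

```typescript
// Hypothetical function: builds an ID from the current time and a random suffix.
// The clock and random source are parameters so a unit test can pin them.
function makeOrderId(
  now: () => number = Date.now,
  random: () => number = Math.random,
): string {
  const suffix = Math.floor(random() * 1000);
  return `order-${now()}-${suffix}`;
}

// In a unit test, pass fixed sources instead of the real ones.
const fixedId = makeOrderId(() => 1700000000000, () => 0.5);
```

With fixed inputs the result is fully determined, so the test stays a plain unit test.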
|
|
217
|
+
|
|
218
|
+
**Re-evaluation guide:**
|
|
219
|
+
If testing behavior that doesn't fit the four categories:
|
|
220
|
+
|
|
221
|
+
1. **Break it down:** Separate pure logic from I/O/UI concerns
|
|
222
|
+
2. **Test each piece separately:** Pure logic → Unit, I/O → Integration, Multi-page flow → E2E
|
|
223
|
+
3. **Example:** Login validation
|
|
224
|
+
- Pure: `isValidEmail(email)` → Unit test
|
|
225
|
+
- I/O: `checkUserExists(email)` → Integration test (hits database)
|
|
226
|
+
- Full flow: Login form → Dashboard → E2E test (multi-page)
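
**Example (sketch):** The login split above could look roughly like this. The regex is a deliberately simple assumption, and `canAttemptLogin` is a hypothetical wrapper; the real `checkUserExists` would query a database and be covered by an integration test.

```typescript
// Pure: no I/O, so a unit test covers it.
// The pattern is a simplified assumption, not a full email validator.
function isValidEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

// I/O boundary: the lookup is injected, so the pure check can be tested
// without a database and the real lookup can be integration-tested separately.
type CheckUserExists = (email: string) => Promise<boolean>;

async function canAttemptLogin(
  email: string,
  checkUserExists: CheckUserExists,
): Promise<boolean> {
  if (!isValidEmail(email)) return false; // pure check first, no I/O needed
  return checkUserExists(email);
}
```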
|
|
227
|
+
|
|
228
|
+
### What Bugs Can Each Test Type Catch?
|
|
229
|
+
|
|
230
|
+
Understanding which test type catches which bugs helps you choose the fastest effective test.
|
|
231
|
+
|
|
232
|
+
| Bug Type | Can Unit Test Catch? | Can Integration Test Catch? | Can E2E Test Catch? | Best Choice |
|
|
233
|
+
| ---------------------------------------- | -------------------- | --------------------------- | ------------------- | ------------------------------- |
|
|
234
|
+
| Calculation error | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
|
|
235
|
+
| Invalid input handling | ✅ YES | ✅ YES | ✅ YES | Unit (fastest) |
|
|
236
|
+
| Database query returning wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
237
|
+
| API endpoint contract violation | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
238
|
+
| Race condition between services | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
239
|
+
| State management bug (Zustand, Redux) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
240
|
+
| React component rendering wrong data | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
241
|
+
| CSS layout broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
|
|
242
|
+
| Multi-page navigation broken | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
|
|
243
|
+
| Browser-specific rendering | ❌ NO | ❌ NO | ✅ YES | E2E (only option) |
|
|
244
|
+
| Form validation logic (isValidEmail) | ✅ YES | ✅ YES | ✅ YES | Unit (fastest, test pure logic) |
|
|
245
|
+
| Form validation UI (shows error message) | ❌ NO | ✅ YES | ✅ YES | Integration (faster than E2E) |
|
|
246
|
+
| Form validation UX (multi-field flow) | ❌ NO | ❌ NO | ✅ YES | E2E (only option for full flow) |
|
|
247
|
+
| AI prompt quality degradation | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
|
|
248
|
+
| AI reasoning accuracy | ❌ NO | ❌ NO | ❌ NO | LLM Eval (only option) |
|
|
249
|
+
|
|
250
|
+
**Key principle:** If multiple test types can catch the bug, choose the fastest one.
|
|
251
|
+
|
|
252
|
+
---
|
|
253
|
+
|
|
254
|
+
## Test Type Examples
|
|
255
|
+
|
|
256
|
+
### 1. Unit Tests
|
|
257
|
+
|
|
258
|
+
**Note:** If your business logic needs a database, API, or file system, use an integration test instead.
|
|
259
|
+
|
|
260
|
+
**Example:**
|
|
261
|
+
|
|
262
|
+
```typescript
|
|
263
|
+
// ✅ GOOD - Pure function
|
|
264
|
+
it('applies 20% discount for VIP users', () => {
|
|
265
|
+
expect(calculateDiscount(100, { tier: 'VIP' })).toBe(80);
|
|
266
|
+
});
|
|
267
|
+
|
|
268
|
+
// ❌ BAD - Testing implementation details
|
|
269
|
+
it('calls setState with correct value', () => {
|
|
270
|
+
expect(setState).toHaveBeenCalledWith({ count: 1 });
|
|
271
|
+
});
|
|
272
|
+
```
|
|
273
|
+
|
|
274
|
+
### 2. Integration Tests
|
|
275
|
+
|
|
276
|
+
**Key distinction:** Integration tests can render UI components but don't require a real browser. They run in Node.js with jsdom (simulated browser environment).
|
|
277
|
+
|
|
278
|
+
**Example:**
|
|
279
|
+
|
|
280
|
+
```typescript
|
|
281
|
+
// ✅ GOOD - Tests agent + state integration
|
|
282
|
+
describe('Agent + State Integration', () => {
|
|
283
|
+
it('updates character state after agent processes action', async () => {
|
|
284
|
+
const agent = new GameAgent();
|
|
285
|
+
await agent.processAction('attack guard');
|
|
286
|
+
|
|
287
|
+
const store = useGameStore.getState(); // read after the action so the snapshot is current
|
|
288
|
+
|
|
289
|
+
expect(store.character.stress).toBeGreaterThan(0);
|
|
290
|
+
expect(store.messages).toHaveLength(2); // player + AI response
|
|
291
|
+
});
|
|
292
|
+
});
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
### 3. E2E Tests
|
|
296
|
+
|
|
297
|
+
**Example:**
|
|
298
|
+
|
|
299
|
+
```typescript
|
|
300
|
+
// ✅ GOOD - Tests complete user flow
|
|
301
|
+
test('user creates account and first item', async ({ page }) => {
|
|
302
|
+
await page.goto('/signup');
|
|
303
|
+
await page.fill('[name="email"]', 'test@example.com');
|
|
304
|
+
await page.fill('[name="password"]', 'secure123');
|
|
305
|
+
await page.click('button:has-text("Sign Up")');
|
|
306
|
+
await expect(page).toHaveURL('/dashboard');
|
|
307
|
+
|
|
308
|
+
await page.click('text=New Item');
|
|
309
|
+
await page.fill('[name="title"]', 'My First Item');
|
|
310
|
+
await page.click('text=Save');
|
|
311
|
+
await expect(page.getByText('My First Item')).toBeVisible();
|
|
312
|
+
});
|
|
313
|
+
```
|
|
314
|
+
|
|
315
|
+
### E2E Testing with Persistent Dev Servers
|
|
316
|
+
|
|
317
|
+
When using Playwright for E2E tests, isolate persistent dev instances from test instances to avoid port conflicts and zombie processes.
|
|
318
|
+
|
|
319
|
+
**Port Isolation Strategy:**
|
|
320
|
+
|
|
321
|
+
- **Dev instance**: Project's configured port (e.g., 3000, 8080) - runs persistently for manual testing
|
|
322
|
+
- **Test instances**: `devPort + 1000` (e.g., 4000, 9080) - managed by Playwright
|
|
323
|
+
- **Fallback**: Ephemeral OS-assigned port if offset port is busy
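
**Example (sketch):** The offset rule can be written as a small pure function. The names are assumptions, and returning `0` is this sketch's convention for "let the OS assign a port" (binding to port 0 asks the OS for a free one). Checking whether the offset port is actually busy needs a socket probe and is left out here.

```typescript
const PORT_OFFSET = 1000;
const MAX_PORT = 65535;

// Returns the preferred test port for a given dev port.
// 0 means "no valid offset port; fall back to an OS-assigned ephemeral port".
function preferredTestPort(devPort: number): number {
  const candidate = devPort + PORT_OFFSET;
  return candidate <= MAX_PORT ? candidate : 0;
}
```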
|
|
324
|
+
|
|
325
|
+
**Process Management:**
|
|
326
|
+
|
|
327
|
+
- Dev instance runs persistently (started manually, survives test runs)
|
|
328
|
+
- Test instances spawn/cleanup per test run (Playwright manages lifecycle)
|
|
329
|
+
- Never kill processes on dev port range
|
|
330
|
+
|
|
331
|
+
**Playwright Configuration** (example uses 3000/4000 - adjust to your project's ports):
|
|
332
|
+
|
|
333
|
+
```typescript
|
|
334
|
+
// playwright.config.ts
|
|
335
|
+
import { defineConfig } from '@playwright/test';
|
|
336
|
+
|
|
337
|
+
export default defineConfig({
|
|
338
|
+
webServer: {
|
|
339
|
+
command: 'npm run dev:test', // Test script with isolated port
|
|
340
|
+
port: 4000, // devPort + 1000 (here 3000 → 4000)
|
|
341
|
+
reuseExistingServer: !process.env.CI, // Reuse locally, fresh in CI
|
|
342
|
+
timeout: 120000,
|
|
343
|
+
},
|
|
344
|
+
use: {
|
|
345
|
+
baseURL: 'http://localhost:4000', // Test against test instance
|
|
346
|
+
},
|
|
347
|
+
});
|
|
348
|
+
```
|
|
349
|
+
|
|
350
|
+
**Package.json Scripts** (example uses 3000/4000 - adjust to your project's ports):
|
|
351
|
+
|
|
352
|
+
```json
|
|
353
|
+
{
|
|
354
|
+
"scripts": {
|
|
355
|
+
"dev": "vite --port 3000",
|
|
356
|
+
"dev:test": "vite --port 4000",
|
|
357
|
+
"test:e2e": "playwright test"
|
|
358
|
+
}
|
|
359
|
+
}
|
|
360
|
+
```
|
|
361
|
+
|
|
362
|
+
**Why this pattern:**
|
|
363
|
+
|
|
364
|
+
- ✅ Manual testing on stable URL (dev instance always 3000)
|
|
365
|
+
- ✅ Automated tests isolated (Playwright controls lifecycle)
|
|
366
|
+
- ✅ No zombie processes (Playwright cleanup automatic)
|
|
367
|
+
- ✅ No port conflicts (predictable offset)
|
|
368
|
+
- ✅ Works in CI (fresh test instance every run)
|
|
369
|
+
|
|
370
|
+
**Alternative patterns:**
|
|
371
|
+
|
|
372
|
+
- Different projects use different ports (Next.js: 3000, Laravel: 8000, Rails: 3000)
|
|
373
|
+
- Dynamic offset adapts: `8000` → `9000`, `5173` → `6173`
|
|
374
|
+
- If the offset port is busy, the test server can fall back to an ephemeral port (49152-65535); the Playwright config then has to point at whichever port was chosen
|
|
375
|
+
|
|
376
|
+
**Cleanup:** For killing zombie dev/test servers, see `zombie-process-cleanup.md` → "Port-Based Cleanup"
|
|
377
|
+
|
|
378
|
+
### 4. LLM Evaluations
|
|
379
|
+
|
|
380
|
+
**Cost:** ~$0.01-0.30 per test run (depends on prompt size, caching)
|
|
381
|
+
|
|
382
|
+
**Example:**
|
|
383
|
+
|
|
384
|
+
```yaml
|
|
385
|
+
- description: 'Infer user intent from casual input'
|
|
386
|
+
vars:
|
|
387
|
+
input: 'I want to order a large pepperoni'
|
|
388
|
+
assert:
|
|
389
|
+
- type: javascript
|
|
390
|
+
value: JSON.parse(output).intent === 'order_pizza'
|
|
391
|
+
- type: llm-rubric
|
|
392
|
+
value: |
|
|
393
|
+
EXCELLENT: Confirms pizza type/size, asks for delivery details
|
|
394
|
+
POOR: Generic response or wrong intent
|
|
395
|
+
```
|
|
396
|
+
|
|
397
|
+
**Assertion types:**
|
|
398
|
+
|
|
399
|
+
**Programmatic** (fast, deterministic):
|
|
400
|
+
|
|
401
|
+
- JSON schema validation
|
|
402
|
+
- Required fields present
|
|
403
|
+
- Values in valid ranges
|
|
404
|
+
- Output format compliance
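
**Example (sketch):** A programmatic check along these lines can run before any LLM-as-judge assertion. The output shape is an assumption: `intent` comes from the example above, while `confidence` is a hypothetical field added to show a range check.

```typescript
// Hypothetical output shape; only `intent` appears in the guide's example.
interface IntentOutput {
  intent: string;
  confidence: number;
}

// Returns a list of problems; an empty list means the output passed.
function checkIntentOutput(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ['output is not valid JSON']; // format compliance
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return ['output is not a JSON object'];
  }
  const candidate = parsed as Partial<IntentOutput>;
  const problems: string[] = [];
  if (typeof candidate.intent !== 'string' || candidate.intent.length === 0) {
    problems.push('missing required field: intent');
  }
  if (
    typeof candidate.confidence !== 'number' ||
    candidate.confidence < 0 ||
    candidate.confidence > 1
  ) {
    problems.push('confidence must be a number between 0 and 1');
  }
  return problems;
}
```

Checks like this are deterministic and free, so they are worth running on every output before spending money on a rubric.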
|
|
405
|
+
|
|
406
|
+
**LLM-as-Judge** (nuanced, contextual):
|
|
407
|
+
|
|
408
|
+
- Reasoning quality
|
|
409
|
+
- Tone/style adherence
|
|
410
|
+
- Factual accuracy
|
|
411
|
+
- Conversational naturalness
|
|
412
|
+
- Domain expertise demonstration
|
|
413
|
+
|
|
414
|
+
**When to skip LLM evals:**
|
|
415
|
+
|
|
416
|
+
- Structured output validation (use programmatic tests)
|
|
417
|
+
- Simple classification tasks (unit tests sufficient)
|
|
418
|
+
- Non-AI features
|
|
419
|
+
|
|
420
|
+
---
|
|
421
|
+
|
|
422
|
+
## Cost Considerations
|
|
423
|
+
|
|
424
|
+
**LLM eval costs:** $0.01-0.30 per run depending on prompt size. **Prompt caching can cut this by up to about 90%** when most of the prompt is static (e.g., 30 scenarios: $0.30 → $0.03 after the first run).
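
**Worked example (sketch):** The 30-scenario figure can be reproduced with simple arithmetic. The function below is hypothetical and takes the savings percentage as an input; real savings depend on the provider and on how much of the prompt is cacheable.

```typescript
// All amounts are in cents to avoid floating-point drift.
function evalRunCostCents(
  scenarios: number,
  centsPerScenario: number,
  cacheSavingsPercent = 0,
): number {
  const fullCost = scenarios * centsPerScenario;
  return (fullCost * (100 - cacheSavingsPercent)) / 100;
}

const firstRun = evalRunCostCents(30, 1); // 30 scenarios at 1 cent each
const cachedRun = evalRunCostCents(30, 1, 90); // same run with 90% savings
```

Here `firstRun` is 30 cents ($0.30) and `cachedRun` is 3 cents ($0.03).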
|
|
425
|
+
|
|
426
|
+
**Cost reduction strategies:**
|
|
427
|
+
|
|
428
|
+
- Cache static content (system prompts, examples, rules)
|
|
429
|
+
- Batch multiple scenarios in one run
|
|
430
|
+
- Run full evals on PR/schedule, not every commit
|
|
431
|
+
|
|
432
|
+
**ROI:** Catching one bad prompt change before production >> eval costs
|
|
433
|
+
|
|
434
|
+
---
|
|
435
|
+
|
|
436
|
+
## Test Coverage Goals
|
|
437
|
+
|
|
438
|
+
- **Unit tests:** 80%+ coverage of pure functions
|
|
439
|
+
- **Integration tests:** All critical paths covered (see definition below)
|
|
440
|
+
- **E2E tests:** All critical multi-page user flows have at least one E2E test
|
|
441
|
+
- **LLM evals:** All AI features have evaluation scenarios
|
|
442
|
+
|
|
443
|
+
**What are "critical paths"?**
|
|
444
|
+
|
|
445
|
+
- **Always critical:** Authentication, payment/checkout, data loss scenarios (delete, overwrite)
|
|
446
|
+
- **Usually critical:** Core user workflows (create → edit → publish), primary feature flows
|
|
447
|
+
- **Rarely critical:** UI polish (button colors, layout tweaks), admin-only features with low usage
|
|
448
|
+
- **Rule of thumb:** If it breaks, would users notice immediately and be unable to complete their main task?
|
|
449
|
+
|
|
450
|
+
---
|
|
451
|
+
|
|
452
|
+
## Writing Effective Tests
|
|
453
|
+
|
|
454
|
+
### AAA Pattern (Arrange-Act-Assert)
|
|
455
|
+
|
|
456
|
+
Structure tests clearly: Setup data (Arrange) → Execute behavior (Act) → Verify expectations (Assert).
|
|
457
|
+
|
|
458
|
+
```typescript
|
|
459
|
+
it('applies discount to VIP users', () => {
|
|
460
|
+
const user = { tier: 'VIP' }; // Arrange
|
|
461
|
+
const cart = { total: 100 }; // Arrange
|
|
462
|
+
const result = applyDiscount(user, cart); // Act
|
|
463
|
+
expect(result.total).toBe(80); // Assert
|
|
464
|
+
});
|
|
465
|
+
```
|
|
466
|
+
|
|
467
|
+
### Test Naming
|
|
468
|
+
|
|
469
|
+
Be descriptive and specific, not vague or implementation-focused.
|
|
470
|
+
|
|
471
|
+
```typescript
|
|
472
|
+
// ✅ GOOD
|
|
473
|
+
it('returns 401 when API key is missing');
|
|
474
|
+
it('preserves user input after validation error');
|
|
475
|
+
|
|
476
|
+
// ❌ BAD
|
|
477
|
+
it('works correctly');
|
|
478
|
+
it('should call setState');
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
### Test Independence
|
|
482
|
+
|
|
483
|
+
**Each test should:**
|
|
484
|
+
|
|
485
|
+
- Run in any order
|
|
486
|
+
- Not depend on other tests
|
|
487
|
+
- Clean up its own state
|
|
488
|
+
- Use fresh fixtures/data
|
|
489
|
+
|
|
490
|
+
```typescript
|
|
491
|
+
// ✅ GOOD - Fresh state per test
|
|
492
|
+
beforeEach(() => {
|
|
493
|
+
gameState = createFreshGameState();
|
|
494
|
+
});
|
|
495
|
+
|
|
496
|
+
// ❌ BAD - Shared state (test B depends on test A)
|
|
497
|
+
let sharedUser = createUser();
|
|
498
|
+
it('test A', () => {
|
|
499
|
+
sharedUser.name = 'Alice';
|
|
500
|
+
});
|
|
501
|
+
it('test B', () => {
|
|
502
|
+
expect(sharedUser.name).toBe('Alice');
|
|
503
|
+
});
|
|
504
|
+
```
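
A fixture factory such as `createFreshGameState` only helps if it builds a new object on every call. A minimal sketch, assuming a state shape the guide does not define:

```typescript
// Hypothetical state shape, for illustration only.
interface GameState {
  character: { stress: number };
  messages: string[];
}

// Builds a new object graph on every call, so no test can see
// another test's mutations.
function createFreshGameState(): GameState {
  return { character: { stress: 0 }, messages: [] };
}
```

Returning a shared module-level constant instead would bring back the shared-state problem shown in the bad example.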
|
|
505
|
+
|
|
506
|
+
### Async Testing
|
|
507
|
+
|
|
508
|
+
**NEVER use arbitrary timeouts** - Makes tests slow and non-deterministic.
|
|
509
|
+
|
|
510
|
+
```typescript
|
|
511
|
+
// ❌ BAD - Arbitrary timeout
|
|
512
|
+
await page.waitForTimeout(3000); // What if it takes 3.1 seconds?
|
|
513
|
+
await sleep(500); // Flaky test
|
|
514
|
+
|
|
515
|
+
// ✅ GOOD - Poll until condition is met
|
|
516
|
+
await expect.poll(() => getStatus()).toBe('ready');
|
|
517
|
+
await page.waitForSelector('[data-testid="loaded"]');
|
|
518
|
+
await waitFor(() => expect(screen.getByText('Success')).toBeVisible());
|
|
519
|
+
```
|
|
520
|
+
|
|
521
|
+
**Why:** Polling is deterministic (passes when condition is met) and faster (no unnecessary waiting).
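
Playwright's `expect.poll` and Testing Library's `waitFor` already provide this behavior; the sketch below only shows the underlying idea. The helper name, option names, and defaults are assumptions.

```typescript
// Minimal polling helper: re-checks a condition until it holds or time runs out.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 2000, intervalMs = 20 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return; // stop as soon as the condition holds
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    // Wait briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The timeout here is an upper bound for failure, not a fixed wait: a condition that becomes true after 50 ms lets the test continue after roughly 50 ms.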
|
|
522
|
+
|
|
523
|
+
---
|
|
524
|
+
|
|
525
|
+
## What Not to Test
|
|
526
|
+
|
|
527
|
+
❌ **Implementation details** - Private methods, CSS classes, internal state, or how a result is produced (test what users see)
|
|
528
|
+
❌ **Third-party libraries** - Assume React/Axios work, test YOUR code
|
|
529
|
+
❌ **Trivial code** - Getters/setters with no logic, pass-through functions
|
|
530
|
+
❌ **UI copy** - Exact text (match loosely with a regex such as `/submit/i`) or specific wording (assert that an error is shown, not its exact message)
|
|
531
|
+
|
|
532
|
+
---
|
|
533
|
+
|
|
534
|
+
## CI/CD Integration
|
|
535
|
+
|
|
536
|
+
Run unit and integration tests on every commit (fast feedback), E2E tests on every PR, and LLM evals on a schedule (for example weekly, to catch regressions without per-commit cost).
|
|
537
|
+
|
|
538
|
+
---
|
|
539
|
+
|
|
540
|
+
## Quick Reference
|
|
541
|
+
|
|
542
|
+
| Need to test... | Test type | Technology | Speed | Cost |
|
|
543
|
+
| -------------------- | ----------- | ---------- | ------ | ---------- |
|
|
544
|
+
| Pure function | Unit | Vitest | Fast | Free |
|
|
545
|
+
| Service integration | Integration | Vitest | Medium | Free |
|
|
546
|
+
| Full user flow | E2E | Playwright | Slow | Free |
|
|
547
|
+
| AI reasoning quality | LLM eval | Promptfoo | Slow | $0.01-0.30 |
|
|
548
|
+
|
|
549
|
+
---
|
|
550
|
+
|
|
551
|
+
## Project-Specific Testing Documentation
|
|
552
|
+
|
|
553
|
+
**Location:** `tests/SAFEWORD.md` (may be nested like `packages/web/tests/SAFEWORD.md` in monorepos)
|
|
554
|
+
|
|
555
|
+
**Purpose:** Document project-specific testing stack, commands, and setup. Supplements global methodology.
|
|
556
|
+
|
|
557
|
+
**What to include:**
|
|
558
|
+
|
|
559
|
+
- **Tech stack:** Testing frameworks (Vitest/Jest, Playwright/Cypress, Promptfoo)
|
|
560
|
+
- **Test commands:** How to run tests, including single-file execution for performance
|
|
561
|
+
- **Setup requirements:** API keys, build steps, database setup, browser installation
|
|
562
|
+
- **File structure:** Where tests live and naming conventions
|
|
563
|
+
- **Project patterns:** Custom helpers, fixtures, mocks, assertion styles
|
|
564
|
+
- **TDD guidance:** Project-specific workflow expectations (write tests first, commit tests before implementation)
|
|
565
|
+
- **Coverage requirements:** Minimum coverage thresholds or critical paths
|
|
566
|
+
- **PR requirements:** Test passage requirements before merge
|
|
567
|
+
|
|
568
|
+
**Example:**
|
|
569
|
+
|
|
570
|
+
```markdown
|
|
571
|
+
# Testing
|
|
572
|
+
|
|
573
|
+
## Tech Stack
|
|
574
|
+
|
|
575
|
+
- Unit/Integration: Vitest
|
|
576
|
+
- E2E: Playwright
|
|
577
|
+
- LLM Evals: Promptfoo
|
|
578
|
+
|
|
579
|
+
## Commands
|
|
580
|
+
|
|
581
|
+
npm test # All tests
|
|
582
|
+
npm test -- path/to/file.test.ts # Single file (performance)
|
|
583
|
+
npm run test:coverage # With coverage report
|
|
584
|
+
npm run test:e2e # E2E tests only
|
|
585
|
+
|
|
586
|
+
## TDD Workflow
|
|
587
|
+
|
|
588
|
+
1. Write failing tests first (RED phase)
|
|
589
|
+
2. Confirm tests fail: `npm test -- path/to/file.test.ts`
|
|
590
|
+
3. Commit tests before implementation
|
|
591
|
+
4. Implement minimum code to pass (GREEN phase)
|
|
592
|
+
5. Refactor while keeping tests green
|
|
593
|
+
|
|
594
|
+
## Setup
|
|
595
|
+
|
|
596
|
+
1. Install: `npm install`
|
|
597
|
+
2. Browsers: `npx playwright install`
|
|
598
|
+
3. API keys: `export ANTHROPIC_API_KEY=sk-ant-...`
|
|
599
|
+
4. Build before testing: `npm run build`
|
|
600
|
+
|
|
601
|
+
## Coverage Requirements
|
|
602
|
+
|
|
603
|
+
- Unit tests: 80%+ for business logic
|
|
604
|
+
- E2E tests: All critical user paths
|
|
605
|
+
|
|
606
|
+
## PR Requirements
|
|
607
|
+
|
|
608
|
+
- All tests must pass
|
|
609
|
+
- No skipped tests without justification
|
|
610
|
+
- Coverage thresholds met
|
|
611
|
+
```
|
|
612
|
+
|
|
613
|
+
**If not found:** Ask user "Where are the testing docs?"
|
|
614
|
+
|
|
615
|
+
**Cascading precedence:**
|
|
616
|
+
|
|
617
|
+
1. **Global** (`~/.claude/testing-methodology.md`) - Universal methodology (test type selection, TDD workflow)
|
|
618
|
+
2. **Project** (`tests/SAFEWORD.md`) - Specific stack, commands, patterns
|