safeword 0.7.6 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. package/dist/{check-3TTR7WPD.js → check-2QCPMURS.js} +3 -3
  2. package/dist/{chunk-V5T3TEEQ.js → chunk-2P7QXQFL.js} +2 -2
  3. package/dist/{chunk-LETSGOTR.js → chunk-OXQIEKC7.js} +12 -7
  4. package/dist/{chunk-LETSGOTR.js.map → chunk-OXQIEKC7.js.map} +1 -1
  5. package/dist/{chunk-4NJCU6Z7.js → chunk-ZFRO5LB5.js} +2 -2
  6. package/dist/cli.js +6 -6
  7. package/dist/{diff-XJFCAA4Q.js → diff-6LJGYHY5.js} +3 -3
  8. package/dist/{reset-WPXUWP6Y.js → reset-VHNADDMA.js} +3 -3
  9. package/dist/{setup-DLS6K6EO.js → setup-QJNVWHTK.js} +3 -3
  10. package/dist/sync-TIBNJXB2.js +9 -0
  11. package/dist/{upgrade-4ESTGNXG.js → upgrade-GZSLDUEF.js} +4 -4
  12. package/package.json +15 -14
  13. package/templates/SAFEWORD.md +13 -4
  14. package/templates/doc-templates/feature-spec-template.md +1 -1
  15. package/templates/doc-templates/task-spec-template.md +1 -1
  16. package/templates/doc-templates/test-definitions-feature.md +1 -1
  17. package/templates/guides/architecture-guide.md +5 -7
  18. package/templates/guides/cli-reference.md +8 -0
  19. package/templates/guides/code-philosophy.md +9 -0
  20. package/templates/guides/context-files-guide.md +9 -0
  21. package/templates/guides/data-architecture-guide.md +9 -0
  22. package/templates/guides/design-doc-guide.md +21 -0
  23. package/templates/guides/learning-extraction.md +9 -0
  24. package/templates/guides/llm-guide.md +49 -0
  25. package/templates/guides/planning-guide.md +431 -0
  26. package/templates/guides/testing-guide.md +439 -0
  27. package/templates/guides/zombie-process-cleanup.md +9 -0
  28. package/templates/scripts/lint-md.sh +0 -0
  29. package/templates/skills/safeword-systematic-debugger/SKILL.md +1 -1
  30. package/templates/skills/safeword-tdd-enforcer/SKILL.md +2 -3
  31. package/dist/sync-AAG4SP5F.js +0 -9
  32. package/templates/guides/development-workflow.md +0 -618
  33. package/templates/guides/tdd-best-practices.md +0 -616
  34. package/templates/guides/test-definitions-guide.md +0 -335
  35. package/templates/guides/user-story-guide.md +0 -256
  36. /package/dist/{check-3TTR7WPD.js.map → check-2QCPMURS.js.map} +0 -0
  37. /package/dist/{chunk-V5T3TEEQ.js.map → chunk-2P7QXQFL.js.map} +0 -0
  38. /package/dist/{chunk-4NJCU6Z7.js.map → chunk-ZFRO5LB5.js.map} +0 -0
  39. /package/dist/{diff-XJFCAA4Q.js.map → diff-6LJGYHY5.js.map} +0 -0
  40. /package/dist/{reset-WPXUWP6Y.js.map → reset-VHNADDMA.js.map} +0 -0
  41. /package/dist/{setup-DLS6K6EO.js.map → setup-QJNVWHTK.js.map} +0 -0
  42. /package/dist/{sync-AAG4SP5F.js.map → sync-TIBNJXB2.js.map} +0 -0
  43. /package/dist/{upgrade-4ESTGNXG.js.map → upgrade-GZSLDUEF.js.map} +0 -0
@@ -46,10 +46,8 @@ Training data is stale. Follow this sequence:
46
46
 
47
47
  | Trigger | Guide |
48
48
  | --------------------------------------------------------- | ----------------------------------------------- |
49
- | Starting ANY feature, bug fix, or enhancement | `./.safeword/guides/development-workflow.md` |
50
- | Need to write OR review user stories | `./.safeword/guides/user-story-guide.md` |
51
- | Need to write OR review test definitions | `./.safeword/guides/test-definitions-guide.md` |
52
- | Writing tests, doing TDD, or test is failing | `./.safeword/guides/tdd-best-practices.md` |
49
+ | Starting feature/task OR writing specs/test definitions | `./.safeword/guides/planning-guide.md` |
50
+ | Choosing test type, doing TDD, OR test is failing | `./.safeword/guides/testing-guide.md` |
53
51
  | Creating OR updating a design doc | `./.safeword/guides/design-doc-guide.md` |
54
52
  | Making architectural decision OR writing ADR | `./.safeword/guides/architecture-guide.md` |
55
53
  | Designing data models, schemas, or database changes | `./.safeword/guides/data-architecture-guide.md` |
@@ -306,3 +304,14 @@ When markdown lint reports MD040 (missing language), choose:
306
304
  - Integration struggle between tools
307
305
 
308
306
  **Before extracting:** Check `.safeword/learnings/` for existing similar learnings—update, don't duplicate.
307
+
308
+ ---
309
+
310
+ ## Always Remember
311
+
312
+ 1. **Clarity → Simplicity → Correctness** (in that order)
313
+ 2. **Test what you can test**—never ask user to verify
314
+ 3. **RED → GREEN → REFACTOR**—never skip steps
315
+ 4. **Commit after each GREEN phase**
316
+ 5. **Read the matching guide** when a trigger fires
317
+ 6. **End every response** with: `{"proposedChanges": bool, "madeChanges": bool, "askedQuestion": bool}`
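
Rule 6 above defines a machine-readable footer. As a rough sketch of how a consumer might check it, the TypeScript below parses the last line of a response and confirms the three flags are booleans. The `parseFooter` helper and the "footer is single-line JSON on the last line" convention are assumptions made for illustration; they are not part of the safeword package. The `bool` placeholders in the template are read here as JSON `true`/`false`.

```typescript
// Hypothetical helper for illustration; not shipped with safeword.
interface ResponseFooter {
  proposedChanges: boolean;
  madeChanges: boolean;
  askedQuestion: boolean;
}

const FOOTER_KEYS = ["proposedChanges", "madeChanges", "askedQuestion"] as const;

// Assumption: the footer is single-line JSON on the last non-empty line.
function parseFooter(response: string): ResponseFooter | null {
  const lines = response.trimEnd().split("\n");
  const lastLine = (lines[lines.length - 1] ?? "").trim();

  let parsed: unknown;
  try {
    parsed = JSON.parse(lastLine);
  } catch {
    return null; // last line is not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const record = parsed as Record<string, unknown>;
  for (const key of FOOTER_KEYS) {
    if (typeof record[key] !== "boolean") return null; // missing or non-boolean flag
  }
  return {
    proposedChanges: record["proposedChanges"] as boolean,
    madeChanges: record["madeChanges"] as boolean,
    askedQuestion: record["askedQuestion"] as boolean,
  };
}
```

A caller could treat a `null` result as "footer missing or malformed" and re-prompt, though how to react is a design choice the template does not specify.
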
@@ -1,6 +1,6 @@
1
1
  # Feature Spec: [Feature Name] (Issue #[number])
2
2
 
3
- **Guide**: `@./.safeword/guides/user-story-guide.md` - Best practices, INVEST criteria, and examples
3
+ **Guide**: `@./.safeword/guides/planning-guide.md` - Best practices, INVEST criteria, and examples
4
4
  **Template**: `@./.safeword/templates/feature-spec-template.md`
5
5
 
6
6
  **Feature**: [Brief description of the feature]
@@ -1,6 +1,6 @@
1
1
  # Task: [Name]
2
2
 
3
- **Guide**: `@./.safeword/guides/development-workflow.md`
3
+ **Guide**: `@./.safeword/guides/planning-guide.md`
4
4
  **Template**: `@./.safeword/templates/task-spec-template.md`
5
5
 
6
6
  ---
@@ -1,6 +1,6 @@
1
1
  # Test Definitions: [Feature Name] (Issue #[number])
2
2
 
3
- **Guide**: `@./.safeword/guides/test-definitions-guide.md` - Structure, status tracking, and TDD workflow
3
+ **Guide**: `@./.safeword/guides/testing-guide.md` - Structure, status tracking, and TDD workflow
4
4
  **Template**: `@./.safeword/templates/test-definitions-feature.md`
5
5
 
6
6
  **Feature**: [Brief description of the feature]
@@ -413,11 +413,9 @@ export default defineConfig([
413
413
 
414
414
  ---
415
415
 
416
- ## Key Takeaway
416
+ ## Key Takeaways
417
417
 
418
- **One comprehensive architecture document per project** > many scattered ADR files:
419
-
420
- Full context in one place
421
- Living document (update in place)
422
- ✅ LLMs consume entire architecture at once
423
- ✅ Sequential decision trees prevent ambiguity
418
+ - One Architecture Doc per project—not scattered ADRs
419
+ - Every decision needs: What / Why / Trade-off / Alternatives
420
+ - Update when adding: technology, schema, or project-wide pattern
421
+ - Living document: update in place with version/status tracking
@@ -33,3 +33,11 @@ Common flags:
33
33
  - `-y, --yes` - Skip confirmations (setup, reset)
34
34
  - `-v, --verbose` - Show detailed output (diff)
35
35
  - `-q, --quiet` - Suppress output (sync)
36
+
37
+ ---
38
+
39
+ ## Key Takeaways
40
+
41
+ - Always use `@latest` for setup/check/upgrade/diff to get current CLI
42
+ - Run `sync` after adding/removing frameworks to update linting plugins
43
+ - Use `diff` before `upgrade` to preview changes
@@ -196,3 +196,12 @@ Before completing any work, verify:
196
196
  # ❌ Bad: "misc fixes"
197
197
  # ✅ Good: "fix: login button not responding to clicks"
198
198
  ```
199
+
200
+ ---
201
+
202
+ ## Key Takeaways
203
+
204
+ - Clarity → Simplicity → Correctness (in that order)
205
+ - Delete unused code—no "just in case" abstractions
206
+ - Commit often with descriptive messages
207
+ - Verify library versions before using APIs (training data is stale)
@@ -455,3 +455,12 @@ Before committing:
455
455
  - Bloated files cost more tokens and introduce noise
456
456
  - Keep under 50KB for optimal performance (though no hard limit)
457
457
  - Use imports to modularize instead of monolithic files
458
+
459
+ ---
460
+
461
+ ## Key Takeaways
462
+
463
+ - Keep context files under 200 lines—use imports to modularize
464
+ - Short declarative bullets, not narrative paragraphs
465
+ - Update immediately when architecture changes (stale docs = confusion)
466
+ - Put critical rules at the END of documents (recency bias)
@@ -198,3 +198,12 @@ Before finalizing data architecture doc:
198
198
  - [ ] Migration strategy covers both additive and breaking changes
199
199
  - [ ] Version and status match codebase (verify with git/deployment)
200
200
  - [ ] Cross-referenced from root ARCHITECTURE.md or SAFEWORD.md (link exists)
201
+
202
+ ---
203
+
204
+ ## Key Takeaways
205
+
206
+ - Data quality, governance, accessibility are core principles
207
+ - Every entity needs: attributes, types, relationships, constraints
208
+ - Performance targets use concrete numbers (e.g., <100ms, not "fast")
209
+ - Migration strategy covers both additive and breaking changes
@@ -1,5 +1,17 @@
1
1
  # Design Doc Guide for Claude Code
2
2
 
3
+ ## Escalation Check
4
+
5
+ **STOP if ANY apply—use `architecture-guide.md` first:**
6
+
7
+ - [ ] Need to choose a technology or library
8
+ - [ ] Need to design a data model or schema
9
+ - [ ] Pattern will affect 2+ features
10
+
11
+ Then return here.
12
+
13
+ ---
14
+
3
15
  ## How to Fill Out Design Doc
4
16
 
5
17
  **Template:** `@.safeword/templates/design-doc-template.md`
@@ -169,3 +181,12 @@ Before saving, verify:
169
181
  **Important:** Design docs are instructions that LLMs read and follow.
170
182
 
171
183
  **See:** `@.safeword/guides/llm-guide.md` for comprehensive framework on writing clear, actionable documentation that LLMs can reliably follow.
184
+
185
+ ---
186
+
187
+ ## Key Takeaways
188
+
189
+ - Escalate to Architecture Doc if: new tech, new schema, or pattern affects 2+ features
190
+ - Reference user stories and test definitions—don't duplicate them
191
+ - Every decision needs: what, why, trade-off
192
+ - ~121 lines target (concise, LLM-optimized)
@@ -550,3 +550,12 @@ This is a **living process** - iterate and refine based on what works.
550
550
  - Refactor when multiple learnings cover similar topics (consolidate)
551
551
  - Split when learning file >200 lines (focus on single concept)
552
552
  - Update SAFEWORD.md references when learnings move or merge
553
+
554
+ ---
555
+
556
+ ## Key Takeaways
557
+
558
+ - Extract after 5+ debug cycles or 3+ approaches tried
559
+ - Check existing learnings first—update, don't duplicate
560
+ - One concept per file, under 200 lines
561
+ - Extract immediately while fresh (don't defer to "later")
@@ -259,6 +259,44 @@ When LLMs hit dead ends, provide concrete next steps.
259
259
  - Login form → Dashboard → E2E test (multi-page)"
260
260
  ```
261
261
 
262
+ ### 14. Position-Aware Writing (Recency Bias)
263
+
264
+ LLMs retain information at the **beginning and end** of context better than the middle. Structure documents accordingly.
265
+
266
+ ```markdown
267
+ ❌ BAD - Critical rules buried in middle:
268
+
269
+ # Guide
270
+
271
+ ## Background (100 lines)
272
+
273
+ ## Details (200 lines)
274
+
275
+ ## Critical Rules (10 lines) ← forgotten
276
+
277
+ ## Appendix (50 lines)
278
+
279
+ ✅ GOOD - Critical rules at end:
280
+
281
+ # Guide
282
+
283
+ ## Background (100 lines)
284
+
285
+ ## Details (200 lines)
286
+
287
+ ## Appendix (50 lines)
288
+
289
+ ## Key Takeaways (10 lines) ← retained
290
+ ```
291
+
292
+ **Application:**
293
+
294
+ - CLAUDE.md / SAFEWORD.md: Put "Always Remember" section last
295
+ - Guides: End with "Key Takeaways" section
296
+ - Templates: Put most important sections at top OR bottom, not middle
297
+
298
+ **Research basis:** "Lost in the middle" phenomenon—models show <40% recall for middle content vs >80% for beginning/end content.
299
+
262
300
  ---
263
301
 
264
302
  ## Anti-Patterns
@@ -267,6 +305,7 @@ When LLMs hit dead ends, provide concrete next steps.
267
305
  ❌ **Undefined jargon** - "Technical debt", "code smell" need definitions
268
306
  ❌ **Competing guidance** - Multiple decision frameworks that contradict each other
269
307
  ❌ **Outdated references** - Remove concepts, but forget to update all mentions
308
+ ❌ **Critical info in the middle** - Most important rules buried between background and appendix
270
309
 
271
310
  ---
272
311
 
@@ -283,6 +322,7 @@ Before saving/committing LLM-consumable documentation:
283
322
  - [ ] Tie-breaking rules provided
284
323
  - [ ] Complex decisions (3+ branches) have lookup tables
285
324
  - [ ] Dead-end paths have re-evaluation steps with examples
325
+ - [ ] Critical rules positioned at END of document (recency bias)
286
326
 
287
327
  ---
288
328
 
@@ -310,3 +350,12 @@ Edge cases:
310
350
  - React components with React Testing Library → Integration (not E2E, no real browser)
311
351
  - Non-deterministic functions (Date.now()) → Unit test with mocked time
312
352
  ```
353
+
354
+ ---
355
+
356
+ ## Key Takeaways
357
+
358
+ - Decision trees: sequential, MECE, with tie-breakers
359
+ - Every rule needs concrete examples (good vs bad)
360
+ - Define all terms explicitly—assume nothing is obvious
361
+ - Put critical rules at the END of documents (recency bias)
@@ -0,0 +1,431 @@
1
+ # Planning Guide
2
+
3
+ How to write specs, user stories, and test definitions before implementation.
4
+
5
+ ---
6
+
7
+ ## Artifact Levels
8
+
9
+ **Triage first - answer IN ORDER, stop at first match:**
10
+
11
+ | Question | Level | Artifacts |
12
+ | ---------------------------------------- | -------------- | ---------------------------------------------------- |
13
+ | User-facing feature with business value? | **L2 Feature** | Feature Spec + Test Definitions (+ Design Doc if 3+) |
14
+ | Bug, improvement, internal, or refactor? | **L1 Task** | Task Spec with inline tests |
15
+ | Typo, config, or trivial change? | **L0 Micro** | Minimal Task Spec, existing tests |
16
+
17
+ **Locations:**
18
+
19
+ - Specs: `.safeword/planning/specs/`
20
+ - Test definitions: `.safeword/planning/test-definitions/`
21
+
22
+ **If none fit:** Break down the work. A single task spanning all three levels should be split into separate L2 feature + L1 tasks.
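
The "answer in order, stop at first match" triage above can be sketched as a short function. This is a minimal illustration, not code from the package: the `WorkItem` field names and the `triage` function are hypothetical, and the answers to the three questions are assumed to be supplied by the caller as booleans.

```typescript
// Hypothetical types for illustration; the guide defines only the table.
type Level = "L2 Feature" | "L1 Task" | "L0 Micro";

interface WorkItem {
  userFacingWithBusinessValue: boolean;
  bugImprovementInternalOrRefactor: boolean;
  typoConfigOrTrivial: boolean;
}

// Artifacts per level, copied from the triage table.
const ARTIFACTS: Record<Level, string> = {
  "L2 Feature": "Feature Spec + Test Definitions (+ Design Doc if 3+)",
  "L1 Task": "Task Spec with inline tests",
  "L0 Micro": "Minimal Task Spec, existing tests",
};

// Questions are checked in table order; the first match wins.
function triage(item: WorkItem): Level | null {
  if (item.userFacingWithBusinessValue) return "L2 Feature";
  if (item.bugImprovementInternalOrRefactor) return "L1 Task";
  if (item.typoConfigOrTrivial) return "L0 Micro";
  return null; // none fit: break the work down before assigning a level
}
```

Because the checks run top to bottom, an item that is both user-facing and a refactor resolves to L2, which matches the table's ordering.
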
23
+
24
+ ---
25
+
26
+ ## Templates
27
+
28
+ | Need | Template |
29
+ | ------------------------------- | ---------------------------------------------------- |
30
+ | L2 Feature spec | `@./.safeword/templates/feature-spec-template.md` |
31
+ | L1/L0 Task spec | `@./.safeword/templates/task-spec-template.md` |
32
+ | L2 Test definitions | `@./.safeword/templates/test-definitions-feature.md` |
33
+ | Complex feature design | `@./.safeword/templates/design-doc-template.md` |
34
+ | Architectural decision | `@./.safeword/templates/architecture-template.md` |
35
+ | Context anchor for complex work | `@./.safeword/templates/ticket-template.md` |
36
+ | Execution scratch pad | `@./.safeword/templates/work-log-template.md` |
37
+
38
+ ---
39
+
40
+ ## Part 1: User Stories
41
+
42
+ ### When to Use Each Format
43
+
44
+ | Format | Best For | Example Trigger |
45
+ | ------------------------------ | ------------------------------------------- | ---------------------------- |
46
+ | Standard (As a/I want/So that) | User-facing features, UI flows | "User can do X" |
47
+ | Given-When-Then | API behavior, state transitions, edge cases | "When X happens, then Y" |
48
+ | Job Story | Problem-solving, user motivation unclear | "User needs to accomplish X" |
49
+
50
+ **Decision rule:** Default to Standard. Use Given-When-Then for APIs or complex state. Use Job Story when focusing on the problem, not the solution.
51
+
52
+ **Edge cases:**
53
+
54
+ - API with UI? → Standard for UI, Given-When-Then for API contract tests
55
+ - Unclear user role? → Job Story to focus on the problem first, convert to Standard later
56
+ - Technical task (refactor, upgrade)? → Skip story format, use Technical Task template
57
+
58
+ ### Standard Format (Recommended)
59
+
60
+ ```text
61
+ As a [role/persona]
62
+ I want [capability/feature]
63
+ So that [business value/benefit]
64
+
65
+ Acceptance Criteria:
66
+ - [Specific, testable condition 1]
67
+ - [Specific, testable condition 2]
68
+ - [Specific, testable condition 3]
69
+
70
+ Out of Scope:
71
+ - [What this story explicitly does NOT include]
72
+ ```
73
+
74
+ ### Given-When-Then Format (Behavior-Focused)
75
+
76
+ ```text
77
+ Given [initial context/state]
78
+ When [action/event occurs]
79
+ Then [expected outcome]
80
+
81
+ And [additional context/outcome]
82
+ But [exception/edge case]
83
+ ```
84
+
85
+ **Example:**
86
+
87
+ ```text
88
+ Given I am an authenticated API user
89
+ When I POST to /api/campaigns with valid JSON
90
+ Then I receive a 201 Created response with campaign ID
91
+ And the campaign appears in my GET /api/campaigns list
92
+ But invalid JSON returns 400 with descriptive error messages
93
+ ```
94
+
95
+ ### Job Story Format (Outcome-Focused)
96
+
97
+ ```text
98
+ When [situation/context]
99
+ I want to [motivation/job-to-be-done]
100
+ So I can [expected outcome]
101
+ ```
102
+
103
+ **Example:**
104
+
105
+ ```text
106
+ When I'm debugging a failing test
107
+ I want to see the exact LLM prompt and response
108
+ So I can identify whether the issue is prompt engineering or code logic
109
+ ```
110
+
111
+ ---
112
+
113
+ ## INVEST Validation
114
+
115
+ Before saving any story, verify it passes all six criteria:
116
+
117
+ - [ ] **Independent** - Can be completed without depending on other stories
118
+ - [ ] **Negotiable** - Details emerge through conversation, not a fixed contract
119
+ - [ ] **Valuable** - Delivers clear value to user or business
120
+ - [ ] **Estimable** - Team can estimate effort (not too vague, not too detailed)
121
+ - [ ] **Small** - Completable in one sprint/iteration (typically 1-5 days)
122
+ - [ ] **Testable** - Clear acceptance criteria define when it's done
123
+
124
+ **If a story fails any criteria, it's not ready - refine or split it.**
125
+
126
+ ---
127
+
128
+ ## Writing Good Acceptance Criteria
129
+
130
+ **✅ GOOD - Specific, user-facing, testable:**
131
+
132
+ - User can switch campaigns without page reload
133
+ - Response time is under 200ms
134
+ - Current campaign is visually highlighted
135
+ - Error message explains what went wrong
136
+
137
+ **❌ BAD - Vague, technical, or implementation:**
138
+
139
+ - Campaign switching works ← Too vague
140
+ - Use Zustand for state ← Implementation detail
141
+ - Database is fast ← Not user-facing
142
+ - Code is clean ← Not testable
143
+
144
+ ---
145
+
146
+ ## Size Guidelines
147
+
148
+ | Indicator | Too Big | Just Right | Too Small |
149
+ | ------------------- | ------- | ---------- | --------- |
150
+ | Acceptance Criteria | 6+ | 1-5 | 0 |
151
+ | Personas/Screens | 3+ | 1-2 | N/A |
152
+ | Duration | 6+ days | 1-5 days | <1 hour |
153
+ | **Action** | Split | ✅ Ship | Combine |
154
+
155
+ **Decision rule:** When borderline, err on the side of splitting. Smaller stories are easier to estimate and complete.
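
One way to read the size table together with its decision rule is sketched below. The `StoryEstimate` shape, the `sizeAction` name, and the 8-hour working day used to express "<1 hour" in days are all assumptions for illustration. Durations strictly between 5 and 6 days are treated as borderline and therefore split, following the decision rule.

```typescript
// Hypothetical types for illustration; thresholds come from the size table.
type SizeAction = "split" | "ship" | "combine";

interface StoryEstimate {
  acceptanceCriteria: number;
  personasOrScreens: number;
  durationDays: number; // fractional days allowed
}

const HOURS_PER_DAY = 8; // assumption: the guide does not define a working day

function sizeAction(story: StoryEstimate): SizeAction {
  // "Too Big" is checked first so borderline stories err toward splitting.
  const tooBig =
    story.acceptanceCriteria >= 6 ||
    story.personasOrScreens >= 3 ||
    story.durationDays > 5;
  if (tooBig) return "split";

  const tooSmall =
    story.acceptanceCriteria === 0 ||
    story.durationDays * HOURS_PER_DAY < 1;
  if (tooSmall) return "combine";

  return "ship";
}
```
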
156
+
157
+ ---
158
+
159
+ ## Technical Constraints Section
160
+
161
+ **Purpose:** Capture non-functional requirements that inform test definitions.
162
+
163
+ **When to use:** Fill in constraints BEFORE writing test definitions. Delete sections that don't apply.
164
+
165
+ | Category | What It Captures | Examples |
166
+ | -------------- | -------------------------------- | ----------------------------------------------- |
167
+ | Performance | Speed, throughput, capacity | Response time < 200ms, 1000 concurrent users |
168
+ | Security | Auth, validation, rate limiting | Sanitized inputs, session required, 100 req/min |
169
+ | Compatibility | Browsers, devices, accessibility | Chrome 100+, iOS 14+, WCAG 2.1 AA |
170
+ | Data | Privacy, retention, compliance | GDPR delete in 72h, 90-day log retention |
171
+ | Dependencies | Existing systems, restrictions | Use AuthService, no new packages |
172
+ | Infrastructure | Resources, offline, deployment | < 512MB memory, offline-capable |
173
+
174
+ **Include a constraint if:**
175
+
176
+ - It affects how you write tests
177
+ - It limits implementation choices
178
+ - Violating it would fail an audit or break SLAs
179
+
180
+ ---
181
+
182
+ ## User Story Examples
183
+
184
+ ### ✅ GOOD Story
185
+
186
+ ```text
187
+ As a player with multiple campaigns
188
+ I want to switch between campaigns from the sidebar
189
+ So that I can quickly resume different games
190
+
191
+ Acceptance Criteria:
192
+ - [ ] Sidebar shows all campaigns with last-played date
193
+ - [ ] Clicking campaign loads it within 200ms
194
+ - [ ] Current campaign is highlighted
195
+
196
+ Out of Scope:
197
+ - Campaign merging/deletion (separate story)
198
+ ```
199
+
200
+ ### ❌ BAD Story (Too Big)
201
+
202
+ ```text
203
+ As a user
204
+ I want a complete campaign management system
205
+ So that I can organize my games
206
+
207
+ Acceptance Criteria:
208
+ - [ ] Create, edit, delete campaigns
209
+ - [ ] Share campaigns with other players
210
+ - [ ] Export/import campaign data
211
+ - [ ] Search and filter campaigns
212
+ - [ ] Tag campaigns by theme
213
+ ```
214
+
215
+ **Problem:** This is 5+ separate stories. Split it.
216
+
217
+ ### ❌ BAD Story (No Value)
218
+
219
+ ```text
220
+ As a developer
221
+ I want to refactor the GameStore
222
+ So that code is cleaner
223
+ ```
224
+
225
+ **Problem:** Developer is not a user. "Cleaner code" is not user-facing value.
226
+
227
+ ### ✅ BETTER (Technical Task)
228
+
229
+ ```text
230
+ Technical Task: Refactor GameStore to use Immer
231
+
232
+ Why: Prevent state mutation bugs (3 bugs in last sprint)
233
+ Effort: 2-3 hours
234
+ Test: All existing tests pass, no new mutations
235
+ ```
236
+
237
+ ---
238
+
239
+ ## Part 2: Test Definitions
240
+
241
+ ### How to Fill Out Test Definitions
242
+
243
+ 1. Read `@./.safeword/templates/test-definitions-feature.md`
244
+ 2. Read user story's Technical Constraints section (if exists)
245
+ 3. Fill in feature name, issue number, test file path
246
+ 4. Organize tests into logical suites
247
+ 5. Create numbered tests (Test 1.1, Test 1.2, etc.)
248
+ 6. Add status for each test
249
+ 7. Include detailed steps and expected outcomes
250
+ 8. Add summary with coverage breakdown
251
+ 9. Save to `.safeword/planning/test-definitions/feature-[slug].md`
252
+
253
+ ---
254
+
255
+ ## Test Status Indicators
256
+
257
+ Use these consistently:
258
+
259
+ - **✅ Passing** - Test is implemented and passing
260
+ - **⏭️ Skipped** - Test is intentionally skipped (add rationale)
261
+ - **❌ Not Implemented** - Test is defined but not yet written
262
+ - **🔴 Failing** - Test exists but is currently failing
263
+
264
+ ---
265
+
266
+ ## Test Definition Naming
267
+
268
+ **✅ GOOD - Descriptive and specific:**
269
+
270
+ - "Render all three panes"
271
+ - "Cmd+J toggles AI pane visibility"
272
+ - "State persistence across sessions"
273
+
274
+ **❌ BAD - Vague or technical:**
275
+
276
+ - "Test 1" (no description)
277
+ - "Check state" (too vague)
278
+ - "Verify useUIStore hook" (implementation detail)
279
+
280
+ ---
281
+
282
+ ## Writing Test Steps
283
+
284
+ **✅ GOOD - Clear, actionable steps:**
285
+
286
+ ```text
287
+ **Steps**:
288
+ 1. Toggle AI pane visible
289
+ 2. Get bounding box for AI pane
290
+ 3. Get bounding box for Editor pane
291
+ 4. Compare X coordinates
292
+ ```
293
+
294
+ **❌ BAD - Vague or incomplete:**
295
+
296
+ ```text
297
+ **Steps**:
298
+ 1. Check panes
299
+ 2. Verify order
300
+ ```
301
+
302
+ ---
303
+
304
+ ## Writing Expected Outcomes
305
+
306
+ **✅ GOOD - Specific, testable assertions:**
307
+
308
+ ```text
309
+ **Expected**:
310
+ - AI pane X coordinate < Editor pane X coordinate
311
+ - Explorer pane X coordinate > Editor pane X coordinate
312
+ - All coordinates are positive numbers
313
+ ```
314
+
315
+ **❌ BAD - Vague expectations:**
316
+
317
+ ```text
318
+ **Expected**:
319
+ - Panes are in correct order
320
+ - Everything works
321
+ ```
322
+
323
+ ---
324
+
325
+ ## Organizing Test Suites
326
+
327
+ Group related tests:
328
+
329
+ - **Layout/Structure** - DOM structure, element presence, positioning
330
+ - **User Interactions** - Clicks, keyboard shortcuts, drag/drop
331
+ - **State Management** - State changes, persistence, reactivity
332
+ - **Accessibility** - ARIA labels, keyboard navigation, focus
333
+ - **Edge Cases** - Error handling, boundary conditions
334
+ - **Technical Constraints** - Non-functional requirements from user story
335
+
336
+ ---
337
+
338
+ ## Coverage Summary
339
+
340
+ **Always include:**
341
+
342
+ - Total test count
343
+ - Breakdown by status (passing, skipped, not implemented, failing)
344
+ - Percentages for each category
345
+ - Rationale for skipped tests
346
+
347
+ **Example:**
348
+
349
+ ```text
350
+ **Total**: 20 tests
351
+ **Passing**: 9 tests (45%)
352
+ **Skipped**: 4 tests (20%)
353
+ **Not Implemented**: 7 tests (35%)
354
+ **Failing**: 0 tests
355
+ ```
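
The percentages in the example above follow directly from the counts. As a small sketch (the `StatusCounts` shape and `coverageSummary` function are hypothetical, not part of the package), the summary block could be generated like this. Unlike the hand-written example, this version prints a percentage for every category, including zero counts.

```typescript
// Hypothetical helper for illustration.
interface StatusCounts {
  passing: number;
  skipped: number;
  notImplemented: number;
  failing: number;
}

function coverageSummary(counts: StatusCounts): string {
  const total =
    counts.passing + counts.skipped + counts.notImplemented + counts.failing;

  // Integer-rounded share of the total; guards against division by zero.
  const line = (label: string, n: number): string => {
    const pct = total === 0 ? 0 : Math.round((n * 100) / total);
    return `**${label}**: ${n} tests (${pct}%)`;
  };

  return [
    `**Total**: ${total} tests`,
    line("Passing", counts.passing),
    line("Skipped", counts.skipped),
    line("Not Implemented", counts.notImplemented),
    line("Failing", counts.failing),
  ].join("\n");
}
```

With the counts from the example (9 passing, 4 skipped, 7 not implemented, 0 failing), the total is 20 and the shares are 45%, 20%, 35%, and 0%.
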
356
+
357
+ ---
358
+
359
+ ## Testing Technical Constraints
360
+
361
+ User stories include Technical Constraints. These MUST have corresponding tests.
362
+
363
+ | Constraint Category | Test Type | What to Verify |
364
+ | ------------------- | -------------------------- | --------------------------------------------- |
365
+ | Performance | Load/timing tests | Response times, throughput, capacity |
366
+ | Security | Security tests | Input sanitization, auth, rate limiting |
367
+ | Compatibility | Cross-browser/device tests | Browser versions, mobile, accessibility |
368
+ | Data | Compliance tests | Retention, deletion, privacy rules |
369
+ | Dependencies | Integration tests | Required services work, no forbidden packages |
370
+ | Infrastructure | Resource tests | Memory limits, offline behavior |
371
+
372
+ ---
373
+
374
+ ## Test Definition Example
375
+
376
+ ```markdown
377
+ ### Test 3.1: Cmd+J toggles AI pane visibility ✅
378
+
379
+ **Status**: ✅ Passing
380
+ **Description**: Verifies Cmd+J keyboard shortcut toggles AI pane
381
+
382
+ **Steps**:
383
+
384
+ 1. Verify AI pane hidden initially (default state)
385
+ 2. Press Cmd+J (Mac) or Ctrl+J (Windows/Linux)
386
+ 3. Verify AI pane becomes visible
387
+ 4. Press Cmd+J again
388
+ 5. Verify AI pane becomes hidden
389
+
390
+ **Expected**:
391
+
392
+ - AI pane starts hidden
393
+ - After first toggle: AI pane visible
394
+ - After second toggle: AI pane hidden
395
+ ```
396
+
397
+ ---
398
+
399
+ ## File Naming Convention
400
+
401
+ **Specs:** `.safeword/planning/specs/feature-[slug].md` or `task-[slug].md`
402
+
403
+ **Test definitions:** `.safeword/planning/test-definitions/feature-[slug].md`
404
+
405
+ **Good filenames:**
406
+
407
+ - `feature-campaign-switching.md`
408
+ - `task-fix-login-timeout.md`
409
+
410
+ **Bad filenames:**
411
+
412
+ - `user-story-1.md` ← Not descriptive
413
+ - `STORY_CAMPAIGN_FINAL_v2.md` ← Bloated
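
The naming convention above could be enforced with a small check. The sketch below assumes that `[slug]` means lowercase kebab-case, which is inferred from the good examples rather than stated by the guide; `slugify`, `specPath`, and `isValidSpecFilename` are hypothetical names.

```typescript
// Hypothetical helpers for illustration; the kebab-case slug rule is an assumption.
type SpecKind = "feature" | "task";

const SPEC_FILENAME = /^(feature|task)-[a-z0-9]+(-[a-z0-9]+)*\.md$/;

function isValidSpecFilename(filename: string): boolean {
  return SPEC_FILENAME.test(filename);
}

// Lowercase the title, collapse non-alphanumeric runs to "-", trim edge hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function specPath(kind: SpecKind, title: string): string {
  return `.safeword/planning/specs/${kind}-${slugify(title)}.md`;
}
```

Under these assumptions, both "good" filenames listed above pass the check and both "bad" ones fail it.
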
414
+
415
+ ---
416
+
417
+ ## Quick Reference
418
+
419
+ **User Story Red Flags (INVEST Violations):**
420
+
421
+ - No acceptance criteria → Too vague
422
+ - More than 5 acceptance criteria → Split into multiple stories
423
+ - Technical implementation details → Wrong audience
424
+ - Missing "So that" → No clear value
425
+
426
+ **Test Definition Red Flags:**
427
+
428
+ - Test name doesn't describe behavior → Rename
429
+ - Steps are vague → Add detail
430
+ - No expected outcomes → Add assertions
431
+ - No coverage summary → Add totals