safeword 0.8.4 → 0.8.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31) hide show
  1. package/dist/{check-OUVIK2O6.js → check-QZ3ZAHZY.js} +3 -3
  2. package/dist/{chunk-SIK3BC7F.js → chunk-62YXVEKM.js} +27 -9
  3. package/dist/{chunk-SIK3BC7F.js.map → chunk-62YXVEKM.js.map} +1 -1
  4. package/dist/{chunk-TP334635.js → chunk-SIEJPWQA.js} +2 -2
  5. package/dist/{chunk-QYCKBF57.js → chunk-VZPFU4YI.js} +2 -2
  6. package/dist/cli.js +6 -6
  7. package/dist/{diff-ASRWAPYJ.js → diff-YXUSIILN.js} +3 -3
  8. package/dist/{reset-4G5DEEMY.js → reset-HQXT7SCI.js} +3 -3
  9. package/dist/{setup-URK77YMR.js → setup-JX476EH3.js} +3 -3
  10. package/dist/sync-A7VWZKQD.js +9 -0
  11. package/dist/{upgrade-IDR2ZALG.js → upgrade-45E6255Q.js} +4 -4
  12. package/package.json +15 -14
  13. package/templates/cursor/rules/safeword-brainstorming.mdc +198 -0
  14. package/templates/cursor/rules/safeword-debugging.mdc +202 -0
  15. package/templates/cursor/rules/safeword-enforcing-tdd.mdc +207 -0
  16. package/templates/cursor/rules/safeword-quality-reviewer.mdc +158 -0
  17. package/templates/cursor/rules/safeword-refactoring.mdc +175 -0
  18. package/templates/scripts/lint-md.sh +0 -0
  19. package/templates/skills/safeword-brainstorming/SKILL.md +210 -0
  20. package/templates/skills/{safeword-systematic-debugger → safeword-debugging}/SKILL.md +1 -71
  21. package/templates/skills/{safeword-tdd-enforcer → safeword-enforcing-tdd}/SKILL.md +17 -17
  22. package/templates/skills/safeword-quality-reviewer/SKILL.md +17 -67
  23. package/dist/sync-AOKWEHCY.js +0 -9
  24. /package/dist/{check-OUVIK2O6.js.map → check-QZ3ZAHZY.js.map} +0 -0
  25. /package/dist/{chunk-TP334635.js.map → chunk-SIEJPWQA.js.map} +0 -0
  26. /package/dist/{chunk-QYCKBF57.js.map → chunk-VZPFU4YI.js.map} +0 -0
  27. /package/dist/{diff-ASRWAPYJ.js.map → diff-YXUSIILN.js.map} +0 -0
  28. /package/dist/{reset-4G5DEEMY.js.map → reset-HQXT7SCI.js.map} +0 -0
  29. /package/dist/{setup-URK77YMR.js.map → setup-JX476EH3.js.map} +0 -0
  30. /package/dist/{sync-AOKWEHCY.js.map → sync-A7VWZKQD.js.map} +0 -0
  31. /package/dist/{upgrade-IDR2ZALG.js.map → upgrade-45E6255Q.js.map} +0 -0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "safeword",
3
- "version": "0.8.4",
3
+ "version": "0.8.5",
4
4
  "description": "CLI for setting up and managing safeword development environments",
5
5
  "type": "module",
6
6
  "bin": {
@@ -19,6 +19,18 @@
19
19
  "engines": {
20
20
  "node": ">=18"
21
21
  },
22
+ "scripts": {
23
+ "build": "tsup",
24
+ "dev": "tsup --watch",
25
+ "test": "vitest run",
26
+ "test:e2e": "vitest run tests/e2e/",
27
+ "test:watch": "vitest",
28
+ "test:coverage": "vitest run --coverage",
29
+ "typecheck": "tsc --noEmit",
30
+ "lint": "eslint src tests",
31
+ "clean": "rm -rf dist",
32
+ "prepublishOnly": "npm run build && npm test"
33
+ },
22
34
  "dependencies": {
23
35
  "commander": "^12.1.0"
24
36
  },
@@ -36,16 +48,5 @@
36
48
  "claude-code"
37
49
  ],
38
50
  "author": "",
39
- "license": "MIT",
40
- "scripts": {
41
- "build": "tsup",
42
- "dev": "tsup --watch",
43
- "test": "vitest run",
44
- "test:e2e": "vitest run tests/e2e/",
45
- "test:watch": "vitest",
46
- "test:coverage": "vitest run --coverage",
47
- "typecheck": "tsc --noEmit",
48
- "lint": "eslint src tests",
49
- "clean": "rm -rf dist"
50
- }
51
- }
51
+ "license": "MIT"
52
+ }
@@ -0,0 +1,198 @@
1
+ ---
2
+ description: Use before implementation when refining rough ideas into specs. Guides collaborative design through Socratic questioning, alternative exploration, and incremental validation. Triggers: 'brainstorm', 'design', 'explore options', 'figure out', 'think through', 'what approach'.
3
+ alwaysApply: false
4
+ ---
5
+
6
+ # Brainstorming Ideas Into Specs
7
+
8
+ Turn rough ideas into validated specs through Socratic dialogue.
9
+
10
+ **Iron Law:** ONE QUESTION AT A TIME. EXPLORE ALTERNATIVES BEFORE DECIDING.
11
+
12
+ ## When to Use
13
+
14
+ Answer IN ORDER. Stop at first match:
15
+
16
+ 1. Rough idea needs refinement? → Use this skill
17
+ 2. Multiple approaches possible? → Use this skill
18
+ 3. Unclear requirements? → Use this skill
19
+ 4. Clear task, obvious approach? → Skip (use enforcing-tdd directly)
20
+ 5. Pure research/investigation? → Skip
21
+
22
+ ---
23
+
24
+ ## Phase 1: CONTEXT
25
+
26
+ **Purpose:** Understand what exists before asking questions.
27
+
28
+ **Protocol:**
29
+
30
+ 1. Check project state (files, recent commits, existing specs)
31
+ 2. Review relevant docs in `.safeword/planning/`
32
+ 3. Identify constraints and patterns already established
33
+
34
+ **Exit Criteria:**
35
+
36
+ - [ ] Reviewed relevant codebase areas
37
+ - [ ] Checked existing specs/designs
38
+ - [ ] Ready to ask informed questions
39
+
40
+ ---
41
+
42
+ ## Phase 2: QUESTION
43
+
44
+ **Iron Law:** ONE QUESTION PER MESSAGE
45
+
46
+ **Protocol:**
47
+
48
+ 1. Ask one focused question
49
+ 2. Prefer multiple choice (2-4 options) when possible
50
+ 3. Open-ended is fine for exploratory topics
51
+ 4. Focus on: purpose, constraints, success criteria, scope
52
+
53
+ **Question Types (in order of preference):**
54
+
55
+ | Type | When | Example |
56
+ | --------------- | ----------------------- | ---------------------------------------------------- |
57
+ | Multiple choice | Clear options exist | "Should this be (A) real-time or (B) polling-based?" |
58
+ | Yes/No | Binary decision | "Do we need offline support?" |
59
+ | Bounded open | Need specifics | "What's the max number of items to display?" |
60
+ | Open-ended | Exploring problem space | "What problem are you trying to solve?" |
61
+
62
+ **Exit Criteria:**
63
+
64
+ - [ ] Understand the core problem/goal
65
+ - [ ] Know key constraints
66
+ - [ ] Have success criteria
67
+ - [ ] Scope boundaries are clear
68
+
69
+ ---
70
+
71
+ ## Phase 3: ALTERNATIVES
72
+
73
+ **Iron Law:** ALWAYS PRESENT 2-3 OPTIONS BEFORE DECIDING
74
+
75
+ **Protocol:**
76
+
77
+ 1. Present 2-3 approaches with trade-offs
78
+ 2. Lead with your recommendation and why
79
+ 3. Be explicit about what each gives up
80
+ 4. Let user choose (or suggest hybrid)
81
+
82
+ **Format:**
83
+
84
+ ```text
85
+ I'd recommend Option A because [reason].
86
+
87
+ **Option A: [Name]**
88
+ - Approach: [how it works]
89
+ - Pros: [benefits]
90
+ - Cons: [drawbacks]
91
+
92
+ **Option B: [Name]**
93
+ - Approach: [how it works]
94
+ - Pros: [benefits]
95
+ - Cons: [drawbacks]
96
+
97
+ Which direction feels right?
98
+ ```
99
+
100
+ **Exit Criteria:**
101
+
102
+ - [ ] Presented 2-3 viable approaches
103
+ - [ ] Gave clear recommendation with reasoning
104
+ - [ ] User selected approach (or hybrid)
105
+
106
+ ---
107
+
108
+ ## Phase 4: DESIGN
109
+
110
+ **Iron Law:** PRESENT IN 200-300 WORD SECTIONS. VALIDATE EACH.
111
+
112
+ **Protocol:**
113
+
114
+ 1. Present design incrementally (not all at once)
115
+ 2. After each section: "Does this look right so far?"
116
+ 3. Cover: architecture, components, data flow, error handling
117
+ 4. Apply YAGNI ruthlessly - remove anything "just in case"
118
+ 5. Go back and clarify when something doesn't fit
119
+
120
+ **Sections (present one at a time):**
121
+
122
+ 1. **Overview** - What we're building, high-level approach
123
+ 2. **Components** - Key pieces and responsibilities
124
+ 3. **Data Flow** - How data moves through system
125
+ 4. **Edge Cases** - Error handling, boundaries
126
+ 5. **Out of Scope** - What we're explicitly NOT doing
127
+
128
+ **Exit Criteria:**
129
+
130
+ - [ ] Each section validated by user
131
+ - [ ] Design is complete and coherent
132
+ - [ ] YAGNI applied (no speculative features)
133
+ - [ ] Ready to create spec
134
+
135
+ ---
136
+
137
+ ## Phase 5: SPEC
138
+
139
+ **Purpose:** Convert validated design into structured spec.
140
+
141
+ **Protocol:**
142
+
143
+ 1. Determine level (L0/L1/L2) using triage questions
144
+ 2. Create spec using appropriate template
145
+ 3. Commit the spec
146
+
147
+ **Triage:**
148
+
149
+ | Question | If Yes → |
150
+ | ---------------------------------------- | ---------------------------- |
151
+ | User-facing feature with business value? | **L2** → Feature Spec |
152
+ | Bug, improvement, internal, or refactor? | **L1** → Task Spec |
153
+ | Typo, config, or trivial change? | **L0** → Task Spec (minimal) |
154
+
155
+ **Exit Criteria:**
156
+
157
+ - [ ] Spec created in correct location
158
+ - [ ] L2: Test definitions created
159
+ - [ ] Spec committed to git
160
+
161
+ ---
162
+
163
+ ## Phase 6: HANDOFF
164
+
165
+ **Protocol:**
166
+
167
+ 1. Summarize what was created
168
+ 2. Ask: "Ready to start implementation with TDD?"
169
+ 3. If yes → Invoke enforcing-tdd skill
170
+
171
+ **Exit Criteria:**
172
+
173
+ - [ ] User confirmed spec is complete
174
+ - [ ] Handed off to enforcing-tdd (if continuing)
175
+
176
+ ---
177
+
178
+ ## Key Principles
179
+
180
+ | Principle | Why |
181
+ | ----------------------------- | --------------------------------------- |
182
+ | One question at a time | Prevents overwhelm, gets better answers |
183
+ | Multiple choice preferred | Faster to answer, reduces ambiguity |
184
+ | Alternatives before decisions | Avoids premature commitment |
185
+ | Incremental validation | Catches misunderstandings early |
186
+ | YAGNI ruthlessly | Scope creep kills projects |
187
+
188
+ ---
189
+
190
+ ## Anti-Patterns
191
+
192
+ | Don't | Do |
193
+ | ------------------------------ | -------------------------------- |
194
+ | Dump full design at once | Present in 200-300 word sections |
195
+ | Ask 5 questions in one message | Ask ONE question |
196
+ | Skip alternatives | Always present 2-3 options |
197
+ | Accept vague requirements | Probe until concrete |
198
+ | Add "nice to have" features | Put them in "Out of Scope" |
@@ -0,0 +1,202 @@
1
+ ---
2
+ description: Four-phase debugging framework that ensures root cause identification before fixes. Use when encountering bugs, test failures, unexpected behavior, or when previous fix attempts failed. Enforces investigate-first discipline ('debug this', 'fix this error', 'test is failing', 'not working').
3
+ alwaysApply: false
4
+ ---
5
+
6
+ # Systematic Debugger
7
+
8
+ Find root cause before fixing. Symptom fixes are failure.
9
+
10
+ **Iron Law:** NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
11
+
12
+ ## When to Use
13
+
14
+ Answer IN ORDER. Stop at first match:
15
+
16
+ 1. Bug, error, or test failure? → Use this skill
17
+ 2. Unexpected behavior? → Use this skill
18
+ 3. Previous fix didn't work? → Use this skill (especially important)
19
+ 4. Performance problem? → Use this skill
20
+ 5. None of above? → Skip this skill
21
+
22
+ **Use especially when:**
23
+
24
+ - Under time pressure (emergencies make guessing tempting)
25
+ - "Quick fix" seems obvious (red flag)
26
+ - Already tried 1+ fixes that didn't work
27
+
28
+ ## The Four Phases
29
+
30
+ Complete each phase before proceeding.
31
+
32
+ ### Phase 1: Root Cause Investigation
33
+
34
+ **BEFORE attempting ANY fix:**
35
+
36
+ **1. Read Error Messages Completely**
37
+
38
+ ```text
39
+ Don't skip past errors. They often contain the exact solution.
40
+ - Full stack trace (note line numbers, file paths)
41
+ - Error codes and messages
42
+ - Warnings that preceded the error
43
+ ```
44
+
45
+ **2. Reproduce Consistently**
46
+
47
+ | Can reproduce? | Action |
48
+ | --------------- | ---------------------------------------------------- |
49
+ | Yes, every time | Proceed to step 3 |
50
+ | Sometimes | Gather more data - when does it happen vs not? |
51
+ | Never | Cannot debug what you cannot reproduce - gather logs |
52
+
53
+ **3. Check Recent Changes**
54
+
55
+ ```bash
56
+ git diff HEAD~5 # Recent code changes
57
+ git log --oneline -10 # Recent commits
58
+ ```
59
+
60
+ What changed that could cause this? Dependencies? Config? Environment?
61
+
62
+ **4. Trace Data Flow (Root Cause Tracing)**
63
+
64
+ When error is deep in call stack:
65
+
66
+ ```text
67
+ Symptom: Error at line 50 in utils.js
68
+ ↑ Called by handler.js:120
69
+ ↑ Called by router.js:45
70
+ ↑ Called by app.js:10 ← ROOT CAUSE: bad input here
71
+ ```
72
+
73
+ **Technique:**
74
+
75
+ 1. Find where error occurs (symptom)
76
+ 2. Ask: "What called this with bad data?"
77
+ 3. Trace up until you find the SOURCE
78
+ 4. Fix at source, not at symptom
79
+
80
+ **5. Multi-Component Systems**
81
+
82
+ When system has multiple layers (API → service → database):
83
+
84
+ ```bash
85
+ # Log at EACH boundary before proposing fixes
86
+ echo "=== Layer 1 (API): request=$REQUEST ==="
87
+ echo "=== Layer 2 (Service): input=$INPUT ==="
88
+ echo "=== Layer 3 (DB): query=$QUERY ==="
89
+ ```
90
+
91
+ Run once to find WHERE it breaks. Then investigate that layer.
92
+
93
+ ### Phase 2: Pattern Analysis
94
+
95
+ **1. Find Working Examples**
96
+
97
+ Locate similar working code in same codebase. What works that's similar?
98
+
99
+ **2. Identify Differences**
100
+
101
+ | Working code | Broken code | Could this matter? |
102
+ | ---------------- | -------------- | ------------------ |
103
+ | Uses async/await | Uses callbacks | Yes - timing |
104
+ | Validates input | No validation | Yes - bad data |
105
+
106
+ List ALL differences. Don't assume "that can't matter."
107
+
108
+ ### Phase 3: Hypothesis Testing
109
+
110
+ **1. Form Single Hypothesis**
111
+
112
+ Write it down: "I think X is the root cause because Y"
113
+
114
+ Be specific:
115
+
116
+ - ❌ "Something's wrong with the database"
117
+ - ✅ "Connection pool exhausted because connections aren't released in error path"
118
+
119
+ **2. Test Minimally**
120
+
121
+ | Rule | Why |
122
+ | ------------------------ | ---------------------- |
123
+ | ONE change at a time | Isolate what works |
124
+ | Smallest possible change | Avoid side effects |
125
+ | Don't bundle fixes | Can't tell what helped |
126
+
127
+ **3. Evaluate Result**
128
+
129
+ | Result | Action |
130
+ | --------------- | --------------------------------------- |
131
+ | Fixed | Phase 4 (verify) |
132
+ | Not fixed | NEW hypothesis (return to 3.1) |
133
+ | Partially fixed | Found one issue, continue investigating |
134
+
135
+ ### Phase 4: Implementation
136
+
137
+ **1. Create Failing Test**
138
+
139
+ Before fixing, write test that fails due to the bug:
140
+
141
+ ```javascript
142
+ it('handles empty input without crashing', () => {
143
+ // This test should FAIL before fix, PASS after
144
+ expect(() => processData('')).not.toThrow();
145
+ });
146
+ ```
147
+
148
+ **2. Implement Fix**
149
+
150
+ - Address ROOT CAUSE identified in Phase 1
151
+ - ONE change
152
+ - No "while I'm here" improvements
153
+
154
+ **3. Verify**
155
+
156
+ - [ ] New test passes
157
+ - [ ] Existing tests still pass
158
+ - [ ] Issue actually resolved (not just test passing)
159
+
160
+ **4. If Fix Doesn't Work**
161
+
162
+ | Fix attempts | Action |
163
+ | ------------ | ---------------------------------------- |
164
+ | 1-2 | Return to Phase 1 with new information |
165
+ | 3+ | STOP - Question architecture (see below) |
166
+
167
+ **5. After 3+ Failed Fixes: Question Architecture**
168
+
169
+ Pattern indicating architectural problem:
170
+
171
+ - Each fix reveals new coupling/shared state
172
+ - Fixes require "massive refactoring"
173
+ - Each fix creates new symptoms elsewhere
174
+
175
+ **STOP and ask:**
176
+
177
+ - Is this pattern fundamentally sound?
178
+ - Should we refactor vs. continue patching?
179
+ - Discuss with user before more fix attempts
180
+
181
+ ## Red Flags - STOP Immediately
182
+
183
+ If you catch yourself thinking:
184
+
185
+ | Thought | Reality |
186
+ | ---------------------------------------------- | --------------------------------- |
187
+ | "Quick fix for now, investigate later" | Investigate NOW or you never will |
188
+ | "Just try changing X" | That's guessing, not debugging |
189
+ | "I'll add multiple fixes and test" | Can't isolate what worked |
190
+ | "I don't fully understand but this might work" | You need to understand first |
191
+ | "One more fix attempt" (after 2+ failures) | 3+ failures = wrong approach |
192
+
193
+ **ALL mean: STOP. Return to Phase 1.**
194
+
195
+ ## Quick Reference
196
+
197
+ | Phase | Key Question | Success Criteria |
198
+ | ----------------- | ------------------------------------- | ---------------------------------- |
199
+ | 1. Root Cause | "WHY is this happening?" | Understand cause, not just symptom |
200
+ | 2. Pattern | "What's different from working code?" | Identified key differences |
201
+ | 3. Hypothesis | "Is my theory correct?" | Confirmed or formed new theory |
202
+ | 4. Implementation | "Does the fix work?" | Test passes, issue resolved |
@@ -0,0 +1,207 @@
1
+ ---
2
+ description: Use when implementing features, fixing bugs, or making code changes. Ensures scope is defined before coding, then enforces RED → GREEN → REFACTOR test discipline. Triggers: 'implement', 'add', 'build', 'create', 'fix', 'change', 'feature', 'bug'.
3
+ alwaysApply: false
4
+ ---
5
+
6
+ # TDD Enforcer
7
+
8
+ Scope work before coding. Write tests before implementation.
9
+
10
+ **Iron Law:** NO IMPLEMENTATION UNTIL SCOPE IS DEFINED AND TEST FAILS
11
+
12
+ ## When to Use
13
+
14
+ Answer IN ORDER. Stop at first match:
15
+
16
+ 1. Implementing new feature? → Use this skill
17
+ 2. Fixing bug? → Use this skill
18
+ 3. Adding enhancement? → Use this skill
19
+ 4. Refactoring? → Use this skill
20
+ 5. Research/investigation only? → Skip this skill
21
+
22
+ ---
23
+
24
+ ## Phase 0: TRIAGE
25
+
26
+ **Purpose:** Determine work level and ensure scope exists.
27
+
28
+ ### Step 1: Identify Level
29
+
30
+ Answer IN ORDER. Stop at first match:
31
+
32
+ | Question | If Yes → |
33
+ | ---------------------------------------- | -------------- |
34
+ | User-facing feature with business value? | **L2 Feature** |
35
+ | Bug, improvement, internal, or refactor? | **L1 Task** |
36
+ | Typo, config, or trivial change? | **L0 Micro** |
37
+
38
+ ### Step 2: Check/Create Artifacts
39
+
40
+ | Level | Required Artifacts | Test Location |
41
+ | ------ | --------------------------------------------------------------- | ------------------------------- |
42
+ | **L2** | Feature Spec + Test Definitions (+ Design Doc if 3+ components) | `test-definitions/feature-*.md` |
43
+ | **L1** | Task Spec | Inline in spec |
44
+ | **L0** | Task Spec (minimal) | Existing tests |
45
+
46
+ **Locations:**
47
+
48
+ - Specs: `.safeword/planning/specs/`
49
+ - Test definitions: `.safeword/planning/test-definitions/`
50
+
51
+ ### Exit Criteria
52
+
53
+ - [ ] Level identified (L0/L1/L2)
54
+ - [ ] Spec exists with "Out of Scope" defined
55
+ - [ ] L2: Test definitions file exists
56
+ - [ ] L1: Test scenarios in spec
57
+ - [ ] L0: Existing test coverage confirmed
58
+
59
+ ---
60
+
61
+ ## Phase 1: RED
62
+
63
+ **Iron Law:** NO IMPLEMENTATION UNTIL TEST FAILS FOR THE RIGHT REASON
64
+
65
+ **Protocol:**
66
+
67
+ 1. Pick ONE test from spec (L1) or test definitions (L2)
68
+ 2. Write test code
69
+ 3. Run test
70
+ 4. Verify: fails because behavior missing (not syntax error)
71
+ 5. Commit: `test: [behavior]`
72
+
73
+ **For L0:** No new test needed. Confirm existing tests pass, then proceed to Phase 2.
74
+
75
+ **Exit Criteria:**
76
+
77
+ - [ ] Test written and executed
78
+ - [ ] Test fails for RIGHT reason (behavior missing)
79
+ - [ ] Committed: `test: [behavior]`
80
+
81
+ **Red Flags → STOP:**
82
+
83
+ | Flag | Action |
84
+ | ----------------------- | -------------------------------- |
85
+ | Test passes immediately | Rewrite - you're testing nothing |
86
+ | Syntax error | Fix syntax, not behavior |
87
+ | Wrote implementation | Delete it, return to test |
88
+ | Multiple tests | Pick ONE |
89
+
90
+ ---
91
+
92
+ ## Phase 2: GREEN
93
+
94
+ **Iron Law:** ONLY WRITE CODE THE TEST REQUIRES
95
+
96
+ **Protocol:**
97
+
98
+ 1. Write minimal code to pass test
99
+ 2. Run test → verify pass
100
+ 3. Commit: `feat:` or `fix:`
101
+
102
+ **Exit Criteria:**
103
+
104
+ - [ ] Test passes
105
+ - [ ] No extra code
106
+ - [ ] No hardcoded/mock values
107
+ - [ ] Committed
108
+
109
+ ### Verification Gate
110
+
111
+ **Before claiming GREEN:** Evidence before claims, always.
112
+
113
+ ```text
114
+ ✅ CORRECT ❌ WRONG
115
+ ───────────────────────────────── ─────────────────────────────────
116
+ Run: npm test "Tests should pass now"
117
+ Output: ✓ 34/34 tests pass "I'm confident this works"
118
+ Claim: "All tests pass" "Tests pass" (no output shown)
119
+ ```
120
+
121
+ **The Rule:** If you haven't run the verification command in this response, you cannot claim it passes.
122
+
123
+ **Red Flags → STOP:**
124
+
125
+ | Flag | Action |
126
+ | --------------------------- | -------------------------------------- |
127
+ | "should", "probably" claims | Run command, show output first |
128
+ | "Done!" before verification | Run command, show output first |
129
+ | "Just in case" code | Delete it |
130
+ | Multiple functions | Delete extras |
131
+ | Refactoring | Stop - that's Phase 3 |
132
+ | Test still fails | Debug (→ debugging skill if stuck) |
133
+ | Hardcoded value | Implement real logic (see below) |
134
+
135
+ ### Anti-Pattern: Mock Implementations
136
+
137
+ LLMs sometimes hardcode values to pass tests. This is not TDD.
138
+
139
+ ```typescript
140
+ // ❌ BAD - Hardcoded to pass test
141
+ function calculateDiscount(amount, tier) {
142
+ return 80; // Passes test but isn't real
143
+ }
144
+
145
+ // ✅ GOOD - Actual logic
146
+ function calculateDiscount(amount, tier) {
147
+ if (tier === 'VIP') return amount * 0.8;
148
+ return amount;
149
+ }
150
+ ```
151
+
152
+ Fix mocks immediately. The next test cycle will catch them, but they're technical debt.
153
+
154
+ ---
155
+
156
+ ## Phase 3: REFACTOR
157
+
158
+ **Protocol:**
159
+
160
+ 1. Tests pass before changes
161
+ 2. Improve code (rename, extract, dedupe)
162
+ 3. Tests pass after changes
163
+ 4. Commit if changed: `refactor: [improvement]`
164
+
165
+ **Exit Criteria:**
166
+
167
+ - [ ] Tests still pass
168
+ - [ ] Code cleaner (or no changes needed)
169
+ - [ ] Committed (if changed)
170
+
171
+ **NOT Allowed:** New behavior, changing assertions, adding tests.
172
+
173
+ ---
174
+
175
+ ## Phase 4: ITERATE
176
+
177
+ ```text
178
+ More tests in spec/test-definitions?
179
+ ├─ Yes → Return to Phase 1
180
+ └─ No → All "Done When" / AC checked?
181
+ ├─ Yes → Complete
182
+ └─ No → Update spec, return to Phase 0
183
+ ```
184
+
185
+ For L2: Update test definition status (✅/⏭️/❌/🔴) as tests pass.
186
+
187
+ ---
188
+
189
+ ## Quick Reference
190
+
191
+ | Phase | Key Question | Gate |
192
+ | ----------- | -------------------------------- | ----------------------------- |
193
+ | 0. TRIAGE | What level? Is scope defined? | Spec exists with boundaries |
194
+ | 1. RED | Does test fail for right reason? | Test fails (behavior missing) |
195
+ | 2. GREEN | Does minimal code pass? | Test passes, no extras |
196
+ | 3. REFACTOR | Is code clean? | Tests still pass |
197
+ | 4. ITERATE | More tests? | All done → complete |
198
+
199
+ ---
200
+
201
+ ## Integration
202
+
203
+ | Scenario | Handoff |
204
+ | ----------------------- | ----------------- |
205
+ | Test fails unexpectedly | → debugging skill |
206
+ | Review needed | → quality-reviewer |
207
+ | Scope expanding | → Update spec first |