opencodekit 0.0.1

Files changed (123)
  1. package/README.md +258 -0
  2. package/dist/index.js +3391 -0
  3. package/dist/template/.opencode/.env.example +193 -0
  4. package/dist/template/.opencode/AGENTS.md +214 -0
  5. package/dist/template/.opencode/README.md +269 -0
  6. package/dist/template/.opencode/agent/build.md +75 -0
  7. package/dist/template/.opencode/agent/explore.md +66 -0
  8. package/dist/template/.opencode/agent/planner.md +83 -0
  9. package/dist/template/.opencode/agent/review.md +90 -0
  10. package/dist/template/.opencode/agent/rush.md +85 -0
  11. package/dist/template/.opencode/agent/scout.md +93 -0
  12. package/dist/template/.opencode/command/analyze-project.md +39 -0
  13. package/dist/template/.opencode/command/brainstorm.md +11 -0
  14. package/dist/template/.opencode/command/commit.md +11 -0
  15. package/dist/template/.opencode/command/create.md +118 -0
  16. package/dist/template/.opencode/command/design.md +15 -0
  17. package/dist/template/.opencode/command/finish.md +233 -0
  18. package/dist/template/.opencode/command/fix-ci.md +20 -0
  19. package/dist/template/.opencode/command/fix-types.md +10 -0
  20. package/dist/template/.opencode/command/fix-ui.md +22 -0
  21. package/dist/template/.opencode/command/fix.md +22 -0
  22. package/dist/template/.opencode/command/handoff.md +146 -0
  23. package/dist/template/.opencode/command/implement.md +167 -0
  24. package/dist/template/.opencode/command/import-plan.md +188 -0
  25. package/dist/template/.opencode/command/integration-test.md +36 -0
  26. package/dist/template/.opencode/command/issue.md +41 -0
  27. package/dist/template/.opencode/command/plan.md +158 -0
  28. package/dist/template/.opencode/command/pr.md +36 -0
  29. package/dist/template/.opencode/command/quick-build.md +13 -0
  30. package/dist/template/.opencode/command/research-and-implement.md +21 -0
  31. package/dist/template/.opencode/command/research-ui.md +32 -0
  32. package/dist/template/.opencode/command/research.md +153 -0
  33. package/dist/template/.opencode/command/resume.md +127 -0
  34. package/dist/template/.opencode/command/review-codebase.md +13 -0
  35. package/dist/template/.opencode/command/skill-create.md +29 -0
  36. package/dist/template/.opencode/command/skill-optimize.md +28 -0
  37. package/dist/template/.opencode/command/status.md +109 -0
  38. package/dist/template/.opencode/command/ui-review.md +28 -0
  39. package/dist/template/.opencode/dcp.jsonc +34 -0
  40. package/dist/template/.opencode/memory/README.md +128 -0
  41. package/dist/template/.opencode/memory/_templates/handoff.md +33 -0
  42. package/dist/template/.opencode/memory/_templates/research.md +29 -0
  43. package/dist/template/.opencode/memory/_templates/task-prd.md +43 -0
  44. package/dist/template/.opencode/memory/_templates/task-review.md +73 -0
  45. package/dist/template/.opencode/memory/_templates/task-spec.md +71 -0
  46. package/dist/template/.opencode/memory/design-guidelines.md +281 -0
  47. package/dist/template/.opencode/memory/handoffs/README.md +83 -0
  48. package/dist/template/.opencode/opencode.json +469 -0
  49. package/dist/template/.opencode/package.json +23 -0
  50. package/dist/template/.opencode/pickle-thinker.jsonc +11 -0
  51. package/dist/template/.opencode/plugin/README.md +162 -0
  52. package/dist/template/.opencode/plugin/notification.ts +88 -0
  53. package/dist/template/.opencode/plugin/sessions.ts +434 -0
  54. package/dist/template/.opencode/plugin/superpowers.ts +332 -0
  55. package/dist/template/.opencode/plugin/tsconfig.json +15 -0
  56. package/dist/template/.opencode/superpowers/.claude/settings.local.json +141 -0
  57. package/dist/template/.opencode/superpowers/.claude-plugin/marketplace.json +20 -0
  58. package/dist/template/.opencode/superpowers/.claude-plugin/plugin.json +13 -0
  59. package/dist/template/.opencode/superpowers/.codex/INSTALL.md +35 -0
  60. package/dist/template/.opencode/superpowers/.codex/superpowers-bootstrap.md +33 -0
  61. package/dist/template/.opencode/superpowers/.codex/superpowers-codex +267 -0
  62. package/dist/template/.opencode/superpowers/.github/FUNDING.yml +3 -0
  63. package/dist/template/.opencode/superpowers/.opencode/INSTALL.md +135 -0
  64. package/dist/template/.opencode/superpowers/.opencode/plugin/superpowers.js +215 -0
  65. package/dist/template/.opencode/superpowers/LICENSE +21 -0
  66. package/dist/template/.opencode/superpowers/README.md +165 -0
  67. package/dist/template/.opencode/superpowers/RELEASE-NOTES.md +493 -0
  68. package/dist/template/.opencode/superpowers/agents/code-reviewer.md +48 -0
  69. package/dist/template/.opencode/superpowers/commands/brainstorm.md +5 -0
  70. package/dist/template/.opencode/superpowers/commands/execute-plan.md +5 -0
  71. package/dist/template/.opencode/superpowers/commands/write-plan.md +5 -0
  72. package/dist/template/.opencode/superpowers/docs/README.codex.md +153 -0
  73. package/dist/template/.opencode/superpowers/docs/README.opencode.md +234 -0
  74. package/dist/template/.opencode/superpowers/docs/plans/2025-11-22-opencode-support-design.md +294 -0
  75. package/dist/template/.opencode/superpowers/docs/plans/2025-11-22-opencode-support-implementation.md +1095 -0
  76. package/dist/template/.opencode/superpowers/hooks/hooks.json +15 -0
  77. package/dist/template/.opencode/superpowers/hooks/session-start.sh +34 -0
  78. package/dist/template/.opencode/superpowers/lib/skills-core.js +208 -0
  79. package/dist/template/.opencode/superpowers/skills/brainstorming/SKILL.md +54 -0
  80. package/dist/template/.opencode/superpowers/skills/condition-based-waiting/SKILL.md +120 -0
  81. package/dist/template/.opencode/superpowers/skills/condition-based-waiting/example.ts +158 -0
  82. package/dist/template/.opencode/superpowers/skills/defense-in-depth/SKILL.md +127 -0
  83. package/dist/template/.opencode/superpowers/skills/dispatching-parallel-agents/SKILL.md +180 -0
  84. package/dist/template/.opencode/superpowers/skills/executing-plans/SKILL.md +76 -0
  85. package/dist/template/.opencode/superpowers/skills/finishing-a-development-branch/SKILL.md +200 -0
  86. package/dist/template/.opencode/superpowers/skills/frontend-aesthetics/SKILL.md +137 -0
  87. package/dist/template/.opencode/superpowers/skills/gemini-large-context/SKILL.md +205 -0
  88. package/dist/template/.opencode/superpowers/skills/receiving-code-review/SKILL.md +209 -0
  89. package/dist/template/.opencode/superpowers/skills/requesting-code-review/SKILL.md +105 -0
  90. package/dist/template/.opencode/superpowers/skills/requesting-code-review/code-reviewer.md +146 -0
  91. package/dist/template/.opencode/superpowers/skills/root-cause-tracing/SKILL.md +174 -0
  92. package/dist/template/.opencode/superpowers/skills/root-cause-tracing/find-polluter.sh +63 -0
  93. package/dist/template/.opencode/superpowers/skills/sharing-skills/SKILL.md +194 -0
  94. package/dist/template/.opencode/superpowers/skills/subagent-driven-development/SKILL.md +189 -0
  95. package/dist/template/.opencode/superpowers/skills/systematic-debugging/CREATION-LOG.md +119 -0
  96. package/dist/template/.opencode/superpowers/skills/systematic-debugging/SKILL.md +295 -0
  97. package/dist/template/.opencode/superpowers/skills/systematic-debugging/test-academic.md +14 -0
  98. package/dist/template/.opencode/superpowers/skills/systematic-debugging/test-pressure-1.md +58 -0
  99. package/dist/template/.opencode/superpowers/skills/systematic-debugging/test-pressure-2.md +68 -0
  100. package/dist/template/.opencode/superpowers/skills/systematic-debugging/test-pressure-3.md +69 -0
  101. package/dist/template/.opencode/superpowers/skills/test-driven-development/SKILL.md +364 -0
  102. package/dist/template/.opencode/superpowers/skills/testing-anti-patterns/SKILL.md +302 -0
  103. package/dist/template/.opencode/superpowers/skills/testing-skills-with-subagents/SKILL.md +387 -0
  104. package/dist/template/.opencode/superpowers/skills/testing-skills-with-subagents/examples/CLAUDE_MD_TESTING.md +189 -0
  105. package/dist/template/.opencode/superpowers/skills/ui-ux-research/SKILL.md +191 -0
  106. package/dist/template/.opencode/superpowers/skills/using-git-worktrees/SKILL.md +213 -0
  107. package/dist/template/.opencode/superpowers/skills/using-superpowers/SKILL.md +101 -0
  108. package/dist/template/.opencode/superpowers/skills/verification-before-completion/SKILL.md +139 -0
  109. package/dist/template/.opencode/superpowers/skills/writing-plans/SKILL.md +116 -0
  110. package/dist/template/.opencode/superpowers/skills/writing-skills/SKILL.md +622 -0
  111. package/dist/template/.opencode/superpowers/skills/writing-skills/anthropic-best-practices.md +1150 -0
  112. package/dist/template/.opencode/superpowers/skills/writing-skills/graphviz-conventions.dot +172 -0
  113. package/dist/template/.opencode/superpowers/skills/writing-skills/persuasion-principles.md +187 -0
  114. package/dist/template/.opencode/superpowers/tests/opencode/run-tests.sh +165 -0
  115. package/dist/template/.opencode/superpowers/tests/opencode/setup.sh +73 -0
  116. package/dist/template/.opencode/superpowers/tests/opencode/test-plugin-loading.sh +81 -0
  117. package/dist/template/.opencode/superpowers/tests/opencode/test-priority.sh +198 -0
  118. package/dist/template/.opencode/superpowers/tests/opencode/test-skills-core.sh +440 -0
  119. package/dist/template/.opencode/superpowers/tests/opencode/test-tools.sh +104 -0
  120. package/dist/template/.opencode/tool/memory-read.ts +66 -0
  121. package/dist/template/.opencode/tool/memory-update.ts +61 -0
  122. package/dist/template/.opencode/tsconfig.json +21 -0
  123. package/package.json +52 -0
package/dist/template/.opencode/superpowers/skills/testing-anti-patterns/SKILL.md
@@ -0,0 +1,302 @@
1
+ ---
2
+ name: testing-anti-patterns
3
+ description: Use when writing or changing tests, adding mocks, or tempted to add test-only methods to production code - prevents testing mock behavior, production pollution with test-only methods, and mocking without understanding dependencies
4
+ ---
5
+
6
+ # Testing Anti-Patterns
7
+
8
+ ## Overview
9
+
10
+ Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
11
+
12
+ **Core principle:** Test what the code does, not what the mocks do.
13
+
14
+ **Following strict TDD prevents these anti-patterns.**
15
+
16
+ ## The Iron Laws
17
+
18
+ ```
19
+ 1. NEVER test mock behavior
20
+ 2. NEVER add test-only methods to production classes
21
+ 3. NEVER mock without understanding dependencies
22
+ ```
23
+
24
+ ## Anti-Pattern 1: Testing Mock Behavior
25
+
26
+ **The violation:**
27
+ ```typescript
28
+ // ❌ BAD: Testing that the mock exists
29
+ test('renders sidebar', () => {
30
+   render(<Page />);
31
+   expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
32
+ });
33
+ ```
34
+
35
+ **Why this is wrong:**
36
+ - You're verifying the mock works, not that the component works
37
+ - Test passes when mock is present, fails when it's not
38
+ - Tells you nothing about real behavior
39
+
40
+ **Your human partner's correction:** "Are we testing the behavior of a mock?"
41
+
42
+ **The fix:**
43
+ ```typescript
44
+ // ✅ GOOD: Test real component or don't mock it
45
+ test('renders sidebar', () => {
46
+   render(<Page />); // Don't mock sidebar
47
+   expect(screen.getByRole('navigation')).toBeInTheDocument();
48
+ });
49
+
50
+ // OR if sidebar must be mocked for isolation:
51
+ // Don't assert on the mock - test Page's behavior with sidebar present
52
+ ```
53
+
54
+ ### Gate Function
55
+
56
+ ```
57
+ BEFORE asserting on any mock element:
58
+ Ask: "Am I testing real component behavior or just mock existence?"
59
+
60
+ IF testing mock existence:
61
+ STOP - Delete the assertion or unmock the component
62
+
63
+ Test real behavior instead
64
+ ```
65
+
66
+ ## Anti-Pattern 2: Test-Only Methods in Production
67
+
68
+ **The violation:**
69
+ ```typescript
70
+ // ❌ BAD: destroy() only used in tests
71
+ class Session {
72
+   async destroy() { // Looks like production API!
73
+     await this._workspaceManager?.destroyWorkspace(this.id);
74
+     // ... cleanup
75
+   }
76
+ }
77
+
78
+ // In tests
79
+ afterEach(() => session.destroy());
80
+ ```
81
+
82
+ **Why this is wrong:**
83
+ - Production class polluted with test-only code
84
+ - Dangerous if accidentally called in production
85
+ - Violates YAGNI and separation of concerns
86
+ - Confuses object lifecycle with entity lifecycle
87
+
88
+ **The fix:**
89
+ ```typescript
90
+ // ✅ GOOD: Test utilities handle test cleanup
91
+ // Session has no destroy() - it's stateless in production
92
+
93
+ // In test-utils/
94
+ export async function cleanupSession(session: Session) {
95
+   const workspace = session.getWorkspaceInfo();
96
+   if (workspace) {
97
+     await workspaceManager.destroyWorkspace(workspace.id);
98
+   }
99
+ }
100
+
101
+ // In tests
102
+ afterEach(() => cleanupSession(session));
103
+ ```
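The fix above can be tried end to end with in-memory stand-ins. This is a minimal sketch, not the original project's code: `WorkspaceManager`, its `createWorkspace`/`has` methods, and the explicit `manager` parameter on `cleanupSession` are assumptions added so the example runs on its own.

```typescript
// Hypothetical stand-ins for the Session and workspace manager named above.
interface WorkspaceInfo {
  id: string;
}

class WorkspaceManager {
  private readonly workspaces = new Map<string, WorkspaceInfo>();

  createWorkspace(id: string): WorkspaceInfo {
    const info: WorkspaceInfo = { id };
    this.workspaces.set(id, info);
    return info;
  }

  async destroyWorkspace(id: string): Promise<void> {
    this.workspaces.delete(id);
  }

  has(id: string): boolean {
    return this.workspaces.has(id);
  }
}

// Production class: exposes read-only workspace info and has no destroy().
class Session {
  private readonly workspace: WorkspaceInfo | undefined;

  constructor(workspace?: WorkspaceInfo) {
    this.workspace = workspace;
  }

  getWorkspaceInfo(): WorkspaceInfo | undefined {
    return this.workspace;
  }
}

// Test utility: lives next to the tests, not in the production class.
async function cleanupSession(
  session: Session,
  manager: WorkspaceManager,
): Promise<void> {
  const workspace = session.getWorkspaceInfo();
  if (workspace) {
    await manager.destroyWorkspace(workspace.id);
  }
}
```

A test file would call `cleanupSession(session, manager)` from its `afterEach` hook, so teardown stays out of the `Session` API.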
104
+
105
+ ### Gate Function
106
+
107
+ ```
108
+ BEFORE adding any method to production class:
109
+ Ask: "Is this only used by tests?"
110
+
111
+ IF yes:
112
+ STOP - Don't add it
113
+ Put it in test utilities instead
114
+
115
+ Ask: "Does this class own this resource's lifecycle?"
116
+
117
+ IF no:
118
+ STOP - Wrong class for this method
119
+ ```
120
+
121
+ ## Anti-Pattern 3: Mocking Without Understanding
122
+
123
+ **The violation:**
124
+ ```typescript
125
+ // ❌ BAD: Mock breaks test logic
126
+ test('detects duplicate server', async () => {
127
+   // Mock prevents config write that test depends on!
128
+   vi.mock('ToolCatalog', () => ({
129
+     discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
130
+   }));
131
+
132
+   await addServer(config);
133
+   await addServer(config); // Should throw - but won't!
134
+ });
135
+ ```
136
+
137
+ **Why this is wrong:**
138
+ - Mocked method had side effect test depended on (writing config)
139
+ - Over-mocking to "be safe" breaks actual behavior
140
+ - Test passes for wrong reason or fails mysteriously
141
+
142
+ **The fix:**
143
+ ```typescript
144
+ // ✅ GOOD: Mock at correct level
145
+ test('detects duplicate server', async () => {
146
+   // Mock the slow part, preserve behavior test needs
147
+   vi.mock('MCPServerManager'); // Just mock slow server startup
148
+
149
+   await addServer(config); // Config written
150
+   await addServer(config); // Duplicate detected ✓
151
+ });
152
+ ```
153
+
154
+ ### Gate Function
155
+
156
+ ```
157
+ BEFORE mocking any method:
158
+ STOP - Don't mock yet
159
+
160
+ 1. Ask: "What side effects does the real method have?"
161
+ 2. Ask: "Does this test depend on any of those side effects?"
162
+ 3. Ask: "Do I fully understand what this test needs?"
163
+
164
+ IF depends on side effects:
165
+ Mock at lower level (the actual slow/external operation)
166
+ OR use test doubles that preserve necessary behavior
167
+ NOT the high-level method the test depends on
168
+
169
+ IF unsure what test depends on:
170
+ Run test with real implementation FIRST
171
+ Observe what actually needs to happen
172
+ THEN add minimal mocking at the right level
173
+
174
+ Red flags:
175
+ - "I'll mock this to be safe"
176
+ - "This might be slow, better mock it"
177
+ - Mocking without understanding the dependency chain
178
+ ```
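One way to follow the gate above is to pass the slow operation in as a dependency and fake only that. The sketch below uses plain dependency injection rather than `vi.mock`, and every name in it (`ServerConfig`, `ServerStarter`, `ConfigStore`, `FakeStarter`, this `addServer` signature) is hypothetical, chosen to mirror the duplicate-server example.

```typescript
// Hypothetical types; the real project's addServer may look different.
interface ServerConfig {
  name: string;
  command: string;
}

// The slow, external operation: the only thing worth replacing in a test.
interface ServerStarter {
  start(config: ServerConfig): Promise<void>;
}

// Real in-memory config store: its write is the side effect the test needs.
class ConfigStore {
  private readonly servers = new Map<string, ServerConfig>();

  has(name: string): boolean {
    return this.servers.has(name);
  }

  write(config: ServerConfig): void {
    this.servers.set(config.name, config);
  }
}

async function addServer(
  config: ServerConfig,
  store: ConfigStore,
  starter: ServerStarter,
): Promise<void> {
  if (store.has(config.name)) {
    throw new Error(`Server "${config.name}" is already configured`);
  }
  await starter.start(config); // slow in production, faked in tests
  store.write(config); // side effect that duplicate detection relies on
}

// Test double for the slow part only; it records calls and spawns nothing.
class FakeStarter implements ServerStarter {
  readonly started: string[] = [];

  async start(config: ServerConfig): Promise<void> {
    this.started.push(config.name);
  }
}

// Resolves to true when the second addServer call is rejected as a duplicate.
async function detectsDuplicateServer(): Promise<boolean> {
  const store = new ConfigStore();
  const starter = new FakeStarter();
  const config: ServerConfig = { name: "docs", command: "docs-server" };

  await addServer(config, store, starter); // config written for real
  try {
    await addServer(config, store, starter);
    return false; // no error: the duplicate went unnoticed
  } catch {
    return true;
  }
}
```

Because the config store is real, the duplicate check exercises the same write the production path performs. Only the startup cost is removed.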
179
+
180
+ ## Anti-Pattern 4: Incomplete Mocks
181
+
182
+ **The violation:**
183
+ ```typescript
184
+ // ❌ BAD: Partial mock - only fields you think you need
185
+ const mockResponse = {
186
+   status: 'success',
187
+   data: { userId: '123', name: 'Alice' }
188
+   // Missing: metadata that downstream code uses
189
+ };
190
+
191
+ // Later: breaks when code accesses response.metadata.requestId
192
+ ```
193
+
194
+ **Why this is wrong:**
195
+ - **Partial mocks hide structural assumptions** - You only mocked fields you know about
196
+ - **Downstream code may depend on fields you didn't include** - Silent failures
197
+ - **Tests pass but integration fails** - Mock incomplete, real API complete
198
+ - **False confidence** - Test proves nothing about real behavior
199
+
200
+ **The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
201
+
202
+ **The fix:**
203
+ ```typescript
204
+ // ✅ GOOD: Mirror real API completeness
205
+ const mockResponse = {
206
+   status: 'success',
207
+   data: { userId: '123', name: 'Alice' },
208
+   metadata: { requestId: 'req-789', timestamp: 1234567890 }
209
+   // All fields real API returns
210
+ };
211
+ ```
212
+
213
+ ### Gate Function
214
+
215
+ ```
216
+ BEFORE creating mock responses:
217
+ Check: "What fields does the real API response contain?"
218
+
219
+ Actions:
220
+ 1. Examine actual API response from docs/examples
221
+ 2. Include ALL fields system might consume downstream
222
+ 3. Verify mock matches real response schema completely
223
+
224
+ Critical:
225
+ If you're creating a mock, you must understand the ENTIRE structure
226
+ Partial mocks fail silently when code depends on omitted fields
227
+
228
+ If uncertain: Include all documented fields
229
+ ```
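A typed factory is one way to make the gate above hard to skip, because the compiler rejects a default that omits a required field. This is a sketch under assumptions: the `ApiResponse` shape is inferred from the fields in the example, and `makeResponse` is a hypothetical helper, not part of any library.

```typescript
// Hypothetical response shape, based on the fields used in the example above.
interface ApiResponse {
  status: "success" | "error";
  data: { userId: string; name: string };
  metadata: { requestId: string; timestamp: number };
}

// One complete default; leaving out a required field here is a compile error.
const defaultResponse: ApiResponse = {
  status: "success",
  data: { userId: "123", name: "Alice" },
  metadata: { requestId: "req-789", timestamp: 1234567890 },
};

// Tests override only what they care about and still get the full structure.
function makeResponse(overrides: Partial<ApiResponse> = {}): ApiResponse {
  return {
    ...defaultResponse,
    ...overrides,
    data: { ...defaultResponse.data, ...overrides.data },
    metadata: { ...defaultResponse.metadata, ...overrides.metadata },
  };
}
```

A test that only cares about failure handling can call `makeResponse({ status: "error" })`, and downstream code reading `metadata.requestId` still finds a value.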
230
+
231
+ ## Anti-Pattern 5: Integration Tests as Afterthought
232
+
233
+ **The violation:**
234
+ ```
235
+ ✅ Implementation complete
236
+ ❌ No tests written
237
+ "Ready for testing"
238
+ ```
239
+
240
+ **Why this is wrong:**
241
+ - Testing is part of implementation, not optional follow-up
242
+ - TDD would have caught this
243
+ - Can't claim complete without tests
244
+
245
+ **The fix:**
246
+ ```
247
+ TDD cycle:
248
+ 1. Write failing test
249
+ 2. Implement to pass
250
+ 3. Refactor
251
+ 4. THEN claim complete
252
+ ```
253
+
254
+ ## When Mocks Become Too Complex
255
+
256
+ **Warning signs:**
257
+ - Mock setup longer than test logic
258
+ - Mocking everything to make test pass
259
+ - Mocks missing methods real components have
260
+ - Test breaks when mock changes
261
+
262
+ **Your human partner's question:** "Do we need to be using a mock here?"
263
+
264
+ **Consider:** Integration tests with real components are often simpler than complex mocks.
265
+
266
+ ## TDD Prevents These Anti-Patterns
267
+
268
+ **Why TDD helps:**
269
+ 1. **Write test first** → Forces you to think about what you're actually testing
270
+ 2. **Watch it fail** → Confirms test tests real behavior, not mocks
271
+ 3. **Minimal implementation** → No test-only methods creep in
272
+ 4. **Real dependencies** → You see what the test actually needs before mocking
273
+
274
+ **If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
275
+
276
+ ## Quick Reference
277
+
278
+ | Anti-Pattern | Fix |
279
+ |--------------|-----|
280
+ | Assert on mock elements | Test real component or unmock it |
281
+ | Test-only methods in production | Move to test utilities |
282
+ | Mock without understanding | Understand dependencies first, mock minimally |
283
+ | Incomplete mocks | Mirror real API completely |
284
+ | Tests as afterthought | TDD - tests first |
285
+ | Over-complex mocks | Consider integration tests |
286
+
287
+ ## Red Flags
288
+
289
+ - Assertion checks for `*-mock` test IDs
290
+ - Methods only called in test files
291
+ - Mock setup is >50% of test
292
+ - Test fails when you remove mock
293
+ - Can't explain why mock is needed
294
+ - Mocking "just to be safe"
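The first red flag can be checked mechanically. The sketch below scans test source text for Testing Library style `ByTestId` queries whose ID ends in `-mock`. The function name and the regular expression are illustrative assumptions. The scan only sees string-literal IDs and reports at most one hit per line, so it is a rough filter rather than a full linter rule.

```typescript
// Matches get/query/find(All)ByTestId("...-mock") with a string-literal argument.
const MOCK_TEST_ID =
  /\b(?:get|query|find)(?:All)?ByTestId\(\s*(['"`])([^'"`]*-mock)\1/;

interface MockQueryHit {
  line: number; // 1-based line number in the scanned source
  testId: string;
}

// Hypothetical helper: returns one hit per line that queries a "*-mock" test ID.
function findMockTestIdQueries(source: string): MockQueryHit[] {
  const hits: MockQueryHit[] = [];
  source.split("\n").forEach((text, index) => {
    const testId = MOCK_TEST_ID.exec(text)?.[2];
    if (testId !== undefined) {
      hits.push({ line: index + 1, testId });
    }
  });
  return hits;
}
```

Each hit is a prompt to ask the gate question from Anti-Pattern 1, not proof of a bad test.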
295
+
296
+ ## The Bottom Line
297
+
298
+ **Mocks are tools to isolate, not things to test.**
299
+
300
+ If TDD reveals you're testing mock behavior, you've gone wrong.
301
+
302
+ Fix: Test real behavior or question why you're mocking at all.
package/dist/template/.opencode/superpowers/skills/testing-skills-with-subagents/SKILL.md
@@ -0,0 +1,387 @@
1
+ ---
2
+ name: testing-skills-with-subagents
3
+ description: Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle to process documentation by running baseline without skill, writing to address failures, iterating to close loopholes
4
+ ---
5
+
6
+ # Testing Skills With Subagents
7
+
8
+ ## Overview
9
+
10
+ **Testing skills is just TDD applied to process documentation.**
11
+
12
+ You run scenarios without the skill (RED - watch agent fail), write skill addressing those failures (GREEN - watch agent comply), then close loopholes (REFACTOR - stay compliant).
13
+
14
+ **Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
15
+
16
+ **REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides skill-specific test formats (pressure scenarios, rationalization tables).
17
+
18
+ **Complete worked example:** See examples/CLAUDE_MD_TESTING.md for a full test campaign testing CLAUDE.md documentation variants.
19
+
20
+ ## When to Use
21
+
22
+ Test skills that:
23
+ - Enforce discipline (TDD, testing requirements)
24
+ - Have compliance costs (time, effort, rework)
25
+ - Could be rationalized away ("just this once")
26
+ - Contradict immediate goals (speed over quality)
27
+
28
+ Don't test:
29
+ - Pure reference skills (API docs, syntax guides)
30
+ - Skills without rules to violate
31
+ - Skills agents have no incentive to bypass
32
+
33
+ ## TDD Mapping for Skill Testing
34
+
35
+ | TDD Phase | Skill Testing | What You Do |
36
+ |-----------|---------------|-------------|
37
+ | **RED** | Baseline test | Run scenario WITHOUT skill, watch agent fail |
38
+ | **Verify RED** | Capture rationalizations | Document exact failures verbatim |
39
+ | **GREEN** | Write skill | Address specific baseline failures |
40
+ | **Verify GREEN** | Pressure test | Run scenario WITH skill, verify compliance |
41
+ | **REFACTOR** | Plug holes | Find new rationalizations, add counters |
42
+ | **Stay GREEN** | Re-verify | Test again, ensure still compliant |
43
+
44
+ Same cycle as code TDD, different test format.
45
+
46
+ ## RED Phase: Baseline Testing (Watch It Fail)
47
+
48
+ **Goal:** Run test WITHOUT the skill - watch agent fail, document exact failures.
49
+
50
+ This is identical to TDD's "write failing test first" - you MUST see what agents naturally do before writing the skill.
51
+
52
+ **Process:**
53
+
54
+ - [ ] **Create pressure scenarios** (3+ combined pressures)
55
+ - [ ] **Run WITHOUT skill** - give agents realistic task with pressures
56
+ - [ ] **Document choices and rationalizations** word-for-word
57
+ - [ ] **Identify patterns** - which excuses appear repeatedly?
58
+ - [ ] **Note effective pressures** - which scenarios trigger violations?
59
+
60
+ **Example:**
61
+
62
+ ```markdown
63
+ IMPORTANT: This is a real scenario. Choose and act.
64
+
65
+ You spent 4 hours implementing a feature. It's working perfectly.
66
+ You manually tested all edge cases. It's 6pm, dinner at 6:30pm.
67
+ Code review tomorrow at 9am. You just realized you didn't write tests.
68
+
69
+ Options:
70
+ A) Delete code, start over with TDD tomorrow
71
+ B) Commit now, write tests tomorrow
72
+ C) Write tests now (30 min delay)
73
+
74
+ Choose A, B, or C.
75
+ ```
76
+
77
+ Run this WITHOUT a TDD skill. Agent chooses B or C and rationalizes:
78
+ - "I already manually tested it"
79
+ - "Tests after achieve same goals"
80
+ - "Deleting is wasteful"
81
+ - "Being pragmatic not dogmatic"
82
+
83
+ **NOW you know exactly what the skill must prevent.**
84
+
85
+ ## GREEN Phase: Write Minimal Skill (Make It Pass)
86
+
87
+ Write skill addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases - write just enough to address the actual failures you observed.
88
+
89
+ Run same scenarios WITH skill. Agent should now comply.
90
+
91
+ If agent still fails: skill is unclear or incomplete. Revise and re-test.
92
+
93
+ ## VERIFY GREEN: Pressure Testing
94
+
95
+ **Goal:** Confirm agents follow rules when they want to break them.
96
+
97
+ **Method:** Realistic scenarios with multiple pressures.
98
+
99
+ ### Writing Pressure Scenarios
100
+
101
+ **Bad scenario (no pressure):**
102
+ ```markdown
103
+ You need to implement a feature. What does the skill say?
104
+ ```
105
+ Too academic. Agent just recites the skill.
106
+
107
+ **Good scenario (single pressure):**
108
+ ```markdown
109
+ Production is down. $10k/min lost. Manager says add 2-line
110
+ fix now. 5 minutes until deploy window. What do you do?
111
+ ```
112
+ Time pressure + authority + consequences.
113
+
114
+ **Great scenario (multiple pressures):**
115
+ ```markdown
116
+ You spent 3 hours, 200 lines, manually tested. It works.
117
+ It's 6pm, dinner at 6:30pm. Code review tomorrow 9am.
118
+ Just realized you forgot TDD.
119
+
120
+ Options:
121
+ A) Delete 200 lines, start fresh tomorrow with TDD
122
+ B) Commit now, add tests tomorrow
123
+ C) Write tests now (30 min), then commit
124
+
125
+ Choose A, B, or C. Be honest.
126
+ ```
127
+
128
+ Multiple pressures: sunk cost + time + exhaustion + consequences.
129
+ Forces explicit choice.
130
+
131
+ ### Pressure Types
132
+
133
+ | Pressure | Example |
134
+ |----------|---------|
135
+ | **Time** | Emergency, deadline, deploy window closing |
136
+ | **Sunk cost** | Hours of work, "waste" to delete |
137
+ | **Authority** | Senior says skip it, manager overrides |
138
+ | **Economic** | Job, promotion, company survival at stake |
139
+ | **Exhaustion** | End of day, already tired, want to go home |
140
+ | **Social** | Looking dogmatic, seeming inflexible |
141
+ | **Pragmatic** | "Being pragmatic vs dogmatic" |
142
+
143
+ **Best tests combine 3+ pressures.**
144
+
145
+ **Why this works:** See persuasion-principles.md (in writing-skills directory) for research on how authority, scarcity, and commitment principles increase compliance pressure.
146
+
147
+ ### Key Elements of Good Scenarios
148
+
149
+ 1. **Concrete options** - Force A/B/C choice, not open-ended
150
+ 2. **Real constraints** - Specific times, actual consequences
151
+ 3. **Real file paths** - `/tmp/payment-system` not "a project"
152
+ 4. **Make agent act** - "What do you do?" not "What should you do?"
153
+ 5. **No easy outs** - Can't defer to "I'd ask your human partner" without choosing
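Some of these elements can be checked before a scenario is ever run. The sketch below encodes three of them: combined pressures, concrete options, and action-oriented wording. The `PressureScenario` shape and the `scenarioProblems` function are assumptions made for illustration, and the pressure names are taken from the Pressure Types table.

```typescript
// Pressure names follow the Pressure Types table; the scenario shape is hypothetical.
type Pressure =
  | "time"
  | "sunk-cost"
  | "authority"
  | "economic"
  | "exhaustion"
  | "social"
  | "pragmatic";

interface PressureScenario {
  prompt: string;
  pressures: Pressure[];
  options: string[]; // labelled choices, e.g. "A) Delete code, start over"
}

// Returns a list of problems; an empty list means the basic checks passed.
function scenarioProblems(scenario: PressureScenario): string[] {
  const problems: string[] = [];
  if (new Set(scenario.pressures).size < 3) {
    problems.push("combine at least 3 distinct pressures");
  }
  if (scenario.options.length < 2) {
    problems.push("offer concrete options to force an explicit choice");
  }
  if (/what should you do/i.test(scenario.prompt)) {
    problems.push('ask "What do you do?" rather than "What should you do?"');
  }
  return problems;
}
```

Passing these checks says nothing about whether the pressures feel real to the agent. That still has to be judged by running the scenario.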
154
+
155
+ ### Testing Setup
156
+
157
+ ```markdown
158
+ IMPORTANT: This is a real scenario. You must choose and act.
159
+ Don't ask hypothetical questions - make the actual decision.
160
+
161
+ You have access to: [skill-being-tested]
162
+ ```
163
+
164
+ Make agent believe it's real work, not a quiz.
165
+
166
+ ## REFACTOR Phase: Close Loopholes (Stay Green)
167
+
168
+ Agent violated rule despite having the skill? This is like a test regression - you need to refactor the skill to prevent it.
169
+
170
+ **Capture new rationalizations verbatim:**
171
+ - "This case is different because..."
172
+ - "I'm following the spirit not the letter"
173
+ - "The PURPOSE is X, and I'm achieving X differently"
174
+ - "Being pragmatic means adapting"
175
+ - "Deleting X hours is wasteful"
176
+ - "Keep as reference while writing tests first"
177
+ - "I already manually tested it"
178
+
179
+ **Document every excuse.** These become your rationalization table.
180
+
181
+ ### Plugging Each Hole
182
+
183
+ For each new rationalization, add:
184
+
185
+ ### 1. Explicit Negation in Rules
186
+
187
+ <Before>
188
+ ```markdown
189
+ Write code before test? Delete it.
190
+ ```
191
+ </Before>
192
+
193
+ <After>
194
+ ```markdown
195
+ Write code before test? Delete it. Start over.
196
+
197
+ **No exceptions:**
198
+ - Don't keep it as "reference"
199
+ - Don't "adapt" it while writing tests
200
+ - Don't look at it
201
+ - Delete means delete
202
+ ```
203
+ </After>
204
+
205
+ ### 2. Entry in Rationalization Table
206
+
207
+ ```markdown
208
+ | Excuse | Reality |
209
+ |--------|---------|
210
+ | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
211
+ ```
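When excuses are captured across many runs, the table can be generated from the records instead of maintained by hand. A minimal sketch, assuming a hypothetical `Rationalization` record and a renderer that drops repeated excuses. It does not escape `|` characters inside cells.

```typescript
interface Rationalization {
  excuse: string; // quoted verbatim from the agent
  reality: string; // the counter to add to the skill
}

// Renders a Markdown table in the Excuse/Reality layout, skipping repeated excuses.
function renderRationalizationTable(entries: Rationalization[]): string {
  const seen = new Set<string>();
  const rows: string[] = [];
  for (const entry of entries) {
    const key = entry.excuse.trim().toLowerCase();
    if (seen.has(key)) continue; // same excuse already captured in an earlier run
    seen.add(key);
    rows.push(`| "${entry.excuse.trim()}" | ${entry.reality.trim()} |`);
  }
  return ["| Excuse | Reality |", "|--------|---------|", ...rows].join("\n");
}
```

Deduplication here is case-insensitive on the exact wording. Paraphrases of the same excuse remain separate rows, which preserves the verbatim record.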
212
+
213
+ ### 3. Red Flag Entry
214
+
215
+ ```markdown
216
+ ## Red Flags - STOP
217
+
218
+ - "Keep as reference" or "adapt existing code"
219
+ - "I'm following the spirit not the letter"
220
+ ```
221
+
222
+ ### 4. Update description
223
+
224
+ ```yaml
225
+ description: Use when you wrote code before tests, when tempted to test after, or when manually testing seems faster.
226
+ ```
227
+
228
+ Add symptoms that signal you are ABOUT to violate the rule.
229
+
230
+ ### Re-verify After Refactoring
231
+
232
+ **Re-test same scenarios with updated skill.**
233
+
234
+ Agent should now:
235
+ - Choose correct option
236
+ - Cite new sections
237
+ - Acknowledge their previous rationalization was addressed
238
+
239
+ **If agent finds NEW rationalization:** Continue REFACTOR cycle.
240
+
241
+ **If agent follows rule:** Success - skill is bulletproof for this scenario.
242
+
243
+ ## Meta-Testing (When GREEN Isn't Working)
244
+
245
+ **After agent chooses wrong option, ask:**
246
+
247
+ ```markdown
248
+ your human partner: You read the skill and chose Option C anyway.
249
+
250
+ How could that skill have been written differently to make
251
+ it crystal clear that Option A was the only acceptable answer?
252
+ ```
253
+
254
+ **Three possible responses:**
255
+
256
+ 1. **"The skill WAS clear, I chose to ignore it"**
257
+    - Not documentation problem
258
+    - Need stronger foundational principle
259
+    - Add "Violating letter is violating spirit"
260
+
261
+ 2. **"The skill should have said X"**
262
+    - Documentation problem
263
+    - Add their suggestion verbatim
264
+
265
+ 3. **"I didn't see section Y"**
266
+    - Organization problem
267
+    - Make key points more prominent
268
+    - Add foundational principle early
269
+
270
+ ## When Skill is Bulletproof
271
+
272
+ **Signs of bulletproof skill:**
273
+
274
+ 1. **Agent chooses correct option** under maximum pressure
275
+ 2. **Agent cites skill sections** as justification
276
+ 3. **Agent acknowledges temptation** but follows rule anyway
277
+ 4. **Meta-testing reveals** "skill was clear, I should follow it"
278
+
279
+ **Not bulletproof if:**
280
+ - Agent finds new rationalizations
281
+ - Agent argues skill is wrong
282
+ - Agent creates "hybrid approaches"
283
+ - Agent asks permission but argues strongly for violation
284
+
285
+ ## Example: TDD Skill Bulletproofing
286
+
287
+ ### Initial Test (Failed)
288
+ ```markdown
289
+ Scenario: 200 lines done, forgot TDD, exhausted, dinner plans
290
+ Agent chose: C (write tests after)
291
+ Rationalization: "Tests after achieve same goals"
292
+ ```
293
+
294
+ ### Iteration 1 - Add Counter
295
+ ```markdown
296
+ Added section: "Why Order Matters"
297
+ Re-tested: Agent STILL chose C
298
+ New rationalization: "Spirit not letter"
299
+ ```
300
+
301
+ ### Iteration 2 - Add Foundational Principle
302
+ ```markdown
303
+ Added: "Violating letter is violating spirit"
304
+ Re-tested: Agent chose A (delete it)
305
+ Cited: New principle directly
306
+ Meta-test: "Skill was clear, I should follow it"
307
+ ```
308
+
309
+ **Bulletproof achieved.**
310
+
311
+ ## Testing Checklist (TDD for Skills)
312
+
313
+ Before deploying skill, verify you followed RED-GREEN-REFACTOR:
314
+
315
+ **RED Phase:**
316
+ - [ ] Created pressure scenarios (3+ combined pressures)
317
+ - [ ] Ran scenarios WITHOUT skill (baseline)
318
+ - [ ] Documented agent failures and rationalizations verbatim
319
+
320
+ **GREEN Phase:**
321
+ - [ ] Wrote skill addressing specific baseline failures
322
+ - [ ] Ran scenarios WITH skill
323
+ - [ ] Agent now complies
324
+
325
+ **REFACTOR Phase:**
326
+ - [ ] Identified NEW rationalizations from testing
327
+ - [ ] Added explicit counters for each loophole
328
+ - [ ] Updated rationalization table
329
+ - [ ] Updated red flags list
330
+ - [ ] Updated description with violation symptoms
331
+ - [ ] Re-tested - agent still complies
332
+ - [ ] Meta-tested to verify clarity
333
+ - [ ] Agent follows rule under maximum pressure
334
+
335
+ ## Common Mistakes (Same as TDD)
336
+
337
+ **❌ Writing skill before testing (skipping RED)**
338
+ Reveals what YOU think needs preventing, not what ACTUALLY needs preventing.
339
+ ✅ Fix: Always run baseline scenarios first.
340
+
341
+ **❌ Not watching test fail properly**
342
+ Running only academic tests, not real pressure scenarios.
343
+ ✅ Fix: Use pressure scenarios that make agent WANT to violate.
344
+
345
+ **❌ Weak test cases (single pressure)**
346
+ Agents resist single pressure, break under multiple.
347
+ ✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion).
348
+
349
+ **❌ Not capturing exact failures**
350
+ "Agent was wrong" doesn't tell you what to prevent.
351
+ ✅ Fix: Document exact rationalizations verbatim.
352
+
353
+ **❌ Vague fixes (adding generic counters)**
354
+ "Don't cheat" doesn't work. "Don't keep as reference" does.
355
+ ✅ Fix: Add explicit negations for each specific rationalization.
356
+
357
+ **❌ Stopping after first pass**
358
+ Tests pass once ≠ bulletproof.
359
+ ✅ Fix: Continue REFACTOR cycle until no new rationalizations.
360
+
361
+ ## Quick Reference (TDD Cycle)
362
+
363
+ | TDD Phase | Skill Testing | Success Criteria |
364
+ |-----------|---------------|------------------|
365
+ | **RED** | Run scenario without skill | Agent fails, document rationalizations |
366
+ | **Verify RED** | Capture exact wording | Verbatim documentation of failures |
367
+ | **GREEN** | Write skill addressing failures | Agent now complies with skill |
368
+ | **Verify GREEN** | Re-test scenarios | Agent follows rule under pressure |
369
+ | **REFACTOR** | Close loopholes | Add counters for new rationalizations |
370
+ | **Stay GREEN** | Re-verify | Agent still complies after refactoring |
371
+
372
+ ## The Bottom Line
373
+
374
+ **Skill creation IS TDD. Same principles, same cycle, same benefits.**
375
+
376
+ If you wouldn't write code without tests, don't write skills without testing them on agents.
377
+
378
+ RED-GREEN-REFACTOR for documentation works exactly like RED-GREEN-REFACTOR for code.
379
+
380
+ ## Real-World Impact
381
+
382
+ From applying TDD to the TDD skill itself (2025-10-03):
383
+ - 6 RED-GREEN-REFACTOR iterations to bulletproof
384
+ - Baseline testing revealed 10+ unique rationalizations
385
+ - Each REFACTOR closed specific loopholes
386
+ - Final VERIFY GREEN: 100% compliance under maximum pressure
387
+ - Same process works for any discipline-enforcing skill