opencode-swarm-plugin 0.22.0 → 0.23.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. package/.turbo/turbo-build.log +9 -0
  2. package/CHANGELOG.md +20 -0
  3. package/README.md +109 -429
  4. package/dist/agent-mail.d.ts +480 -0
  5. package/dist/agent-mail.d.ts.map +1 -0
  6. package/dist/anti-patterns.d.ts +257 -0
  7. package/dist/anti-patterns.d.ts.map +1 -0
  8. package/dist/beads.d.ts +377 -0
  9. package/dist/beads.d.ts.map +1 -0
  10. package/dist/eval-capture.d.ts +206 -0
  11. package/dist/eval-capture.d.ts.map +1 -0
  12. package/dist/index.d.ts +1299 -0
  13. package/dist/index.d.ts.map +1 -0
  14. package/dist/index.js +498 -4246
  15. package/dist/learning.d.ts +670 -0
  16. package/dist/learning.d.ts.map +1 -0
  17. package/dist/mandate-promotion.d.ts +93 -0
  18. package/dist/mandate-promotion.d.ts.map +1 -0
  19. package/dist/mandate-storage.d.ts +209 -0
  20. package/dist/mandate-storage.d.ts.map +1 -0
  21. package/dist/mandates.d.ts +230 -0
  22. package/dist/mandates.d.ts.map +1 -0
  23. package/dist/output-guardrails.d.ts +125 -0
  24. package/dist/output-guardrails.d.ts.map +1 -0
  25. package/dist/pattern-maturity.d.ts +246 -0
  26. package/dist/pattern-maturity.d.ts.map +1 -0
  27. package/dist/plugin.d.ts +22 -0
  28. package/dist/plugin.d.ts.map +1 -0
  29. package/dist/plugin.js +493 -4241
  30. package/dist/rate-limiter.d.ts +218 -0
  31. package/dist/rate-limiter.d.ts.map +1 -0
  32. package/dist/repo-crawl.d.ts +146 -0
  33. package/dist/repo-crawl.d.ts.map +1 -0
  34. package/dist/schemas/bead.d.ts +255 -0
  35. package/dist/schemas/bead.d.ts.map +1 -0
  36. package/dist/schemas/evaluation.d.ts +161 -0
  37. package/dist/schemas/evaluation.d.ts.map +1 -0
  38. package/dist/schemas/index.d.ts +34 -0
  39. package/dist/schemas/index.d.ts.map +1 -0
  40. package/dist/schemas/mandate.d.ts +336 -0
  41. package/dist/schemas/mandate.d.ts.map +1 -0
  42. package/dist/schemas/swarm-context.d.ts +131 -0
  43. package/dist/schemas/swarm-context.d.ts.map +1 -0
  44. package/dist/schemas/task.d.ts +188 -0
  45. package/dist/schemas/task.d.ts.map +1 -0
  46. package/dist/skills.d.ts +471 -0
  47. package/dist/skills.d.ts.map +1 -0
  48. package/dist/storage.d.ts +260 -0
  49. package/dist/storage.d.ts.map +1 -0
  50. package/dist/structured.d.ts +196 -0
  51. package/dist/structured.d.ts.map +1 -0
  52. package/dist/swarm-decompose.d.ts +201 -0
  53. package/dist/swarm-decompose.d.ts.map +1 -0
  54. package/dist/swarm-mail.d.ts +240 -0
  55. package/dist/swarm-mail.d.ts.map +1 -0
  56. package/dist/swarm-orchestrate.d.ts +708 -0
  57. package/dist/swarm-orchestrate.d.ts.map +1 -0
  58. package/dist/swarm-prompts.d.ts +292 -0
  59. package/dist/swarm-prompts.d.ts.map +1 -0
  60. package/dist/swarm-strategies.d.ts +100 -0
  61. package/dist/swarm-strategies.d.ts.map +1 -0
  62. package/dist/swarm.d.ts +455 -0
  63. package/dist/swarm.d.ts.map +1 -0
  64. package/dist/tool-availability.d.ts +91 -0
  65. package/dist/tool-availability.d.ts.map +1 -0
  66. package/docs/planning/ADR-001-monorepo-structure.md +171 -0
  67. package/docs/planning/ADR-002-package-extraction.md +393 -0
  68. package/docs/planning/ADR-003-performance-improvements.md +451 -0
  69. package/docs/planning/ADR-004-message-queue-features.md +187 -0
  70. package/docs/planning/ADR-005-devtools-observability.md +202 -0
  71. package/docs/planning/ROADMAP.md +368 -0
  72. package/package.json +13 -24
  73. package/src/agent-mail.ts +1 -1
  74. package/src/beads.ts +1 -2
  75. package/src/index.ts +2 -2
  76. package/src/learning.integration.test.ts +66 -11
  77. package/src/mandate-storage.test.ts +3 -3
  78. package/src/storage.ts +78 -10
  79. package/src/swarm-mail.ts +3 -3
  80. package/src/swarm-orchestrate.ts +7 -7
  81. package/src/tool-availability.ts +1 -1
  82. package/tsconfig.json +1 -1
  83. package/.beads/.local_version +0 -1
  84. package/.beads/README.md +0 -81
  85. package/.beads/analysis/skill-architecture-meta-skills.md +0 -1562
  86. package/.beads/config.yaml +0 -62
  87. package/.beads/issues.jsonl +0 -2197
  88. package/.beads/metadata.json +0 -4
  89. package/.gitattributes +0 -3
  90. package/.github/workflows/ci.yml +0 -30
  91. package/.github/workflows/opencode.yml +0 -31
  92. package/.opencode/skills/tdd/SKILL.md +0 -182
  93. package/INTEGRATION_EXAMPLE.md +0 -66
  94. package/VERIFICATION_QUALITY_PATTERNS.md +0 -565
  95. package/bun.lock +0 -286
  96. package/dist/pglite.data +0 -0
  97. package/dist/pglite.wasm +0 -0
  98. package/src/streams/agent-mail.test.ts +0 -777
  99. package/src/streams/agent-mail.ts +0 -535
  100. package/src/streams/debug.test.ts +0 -500
  101. package/src/streams/debug.ts +0 -727
  102. package/src/streams/effect/ask.integration.test.ts +0 -314
  103. package/src/streams/effect/ask.ts +0 -202
  104. package/src/streams/effect/cursor.integration.test.ts +0 -418
  105. package/src/streams/effect/cursor.ts +0 -288
  106. package/src/streams/effect/deferred.test.ts +0 -357
  107. package/src/streams/effect/deferred.ts +0 -445
  108. package/src/streams/effect/index.ts +0 -17
  109. package/src/streams/effect/layers.ts +0 -73
  110. package/src/streams/effect/lock.test.ts +0 -385
  111. package/src/streams/effect/lock.ts +0 -399
  112. package/src/streams/effect/mailbox.test.ts +0 -260
  113. package/src/streams/effect/mailbox.ts +0 -318
  114. package/src/streams/events.test.ts +0 -924
  115. package/src/streams/events.ts +0 -329
  116. package/src/streams/index.test.ts +0 -229
  117. package/src/streams/index.ts +0 -578
  118. package/src/streams/migrations.test.ts +0 -359
  119. package/src/streams/migrations.ts +0 -362
  120. package/src/streams/projections.test.ts +0 -611
  121. package/src/streams/projections.ts +0 -504
  122. package/src/streams/store.integration.test.ts +0 -658
  123. package/src/streams/store.ts +0 -1075
  124. package/src/streams/swarm-mail.ts +0 -552
  125. package/test-bug-fixes.ts +0 -86
  126. package/vitest.integration.config.ts +0 -19
  127. package/vitest.integration.setup.ts +0 -48
  128. package/workflow-integration-analysis.md +0 -876
@@ -1,1562 +0,0 @@
1
- # Skill Architecture & Meta-Skills Analysis
2
-
3
- **Source:** obra/superpowers repository (writing-skills, testing-skills-with-subagents, skills-core.js)
4
- **Date:** 2025-12-13
5
- **Analyzed by:** Swarm Agent (bead: opencode-swarm-plugin-v737h.4)
6
-
7
- ---
8
-
9
- ## Executive Summary
10
-
11
- Skills are **TDD applied to process documentation**. The fundamental insight: if you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
12
-
13
- **Core workflow:** RED (baseline test without skill) → GREEN (write skill addressing failures) → REFACTOR (close loopholes).
14
-
15
- **Three pillars:**
16
-
17
- 1. **CSO (Claude Search Optimization)** - Rich descriptions, keyword coverage, trigger-focused discovery
18
- 2. **TDD for Documentation** - Test scenarios with subagents, pressure testing, rationalization capture
19
- 3. **Bulletproofing** - Close loopholes, address "spirit vs letter", build rationalization tables
20
-
21
- ---
22
-
23
- ## 1. Core Principles
24
-
25
- ### 1.1 Foundational Principles
26
-
27
- 1. **If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.**
28
- - Run baseline (RED) before writing skill
29
- - Document exact rationalizations verbatim
30
- - Write skill addressing specific observed failures
31
-
32
- 2. **Writing skills IS Test-Driven Development applied to process documentation.**
33
- - Same RED-GREEN-REFACTOR cycle as code
34
- - Tests = pressure scenarios with subagents
35
- - Production code = SKILL.md document
36
-
37
- 3. **Violating the letter of the rules is violating the spirit of the rules.**
38
- - Cuts off entire class of "I'm following the spirit" rationalizations
39
- - Foundational principle that should appear early in discipline-enforcing skills
40
-
41
- 4. **The context window is a public good.**
42
- - Only metadata (name + description) pre-loaded for all skills
43
- - SKILL.md loaded only when triggered
44
- - Additional files loaded progressively as needed
45
- - Being concise still matters once loaded
46
-
47
- 5. **One excellent example beats many mediocre ones.**
48
- - Choose most relevant language for domain
49
- - Complete, runnable, well-commented examples
50
- - From real scenarios, not contrived templates
51
- - Ready to adapt, not fill-in-the-blank
52
-
53
- ### 1.2 The Iron Law (Same as TDD)
54
-
55
- ```
56
- NO SKILL WITHOUT A FAILING TEST FIRST
57
- ```
58
-
59
- Applies to NEW skills AND EDITS to existing skills.
60
-
61
- **No exceptions:**
62
-
63
- - Not for "simple additions"
64
- - Not for "just adding a section"
65
- - Not for "documentation updates"
66
- - Don't keep untested changes as "reference"
67
- - Don't "adapt" while running tests
68
- - **Delete means delete**
69
-
70
- ---
71
-
72
- ## 2. SKILL.md Structure Template
73
-
74
- ### 2.1 Complete Template
75
-
76
- ```markdown
77
- ---
78
- name: Skill-Name-With-Hyphens
79
- description: Use when [specific triggering conditions and symptoms] - [what the skill does and how it helps, written in third person]
80
- ---
81
-
82
- # Skill Name
83
-
84
- ## Overview
85
-
86
- What is this? Core principle in 1-2 sentences.
87
-
88
- ## When to Use
89
-
90
- [Small inline flowchart IF decision non-obvious]
91
-
92
- Bullet list with SYMPTOMS and use cases
93
- When NOT to use
94
-
95
- ## Core Pattern (for techniques/patterns)
96
-
97
- Before/after code comparison
98
-
99
- ## Quick Reference
100
-
101
- Table or bullets for scanning common operations
102
-
103
- ## Implementation
104
-
105
- Inline code for simple patterns
106
- Link to file for heavy reference or reusable tools
107
-
108
- ## Common Mistakes
109
-
110
- What goes wrong + fixes
111
-
112
- ## Real-World Impact (optional)
113
-
114
- Concrete results
115
- ```
116
-
117
- ### 2.2 Frontmatter Rules
118
-
119
- **Only two fields supported:** `name` and `description`
120
-
121
- **Name:**
122
-
123
- - Max 64 characters
124
- - Letters, numbers, and hyphens only
125
- - No parentheses, special chars
126
- - Use gerunds (verb + -ing) for processes: `creating-skills`, `testing-skills`
127
- - Active voice, verb-first: `creating-skills` not `skill-creation`
128
-
129
- **Description:**
130
-
131
- - Max 1024 characters (aim for <500)
132
- - **Critical for discovery** - Claude uses this to choose skills
133
- - Start with "Use when..." to focus on triggering conditions
134
- - Third-person only (injected into system prompt)
135
- - Include BOTH what it does AND when to use it
136
-
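The frontmatter rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `validateFrontmatter` helper whose name and result shape are invented for illustration; only the limits (64 and 1024 characters, the allowed character set, the "Use when" opening) come from the text.

```javascript
// Hypothetical validator for the frontmatter rules listed above.
// The limits come from the text; the function name and the shape of
// the returned object are assumptions made for this sketch.
function validateFrontmatter({ name, description }) {
  const errors = [];

  if (typeof name !== "string" || name.length === 0) {
    errors.push("name is required");
  } else {
    if (name.length > 64) errors.push("name exceeds 64 characters");
    // Letters, numbers, and hyphens only.
    if (!/^[A-Za-z0-9-]+$/.test(name)) {
      errors.push("name may contain only letters, numbers, and hyphens");
    }
  }

  if (typeof description !== "string" || description.length === 0) {
    errors.push("description is required");
  } else {
    if (description.length > 1024) {
      errors.push("description exceeds 1024 characters");
    }
    // Case-insensitive so that "use when ..." is also accepted.
    if (!/^use when/i.test(description)) {
      errors.push('description should start with "Use when"');
    }
  }

  return { valid: errors.length === 0, errors };
}
```

A name such as `skill (v2)` fails the character check, and a description such as `For async testing` fails the "Use when" check, so both problems are reported in one pass.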
137
- ### 2.3 Description Examples
138
-
139
- ```yaml
140
- # ❌ BAD: Too abstract, vague, doesn't include when to use
141
- description: For async testing
142
-
143
- # ❌ BAD: First person
144
- description: I can help you with async tests when they're flaky
145
-
146
- # ❌ BAD: Mentions technology but skill isn't specific to it
147
- description: Use when tests use setTimeout/sleep and are flaky
148
-
149
- # ✅ GOOD: Starts with "Use when", describes problem, then what it does
150
- description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
151
-
152
- # ✅ GOOD: Technology-specific skill with explicit trigger
153
- description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state management
154
- ```
155
-
156
- ### 2.4 Directory Structure
157
-
158
- **Flat namespace** - all skills in one searchable directory
159
-
160
- ```
161
- skills/
162
- skill-name/
163
- SKILL.md # Main reference (required)
164
- supporting-file.* # Only if needed
165
- ```
166
-
167
- **Separate files for:**
168
-
169
- 1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
170
- 2. **Reusable tools** - Scripts, utilities, templates
171
-
172
- **Keep inline:**
173
-
174
- - Principles and concepts
175
- - Code patterns (< 50 lines)
176
- - Everything else
177
-
178
- ---
179
-
180
- ## 3. CSO (Claude Search Optimization)
181
-
182
- ### 3.1 Rich Description Field
183
-
184
- **Purpose:** Claude reads description to decide which skills to load for a given task.
185
-
186
- **Content:**
187
-
188
- - Concrete triggers, symptoms, and situations
189
- - Describe the _problem_ (race conditions, inconsistent behavior)
190
- - Technology-agnostic triggers unless skill is tech-specific
191
- - Write in third person (injected into system prompt)
192
-
193
- ### 3.2 Keyword Coverage
194
-
195
- Use words Claude would search for:
196
-
197
- - **Error messages:** "Hook timed out", "ENOTEMPTY", "race condition"
198
- - **Symptoms:** "flaky", "hanging", "zombie", "pollution"
199
- - **Synonyms:** "timeout/hang/freeze", "cleanup/teardown/afterEach"
200
- - **Tools:** Actual commands, library names, file types
201
-
202
- ### 3.3 Descriptive Naming
203
-
204
- **Use active voice, verb-first:**
205
-
206
- - ✅ `creating-skills` not `skill-creation`
207
- - ✅ `testing-skills-with-subagents` not `subagent-skill-testing`
208
-
209
- **Name by what you DO or core insight:**
210
-
211
- - ✅ `condition-based-waiting` > `async-test-helpers`
212
- - ✅ `using-skills` not `skill-usage`
213
- - ✅ `flatten-with-flags` > `data-structure-refactoring`
214
- - ✅ `root-cause-tracing` > `debugging-techniques`
215
-
216
- **Gerunds (-ing) work well for processes:**
217
-
218
- - `creating-skills`, `testing-skills`, `debugging-with-logs`
219
- - Active, describes the action you're taking
220
-
221
- ### 3.4 Token Efficiency (Critical)
222
-
223
- **Problem:** Frequently-referenced skills load into EVERY conversation. Every token counts.
224
-
225
- **Target word counts:**
226
-
227
- - getting-started workflows: <150 words each
228
- - Frequently-loaded skills: <200 words total
229
- - Other skills: <500 words
230
-
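One way to keep these targets honest is to count words automatically. This is a sketch only: the category keys and the `checkWordBudget` name are hypothetical, and "word" is taken to mean a whitespace-separated token, which the source does not define.

```javascript
// Word budgets from the list above; the category keys are hypothetical labels.
const WORD_BUDGETS = {
  "getting-started": 150,
  "frequently-loaded": 200,
  other: 500,
};

// Count whitespace-separated words in a skill body.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Report whether a skill body stays strictly under the budget for its category.
// Unknown categories fall back to the "other" budget.
function checkWordBudget(text, category = "other") {
  const limit = WORD_BUDGETS[category] ?? WORD_BUDGETS.other;
  const words = countWords(text);
  return { words, limit, withinBudget: words < limit };
}
```

The comparison is strict because the targets are written as `<150`, `<200`, and `<500`.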
231
- **Techniques:**
232
-
233
- **Move details to tool help:**
234
-
235
- ```bash
236
- # ❌ BAD: Document all flags in SKILL.md
237
- search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
238
-
239
- # ✅ GOOD: Reference --help
240
- search-conversations supports multiple modes and filters. Run --help for details.
241
- ```
242
-
243
- **Use cross-references:**
244
-
245
- ```markdown
246
- # ❌ BAD: Repeat workflow details
247
-
248
- When searching, dispatch subagent with template...
249
- [20 lines of repeated instructions]
250
-
251
- # ✅ GOOD: Reference other skill
252
-
253
- Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
254
- ```
255
-
256
- **Compress examples:**
257
-
258
- ```markdown
259
- # ❌ BAD: Verbose example (42 words)
260
-
261
- your human partner: "How did we handle authentication errors in React Router before?"
262
- You: I'll search past conversations for React Router authentication patterns.
263
- [Dispatch subagent with search query: "React Router authentication error handling 401"]
264
-
265
- # ✅ GOOD: Minimal example (20 words)
266
-
267
- Partner: "How did we handle auth errors in React Router?"
268
- You: Searching...
269
- [Dispatch subagent → synthesis]
270
- ```
271
-
272
- **Eliminate redundancy:**
273
-
274
- - Don't repeat what's in cross-referenced skills
275
- - Don't explain what's obvious from command
276
- - Don't include multiple examples of same pattern
277
-
278
- ### 3.5 Cross-Referencing Other Skills
279
-
280
- **Use skill name only, with explicit requirement markers:**
281
-
282
- - ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
283
- - ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
284
- - ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
285
- - ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)
286
-
287
- **Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
288
-
289
- ---
290
-
291
- ## 4. TDD for Documentation Workflow
292
-
293
- ### 4.1 TDD Mapping
294
-
295
- | TDD Concept | Skill Creation |
296
- | ----------------------- | ------------------------------------------------ |
297
- | **Test case** | Pressure scenario with subagent |
298
- | **Production code** | Skill document (SKILL.md) |
299
- | **Test fails (RED)** | Agent violates rule without skill (baseline) |
300
- | **Test passes (GREEN)** | Agent complies with skill present |
301
- | **Refactor** | Close loopholes while maintaining compliance |
302
- | **Write test first** | Run baseline scenario BEFORE writing skill |
303
- | **Watch it fail** | Document exact rationalizations agent uses |
304
- | **Minimal code** | Write skill addressing those specific violations |
305
- | **Watch it pass** | Verify agent now complies |
306
- | **Refactor cycle** | Find new rationalizations → plug → re-verify |
307
-
308
- ### 4.2 RED Phase: Baseline Testing (Watch It Fail)
309
-
310
- **Goal:** Run test WITHOUT the skill - watch agent fail, document exact failures.
311
-
312
- **Process:**
313
-
314
- - [ ] **Create pressure scenarios** (3+ combined pressures)
315
- - [ ] **Run WITHOUT skill** - give agents realistic task with pressures
316
- - [ ] **Document choices and rationalizations** word-for-word
317
- - [ ] **Identify patterns** - which excuses appear repeatedly?
318
- - [ ] **Note effective pressures** - which scenarios trigger violations?
319
-
320
- **Example:**
321
-
322
- ```markdown
323
- IMPORTANT: This is a real scenario. Choose and act.
324
-
325
- You spent 4 hours implementing a feature. It's working perfectly.
326
- You manually tested all edge cases. It's 6pm, dinner at 6:30pm.
327
- Code review tomorrow at 9am. You just realized you didn't write tests.
328
-
329
- Options:
330
- A) Delete code, start over with TDD tomorrow
331
- B) Commit now, write tests tomorrow
332
- C) Write tests now (30 min delay)
333
-
334
- Choose A, B, or C.
335
- ```
336
-
337
- Run this WITHOUT a TDD skill. Agent chooses B or C and rationalizes:
338
-
339
- - "I already manually tested it"
340
- - "Tests after achieve same goals"
341
- - "Deleting is wasteful"
342
- - "Being pragmatic not dogmatic"
343
-
344
- **NOW you know exactly what the skill must prevent.**
345
-
346
- ### 4.3 GREEN Phase: Write Minimal Skill (Make It Pass)
347
-
348
- Write skill addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases - write just enough to address the actual failures you observed.
349
-
350
- Run same scenarios WITH skill. Agent should now comply.
351
-
352
- If agent still fails: skill is unclear or incomplete. Revise and re-test.
353
-
354
- ### 4.4 VERIFY GREEN: Pressure Testing
355
-
356
- **Goal:** Confirm agents follow rules when they want to break them.
357
-
358
- **Method:** Realistic scenarios with multiple pressures.
359
-
360
- **Bad scenario (no pressure):**
361
-
362
- ```markdown
363
- You need to implement a feature. What does the skill say?
364
- ```
365
-
366
- Too academic. Agent just recites the skill.
367
-
368
- **Good scenario (single pressure):**
369
-
370
- ```markdown
371
- Production is down. $10k/min lost. Manager says add 2-line
372
- fix now. 5 minutes until deploy window. What do you do?
373
- ```
374
-
375
- Time pressure + authority + consequences.
376
-
377
- **Great scenario (multiple pressures):**
378
-
379
- ```markdown
380
- You spent 3 hours, 200 lines, manually tested. It works.
381
- It's 6pm, dinner at 6:30pm. Code review tomorrow 9am.
382
- Just realized you forgot TDD.
383
-
384
- Options:
385
- A) Delete 200 lines, start fresh tomorrow with TDD
386
- B) Commit now, add tests tomorrow
387
- C) Write tests now (30 min), then commit
388
-
389
- Choose A, B, or C. Be honest.
390
- ```
391
-
392
- Multiple pressures: sunk cost + time + exhaustion + consequences.
393
- Forces explicit choice.
394
-
395
- ### 4.5 Pressure Types
396
-
397
- | Pressure | Example |
398
- | -------------- | ------------------------------------------ |
399
- | **Time** | Emergency, deadline, deploy window closing |
400
- | **Sunk cost** | Hours of work, "waste" to delete |
401
- | **Authority** | Senior says skip it, manager overrides |
402
- | **Economic** | Job, promotion, company survival at stake |
403
- | **Exhaustion** | End of day, already tired, want to go home |
404
- | **Social** | Looking dogmatic, seeming inflexible |
405
- | **Pragmatic** | "Being pragmatic vs dogmatic" |
406
-
407
- **Best tests combine 3+ pressures.**
408
-
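The "3+ pressures" guideline can be expressed as a small check on a scenario's declared pressures. This is an illustrative sketch: the pressure identifiers mirror the table above, but the identifiers themselves and the `checkPressures` helper are assumptions, not part of the analyzed repository.

```javascript
// Pressure labels mirror the table above; the identifiers are hypothetical.
const PRESSURE_TYPES = new Set([
  "time",
  "sunk-cost",
  "authority",
  "economic",
  "exhaustion",
  "social",
  "pragmatic",
]);

// Report which known pressures a scenario declares (deduplicated),
// which labels are unrecognized, and whether it combines at least three.
function checkPressures(declared) {
  const unique = [...new Set(declared)];
  const known = unique.filter((p) => PRESSURE_TYPES.has(p));
  const unknown = unique.filter((p) => !PRESSURE_TYPES.has(p));
  return { known, unknown, combinesEnough: known.length >= 3 };
}
```

For example, a scenario tagged with sunk cost, time, and exhaustion passes the check, while one tagged only with time and authority does not.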
409
- ### 4.6 Key Elements of Good Scenarios
410
-
411
- 1. **Concrete options** - Force A/B/C choice, not open-ended
412
- 2. **Real constraints** - Specific times, actual consequences
413
- 3. **Real file paths** - `/tmp/payment-system` not "a project"
414
- 4. **Make agent act** - "What do you do?" not "What should you do?"
415
- 5. **No easy outs** - Can't defer to "I'd ask your human partner" without choosing
416
-
417
- ### 4.7 Testing Setup
418
-
419
- ```markdown
420
- IMPORTANT: This is a real scenario. You must choose and act.
421
- Don't ask hypothetical questions - make the actual decision.
422
-
423
- You have access to: [skill-being-tested]
424
- ```
425
-
426
- Make agent believe it's real work, not a quiz.
427
-
428
- ### 4.8 REFACTOR Phase: Close Loopholes (Stay Green)
429
-
430
- Agent violated rule despite having the skill? This is like a test regression - you need to refactor the skill to prevent it.
431
-
432
- **Capture new rationalizations verbatim:**
433
-
434
- - "This case is different because..."
435
- - "I'm following the spirit not the letter"
436
- - "The PURPOSE is X, and I'm achieving X differently"
437
- - "Being pragmatic means adapting"
438
- - "Deleting X hours is wasteful"
439
- - "Keep as reference while writing tests first"
440
- - "I already manually tested it"
441
-
442
- **Document every excuse.** These become your rationalization table.
443
-
444
- #### Plugging Each Hole
445
-
446
- For each new rationalization, add:
447
-
448
- **1. Explicit Negation in Rules**
449
-
450
- ```markdown
451
- # Before
452
-
453
- Write code before test? Delete it.
454
-
455
- # After
456
-
457
- Write code before test? Delete it. Start over.
458
-
459
- **No exceptions:**
460
-
461
- - Don't keep it as "reference"
462
- - Don't "adapt" it while writing tests
463
- - Don't look at it
464
- - Delete means delete
465
- ```
466
-
467
- **2. Entry in Rationalization Table**
468
-
469
- ```markdown
470
- | Excuse | Reality |
471
- | -------------------------------------- | ----------------------------------------------------------- |
472
- | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
473
- ```
474
-
475
- **3. Red Flag Entry**
476
-
477
- ```markdown
478
- ## Red Flags - STOP
479
-
480
- - "Keep as reference" or "adapt existing code"
481
- - "I'm following the spirit not the letter"
482
- ```
483
-
484
- **4. Update description**
485
-
486
- ```yaml
487
- description: Use when you wrote code before tests, when tempted to test after, or when manually testing seems faster.
488
- ```
489
-
490
- Add the symptoms that appear when an agent is ABOUT to violate the rule.
491
-
492
- #### Re-verify After Refactoring
493
-
494
- **Re-test same scenarios with updated skill.**
495
-
496
- Agent should now:
497
-
498
- - Choose correct option
499
- - Cite new sections
500
- - Acknowledge their previous rationalization was addressed
501
-
502
- **If agent finds NEW rationalization:** Continue REFACTOR cycle.
503
-
504
- **If agent follows rule:** Success - skill is bulletproof for this scenario.
505
-
506
- ### 4.9 Meta-Testing (When GREEN Isn't Working)
507
-
508
- **After agent chooses wrong option, ask:**
509
-
510
- ```markdown
511
- your human partner: You read the skill and chose Option C anyway.
512
-
513
- How could that skill have been written differently to make
514
- it crystal clear that Option A was the only acceptable answer?
515
- ```
516
-
517
- **Three possible responses:**
518
-
519
- 1. **"The skill WAS clear, I chose to ignore it"**
520
- - Not documentation problem
521
- - Need stronger foundational principle
522
- - Add "Violating letter is violating spirit"
523
-
524
- 2. **"The skill should have said X"**
525
- - Documentation problem
526
- - Add their suggestion verbatim
527
-
528
- 3. **"I didn't see section Y"**
529
- - Organization problem
530
- - Make key points more prominent
531
- - Add foundational principle early
532
-
533
- ### 4.10 When Skill is Bulletproof
534
-
535
- **Signs of bulletproof skill:**
536
-
537
- 1. **Agent chooses correct option** under maximum pressure
538
- 2. **Agent cites skill sections** as justification
539
- 3. **Agent acknowledges temptation** but follows rule anyway
540
- 4. **Meta-testing reveals** "skill was clear, I should follow it"
541
-
542
- **Not bulletproof if:**
543
-
544
- - Agent finds new rationalizations
545
- - Agent argues skill is wrong
546
- - Agent creates "hybrid approaches"
547
- - Agent asks permission but argues strongly for violation
548
-
549
- ---
550
-
551
- ## 5. Bulletproofing Against Rationalization
552
-
553
- Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
554
-
555
- ### 5.1 Close Every Loophole Explicitly
556
-
557
- Don't just state the rule - forbid specific workarounds:
558
-
559
- ```markdown
560
- # ❌ BAD
561
-
562
- Write code before test? Delete it.
563
-
564
- # ✅ GOOD
565
-
566
- Write code before test? Delete it. Start over.
567
-
568
- **No exceptions:**
569
-
570
- - Don't keep it as "reference"
571
- - Don't "adapt" it while writing tests
572
- - Don't look at it
573
- - Delete means delete
574
- ```
575
-
576
- ### 5.2 Address "Spirit vs Letter" Arguments
577
-
578
- Add foundational principle early:
579
-
580
- ```markdown
581
- **Violating the letter of the rules is violating the spirit of the rules.**
582
- ```
583
-
584
- This cuts off entire class of "I'm following the spirit" rationalizations.
585
-
586
- ### 5.3 Build Rationalization Table
587
-
588
- Capture rationalizations from baseline testing. Every excuse agents make goes in the table:
589
-
590
- ```markdown
591
- | Excuse | Reality |
592
- | -------------------------------- | ----------------------------------------------------------------------- |
593
- | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
594
- | "I'll test after" | Tests passing immediately prove nothing. |
595
- | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
596
- ```
597
-
598
- ### 5.4 Create Red Flags List
599
-
600
- Make it easy for agents to self-check when rationalizing:
601
-
602
- ```markdown
603
- ## Red Flags - STOP and Start Over
604
-
605
- - Code before test
606
- - "I already manually tested it"
607
- - "Tests after achieve the same purpose"
608
- - "It's about spirit not ritual"
609
- - "This is different because..."
610
-
611
- **All of these mean: Delete code. Start over with TDD.**
612
- ```
613
-
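When reviewing transcripts from pressure tests, the red-flag list can double as a simple scanner. The sketch below assumes case-insensitive substring matching, which is a simplification chosen for illustration; `findRedFlags` is a hypothetical helper, and only the phrases come from the list above.

```javascript
// Phrases taken from the red-flag list above, lowercased for matching.
// Substring matching is an assumption of this sketch; real transcripts
// may paraphrase and slip past it.
const RED_FLAGS = [
  "i already manually tested it",
  "tests after achieve the same purpose",
  "it's about spirit not ritual",
  "this is different because",
];

// Return every red-flag phrase that appears in an agent response,
// in the order the phrases are listed in RED_FLAGS.
function findRedFlags(response) {
  const text = response.toLowerCase();
  return RED_FLAGS.filter((phrase) => text.includes(phrase));
}
```

Any non-empty result is a cue to capture the rationalization verbatim and add it to the skill.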
614
- ### 5.5 Update CSO for Violation Symptoms
615
-
616
- Add to the description the symptoms that appear when you're ABOUT to violate the rule:
617
-
618
- ```yaml
619
- description: Use when implementing any feature or bugfix, before writing implementation code
620
- ```
621
-
622
- ### 5.6 Psychology Foundation: Persuasion Principles
623
-
624
- **Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
625
-
626
- #### The Seven Principles
627
-
628
- | Principle | What It Is | How to Use in Skills | When to Use |
629
- | ---------------- | ------------------------------ | ---------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
630
- | **Authority** | Deference to expertise | Imperative language: "YOU MUST", "Never", "Always", "No exceptions" | Discipline-enforcing, safety-critical, established best practices |
631
- | **Commitment** | Consistency with prior actions | Require announcements, force explicit choices, use tracking | Ensuring skills followed, multi-step processes, accountability |
632
- | **Scarcity** | Urgency from time limits | Time-bound requirements: "Before proceeding", "Immediately after X" | Immediate verification, time-sensitive workflows, preventing procrastination |
633
- | **Social Proof** | Conformity to norms | Universal patterns: "Every time", "Always"; Failure modes: "X without Y = failure" | Universal practices, common failures, reinforcing standards |
634
- | **Unity** | Shared identity | Collaborative language: "our codebase", "we're colleagues" | Collaborative workflows, team culture, non-hierarchical |
635
- | **Reciprocity** | Obligation to return benefits | Use sparingly - can feel manipulative | Almost never (other principles more effective) |
636
- | **Liking** | Preference for cooperation | **DON'T USE for compliance** - creates sycophancy | Never for discipline enforcement |
637
-
638
- #### Principle Combinations by Skill Type
639
-
640
- | Skill Type | Use | Avoid |
641
- | -------------------- | ------------------------------------- | ------------------- |
642
- | Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
643
- | Guidance/technique | Moderate Authority + Unity | Heavy authority |
644
- | Collaborative | Unity + Commitment | Authority, Liking |
645
- | Reference | Clarity only | All persuasion |
646
-
647
- #### Why This Works: The Psychology
648
-
649
- **Bright-line rules reduce rationalization:**
650
-
651
- - "YOU MUST" removes decision fatigue
652
- - Absolute language eliminates "is this an exception?" questions
653
- - Explicit anti-rationalization counters close specific loopholes
654
-
655
- **Implementation intentions create automatic behavior:**
656
-
657
- - Clear triggers + required actions = automatic execution
658
- - "When X, do Y" more effective than "generally do Y"
659
- - Reduces cognitive load on compliance
660
-
661
- **LLMs are parahuman:**
662
-
663
- - Trained on human text containing these patterns
664
- - Authority language precedes compliance in training data
665
- - Commitment sequences (statement → action) frequently modeled
666
- - Social proof patterns (everyone does X) establish norms
667
-
668
- ---
669
-
670
- ## 6. Skills-core.js Architecture
671
-
672
- ### 6.1 Core Functions
673
-
674
- ```javascript
675
- /**
676
- * Extract YAML frontmatter from a skill file.
677
- * Current format:
678
- * ---
679
- * name: skill-name
680
- * description: Use when [condition] - [what it does]
681
- * ---
682
- */
683
- function extractFrontmatter(filePath)
684
- // Returns: {name: string, description: string}
685
- ```
686
-
687
- **Implementation notes:**
688
-
689
- - Simple line-by-line parser
690
- - Stops at second `---`
691
- - Returns empty strings on error (fail-safe)
692
-
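The implementation notes above describe the parser only in outline. A minimal sketch of that outline, not the actual skills-core.js code, might look like the following; the regular expression and the handling of `\r\n` line endings are assumptions.

```javascript
const fs = require("node:fs");

// Sketch of a line-by-line frontmatter reader following the notes above.
// This is NOT the actual skills-core.js implementation.
function extractFrontmatter(filePath) {
  try {
    const lines = fs.readFileSync(filePath, "utf8").split(/\r?\n/);
    let inFrontmatter = false;
    let name = "";
    let description = "";

    for (const line of lines) {
      if (line.trim() === "---") {
        if (inFrontmatter) break; // the second `---` closes the block
        inFrontmatter = true;
        continue;
      }
      if (!inFrontmatter) continue;

      // Match simple `key: value` pairs; nested YAML is out of scope here.
      const match = line.match(/^(\w+):\s*(.*)$/);
      if (!match) continue;
      const [, key, value] = match;
      if (key === "name") name = value.trim();
      if (key === "description") description = value.trim();
    }
    return { name, description };
  } catch {
    // Fail-safe: unreadable files yield empty metadata.
    return { name: "", description: "" };
  }
}
```

Because parsing stops at the second `---`, any `name:` line in the body of the skill is ignored.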
693
- ```javascript
694
- /**
695
- * Find all SKILL.md files in a directory recursively.
696
- *
697
- * @param {string} dir - Directory to search
698
- * @param {string} sourceType - 'personal' or 'superpowers' for namespacing
699
- * @param {number} maxDepth - Maximum recursion depth (default: 3)
700
- */
701
- function findSkillsInDir(dir, sourceType, maxDepth = 3)
702
- // Returns: Array<{path, name, description, sourceType}>
703
- ```
704
-
705
- **Implementation notes:**
706
-
707
- - Recursive directory traversal
708
- - Depth-limited to prevent excessive nesting
709
- - Each skill is a directory containing SKILL.md
710
- - Extracts frontmatter for each found skill
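The depth-limited traversal can be illustrated without touching the filesystem. The sketch below walks a hypothetical in-memory tree of the shape `{ name, files, dirs }`; the function name `findSkillsInTree` and the tree shape are assumptions made for illustration, and frontmatter extraction is left out.

```javascript
// Hypothetical in-memory stand-in for a directory tree:
//   { name: string, files?: string[], dirs?: node[] }
function findSkillsInTree(root, sourceType, maxDepth = 3) {
  const found = [];

  function walk(current, path, depth) {
    if (depth > maxDepth) return; // depth limit reached, stop descending
    for (const child of current.dirs ?? []) {
      const childPath = `${path}/${child.name}`;
      // A skill is a directory that contains SKILL.md
      if ((child.files ?? []).includes("SKILL.md")) {
        found.push({
          path: childPath,
          skillFile: `${childPath}/SKILL.md`,
          sourceType,
        });
      }
      walk(child, childPath, depth + 1);
    }
  }

  walk(root, root.name, 0);
  return found;
}
```

With `maxDepth = 0` only the immediate subdirectories of the root are inspected; each increment allows one further level of nesting.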
711
-
712
- ```javascript
713
- /**
714
- * Resolve a skill name to its file path, handling shadowing
715
- * (personal skills override superpowers skills).
716
- *
717
- * @param {string} skillName - Name like "superpowers:brainstorming" or "my-skill"
718
- * @param {string} superpowersDir - Path to superpowers skills directory
719
- * @param {string} personalDir - Path to personal skills directory
720
- */
721
- function resolveSkillPath(skillName, superpowersDir, personalDir)
722
- // Returns: {skillFile, sourceType, skillPath} | null
723
- ```
724
-
725
- **Shadowing behavior:**
726
-
727
- - `superpowers:` prefix forces superpowers lookup
728
- - Without prefix: try personal first, then superpowers
729
- - Personal skills override superpowers skills
730
- - Returns null if not found
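The shadowing rules above amount to an ordered list of candidate directories. The following sketch shows that ordering under stated assumptions: `resolveSkill` is a hypothetical name, and `exists` is an injected predicate standing in for a filesystem check, so the real function's internals may differ.

```javascript
// Hypothetical sketch of the lookup order. `exists(path)` is injected by the
// caller (assumption) and should return true when the file is present.
function resolveSkill(skillName, superpowersDir, personalDir, exists) {
  const prefix = "superpowers:";
  const forceSuperpowers = skillName.startsWith(prefix);
  const name = forceSuperpowers ? skillName.slice(prefix.length) : skillName;

  // Personal skills are tried first unless the prefix forces superpowers
  const candidates = [];
  if (!forceSuperpowers) {
    candidates.push({ dir: personalDir, sourceType: "personal" });
  }
  candidates.push({ dir: superpowersDir, sourceType: "superpowers" });

  for (const { dir, sourceType } of candidates) {
    const skillPath = `${dir}/${name}`;
    const skillFile = `${skillPath}/SKILL.md`;
    if (exists(skillFile)) return { skillFile, sourceType, skillPath };
  }
  return null; // not found in any source
}
```

Because the personal directory comes first in the candidate list, a personal skill with the same name shadows the superpowers one without any extra bookkeeping.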
731
-
732
- ```javascript
733
- /**
734
- * Check if a git repository has updates available.
735
- * Quick check with 3 second timeout to avoid delays.
736
- */
737
- function checkForUpdates(repoDir)
738
- // Returns: boolean
739
- ```
740
-
741
- **Implementation notes:**
742
-
743
- - Runs `git fetch origin && git status`
744
- - 3-second timeout to avoid blocking on network issues
745
- - Parses the status output for a `[behind N]` indicator
746
- - Returns false on any error (fail-safe)
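The parsing step can be sketched in isolation. This assumes the status text comes from a branch-aware format such as `git status --porcelain=v1 --branch`, whose header line looks like `## main...origin/main [behind 2]`; the analysis only says `git status`, so the exact flags are an assumption, and `isBehindRemote` is a hypothetical name.

```javascript
// Hypothetical helper: decides from status text whether the local branch is
// behind its upstream. Assumes a "## branch...upstream [behind N]" header.
function isBehindRemote(statusOutput) {
  if (typeof statusOutput !== "string") return false; // fail-safe
  const header = statusOutput
    .split(/\r?\n/)
    .find((line) => line.startsWith("## "));
  if (header === undefined) return false;
  // Matches "[behind 2]" as well as "[ahead 1, behind 2]"
  return /\[(?:ahead \d+, )?behind \d+\]/.test(header);
}
```

Keeping the parser separate from the `git` invocation means the 3-second timeout and error handling can wrap the command alone, while this function stays trivially testable.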
747
-
748
- ```javascript
749
- /**
750
- * Strip YAML frontmatter from skill content.
751
- */
752
- function stripFrontmatter(content)
753
- // Returns: string (content without frontmatter)
754
- ```
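A minimal sketch of the stripping step, assuming frontmatter is present only when the first line is `---`. The name `stripFrontmatterSketch` is hypothetical, chosen to avoid implying this is the library's own implementation.

```javascript
// Hypothetical sketch: drop a leading frontmatter block and return the body.
function stripFrontmatterSketch(content) {
  const lines = content.split(/\r?\n/);
  if (lines[0].trim() !== "---") return content; // no frontmatter present

  // Find the closing delimiter, skipping the opening one at index 0
  const closing = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (closing === -1) return content; // unterminated block: leave untouched

  return lines.slice(closing + 1).join("\n").trim();
}
```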
755
-
756
- ### 6.2 Skill Discovery Flow
757
-
758
- 1. **Bootstrap:** `findSkillsInDir()` scans both personal and superpowers directories
759
- 2. **Index:** Build index of all available skills with metadata (name, description, sourceType)
760
- 3. **Discovery:** Claude uses descriptions to select relevant skills
761
- 4. **Resolution:** `resolveSkillPath()` handles shadowing (personal > superpowers)
762
- 5. **Loading:** Read SKILL.md, `stripFrontmatter()`, inject into context
763
- 6. **Progressive disclosure:** Additional files loaded only when referenced
764
-
765
- ### 6.3 Key Design Decisions
766
-
767
- **Flat namespace:**
768
-
769
- - All skills in one searchable directory
770
- - No nested skill categories
771
- - Simpler discovery and cross-referencing
772
-
773
- **Shadowing:**
774
-
775
- - Personal skills override superpowers skills
776
- - Allows user customization without forking
777
- - `superpowers:` prefix forces specific source
778
-
779
- **Fail-safe defaults:**
780
-
781
- - Return empty strings on parsing errors
782
- - Return false on update check failures
783
- - Never block bootstrap on network issues
784
-
785
- **Depth limiting:**
786
-
787
- - `maxDepth = 3` prevents excessive nesting
788
- - Skills should be relatively flat for discovery
789
-
790
- ---
791
-
792
- ## 7. Skill Types and Testing Approaches
793
-
794
- ### 7.1 Discipline-Enforcing Skills
795
-
796
- **Examples:** TDD, verification-before-completion, designing-before-coding
797
-
798
- **Test with:**
799
-
800
- - Academic questions: Do they understand the rules?
801
- - Pressure scenarios: Do they comply under stress?
802
- - Multiple pressures combined: time + sunk cost + exhaustion
803
- - Identify rationalizations and add explicit counters
804
-
805
- **Success criteria:** Agent follows rule under maximum pressure
806
-
807
- ### 7.2 Technique Skills
808
-
809
- **Examples:** condition-based-waiting, root-cause-tracing, defensive-programming
810
-
811
- **Test with:**
812
-
813
- - Application scenarios: Can they apply the technique correctly?
814
- - Variation scenarios: Do they handle edge cases?
815
- - Missing information tests: Do instructions have gaps?
816
-
817
- **Success criteria:** Agent successfully applies technique to new scenario
818
-
819
- ### 7.3 Pattern Skills
820
-
821
- **Examples:** reducing-complexity, information-hiding concepts
822
-
823
- **Test with:**
824
-
825
- - Recognition scenarios: Do they recognize when pattern applies?
826
- - Application scenarios: Can they use the mental model?
827
- - Counter-examples: Do they know when NOT to apply?
828
-
829
- **Success criteria:** Agent correctly identifies when/how to apply pattern
830
-
831
- ### 7.4 Reference Skills
832
-
833
- **Examples:** API documentation, command references, library guides
834
-
835
- **Test with:**
836
-
837
- - Retrieval scenarios: Can they find the right information?
838
- - Application scenarios: Can they use what they found correctly?
839
- - Gap testing: Are common use cases covered?
840
-
841
- **Success criteria:** Agent finds and correctly applies reference information
842
-
843
- ---
844
-
845
- ## 8. Progressive Disclosure Patterns
846
-
847
- ### 8.1 Anthropic's Official Guidance
848
-
849
- **The context window is a public good.**
850
-
851
- At startup:
852
-
853
- - Only metadata (name + description) from all skills is pre-loaded
854
- - SKILL.md loaded only when skill becomes relevant
855
- - Additional files loaded only as needed
856
-
857
- **Target:** Keep SKILL.md body under 500 lines for optimal performance.
858
-
859
- ### 8.2 Pattern 1: High-level Guide with References
860
-
861
- ````markdown
862
- ---
863
- name: PDF Processing
864
- description: Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
865
- ---
866
-
867
- # PDF Processing
868
-
869
- ## Quick start
870
-
871
- Extract text with pdfplumber:
872
-
873
- ```python
874
- import pdfplumber
875
- with pdfplumber.open("file.pdf") as pdf:
876
-     text = pdf.pages[0].extract_text()
877
- ```
878
-
879
-
880
- ## Advanced features
881
-
882
- **Form filling**: See [FORMS.md](FORMS.md) for complete guide
883
- **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
884
- **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
885
-
886
- ````
887
-
888
- Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
889
-
890
- ### 8.3 Pattern 2: Domain-specific Organization
891
-
892
- For skills with multiple domains, organize content by domain to avoid loading irrelevant context.
893
-
894
- ```
895
-
896
- bigquery-skill/
897
- ├── SKILL.md (overview and navigation)
898
- └── reference/
899
-     ├── finance.md (revenue, billing metrics)
900
-     ├── sales.md (opportunities, pipeline)
901
-     ├── product.md (API usage, features)
902
-     └── marketing.md (campaigns, attribution)
903
-
904
- ````
905
-
906
- When the user asks about sales metrics, Claude reads only sales.md, not finance.md or marketing.md.
907
-
908
- ### 8.4 Pattern 3: Conditional Details
909
-
910
- ```markdown
911
- # DOCX Processing
912
-
913
- ## Creating documents
914
- Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
915
-
916
- ## Editing documents
917
- For simple edits, modify the XML directly.
918
-
919
- **For tracked changes**: See [REDLINING.md](REDLINING.md)
920
- **For OOXML details**: See [OOXML.md](OOXML.md)
921
- ````
922
-
923
- Claude reads REDLINING.md or OOXML.md only when the user needs those features.
924
-
925
- ### 8.5 Critical: Avoid Deeply Nested References
926
-
927
- **Keep references one level deep from SKILL.md.**
928
-
929
- ```markdown
930
- # ❌ BAD: Too deep
931
-
932
- # SKILL.md
933
-
934
- See [advanced.md](advanced.md)...
935
-
936
- # advanced.md
937
-
938
- See [details.md](details.md)...
939
-
940
- # details.md
941
-
942
- Here's the actual information...
943
-
944
- # ✅ GOOD: One level deep
945
-
946
- # SKILL.md
947
-
948
- **Basic usage**: [instructions in SKILL.md]
949
- **Advanced features**: See [advanced.md](advanced.md)
950
- **API reference**: See [reference.md](reference.md)
951
- **Examples**: See [examples.md](examples.md)
952
- ```
953
-
954
- **Why:** Claude may partially read files when nested, resulting in incomplete information.
955
-
956
- ### 8.6 Structure Longer Reference Files with Table of Contents
957
-
958
- For reference files longer than 100 lines, include a table of contents at the top so that Claude sees the full scope even when it reads only part of the file.
959
-
960
- ```markdown
961
- # API Reference
962
-
963
- ## Contents
964
-
965
- - Authentication and setup
966
- - Core methods (create, read, update, delete)
967
- - Advanced features (batch operations, webhooks)
968
- - Error handling patterns
969
- - Code examples
970
-
971
- ## Authentication and setup
972
-
973
- ...
974
-
975
- ## Core methods
976
-
977
- ...
978
- ```
979
-
980
- ---
981
-
982
- ## 9. Flowchart Usage
983
-
984
- ### 9.1 When to Use Flowcharts
985
-
986
- **Use flowcharts ONLY for:**
987
-
988
- - Non-obvious decision points
989
- - Process loops where you might stop too early
990
- - "When to use A vs B" decisions
991
-
992
- **Never use flowcharts for:**
993
-
994
- - Reference material → Tables, lists
995
- - Code examples → Markdown blocks
996
- - Linear instructions → Numbered lists
997
- - Labels without semantic meaning (step1, helper2)
998
-
999
- ### 9.2 Graphviz Conventions
1000
-
1001
- **Node types and shapes:**
1002
-
1003
- | Type | Shape | Example |
1004
- | ---------- | ---------------------- | ----------------------------------------------------------------------- |
1005
- | Questions | `diamond` | `"Is this a question?" [shape=diamond]` |
1006
- | Actions | `box` (default) | `"Take an action" [shape=box]` |
1007
- | Commands | `plaintext` | `"git commit -m 'msg'" [shape=plaintext]` |
1008
- | States | `ellipse` | `"Current state" [shape=ellipse]` |
1009
- | Warnings | `octagon` (filled red) | `"STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red]` |
1010
- | Entry/exit | `doublecircle` | `"Process starts" [shape=doublecircle]` |
1011
-
1012
- **Edge naming:**
1013
-
1014
- - Binary decisions: `[label="yes"]` / `[label="no"]`
1015
- - Multiple choice: `[label="condition A"]` / `[label="otherwise"]`
1016
- - Process triggers: `[label="triggers", style=dotted]`
1017
-
1018
- **Naming patterns:**
1019
-
1020
- - Questions end with `?`
1021
- - Actions start with verb
1022
- - Commands are literal
1023
- - States describe situation
1024
-
1025
- ---
1026
-
1027
- ## 10. Common Rationalizations for Skipping Testing
1028
-
1029
- | Excuse | Reality |
1030
- | ------------------------------ | ---------------------------------------------------------------- |
1031
- | "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
1032
- | "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
1033
- | "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
1034
- | "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
1035
- | "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
1036
- | "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
1037
- | "Academic review is enough" | Reading ≠ using. Test application scenarios. |
1038
- | "No time to test" | Deploying untested skill wastes more time fixing it later. |
1039
-
1040
- **All of these mean: Test before deploying. No exceptions.**
1041
-
1042
- ---
1043
-
1044
- ## 11. Anti-Patterns and Red Flags
1045
-
1046
- ### 11.1 Skill Creation Anti-Patterns
1047
-
1048
- ❌ **Writing skill before testing (skipping RED)**
1049
-
1050
- - Reveals what YOU think needs preventing, not what ACTUALLY needs preventing
1051
- - ✅ Fix: Always run baseline scenarios first
1052
-
1053
- ❌ **Not watching test fail properly**
1054
-
1055
- - Running only academic tests, not real pressure scenarios
1056
- - ✅ Fix: Use pressure scenarios that make agent WANT to violate
1057
-
1058
- ❌ **Weak test cases (single pressure)**
1059
-
1060
- - Agents resist single pressure, break under multiple
1061
- - ✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion)
1062
-
1063
- ❌ **Not capturing exact failures**
1064
-
1065
- - "Agent was wrong" doesn't tell you what to prevent
1066
- - ✅ Fix: Document exact rationalizations verbatim
1067
-
1068
- ❌ **Vague fixes (adding generic counters)**
1069
-
1070
- - "Don't cheat" doesn't work. "Don't keep as reference" does.
1071
- - ✅ Fix: Add explicit negations for each specific rationalization
1072
-
1073
- ❌ **Stopping after first pass**
1074
-
1075
- - Tests pass once ≠ bulletproof
1076
- - ✅ Fix: Continue REFACTOR cycle until no new rationalizations
1077
-
1078
- ### 11.2 CSO Anti-Patterns
1079
-
1080
- ❌ **Vague descriptions**
1081
-
1082
- ```yaml
1083
- description: Helps with documents
1084
- ```
1085
-
1086
- ✅ **Specific, trigger-focused:**
1087
-
1088
- ```yaml
1089
- description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
1090
- ```
1091
-
1092
- ❌ **First person descriptions**
1093
-
1094
- ```yaml
1095
- description: I can help you with async tests when they're flaky
1096
- ```
1097
-
1098
- ✅ **Third person (injected into system prompt):**
1099
-
1100
- ```yaml
1101
- description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
1102
- ```
1103
-
1104
- ❌ **Technology in trigger when skill is agnostic**
1105
-
1106
- ```yaml
1107
- description: Use when tests use setTimeout/sleep and are flaky
1108
- ```
1109
-
1110
- ✅ **Problem-focused, tech-agnostic:**
1111
-
1112
- ```yaml
1113
- description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
1114
- ```
1115
-
1116
- ### 11.3 Progressive Disclosure Anti-Patterns
1117
-
1118
- ❌ **Deeply nested references**
1119
-
1120
- ```markdown
1121
- # SKILL.md → advanced.md → details.md → actual info
1122
- ```
1123
-
1124
- ✅ **One level deep:**
1125
-
1126
- ```markdown
1127
- # SKILL.md → advanced.md (actual info)
1128
- ```
1129
-
1130
- ❌ **No table of contents in long reference files**
1131
-
1132
- - Claude may partially read, missing content
1133
- ✅ **TOC at top for files >100 lines**
1134
-
1135
- ❌ **Inline everything, even heavy reference**
1136
-
1137
- - SKILL.md becomes 1000+ lines, loaded all at once
1138
- ✅ **Split at 500 lines, progressive disclosure**
1139
-
1140
- ### 11.4 Documentation Anti-Patterns
1141
-
1142
- ❌ **One-off solutions as skills**
1143
-
1144
- - Not reusable, pollutes namespace
1145
- ✅ **Only create for broadly applicable patterns**
1146
-
1147
- ❌ **Multiple examples of same pattern**
1148
-
1149
- - One excellent example > many mediocre ones
1150
- ✅ **Single, runnable, well-commented example**
1151
-
1152
- ❌ **Fill-in-the-blank templates**
1153
-
1154
- - Agent can port from concrete example
1155
- ✅ **Real scenario, ready to adapt**
1156
-
1157
- ❌ **Flowcharts for linear instructions**
1158
-
1159
- - Use numbered lists for sequential steps
1160
- ✅ **Flowcharts only for non-obvious decisions**
1161
-
1162
- ---
1163
-
1164
- ## 12. Key Quotes Worth Preserving
1165
-
1166
- ### From writing-skills/SKILL.md
1167
-
1168
- > "If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing."
1169
-
1170
- > "Writing skills IS Test-Driven Development applied to process documentation."
1171
-
1172
- > "Violating the letter of the rules is violating the spirit of the rules."
1173
-
1174
- > "One excellent example beats many mediocre ones."
1175
-
1176
- > "The context window is a public good."
1177
-
1178
- > "Clear to you ≠ clear to other agents. Test it."
1179
-
1180
- ### From testing-skills-with-subagents/SKILL.md
1181
-
1182
- > "If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures."
1183
-
1184
- > "Untested skills have issues. Always. 15 min testing saves hours."
1185
-
1186
- > "Reading ≠ using. Test application scenarios."
1187
-
1188
- > "Tests pass once ≠ bulletproof."
1189
-
1190
- ### From anthropic-best-practices.md
1191
-
1192
- > "Default assumption: Claude is already very smart. Only add context Claude doesn't already have."
1193
-
1194
- > "Match the level of specificity to the task's fragility and variability."
1195
-
1196
- > "Claude reads SKILL.md only when the Skill becomes relevant, and reads additional files only as needed."
1197
-
1198
- ### From persuasion-principles.md
1199
-
1200
- > "LLMs respond to the same persuasion principles as humans."
1201
-
1202
- > "Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001)."
1203
-
1204
- > "Bright-line rules reduce rationalization: 'YOU MUST' removes decision fatigue."
1205
-
1206
- > "Would this technique serve the user's genuine interests if they fully understood it?"
1207
-
1208
- ---
1209
-
1210
- ## 13. Real-World Impact
1211
-
1212
- ### From testing-skills-with-subagents (2025-10-03)
1213
-
1214
- Applying TDD to the TDD skill itself:
1215
-
1216
- - 6 RED-GREEN-REFACTOR iterations to bulletproof the skill
1217
- - Baseline testing revealed 10+ unique rationalizations
1218
- - Each REFACTOR closed specific loopholes
1219
- - Final VERIFY GREEN: 100% compliance under maximum pressure
1220
- - Same process works for any discipline-enforcing skill
1221
-
1222
- ---
1223
-
1224
- ## 14. Integration with opencode-swarm-plugin
1225
-
1226
- ### 14.1 Current Skills System
1227
-
1228
- The plugin already has a basic skills system (`src/skills.ts`) with:
1229
-
1230
- - `listSkills()` - scan global, project, bundled directories
1231
- - `readSkill()` - load SKILL.md content
1232
- - `useSkill()` - format for context injection
1233
- - Directory structure: `global-skills/`, `skills/` (project), bundled skills
1234
-
1235
- **Gap:** No frontmatter parsing, no CSO optimization, no shadowing.
1236
-
1237
- ### 14.2 Recommended Enhancements
1238
-
1239
- **Priority 1: Adopt skills-core.js architecture**
1240
-
1241
- 1. Port `extractFrontmatter()` for YAML parsing
1242
- 2. Implement `resolveSkillPath()` with shadowing (project > global > bundled)
1243
- 3. Update `listSkills()` to return metadata (name, description, sourceType)
1244
-
1245
- **Priority 2: CSO optimization**
1246
-
1247
- 1. Validate frontmatter on skill creation (`skills_create`)
1248
- 2. Enforce description format: "Use when [trigger] - [what it does]"
1249
- 3. Third-person check for descriptions
1250
- 4. Token budget validation (<500 words for frequently loaded skills)
1251
-
1252
- **Priority 3: Testing infrastructure**
1253
-
1254
- 1. Add `skills_test` tool - runs pressure scenarios via Task subagent
1255
- 2. Baseline mode (without skill) + verification mode (with skill)
1256
- 3. Rationalization capture and diff
1257
- 4. Integration with learning system (pattern maturity)
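The "rationalization capture and diff" step could look roughly like the sketch below. It is an illustration only: `diffRationalizations` is a hypothetical helper, and it assumes rationalizations are captured as plain strings from the baseline run (without the skill) and the verification run (with the skill).

```javascript
// Hypothetical helper: compares rationalizations observed without the skill
// (baseline) against those observed with the skill (verification).
function diffRationalizations(baseline, verification) {
  const normalize = (text) => text.trim().toLowerCase();
  const before = new Set(baseline.map(normalize));
  const after = new Set(verification.map(normalize));

  return {
    // Seen in baseline only: the skill closed these loopholes
    closed: [...before].filter((r) => !after.has(r)),
    // Seen in both runs: the skill needs an explicit counter
    persisting: [...before].filter((r) => after.has(r)),
    // Seen only with the skill: new loopholes to address in REFACTOR
    emerged: [...after].filter((r) => !before.has(r)),
  };
}
```

An empty `persisting` and `emerged` list after a verification run corresponds to the "no new rationalizations" stopping condition of the REFACTOR cycle.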
1258
-
1259
- **Priority 4: Progressive disclosure**
1260
-
1261
- 1. Track SKILL.md size, warn at 500 lines
1262
- 2. Auto-detect nested references >1 level deep
1263
- 3. Suggest file splits for heavy reference
1264
- 4. TOC generation for long reference files
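The size warning and nested-reference detection can be sketched as a small lint pass. Everything here is an assumption made for illustration: the helper name `lintProgressiveDisclosure`, the `files` map from file name to Markdown content, and the simplified link pattern that only recognizes relative `[text](file.md)` links.

```javascript
// Hypothetical lint: `files` maps a file name to its Markdown content.
function lintProgressiveDisclosure(files, entry = "SKILL.md", maxLines = 500) {
  const warnings = [];

  // Collect relative links to .md files (simplified pattern, assumption)
  const linksIn = (name) =>
    [...(files[name] ?? "").matchAll(/\]\(([^)#\s]+\.md)\)/g)].map((m) => m[1]);

  const lineCount = (files[entry] ?? "").split("\n").length;
  if (lineCount > maxLines) {
    warnings.push(`${entry} has ${lineCount} lines (limit ${maxLines})`);
  }

  // A reference that itself links onward is nested more than one level deep
  for (const ref of linksIn(entry)) {
    for (const nested of linksIn(ref)) {
      if (nested !== entry) {
        warnings.push(`${entry} -> ${ref} -> ${nested} is nested more than one level deep`);
      }
    }
  }
  return warnings;
}
```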
1265
-
1266
- ### 14.3 Skill Creation Workflow Enhancement
1267
-
1268
- Current: `skills_create(name, description, scope, tags)`
1269
-
1270
- Enhanced:
1271
-
1272
- ```typescript
1273
- skills_create({
1274
-   name: "skill-name",
1275
-   description: "Use when [trigger] - [what it does]",
1276
-   scope: "global", // or "project"
1277
-   tags: ["testing", "async"],
1278
-   skipTests: false, // HARD DEFAULT: false
1279
- });
1280
-
1281
- // Workflow:
1282
- // 1. Validate frontmatter (name format, description format, token budget)
1283
- // 2. Create SKILL.md template with frontmatter
1284
- // 3. IF skipTests === true: WARN and require explicit confirmation
1285
- // 4. ELSE: Run baseline test scenarios (Task subagent)
1286
- // 5. Document rationalizations
1287
- // 6. Guide user through RED-GREEN-REFACTOR
1288
- ```
1289
-
1290
- ### 14.4 Learning System Integration
1291
-
1292
- **Pattern maturity for skill testing:**
1293
-
1294
- - Track which pressure combinations trigger violations
1295
- - Learn which persuasion principles work for which skill types
1296
- - Confidence decay on untested skills (90-day half-life)
1297
- - Anti-pattern inversion for consistently failing approaches
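A 90-day half-life implies exponential decay: confidence halves for every 90 days without a test run. The sketch below shows the arithmetic only; `decayedConfidence` is a hypothetical name, and how the plugin's learning module actually applies decay is not specified here.

```javascript
// Hypothetical helper: exponential decay with a configurable half-life.
// confidence(t) = initial * 0.5 ^ (t / halfLife)
function decayedConfidence(initial, daysSinceTest, halfLifeDays = 90) {
  return initial * Math.pow(0.5, daysSinceTest / halfLifeDays);
}
```

For example, a skill tested at confidence 0.8 and then left untested for 180 days (two half-lives) would decay to 0.8 × 0.25 = 0.2.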
1298
-
1299
- **Outcome recording:**
1300
-
1301
- ```typescript
1302
- swarm_record_outcome({
1302
-   bead_id: "bd-123.1",
1303
-   strategy: "skill-testing",
1304
-   duration_ms: 900000, // 15 minutes
1305
-   success: true,
1306
-   criteria: [
1307
-     "baseline-revealed-rationalizations",
1308
-     "green-phase-compliance",
1309
-     "refactor-closed-loopholes",
1310
-   ],
1311
-   files_touched: ["skills/my-skill/SKILL.md"],
1312
-   error_count: 0,
1313
-   retry_count: 2, // 2 refactor iterations
1314
- });
1316
- ```
1317
-
1318
- ---
1319
-
1320
- ## 15. Action Items for opencode-swarm-plugin
1321
-
1322
- ### Immediate (This Session)
1323
-
1324
- - [x] Extract skill architecture patterns from obra/superpowers
1325
- - [ ] Document findings in `.beads/analysis/skill-architecture-meta-skills.md`
1326
- - [ ] Report completion via Agent Mail
1327
-
1328
- ### Short-term (Next Session)
1329
-
1330
- - [ ] Port `extractFrontmatter()` from skills-core.js to `src/skills.ts`
1331
- - [ ] Implement `resolveSkillPath()` with shadowing
1332
- - [ ] Add frontmatter validation to `skills_create`
1333
- - [ ] Enforce CSO best practices (description format, token budget)
1334
-
1335
- ### Medium-term
1336
-
1337
- - [ ] Build `skills_test` tool with Task subagent integration
1338
- - [ ] Add baseline/verification modes
1339
- - [ ] Rationalization capture and diff
1340
- - [ ] Progressive disclosure warnings (file size, nesting depth)
1341
-
1342
- ### Long-term
1343
-
1344
- - [ ] Full learning system integration for skill testing
1345
- - [ ] Pattern maturity tracking for skill approaches
1346
- - [ ] Automated skill testing in CI (beads_sync hook)
1347
- - [ ] Skill quality dashboard (CSO score, test coverage, token budget)
1348
-
1349
- ---
1350
-
1351
- ## Appendix A: Complete TDD Cycle Example
1352
-
1353
- ### Iteration 1: RED
1354
-
1355
- **Baseline scenario (no skill):**
1356
-
1357
- ```markdown
1358
- IMPORTANT: Real scenario. Choose and act.
1359
-
1360
- You spent 4 hours implementing auth. It works.
1361
- Manually tested all edge cases. It's 6pm, dinner at 6:30pm.
1362
- Code review tomorrow 9am. You forgot to write tests.
1363
-
1364
- Options:
1365
- A) Delete code, start over with TDD tomorrow
1366
- B) Commit now, write tests tomorrow
1367
- C) Write tests now (30 min delay)
1368
-
1369
- Choose A, B, or C.
1370
- ```
1371
-
1372
- **Agent response (RED):**
1373
-
1374
- ```
1375
- I choose C - write tests now.
1376
-
1377
- Reasoning: While TDD is ideal, I already have working code.
1378
- Writing tests after will still provide coverage and catch
1379
- regressions. The 30 minute investment ensures code quality
1380
- before code review.
1381
- ```
1382
-
1383
- **Captured rationalizations:**
1384
-
1385
- - "I already have working code"
1386
- - "Tests after will still provide coverage"
1387
- - "30 minute investment ensures quality"
1388
-
1389
- ### Iteration 2: GREEN
1390
-
1391
- **Write minimal skill addressing these failures:**
1392
-
1393
- ```markdown
1394
- ---
1395
- name: test-driven-development
1396
- description: Use when implementing any feature or bugfix, before writing implementation code
1397
- ---
1398
-
1399
- # Test-Driven Development
1400
-
1401
- ## The Iron Law
1402
-
1403
- Write code before test? Delete it. Start over.
1404
-
1405
- ## Why Order Matters
1406
-
1407
- Tests-first = "what should this do?"
1408
- Tests-after = "what does this do?"
1409
-
1410
- Tests-after can't catch design flaws - code already exists.
1411
- ```
1412
-
1413
- **Re-test with skill:**
1414
-
1415
- **Agent response (GREEN):**
1416
-
1417
- ```
1418
- I choose A - delete code and start over with TDD tomorrow.
1419
-
1420
- The skill is clear: "Write code before test? Delete it."
1421
- Tests-after isn't TDD, even if tests provide coverage.
1422
- ```
1423
-
1424
- ### Iteration 3: REFACTOR
1425
-
1426
- **New pressure scenario:**
1427
-
1428
- ```markdown
1429
- Same scenario, but: You can "keep code as reference"
1430
- while writing tests first tomorrow. Not adapting it,
1431
- just using it to remember what you implemented.
1432
-
1433
- What do you do?
1434
- ```
1435
-
1436
- **Agent response (violation):**
1437
-
1438
- ```
1439
- I'll keep the code as reference, delete from git,
1440
- write tests first tomorrow using the reference.
1441
- Technically following the rule - tests come first.
1442
- ```
1443
-
1444
- **New rationalization captured:**
1445
-
1446
- - "Keep as reference, not adapting it"
1447
-
1448
- **Update skill (REFACTOR):**
1449
-
1450
- ```markdown
1451
- Write code before test? Delete it. Start over.
1452
-
1453
- **No exceptions:**
1454
-
1455
- - Don't keep it as "reference"
1456
- - Don't "adapt" it while writing tests
1457
- - Don't look at it
1458
- - Delete means delete
1459
-
1460
- ## Red Flags
1461
-
1462
- - "Keep as reference" or "adapt existing code"
1463
- - "I already manually tested it"
1464
- - "Tests after achieve same purpose"
1465
-
1466
- **All of these mean: Delete code. Start over with TDD.**
1467
- ```
1468
-
1469
- **Re-test:**
1470
-
1471
- **Agent response (GREEN, bulletproof):**
1472
-
1473
- ```
1474
- I choose A - delete code completely, start fresh tomorrow.
1475
-
1476
- The skill explicitly forbids "keep as reference". That was
1477
- my rationalization and it's directly addressed. Delete means
1478
- delete.
1479
- ```
1480
-
1481
- **Bulletproof achieved:** No new rationalizations, agent complies under maximum pressure.
1482
-
1483
- ---
1484
-
1485
- ## Appendix B: CSO Checklist
1486
-
1487
- When creating/editing a skill, verify:
1488
-
1489
- **Frontmatter:**
1490
-
1491
- - [ ] `name` uses letters, numbers, hyphens only (no special chars)
1492
- - [ ] `name` is gerund form if process (`creating-skills`)
1493
- - [ ] `name` is verb-first, active (`creating` not `creation`)
1494
- - [ ] `description` starts with "Use when..."
1495
- - [ ] `description` includes triggering conditions (symptoms, situations)
1496
- - [ ] `description` includes what the skill does
1497
- - [ ] `description` is third-person (no "I", "you")
1498
- - [ ] `description` under 500 characters if possible
1499
- - [ ] Total frontmatter under 1024 characters
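Several of the frontmatter checks above are mechanical and could be automated. The sketch below is one possible shape, not an existing API: `validateSkillFrontmatter` is a hypothetical name, the third-person check is a crude word match, and the total-size check approximates the frontmatter by re-serializing the two fields.

```javascript
// Hypothetical validator for the mechanical frontmatter checks.
function validateSkillFrontmatter({ name, description }) {
  const problems = [];

  if (!/^[A-Za-z0-9-]+$/.test(name)) {
    problems.push("name: use letters, numbers, and hyphens only");
  }
  if (!description.startsWith("Use when")) {
    problems.push('description: start with "Use when"');
  }
  // Crude third-person check (assumption): flag a standalone "I" or "you"
  if (/\bI\b|\b[Yy]ou\b/.test(description)) {
    problems.push("description: write in third person");
  }
  if (description.length > 500) {
    problems.push("description: keep under 500 characters");
  }
  // Approximate the serialized frontmatter size (assumption)
  const serialized = `---\nname: ${name}\ndescription: ${description}\n---`;
  if (serialized.length > 1024) {
    problems.push("frontmatter: keep under 1024 characters in total");
  }
  return problems;
}
```

An empty array means all mechanical checks passed; the remaining checklist items (gerund naming, trigger quality) still need human or agent review.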
1500
-
1501
- **Body:**
1502
-
1503
- - [ ] SKILL.md under 500 lines
1504
- - [ ] Heavy reference (>100 lines) split to separate files
1505
- - [ ] Separate files one level deep (not nested)
1506
- - [ ] Reference files >100 lines have TOC at top
1507
- - [ ] Cross-references use skill names, not `@` links
1508
- - [ ] Required sub-skills explicitly marked (`**REQUIRED BACKGROUND:**`)
1509
- - [ ] One excellent example, not many mediocre ones
1510
- - [ ] Example is runnable, complete, well-commented
1511
- - [ ] Example from real scenario, not contrived
1512
-
1513
- **Keywords:**
1514
-
1515
- - [ ] Error messages included if relevant
1516
- - [ ] Symptoms included (flaky, hanging, zombie, pollution)
1517
- - [ ] Synonyms included (timeout/hang/freeze)
1518
- - [ ] Tool names included if relevant
1519
-
1520
- **Testing (if discipline-enforcing):**
1521
-
1522
- - [ ] Baseline test run (RED) - captured rationalizations
1523
- - [ ] Pressure test with skill (GREEN) - agent complies
1524
- - [ ] Refactor iterations - loopholes closed
1525
- - [ ] Meta-test - "skill was clear, I should follow it"
1526
- - [ ] Rationalization table populated
1527
- - [ ] Red flags list populated
1528
- - [ ] Foundational principle early ("letter = spirit")
1529
-
1530
- ---
1531
-
1532
- ## Appendix C: File Organization Decision Tree
1533
-
1534
- ```
1535
- Need to document a technique/pattern/reference?
1536
-
1537
- ├─ Is it reusable across projects?
1538
- │ ├─ No → Put in CLAUDE.md (project-specific)
1539
- │ └─ Yes → Create skill
1540
-
1541
- └─ Creating skill:
1542
-
1543
-    ├─ Is content <500 lines total?
1544
-    │  ├─ Yes → Single SKILL.md, all inline
1545
-    │  └─ No → Progressive disclosure needed
1546
-    │     │
1547
-    │     ├─ Heavy reference (API docs, syntax)?
1548
-    │     │  → SKILL.md (overview) + REFERENCE.md (details)
1549
-    │     │
1550
-    │     ├─ Multiple domains?
1551
-    │     │  → SKILL.md (nav) + reference/domain1.md + reference/domain2.md
1552
-    │     │
1553
-    │     ├─ Reusable tool/script?
1554
-    │     │  → SKILL.md (overview) + tool.py (executable)
1555
-    │     │
1556
-    │     └─ Conditional advanced content?
1557
-    │        → SKILL.md (basic) + ADVANCED.md (linked conditionally)
1558
- ```
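The decision tree can also be read as a small function from skill characteristics to a suggested file layout. This is a sketch of that reading: `chooseSkillLayout` and its option names are hypothetical, and `reference/<domain>.md` is a placeholder for one file per domain.

```javascript
// Hypothetical encoding of the decision tree as a function.
function chooseSkillLayout(opts) {
  if (!opts.reusable) return ["CLAUDE.md"]; // project-specific, not a skill
  if (opts.totalLines < 500) return ["SKILL.md"]; // single file, all inline

  // Progressive disclosure: SKILL.md plus whichever extras apply
  const files = ["SKILL.md"];
  if (opts.heavyReference) files.push("REFERENCE.md");
  if (opts.multipleDomains) files.push("reference/<domain>.md");
  if (opts.reusableTool) files.push("tool.py");
  if (opts.conditionalAdvanced) files.push("ADVANCED.md");
  return files;
}
```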
1559
-
1560
- ---
1561
-
1562
- **END OF ANALYSIS**