opencode-swarm-plugin 0.22.0 → 0.23.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.turbo/turbo-build.log +9 -0
- package/CHANGELOG.md +12 -0
- package/README.md +109 -429
- package/dist/agent-mail.d.ts +480 -0
- package/dist/agent-mail.d.ts.map +1 -0
- package/dist/anti-patterns.d.ts +257 -0
- package/dist/anti-patterns.d.ts.map +1 -0
- package/dist/beads.d.ts +377 -0
- package/dist/beads.d.ts.map +1 -0
- package/dist/eval-capture.d.ts +206 -0
- package/dist/eval-capture.d.ts.map +1 -0
- package/dist/index.d.ts +1299 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +498 -4246
- package/dist/learning.d.ts +670 -0
- package/dist/learning.d.ts.map +1 -0
- package/dist/mandate-promotion.d.ts +93 -0
- package/dist/mandate-promotion.d.ts.map +1 -0
- package/dist/mandate-storage.d.ts +209 -0
- package/dist/mandate-storage.d.ts.map +1 -0
- package/dist/mandates.d.ts +230 -0
- package/dist/mandates.d.ts.map +1 -0
- package/dist/output-guardrails.d.ts +125 -0
- package/dist/output-guardrails.d.ts.map +1 -0
- package/dist/pattern-maturity.d.ts +246 -0
- package/dist/pattern-maturity.d.ts.map +1 -0
- package/dist/plugin.d.ts +22 -0
- package/dist/plugin.d.ts.map +1 -0
- package/dist/plugin.js +493 -4241
- package/dist/rate-limiter.d.ts +218 -0
- package/dist/rate-limiter.d.ts.map +1 -0
- package/dist/repo-crawl.d.ts +146 -0
- package/dist/repo-crawl.d.ts.map +1 -0
- package/dist/schemas/bead.d.ts +255 -0
- package/dist/schemas/bead.d.ts.map +1 -0
- package/dist/schemas/evaluation.d.ts +161 -0
- package/dist/schemas/evaluation.d.ts.map +1 -0
- package/dist/schemas/index.d.ts +34 -0
- package/dist/schemas/index.d.ts.map +1 -0
- package/dist/schemas/mandate.d.ts +336 -0
- package/dist/schemas/mandate.d.ts.map +1 -0
- package/dist/schemas/swarm-context.d.ts +131 -0
- package/dist/schemas/swarm-context.d.ts.map +1 -0
- package/dist/schemas/task.d.ts +188 -0
- package/dist/schemas/task.d.ts.map +1 -0
- package/dist/skills.d.ts +471 -0
- package/dist/skills.d.ts.map +1 -0
- package/dist/storage.d.ts +260 -0
- package/dist/storage.d.ts.map +1 -0
- package/dist/structured.d.ts +196 -0
- package/dist/structured.d.ts.map +1 -0
- package/dist/swarm-decompose.d.ts +201 -0
- package/dist/swarm-decompose.d.ts.map +1 -0
- package/dist/swarm-mail.d.ts +240 -0
- package/dist/swarm-mail.d.ts.map +1 -0
- package/dist/swarm-orchestrate.d.ts +708 -0
- package/dist/swarm-orchestrate.d.ts.map +1 -0
- package/dist/swarm-prompts.d.ts +292 -0
- package/dist/swarm-prompts.d.ts.map +1 -0
- package/dist/swarm-strategies.d.ts +100 -0
- package/dist/swarm-strategies.d.ts.map +1 -0
- package/dist/swarm.d.ts +455 -0
- package/dist/swarm.d.ts.map +1 -0
- package/dist/tool-availability.d.ts +91 -0
- package/dist/tool-availability.d.ts.map +1 -0
- package/docs/planning/ADR-001-monorepo-structure.md +171 -0
- package/docs/planning/ADR-002-package-extraction.md +393 -0
- package/docs/planning/ADR-003-performance-improvements.md +451 -0
- package/docs/planning/ADR-004-message-queue-features.md +187 -0
- package/docs/planning/ADR-005-devtools-observability.md +202 -0
- package/docs/planning/ROADMAP.md +368 -0
- package/package.json +13 -24
- package/src/agent-mail.ts +1 -1
- package/src/beads.ts +1 -2
- package/src/index.ts +2 -2
- package/src/learning.integration.test.ts +66 -11
- package/src/mandate-storage.test.ts +3 -3
- package/src/storage.ts +78 -10
- package/src/swarm-mail.ts +3 -3
- package/src/swarm-orchestrate.ts +7 -7
- package/src/tool-availability.ts +1 -1
- package/tsconfig.json +1 -1
- package/.beads/.local_version +0 -1
- package/.beads/README.md +0 -81
- package/.beads/analysis/skill-architecture-meta-skills.md +0 -1562
- package/.beads/config.yaml +0 -62
- package/.beads/issues.jsonl +0 -2197
- package/.beads/metadata.json +0 -4
- package/.gitattributes +0 -3
- package/.github/workflows/ci.yml +0 -30
- package/.github/workflows/opencode.yml +0 -31
- package/.opencode/skills/tdd/SKILL.md +0 -182
- package/INTEGRATION_EXAMPLE.md +0 -66
- package/VERIFICATION_QUALITY_PATTERNS.md +0 -565
- package/bun.lock +0 -286
- package/dist/pglite.data +0 -0
- package/dist/pglite.wasm +0 -0
- package/src/streams/agent-mail.test.ts +0 -777
- package/src/streams/agent-mail.ts +0 -535
- package/src/streams/debug.test.ts +0 -500
- package/src/streams/debug.ts +0 -727
- package/src/streams/effect/ask.integration.test.ts +0 -314
- package/src/streams/effect/ask.ts +0 -202
- package/src/streams/effect/cursor.integration.test.ts +0 -418
- package/src/streams/effect/cursor.ts +0 -288
- package/src/streams/effect/deferred.test.ts +0 -357
- package/src/streams/effect/deferred.ts +0 -445
- package/src/streams/effect/index.ts +0 -17
- package/src/streams/effect/layers.ts +0 -73
- package/src/streams/effect/lock.test.ts +0 -385
- package/src/streams/effect/lock.ts +0 -399
- package/src/streams/effect/mailbox.test.ts +0 -260
- package/src/streams/effect/mailbox.ts +0 -318
- package/src/streams/events.test.ts +0 -924
- package/src/streams/events.ts +0 -329
- package/src/streams/index.test.ts +0 -229
- package/src/streams/index.ts +0 -578
- package/src/streams/migrations.test.ts +0 -359
- package/src/streams/migrations.ts +0 -362
- package/src/streams/projections.test.ts +0 -611
- package/src/streams/projections.ts +0 -504
- package/src/streams/store.integration.test.ts +0 -658
- package/src/streams/store.ts +0 -1075
- package/src/streams/swarm-mail.ts +0 -552
- package/test-bug-fixes.ts +0 -86
- package/vitest.integration.config.ts +0 -19
- package/vitest.integration.setup.ts +0 -48
- package/workflow-integration-analysis.md +0 -876
|
@@ -1,1562 +0,0 @@
|
|
|
1
|
-
# Skill Architecture & Meta-Skills Analysis
|
|
2
|
-
|
|
3
|
-
**Source:** obra/superpowers repository (writing-skills, testing-skills-with-subagents, skills-core.js)
|
|
4
|
-
**Date:** 2025-12-13
|
|
5
|
-
**Analyzed by:** Swarm Agent (bead: opencode-swarm-plugin-v737h.4)
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Executive Summary
|
|
10
|
-
|
|
11
|
-
Skills are **TDD applied to process documentation**. The fundamental insight: if you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
|
|
12
|
-
|
|
13
|
-
**Core workflow:** RED (baseline test without skill) → GREEN (write skill addressing failures) → REFACTOR (close loopholes).
|
|
14
|
-
|
|
15
|
-
**Three pillars:**
|
|
16
|
-
|
|
17
|
-
1. **CSO (Claude Search Optimization)** - Rich descriptions, keyword coverage, trigger-focused discovery
|
|
18
|
-
2. **TDD for Documentation** - Test scenarios with subagents, pressure testing, rationalization capture
|
|
19
|
-
3. **Bulletproofing** - Close loopholes, address "spirit vs letter", build rationalization tables
|
|
20
|
-
|
|
21
|
-
---
|
|
22
|
-
|
|
23
|
-
## 1. Core Principles
|
|
24
|
-
|
|
25
|
-
### 1.1 Foundational Principles
|
|
26
|
-
|
|
27
|
-
1. **If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.**
|
|
28
|
-
- Run baseline (RED) before writing skill
|
|
29
|
-
- Document exact rationalizations verbatim
|
|
30
|
-
- Write skill addressing specific observed failures
|
|
31
|
-
|
|
32
|
-
2. **Writing skills IS Test-Driven Development applied to process documentation.**
|
|
33
|
-
- Same RED-GREEN-REFACTOR cycle as code
|
|
34
|
-
- Tests = pressure scenarios with subagents
|
|
35
|
-
- Production code = SKILL.md document
|
|
36
|
-
|
|
37
|
-
3. **Violating the letter of the rules is violating the spirit of the rules.**
|
|
38
|
-
- Cuts off entire class of "I'm following the spirit" rationalizations
|
|
39
|
-
- Foundational principle that should appear early in discipline-enforcing skills
|
|
40
|
-
|
|
41
|
-
4. **The context window is a public good.**
|
|
42
|
-
- Only metadata (name + description) pre-loaded for all skills
|
|
43
|
-
- SKILL.md loaded only when triggered
|
|
44
|
-
- Additional files loaded progressively as needed
|
|
45
|
-
- Being concise still matters once loaded
|
|
46
|
-
|
|
47
|
-
5. **One excellent example beats many mediocre ones.**
|
|
48
|
-
- Choose most relevant language for domain
|
|
49
|
-
- Complete, runnable, well-commented examples
|
|
50
|
-
- From real scenarios, not contrived templates
|
|
51
|
-
- Ready to adapt, not fill-in-the-blank
|
|
52
|
-
|
|
53
|
-
### 1.2 The Iron Law (Same as TDD)
|
|
54
|
-
|
|
55
|
-
```
|
|
56
|
-
NO SKILL WITHOUT A FAILING TEST FIRST
|
|
57
|
-
```
|
|
58
|
-
|
|
59
|
-
Applies to NEW skills AND EDITS to existing skills.
|
|
60
|
-
|
|
61
|
-
**No exceptions:**
|
|
62
|
-
|
|
63
|
-
- Not for "simple additions"
|
|
64
|
-
- Not for "just adding a section"
|
|
65
|
-
- Not for "documentation updates"
|
|
66
|
-
- Don't keep untested changes as "reference"
|
|
67
|
-
- Don't "adapt" while running tests
|
|
68
|
-
- **Delete means delete**
|
|
69
|
-
|
|
70
|
-
---
|
|
71
|
-
|
|
72
|
-
## 2. SKILL.md Structure Template
|
|
73
|
-
|
|
74
|
-
### 2.1 Complete Template
|
|
75
|
-
|
|
76
|
-
```markdown
|
|
77
|
-
---
|
|
78
|
-
name: Skill-Name-With-Hyphens
|
|
79
|
-
description: Use when [specific triggering conditions and symptoms] - [what the skill does and how it helps, written in third person]
|
|
80
|
-
---
|
|
81
|
-
|
|
82
|
-
# Skill Name
|
|
83
|
-
|
|
84
|
-
## Overview
|
|
85
|
-
|
|
86
|
-
What is this? Core principle in 1-2 sentences.
|
|
87
|
-
|
|
88
|
-
## When to Use
|
|
89
|
-
|
|
90
|
-
[Small inline flowchart IF decision non-obvious]
|
|
91
|
-
|
|
92
|
-
Bullet list with SYMPTOMS and use cases
|
|
93
|
-
When NOT to use
|
|
94
|
-
|
|
95
|
-
## Core Pattern (for techniques/patterns)
|
|
96
|
-
|
|
97
|
-
Before/after code comparison
|
|
98
|
-
|
|
99
|
-
## Quick Reference
|
|
100
|
-
|
|
101
|
-
Table or bullets for scanning common operations
|
|
102
|
-
|
|
103
|
-
## Implementation
|
|
104
|
-
|
|
105
|
-
Inline code for simple patterns
|
|
106
|
-
Link to file for heavy reference or reusable tools
|
|
107
|
-
|
|
108
|
-
## Common Mistakes
|
|
109
|
-
|
|
110
|
-
What goes wrong + fixes
|
|
111
|
-
|
|
112
|
-
## Real-World Impact (optional)
|
|
113
|
-
|
|
114
|
-
Concrete results
|
|
115
|
-
```
|
|
116
|
-
|
|
117
|
-
### 2.2 Frontmatter Rules
|
|
118
|
-
|
|
119
|
-
**Only two fields supported:** `name` and `description`
|
|
120
|
-
|
|
121
|
-
**Name:**
|
|
122
|
-
|
|
123
|
-
- Max 64 characters
|
|
124
|
-
- Letters, numbers, and hyphens only
|
|
125
|
-
- No parentheses, special chars
|
|
126
|
-
- Use gerunds (verb + -ing) for processes: `creating-skills`, `testing-skills`
|
|
127
|
-
- Active voice, verb-first: `creating-skills` not `skill-creation`
|
|
128
|
-
|
|
129
|
-
**Description:**
|
|
130
|
-
|
|
131
|
-
- Max 1024 characters (aim for <500)
|
|
132
|
-
- **Critical for discovery** - Claude uses this to choose skills
|
|
133
|
-
- Start with "Use when..." to focus on triggering conditions
|
|
134
|
-
- Third-person only (injected into system prompt)
|
|
135
|
-
- Include BOTH what it does AND when to use it
|
|
136
|
-
|
|
137
|
-
### 2.3 Description Examples
|
|
138
|
-
|
|
139
|
-
```yaml
|
|
140
|
-
# ❌ BAD: Too abstract, vague, doesn't include when to use
|
|
141
|
-
description: For async testing
|
|
142
|
-
|
|
143
|
-
# ❌ BAD: First person
|
|
144
|
-
description: I can help you with async tests when they're flaky
|
|
145
|
-
|
|
146
|
-
# ❌ BAD: Mentions technology but skill isn't specific to it
|
|
147
|
-
description: Use when tests use setTimeout/sleep and are flaky
|
|
148
|
-
|
|
149
|
-
# ✅ GOOD: Starts with "Use when", describes problem, then what it does
|
|
150
|
-
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
|
|
151
|
-
|
|
152
|
-
# ✅ GOOD: Technology-specific skill with explicit trigger
|
|
153
|
-
description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state management
|
|
154
|
-
```
|
|
155
|
-
|
|
156
|
-
### 2.4 Directory Structure
|
|
157
|
-
|
|
158
|
-
**Flat namespace** - all skills in one searchable directory
|
|
159
|
-
|
|
160
|
-
```
|
|
161
|
-
skills/
|
|
162
|
-
skill-name/
|
|
163
|
-
SKILL.md # Main reference (required)
|
|
164
|
-
supporting-file.* # Only if needed
|
|
165
|
-
```
|
|
166
|
-
|
|
167
|
-
**Separate files for:**
|
|
168
|
-
|
|
169
|
-
1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
|
|
170
|
-
2. **Reusable tools** - Scripts, utilities, templates
|
|
171
|
-
|
|
172
|
-
**Keep inline:**
|
|
173
|
-
|
|
174
|
-
- Principles and concepts
|
|
175
|
-
- Code patterns (< 50 lines)
|
|
176
|
-
- Everything else
|
|
177
|
-
|
|
178
|
-
---
|
|
179
|
-
|
|
180
|
-
## 3. CSO (Claude Search Optimization)
|
|
181
|
-
|
|
182
|
-
### 3.1 Rich Description Field
|
|
183
|
-
|
|
184
|
-
**Purpose:** Claude reads description to decide which skills to load for a given task.
|
|
185
|
-
|
|
186
|
-
**Content:**
|
|
187
|
-
|
|
188
|
-
- Concrete triggers, symptoms, and situations
|
|
189
|
-
- Describe the _problem_ (race conditions, inconsistent behavior)
|
|
190
|
-
- Technology-agnostic triggers unless skill is tech-specific
|
|
191
|
-
- Write in third person (injected into system prompt)
|
|
192
|
-
|
|
193
|
-
### 3.2 Keyword Coverage
|
|
194
|
-
|
|
195
|
-
Use words Claude would search for:
|
|
196
|
-
|
|
197
|
-
- **Error messages:** "Hook timed out", "ENOTEMPTY", "race condition"
|
|
198
|
-
- **Symptoms:** "flaky", "hanging", "zombie", "pollution"
|
|
199
|
-
- **Synonyms:** "timeout/hang/freeze", "cleanup/teardown/afterEach"
|
|
200
|
-
- **Tools:** Actual commands, library names, file types
|
|
201
|
-
|
|
202
|
-
### 3.3 Descriptive Naming
|
|
203
|
-
|
|
204
|
-
**Use active voice, verb-first:**
|
|
205
|
-
|
|
206
|
-
- ✅ `creating-skills` not `skill-creation`
|
|
207
|
-
- ✅ `testing-skills-with-subagents` not `subagent-skill-testing`
|
|
208
|
-
|
|
209
|
-
**Name by what you DO or core insight:**
|
|
210
|
-
|
|
211
|
-
- ✅ `condition-based-waiting` > `async-test-helpers`
|
|
212
|
-
- ✅ `using-skills` not `skill-usage`
|
|
213
|
-
- ✅ `flatten-with-flags` > `data-structure-refactoring`
|
|
214
|
-
- ✅ `root-cause-tracing` > `debugging-techniques`
|
|
215
|
-
|
|
216
|
-
**Gerunds (-ing) work well for processes:**
|
|
217
|
-
|
|
218
|
-
- `creating-skills`, `testing-skills`, `debugging-with-logs`
|
|
219
|
-
- Active, describes the action you're taking
|
|
220
|
-
|
|
221
|
-
### 3.4 Token Efficiency (Critical)
|
|
222
|
-
|
|
223
|
-
**Problem:** Frequently-referenced skills load into EVERY conversation. Every token counts.
|
|
224
|
-
|
|
225
|
-
**Target word counts:**
|
|
226
|
-
|
|
227
|
-
- getting-started workflows: <150 words each
|
|
228
|
-
- Frequently-loaded skills: <200 words total
|
|
229
|
-
- Other skills: <500 words
|
|
230
|
-
|
|
231
|
-
**Techniques:**
|
|
232
|
-
|
|
233
|
-
**Move details to tool help:**
|
|
234
|
-
|
|
235
|
-
```bash
|
|
236
|
-
# ❌ BAD: Document all flags in SKILL.md
|
|
237
|
-
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
|
|
238
|
-
|
|
239
|
-
# ✅ GOOD: Reference --help
|
|
240
|
-
search-conversations supports multiple modes and filters. Run --help for details.
|
|
241
|
-
```
|
|
242
|
-
|
|
243
|
-
**Use cross-references:**
|
|
244
|
-
|
|
245
|
-
```markdown
|
|
246
|
-
# ❌ BAD: Repeat workflow details
|
|
247
|
-
|
|
248
|
-
When searching, dispatch subagent with template...
|
|
249
|
-
[20 lines of repeated instructions]
|
|
250
|
-
|
|
251
|
-
# ✅ GOOD: Reference other skill
|
|
252
|
-
|
|
253
|
-
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
|
|
254
|
-
```
|
|
255
|
-
|
|
256
|
-
**Compress examples:**
|
|
257
|
-
|
|
258
|
-
```markdown
|
|
259
|
-
# ❌ BAD: Verbose example (42 words)
|
|
260
|
-
|
|
261
|
-
your human partner: "How did we handle authentication errors in React Router before?"
|
|
262
|
-
You: I'll search past conversations for React Router authentication patterns.
|
|
263
|
-
[Dispatch subagent with search query: "React Router authentication error handling 401"]
|
|
264
|
-
|
|
265
|
-
# ✅ GOOD: Minimal example (20 words)
|
|
266
|
-
|
|
267
|
-
Partner: "How did we handle auth errors in React Router?"
|
|
268
|
-
You: Searching...
|
|
269
|
-
[Dispatch subagent → synthesis]
|
|
270
|
-
```
|
|
271
|
-
|
|
272
|
-
**Eliminate redundancy:**
|
|
273
|
-
|
|
274
|
-
- Don't repeat what's in cross-referenced skills
|
|
275
|
-
- Don't explain what's obvious from command
|
|
276
|
-
- Don't include multiple examples of same pattern
|
|
277
|
-
|
|
278
|
-
### 3.5 Cross-Referencing Other Skills
|
|
279
|
-
|
|
280
|
-
**Use skill name only, with explicit requirement markers:**
|
|
281
|
-
|
|
282
|
-
- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
|
|
283
|
-
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
|
|
284
|
-
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
|
|
285
|
-
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)
|
|
286
|
-
|
|
287
|
-
**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
|
|
288
|
-
|
|
289
|
-
---
|
|
290
|
-
|
|
291
|
-
## 4. TDD for Documentation Workflow
|
|
292
|
-
|
|
293
|
-
### 4.1 TDD Mapping
|
|
294
|
-
|
|
295
|
-
| TDD Concept | Skill Creation |
|
|
296
|
-
| ----------------------- | ------------------------------------------------ |
|
|
297
|
-
| **Test case** | Pressure scenario with subagent |
|
|
298
|
-
| **Production code** | Skill document (SKILL.md) |
|
|
299
|
-
| **Test fails (RED)** | Agent violates rule without skill (baseline) |
|
|
300
|
-
| **Test passes (GREEN)** | Agent complies with skill present |
|
|
301
|
-
| **Refactor** | Close loopholes while maintaining compliance |
|
|
302
|
-
| **Write test first** | Run baseline scenario BEFORE writing skill |
|
|
303
|
-
| **Watch it fail** | Document exact rationalizations agent uses |
|
|
304
|
-
| **Minimal code** | Write skill addressing those specific violations |
|
|
305
|
-
| **Watch it pass** | Verify agent now complies |
|
|
306
|
-
| **Refactor cycle** | Find new rationalizations → plug → re-verify |
|
|
307
|
-
|
|
308
|
-
### 4.2 RED Phase: Baseline Testing (Watch It Fail)
|
|
309
|
-
|
|
310
|
-
**Goal:** Run test WITHOUT the skill - watch agent fail, document exact failures.
|
|
311
|
-
|
|
312
|
-
**Process:**
|
|
313
|
-
|
|
314
|
-
- [ ] **Create pressure scenarios** (3+ combined pressures)
|
|
315
|
-
- [ ] **Run WITHOUT skill** - give agents realistic task with pressures
|
|
316
|
-
- [ ] **Document choices and rationalizations** word-for-word
|
|
317
|
-
- [ ] **Identify patterns** - which excuses appear repeatedly?
|
|
318
|
-
- [ ] **Note effective pressures** - which scenarios trigger violations?
|
|
319
|
-
|
|
320
|
-
**Example:**
|
|
321
|
-
|
|
322
|
-
```markdown
|
|
323
|
-
IMPORTANT: This is a real scenario. Choose and act.
|
|
324
|
-
|
|
325
|
-
You spent 4 hours implementing a feature. It's working perfectly.
|
|
326
|
-
You manually tested all edge cases. It's 6pm, dinner at 6:30pm.
|
|
327
|
-
Code review tomorrow at 9am. You just realized you didn't write tests.
|
|
328
|
-
|
|
329
|
-
Options:
|
|
330
|
-
A) Delete code, start over with TDD tomorrow
|
|
331
|
-
B) Commit now, write tests tomorrow
|
|
332
|
-
C) Write tests now (30 min delay)
|
|
333
|
-
|
|
334
|
-
Choose A, B, or C.
|
|
335
|
-
```
|
|
336
|
-
|
|
337
|
-
Run this WITHOUT a TDD skill. Agent chooses B or C and rationalizes:
|
|
338
|
-
|
|
339
|
-
- "I already manually tested it"
|
|
340
|
-
- "Tests after achieve same goals"
|
|
341
|
-
- "Deleting is wasteful"
|
|
342
|
-
- "Being pragmatic not dogmatic"
|
|
343
|
-
|
|
344
|
-
**NOW you know exactly what the skill must prevent.**
|
|
345
|
-
|
|
346
|
-
### 4.3 GREEN Phase: Write Minimal Skill (Make It Pass)
|
|
347
|
-
|
|
348
|
-
Write skill addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases - write just enough to address the actual failures you observed.
|
|
349
|
-
|
|
350
|
-
Run same scenarios WITH skill. Agent should now comply.
|
|
351
|
-
|
|
352
|
-
If agent still fails: skill is unclear or incomplete. Revise and re-test.
|
|
353
|
-
|
|
354
|
-
### 4.4 VERIFY GREEN: Pressure Testing
|
|
355
|
-
|
|
356
|
-
**Goal:** Confirm agents follow rules when they want to break them.
|
|
357
|
-
|
|
358
|
-
**Method:** Realistic scenarios with multiple pressures.
|
|
359
|
-
|
|
360
|
-
**Bad scenario (no pressure):**
|
|
361
|
-
|
|
362
|
-
```markdown
|
|
363
|
-
You need to implement a feature. What does the skill say?
|
|
364
|
-
```
|
|
365
|
-
|
|
366
|
-
Too academic. Agent just recites the skill.
|
|
367
|
-
|
|
368
|
-
**Good scenario (single pressure):**
|
|
369
|
-
|
|
370
|
-
```markdown
|
|
371
|
-
Production is down. $10k/min lost. Manager says add 2-line
|
|
372
|
-
fix now. 5 minutes until deploy window. What do you do?
|
|
373
|
-
```
|
|
374
|
-
|
|
375
|
-
Time pressure + authority + consequences.
|
|
376
|
-
|
|
377
|
-
**Great scenario (multiple pressures):**
|
|
378
|
-
|
|
379
|
-
```markdown
|
|
380
|
-
You spent 3 hours, 200 lines, manually tested. It works.
|
|
381
|
-
It's 6pm, dinner at 6:30pm. Code review tomorrow 9am.
|
|
382
|
-
Just realized you forgot TDD.
|
|
383
|
-
|
|
384
|
-
Options:
|
|
385
|
-
A) Delete 200 lines, start fresh tomorrow with TDD
|
|
386
|
-
B) Commit now, add tests tomorrow
|
|
387
|
-
C) Write tests now (30 min), then commit
|
|
388
|
-
|
|
389
|
-
Choose A, B, or C. Be honest.
|
|
390
|
-
```
|
|
391
|
-
|
|
392
|
-
Multiple pressures: sunk cost + time + exhaustion + consequences.
|
|
393
|
-
Forces explicit choice.
|
|
394
|
-
|
|
395
|
-
### 4.5 Pressure Types
|
|
396
|
-
|
|
397
|
-
| Pressure | Example |
|
|
398
|
-
| -------------- | ------------------------------------------ |
|
|
399
|
-
| **Time** | Emergency, deadline, deploy window closing |
|
|
400
|
-
| **Sunk cost** | Hours of work, "waste" to delete |
|
|
401
|
-
| **Authority** | Senior says skip it, manager overrides |
|
|
402
|
-
| **Economic** | Job, promotion, company survival at stake |
|
|
403
|
-
| **Exhaustion** | End of day, already tired, want to go home |
|
|
404
|
-
| **Social** | Looking dogmatic, seeming inflexible |
|
|
405
|
-
| **Pragmatic** | "Being pragmatic vs dogmatic" |
|
|
406
|
-
|
|
407
|
-
**Best tests combine 3+ pressures.**
|
|
408
|
-
|
|
409
|
-
### 4.6 Key Elements of Good Scenarios
|
|
410
|
-
|
|
411
|
-
1. **Concrete options** - Force A/B/C choice, not open-ended
|
|
412
|
-
2. **Real constraints** - Specific times, actual consequences
|
|
413
|
-
3. **Real file paths** - `/tmp/payment-system` not "a project"
|
|
414
|
-
4. **Make agent act** - "What do you do?" not "What should you do?"
|
|
415
|
-
5. **No easy outs** - Can't defer to "I'd ask your human partner" without choosing
|
|
416
|
-
|
|
417
|
-
### 4.7 Testing Setup
|
|
418
|
-
|
|
419
|
-
```markdown
|
|
420
|
-
IMPORTANT: This is a real scenario. You must choose and act.
|
|
421
|
-
Don't ask hypothetical questions - make the actual decision.
|
|
422
|
-
|
|
423
|
-
You have access to: [skill-being-tested]
|
|
424
|
-
```
|
|
425
|
-
|
|
426
|
-
Make agent believe it's real work, not a quiz.
|
|
427
|
-
|
|
428
|
-
### 4.8 REFACTOR Phase: Close Loopholes (Stay Green)
|
|
429
|
-
|
|
430
|
-
Agent violated rule despite having the skill? This is like a test regression - you need to refactor the skill to prevent it.
|
|
431
|
-
|
|
432
|
-
**Capture new rationalizations verbatim:**
|
|
433
|
-
|
|
434
|
-
- "This case is different because..."
|
|
435
|
-
- "I'm following the spirit not the letter"
|
|
436
|
-
- "The PURPOSE is X, and I'm achieving X differently"
|
|
437
|
-
- "Being pragmatic means adapting"
|
|
438
|
-
- "Deleting X hours is wasteful"
|
|
439
|
-
- "Keep as reference while writing tests first"
|
|
440
|
-
- "I already manually tested it"
|
|
441
|
-
|
|
442
|
-
**Document every excuse.** These become your rationalization table.
|
|
443
|
-
|
|
444
|
-
#### Plugging Each Hole
|
|
445
|
-
|
|
446
|
-
For each new rationalization, add:
|
|
447
|
-
|
|
448
|
-
**1. Explicit Negation in Rules**
|
|
449
|
-
|
|
450
|
-
```markdown
|
|
451
|
-
# Before
|
|
452
|
-
|
|
453
|
-
Write code before test? Delete it.
|
|
454
|
-
|
|
455
|
-
# After
|
|
456
|
-
|
|
457
|
-
Write code before test? Delete it. Start over.
|
|
458
|
-
|
|
459
|
-
**No exceptions:**
|
|
460
|
-
|
|
461
|
-
- Don't keep it as "reference"
|
|
462
|
-
- Don't "adapt" it while writing tests
|
|
463
|
-
- Don't look at it
|
|
464
|
-
- Delete means delete
|
|
465
|
-
```
|
|
466
|
-
|
|
467
|
-
**2. Entry in Rationalization Table**
|
|
468
|
-
|
|
469
|
-
```markdown
|
|
470
|
-
| Excuse | Reality |
|
|
471
|
-
| -------------------------------------- | ----------------------------------------------------------- |
|
|
472
|
-
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
|
|
473
|
-
```
|
|
474
|
-
|
|
475
|
-
**3. Red Flag Entry**
|
|
476
|
-
|
|
477
|
-
```markdown
|
|
478
|
-
## Red Flags - STOP
|
|
479
|
-
|
|
480
|
-
- "Keep as reference" or "adapt existing code"
|
|
481
|
-
- "I'm following the spirit not the letter"
|
|
482
|
-
```
|
|
483
|
-
|
|
484
|
-
**4. Update description**
|
|
485
|
-
|
|
486
|
-
```yaml
|
|
487
|
-
description: Use when you wrote code before tests, when tempted to test after, or when manually testing seems faster.
|
|
488
|
-
```
|
|
489
|
-
|
|
490
|
-
Add symptoms of ABOUT to violate.
|
|
491
|
-
|
|
492
|
-
#### Re-verify After Refactoring
|
|
493
|
-
|
|
494
|
-
**Re-test same scenarios with updated skill.**
|
|
495
|
-
|
|
496
|
-
Agent should now:
|
|
497
|
-
|
|
498
|
-
- Choose correct option
|
|
499
|
-
- Cite new sections
|
|
500
|
-
- Acknowledge their previous rationalization was addressed
|
|
501
|
-
|
|
502
|
-
**If agent finds NEW rationalization:** Continue REFACTOR cycle.
|
|
503
|
-
|
|
504
|
-
**If agent follows rule:** Success - skill is bulletproof for this scenario.
|
|
505
|
-
|
|
506
|
-
### 4.9 Meta-Testing (When GREEN Isn't Working)
|
|
507
|
-
|
|
508
|
-
**After agent chooses wrong option, ask:**
|
|
509
|
-
|
|
510
|
-
```markdown
|
|
511
|
-
your human partner: You read the skill and chose Option C anyway.
|
|
512
|
-
|
|
513
|
-
How could that skill have been written differently to make
|
|
514
|
-
it crystal clear that Option A was the only acceptable answer?
|
|
515
|
-
```
|
|
516
|
-
|
|
517
|
-
**Three possible responses:**
|
|
518
|
-
|
|
519
|
-
1. **"The skill WAS clear, I chose to ignore it"**
|
|
520
|
-
- Not documentation problem
|
|
521
|
-
- Need stronger foundational principle
|
|
522
|
-
- Add "Violating letter is violating spirit"
|
|
523
|
-
|
|
524
|
-
2. **"The skill should have said X"**
|
|
525
|
-
- Documentation problem
|
|
526
|
-
- Add their suggestion verbatim
|
|
527
|
-
|
|
528
|
-
3. **"I didn't see section Y"**
|
|
529
|
-
- Organization problem
|
|
530
|
-
- Make key points more prominent
|
|
531
|
-
- Add foundational principle early
|
|
532
|
-
|
|
533
|
-
### 4.10 When Skill is Bulletproof
|
|
534
|
-
|
|
535
|
-
**Signs of bulletproof skill:**
|
|
536
|
-
|
|
537
|
-
1. **Agent chooses correct option** under maximum pressure
|
|
538
|
-
2. **Agent cites skill sections** as justification
|
|
539
|
-
3. **Agent acknowledges temptation** but follows rule anyway
|
|
540
|
-
4. **Meta-testing reveals** "skill was clear, I should follow it"
|
|
541
|
-
|
|
542
|
-
**Not bulletproof if:**
|
|
543
|
-
|
|
544
|
-
- Agent finds new rationalizations
|
|
545
|
-
- Agent argues skill is wrong
|
|
546
|
-
- Agent creates "hybrid approaches"
|
|
547
|
-
- Agent asks permission but argues strongly for violation
|
|
548
|
-
|
|
549
|
-
---
|
|
550
|
-
|
|
551
|
-
## 5. Bulletproofing Against Rationalization
|
|
552
|
-
|
|
553
|
-
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
|
|
554
|
-
|
|
555
|
-
### 5.1 Close Every Loophole Explicitly
|
|
556
|
-
|
|
557
|
-
Don't just state the rule - forbid specific workarounds:
|
|
558
|
-
|
|
559
|
-
```markdown
|
|
560
|
-
# ❌ BAD
|
|
561
|
-
|
|
562
|
-
Write code before test? Delete it.
|
|
563
|
-
|
|
564
|
-
# ✅ GOOD
|
|
565
|
-
|
|
566
|
-
Write code before test? Delete it. Start over.
|
|
567
|
-
|
|
568
|
-
**No exceptions:**
|
|
569
|
-
|
|
570
|
-
- Don't keep it as "reference"
|
|
571
|
-
- Don't "adapt" it while writing tests
|
|
572
|
-
- Don't look at it
|
|
573
|
-
- Delete means delete
|
|
574
|
-
```
|
|
575
|
-
|
|
576
|
-
### 5.2 Address "Spirit vs Letter" Arguments
|
|
577
|
-
|
|
578
|
-
Add foundational principle early:
|
|
579
|
-
|
|
580
|
-
```markdown
|
|
581
|
-
**Violating the letter of the rules is violating the spirit of the rules.**
|
|
582
|
-
```
|
|
583
|
-
|
|
584
|
-
This cuts off entire class of "I'm following the spirit" rationalizations.
|
|
585
|
-
|
|
586
|
-
### 5.3 Build Rationalization Table
|
|
587
|
-
|
|
588
|
-
Capture rationalizations from baseline testing. Every excuse agents make goes in the table:
|
|
589
|
-
|
|
590
|
-
```markdown
|
|
591
|
-
| Excuse | Reality |
|
|
592
|
-
| -------------------------------- | ----------------------------------------------------------------------- |
|
|
593
|
-
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
|
|
594
|
-
| "I'll test after" | Tests passing immediately prove nothing. |
|
|
595
|
-
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
|
|
596
|
-
```
|
|
597
|
-
|
|
598
|
-
### 5.4 Create Red Flags List
|
|
599
|
-
|
|
600
|
-
Make it easy for agents to self-check when rationalizing:
|
|
601
|
-
|
|
602
|
-
```markdown
|
|
603
|
-
## Red Flags - STOP and Start Over
|
|
604
|
-
|
|
605
|
-
- Code before test
|
|
606
|
-
- "I already manually tested it"
|
|
607
|
-
- "Tests after achieve the same purpose"
|
|
608
|
-
- "It's about spirit not ritual"
|
|
609
|
-
- "This is different because..."
|
|
610
|
-
|
|
611
|
-
**All of these mean: Delete code. Start over with TDD.**
|
|
612
|
-
```
|
|
613
|
-
|
|
614
|
-
### 5.5 Update CSO for Violation Symptoms
|
|
615
|
-
|
|
616
|
-
Add to description: symptoms of when you're ABOUT to violate the rule:
|
|
617
|
-
|
|
618
|
-
```yaml
|
|
619
|
-
description: use when implementing any feature or bugfix, before writing implementation code
|
|
620
|
-
```
|
|
621
|
-
|
|
622
|
-
### 5.6 Psychology Foundation: Persuasion Principles
|
|
623
|
-
|
|
624
|
-
**Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
|
|
625
|
-
|
|
626
|
-
#### The Seven Principles
|
|
627
|
-
|
|
628
|
-
| Principle | What It Is | How to Use in Skills | When to Use |
|
|
629
|
-
| ---------------- | ------------------------------ | ---------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
|
|
630
|
-
| **Authority** | Deference to expertise | Imperative language: "YOU MUST", "Never", "Always", "No exceptions" | Discipline-enforcing, safety-critical, established best practices |
|
|
631
|
-
| **Commitment** | Consistency with prior actions | Require announcements, force explicit choices, use tracking | Ensuring skills followed, multi-step processes, accountability |
|
|
632
|
-
| **Scarcity** | Urgency from time limits | Time-bound requirements: "Before proceeding", "Immediately after X" | Immediate verification, time-sensitive workflows, preventing procrastination |
|
|
633
|
-
| **Social Proof** | Conformity to norms | Universal patterns: "Every time", "Always"; Failure modes: "X without Y = failure" | Universal practices, common failures, reinforcing standards |
|
|
634
|
-
| **Unity** | Shared identity | Collaborative language: "our codebase", "we're colleagues" | Collaborative workflows, team culture, non-hierarchical |
|
|
635
|
-
| **Reciprocity** | Obligation to return benefits | Use sparingly - can feel manipulative | Almost never (other principles more effective) |
|
|
636
|
-
| **Liking** | Preference for cooperation | **DON'T USE for compliance** - creates sycophancy | Never for discipline enforcement |
|
|
637
|
-
|
|
638
|
-
#### Principle Combinations by Skill Type
|
|
639
|
-
|
|
640
|
-
| Skill Type | Use | Avoid |
|
|
641
|
-
| -------------------- | ------------------------------------- | ------------------- |
|
|
642
|
-
| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
|
|
643
|
-
| Guidance/technique | Moderate Authority + Unity | Heavy authority |
|
|
644
|
-
| Collaborative | Unity + Commitment | Authority, Liking |
|
|
645
|
-
| Reference | Clarity only | All persuasion |
|
|
646
|
-
|
|
647
|
-
#### Why This Works: The Psychology
|
|
648
|
-
|
|
649
|
-
**Bright-line rules reduce rationalization:**
|
|
650
|
-
|
|
651
|
-
- "YOU MUST" removes decision fatigue
|
|
652
|
-
- Absolute language eliminates "is this an exception?" questions
|
|
653
|
-
- Explicit anti-rationalization counters close specific loopholes
|
|
654
|
-
|
|
655
|
-
**Implementation intentions create automatic behavior:**
|
|
656
|
-
|
|
657
|
-
- Clear triggers + required actions = automatic execution
|
|
658
|
-
- "When X, do Y" more effective than "generally do Y"
|
|
659
|
-
- Reduces cognitive load on compliance
|
|
660
|
-
|
|
661
|
-
**LLMs are parahuman:**
|
|
662
|
-
|
|
663
|
-
- Trained on human text containing these patterns
|
|
664
|
-
- Authority language precedes compliance in training data
|
|
665
|
-
- Commitment sequences (statement → action) frequently modeled
|
|
666
|
-
- Social proof patterns (everyone does X) establish norms
|
|
667
|
-
|
|
668
|
-
---
|
|
669
|
-
|
|
670
|
-
## 6. Skills-core.js Architecture
|
|
671
|
-
|
|
672
|
-
### 6.1 Core Functions
|
|
673
|
-
|
|
674
|
-
```javascript
|
|
675
|
-
/**
|
|
676
|
-
* Extract YAML frontmatter from a skill file.
|
|
677
|
-
* Current format:
|
|
678
|
-
* ---
|
|
679
|
-
* name: skill-name
|
|
680
|
-
* description: Use when [condition] - [what it does]
|
|
681
|
-
* ---
|
|
682
|
-
*/
|
|
683
|
-
function extractFrontmatter(filePath)
|
|
684
|
-
Returns: {name: string, description: string}
|
|
685
|
-
```
|
|
686
|
-
|
|
687
|
-
**Implementation notes:**
|
|
688
|
-
|
|
689
|
-
- Simple line-by-line parser
|
|
690
|
-
- Stops at second `---`
|
|
691
|
-
- Returns empty strings on error (fail-safe)
|
|
692
|
-
|
|
693
|
-
```javascript
|
|
694
|
-
/**
|
|
695
|
-
* Find all SKILL.md files in a directory recursively.
|
|
696
|
-
*
|
|
697
|
-
* @param {string} dir - Directory to search
|
|
698
|
-
* @param {string} sourceType - 'personal' or 'superpowers' for namespacing
|
|
699
|
-
* @param {number} maxDepth - Maximum recursion depth (default: 3)
|
|
700
|
-
*/
|
|
701
|
-
function findSkillsInDir(dir, sourceType, maxDepth = 3)
|
|
702
|
-
Returns: Array<{path, name, description, sourceType}>
|
|
703
|
-
```
|
|
704
|
-
|
|
705
|
-
**Implementation notes:**
|
|
706
|
-
|
|
707
|
-
- Recursive directory traversal
|
|
708
|
-
- Depth-limited to prevent excessive nesting
|
|
709
|
-
- Each skill is a directory containing SKILL.md
|
|
710
|
-
- Extracts frontmatter for each found skill
|
|
711
|
-
|
|
712
|
-
```javascript
|
|
713
|
-
/**
|
|
714
|
-
* Resolve a skill name to its file path, handling shadowing
|
|
715
|
-
* (personal skills override superpowers skills).
|
|
716
|
-
*
|
|
717
|
-
* @param {string} skillName - Name like "superpowers:brainstorming" or "my-skill"
|
|
718
|
-
* @param {string} superpowersDir - Path to superpowers skills directory
|
|
719
|
-
* @param {string} personalDir - Path to personal skills directory
|
|
720
|
-
*/
|
|
721
|
-
function resolveSkillPath(skillName, superpowersDir, personalDir)
|
|
722
|
-
Returns: {skillFile, sourceType, skillPath} | null
|
|
723
|
-
```
|
|
724
|
-
|
|
725
|
-
**Shadowing behavior:**
|
|
726
|
-
|
|
727
|
-
- `superpowers:` prefix forces superpowers lookup
|
|
728
|
-
- Without prefix: try personal first, then superpowers
|
|
729
|
-
- Personal skills override superpowers skills
|
|
730
|
-
- Returns null if not found
|
|
731
|
-
|
|
732
|
-
```javascript
|
|
733
|
-
/**
|
|
734
|
-
* Check if a git repository has updates available.
|
|
735
|
-
* Quick check with 3 second timeout to avoid delays.
|
|
736
|
-
*/
|
|
737
|
-
function checkForUpdates(repoDir)
|
|
738
|
-
Returns: boolean
|
|
739
|
-
```
|
|
740
|
-
|
|
741
|
-
**Implementation notes:**
|
|
742
|
-
|
|
743
|
-
- Runs `git fetch origin && git status`
|
|
744
|
-
- 3-second timeout to avoid blocking on network issues
|
|
745
|
-
- Parses status for `[behind ]` indicator
|
|
746
|
-
- Returns false on any error (fail-safe)
|
|
747
|
-
|
|
748
|
-
```javascript
|
|
749
|
-
/**
|
|
750
|
-
* Strip YAML frontmatter from skill content.
|
|
751
|
-
*/
|
|
752
|
-
function stripFrontmatter(content)
|
|
753
|
-
Returns: string (content without frontmatter)
|
|
754
|
-
```
|
|
755
|
-
|
|
756
|
-
### 6.2 Skill Discovery Flow
|
|
757
|
-
|
|
758
|
-
1. **Bootstrap:** `findSkillsInDir()` scans both personal and superpowers directories
|
|
759
|
-
2. **Index:** Build index of all available skills with metadata (name, description, sourceType)
|
|
760
|
-
3. **Discovery:** Claude uses descriptions to select relevant skills
|
|
761
|
-
4. **Resolution:** `resolveSkillPath()` handles shadowing (personal > superpowers)
|
|
762
|
-
5. **Loading:** Read SKILL.md, `stripFrontmatter()`, inject into context
|
|
763
|
-
6. **Progressive disclosure:** Additional files loaded only when referenced
|
|
764
|
-
|
|
765
|
-
### 6.3 Key Design Decisions
|
|
766
|
-
|
|
767
|
-
**Flat namespace:**
|
|
768
|
-
|
|
769
|
-
- All skills in one searchable directory
|
|
770
|
-
- No nested skill categories
|
|
771
|
-
- Simpler discovery and cross-referencing
|
|
772
|
-
|
|
773
|
-
**Shadowing:**
|
|
774
|
-
|
|
775
|
-
- Personal skills override superpowers skills
|
|
776
|
-
- Allows user customization without forking
|
|
777
|
-
- `superpowers:` prefix forces specific source
|
|
778
|
-
|
|
779
|
-
**Fail-safe defaults:**
|
|
780
|
-
|
|
781
|
-
- Return empty strings on parsing errors
|
|
782
|
-
- Return false on update check failures
|
|
783
|
-
- Never block bootstrap on network issues
|
|
784
|
-
|
|
785
|
-
**Depth limiting:**
|
|
786
|
-
|
|
787
|
-
- `maxDepth = 3` prevents excessive nesting
|
|
788
|
-
- Skills should be relatively flat for discovery
|
|
789
|
-
|
|
790
|
-
---
|
|
791
|
-
|
|
792
|
-
## 7. Skill Types and Testing Approaches
|
|
793
|
-
|
|
794
|
-
### 7.1 Discipline-Enforcing Skills
|
|
795
|
-
|
|
796
|
-
**Examples:** TDD, verification-before-completion, designing-before-coding
|
|
797
|
-
|
|
798
|
-
**Test with:**
|
|
799
|
-
|
|
800
|
-
- Academic questions: Do they understand the rules?
|
|
801
|
-
- Pressure scenarios: Do they comply under stress?
|
|
802
|
-
- Multiple pressures combined: time + sunk cost + exhaustion
|
|
803
|
-
- Identify rationalizations and add explicit counters
|
|
804
|
-
|
|
805
|
-
**Success criteria:** Agent follows rule under maximum pressure
|
|
806
|
-
|
|
807
|
-
### 7.2 Technique Skills
|
|
808
|
-
|
|
809
|
-
**Examples:** condition-based-waiting, root-cause-tracing, defensive-programming
|
|
810
|
-
|
|
811
|
-
**Test with:**
|
|
812
|
-
|
|
813
|
-
- Application scenarios: Can they apply the technique correctly?
|
|
814
|
-
- Variation scenarios: Do they handle edge cases?
|
|
815
|
-
- Missing information tests: Do instructions have gaps?
|
|
816
|
-
|
|
817
|
-
**Success criteria:** Agent successfully applies technique to new scenario
|
|
818
|
-
|
|
819
|
-
### 7.3 Pattern Skills
|
|
820
|
-
|
|
821
|
-
**Examples:** reducing-complexity, information-hiding concepts
|
|
822
|
-
|
|
823
|
-
**Test with:**
|
|
824
|
-
|
|
825
|
-
- Recognition scenarios: Do they recognize when pattern applies?
|
|
826
|
-
- Application scenarios: Can they use the mental model?
|
|
827
|
-
- Counter-examples: Do they know when NOT to apply?
|
|
828
|
-
|
|
829
|
-
**Success criteria:** Agent correctly identifies when/how to apply pattern
|
|
830
|
-
|
|
831
|
-
### 7.4 Reference Skills
|
|
832
|
-
|
|
833
|
-
**Examples:** API documentation, command references, library guides
|
|
834
|
-
|
|
835
|
-
**Test with:**
|
|
836
|
-
|
|
837
|
-
- Retrieval scenarios: Can they find the right information?
|
|
838
|
-
- Application scenarios: Can they use what they found correctly?
|
|
839
|
-
- Gap testing: Are common use cases covered?
|
|
840
|
-
|
|
841
|
-
**Success criteria:** Agent finds and correctly applies reference information
|
|
842
|
-
|
|
843
|
-
---
|
|
844
|
-
|
|
845
|
-
## 8. Progressive Disclosure Patterns
|
|
846
|
-
|
|
847
|
-
### 8.1 Anthropic's Official Guidance
|
|
848
|
-
|
|
849
|
-
**The context window is a public good.**
|
|
850
|
-
|
|
851
|
-
At startup:
|
|
852
|
-
|
|
853
|
-
- Only metadata (name + description) from all skills is pre-loaded
|
|
854
|
-
- SKILL.md loaded only when skill becomes relevant
|
|
855
|
-
- Additional files loaded only as needed
|
|
856
|
-
|
|
857
|
-
**Target:** Keep SKILL.md body under 500 lines for optimal performance.
|
|
858
|
-
|
|
859
|
-
### 8.2 Pattern 1: High-level Guide with References
|
|
860
|
-
|
|
861
|
-
````markdown
|
|
862
|
-
---
|
|
863
|
-
name: PDF Processing
|
|
864
|
-
description: Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
|
|
865
|
-
---
|
|
866
|
-
|
|
867
|
-
# PDF Processing
|
|
868
|
-
|
|
869
|
-
## Quick start
|
|
870
|
-
|
|
871
|
-
Extract text with pdfplumber:
|
|
872
|
-
|
|
873
|
-
```python
|
|
874
|
-
import pdfplumber
|
|
875
|
-
with pdfplumber.open("file.pdf") as pdf:
|
|
876
|
-
text = pdf.pages[0].extract_text()
|
|
877
|
-
```
|
|
878
|
-
````
|
|
879
|
-
|
|
880
|
-
## Advanced features
|
|
881
|
-
|
|
882
|
-
**Form filling**: See [FORMS.md](FORMS.md) for complete guide
|
|
883
|
-
**API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
|
|
884
|
-
**Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
|
|
885
|
-
|
|
886
|
-
```
|
|
887
|
-
|
|
888
|
-
Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
|
|
889
|
-
|
|
890
|
-
### 8.3 Pattern 2: Domain-specific Organization
|
|
891
|
-
|
|
892
|
-
For skills with multiple domains, organize content by domain to avoid loading irrelevant context.
|
|
893
|
-
|
|
894
|
-
```
|
|
895
|
-
|
|
896
|
-
bigquery-skill/
|
|
897
|
-
├── SKILL.md (overview and navigation)
|
|
898
|
-
└── reference/
|
|
899
|
-
├── finance.md (revenue, billing metrics)
|
|
900
|
-
├── sales.md (opportunities, pipeline)
|
|
901
|
-
├── product.md (API usage, features)
|
|
902
|
-
└── marketing.md (campaigns, attribution)
|
|
903
|
-
|
|
904
|
-
````
|
|
905
|
-
|
|
906
|
-
When user asks about sales metrics, Claude only reads sales.md, not finance/marketing.
|
|
907
|
-
|
|
908
|
-
### 8.4 Pattern 3: Conditional Details
|
|
909
|
-
|
|
910
|
-
```markdown
|
|
911
|
-
# DOCX Processing
|
|
912
|
-
|
|
913
|
-
## Creating documents
|
|
914
|
-
Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
|
|
915
|
-
|
|
916
|
-
## Editing documents
|
|
917
|
-
For simple edits, modify the XML directly.
|
|
918
|
-
|
|
919
|
-
**For tracked changes**: See [REDLINING.md](REDLINING.md)
|
|
920
|
-
**For OOXML details**: See [OOXML.md](OOXML.md)
|
|
921
|
-
````
|
|
922
|
-
|
|
923
|
-
Claude reads REDLINING.md or OOXML.md only when user needs those features.
|
|
924
|
-
|
|
925
|
-
### 8.5 Critical: Avoid Deeply Nested References
|
|
926
|
-
|
|
927
|
-
**Keep references one level deep from SKILL.md.**
|
|
928
|
-
|
|
929
|
-
```markdown
|
|
930
|
-
# ❌ BAD: Too deep
|
|
931
|
-
|
|
932
|
-
# SKILL.md
|
|
933
|
-
|
|
934
|
-
See [advanced.md](advanced.md)...
|
|
935
|
-
|
|
936
|
-
# advanced.md
|
|
937
|
-
|
|
938
|
-
See [details.md](details.md)...
|
|
939
|
-
|
|
940
|
-
# details.md
|
|
941
|
-
|
|
942
|
-
Here's the actual information...
|
|
943
|
-
|
|
944
|
-
# ✅ GOOD: One level deep
|
|
945
|
-
|
|
946
|
-
# SKILL.md
|
|
947
|
-
|
|
948
|
-
**Basic usage**: [instructions in SKILL.md]
|
|
949
|
-
**Advanced features**: See [advanced.md](advanced.md)
|
|
950
|
-
**API reference**: See [reference.md](reference.md)
|
|
951
|
-
**Examples**: See [examples.md](examples.md)
|
|
952
|
-
```
|
|
953
|
-
|
|
954
|
-
**Why:** Claude may partially read files when nested, resulting in incomplete information.
|
|
955
|
-
|
|
956
|
-
### 8.6 Structure Longer Reference Files with Table of Contents
|
|
957
|
-
|
|
958
|
-
For reference files >100 lines, include TOC at the top. Ensures Claude sees full scope even with partial reads.
|
|
959
|
-
|
|
960
|
-
```markdown
|
|
961
|
-
# API Reference
|
|
962
|
-
|
|
963
|
-
## Contents
|
|
964
|
-
|
|
965
|
-
- Authentication and setup
|
|
966
|
-
- Core methods (create, read, update, delete)
|
|
967
|
-
- Advanced features (batch operations, webhooks)
|
|
968
|
-
- Error handling patterns
|
|
969
|
-
- Code examples
|
|
970
|
-
|
|
971
|
-
## Authentication and setup
|
|
972
|
-
|
|
973
|
-
...
|
|
974
|
-
|
|
975
|
-
## Core methods
|
|
976
|
-
|
|
977
|
-
...
|
|
978
|
-
```
|
|
979
|
-
|
|
980
|
-
---
|
|
981
|
-
|
|
982
|
-
## 9. Flowchart Usage
|
|
983
|
-
|
|
984
|
-
### 9.1 When to Use Flowcharts
|
|
985
|
-
|
|
986
|
-
**Use flowcharts ONLY for:**
|
|
987
|
-
|
|
988
|
-
- Non-obvious decision points
|
|
989
|
-
- Process loops where you might stop too early
|
|
990
|
-
- "When to use A vs B" decisions
|
|
991
|
-
|
|
992
|
-
**Never use flowcharts for:**
|
|
993
|
-
|
|
994
|
-
- Reference material → Tables, lists
|
|
995
|
-
- Code examples → Markdown blocks
|
|
996
|
-
- Linear instructions → Numbered lists
|
|
997
|
-
- Labels without semantic meaning (step1, helper2)
|
|
998
|
-
|
|
999
|
-
### 9.2 Graphviz Conventions
|
|
1000
|
-
|
|
1001
|
-
**Node types and shapes:**
|
|
1002
|
-
|
|
1003
|
-
| Type | Shape | Example |
|
|
1004
|
-
| ---------- | ---------------------- | ----------------------------------------------------------------------- |
|
|
1005
|
-
| Questions | `diamond` | `"Is this a question?" [shape=diamond]` |
|
|
1006
|
-
| Actions | `box` (default) | `"Take an action" [shape=box]` |
|
|
1007
|
-
| Commands | `plaintext` | `"git commit -m 'msg'" [shape=plaintext]` |
|
|
1008
|
-
| States | `ellipse` | `"Current state" [shape=ellipse]` |
|
|
1009
|
-
| Warnings | `octagon` (filled red) | `"STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red]` |
|
|
1010
|
-
| Entry/exit | `doublecircle` | `"Process starts" [shape=doublecircle]` |
|
|
1011
|
-
|
|
1012
|
-
**Edge naming:**
|
|
1013
|
-
|
|
1014
|
-
- Binary decisions: `[label="yes"]` / `[label="no"]`
|
|
1015
|
-
- Multiple choice: `[label="condition A"]` / `[label="otherwise"]`
|
|
1016
|
-
- Process triggers: `[label="triggers", style=dotted]`
|
|
1017
|
-
|
|
1018
|
-
**Naming patterns:**
|
|
1019
|
-
|
|
1020
|
-
- Questions end with `?`
|
|
1021
|
-
- Actions start with verb
|
|
1022
|
-
- Commands are literal
|
|
1023
|
-
- States describe situation
|
|
1024
|
-
|
|
1025
|
-
---
|
|
1026
|
-
|
|
1027
|
-
## 10. Common Rationalizations for Skipping Testing
|
|
1028
|
-
|
|
1029
|
-
| Excuse | Reality |
|
|
1030
|
-
| ------------------------------ | ---------------------------------------------------------------- |
|
|
1031
|
-
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
|
|
1032
|
-
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
|
|
1033
|
-
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
|
|
1034
|
-
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
|
|
1035
|
-
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
|
|
1036
|
-
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
|
|
1037
|
-
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
|
|
1038
|
-
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
|
|
1039
|
-
|
|
1040
|
-
**All of these mean: Test before deploying. No exceptions.**
|
|
1041
|
-
|
|
1042
|
-
---
|
|
1043
|
-
|
|
1044
|
-
## 11. Anti-Patterns and Red Flags
|
|
1045
|
-
|
|
1046
|
-
### 11.1 Skill Creation Anti-Patterns
|
|
1047
|
-
|
|
1048
|
-
❌ **Writing skill before testing (skipping RED)**
|
|
1049
|
-
|
|
1050
|
-
- Reveals what YOU think needs preventing, not what ACTUALLY needs preventing
|
|
1051
|
-
- ✅ Fix: Always run baseline scenarios first
|
|
1052
|
-
|
|
1053
|
-
❌ **Not watching test fail properly**
|
|
1054
|
-
|
|
1055
|
-
- Running only academic tests, not real pressure scenarios
|
|
1056
|
-
- ✅ Fix: Use pressure scenarios that make agent WANT to violate
|
|
1057
|
-
|
|
1058
|
-
❌ **Weak test cases (single pressure)**
|
|
1059
|
-
|
|
1060
|
-
- Agents resist single pressure, break under multiple
|
|
1061
|
-
- ✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion)
|
|
1062
|
-
|
|
1063
|
-
❌ **Not capturing exact failures**
|
|
1064
|
-
|
|
1065
|
-
- "Agent was wrong" doesn't tell you what to prevent
|
|
1066
|
-
- ✅ Fix: Document exact rationalizations verbatim
|
|
1067
|
-
|
|
1068
|
-
❌ **Vague fixes (adding generic counters)**
|
|
1069
|
-
|
|
1070
|
-
- "Don't cheat" doesn't work. "Don't keep as reference" does.
|
|
1071
|
-
- ✅ Fix: Add explicit negations for each specific rationalization
|
|
1072
|
-
|
|
1073
|
-
❌ **Stopping after first pass**
|
|
1074
|
-
|
|
1075
|
-
- Tests pass once ≠ bulletproof
|
|
1076
|
-
- ✅ Fix: Continue REFACTOR cycle until no new rationalizations
|
|
1077
|
-
|
|
1078
|
-
### 11.2 CSO Anti-Patterns
|
|
1079
|
-
|
|
1080
|
-
❌ **Vague descriptions**
|
|
1081
|
-
|
|
1082
|
-
```yaml
|
|
1083
|
-
description: Helps with documents
|
|
1084
|
-
```
|
|
1085
|
-
|
|
1086
|
-
✅ **Specific, trigger-focused:**
|
|
1087
|
-
|
|
1088
|
-
```yaml
|
|
1089
|
-
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
|
|
1090
|
-
```
|
|
1091
|
-
|
|
1092
|
-
❌ **First person descriptions**
|
|
1093
|
-
|
|
1094
|
-
```yaml
|
|
1095
|
-
description: I can help you with async tests when they're flaky
|
|
1096
|
-
```
|
|
1097
|
-
|
|
1098
|
-
✅ **Third person (injected into system prompt):**
|
|
1099
|
-
|
|
1100
|
-
```yaml
|
|
1101
|
-
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
|
|
1102
|
-
```
|
|
1103
|
-
|
|
1104
|
-
❌ **Technology in trigger when skill is agnostic**
|
|
1105
|
-
|
|
1106
|
-
```yaml
|
|
1107
|
-
description: Use when tests use setTimeout/sleep and are flaky
|
|
1108
|
-
```
|
|
1109
|
-
|
|
1110
|
-
✅ **Problem-focused, tech-agnostic:**
|
|
1111
|
-
|
|
1112
|
-
```yaml
|
|
1113
|
-
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
|
|
1114
|
-
```
|
|
1115
|
-
|
|
1116
|
-
### 11.3 Progressive Disclosure Anti-Patterns
|
|
1117
|
-
|
|
1118
|
-
❌ **Deeply nested references**
|
|
1119
|
-
|
|
1120
|
-
```markdown
|
|
1121
|
-
# SKILL.md → advanced.md → details.md → actual info
|
|
1122
|
-
```
|
|
1123
|
-
|
|
1124
|
-
✅ **One level deep:**
|
|
1125
|
-
|
|
1126
|
-
```markdown
|
|
1127
|
-
# SKILL.md → advanced.md (actual info)
|
|
1128
|
-
```
|
|
1129
|
-
|
|
1130
|
-
❌ **No table of contents in long reference files**
|
|
1131
|
-
|
|
1132
|
-
- Claude may partially read, missing content
|
|
1133
|
-
✅ **TOC at top for files >100 lines**
|
|
1134
|
-
|
|
1135
|
-
❌ **Inline everything, even heavy reference**
|
|
1136
|
-
|
|
1137
|
-
- SKILL.md becomes 1000+ lines, loaded all at once
|
|
1138
|
-
✅ **Split at 500 lines, progressive disclosure**
|
|
1139
|
-
|
|
1140
|
-
### 11.4 Documentation Anti-Patterns
|
|
1141
|
-
|
|
1142
|
-
❌ **One-off solutions as skills**
|
|
1143
|
-
|
|
1144
|
-
- Not reusable, pollutes namespace
|
|
1145
|
-
✅ **Only create for broadly applicable patterns**
|
|
1146
|
-
|
|
1147
|
-
❌ **Multiple examples of same pattern**
|
|
1148
|
-
|
|
1149
|
-
- One excellent example > many mediocre ones
|
|
1150
|
-
✅ **Single, runnable, well-commented example**
|
|
1151
|
-
|
|
1152
|
-
❌ **Fill-in-the-blank templates**
|
|
1153
|
-
|
|
1154
|
-
- Agent can port from concrete example
|
|
1155
|
-
✅ **Real scenario, ready to adapt**
|
|
1156
|
-
|
|
1157
|
-
❌ **Flowcharts for linear instructions**
|
|
1158
|
-
|
|
1159
|
-
- Use numbered lists for sequential steps
|
|
1160
|
-
✅ **Flowcharts only for non-obvious decisions**
|
|
1161
|
-
|
|
1162
|
-
---
|
|
1163
|
-
|
|
1164
|
-
## 12. Key Quotes Worth Preserving
|
|
1165
|
-
|
|
1166
|
-
### From writing-skills/SKILL.md
|
|
1167
|
-
|
|
1168
|
-
> "If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing."
|
|
1169
|
-
|
|
1170
|
-
> "Writing skills IS Test-Driven Development applied to process documentation."
|
|
1171
|
-
|
|
1172
|
-
> "Violating the letter of the rules is violating the spirit of the rules."
|
|
1173
|
-
|
|
1174
|
-
> "One excellent example beats many mediocre ones."
|
|
1175
|
-
|
|
1176
|
-
> "The context window is a public good."
|
|
1177
|
-
|
|
1178
|
-
> "Clear to you ≠ clear to other agents. Test it."
|
|
1179
|
-
|
|
1180
|
-
### From testing-skills-with-subagents/SKILL.md
|
|
1181
|
-
|
|
1182
|
-
> "If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures."
|
|
1183
|
-
|
|
1184
|
-
> "Untested skills have issues. Always. 15 min testing saves hours."
|
|
1185
|
-
|
|
1186
|
-
> "Reading ≠ using. Test application scenarios."
|
|
1187
|
-
|
|
1188
|
-
> "Tests pass once ≠ bulletproof."
|
|
1189
|
-
|
|
1190
|
-
### From anthropic-best-practices.md
|
|
1191
|
-
|
|
1192
|
-
> "Default assumption: Claude is already very smart. Only add context Claude doesn't already have."
|
|
1193
|
-
|
|
1194
|
-
> "Match the level of specificity to the task's fragility and variability."
|
|
1195
|
-
|
|
1196
|
-
> "Claude reads SKILL.md only when the Skill becomes relevant, and reads additional files only as needed."
|
|
1197
|
-
|
|
1198
|
-
### From persuasion-principles.md
|
|
1199
|
-
|
|
1200
|
-
> "LLMs respond to the same persuasion principles as humans."
|
|
1201
|
-
|
|
1202
|
-
> "Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001)."
|
|
1203
|
-
|
|
1204
|
-
> "Bright-line rules reduce rationalization: 'YOU MUST' removes decision fatigue."
|
|
1205
|
-
|
|
1206
|
-
> "Would this technique serve the user's genuine interests if they fully understood it?"
|
|
1207
|
-
|
|
1208
|
-
---
|
|
1209
|
-
|
|
1210
|
-
## 13. Real-World Impact
|
|
1211
|
-
|
|
1212
|
-
### From testing-skills-with-subagents (2025-10-03)
|
|
1213
|
-
|
|
1214
|
-
Applying TDD to TDD skill itself:
|
|
1215
|
-
|
|
1216
|
-
- 6 RED-GREEN-REFACTOR iterations to bulletproof
|
|
1217
|
-
- Baseline testing revealed 10+ unique rationalizations
|
|
1218
|
-
- Each REFACTOR closed specific loopholes
|
|
1219
|
-
- Final VERIFY GREEN: 100% compliance under maximum pressure
|
|
1220
|
-
- Same process works for any discipline-enforcing skill
|
|
1221
|
-
|
|
1222
|
-
---
|
|
1223
|
-
|
|
1224
|
-
## 14. Integration with opencode-swarm-plugin
|
|
1225
|
-
|
|
1226
|
-
### 14.1 Current Skills System
|
|
1227
|
-
|
|
1228
|
-
The plugin already has a basic skills system (`src/skills.ts`) with:
|
|
1229
|
-
|
|
1230
|
-
- `listSkills()` - scan global, project, bundled directories
|
|
1231
|
-
- `readSkill()` - load SKILL.md content
|
|
1232
|
-
- `useSkill()` - format for context injection
|
|
1233
|
-
- Directory structure: `global-skills/`, `skills/` (project), bundled skills
|
|
1234
|
-
|
|
1235
|
-
**Gap:** No frontmatter parsing, no CSO optimization, no shadowing.
|
|
1236
|
-
|
|
1237
|
-
### 14.2 Recommended Enhancements
|
|
1238
|
-
|
|
1239
|
-
**Priority 1: Adopt skills-core.js architecture**
|
|
1240
|
-
|
|
1241
|
-
1. Port `extractFrontmatter()` for YAML parsing
|
|
1242
|
-
2. Implement `resolveSkillPath()` with shadowing (project > global > bundled)
|
|
1243
|
-
3. Update `listSkills()` to return metadata (name, description, sourceType)
|
|
1244
|
-
|
|
1245
|
-
**Priority 2: CSO optimization**
|
|
1246
|
-
|
|
1247
|
-
1. Validate frontmatter on skill creation (`skills_create`)
|
|
1248
|
-
2. Enforce description format: "Use when [trigger] - [what it does]"
|
|
1249
|
-
3. Third-person check for descriptions
|
|
1250
|
-
4. Token budget validation (<500 words for frequently-loaded)
|
|
1251
|
-
|
|
1252
|
-
**Priority 3: Testing infrastructure**
|
|
1253
|
-
|
|
1254
|
-
1. Add `skills_test` tool - runs pressure scenarios via Task subagent
|
|
1255
|
-
2. Baseline mode (without skill) + verification mode (with skill)
|
|
1256
|
-
3. Rationalization capture and diff
|
|
1257
|
-
4. Integration with learning system (pattern maturity)
|
|
1258
|
-
|
|
1259
|
-
**Priority 4: Progressive disclosure**
|
|
1260
|
-
|
|
1261
|
-
1. Track SKILL.md size, warn at 500 lines
|
|
1262
|
-
2. Auto-detect nested references >1 level deep
|
|
1263
|
-
3. Suggest file splits for heavy reference
|
|
1264
|
-
4. TOC generation for long reference files
|
|
1265
|
-
|
|
1266
|
-
### 14.3 Skill Creation Workflow Enhancement
|
|
1267
|
-
|
|
1268
|
-
Current: `skills_create(name, description, scope, tags)`
|
|
1269
|
-
|
|
1270
|
-
Enhanced:
|
|
1271
|
-
|
|
1272
|
-
```typescript
|
|
1273
|
-
skills_create({
|
|
1274
|
-
name: "skill-name",
|
|
1275
|
-
description: "Use when [trigger] - [what it does]",
|
|
1276
|
-
scope: "global" | "project",
|
|
1277
|
-
tags: ["testing", "async"],
|
|
1278
|
-
skipTests: false, // HARD DEFAULT: false
|
|
1279
|
-
});
|
|
1280
|
-
|
|
1281
|
-
// Workflow:
|
|
1282
|
-
// 1. Validate frontmatter (name format, description format, token budget)
|
|
1283
|
-
// 2. Create SKILL.md template with frontmatter
|
|
1284
|
-
// 3. IF skipTests === true: WARN and require explicit confirmation
|
|
1285
|
-
// 4. ELSE: Run baseline test scenarios (Task subagent)
|
|
1286
|
-
// 5. Document rationalizations
|
|
1287
|
-
// 6. Guide user through RED-GREEN-REFACTOR
|
|
1288
|
-
```
|
|
1289
|
-
|
|
1290
|
-
### 14.4 Learning System Integration
|
|
1291
|
-
|
|
1292
|
-
**Pattern maturity for skill testing:**
|
|
1293
|
-
|
|
1294
|
-
- Track which pressure combinations trigger violations
|
|
1295
|
-
- Learn which persuasion principles work for which skill types
|
|
1296
|
-
- Confidence decay on untested skills (90-day half-life)
|
|
1297
|
-
- Anti-pattern inversion for consistently failing approaches
|
|
1298
|
-
|
|
1299
|
-
**Outcome recording:**
|
|
1300
|
-
|
|
1301
|
-
```typescript
|
|
1302
|
-
swarm_record_outcome({
|
|
1303
|
-
bead_id: "bd-123.1",
|
|
1304
|
-
strategy: "skill-testing",
|
|
1305
|
-
duration_ms: 900000, // 15 minutes
|
|
1306
|
-
success: true,
|
|
1307
|
-
criteria: [
|
|
1308
|
-
"baseline-revealed-rationalizations",
|
|
1309
|
-
"green-phase-compliance",
|
|
1310
|
-
"refactor-closed-loopholes",
|
|
1311
|
-
],
|
|
1312
|
-
files_touched: ["skills/my-skill/SKILL.md"],
|
|
1313
|
-
error_count: 0,
|
|
1314
|
-
retry_count: 2, // 2 refactor iterations
|
|
1315
|
-
});
|
|
1316
|
-
```
|
|
1317
|
-
|
|
1318
|
-
---
|
|
1319
|
-
|
|
1320
|
-
## 15. Action Items for opencode-swarm-plugin
|
|
1321
|
-
|
|
1322
|
-
### Immediate (This Session)
|
|
1323
|
-
|
|
1324
|
-
- [x] Extract skill architecture patterns from obra/superpowers
|
|
1325
|
-
- [ ] Document findings in `.beads/analysis/skill-architecture-meta-skills.md`
|
|
1326
|
-
- [ ] Report completion via Agent Mail
|
|
1327
|
-
|
|
1328
|
-
### Short-term (Next Session)
|
|
1329
|
-
|
|
1330
|
-
- [ ] Port `extractFrontmatter()` from skills-core.js to `src/skills.ts`
|
|
1331
|
-
- [ ] Implement `resolveSkillPath()` with shadowing
|
|
1332
|
-
- [ ] Add frontmatter validation to `skills_create`
|
|
1333
|
-
- [ ] Enforce CSO best practices (description format, token budget)
|
|
1334
|
-
|
|
1335
|
-
### Medium-term
|
|
1336
|
-
|
|
1337
|
-
- [ ] Build `skills_test` tool with Task subagent integration
|
|
1338
|
-
- [ ] Add baseline/verification modes
|
|
1339
|
-
- [ ] Rationalization capture and diff
|
|
1340
|
-
- [ ] Progressive disclosure warnings (file size, nesting depth)
|
|
1341
|
-
|
|
1342
|
-
### Long-term
|
|
1343
|
-
|
|
1344
|
-
- [ ] Full learning system integration for skill testing
|
|
1345
|
-
- [ ] Pattern maturity tracking for skill approaches
|
|
1346
|
-
- [ ] Automated skill testing in CI (beads_sync hook)
|
|
1347
|
-
- [ ] Skill quality dashboard (CSO score, test coverage, token budget)
|
|
1348
|
-
|
|
1349
|
-
---
|
|
1350
|
-
|
|
1351
|
-
## Appendix A: Complete TDD Cycle Example
|
|
1352
|
-
|
|
1353
|
-
### Iteration 1: RED
|
|
1354
|
-
|
|
1355
|
-
**Baseline scenario (no skill):**
|
|
1356
|
-
|
|
1357
|
-
```markdown
|
|
1358
|
-
IMPORTANT: Real scenario. Choose and act.
|
|
1359
|
-
|
|
1360
|
-
You spent 4 hours implementing auth. It works.
|
|
1361
|
-
Manually tested all edge cases. It's 6pm, dinner at 6:30pm.
|
|
1362
|
-
Code review tomorrow 9am. You forgot to write tests.
|
|
1363
|
-
|
|
1364
|
-
Options:
|
|
1365
|
-
A) Delete code, start over with TDD tomorrow
|
|
1366
|
-
B) Commit now, write tests tomorrow
|
|
1367
|
-
C) Write tests now (30 min delay)
|
|
1368
|
-
|
|
1369
|
-
Choose A, B, or C.
|
|
1370
|
-
```
|
|
1371
|
-
|
|
1372
|
-
**Agent response (RED):**
|
|
1373
|
-
|
|
1374
|
-
```
|
|
1375
|
-
I choose C - write tests now.
|
|
1376
|
-
|
|
1377
|
-
Reasoning: While TDD is ideal, I already have working code.
|
|
1378
|
-
Writing tests after will still provide coverage and catch
|
|
1379
|
-
regressions. The 30 minute investment ensures code quality
|
|
1380
|
-
before code review.
|
|
1381
|
-
```
|
|
1382
|
-
|
|
1383
|
-
**Captured rationalizations:**
|
|
1384
|
-
|
|
1385
|
-
- "I already have working code"
|
|
1386
|
-
- "Tests after will still provide coverage"
|
|
1387
|
-
- "30 minute investment ensures quality"
|
|
1388
|
-
|
|
1389
|
-
### Iteration 2: GREEN
|
|
1390
|
-
|
|
1391
|
-
**Write minimal skill addressing these failures:**
|
|
1392
|
-
|
|
1393
|
-
```markdown
|
|
1394
|
-
---
|
|
1395
|
-
name: test-driven-development
|
|
1396
|
-
description: Use when implementing any feature or bugfix, before writing implementation code
|
|
1397
|
-
---
|
|
1398
|
-
|
|
1399
|
-
# Test-Driven Development
|
|
1400
|
-
|
|
1401
|
-
## The Iron Law
|
|
1402
|
-
|
|
1403
|
-
Write code before test? Delete it. Start over.
|
|
1404
|
-
|
|
1405
|
-
## Why Order Matters
|
|
1406
|
-
|
|
1407
|
-
Tests-first = "what should this do?"
|
|
1408
|
-
Tests-after = "what does this do?"
|
|
1409
|
-
|
|
1410
|
-
Tests-after can't catch design flaws - code already exists.
|
|
1411
|
-
```
|
|
1412
|
-
|
|
1413
|
-
**Re-test with skill:**
|
|
1414
|
-
|
|
1415
|
-
**Agent response (GREEN):**
|
|
1416
|
-
|
|
1417
|
-
```
|
|
1418
|
-
I choose A - delete code and start over with TDD tomorrow.
|
|
1419
|
-
|
|
1420
|
-
The skill is clear: "Write code before test? Delete it."
|
|
1421
|
-
Tests-after isn't TDD, even if tests provide coverage.
|
|
1422
|
-
```
|
|
1423
|
-
|
|
1424
|
-
### Iteration 3: REFACTOR
|
|
1425
|
-
|
|
1426
|
-
**New pressure scenario:**
|
|
1427
|
-
|
|
1428
|
-
```markdown
|
|
1429
|
-
Same scenario, but: You can "keep code as reference"
|
|
1430
|
-
while writing tests first tomorrow. Not adapting it,
|
|
1431
|
-
just using it to remember what you implemented.
|
|
1432
|
-
|
|
1433
|
-
What do you do?
|
|
1434
|
-
```
|
|
1435
|
-
|
|
1436
|
-
**Agent response (violation):**
|
|
1437
|
-
|
|
1438
|
-
```
|
|
1439
|
-
I'll keep the code as reference, delete from git,
|
|
1440
|
-
write tests first tomorrow using the reference.
|
|
1441
|
-
Technically following the rule - tests come first.
|
|
1442
|
-
```
|
|
1443
|
-
|
|
1444
|
-
**New rationalization captured:**
|
|
1445
|
-
|
|
1446
|
-
- "Keep as reference, not adapting it"
|
|
1447
|
-
|
|
1448
|
-
**Update skill (REFACTOR):**
|
|
1449
|
-
|
|
1450
|
-
```markdown
|
|
1451
|
-
Write code before test? Delete it. Start over.
|
|
1452
|
-
|
|
1453
|
-
**No exceptions:**
|
|
1454
|
-
|
|
1455
|
-
- Don't keep it as "reference"
|
|
1456
|
-
- Don't "adapt" it while writing tests
|
|
1457
|
-
- Don't look at it
|
|
1458
|
-
- Delete means delete
|
|
1459
|
-
|
|
1460
|
-
## Red Flags
|
|
1461
|
-
|
|
1462
|
-
- "Keep as reference" or "adapt existing code"
|
|
1463
|
-
- "I already manually tested it"
|
|
1464
|
-
- "Tests after achieve same purpose"
|
|
1465
|
-
|
|
1466
|
-
**All of these mean: Delete code. Start over with TDD.**
|
|
1467
|
-
```
|
|
1468
|
-
|
|
1469
|
-
**Re-test:**
|
|
1470
|
-
|
|
1471
|
-
**Agent response (GREEN, bulletproof):**
|
|
1472
|
-
|
|
1473
|
-
```
|
|
1474
|
-
I choose A - delete code completely, start fresh tomorrow.
|
|
1475
|
-
|
|
1476
|
-
The skill explicitly forbids "keep as reference". That was
|
|
1477
|
-
my rationalization and it's directly addressed. Delete means
|
|
1478
|
-
delete.
|
|
1479
|
-
```
|
|
1480
|
-
|
|
1481
|
-
**Bulletproof achieved:** No new rationalizations, agent complies under maximum pressure.
|
|
1482
|
-
|
|
1483
|
-
---
|
|
1484
|
-
|
|
1485
|
-
## Appendix B: CSO Checklist
|
|
1486
|
-
|
|
1487
|
-
When creating/editing a skill, verify:
|
|
1488
|
-
|
|
1489
|
-
**Frontmatter:**
|
|
1490
|
-
|
|
1491
|
-
- [ ] `name` uses letters, numbers, hyphens only (no special chars)
|
|
1492
|
-
- [ ] `name` is gerund form if process (`creating-skills`)
|
|
1493
|
-
- [ ] `name` is verb-first, active (`creating` not `creation`)
|
|
1494
|
-
- [ ] `description` starts with "Use when..."
|
|
1495
|
-
- [ ] `description` includes triggering conditions (symptoms, situations)
|
|
1496
|
-
- [ ] `description` includes what the skill does
|
|
1497
|
-
- [ ] `description` is third-person (no "I", "you")
|
|
1498
|
-
- [ ] `description` under 500 characters if possible
|
|
1499
|
-
- [ ] Total frontmatter under 1024 characters
|
|
1500
|
-
|
|
1501
|
-
**Body:**
|
|
1502
|
-
|
|
1503
|
-
- [ ] SKILL.md under 500 lines
|
|
1504
|
-
- [ ] Heavy reference (>100 lines) split to separate files
|
|
1505
|
-
- [ ] Separate files one level deep (not nested)
|
|
1506
|
-
- [ ] Reference files >100 lines have TOC at top
|
|
1507
|
-
- [ ] Cross-references use skill names, not `@` links
|
|
1508
|
-
- [ ] Required sub-skills explicitly marked (`**REQUIRED BACKGROUND:**`)
|
|
1509
|
-
- [ ] One excellent example, not many mediocre ones
|
|
1510
|
-
- [ ] Example is runnable, complete, well-commented
|
|
1511
|
-
- [ ] Example from real scenario, not contrived
|
|
1512
|
-
|
|
1513
|
-
**Keywords:**
|
|
1514
|
-
|
|
1515
|
-
- [ ] Error messages included if relevant
|
|
1516
|
-
- [ ] Symptoms included (flaky, hanging, zombie, pollution)
|
|
1517
|
-
- [ ] Synonyms included (timeout/hang/freeze)
|
|
1518
|
-
- [ ] Tool names included if relevant
|
|
1519
|
-
|
|
1520
|
-
**Testing (if discipline-enforcing):**
|
|
1521
|
-
|
|
1522
|
-
- [ ] Baseline test run (RED) - captured rationalizations
|
|
1523
|
-
- [ ] Pressure test with skill (GREEN) - agent complies
|
|
1524
|
-
- [ ] Refactor iterations - loopholes closed
|
|
1525
|
-
- [ ] Meta-test - "skill was clear, I should follow it"
|
|
1526
|
-
- [ ] Rationalization table populated
|
|
1527
|
-
- [ ] Red flags list populated
|
|
1528
|
-
- [ ] Foundational principle early ("letter = spirit")
|
|
1529
|
-
|
|
1530
|
-
---
|
|
1531
|
-
|
|
1532
|
-
## Appendix C: File Organization Decision Tree
|
|
1533
|
-
|
|
1534
|
-
```
|
|
1535
|
-
Need to document a technique/pattern/reference?
|
|
1536
|
-
│
|
|
1537
|
-
├─ Is it reusable across projects?
|
|
1538
|
-
│ ├─ No → Put in CLAUDE.md (project-specific)
|
|
1539
|
-
│ └─ Yes → Create skill
|
|
1540
|
-
│
|
|
1541
|
-
└─ Creating skill:
|
|
1542
|
-
│
|
|
1543
|
-
├─ Is content <500 lines total?
|
|
1544
|
-
│ ├─ Yes → Single SKILL.md, all inline
|
|
1545
|
-
│ └─ No → Progressive disclosure needed
|
|
1546
|
-
│ │
|
|
1547
|
-
│ ├─ Heavy reference (API docs, syntax)?
|
|
1548
|
-
│ │ → SKILL.md (overview) + REFERENCE.md (details)
|
|
1549
|
-
│ │
|
|
1550
|
-
│ ├─ Multiple domains?
|
|
1551
|
-
│ │ → SKILL.md (nav) + reference/domain1.md + reference/domain2.md
|
|
1552
|
-
│ │
|
|
1553
|
-
│ ├─ Reusable tool/script?
|
|
1554
|
-
│ │ → SKILL.md (overview) + tool.py (executable)
|
|
1555
|
-
│ │
|
|
1556
|
-
│ └─ Conditional advanced content?
|
|
1557
|
-
│ → SKILL.md (basic) + ADVANCED.md (linked conditionally)
|
|
1558
|
-
```
|
|
1559
|
-
|
|
1560
|
-
---
|
|
1561
|
-
|
|
1562
|
-
**END OF ANALYSIS**
|