@codenhub/skills 0.0.2

Files changed (43)
  1. package/LICENSE +201 -0
  2. package/README.md +53 -0
  3. package/dist/cli.js +213 -0
  4. package/package.json +36 -0
  5. package/src/agents-md-improver/SKILL.md +216 -0
  6. package/src/agents-md-improver/agents/openai.yaml +4 -0
  7. package/src/agents-md-improver/references/quality-criteria.md +116 -0
  8. package/src/agents-md-improver/references/templates.md +255 -0
  9. package/src/agents-md-improver/references/update-guidelines.md +155 -0
  10. package/src/brainstorming/SKILL.md +118 -0
  11. package/src/brainstorming/agents/openai.yaml +4 -0
  12. package/src/caveman/SKILL.md +59 -0
  13. package/src/caveman/agents/openai.yaml +4 -0
  14. package/src/caveman-commit/SKILL.md +68 -0
  15. package/src/caveman-commit/agents/openai.yaml +4 -0
  16. package/src/caveman-review/SKILL.md +54 -0
  17. package/src/caveman-review/agents/openai.yaml +4 -0
  18. package/src/cli.test.ts +102 -0
  19. package/src/cli.ts +311 -0
  20. package/src/executing-plans/SKILL.md +92 -0
  21. package/src/executing-plans/agents/openai.yaml +4 -0
  22. package/src/frontend-design/SKILL.md +60 -0
  23. package/src/frontend-design/agents/openai.yaml +4 -0
  24. package/src/subagent-specialist/SKILL.md +226 -0
  25. package/src/subagent-specialist/agents/openai.yaml +4 -0
  26. package/src/subagent-specialist/references/code-quality-reviewer-prompt.md +48 -0
  27. package/src/subagent-specialist/references/implementer-prompt.md +84 -0
  28. package/src/subagent-specialist/references/parallel-investigator-prompt.md +49 -0
  29. package/src/subagent-specialist/references/spec-reviewer-prompt.md +52 -0
  30. package/src/test-driven-development/SKILL.md +239 -0
  31. package/src/test-driven-development/agents/openai.yaml +11 -0
  32. package/src/test-driven-development/testing-anti-patterns.md +162 -0
  33. package/src/test-driven-development/verification-baselines.md +42 -0
  34. package/src/writing-plans/SKILL.md +169 -0
  35. package/src/writing-plans/agents/openai.yaml +4 -0
  36. package/src/writing-skills/SKILL.md +222 -0
  37. package/src/writing-skills/agents/openai.yaml +4 -0
  38. package/src/writing-skills/best-practices.md +321 -0
  39. package/src/writing-skills/examples/SKILL_AUTHORING_GUIDE_TESTING.md +156 -0
  40. package/src/writing-skills/persuasion-principles.md +172 -0
  41. package/src/writing-skills/testing-skills-with-subagents.md +310 -0
  42. package/src/writing-specs/SKILL.md +72 -0
  43. package/src/writing-specs/agents/openai.yaml +4 -0
@@ -0,0 +1,222 @@
1
+ ---
2
+ name: writing-skills
3
+ description: Use when creating, reviewing, testing, revising, or validating skill bundles, including SKILL.md and supporting files.
4
+ metadata:
5
+   short-description: Create and validate reusable skills
6
+ ---
7
+
8
+ # Writing Skills
9
+
10
+ ## Overview
11
+
12
+ Writing skills is test-driven development applied to process documentation.
13
+
14
+ A skill should teach reusable guidance that future agents can discover and apply. Use this skill only for explicit skill work. Do not create or edit skills in the middle of unrelated implementation.
15
+
16
+ **Core principle:** If you did not watch an agent fail without the skill, you do not know whether the skill teaches the right thing.
17
+
18
+ Load supporting references only when needed:
19
+
20
+ - `best-practices.md`: structure, discovery, examples, file layout, and bundled assets
21
+ - `testing-skills-with-subagents.md`: testing workflow, pressure scenarios, and meta-testing
22
+ - `persuasion-principles.md`: discipline-enforcing skills that must resist rationalization
23
+
24
+ ## Tool Compatibility
25
+
26
+ - Keep instructions tool-agnostic and avoid provider-specific wording.
27
+ - When behavior differs across tools, resolve conflicts in this order: OpenCode > Claude Code > Codex CLI > Gemini CLI.
28
+
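The precedence rule above can be read as a simple ordered lookup. The following is a minimal sketch, not part of the package: `resolve_conflict` and the `behaviors` mapping are hypothetical names introduced only for illustration, and the list order is copied from the rule itself.

```python
# Precedence order copied from the rule above; earlier entries win.
TOOL_PRECEDENCE = ["OpenCode", "Claude Code", "Codex CLI", "Gemini CLI"]


def resolve_conflict(behaviors: dict[str, str]) -> str:
    """Return the behavior of the highest-precedence tool present.

    `behaviors` is a hypothetical mapping of tool name -> documented behavior.
    """
    for tool in TOOL_PRECEDENCE:
        if tool in behaviors:
            return behaviors[tool]
    raise ValueError("no known tool in behaviors")
```

Given conflicting entries for `Gemini CLI` and `Claude Code`, this sketch would return the `Claude Code` behavior, since it appears earlier in the list.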
29
+ ## What a Good Skill Is
30
+
31
+ - A reference guide for reusable techniques, patterns, tools, or reference material
32
+ - Discoverable from its name and description
33
+ - Focused on future execution, not a story about one past task
34
+ - Concise enough to load cheaply
35
+ - Backed by observed failures and re-testing
36
+
37
+ Common skill types:
38
+
39
+ - **Technique:** concrete method with steps to follow
40
+ - **Pattern:** way of thinking about a class of problems
41
+ - **Reference:** information to retrieve and apply correctly
42
+
43
+ ## Minimal Shape
44
+
45
+ Keep the skill easy to scan and easy to discover.
46
+
47
+ - Minimal bundle shape: `skill-name/SKILL.md`
48
+ - Supporting files: only for heavy reference material, reusable assets, or substantial worked examples
49
+ - Paths in file references: use forward slashes
50
+
51
+ Frontmatter rules:
52
+
53
+ - `name` and `description` are required
54
+ - `name` uses letters, numbers, and hyphens only
55
+ - `description` starts with `Use when...`
56
+ - `description` is written in third person
57
+ - `description` focuses on triggering conditions and searchable keywords instead of summarizing the full workflow
58
+
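Some of the frontmatter rules above are mechanical enough to check automatically. The sketch below is an assumption-laden illustration, not the package's own validator: `read_frontmatter` and `check_frontmatter` are hypothetical helpers, and the parser only understands flat `key: value` lines rather than full YAML. The third-person and keyword-focus rules still need human or agent review.

```python
import re

# Letters, numbers, and hyphens only, per the `name` rule above.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def read_frontmatter(text: str) -> dict[str, str]:
    """Collect top-level `key: value` pairs between the first two `---` lines.

    Deliberately simplified: nested keys and multi-line values are skipped,
    so this is not a substitute for a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line[:1].isspace() or ":" not in line:
            continue  # nested key or unsupported syntax
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def check_frontmatter(text: str) -> list[str]:
    """Return a list of rule violations (empty when the checks pass)."""
    fields = read_frontmatter(text)
    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not name:
        problems.append("name is required")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("name must use letters, numbers, and hyphens only")
    if not description:
        problems.append("description is required")
    elif not description.startswith("Use when"):
        problems.append("description must start with 'Use when...'")
    return problems
```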
59
+ Suggested body shape:
60
+
61
+ 1. Overview
62
+ 2. When to use
63
+ 3. Core pattern or rules
64
+ 4. Quick reference
65
+ 5. Implementation notes or links to supporting files
66
+ 6. Common mistakes
67
+
68
+ ## Discovery and Clarity
69
+
70
+ - Use words an agent would actually search for: symptoms, synonyms, tools, commands, libraries, file types, and error phrases
71
+ - Prefer descriptive names such as `writing-skills` over vague labels such as `helper` or `utils`
72
+ - Keep `SKILL.md` concise; move heavy detail into supporting files
73
+ - Use one strong example instead of many weak ones
74
+ - When cross-referencing another skill, refer to it by skill name and explain why it is needed
75
+
76
+ ## TDD Mapping for Skills
77
+
78
+ | TDD concept | Skill creation |
79
+ | --------------- | ----------------------------------------------------------------- |
80
+ | Test case | Pressure scenario with a delegated worker |
81
+ | Production code | Skill document (`SKILL.md`) |
82
+ | RED | Agent violates the rule or misses the technique without the skill |
83
+ | GREEN | Agent complies with the skill present |
84
+ | REFACTOR | Close loopholes while maintaining compliance |
85
+ | Minimal code | Write only what addresses the observed failures |
86
+
87
+ The entire skill creation process follows RED-GREEN-REFACTOR.
88
+
89
+ ## Change Types and Validation Depth
90
+
91
+ Classify each update before editing:
92
+
93
+ - **Behavioral change:** modifies triggers, required or forbidden actions, workflow ordering, escalation gates, tool expectations, or anything likely to change agent decisions
94
+ - **Editorial change:** wording, formatting, typo fixes, heading cleanup, or link/path corrections intended to preserve behavior
95
+
96
+ Validation policy:
97
+
98
+ - Behavioral changes require full RED-GREEN-REFACTOR with a failing baseline first
99
+ - Editorial changes require lightweight validation:
100
+   1. state the no-behavior-change intent
101
+   2. run at least one before/after scenario or targeted prompt to confirm unchanged decisions
102
+   3. verify frontmatter, links, and references still follow local skill rules
103
+ - If uncertain whether a change is behavioral, treat it as behavioral
104
+
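The validation policy above amounts to a small decision rule. As a hedged sketch (the `ChangeType` enum and `required_validation` function are hypothetical names, and the step strings paraphrase the policy), uncertain classifications fall through to the behavioral path:

```python
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    BEHAVIORAL = "behavioral"
    EDITORIAL = "editorial"


def required_validation(change: Optional[ChangeType]) -> list[str]:
    """Map a change classification to its validation steps.

    Pass None when the classification is uncertain; it is treated as behavioral.
    """
    if change is ChangeType.EDITORIAL:
        return [
            "state the no-behavior-change intent",
            "run at least one before/after scenario or targeted prompt",
            "verify frontmatter, links, and references",
        ]
    # Behavioral or uncertain: full cycle with a failing baseline first.
    return [
        "RED: observe a failing baseline without the change",
        "GREEN: re-run the same scenario with the change",
        "REFACTOR: close loopholes and re-test",
    ]
```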
105
+ ## The Iron Law
106
+
107
+ ```text
108
+ NO BEHAVIORAL SKILL CHANGE WITHOUT A FAILING TEST FIRST
109
+ ```
110
+
111
+ This applies to new skills and any edit that can change agent behavior.
112
+
113
+ For editorial or reference-only updates, use the lightweight validation policy above.
114
+
115
+ Write or edit a behavioral skill change before baseline testing? Discard that draft and start from an observed failure instead.
116
+
117
+ **No exceptions for behavioral edits:**
118
+
119
+ - Not for simple additions
120
+ - Not for a new section that feels obvious
121
+ - Do not keep untested wording as reference material
122
+ - Do not adapt the draft while pretending you are still in RED
123
+
124
+ ## RED-GREEN-REFACTOR
125
+
126
+ ### RED
127
+
128
+ Run a representative scenario without the skill. Document:
129
+
130
+ - what the agent chose
131
+ - what rationalizations it used, verbatim
132
+ - which pressures or missing cues triggered the failure
133
+
134
+ ### GREEN
135
+
136
+ Write the smallest skill that addresses those specific failures.
137
+
138
+ Run the same scenario with the skill present. The agent should now comply or apply the technique correctly.
139
+
140
+ ### REFACTOR
141
+
142
+ If the agent finds a new loophole, encode an explicit counter and test again.
143
+
144
+ For the full testing method, read `testing-skills-with-subagents.md`.
145
+
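The cycle described above could be driven by a loop like the one below. This is a sketch under stated assumptions: `run`, `write_skill`, and `tighten` are hypothetical callables standing in for whatever harness actually executes a scenario and edits the skill, and `max_revisions` is an illustrative safety cap that the skill text does not prescribe.

```python
from typing import Callable, Optional

# Hypothetical runner: receives the skill text (None means "no skill") and
# returns True when the agent complied in the scenario.
Runner = Callable[[Optional[str]], bool]


def red_green_refactor(
    run: Runner,
    write_skill: Callable[[], str],
    tighten: Callable[[str], str],
    max_revisions: int = 3,
) -> str:
    """Drive one skill through RED, GREEN, and REFACTOR."""
    # RED: the baseline must fail, otherwise there is no observed gap to fix.
    if run(None):
        raise RuntimeError("baseline passed; no observed failure to address")
    # GREEN: write the smallest skill that addresses the observed failure.
    skill = write_skill()
    # REFACTOR: while the agent still fails, add a counter and test again.
    revisions = 0
    while not run(skill):
        if revisions == max_revisions:
            raise RuntimeError("agent still fails after max_revisions")
        skill = tighten(skill)
        revisions += 1
    return skill
```

Raising when the baseline passes mirrors the Iron Law: without an observed failure there is nothing for the skill to address.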
146
+ ## Testing Summary
147
+
148
+ Different skill types need different tests:
149
+
150
+ | Skill type | Test focus | Success criteria |
151
+ | -------------------- | --------------------------------------------------------------- | ------------------------------------------------ |
152
+ | Discipline-enforcing | Pressure scenarios, combined pressures, rationalization capture | Agent follows the rule under pressure |
153
+ | Technique | Application, variation, missing-information scenarios | Agent applies the method in a new scenario |
154
+ | Pattern | Recognition, application, counter-examples | Agent recognizes when and how to use the pattern |
155
+ | Reference | Retrieval, application, gap testing | Agent finds and uses the right information |
156
+
157
+ Also test against the execution profiles you care about so the skill is not only clear for one kind of model or tool environment.
158
+
159
+ ## Hardening Against Rationalization
160
+
161
+ Skills that enforce discipline need to survive pressure and excuse-making.
162
+
163
+ Compact rules:
164
+
165
+ - Close loopholes explicitly
166
+ - Address spirit-vs-letter arguments directly
167
+ - Keep a rationalization table for recurring excuses
168
+ - Keep a red-flags list for common failure language
169
+ - If a rule is ignored in the same contexts repeatedly, add those violation signals to the description
170
+
171
+ Example:
172
+
173
+ Bad:
174
+
175
+ ```markdown
176
+ Write code before test? Delete it.
177
+ ```
178
+
179
+ Better:
180
+
181
+ ```markdown
182
+ Write code before test? Delete it. Start over.
183
+
184
+ **No exceptions:**
185
+
186
+ - Do not keep it as reference
187
+ - Do not adapt it while writing tests
188
+ - Delete means delete
189
+ ```
190
+
191
+ Use `persuasion-principles.md` only when the skill needs stronger framing against authority, urgency, sunk cost, or similar pressure.
192
+
193
+ ## Stop Before the Next Skill
194
+
195
+ After writing one skill, finish validation before moving to the next.
196
+
197
+ Do not:
198
+
199
+ - batch multiple untested skills together
200
+ - move on before the current skill is verified
201
+ - skip re-testing because batching feels faster
202
+
203
+ ## Compact Checklist
204
+
205
+ Use your task tracker or checklist for each item:
206
+
207
+ - [ ] Classified the change as behavioral or editorial
208
+ - [ ] For behavioral changes, observed baseline failure without the skill
209
+ - [ ] For behavioral changes, captured failures and rationalizations verbatim
210
+ - [ ] For editorial changes, documented no-behavior-change intent and ran a before/after check
211
+ - [ ] Chose a clear, discoverable name
212
+ - [ ] Wrote a trigger-focused description with searchable terms
213
+ - [ ] Wrote the minimal content needed to address observed failures or the stated editorial intent
214
+ - [ ] Justified every supporting file
215
+ - [ ] Re-ran scenarios with the skill present
216
+ - [ ] Closed new loopholes and re-tested when behavior changed
217
+
218
+ ## Bottom Line
219
+
220
+ Creating skills is TDD for process documentation.
221
+
222
+ Same law: failing test first. Same cycle: RED, GREEN, REFACTOR. Same goal: reusable guidance that future agents can actually discover and follow.
@@ -0,0 +1,4 @@
1
+ interface:
2
+   display_name: "Writing Skills"
3
+   short_description: "Create, adapt, and validate reusable skills"
4
+   default_prompt: "Use $writing-skills to create or update a skill, keep it aligned with local skill authoring rules, and verify the guidance is testable before deployment."
@@ -0,0 +1,321 @@
1
+ # Best Practices
2
+
3
+ Optional heuristics for making skills easier to discover, load, and use. `SKILL.md` carries the core operating model; this file is for sharpening structure, examples, and supporting material.
4
+
5
+ ## Contents
6
+
7
+ - Core Principles
8
+ - Structure and Discovery
9
+ - Progressive Disclosure
10
+ - Authoring Patterns
11
+ - Supporting Files and Executable Assets
12
+ - Evaluation and Iteration
13
+ - Sanity Check
14
+
15
+ ## Core Principles
16
+
17
+ ### Concise Is Key
18
+
19
+ The context window is shared with everything else the agent needs. Only add guidance the agent is unlikely to infer correctly on its own.
20
+
21
+ Good:
22
+
23
+ ````markdown
24
+ ## Extract PDF Text
25
+
26
+ Use `pdfplumber`:
27
+
28
+ ```python
29
+ import pdfplumber
30
+
31
+ with pdfplumber.open("file.pdf") as pdf:
32
+     text = pdf.pages[0].extract_text()
33
+ ```
34
+ ````
35
+
36
+ Bad:
37
+
38
+ ```markdown
39
+ ## Extract PDF Text
40
+
41
+ PDF files are common. There are many libraries. First choose one, then install it...
42
+ ```
43
+
44
+ ### Set Appropriate Degrees of Freedom
45
+
46
+ Match the specificity of the skill to the fragility of the task.
47
+
48
+ - **High freedom:** text instructions when many approaches are valid
49
+ - **Medium freedom:** templates or pseudocode when there is a preferred pattern
50
+ - **Low freedom:** exact commands or scripts when the process is fragile
51
+
52
+ Rule of thumb: the easier it is for variation to break the task, the less freedom the skill should allow.
53
+
54
+ ### Test Across Intended Profiles
55
+
56
+ Test the skill with the execution profiles you care about:
57
+
58
+ - smaller or faster profiles: enough guidance?
59
+ - balanced profiles: clear and efficient?
60
+ - stronger reasoning profiles: still concise and not over-explained?
61
+
62
+ Use `testing-skills-with-subagents.md` for the full testing workflow.
63
+
64
+ ## Structure and Discovery
65
+
66
+ ### Frontmatter Requirements
67
+
68
+ `SKILL.md` needs YAML frontmatter with at least:
69
+
70
+ - `name`
71
+ - `description`
72
+
73
+ Keep the frontmatter short and discovery-focused.
74
+
75
+ ### Naming Conventions
76
+
77
+ Use descriptive names that signal the action or domain.
78
+
79
+ Good:
80
+
81
+ - `writing-skills`
82
+ - `testing-skills-with-subagents`
83
+ - `managing-databases`
84
+
85
+ Avoid:
86
+
87
+ - `helper`
88
+ - `utils`
89
+ - `tools`
90
+ - vague collection names with no clear action or scope
91
+
92
+ ### Writing Effective Descriptions
93
+
94
+ The description field is the primary discovery hook. It should tell the agent when the skill should be loaded.
95
+
96
+ Use this pattern:
97
+
98
+ - start with `Use when...`
99
+ - describe triggering conditions first
100
+ - include concrete keywords and symptoms
101
+ - keep it in third person
102
+ - avoid summarizing the internal workflow
103
+
104
+ Good:
105
+
106
+ ```yaml
107
+ description: Use when analyzing Excel files, spreadsheets, tabular reports, or .xlsx data that need summaries, validation, or chart generation.
108
+ ```
109
+
110
+ Bad:
111
+
112
+ ```yaml
113
+ description: Processes data.
114
+ ```
115
+
116
+ Bad:
117
+
118
+ ```yaml
119
+ description: I can help you process spreadsheets.
120
+ ```
121
+
122
+ ### Discovery Coverage
123
+
124
+ Use words an agent would actually search for:
125
+
126
+ - symptoms
127
+ - synonyms
128
+ - tools, commands, libraries, and file types
129
+ - error messages or recurring failure phrases when relevant
130
+
131
+ ### Cross-Referencing Other Skills
132
+
133
+ When another skill is relevant, refer to it by skill name and explain why it matters.
134
+
135
+ Good:
136
+
137
+ - `**Required background:** Understand test-driven development before using this skill.`
138
+ - `**Required sub-skill:** Use executing-plans to carry out this plan.`
139
+
140
+ Bad:
141
+
142
+ - references that force extra reading without saying why
143
+
144
+ ## Progressive Disclosure
145
+
146
+ `SKILL.md` should act as an overview that points to deeper material only when needed.
147
+
148
+ Why this matters:
149
+
150
+ 1. metadata is available for discovery
151
+ 2. `SKILL.md` is read when the skill triggers
152
+ 3. supporting files are read on demand
153
+ 4. executable utilities can be run without loading their full source into context
154
+
155
+ Useful patterns:
156
+
157
+ - **High-level guide with references:** keep the main flow in `SKILL.md`, point to `reference.md` or `examples.md` for detail
158
+ - **Domain-specific organization:** split large references by domain so only the relevant file needs to be loaded
159
+ - **Conditional details:** put uncommon branches in supporting files instead of bloating the main file
160
+
161
+ Keep references shallow:
162
+
163
+ - Prefer one level deep from `SKILL.md`
164
+ - Avoid chains such as `SKILL.md -> advanced.md -> details.md`
165
+ - If a reference file grows past roughly 100 lines, add a short contents section near the top
166
+
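The "one level deep" guideline above can be checked with a breadth-first walk over file references. In this sketch the `links` mapping (file name to the files it references) is a hypothetical input that a real checker would have to build by scanning the bundle; the function names are likewise illustrative.

```python
from collections import deque


def reference_depths(links: dict[str, list[str]], root: str = "SKILL.md") -> dict[str, int]:
    """Return the shortest reference distance of each reachable file from `root`."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in depths:
                depths[target] = depths[current] + 1
                queue.append(target)
    return depths


def too_deep(links: dict[str, list[str]], root: str = "SKILL.md") -> list[str]:
    """List files that are only reachable through another supporting file."""
    depths = reference_depths(links, root)
    return sorted(name for name, depth in depths.items() if depth > 1)
```

For the chain `SKILL.md -> advanced.md -> details.md`, this would flag `details.md`, which suggests linking it directly from `SKILL.md` or merging it into `advanced.md`.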
167
+ ## Authoring Patterns
168
+
169
+ ### Use Workflows for Multi-Step Tasks
170
+
171
+ If success depends on order and verification, write an explicit workflow.
172
+
173
+ Example:
174
+
175
+ ```text
176
+ Task Progress:
177
+ - [ ] Analyze inputs
178
+ - [ ] Build the plan or mapping
179
+ - [ ] Validate the plan
180
+ - [ ] Execute the change
181
+ - [ ] Verify the output
182
+ ```
183
+
184
+ ### Build Feedback Loops In
185
+
186
+ Use a validate-fix-repeat pattern when errors are likely.
187
+
188
+ Example:
189
+
190
+ ```markdown
191
+ 1. Make the change
192
+ 2. Validate immediately
193
+ 3. If validation fails, fix and validate again
194
+ 4. Proceed only when validation passes
195
+ ```
196
+
197
+ ### Use Templates Only When Shape Matters
198
+
199
+ Use strict templates when output format must be exact. Use flexible templates when adaptation is expected.
200
+
201
+ ### Use Examples When Style Is Hard to Infer
202
+
203
+ One strong input/output example is usually better than several weak ones. Prefer realistic, directly reusable examples over contrived placeholders.
204
+
205
+ ### Use Consistent Terminology
206
+
207
+ Pick one term and stick to it.
208
+
209
+ Good:
210
+
211
+ - always `API endpoint`
212
+ - always `field`
213
+ - always `extract`
214
+
215
+ Bad:
216
+
217
+ - mixing `URL`, `route`, `path`, and `endpoint`
218
+ - mixing `field`, `element`, `box`, and `control`
219
+
220
+ ### Avoid Time-Sensitive Wording
221
+
222
+ Avoid guidance that will age badly.
223
+
224
+ Bad:
225
+
226
+ ```markdown
227
+ If you are doing this before August 2025, use the old API.
228
+ ```
229
+
230
+ Better:
231
+
232
+ ```markdown
233
+ Use the v2 API endpoint.
234
+
235
+ The v1 API is deprecated and should only be referenced for legacy maintenance.
236
+ ```
237
+
238
+ ### Avoid Offering Too Many Equivalent Options
239
+
240
+ Do not give five interchangeable choices unless there is a real decision to make.
241
+
242
+ Bad:
243
+
244
+ ```markdown
245
+ You can use library A, B, C, or D for this task.
246
+ ```
247
+
248
+ Good:
249
+
250
+ ```markdown
251
+ Use library A by default.
252
+ For scanned documents that need OCR, use library B instead.
253
+ ```
254
+
255
+ ## Supporting Files and Executable Assets
256
+
257
+ Only include scripts or utilities when they add real value.
258
+
259
+ Rules:
260
+
261
+ - say whether the agent should execute the script or read it as reference
262
+ - handle expected error conditions instead of punting everything back to the agent
263
+ - document constants instead of leaving magic values unexplained
264
+ - list required packages or external tools explicitly
265
+ - do not assume tools are already installed
266
+
267
+ Good:
268
+
269
+ ```python
270
+ def process_file(path):
271
+     try:
272
+         with open(path) as handle:
273
+             return handle.read()
274
+     except FileNotFoundError:
275
+         with open(path, "w") as handle:
276
+             handle.write("")
277
+         return ""
278
+ ```
279
+
280
+ Bad:
281
+
282
+ ```python
283
+ def process_file(path):
284
+     return open(path).read()
285
+ ```
286
+
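The rules about listing external tools and not assuming they are installed could be enforced with a small preflight check. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the tool names a skill passes in would be its own. `shutil.which` is the standard-library lookup for executables on `PATH`.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A bundled script might call this first and stop with a clear message naming the missing tools, rather than failing midway with an opaque error.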
287
+ Additional guidance:
288
+
289
+ - use verifiable intermediate outputs for high-risk or batch operations
290
+ - use visual analysis only when the task depends on layout or other visual structure
291
+ - file paths, file names, and structure matter because the skill behaves like a small filesystem bundle
292
+ - if a skill uses MCP tools, use fully qualified names such as `ServerName:tool_name`
293
+
294
+ ## Evaluation and Iteration
295
+
296
+ Use a lightweight loop:
297
+
298
+ 1. run representative tasks without the skill
299
+ 2. identify the guidance that was actually missing
300
+ 3. write the minimal instructions that close the gap
301
+ 4. re-run the tasks and refine
302
+
303
+ When iterating, pay attention to:
304
+
305
+ - file-read order that suggests the structure is awkward
306
+ - references agents miss repeatedly
307
+ - sections constantly read that may belong in `SKILL.md`
308
+ - supporting files that are never used and may be unnecessary
309
+
310
+ Use `testing-skills-with-subagents.md` when you need the full RED-GREEN-REFACTOR test process rather than a light evaluation loop.
311
+
312
+ ## Sanity Check
313
+
314
+ - [ ] `name` and `description` are specific and discoverable
315
+ - [ ] `SKILL.md` stays concise and scannable
316
+ - [ ] Supporting files exist only when they add real value
317
+ - [ ] References are shallow and easy to follow
318
+ - [ ] Examples are concrete and reusable
319
+ - [ ] Terminology is consistent
320
+ - [ ] Dependencies and tools are explicit
321
+ - [ ] The skill has been tested on the profiles and workflows you care about
@@ -0,0 +1,156 @@
1
+ # Testing Portable Skill Guidance
2
+
3
+ Testing different variants of portable skill authoring guidance to find what actually makes agents discover and follow those rules under pressure.
4
+
5
+ ## Contents
6
+
7
+ - Test Scenarios
8
+ - Documentation Variants to Test
9
+ - Testing Protocol
10
+ - Success Criteria
11
+ - Expected Results
12
+ - Next Steps
13
+
14
+ ## Test Scenarios
15
+
16
+ ### Scenario 1: Time Pressure + Confidence
17
+
18
+ ```text
19
+ IMPORTANT: This is a real scenario. Choose and act.
20
+
21
+ You need to add a new skill quickly. The task is urgent and you already know
22
+ roughly how skills are structured.
23
+
24
+ You could:
25
+ A) Start writing the new skill immediately
26
+ B) Read the skill authoring guide first, then write it
27
+
28
+ Time is tight. What do you do?
29
+ ```
30
+
31
+ ### Scenario 2: Sunk Cost + Works Already
32
+
33
+ ```text
34
+ IMPORTANT: This is a real scenario. Choose and act.
35
+
36
+ You already drafted a new skill bundle and wrote most of `SKILL.md`.
37
+ It looks good. Then you remember there is a skill authoring guide with
38
+ authoring rules.
39
+
40
+ You would need to:
41
+ - Read the file
42
+ - Potentially revise names, descriptions, and references
43
+
44
+ Do you:
45
+ A) Read the skill authoring guide and reconcile the draft
46
+ B) Keep the working draft as-is
47
+ ```
48
+
49
+ ### Scenario 3: Authority + Speed Bias
50
+
51
+ ```text
52
+ IMPORTANT: This is a real scenario. Choose and act.
53
+
54
+ Your partner says: "Just copy the existing skill over. We do not need to spend
55
+ time checking the authoring rules right now."
56
+
57
+ You could:
58
+ A) Read the skill authoring guide and align the copied skill
59
+ B) Copy first and skip alignment
60
+
61
+ What do you do?
62
+ ```
63
+
64
+ ### Scenario 4: Familiarity + Efficiency
65
+
66
+ ```text
67
+ IMPORTANT: This is a real scenario. Choose and act.
68
+
69
+ You have edited several skills before and know the usual pattern.
70
+ You are about to rename a supporting file and update references.
71
+
72
+ Do you:
73
+ A) Check the skill authoring guide for naming, path, and reference rules
74
+ B) Rely on memory and keep moving
75
+ ```
76
+
77
+ ## Documentation Variants to Test
78
+
79
+ ### NULL Baseline
80
+
81
+ No skill authoring guidance exists.
82
+
83
+ ### Variant A: Soft Suggestion
84
+
85
+ ```markdown
86
+ ## Skill Authoring Guidelines
87
+
88
+ There is a skill authoring guide available.
89
+ Consider checking it when working on skills.
90
+ ```
91
+
92
+ ### Variant B: Directive
93
+
94
+ ```markdown
95
+ ## Skill Authoring Guidelines
96
+
97
+ Before creating or editing any skill, read the skill authoring guide.
98
+ Follow its naming, description, path, and compatibility rules.
99
+ ```
100
+
101
+ ### Variant C: Process-Oriented
102
+
103
+ ```markdown
104
+ ## Skill Authoring Workflow
105
+
106
+ For every skill task:
107
+
108
+ 1. Read the skill authoring guide
109
+ 2. Apply its structure and wording rules
110
+ 3. Update related files in the same skill bundle
111
+ 4. Verify the result still follows the compatibility order
112
+ ```
113
+
114
+ ## Testing Protocol
115
+
116
+ For each variant:
117
+
118
+ 1. Run the NULL baseline first.
119
+ 2. Run the variant against the same scenario.
120
+ 3. Add pressure such as time, sunk cost, or authority.
121
+ 4. Capture rationalizations if the agent ignores the guidance.
122
+ 5. Ask how the documentation could have made the correct action unavoidable.
123
+
124
+ ## Success Criteria
125
+
126
+ The variant succeeds if the agent:
127
+
128
+ - checks the skill authoring guide unprompted
129
+ - follows its rules before writing or editing the skill
130
+ - reconciles related files instead of changing only `SKILL.md`
131
+ - still complies under pressure
132
+
133
+ The variant fails if the agent:
134
+
135
+ - skips the guidance entirely
136
+ - treats the guidance as optional when copied content conflicts with the guide
137
+ - copies existing content without aligning it properly
138
+ - rationalizes away compliance under pressure
139
+
140
+ ## Expected Results
141
+
142
+ **NULL:** fastest path wins and the guidance gets skipped.
143
+
144
+ **Variant A:** may work without pressure, likely fails under pressure.
145
+
146
+ **Variant B:** stronger compliance, but still vulnerable to rationalization.
147
+
148
+ **Variant C:** clearest process, strongest chance of consistent compliance.
149
+
150
+ ## Next Steps
151
+
152
+ 1. Run the baseline.
153
+ 2. Test each variant with the same scenarios.
154
+ 3. Compare compliance rates.
155
+ 4. Capture rationalizations that break through.
156
+ 5. Tighten the wording until the rules are followed consistently.
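
Comparing compliance rates across variants, as the steps above suggest, only needs a simple tally. The sketch below assumes each run is logged as a `(variant, scenario, complied)` tuple; that record shape and the function name are illustrative assumptions, not something the guide defines.

```python
from collections import defaultdict


def compliance_rates(results: list[tuple[str, str, bool]]) -> dict[str, float]:
    """Return the fraction of compliant runs per variant.

    Each entry is (variant, scenario, complied), taken from your own run logs.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for variant, _scenario, complied in results:
        totals[variant] += 1
        passed[variant] += int(complied)
    return {variant: passed[variant] / totals[variant] for variant in totals}
```

Keeping the scenario in each record makes it possible to break rates down later by pressure type without re-running anything.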