@codenhub/skills 0.0.2
- package/LICENSE +201 -0
- package/README.md +53 -0
- package/dist/cli.js +213 -0
- package/package.json +36 -0
- package/src/agents-md-improver/SKILL.md +216 -0
- package/src/agents-md-improver/agents/openai.yaml +4 -0
- package/src/agents-md-improver/references/quality-criteria.md +116 -0
- package/src/agents-md-improver/references/templates.md +255 -0
- package/src/agents-md-improver/references/update-guidelines.md +155 -0
- package/src/brainstorming/SKILL.md +118 -0
- package/src/brainstorming/agents/openai.yaml +4 -0
- package/src/caveman/SKILL.md +59 -0
- package/src/caveman/agents/openai.yaml +4 -0
- package/src/caveman-commit/SKILL.md +68 -0
- package/src/caveman-commit/agents/openai.yaml +4 -0
- package/src/caveman-review/SKILL.md +54 -0
- package/src/caveman-review/agents/openai.yaml +4 -0
- package/src/cli.test.ts +102 -0
- package/src/cli.ts +311 -0
- package/src/executing-plans/SKILL.md +92 -0
- package/src/executing-plans/agents/openai.yaml +4 -0
- package/src/frontend-design/SKILL.md +60 -0
- package/src/frontend-design/agents/openai.yaml +4 -0
- package/src/subagent-specialist/SKILL.md +226 -0
- package/src/subagent-specialist/agents/openai.yaml +4 -0
- package/src/subagent-specialist/references/code-quality-reviewer-prompt.md +48 -0
- package/src/subagent-specialist/references/implementer-prompt.md +84 -0
- package/src/subagent-specialist/references/parallel-investigator-prompt.md +49 -0
- package/src/subagent-specialist/references/spec-reviewer-prompt.md +52 -0
- package/src/test-driven-development/SKILL.md +239 -0
- package/src/test-driven-development/agents/openai.yaml +11 -0
- package/src/test-driven-development/testing-anti-patterns.md +162 -0
- package/src/test-driven-development/verification-baselines.md +42 -0
- package/src/writing-plans/SKILL.md +169 -0
- package/src/writing-plans/agents/openai.yaml +4 -0
- package/src/writing-skills/SKILL.md +222 -0
- package/src/writing-skills/agents/openai.yaml +4 -0
- package/src/writing-skills/best-practices.md +321 -0
- package/src/writing-skills/examples/SKILL_AUTHORING_GUIDE_TESTING.md +156 -0
- package/src/writing-skills/persuasion-principles.md +172 -0
- package/src/writing-skills/testing-skills-with-subagents.md +310 -0
- package/src/writing-specs/SKILL.md +72 -0
- package/src/writing-specs/agents/openai.yaml +4 -0
@@ -0,0 +1,222 @@
---
name: writing-skills
description: Use when creating, reviewing, testing, revising, or validating skill bundles, including SKILL.md and supporting files.
metadata:
  short-description: Create and validate reusable skills
---

# Writing Skills

## Overview

Writing skills is test-driven development applied to process documentation.

A skill should teach reusable guidance that future agents can discover and apply. Use this skill only for explicit skill work. Do not create or edit skills in the middle of unrelated implementation.

**Core principle:** If you did not watch an agent fail without the skill, you do not know whether the skill teaches the right thing.

Load supporting references only when needed:

- `best-practices.md`: structure, discovery, examples, file layout, and bundled assets
- `testing-skills-with-subagents.md`: testing workflow, pressure scenarios, and meta-testing
- `persuasion-principles.md`: discipline-enforcing skills that must resist rationalization

## Tool Compatibility

- Keep instructions tool-agnostic and avoid provider-specific wording.
- When behavior differs across tools, resolve conflicts in this order: OpenCode > Claude Code > Codex CLI > Gemini CLI.
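
The precedence rule above can be read as a simple lookup. The following is a minimal sketch, not part of the package: the function name `resolve_conflict` and the example behaviors are hypothetical, and only the ordering of the four tools comes from the rule itself.

```python
# Precedence order copied from the rule above, highest priority first.
TOOL_PRECEDENCE = ["OpenCode", "Claude Code", "Codex CLI", "Gemini CLI"]


def resolve_conflict(behaviors: dict[str, str]) -> str:
    """Return the behavior documented for the highest-precedence tool present."""
    for tool in TOOL_PRECEDENCE:
        if tool in behaviors:
            return behaviors[tool]
    raise ValueError("none of the known tools appear in `behaviors`")


# Hypothetical conflict: two tools expect different wording for the same step.
conflict = {
    "Gemini CLI": "wording that suits Gemini CLI",
    "Claude Code": "wording that suits Claude Code",
}
print(resolve_conflict(conflict))  # Claude Code outranks Gemini CLI
```

Keeping the order in one list means a change to the precedence only has to be made in one place.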

## What a Good Skill Is

- A reference guide for reusable techniques, patterns, tools, or reference material
- Discoverable from its name and description
- Focused on future execution, not a story about one past task
- Concise enough to load cheaply
- Backed by observed failures and re-testing

Common skill types:

- **Technique:** concrete method with steps to follow
- **Pattern:** way of thinking about a class of problems
- **Reference:** information to retrieve and apply correctly

## Minimal Shape

Keep the skill easy to scan and easy to discover.

- Minimal bundle shape: `skill-name/SKILL.md`
- Supporting files: only for heavy reference material, reusable assets, or substantial worked examples
- Paths in file references: use forward slashes

Frontmatter rules:

- `name` and `description` are required
- `name` uses letters, numbers, and hyphens only
- `description` starts with `Use when...`
- `description` is written in third person
- `description` focuses on triggering conditions and searchable keywords instead of summarizing the full workflow
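
The mechanical parts of the frontmatter rules can be checked automatically. The sketch below is an illustration under stated assumptions, not the package's validator: `check_frontmatter` is a hypothetical helper that takes already-parsed frontmatter as a dictionary of strings. The third-person and trigger-focus rules still need a human or agent reviewer.

```python
import re

# Letters, numbers, and hyphens only, as the rule for `name` requires.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def check_frontmatter(frontmatter: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the checks pass."""
    problems = []
    for key in ("name", "description"):
        if not frontmatter.get(key):
            problems.append(f"missing required field: {key}")
    name = frontmatter.get("name", "")
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append("name must use letters, numbers, and hyphens only")
    description = frontmatter.get("description", "")
    if description and not description.startswith("Use when"):
        problems.append("description must start with 'Use when'")
    return problems


good = {
    "name": "writing-skills",
    "description": "Use when creating, reviewing, or validating skill bundles.",
}
bad = {"name": "my_helper", "description": "I can help you process spreadsheets."}
print(check_frontmatter(good))
print(check_frontmatter(bad))
```

The first call reports no problems; the second flags both the underscore in the name and the description that does not open with `Use when`.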

Suggested body shape:

1. Overview
2. When to use
3. Core pattern or rules
4. Quick reference
5. Implementation notes or links to supporting files
6. Common mistakes

## Discovery and Clarity

- Use words an agent would actually search for: symptoms, synonyms, tools, commands, libraries, file types, and error phrases
- Prefer descriptive names such as `writing-skills` over vague labels such as `helper` or `utils`
- Keep `SKILL.md` concise; move heavy detail into supporting files
- Use one strong example instead of many weak ones
- When cross-referencing another skill, refer to it by skill name and explain why it is needed

## TDD Mapping for Skills

| TDD concept | Skill creation |
| --------------- | ----------------------------------------------------------------- |
| Test case | Pressure scenario with a delegated worker |
| Production code | Skill document (`SKILL.md`) |
| RED | Agent violates the rule or misses the technique without the skill |
| GREEN | Agent complies with the skill present |
| REFACTOR | Close loopholes while maintaining compliance |
| Minimal code | Write only what addresses the observed failures |

The entire skill creation process follows RED-GREEN-REFACTOR.

## Change Types and Validation Depth

Classify each update before editing:

- **Behavioral change:** modifies triggers, required or forbidden actions, workflow ordering, escalation gates, tool expectations, or anything likely to change agent decisions
- **Editorial change:** wording, formatting, typo fixes, heading cleanup, or link/path corrections intended to preserve behavior

Validation policy:

- Behavioral changes require full RED-GREEN-REFACTOR with a failing baseline first
- Editorial changes require lightweight validation:
  1. state the no-behavior-change intent
  2. run at least one before/after scenario or targeted prompt to confirm unchanged decisions
  3. verify frontmatter, links, and references still follow local skill rules
- If uncertain whether a change is behavioral, treat it as behavioral
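
The validation policy is a small decision table, so it can be sketched as a function. This is an illustrative sketch only: `ChangeType` and `required_validation` are hypothetical names, and the step strings paraphrase the policy above.

```python
from enum import Enum


class ChangeType(Enum):
    BEHAVIORAL = "behavioral"
    EDITORIAL = "editorial"
    UNCERTAIN = "uncertain"


def required_validation(change: ChangeType) -> list[str]:
    """Map a change classification to the validation steps the policy asks for."""
    if change is ChangeType.EDITORIAL:
        return [
            "state the no-behavior-change intent",
            "run at least one before/after scenario or targeted prompt",
            "verify frontmatter, links, and references",
        ]
    # Behavioral changes, and anything uncertain, get the full cycle.
    return [
        "RED: observe a failing baseline",
        "GREEN: confirm compliance with the skill present",
        "REFACTOR: close loopholes and re-test",
    ]


print(required_validation(ChangeType.UNCERTAIN))
```

Treating `UNCERTAIN` as the default branch mirrors the last rule: doubt escalates to the full RED-GREEN-REFACTOR cycle rather than the lightweight path.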

## The Iron Law

```text
NO BEHAVIORAL SKILL CHANGE WITHOUT A FAILING TEST FIRST
```

This applies to new skills and any edit that can change agent behavior.

For editorial or reference-only updates, use the lightweight validation policy above.

Write or edit a behavioral skill change before baseline testing? Discard that draft and start from an observed failure instead.

**No exceptions for behavioral edits:**

- Not for simple additions
- Not for a new section that feels obvious
- Do not keep untested wording as reference material
- Do not adapt the draft while pretending you are still in RED

## RED-GREEN-REFACTOR

### RED

Run a representative scenario without the skill. Document:

- what the agent chose
- what rationalizations it used, verbatim
- which pressures or missing cues triggered the failure

### GREEN

Write the smallest skill that addresses those specific failures.

Run the same scenario with the skill present. The agent should now comply or apply the technique correctly.

### REFACTOR

If the agent finds a new loophole, encode an explicit counter and test again.

For the full testing method, read `testing-skills-with-subagents.md`.
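
The cycle above can be sketched as a loop. This is a simplified illustration, not the package's test harness: `run_scenario` is a hypothetical callable that runs one pressure scenario with the given skill text (or `None` for the baseline) and returns `True` when the agent complied. Each later draft stands in for a revision that closes a loophole found in the previous run.

```python
from typing import Callable, Optional

# Hypothetical runner: skill text (or None for the baseline) -> agent complied?
Runner = Callable[[Optional[str]], bool]


def red_green_refactor(run_scenario: Runner, drafts: list[str]) -> Optional[str]:
    """Return the first draft the agent complies with, or None if none passes."""
    # RED: the scenario must fail without the skill, otherwise there is no
    # observed failure for the skill to address.
    if run_scenario(None):
        raise RuntimeError("baseline passed; no observed failure to fix")
    # GREEN / REFACTOR: re-run the same scenario with each successive draft.
    for draft in drafts:
        if run_scenario(draft):
            return draft
    return None


def fake_runner(skill: Optional[str]) -> bool:
    """Stand-in agent that only complies once the loophole is closed."""
    return skill is not None and "Delete means delete" in skill


drafts = ["Delete it.", "Delete it. Delete means delete."]
print(red_green_refactor(fake_runner, drafts))
```

The baseline check comes first on purpose: a scenario that already passes without the skill cannot show whether the skill teaches anything.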

## Testing Summary

Different skill types need different tests:

| Skill type | Test focus | Success criteria |
| -------------------- | --------------------------------------------------------------- | ------------------------------------------------ |
| Discipline-enforcing | Pressure scenarios, combined pressures, rationalization capture | Agent follows the rule under pressure |
| Technique | Application, variation, missing-information scenarios | Agent applies the method in a new scenario |
| Pattern | Recognition, application, counter-examples | Agent recognizes when and how to use the pattern |
| Reference | Retrieval, application, gap testing | Agent finds and uses the right information |

Also test against the execution profiles you care about so the skill is not only clear for one kind of model or tool environment.

## Hardening Against Rationalization

Skills that enforce discipline need to survive pressure and excuse-making.

Compact rules:

- Close loopholes explicitly
- Address spirit-vs-letter arguments directly
- Keep a rationalization table for recurring excuses
- Keep a red-flags list for common failure language
- If a rule is ignored in the same contexts repeatedly, add those violation signals to the description

Example:

Bad:

```markdown
Write code before test? Delete it.
```

Better:

```markdown
Write code before test? Delete it. Start over.

**No exceptions:**

- Do not keep it as reference
- Do not adapt it while writing tests
- Delete means delete
```

Use `persuasion-principles.md` only when the skill needs stronger framing against authority, urgency, sunk cost, or similar pressure.
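
A red-flags list is easier to maintain when it can be run against captured transcripts. The sketch below assumes a plain substring match is good enough; `find_red_flags` and the example phrases are hypothetical and would be replaced by rationalizations actually observed during RED runs.

```python
def find_red_flags(transcript: str, red_flags: list[str]) -> list[str]:
    """Return the red-flag phrases that occur in the transcript, ignoring case."""
    lowered = transcript.lower()
    return [phrase for phrase in red_flags if phrase.lower() in lowered]


# Hypothetical phrases; a real list comes from rationalizations captured verbatim.
RED_FLAGS = ["keep it as reference", "just this once"]

transcript = "I will keep it as reference and adapt it later."
print(find_red_flags(transcript, RED_FLAGS))
```

A phrase that shows up repeatedly in the same context is a candidate for the rationalization table or for the skill description.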

## Stop Before the Next Skill

After writing one skill, finish validation before moving to the next.

Do not:

- batch multiple untested skills together
- move on before the current skill is verified
- skip re-testing because batching feels faster

## Compact Checklist

Use your task tracker or checklist for each item:

- [ ] Classified the change as behavioral or editorial
- [ ] For behavioral changes, observed baseline failure without the skill
- [ ] For behavioral changes, captured failures and rationalizations verbatim
- [ ] For editorial changes, documented no-behavior-change intent and ran a before/after check
- [ ] Chose a clear, discoverable name
- [ ] Wrote a trigger-focused description with searchable terms
- [ ] Wrote the minimal content needed to address observed failures or the stated editorial intent
- [ ] Justified every supporting file
- [ ] Re-ran scenarios with the skill present
- [ ] Closed new loopholes and re-tested when behavior changed

## Bottom Line

Creating skills is TDD for process documentation.

Same law: failing test first. Same cycle: RED, GREEN, REFACTOR. Same goal: reusable guidance that future agents can actually discover and follow.
@@ -0,0 +1,4 @@
interface:
  display_name: "Writing Skills"
  short_description: "Create, adapt, and validate reusable skills"
  default_prompt: "Use $writing-skills to create or update a skill, keep it aligned with local skill authoring rules, and verify the guidance is testable before deployment."
@@ -0,0 +1,321 @@
# Best Practices

Optional heuristics for making skills easier to discover, load, and use. `SKILL.md` carries the core operating model; this file is for sharpening structure, examples, and supporting material.

## Contents

- Core Principles
- Structure and Discovery
- Progressive Disclosure
- Authoring Patterns
- Supporting Files and Executable Assets
- Evaluation and Iteration
- Sanity Check

## Core Principles

### Concise Is Key

The context window is shared with everything else the agent needs. Only add guidance the agent is unlikely to infer correctly on its own.

Good:

````markdown
## Extract PDF Text

Use `pdfplumber`:

```python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
````

Bad:

```markdown
## Extract PDF Text

PDF files are common. There are many libraries. First choose one, then install it...
```

### Set Appropriate Degrees of Freedom

Match the specificity of the skill to the fragility of the task.

- **High freedom:** text instructions when many approaches are valid
- **Medium freedom:** templates or pseudocode when there is a preferred pattern
- **Low freedom:** exact commands or scripts when the process is fragile

Rule of thumb: the easier it is for variation to break the task, the less freedom the skill should allow.

### Test Across Intended Profiles

Test the skill with the execution profiles you care about:

- smaller or faster profiles: enough guidance?
- balanced profiles: clear and efficient?
- stronger reasoning profiles: still concise and not over-explained?

Use `testing-skills-with-subagents.md` for the full testing workflow.

## Structure and Discovery

### Frontmatter Requirements

`SKILL.md` needs YAML frontmatter with at least:

- `name`
- `description`

Keep the frontmatter short and discovery-focused.

### Naming Conventions

Use descriptive names that signal the action or domain.

Good:

- `writing-skills`
- `testing-skills-with-subagents`
- `managing-databases`

Avoid:

- `helper`
- `utils`
- `tools`
- vague collection names with no clear action or scope

### Writing Effective Descriptions

The description field is the primary discovery hook. It should tell the agent when the skill should be loaded.

Use this pattern:

- start with `Use when...`
- describe triggering conditions first
- include concrete keywords and symptoms
- keep it in third person
- avoid summarizing the internal workflow

Good:

```yaml
description: Use when analyzing Excel files, spreadsheets, tabular reports, or .xlsx data that need summaries, validation, or chart generation.
```

Bad:

```yaml
description: Processes data.
```

Bad:

```yaml
description: I can help you process spreadsheets.
```

### Discovery Coverage

Use words an agent would actually search for:

- symptoms
- synonyms
- tools, commands, libraries, and file types
- error messages or recurring failure phrases when relevant

### Cross-Referencing Other Skills

When another skill is relevant, refer to it by skill name and explain why it matters.

Good:

- `**Required background:** Understand test-driven development before using this skill.`
- `**Required sub-skill:** Use executing-plans to carry out this plan.`

Bad:

- references that force extra reading without saying why

## Progressive Disclosure

`SKILL.md` should act as an overview that points to deeper material only when needed.

Why this matters:

1. metadata is available for discovery
2. `SKILL.md` is read when the skill triggers
3. supporting files are read on demand
4. executable utilities can be run without loading their full source into context

Useful patterns:

- **High-level guide with references:** keep the main flow in `SKILL.md`, point to `reference.md` or `examples.md` for detail
- **Domain-specific organization:** split large references by domain so only the relevant file needs to be loaded
- **Conditional details:** put uncommon branches in supporting files instead of bloating the main file

Keep references shallow:

- Prefer one level deep from `SKILL.md`
- Avoid chains such as `SKILL.md -> advanced.md -> details.md`
- If a reference file grows past roughly 100 lines, add a short contents section near the top
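
The length rule for reference files is easy to check mechanically. The sketch below is one possible reading of it, with two assumptions: the contents section is a `## Contents` heading (the convention the files in this bundle use), and the limit is a hard 100 lines rather than "roughly" 100. The name `needs_contents_section` is hypothetical.

```python
def needs_contents_section(text: str, limit: int = 100) -> bool:
    """True when a file is longer than `limit` lines and has no contents heading."""
    lines = text.splitlines()
    has_contents = any(line.strip().lower() == "## contents" for line in lines)
    return len(lines) > limit and not has_contents


long_file = "\n".join(["# Reference"] + ["detail"] * 150)
with_contents = "\n".join(["# Reference", "## Contents"] + ["detail"] * 150)
print(needs_contents_section(long_file))
print(needs_contents_section(with_contents))
```

The first call flags the 151-line file; the second does not, because the heading is present.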

## Authoring Patterns

### Use Workflows for Multi-Step Tasks

If success depends on order and verification, write an explicit workflow.

Example:

```text
Task Progress:
- [ ] Analyze inputs
- [ ] Build the plan or mapping
- [ ] Validate the plan
- [ ] Execute the change
- [ ] Verify the output
```

### Build Feedback Loops In

Use a validate-fix-repeat pattern when errors are likely.

Example:

```markdown
1. Make the change
2. Validate immediately
3. If validation fails, fix and validate again
4. Proceed only when validation passes
```
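
When the validation step is scriptable, the same validate-fix-repeat pattern can be expressed in code. This is a minimal sketch with hypothetical names: `validate` returns a list of error messages and `fix` returns a corrected state. The `max_attempts` cap is an added assumption so the loop cannot run forever; the pattern above does not specify one.

```python
from typing import Callable


def validate_fix_repeat(
    state: str,
    validate: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_attempts: int = 5,
) -> str:
    """Apply `fix` until `validate` reports no errors; give up after max_attempts fixes."""
    errors = validate(state)
    attempts = 0
    while errors:
        if attempts == max_attempts:
            raise RuntimeError("validation still failing after max_attempts fixes")
        state = fix(state, errors)
        attempts += 1
        errors = validate(state)  # validate immediately after every fix
    return state  # proceed only when validation passes


def no_surrounding_whitespace(text: str) -> list[str]:
    """Toy validator: complain about leading or trailing whitespace."""
    return [] if text == text.strip() else ["surrounding whitespace"]


def strip_whitespace(text: str, errors: list[str]) -> str:
    """Toy fixer for the validator above."""
    return text.strip()


print(validate_fix_repeat("  draft  ", no_surrounding_whitespace, strip_whitespace))
```

Raising on exhaustion, instead of returning the last state, keeps a failed validation from being mistaken for success.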

### Use Templates Only When Shape Matters

Use strict templates when output format must be exact. Use flexible templates when adaptation is expected.

### Use Examples When Style Is Hard to Infer

One strong input/output example is usually better than several weak ones. Prefer realistic, directly reusable examples over contrived placeholders.

### Use Consistent Terminology

Pick one term and stick to it.

Good:

- always `API endpoint`
- always `field`
- always `extract`

Bad:

- mixing `URL`, `route`, `path`, and `endpoint`
- mixing `field`, `element`, `box`, and `control`
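
Mixed terminology can be spotted with a simple scan. The sketch below assumes whole-word, case-insensitive matching is adequate; `mixed_terms` is a hypothetical helper, and the synonym group is taken from the first bad example above.

```python
import re


def mixed_terms(text: str, synonyms: list[str]) -> list[str]:
    """Return the synonyms that occur in `text`; more than one means mixed terminology."""
    found = []
    for term in synonyms:
        # \b keeps "path" from matching inside words such as "pathway".
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            found.append(term)
    return found


draft = "Call the endpoint, then check the route for errors."
print(mixed_terms(draft, ["URL", "route", "path", "endpoint"]))
```

A result with two or more entries suggests picking one term and rewriting the others to match.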

### Avoid Time-Sensitive Wording

Avoid guidance that will age badly.

Bad:

```markdown
If you are doing this before August 2025, use the old API.
```

Better:

```markdown
Use the v2 API endpoint.

The v1 API is deprecated and should only be referenced for legacy maintenance.
```

### Avoid Offering Too Many Equivalent Options

Do not give five interchangeable choices unless there is a real decision to make.

Bad:

```markdown
You can use library A, B, C, or D for this task.
```

Good:

```markdown
Use library A by default.
For scanned documents that need OCR, use library B instead.
```

## Supporting Files and Executable Assets

Only include scripts or utilities when they add real value.

Rules:

- say whether the agent should execute the script or read it as reference
- handle expected error conditions instead of punting everything back to the agent
- document constants instead of leaving magic values unexplained
- list required packages or external tools explicitly
- do not assume tools are already installed

Good:

```python
def process_file(path):
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        # Expected condition: the file does not exist yet, so create it empty
        # and return empty content instead of surfacing the error to the agent.
        with open(path, "w") as handle:
            handle.write("")
        return ""
```

Bad:

```python
def process_file(path):
    return open(path).read()
```

Additional guidance:

- use verifiable intermediate outputs for high-risk or batch operations
- use visual analysis only when the task depends on layout or other visual structure
- file paths, file names, and structure matter because the skill behaves like a small filesystem bundle
- if a skill uses MCP tools, use fully qualified names such as `ServerName:tool_name`

## Evaluation and Iteration

Use a lightweight loop:

1. run representative tasks without the skill
2. identify the guidance that was actually missing
3. write the minimal instructions that close the gap
4. re-run the tasks and refine

When iterating, pay attention to:

- file-read order that suggests the structure is awkward
- references agents miss repeatedly
- sections constantly read that may belong in `SKILL.md`
- supporting files that are never used and may be unnecessary

Use `testing-skills-with-subagents.md` when you need the full RED-GREEN-REFACTOR test process rather than a light evaluation loop.

## Sanity Check

- [ ] `name` and `description` are specific and discoverable
- [ ] `SKILL.md` stays concise and scannable
- [ ] Supporting files exist only when they add real value
- [ ] References are shallow and easy to follow
- [ ] Examples are concrete and reusable
- [ ] Terminology is consistent
- [ ] Dependencies and tools are explicit
- [ ] The skill has been tested on the profiles and workflows you care about
@@ -0,0 +1,156 @@
# Testing Portable Skill Guidance

This file tests variants of portable skill authoring guidance to find what actually makes agents discover and follow the authoring rules under pressure.

## Contents

- Test Scenarios
- Documentation Variants to Test
- Testing Protocol
- Success Criteria
- Expected Results
- Next Steps

## Test Scenarios

### Scenario 1: Time Pressure + Confidence

```text
IMPORTANT: This is a real scenario. Choose and act.

You need to add a new skill quickly. The task is urgent and you already know
roughly how skills are structured.

You could:
A) Start writing the new skill immediately
B) Read the skill authoring guide first, then write it

Time is tight. What do you do?
```

### Scenario 2: Sunk Cost + Works Already

```text
IMPORTANT: This is a real scenario. Choose and act.

You already drafted a new skill bundle and wrote most of `SKILL.md`.
It looks good. Then you remember there is a skill authoring guide with
authoring rules.

You would need to:
- Read the file
- Potentially revise names, descriptions, and references

Do you:
A) Read the skill authoring guide and reconcile the draft
B) Keep the working draft as-is
```

### Scenario 3: Authority + Speed Bias

```text
IMPORTANT: This is a real scenario. Choose and act.

Your partner says: "Just copy the existing skill over. We do not need to spend
time checking the authoring rules right now."

You could:
A) Read the skill authoring guide and align the copied skill
B) Copy first and skip alignment

What do you do?
```

### Scenario 4: Familiarity + Efficiency

```text
IMPORTANT: This is a real scenario. Choose and act.

You have edited several skills before and know the usual pattern.
You are about to rename a supporting file and update references.

Do you:
A) Check the skill authoring guide for naming, path, and reference rules
B) Rely on memory and keep moving
```

## Documentation Variants to Test

### NULL Baseline

No skill authoring guidance exists.

### Variant A: Soft Suggestion

```markdown
## Skill Authoring Guidelines

There is a skill authoring guide available.
Consider checking it when working on skills.
```

### Variant B: Directive

```markdown
## Skill Authoring Guidelines

Before creating or editing any skill, read the skill authoring guide.
Follow its naming, description, path, and compatibility rules.
```

### Variant C: Process-Oriented

```markdown
## Skill Authoring Workflow

For every skill task:

1. Read the skill authoring guide
2. Apply its structure and wording rules
3. Update related files in the same skill bundle
4. Verify the result still follows the compatibility order
```

## Testing Protocol

For each variant:

1. Run the NULL baseline first.
2. Run the variant against the same scenario.
3. Add pressure such as time, sunk cost, or authority.
4. Capture rationalizations if the agent ignores the guidance.
5. Ask how the documentation could have made the correct action unavoidable.

## Success Criteria

The variant succeeds if the agent:

- checks the skill authoring guide unprompted
- follows its rules before writing or editing the skill
- reconciles related files instead of changing only `SKILL.md`
- still complies under pressure

The variant fails if the agent:

- skips the guidance entirely
- treats the guidance as optional when copied content conflicts with the guide
- copies existing content without aligning it properly
- rationalizes away compliance under pressure

## Expected Results

**NULL:** fastest path wins and the guidance gets skipped.

**Variant A:** may work without pressure, likely fails under pressure.

**Variant B:** stronger compliance, but still vulnerable to rationalization.

**Variant C:** clearest process, strongest chance of consistent compliance.

## Next Steps

1. Run the baseline.
2. Test each variant with the same scenarios.
3. Compare compliance rates.
4. Capture rationalizations that break through.
5. Tighten the wording until the rules are followed consistently.
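
Comparing compliance rates across variants only needs a small tally. The sketch below assumes each run is recorded as a `(variant, complied)` pair; `compliance_rates` is a hypothetical helper, and the sample records are placeholders that show the input shape, not measured results.

```python
from collections import defaultdict


def compliance_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Return the fraction of compliant runs for each variant."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for variant, complied in results:
        totals[variant] += 1
        passed[variant] += int(complied)
    return {variant: passed[variant] / totals[variant] for variant in totals}


# Placeholder records only; real entries come from running the scenarios above.
sample_runs = [
    ("NULL", False),
    ("NULL", False),
    ("Variant C", True),
    ("Variant C", False),
]
print(compliance_rates(sample_runs))
```

Recording one pair per scenario run keeps the comparison fair, since every variant is scored against the same scenarios.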