@intentsolutionsio/skill-creator 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39) hide show
  1. package/.claude-plugin/plugin.json +17 -0
  2. package/README.md +55 -0
  3. package/package.json +38 -0
  4. package/scripts/validate-skill.py +1132 -0
  5. package/skills/agent-creator/SKILL.md +305 -0
  6. package/skills/agent-creator/references/anthropic-agent-spec.md +89 -0
  7. package/skills/skill-creator/SKILL.md +267 -0
  8. package/skills/skill-creator/agents/analyzer.md +279 -0
  9. package/skills/skill-creator/agents/comparator.md +207 -0
  10. package/skills/skill-creator/agents/grader.md +228 -0
  11. package/skills/skill-creator/assets/eval_review.html +146 -0
  12. package/skills/skill-creator/eval-viewer/generate_review.py +471 -0
  13. package/skills/skill-creator/eval-viewer/viewer.html +1325 -0
  14. package/skills/skill-creator/references/advanced-eval-workflow.md +320 -0
  15. package/skills/skill-creator/references/anthropic-comparison.md +93 -0
  16. package/skills/skill-creator/references/ard-template.md +47 -0
  17. package/skills/skill-creator/references/creation-guide.md +305 -0
  18. package/skills/skill-creator/references/errors-template.md +27 -0
  19. package/skills/skill-creator/references/examples-template.md +40 -0
  20. package/skills/skill-creator/references/frontmatter-spec.md +531 -0
  21. package/skills/skill-creator/references/implementation-template.md +42 -0
  22. package/skills/skill-creator/references/output-patterns.md +193 -0
  23. package/skills/skill-creator/references/prd-template.md +55 -0
  24. package/skills/skill-creator/references/schemas.md +430 -0
  25. package/skills/skill-creator/references/source-of-truth.md +658 -0
  26. package/skills/skill-creator/references/validation-rules.md +528 -0
  27. package/skills/skill-creator/references/workflows.md +233 -0
  28. package/skills/skill-creator/scripts/__init__.py +0 -0
  29. package/skills/skill-creator/scripts/aggregate_benchmark.py +401 -0
  30. package/skills/skill-creator/scripts/generate_report.py +326 -0
  31. package/skills/skill-creator/scripts/improve_description.py +247 -0
  32. package/skills/skill-creator/scripts/package_skill.py +136 -0
  33. package/skills/skill-creator/scripts/quick_validate.py +103 -0
  34. package/skills/skill-creator/scripts/run_eval.py +344 -0
  35. package/skills/skill-creator/scripts/run_loop.py +329 -0
  36. package/skills/skill-creator/scripts/utils.py +47 -0
  37. package/skills/skill-creator/scripts/validate-skill.py +87 -0
  38. package/skills/skill-creator/templates/agent-template.md +99 -0
  39. package/skills/skill-creator/templates/skill-template.md +122 -0
@@ -0,0 +1,320 @@
1
+ # Advanced Eval Workflow
2
+
3
+ Detailed instructions for empirical skill evaluation, iteration, description optimization,
4
+ blind comparison, packaging, and platform-specific adaptations.
5
+
6
+ Read this reference when using the eval infrastructure (Steps E1-E5), improving skills
7
+ through iteration, running automated description optimization, or working on Claude.ai/Cowork.
8
+
9
+ ---
10
+
11
+ ## Running and Evaluating Test Cases
12
+
13
+ This extends Steps 7-8 with concrete tooling for empirical evaluation. The core idea: run
14
+ the skill on real prompts, compare against a baseline, grade the results, and show them to
15
+ the user in an interactive viewer.
16
+
17
+ Put results in `<skill-name>-workspace/` as a sibling to the skill directory. Within the
18
+ workspace, organize by iteration (`iteration-1/`, `iteration-2/`, etc.) and within that,
19
+ each test case gets a directory (`eval-0/`, `eval-1/`, etc.). Create directories as you go.
20
+
21
+ ### Step E1: Spawn all runs (with-skill AND baseline) in the same turn
22
+
23
+ For each test case, spawn two subagents in the same turn — one with the skill, one without.
24
+ Launch everything at once so runs finish around the same time.
25
+
26
+ **With-skill run:**
27
+ ```
28
+ Execute this task:
29
+ - Skill path: <path-to-skill>
30
+ - Task: <eval prompt>
31
+ - Input files: <eval files if any, or "none">
32
+ - Save outputs to: <workspace>/iteration-<N>/eval-<ID>/with_skill/outputs/
33
+ - Outputs to save: <what the user cares about>
34
+ ```
35
+
36
+ **Baseline run** (same prompt, no skill):
37
+ - **Creating a new skill**: no skill at all. Save to `without_skill/outputs/`.
38
+ - **Improving an existing skill**: snapshot the old version first (`cp -r`), point baseline
39
+ at the snapshot. Save to `old_skill/outputs/`.
40
+
41
+ Write an `eval_metadata.json` for each test case:
42
+ ```json
43
+ {
44
+ "eval_id": 0,
45
+ "eval_name": "descriptive-name-here",
46
+ "prompt": "The user's task prompt",
47
+ "assertions": []
48
+ }
49
+ ```
50
+
51
+ ### Step E2: While runs are in progress, draft assertions
52
+
53
+ Don't wait idle. Draft quantitative assertions for each test case and explain them to the
54
+ user. Good assertions are objectively verifiable and have descriptive names. Subjective
55
+ skills (writing style, design) are better evaluated qualitatively — don't force assertions
56
+ onto things that need human judgment.
57
+
58
+ Update `eval_metadata.json` files and `evals/evals.json` with the assertions. See
59
+ `${CLAUDE_SKILL_DIR}/references/schemas.md` for the full schema.
60
+
61
+ ### Step E3: As runs complete, capture timing data
62
+
63
+ When each subagent task completes, the notification contains `total_tokens` and
64
+ `duration_ms`. Save immediately to `timing.json` — this is the only opportunity to
65
+ capture this data:
66
+ ```json
67
+ {
68
+ "total_tokens": 84852,
69
+ "duration_ms": 23332,
70
+ "total_duration_seconds": 23.3
71
+ }
72
+ ```
73
+
74
+ ### Step E4: Grade, aggregate, and launch the viewer
75
+
76
+ Once all runs are done:
77
+
78
+ 1. **Grade each run** — spawn a grader subagent that reads
79
+ `${CLAUDE_SKILL_DIR}/agents/grader.md` and evaluates each assertion against the outputs.
80
+ Save results to `grading.json` in each run directory. The grading.json expectations array
81
+ must use fields `text`, `passed`, and `evidence` — the viewer depends on these exact field
82
+ names. For programmatically checkable assertions, write and run a script rather than
83
+ eyeballing it.
84
+
85
+ 2. **Aggregate into benchmark**:
86
+ ```bash
87
+ python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
88
+ ```
89
+ This produces `benchmark.json` and `benchmark.md` with pass_rate, time, and tokens for
90
+ each configuration, with mean +/- stddev and the delta. If generating benchmark.json
91
+ manually, see `${CLAUDE_SKILL_DIR}/references/schemas.md` for the exact schema the
92
+ viewer expects.
93
+
94
+ 3. **Analyst pass** — read the benchmark data and surface patterns. See
95
+ `${CLAUDE_SKILL_DIR}/agents/analyzer.md` (the "Analyzing Benchmark Results" section) for
96
+ what to look for — non-discriminating assertions, high-variance evals, time/token tradeoffs.
97
+
98
+ 4. **Launch the viewer**:
99
+ ```bash
100
+ nohup python ${CLAUDE_SKILL_DIR}/eval-viewer/generate_review.py \
101
+ <workspace>/iteration-N \
102
+ --skill-name "my-skill" \
103
+ --benchmark <workspace>/iteration-N/benchmark.json \
104
+ > /dev/null 2>&1 &
105
+ VIEWER_PID=$!
106
+ ```
107
+ For iteration 2+, also pass `--previous-workspace <workspace>/iteration-<N-1>`.
108
+
109
+ **Headless/Cowork:** Use `--static <output_path>` to write standalone HTML instead of
110
+ starting a server.
111
+
112
+ 5. **Tell the user** the viewer is open. They'll see an "Outputs" tab (per-test-case
113
+ feedback) and a "Benchmark" tab (quantitative comparison).
114
+
115
+ ### What the user sees in the viewer
116
+
117
+ The "Outputs" tab shows one test case at a time:
118
+ - **Prompt**: the task that was given
119
+ - **Output**: the files the skill produced, rendered inline where possible
120
+ - **Previous Output** (iteration 2+): collapsed section showing last iteration's output
121
+ - **Formal Grades** (if grading was run): collapsed section showing assertion pass/fail
122
+ - **Feedback**: a textbox that auto-saves as they type
123
+ - **Previous Feedback** (iteration 2+): their comments from last time
124
+
125
+ The "Benchmark" tab shows the stats summary: pass rates, timing, and token usage for each
126
+ configuration, with per-eval breakdowns and analyst observations.
127
+
128
+ Navigation is via prev/next buttons or arrow keys. When done, "Submit All Reviews" saves
129
+ all feedback to `feedback.json`.
130
+
131
+ ### Step E5: Read the feedback
132
+
133
+ When the user is done, read `feedback.json`:
134
+ ```json
135
+ {
136
+ "reviews": [
137
+ {"run_id": "eval-0-with_skill", "feedback": "the chart is missing axis labels", "timestamp": "..."},
138
+ {"run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..."}
139
+ ],
140
+ "status": "complete"
141
+ }
142
+ ```
143
+ Empty feedback means the user thought it was fine. Focus improvements on test cases with
144
+ specific complaints. Kill the viewer server when done: `kill $VIEWER_PID 2>/dev/null`.
145
+
146
+ ---
147
+
148
+ ## Improving the Skill
149
+
150
+ After running test cases and collecting feedback, improve the skill based on what you learned.
151
+
152
+ ### Key principles
153
+
154
+ 1. **Generalize from feedback.** Skills may be invoked millions of times across many prompts.
155
+ Don't overfit to the few test examples — generalize from failures to broader categories of
156
+ user intent. Rather than fiddly overfitty changes or oppressive MUSTs, try branching out
157
+ with different metaphors or patterns. It's cheap to experiment.
158
+
159
+ 2. **Keep the prompt lean.** Remove what isn't pulling its weight. Read transcripts, not just
160
+ outputs — if the skill makes the model waste time on unproductive steps, remove those
161
+ instructions.
162
+
163
+ 3. **Explain the why.** Try to explain *why* things matter rather than just prescribing rules.
164
+ Today's LLMs are smart — they have good theory of mind and when given a good harness can
165
+ go beyond rote instructions. If you find yourself writing ALWAYS or NEVER in all caps,
166
+ reframe and explain the reasoning. That's more humane, powerful, and effective.
167
+
168
+ 4. **Look for repeated work.** If all test runs independently wrote similar helper scripts,
169
+ that's a signal the skill should bundle that script in `scripts/`. Write it once and tell
170
+ the skill to use it.
171
+
172
+ ### The iteration loop
173
+
174
+ After improving the skill:
175
+ 1. Apply improvements to the skill
176
+ 2. Rerun all test cases into a new `iteration-<N+1>/` directory, including baselines
177
+ 3. Launch the reviewer with `--previous-workspace` pointing at the previous iteration
178
+ 4. Wait for the user to review and tell you they're done
179
+ 5. Read new feedback, improve again, repeat
180
+
181
+ Keep going until:
182
+ - The user says they're happy
183
+ - The feedback is all empty (everything looks good)
184
+ - You're not making meaningful progress
185
+
186
+ ---
187
+
188
+ ## Description Optimization (Automated)
189
+
190
+ The description field is the primary mechanism determining whether Claude invokes a skill.
191
+ After creating or improving a skill, offer to optimize the description for better triggering
192
+ accuracy.
193
+
194
+ ### Step D1: Generate trigger eval queries
195
+
196
+ Create 20 eval queries — a mix of should-trigger and should-not-trigger. Save as JSON:
197
+ ```json
198
+ [
199
+ {"query": "the user prompt", "should_trigger": true},
200
+ {"query": "another prompt", "should_trigger": false}
201
+ ]
202
+ ```
203
+
204
+ Queries must be realistic — concrete, specific, with details like file paths, personal
205
+ context, column names. Include casual speech, typos, abbreviations. Focus on edge cases
206
+ rather than clear-cut examples.
207
+
208
+ For **should-trigger** (8-10): different phrasings of the same intent, cases where the user
209
+ doesn't name the skill but clearly needs it, uncommon use cases, cases where this skill
210
+ competes with another but should win.
211
+
212
+ For **should-not-trigger** (8-10): near-misses — queries sharing keywords but needing
213
+ something different. Adjacent domains, ambiguous phrasing where naive keyword match would
214
+ trigger but shouldn't. Don't make negatives obviously irrelevant.
215
+
216
+ Bad: `"Format this data"`, `"Extract text from PDF"`, `"Create a chart"`
217
+ Good: `"ok so my boss just sent me this xlsx file (its in my downloads, called something like 'Q4 sales final FINAL v2.xlsx') and she wants me to add a column that shows the profit margin"`
218
+
219
+ ### Step D2: Review with user
220
+
221
+ Present the eval set using the HTML template:
222
+ 1. Read `${CLAUDE_SKILL_DIR}/assets/eval_review.html`
223
+ 2. Replace placeholders: `__EVAL_DATA_PLACEHOLDER__` (JSON array, no quotes — it's a JS
224
+ variable assignment), `__SKILL_NAME_PLACEHOLDER__`, `__SKILL_DESCRIPTION_PLACEHOLDER__`
225
+ 3. Write to `/tmp/eval_review_<skill-name>.html` and open it
226
+ 4. User edits queries, toggles triggers, clicks "Export Eval Set"
227
+ 5. Read the exported file from `~/Downloads/eval_set.json` (check for `eval_set (1).json` etc.)
228
+
229
+ ### Step D3: Run the optimization loop
230
+
231
+ Tell the user this will take some time. Save the eval set to the workspace, then run:
232
+
233
+ ```bash
234
+ python -m scripts.run_loop \
235
+ --eval-set <path-to-trigger-eval.json> \
236
+ --skill-path <path-to-skill> \
237
+ --model <model-id-powering-this-session> \
238
+ --max-iterations 5 \
239
+ --verbose
240
+ ```
241
+
242
+ Use the model ID from your system prompt so the triggering test matches what the user
243
+ actually experiences.
244
+
245
+ This splits 60% train / 40% test, evaluates the current description (3 runs per query for
246
+ reliability), proposes improvements based on failures, and iterates up to 5 times. Returns
247
+ JSON with `best_description` selected by test score to avoid overfitting.
248
+
249
+ ### How skill triggering works
250
+
251
+ Skills appear in Claude's `available_skills` list with their name + description. Claude
252
+ decides whether to consult a skill based on that description. Important: Claude only consults
253
+ skills for tasks it can't easily handle alone — simple one-step queries may not trigger even
254
+ if the description matches. Complex, multi-step, or specialized queries reliably trigger when
255
+ the description matches.
256
+
257
+ Eval queries should be substantive enough that Claude would benefit from consulting a skill.
258
+
259
+ ### Step D4: Apply the result
260
+
261
+ Take `best_description` from the JSON output and update the skill's SKILL.md frontmatter.
262
+ Show the user before/after and report the scores.
263
+
264
+ ---
265
+
266
+ ## Advanced: Blind Comparison
267
+
268
+ For rigorous comparison between two skill versions, use the blind comparison system. Read
269
+ `${CLAUDE_SKILL_DIR}/agents/comparator.md` and `${CLAUDE_SKILL_DIR}/agents/analyzer.md` for
270
+ details.
271
+
272
+ The basic idea: give two outputs to an independent agent without telling it which is which,
273
+ and let it judge quality. Then analyze why the winner won.
274
+
275
+ This is optional, requires subagents, and most users won't need it. The human review loop
276
+ is usually sufficient.
277
+
278
+ ---
279
+
280
+ ## Packaging
281
+
282
+ Package a skill into a distributable `.skill` file:
283
+
284
+ ```bash
285
+ python -m scripts.package_skill <path/to/skill-folder> [output-directory]
286
+ ```
287
+
288
+ This validates the skill first, then creates a zip archive excluding `__pycache__`,
289
+ `node_modules`, `evals/`, and `.DS_Store`.
290
+
291
+ Only package if `present_files` tool is available or the user requests it. After packaging,
292
+ direct the user to the resulting `.skill` file path so they can install it.
293
+
294
+ ---
295
+
296
+ ## Platform-Specific Notes
297
+
298
+ ### Claude.ai
299
+
300
+ The core workflow (draft -> test -> review -> improve) is the same, but without subagents:
301
+ - **Test cases**: Run them yourself one at a time. Skip baseline runs.
302
+ - **Review**: Present results directly in conversation. Save output files and tell the user
303
+ where they are.
304
+ - **Benchmarking**: Skip quantitative benchmarking — it relies on baselines.
305
+ - **Description optimization**: Requires `claude` CLI — skip on Claude.ai.
306
+ - **Packaging**: `package_skill.py` works anywhere with Python.
307
+ - **Updating existing skills**: Preserve the original name. Copy to `/tmp/` before editing
308
+ (installed path may be read-only). If packaging manually, stage in `/tmp/` first.
309
+
310
+ ### Cowork
311
+
312
+ - You have subagents — the full workflow works, though you may need to run test prompts in
313
+ series if timeouts occur.
314
+ - No browser: use `--static <output_path>` for the eval viewer and share the HTML file link.
315
+ - Always generate the eval viewer BEFORE evaluating inputs yourself — get results in front
316
+ of the human ASAP.
317
+ - Feedback: "Submit All Reviews" downloads `feedback.json` as a file.
318
+ - Description optimization (`run_loop.py`) works since it uses `claude -p` via subprocess.
319
+ - Save description optimization until the skill is fully finished and the user agrees it's
320
+ in good shape.
@@ -0,0 +1,93 @@
1
+ # Gap Analysis: Our Skill Creator vs Anthropic Standards
2
+
3
+ Sources: [AgentSkills.io spec](https://agentskills.io/specification) · [Anthropic docs](https://code.claude.com/docs/en/skills) · [anthropics/skills repo](https://github.com/anthropics/skills)
4
+
5
+ Comparison of our skill-creator implementation against:
6
+ - AgentSkills.io specification (canonical open standard)
7
+ - Anthropic best practices (platform.claude.com)
8
+ - anthropics/skills official skill-creator
9
+ - Claude Code runtime extensions
10
+
11
+ ---
12
+
13
+ ## What We Over-Specify (Relaxed or Removed)
14
+
15
+ | Issue | Our Requirement | Anthropic Standard | Resolution |
16
+ |-------|----------------|-------------------|------------|
17
+ | `version` as required frontmatter | Top-level required field | Not in spec; use `metadata.version` | Kept as top-level field (marketplace validator scores at top-level); AgentSkills.io spec also allows under metadata |
18
+ | `author` as required frontmatter | Top-level required field | Not in spec; use `metadata.author` | Kept as top-level field (marketplace validator scores at top-level); AgentSkills.io spec also allows under metadata |
19
+ | `license` as required | Required in Enterprise tier | Optional in AgentSkills.io spec | Optional (Enterprise recommends) |
20
+ | `tags` field | Listed as optional | Not in official spec at all | Kept as top-level field (used by marketplace and discovery) |
21
+ | Mandatory "Use when" phrase | Error if missing | Natural language, no exact phrase | Recommend but don't enforce exact wording |
22
+ | Mandatory "Trigger with" phrase | Error if missing | Not an Anthropic requirement | Removed as requirement |
23
+ | 8 mandatory body sections | Error if any missing | "No format restrictions" | Recommended sections, not mandatory |
24
+ | Unscoped Bash as error | Hard error | Experimental feature (allowed-tools) | Warning in Standard tier, error in Enterprise |
25
+ | Hardcoded model ID | `claude-opus-4-5-20251101` | Use `inherit` or omit | Default to `inherit` |
26
+
27
+ ---
28
+
29
+ ## What We're Missing (Added)
30
+
31
+ | Feature | Source | Priority | Status |
32
+ |---------|--------|----------|--------|
33
+ | `compatibility` field | AgentSkills.io spec | Medium | Added to frontmatter spec |
34
+ | `argument-hint` field | Claude Code extension | Medium | Added to frontmatter spec |
35
+ | `user-invocable` field | Claude Code extension | Medium | Added to frontmatter spec |
36
+ | `context: fork` field | Claude Code extension | High | Added to frontmatter spec |
37
+ | `agent` field (subagent type) | Claude Code extension | High | Added to frontmatter spec |
38
+ | `hooks` field (skill-scoped) | Claude Code extension | Medium | Added to frontmatter spec |
39
+ | String substitutions | Claude Code runtime | High | Added to source-of-truth |
40
+ | Dynamic context injection (`` !`cmd` ``) | Claude Code runtime | Medium | Added to source-of-truth |
41
+ | Degrees of freedom concept | Anthropic engineering blog | High | Added to SKILL.md instructions |
42
+ | Evaluation-driven development | Anthropic best practices | High | Added to SKILL.md instructions |
43
+ | Claude A/B iterative methodology | Anthropic best practices | Medium | Referenced in evaluation section |
44
+ | Gerund naming recommendation | Anthropic best practices | Low | Added to naming guidance |
45
+ | MCP tool references | Claude Code runtime | Medium | Added to allowed-tools docs |
46
+ | Plan-validate-execute pattern | Anthropic best practices | Medium | Added to workflows.md |
47
+ | Visual output generation pattern | Anthropic best practices | Low | Added to output-patterns.md |
48
+ | Token budget awareness | Claude Code runtime | High | Added to source-of-truth |
49
+ | `SLASH_COMMAND_TOOL_CHAR_BUDGET` | Claude Code internals | Low | Documented in token budget |
50
+ | Anthropic official validation checklist | platform.claude.com | High | Integrated into validation |
51
+ | `metadata` as home for author/version | AgentSkills.io spec | High | Primary recommendation |
52
+
53
+ ---
54
+
55
+ ## What We Do Well (Kept)
56
+
57
+ | Strength | Value | Kept In |
58
+ |----------|-------|---------|
59
+ | Interactive wizard | Great UX for skill creation | SKILL.md Step 1 (streamlined) |
60
+ | Validation script | Automated quality assurance | validate-skill.py (rewritten) |
61
+ | Error handling table format | Clear, scannable error docs | Templates and examples |
62
+ | Quality grading system | Enterprise accountability | Enterprise tier (default) |
63
+ | Template with placeholders | Fast skill scaffolding | skill-template.md (updated) |
64
+ | Multiple validation tiers | Flexible strictness | Standard + Enterprise tiers |
65
+ | Resource existence checking | Catches broken references | validate-skill.py |
66
+ | `{baseDir}` path convention | Portable, no absolute paths | All templates and docs |
67
+ | Scoped Bash enforcement | Security best practice | Enterprise tier default |
68
+
69
+ ---
70
+
71
+ ## Migration Summary
72
+
73
+ ### For Existing Skills
74
+
75
+ Existing skills that pass our old validator will mostly pass the new one because:
76
+ - Enterprise tier (default) still checks `metadata.author`, `metadata.version`, scoped tools
77
+ - The body section checks are warnings, not errors
78
+ - "Use when" / "Trigger with" are now recommended patterns, not hard requirements
79
+
80
+ ### Breaking Changes
81
+
82
+ 1. `version`, `author`, and `tags` are top-level fields (marketplace validator scores them here; AgentSkills.io spec also allows under `metadata`)
83
+ 2. Hardcoded model IDs trigger a warning (use `inherit` or short names)
84
+
85
+ ### New Capabilities
86
+
87
+ Skills can now use:
88
+ - `$ARGUMENTS` for dynamic input
89
+ - `context: fork` for subagent execution
90
+ - `hooks` for lifecycle automation
91
+ - `compatibility` for environment requirements
92
+ - `argument-hint` for better autocomplete UX
93
+ - `effort` for model reasoning override (v2.1.80)
@@ -0,0 +1,47 @@
1
+ # ARD Template
2
+
3
+ Standard Architecture Requirements Document for all marketplace skills. Every ARD.md MUST follow this exact structure. No sections added, no sections removed.
4
+
5
+ ---
6
+
7
+ ```markdown
8
+ # ARD: {Skill Name}
9
+
10
+ > Part of [Tons of Skills](https://tonsofskills.com) by [Intent Solutions](https://intentsolutions.io) | [jeremylongshore.com](https://jeremylongshore.com)
11
+
12
+ ## System Context
13
+
14
+ {Where does this skill fit? What systems/services does it interact with? Include a simple text diagram if helpful.}
15
+
16
+ ## Data Flow
17
+
18
+ 1. **Input**: {What the skill receives — user request, files, context}
19
+ 2. **Processing**: {What happens — steps, transformations, API calls}
20
+ 3. **Output**: {What the skill produces — files, reports, deployments}
21
+
22
+ ## Key Design Decisions
23
+
24
+ | Decision | Choice | Rationale |
25
+ |----------|--------|-----------|
26
+ | {What was decided} | {What was chosen} | {Why this over alternatives} |
27
+ | {What was decided} | {What was chosen} | {Why} |
28
+
29
+ ## Tool Usage Pattern
30
+
31
+ | Tool | Purpose |
32
+ |------|---------|
33
+ | {tool from allowed-tools} | {Why this skill uses it} |
34
+ | {tool} | {Purpose} |
35
+
36
+ ## Error Handling Strategy
37
+
38
+ | Error Class | Detection | Recovery |
39
+ |------------|-----------|----------|
40
+ | {Category of error} | {How it's detected} | {What the skill does about it} |
41
+ | {Category} | {Detection} | {Recovery} |
42
+
43
+ ## Extension Points
44
+
45
+ - {How users can customize or extend this skill's behavior}
46
+ - {Integration points with other skills or tools}
47
+ ```