@intentsolutionsio/skill-creator 5.0.0

Files changed (39)
  1. package/.claude-plugin/plugin.json +17 -0
  2. package/README.md +55 -0
  3. package/package.json +38 -0
  4. package/scripts/validate-skill.py +1132 -0
  5. package/skills/agent-creator/SKILL.md +305 -0
  6. package/skills/agent-creator/references/anthropic-agent-spec.md +89 -0
  7. package/skills/skill-creator/SKILL.md +267 -0
  8. package/skills/skill-creator/agents/analyzer.md +279 -0
  9. package/skills/skill-creator/agents/comparator.md +207 -0
  10. package/skills/skill-creator/agents/grader.md +228 -0
  11. package/skills/skill-creator/assets/eval_review.html +146 -0
  12. package/skills/skill-creator/eval-viewer/generate_review.py +471 -0
  13. package/skills/skill-creator/eval-viewer/viewer.html +1325 -0
  14. package/skills/skill-creator/references/advanced-eval-workflow.md +320 -0
  15. package/skills/skill-creator/references/anthropic-comparison.md +93 -0
  16. package/skills/skill-creator/references/ard-template.md +47 -0
  17. package/skills/skill-creator/references/creation-guide.md +305 -0
  18. package/skills/skill-creator/references/errors-template.md +27 -0
  19. package/skills/skill-creator/references/examples-template.md +40 -0
  20. package/skills/skill-creator/references/frontmatter-spec.md +531 -0
  21. package/skills/skill-creator/references/implementation-template.md +42 -0
  22. package/skills/skill-creator/references/output-patterns.md +193 -0
  23. package/skills/skill-creator/references/prd-template.md +55 -0
  24. package/skills/skill-creator/references/schemas.md +430 -0
  25. package/skills/skill-creator/references/source-of-truth.md +658 -0
  26. package/skills/skill-creator/references/validation-rules.md +528 -0
  27. package/skills/skill-creator/references/workflows.md +233 -0
  28. package/skills/skill-creator/scripts/__init__.py +0 -0
  29. package/skills/skill-creator/scripts/aggregate_benchmark.py +401 -0
  30. package/skills/skill-creator/scripts/generate_report.py +326 -0
  31. package/skills/skill-creator/scripts/improve_description.py +247 -0
  32. package/skills/skill-creator/scripts/package_skill.py +136 -0
  33. package/skills/skill-creator/scripts/quick_validate.py +103 -0
  34. package/skills/skill-creator/scripts/run_eval.py +344 -0
  35. package/skills/skill-creator/scripts/run_loop.py +329 -0
  36. package/skills/skill-creator/scripts/utils.py +47 -0
  37. package/skills/skill-creator/scripts/validate-skill.py +87 -0
  38. package/skills/skill-creator/templates/agent-template.md +99 -0
  39. package/skills/skill-creator/templates/skill-template.md +122 -0
@@ -0,0 +1,305 @@
1
+ # Skill Creation Guide — Detailed Steps
2
+
3
+ This reference covers the detailed implementation steps for creating production-grade skills.
4
+ Referenced from Steps 4-10 of the main SKILL.md.
5
+
6
+ ## Step 4: Write SKILL.md
7
+
8
+ Generate the SKILL.md using the template from `${CLAUDE_SKILL_DIR}/templates/skill-template.md`.
9
+
10
+ **Frontmatter rules** (see `${CLAUDE_SKILL_DIR}/references/frontmatter-spec.md`):
11
+
12
+ Required fields:
13
+ ```yaml
14
+ name: {skill-name} # Must match directory name
15
+ description: | # Third person, what + when + keywords
16
+   {What it does}. Use when {scenario}.
17
+   Trigger with "/{skill-name}" or "{natural phrase}".
18
+ ```
19
+
20
+ **Frontmatter constraints (Anthropic spec):**
21
+ - `name`: No XML tags (`<`, `>` characters prohibited). No reserved words (`anthropic`, `claude`) in isolation.
22
+ - `description`: No XML tags. Description is injected into Claude's system prompt — third person prevents discovery issues where Claude speaks as the skill author.
23
+
24
+ Identity fields (top-level — marketplace validator scores these here):
25
+ ```yaml
26
+ version: 1.0.0
27
+ author: {name} <{email}>
28
+ license: MIT
29
+ ```
30
+
31
+ **IMPORTANT**: `version`, `author`, `license`, `tags`, and `compatible-with` are TOP-LEVEL fields.
32
+ Do NOT nest them under `metadata:`. The marketplace 100-point validator checks them at top-level.
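The top-level rule above can be sketched as a small check. This is a minimal illustration, assuming the frontmatter has already been parsed into a Python dict (for example by a YAML parser). The function name `find_nested_identity_fields` is hypothetical and is not taken from the packaged `validate-skill.py`.

```python
# Hypothetical helper, not part of the packaged validator.
# Field names are the ones listed in the IMPORTANT note above.
TOP_LEVEL_FIELDS = ("version", "author", "license", "tags", "compatible-with")


def find_nested_identity_fields(frontmatter: dict) -> list[str]:
    """Return identity fields found under `metadata:` but missing at top level."""
    metadata = frontmatter.get("metadata")
    if not isinstance(metadata, dict):
        return []
    return [
        field
        for field in TOP_LEVEL_FIELDS
        if field in metadata and field not in frontmatter
    ]


if __name__ == "__main__":
    nested = {"name": "demo", "metadata": {"version": "1.0.0", "license": "MIT"}}
    for field in find_nested_identity_fields(nested):
        print(f"move `{field}` from metadata: to top level")
```

A non-empty result corresponds to the "Move author/version to top-level" fix listed under Step 6.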
33
+
34
+ Recommended fields:
35
+ ```yaml
36
+ allowed-tools: "{scoped tools}"
37
+ model: inherit
38
+ ```
39
+
40
+ Optional Claude Code extensions:
41
+ ```yaml
42
+ argument-hint: "[arg]" # If accepts $ARGUMENTS
43
+ context: fork # If needs isolated execution
44
+ agent: general-purpose # Subagent type (with context: fork)
45
+ disable-model-invocation: true # If explicit /name only (no auto-activation)
46
+ user-invocable: false # If background knowledge only
47
+ compatibility: "Python 3.10+" # If environment-specific
48
+ compatible-with: claude-code, codex # Platforms this works on
49
+ tags: [devops, ci] # Discovery tags
50
+ ```
51
+
52
+ **Description writing — maximize discoverability scoring:**
53
+
54
+ Descriptions determine activation AND marketplace grade. "Use when"/"Trigger with" scoring is enterprise-tier only (marketplace grading). Standard tier does not penalize for missing these patterns. However, they remain best practices for discoverability regardless of tier.
55
+
56
+ ```yaml
57
+ # Good - scores +6 pts on enterprise marketplace grading
58
+ description: |
59
+   Analyze Python code for security vulnerabilities. Use when reviewing code
60
+   before deployment. Trigger with "/security-scan" or "scan for vulnerabilities".
61
+
62
+ # Acceptable at standard tier, but loses 6 pts at enterprise tier
63
+ description: |
64
+   Analyzes code for security issues.
65
+ ```
66
+
67
+ Pattern (enterprise): "Use when [scenario]" (+3 pts) + "Trigger with [phrases]" (+3 pts) + "Make sure to use whenever..." for aggressive claiming.
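As a rough illustration of the pattern scoring above, the sketch below awards 3 points for each of the two phrases. The point values come from the text. The case-insensitive substring match and the function name `description_pattern_points` are assumptions, and the real validator may match differently.

```python
# Hypothetical sketch: point values from the guide, matching logic assumed.
def description_pattern_points(description: str) -> int:
    """Score the 'Use when' and 'Trigger with' patterns (3 points each)."""
    text = description.lower()
    points = 0
    if "use when" in text:
        points += 3
    if "trigger with" in text:
        points += 3
    return points


if __name__ == "__main__":
    good = (
        "Analyze Python code for security vulnerabilities. Use when reviewing code "
        'before deployment. Trigger with "/security-scan" or "scan for vulnerabilities".'
    )
    print(description_pattern_points(good))
    print(description_pattern_points("Analyzes code for security issues."))
```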
68
+
69
+ **Token budget awareness:** All installed skill descriptions load at startup (~100 tokens each). The total skill list is capped at ~15,000 characters (`SLASH_COMMAND_TOOL_CHAR_BUDGET`). Keep descriptions impactful but efficient.
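To make the budget concrete, here is a minimal sketch that sums description lengths and compares them with the ~15,000-character cap quoted above. How the actual budget counts names, separators, or other overhead is not stated in this guide, so counting only description text is an assumption. The helper names are hypothetical.

```python
# Approximate cap quoted in the guide; the exact accounting is an assumption.
CHAR_BUDGET = 15_000


def total_description_chars(descriptions: dict[str, str]) -> int:
    """Sum the character length of every skill description."""
    return sum(len(text) for text in descriptions.values())


def over_budget(descriptions: dict[str, str], budget: int = CHAR_BUDGET) -> bool:
    """Return True when the combined descriptions exceed the budget."""
    return total_description_chars(descriptions) > budget


if __name__ == "__main__":
    installed = {
        "security-scan": "Analyze Python code for security vulnerabilities.",
        "skill-creator": "Create and validate skills.",
    }
    print(total_description_chars(installed), over_budget(installed))
```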
70
+
71
+ **Body content guidelines — section recommendations:**
72
+
73
+ Anthropic's spec places no format restrictions on body content. The sections below are enterprise-tier quality recommendations scored by the Intent Solutions marketplace rubric. At standard tier, these are not required but are still good practice:
74
+ ```
75
+ ## Overview (>50 chars content: +4 pts enterprise)
76
+ ## Prerequisites (+2 pts enterprise)
77
+ ## Instructions (numbered steps: +3 pts enterprise)
78
+ ## Output (+2 pts enterprise)
79
+ ## Error Handling (+2 pts enterprise)
80
+ ## Examples (+2 pts enterprise)
81
+ ## Resources (+1 pt enterprise)
82
+ 5+ sections total: +2 pts bonus (enterprise)
83
+ ```
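The section points listed above can be tallied with a short script. This is a sketch under stated assumptions, not the marketplace validator: it treats every `## ` line as a section heading (including any that appear inside fenced code), counts all such headings toward the "5+ sections" bonus, and detects numbered steps with a simple regular expression. The names `split_sections` and `section_points` are hypothetical.

```python
import re

# Point values copied from the list above; the parsing rules are assumptions.
SECTION_POINTS = {
    "Prerequisites": 2,
    "Output": 2,
    "Error Handling": 2,
    "Examples": 2,
    "Resources": 1,
}


def split_sections(body: str) -> dict[str, str]:
    """Map each `## Heading` to the text that follows it."""
    sections: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def section_points(body: str) -> int:
    """Estimate the enterprise-tier section score for a SKILL.md body."""
    sections = split_sections(body)
    points = sum(pts for name, pts in SECTION_POINTS.items() if name in sections)
    # Overview earns points only when it has more than 50 characters of content.
    if len(sections.get("Overview", "").strip()) > 50:
        points += 4
    # Instructions earns points only when it contains numbered steps.
    if re.search(r"^\s*\d+\.\s", sections.get("Instructions", ""), re.MULTILINE):
        points += 3
    # Bonus for five or more sections in total.
    if len(sections) >= 5:
        points += 2
    return points
```

A body that has all seven sections, a long enough Overview, and numbered Instructions would total 18 points under these assumptions.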
84
+
85
+ Additional guidelines:
86
+ - Keep under 500 lines (offload to `references/` if longer)
87
+ - Concise — Claude is smart, don't over-explain
88
+ - Concrete examples over abstract descriptions
89
+ - Reference supporting files with relative markdown links: `[details](reference.md)` or `[API](references/api.md)` — Claude reads these on demand
90
+ - Use `${CLAUDE_SKILL_DIR}/` in DCI/bash contexts only: exclamation + backtick-wrapped command, e.g. `cat ${CLAUDE_SKILL_DIR}/references/config.md`
91
+ - Sections >20 lines (Output, Error Handling, Examples) → offload to `references/` with relative links
92
+ - If skill has 3+ distinct user operations → split into individual `commands/*.md` files
93
+ - Add DCI for common discovery: file existence checks, git status, tool version detection
94
+ - Include edge cases that actually matter
95
+ - No time-sensitive information — use an 'old patterns' section for deprecated approaches that users may encounter
96
+ - Consistent terminology throughout — pick one term per concept and use it everywhere
97
+ - Include feedback loops for quality-critical workflows (run validator -> fix -> repeat until passing)
98
+ - No TOC in SKILL.md body (wastes tokens). For reference files >100 lines, include a TOC at the top
99
+ - Checklist workflow pattern: for complex multi-step processes, include a copy-pasteable checklist
100
+ - **No surprise behavior**: Skills must not contain malware, exploit code, or content that could compromise security. A skill's behavior should not surprise the user if described honestly
101
+
102
+ **String substitutions available:**
103
+ - `$ARGUMENTS` / `$0`, `$1` - user-provided arguments (pair with `argument-hint` frontmatter)
104
+ - `${CLAUDE_SESSION_ID}` - current session ID
105
+ - `` !`command` `` syntax — dynamic context injection (Anthropic spec feature):
106
+ - Runs shell command at skill activation time, injects stdout into body
107
+ - **Use for**: always-needed, small references (<5KB), e.g., `` !`cat ${CLAUDE_SKILL_DIR}/references/config.md` ``
108
+ - **Don't use for**: large references (>5KB), conditional content, or anything that varies by mode
109
+ - Conditional or large references → keep manual `Load ${CLAUDE_SKILL_DIR}/references/...` instructions
110
+
111
+ ## Step 5: Create Supporting Files
112
+
113
+ **Scripts** (`scripts/`):
114
+ - Scripts should solve problems, not punt to Claude
115
+ - Explicit error handling
116
+ - No voodoo constants (document all magic values)
117
+ - List required packages
118
+ - Make executable: `chmod +x scripts/*.py`
119
+
120
+ **References** (`references/`):
121
+ - Heavy documentation that doesn't need to load at activation
122
+ - Use clear section headers for navigability
123
+ - For reference files >100 lines, include a TOC at the top so Claude can see full scope even with partial reads
124
+ - One-level-deep references only (no `references/sub/dir/`)
125
+
126
+ **Templates** (`templates/`):
127
+ - Boilerplate files used for generation
128
+ - Use clear placeholder syntax (`{{PLACEHOLDER}}`)
129
+
130
+ **Assets** (`assets/`):
131
+ - Static resources (images, configs, data files)
132
+
133
+ ## Step 6: Validate
134
+
135
+ Run validation (see `${CLAUDE_SKILL_DIR}/references/validation-rules.md`):
136
+
137
+ ```bash
138
+ python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py {skill-dir}/SKILL.md
139
+ python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {skill-dir}/SKILL.md
140
+ ```
141
+
142
+ Standard tier is the default (no required fields, broad compatibility). Use `--enterprise` for full 100-point marketplace grading.
143
+
144
+ **Validation checks:**
145
+ - Frontmatter: required fields, types, constraints
146
+ - Description: third person, what + when, keywords, length
147
+ - Body: under 500 lines, no absolute paths, has instructions + examples
148
+ - Tools: valid names, scoped Bash
149
+ - Resources: all `${CLAUDE_SKILL_DIR}/` references exist
150
+ - Anti-patterns: Windows paths, nested refs, hardcoded model IDs
151
+ - Progressive disclosure: appropriate use of references/
152
+
153
+ **If validation fails:** fix issues and re-run. Common fixes:
154
+ - Scope Bash tools: `Bash(git:*)` not `Bash`
155
+ - Remove absolute paths, use `${CLAUDE_SKILL_DIR}/`
156
+ - Split long SKILL.md into references
157
+ - Add missing sections (Overview, Prerequisites, Output)
158
+ - Move author/version to top-level if nested in metadata
159
+
160
+ ## Step 7: Test & Evaluate
161
+
162
+ Create `evals/evals.json` with minimum 3 scenarios: happy path, edge case, negative test.
163
+
164
+ ```json
165
+ [
166
+ {"name": "basic_usage", "prompt": "Trigger prompt", "assertions": ["Expected behavior"]},
167
+ {"name": "edge_case", "prompt": "Edge case prompt", "assertions": ["Expected handling"]},
168
+ {"name": "negative_test", "prompt": "Should NOT trigger", "assertions": ["Skill inactive"]}
169
+ ]
170
+ ```
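Before running evals, it can help to confirm that the file has the shape shown above. The following is a minimal sketch, assuming the three keys in the example (`name`, `prompt`, `assertions`) are the ones every scenario needs. The function name `check_evals` is hypothetical.

```python
import json

# Keys taken from the example above; treating all three as required is an assumption.
REQUIRED_KEYS = {"name", "prompt", "assertions"}


def check_evals(raw_json: str, minimum: int = 3) -> list[str]:
    """Return a list of problems; an empty list means the structure looks fine."""
    scenarios = json.loads(raw_json)
    if not isinstance(scenarios, list):
        return ["top level must be a list of scenarios"]
    problems = []
    if len(scenarios) < minimum:
        problems.append(f"expected at least {minimum} scenarios, found {len(scenarios)}")
    for index, scenario in enumerate(scenarios):
        if not isinstance(scenario, dict):
            problems.append(f"scenario {index} is not an object")
            continue
        missing = REQUIRED_KEYS - scenario.keys()
        if missing:
            problems.append(f"scenario {index} is missing: {', '.join(sorted(missing))}")
    return problems


if __name__ == "__main__":
    sample = '[{"name": "basic_usage", "prompt": "Trigger prompt"}]'
    for problem in check_evals(sample):
        print(problem)
```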
171
+
172
+ Run parallel evaluation: Claude A with skill installed vs Claude B without. Compare outputs against assertions — the skill should produce meaningfully better results for its target use cases.
173
+
174
+ **Additional testing practices:**
175
+ - **Team feedback**: If applicable, share the skill with teammates and observe usage patterns
176
+ - **Observe Claude navigation**: Watch how Claude reads and navigates the skill — look for unexpected exploration paths, missed references, or overreliance on certain sections
177
+
178
+ ## Step 8: Iterate
179
+
180
+ 1. Review which assertions passed/failed
181
+ 2. Modify SKILL.md instructions, examples, or constraints
182
+ 3. Re-validate with `validate-skill.py --grade`
183
+ 4. Re-test evals until all assertions pass
184
+
185
+ Common fixes: undertriggering -> pushier description, wrong format -> explicit output examples, over-constraining -> increase degrees of freedom.
186
+
187
+ **Look for repeated work across test cases**: Read transcripts from test runs. If all test cases independently wrote similar helper scripts or took the same multi-step approach, that's a signal the skill should bundle that script in `scripts/`. Write it once and tell the skill to use it.
188
+
189
+ ## Step 9: Optimize Description
190
+
191
+ Create 20 trigger evaluation queries (10 should-trigger, 10 should-not-trigger). Split into train (14) and test (6) sets. Iterate description until >90% accuracy on both sets.
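The split and accuracy check described above can be sketched as follows. This is an illustration only: the seeded shuffle, the helper names, and the boolean representation of "triggered" are assumptions, and a plain shuffle does not guarantee that should-trigger and should-not-trigger queries are balanced within each set.

```python
import random


def split_queries(queries, train_size: int = 14, seed: int = 0):
    """Shuffle queries deterministically and split them into train and test sets."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def trigger_accuracy(expected: list[bool], observed: list[bool]) -> float:
    """Fraction of queries where the observed trigger decision matched the expected one."""
    if not expected or len(expected) != len(observed):
        raise ValueError("expected and observed must be non-empty and the same length")
    matches = sum(e == o for e, o in zip(expected, observed))
    return matches / len(expected)


if __name__ == "__main__":
    train, test = split_queries([f"query-{i}" for i in range(20)])
    print(len(train), len(test))
    print(trigger_accuracy([True, False, True, True], [True, False, False, True]))
```

With 20 queries and the default `train_size`, the split yields 14 train and 6 test items, matching the numbers in this step.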
192
+
193
+ **How skill triggering works:** Skills appear in Claude's available_skills list with their name + description. Claude decides whether to consult a skill based on that description. Important: Claude only consults skills for tasks it can't easily handle on its own — simple, one-step queries may not trigger a skill even if the description matches perfectly. Complex, multi-step, or specialized queries reliably trigger skills when the description matches. Design eval queries accordingly — make them substantive enough that Claude would benefit from consulting a skill.
194
+
195
+ Tips: front-load distinctive keywords, include specific file types/tools/domains, add "Use when...", "Trigger with...", "Make sure to use whenever..." patterns. Avoid generic terms that overlap with other skills. Ensure no XML tags (`<`, `>`) appear in the description.
196
+
197
+ ## Step 10: Report
198
+
199
+ Show the user:
200
+ ```
201
+ SKILL CREATED
202
+ ====================================
203
+
204
+ Location: {full path}
205
+
206
+ Files:
207
+ SKILL.md ({lines} lines)
208
+ scripts/{files}
209
+ references/{files}
210
+ templates/{files}
211
+ evals/evals.json
212
+
213
+ Validation: Enterprise tier
214
+ Errors: {count}
215
+ Warnings: {count}
216
+ Disclosure Score: {score}/6
217
+ Grade: {letter} ({points}/100)
218
+
219
+ Eval Results:
220
+ Scenarios: {count}
221
+ Passed: {count}/{count}
222
+ Description Accuracy: {percentage}%
223
+
224
+ Usage:
225
+ /{skill-name} {argument-hint}
226
+ or: "{natural language trigger}"
227
+
228
+ ====================================
229
+ ```
230
+
231
+ ## Validation Workflow
232
+
233
+ When the user wants to validate, grade, or audit an existing skill:
234
+
235
+ ### Step V1: Locate the Skill
236
+
237
+ Ask for the SKILL.md path or detect from context. Common locations:
238
+ - `~/.claude/skills/{name}/SKILL.md` (global)
239
+ - `.claude/skills/{name}/SKILL.md` (project)
240
+
241
+ ### Step V2: Run Validator
242
+
243
+ ```bash
244
+ python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {path}/SKILL.md
245
+ ```
246
+
247
+ ### Step V3: Review Grade
248
+
249
+ 100-point rubric across 5 pillars:
250
+
251
+ | Pillar | Max | What It Measures |
252
+ |--------|-----|------------------|
253
+ | Progressive Disclosure | 30 | Token economy, layered structure, navigation |
254
+ | Ease of Use | 25 | Metadata, discoverability, workflow clarity |
255
+ | Utility | 20 | Problem solving, examples, feedback loops |
256
+ | Spec Compliance | 15 | Frontmatter, naming, description quality |
257
+ | Writing Style | 10 | Voice, objectivity, conciseness |
258
+ | Modifiers | +/-5 | Bonuses/penalties for patterns |
259
+
260
+ Grade scale: A (90+), B (80-89), C (70-79), D (60-69), F (<60)
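The grade scale above maps directly to a lookup. A minimal sketch, with a hypothetical function name; it applies the thresholds exactly as listed and does not model how the validator computes the point total.

```python
def letter_grade(points: float) -> str:
    """Map a rubric score to a letter using the grade scale above."""
    if points >= 90:
        return "A"
    if points >= 80:
        return "B"
    if points >= 70:
        return "C"
    if points >= 60:
        return "D"
    return "F"


if __name__ == "__main__":
    for score in (95, 84, 72, 61, 40):
        print(score, letter_grade(score))
```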
261
+
262
+ See `${CLAUDE_SKILL_DIR}/references/validation-rules.md` for detailed sub-criteria.
263
+
264
+ ### Step V4: Report Results
265
+
266
+ Present the grade report with specific fix recommendations. Prioritize fixes by point value (highest first).
267
+
268
+ ### Step V5: Auto-Fix (if requested)
269
+
270
+ If the user says "fix it" or "auto-fix", apply the suggested improvements:
271
+ 1. Add missing sections (Overview, Prerequisites, Output)
272
+ 2. Add "Use when" / "Trigger with" to description
273
+ 3. Move author/version from metadata to top-level
274
+ 4. Fix path variables to `${CLAUDE_SKILL_DIR}/`
275
+ 5. Re-run grading to confirm improvement
276
+
277
+ ## Running and Evaluating Test Cases
278
+
279
+ For detailed empirical eval workflow (Steps E1-E5), read `${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md`.
280
+
281
+ **Quick summary:** Spawn with-skill and baseline subagents in parallel -> draft assertions while running -> capture timing data from task notifications -> grade with `${CLAUDE_SKILL_DIR}/agents/grader.md` -> aggregate with `scripts/aggregate_benchmark.py` -> launch `eval-viewer/generate_review.py` for interactive human review -> read `feedback.json`.
282
+
283
+ ## Improving the Skill
284
+
285
+ For iteration loop details, read `${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md` (section "Improving the Skill").
286
+
287
+ **Key principles:** Generalize from feedback (don't overfit), keep prompts lean, explain the *why* behind rules (not just prescriptions), and bundle repeated helper scripts.
288
+
289
+ ## Description Optimization (Automated)
290
+
291
+ For the full pipeline (Steps D1-D4), read `${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md` (section "Description Optimization"). Quick summary: generate 20 realistic trigger eval queries -> review with user via `${CLAUDE_SKILL_DIR}/assets/eval_review.html` -> run `python -m scripts.run_loop` (60/40 train/test, 3 runs/query, up to 5 iterations) -> apply `best_description`.
292
+
293
+ ## Advanced: Blind Comparison
294
+
295
+ For A/B testing between skill versions, read `${CLAUDE_SKILL_DIR}/agents/comparator.md` and `${CLAUDE_SKILL_DIR}/agents/analyzer.md`. Optional; most users won't need it.
296
+
297
+ ## Packaging
298
+
299
+ `python -m scripts.package_skill <path/to/skill-folder> [output-directory]` — Creates distributable `.skill` zip after validation.
300
+
301
+ ## Platform-Specific Notes
302
+
303
+ See `${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md` (section "Platform-Specific Notes").
304
+ - **Claude.ai**: No subagents — run tests yourself, skip benchmarking/description optimization.
305
+ - **Cowork**: Full subagent workflow. Use `--static` for eval viewer. Generate viewer BEFORE self-evaluation.
@@ -0,0 +1,27 @@
1
+ # Errors Template
2
+
3
+ Standard troubleshooting reference for all marketplace skills. Every references/errors.md MUST follow this format.
4
+
5
+ ---
6
+
7
+ ```markdown
8
+ # {Skill Name} — Common Errors
9
+
10
+ ## {Error Category 1}
11
+
12
+ | Error | Cause | Fix |
13
+ |-------|-------|-----|
14
+ | {Error message or symptom} | {Root cause} | {Specific fix command or action} |
15
+ | {Error} | {Cause} | {Fix} |
16
+
17
+ ## {Error Category 2}
18
+
19
+ | Error | Cause | Fix |
20
+ |-------|-------|-----|
21
+ | {Error} | {Cause} | {Fix} |
22
+ ```
23
+
24
+ Categories should match the skill's domain. Examples:
25
+ - For GCP skills: Authentication Errors, API Errors, Deployment Errors, Quota Errors
26
+ - For database skills: Connection Errors, Query Errors, Permission Errors, Data Errors
27
+ - For CI/CD skills: Build Errors, Deploy Errors, Configuration Errors, Permission Errors
@@ -0,0 +1,40 @@
1
+ # Examples Template
2
+
3
+ Standard usage examples reference for all marketplace skills. Every references/examples.md MUST follow this format.
4
+
5
+ ---
6
+
7
+ ```markdown
8
+ # {Skill Name} — Usage Examples
9
+
10
+ ## Example 1: {Scenario Title}
11
+
12
+ **Request**: "{What the user asks for}"
13
+
14
+ **What the skill produces**:
15
+
16
+ {File structure, output format, or result description}
17
+
18
+ **Key output** ({filename or artifact}):
19
+ \`\`\`{language}
20
+ {Real code or config — not pseudocode}
21
+ \`\`\`
22
+
23
+ ---
24
+
25
+ ## Example 2: {Scenario Title}
26
+
27
+ {Same format as Example 1}
28
+
29
+ ---
30
+
31
+ ## Example 3: {Scenario Title}
32
+
33
+ {Same format}
34
+ ```
35
+
36
+ Rules:
37
+ - Minimum 3 examples, maximum 6
38
+ - Each example has: scenario title, user request, what's produced, code/config
39
+ - Code must be real and runnable — no pseudocode, no "add your logic here"
40
+ - Examples should cover different use cases (simple → complex)