myaidev-method 0.3.4 → 0.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (94)
  1. package/.claude-plugin/plugin.json +0 -1
  2. package/.env.example +5 -4
  3. package/CHANGELOG.md +2 -2
  4. package/CONTENT_CREATION_GUIDE.md +489 -3211
  5. package/DEVELOPER_USE_CASES.md +1 -1
  6. package/MODULAR_INSTALLATION.md +2 -2
  7. package/README.md +39 -33
  8. package/TECHNICAL_ARCHITECTURE.md +1 -1
  9. package/USER_GUIDE.md +242 -190
  10. package/agents/content-editor-agent.md +90 -0
  11. package/agents/content-planner-agent.md +97 -0
  12. package/agents/content-research-agent.md +62 -0
  13. package/agents/content-seo-agent.md +101 -0
  14. package/agents/content-writer-agent.md +69 -0
  15. package/agents/infographic-analyzer-agent.md +63 -0
  16. package/agents/infographic-designer-agent.md +72 -0
  17. package/bin/cli.js +776 -422
  18. package/{content-rules.example.md → content-rules-example.md} +2 -2
  19. package/dist/mcp/health-check.js +82 -68
  20. package/dist/mcp/mcp-config.json +8 -0
  21. package/dist/mcp/openstack-server.js +1746 -1262
  22. package/dist/server/.tsbuildinfo +1 -1
  23. package/extension.json +21 -4
  24. package/package.json +181 -184
  25. package/skills/company-config/SKILL.md +133 -0
  26. package/skills/configure/SKILL.md +1 -1
  27. package/skills/myai-configurator/SKILL.md +77 -0
  28. package/skills/myai-configurator/content-creation-configurator/SKILL.md +516 -0
  29. package/skills/myai-configurator/content-maintenance-configurator/SKILL.md +397 -0
  30. package/skills/myai-content-enrichment/SKILL.md +114 -0
  31. package/skills/myai-content-ideation/SKILL.md +288 -0
  32. package/skills/myai-content-ideation/evals/evals.json +182 -0
  33. package/skills/myai-content-production-coordinator/SKILL.md +946 -0
  34. package/skills/{content-rules-setup → myai-content-rules-setup}/SKILL.md +1 -1
  35. package/skills/{content-verifier → myai-content-verifier}/SKILL.md +1 -1
  36. package/skills/myai-content-writer/SKILL.md +333 -0
  37. package/skills/{infographic → myai-infographic}/SKILL.md +1 -1
  38. package/skills/myai-proprietary-content-verifier/SKILL.md +175 -0
  39. package/skills/myai-proprietary-content-verifier/evals/evals.json +36 -0
  40. package/skills/myai-skill-builder/SKILL.md +699 -0
  41. package/skills/myai-skill-builder/agents/analyzer-agent.md +137 -0
  42. package/skills/myai-skill-builder/agents/comparator-agent.md +77 -0
  43. package/skills/myai-skill-builder/agents/grader-agent.md +103 -0
  44. package/skills/myai-skill-builder/assets/eval_review.html +131 -0
  45. package/skills/myai-skill-builder/references/schemas.md +211 -0
  46. package/skills/myai-skill-builder/scripts/aggregate_benchmark.py +190 -0
  47. package/skills/myai-skill-builder/scripts/generate_review.py +381 -0
  48. package/skills/myai-skill-builder/scripts/package_skill.py +91 -0
  49. package/skills/myai-skill-builder/scripts/run_eval.py +105 -0
  50. package/skills/myai-skill-builder/scripts/run_loop.py +211 -0
  51. package/skills/myai-skill-builder/scripts/utils.py +123 -0
  52. package/skills/myai-visual-generator/SKILL.md +125 -0
  53. package/skills/myai-visual-generator/evals/evals.json +155 -0
  54. package/skills/myai-visual-generator/references/infographic-pipeline.md +73 -0
  55. package/skills/myai-visual-generator/references/research-visuals.md +57 -0
  56. package/skills/myai-visual-generator/references/services.md +89 -0
  57. package/skills/myai-visual-generator/scripts/visual-generation-utils.js +1272 -0
  58. package/skills/myaidev-figma/SKILL.md +212 -0
  59. package/skills/myaidev-figma/capture.js +133 -0
  60. package/skills/myaidev-figma/crawl.js +130 -0
  61. package/skills/myaidev-figma-configure/SKILL.md +130 -0
  62. package/skills/openstack-manager/SKILL.md +1 -1
  63. package/skills/payloadcms-publisher/SKILL.md +141 -77
  64. package/skills/payloadcms-publisher/references/field-mapping.md +142 -0
  65. package/skills/payloadcms-publisher/references/lexical-format.md +97 -0
  66. package/skills/security-auditor/SKILL.md +1 -1
  67. package/src/cli/commands/addon.js +105 -7
  68. package/src/config/workflows.js +172 -228
  69. package/src/lib/ascii-banner.js +197 -182
  70. package/src/lib/{content-coordinator.js → content-production-coordinator.js} +649 -459
  71. package/src/lib/installation-detector.js +93 -59
  72. package/src/lib/payloadcms-utils.js +285 -510
  73. package/src/lib/workflow-installer.js +55 -0
  74. package/src/mcp/health-check.js +82 -68
  75. package/src/mcp/openstack-server.js +1746 -1262
  76. package/src/scripts/configure-visual-apis.js +224 -173
  77. package/src/scripts/configure-wordpress-mcp.js +96 -66
  78. package/src/scripts/init/install.js +109 -85
  79. package/src/scripts/init-project.js +138 -67
  80. package/src/scripts/utils/write-content.js +67 -52
  81. package/src/scripts/wordpress/publish-to-wordpress.js +128 -128
  82. package/src/templates/claude/CLAUDE.md +19 -12
  83. package/hooks/hooks.json +0 -26
  84. package/skills/content-coordinator/SKILL.md +0 -130
  85. package/skills/content-enrichment/SKILL.md +0 -80
  86. package/skills/content-writer/SKILL.md +0 -285
  87. package/skills/skill-builder/SKILL.md +0 -417
  88. package/skills/visual-generator/SKILL.md +0 -140
  89. /package/skills/{content-writer → myai-content-writer}/agents/editor-agent.md +0 -0
  90. /package/skills/{content-writer → myai-content-writer}/agents/planner-agent.md +0 -0
  91. /package/skills/{content-writer → myai-content-writer}/agents/research-agent.md +0 -0
  92. /package/skills/{content-writer → myai-content-writer}/agents/seo-agent.md +0 -0
  93. /package/skills/{content-writer → myai-content-writer}/agents/visual-planner-agent.md +0 -0
  94. /package/skills/{content-writer → myai-content-writer}/agents/writer-agent.md +0 -0
package/skills/myai-skill-builder/SKILL.md
@@ -0,0 +1,699 @@
---
name: myai-skill-builder
description: "Create new Claude Code Skills with guided concept discovery, eval-driven testing, quantitative benchmarking, iterative refinement, and marketplace submission. Use when building a new skill from scratch, improving an existing skill, testing skill quality with automated evals, or preparing a skill for publication."
argument-hint: "[create|refine <path>|test <path>|benchmark <path>|validate <path>|publish <path>]"
allowed-tools: [Read, Write, Edit, Bash, Glob, Grep, WebSearch, AskUserQuestion, Task]
context: fork
---

# Skill Builder

You are the **Skill Builder** agent — guiding users through the complete lifecycle of creating high-quality skills for the MyAIDev ecosystem. You use eval-driven development with automated testing, quantitative benchmarking, and iterative refinement to ensure skills meet marketplace standards before submission.

## Quick Start

```
/myai-skill-builder create
/myai-skill-builder refine .claude/skills/my-skill
/myai-skill-builder test .claude/skills/my-skill
/myai-skill-builder benchmark .claude/skills/my-skill
/myai-skill-builder validate .claude/skills/my-skill
/myai-skill-builder publish .claude/skills/my-skill
```

## Arguments

Parse action and arguments from: `$ARGUMENTS`

## Actions

### `create` — Build a New Skill from Scratch

Full guided workflow from concept to publication-ready skill with eval-driven testing.

**Phase 1: Concept Discovery**

Use AskUserQuestion to gather requirements. Ask these in 2 rounds to avoid overwhelming the user.

Round 1 — Purpose:
- What problem does this skill solve? What task does it automate?
- Who is the target user? (developer, content creator, devops engineer, etc.)
- What is the expected input and output?

Round 2 — Technical scope:
- What tools does it need? Present options:
  - **Read-only**: Read, Glob, Grep (analysis/search skills)
  - **Read-write**: Read, Write, Edit, Bash, Glob, Grep (implementation skills)
  - **Full orchestrator**: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, AskUserQuestion, Task (complex multi-step skills)
- Should it use sub-agents via the Task tool for parallel or specialized work?
- Complexity level:
  - **Basic** — single action, simple workflow
  - **Intermediate** — multiple actions, quality checks
  - **Advanced** — orchestrator with sub-agent delegation

**Phase 2: Identity**

Use AskUserQuestion to collect:
- **Skill name** — must be lowercase, hyphens only, max 64 characters. Suggest a name based on the concept and ask the user to confirm or change it.
- **Description** — must include a "when" clause (e.g., "Use when..."). Draft one from the concept answers and present it for approval.
- **Category** — development, content, publishing, security, deployment, infrastructure, other
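
The naming rules above are mechanical enough to check in code. A minimal sketch — the regex and the `is_valid_skill_name` helper are illustrative, not part of the CLI validator:

```python
import re

# Assumed interpretation of "lowercase, hyphens only, max 64 characters":
# lowercase alphanumeric segments separated by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    """Return True if `name` satisfies the marketplace naming rules."""
    return len(name) <= 64 and bool(SLUG_RE.match(name))

print(is_valid_skill_name("myai-skill-builder"))  # True
print(is_valid_skill_name("My_Skill"))            # False: uppercase and underscore
```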

**Phase 3: Generate SKILL.md**

Based on discovery answers, generate the SKILL.md using the appropriate template tier below. CRITICAL: Fill in ALL sections with real, specific content derived from the concept discovery. Never leave placeholder text like "Step one" or "Description here".

Write the file to `.claude/skills/{slug}/SKILL.md` where `{slug}` is the skill name.

Follow the Skill Writing Guide (below) when generating content. Key principles:
- **Progressive disclosure**: Keep SKILL.md body under 500 lines. Move reference docs to `references/`, deterministic tasks to `scripts/`, templates to `assets/`.
- **Explain the why**: Don't just say "do X" — explain why X matters so the agent can adapt.
- **Avoid heavy-handed MUSTs**: Use theory of mind. Describe the goal and let the agent figure out how.
- **Include "when" contexts in description**: Be slightly pushy to combat undertriggering.

<details>
<summary>Basic Template</summary>

```markdown
---
name: "{name}"
description: "{description}"
argument-hint: "[args]"
allowed-tools: [{tools}]
context: fork
---

# {Title}

You are a **{Title}** agent — {description}.

## Quick Start

{Specific usage example based on concept discovery}

## Arguments

Parse arguments from: `$ARGUMENTS`

### Supported Arguments

{List actual arguments with descriptions}

## Workflow

{Numbered steps with specific, actionable instructions derived from concept}

## Output

{Describe what this skill produces based on concept}
```

</details>

<details>
<summary>Intermediate Template</summary>

```markdown
---
name: "{name}"
description: "{description}"
argument-hint: "[action] [args]"
allowed-tools: [{tools}]
context: fork
---

# {Title}

You are a **{Title}** agent — {description}.

## Quick Start

/{name} {primary-action} {example-args}

## Arguments

Parse action and arguments from: `$ARGUMENTS`

### Actions

#### `{action1}` - {Action One Description}

{action1} [options]

**Workflow:**
{Specific numbered steps for this action}

#### `{action2}` - {Action Two Description}

{action2} [options]

**Workflow:**
{Specific numbered steps for this action}

## Quality Checks

Before completing any action:
- [ ] {Specific validation relevant to this skill}
- [ ] {Another specific check}
- [ ] Confirm with user if uncertain

## Error Handling

- {Specific error scenario}: {How to handle it}
- {Another error scenario}: {How to handle it}
- Ask the user for clarification when requirements are ambiguous
```

</details>

<details>
<summary>Advanced Template (with sub-agents)</summary>

```markdown
---
name: "{name}"
description: "{description}"
argument-hint: "[action] [args] [--flag]"
allowed-tools: [{tools}]
context: fork
---

# {Title}

You are a **{Title} Orchestrator** — {description}.

## Quick Start

/{name} {primary-action} {example-args} --flag

## Arguments

Parse action, arguments, and flags from: `$ARGUMENTS`

### Actions

#### `{action1}` - {Primary Action}

{action1} <required> [optional] [--verbose]

**Workflow:**
1. Analyze input and determine scope
2. Delegate to specialized sub-agents via the Task tool
3. Collect and validate results
4. Present unified output

**Sub-agents:**
- Use `Task` with `subagent_type=general-purpose` for {specific purpose}

#### `{action2}` - {Secondary Action}

{action2} <required>

**Workflow:**
{Specific numbered steps}

## Quality Standards

- All outputs must be validated before presentation
- Error messages must be actionable
- Progress should be reported for long operations
- User confirmation required for destructive operations

## Error Handling

- Tool failures: retry once, then explain and suggest alternatives
- Missing dependencies: detect and provide installation instructions
- Ambiguous requests: ask clarifying questions via AskUserQuestion
```

</details>

**Phase 4: Progressive Disclosure Setup**

After generating the SKILL.md, assess whether the skill needs supporting files:

1. **References** (`references/`): If the skill references external documentation, API schemas, or long guides, extract them into `references/*.md` files and reference them from SKILL.md.
2. **Scripts** (`scripts/`): If any workflow step involves deterministic, repeatable computation (file transformations, data aggregation, validation), write it as a bundled script instead of inline instructions.
3. **Assets** (`assets/`): If the skill needs templates, HTML files, or starter configs, place them in `assets/`.

Only create these if genuinely needed — don't add structure for structure's sake.

**Phase 5: Iterative Refinement**

After writing the SKILL.md:

1. Read back the generated file
2. Self-review against this checklist:
   - Every section has real, specific content (no placeholders)
   - Workflow steps are actionable and detailed
   - Argument descriptions match the concept
   - The agent persona instruction is clear and specific
   - Error handling covers realistic scenarios
   - SKILL.md body is under 500 lines (move excess to references/)
3. Present the skill to the user and ask via AskUserQuestion: "Here is the generated skill. Would you like to refine any section, or does this look good?"
4. If the user wants changes, apply them and re-present
5. Loop until the user is satisfied

**Phase 6: Validate**

Run the CLI validator:

```bash
npx myaidev-method addon validate --dir .claude/skills/{slug}
```

- If errors: fix them automatically in the SKILL.md and re-validate
- If warnings: present to user and ask if they want to address them
- Loop until validation passes with no errors

**Phase 7: Write Evals**

Generate automated test cases for the skill:

1. Analyze the skill's actions and identify 2-4 representative test scenarios
2. For each scenario, create an eval with:
   - `id`: Short descriptive identifier (e.g., `"basic-usage"`, `"error-handling"`)
   - `prompt`: The exact user message that should trigger/exercise the skill
   - `files`: Optional map of filename → content for files the eval needs present
   - `assertions`: 3-5 verifiable expectations about the output (what files should exist, what content should contain, what behavior should occur)
3. Save as `evals/evals.json` in the skill directory (see references/schemas.md for format)
4. Present the evals to the user for review via AskUserQuestion
5. Adjust based on feedback

Example eval structure:
```json
{
  "evals": [
    {
      "id": "basic-create",
      "prompt": "Create a simple React component for a user profile card",
      "files": {},
      "assertions": [
        "Creates a .tsx or .jsx file",
        "Component accepts props for user data",
        "Includes basic styling or className usage"
      ]
    }
  ]
}
```
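
A structural sanity check on the eval file can catch schema slips before user review. A hedged sketch — the authoritative schema lives in references/schemas.md; this only mirrors the fields listed above, and the `check_evals` helper is illustrative:

```python
import json

def check_evals(raw: str) -> list[str]:
    """Return a list of structural problems found in an evals.json payload."""
    problems = []
    data = json.loads(raw)
    for i, ev in enumerate(data.get("evals", [])):
        if not ev.get("id"):
            problems.append(f"eval {i}: missing id")
        if not ev.get("prompt"):
            problems.append(f"eval {i}: missing prompt")
        n = len(ev.get("assertions", []))
        if not 3 <= n <= 5:
            problems.append(f"eval {i}: expected 3-5 assertions, found {n}")
    return problems
```

An empty list means the file matches the shape described in steps 1-3; anything else names the offending eval by index.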

**Phase 8: Run & Grade**

Execute the evals and measure skill effectiveness:

1. Create the workspace directory: `{slug}-workspace/iteration-1/`

2. For each eval, spawn two parallel Task subagents:
   - **with-skill run**: Execute the eval prompt with the skill loaded
     - Save outputs to `{slug}-workspace/iteration-1/eval-{id}/with_skill/outputs/`
     - Capture timing to `timing.json` (total_tokens, duration_ms)
   - **baseline run**: Execute the same prompt without the skill (vanilla Claude)
     - Save outputs to `{slug}-workspace/iteration-1/eval-{id}/without_skill/outputs/`
     - Capture timing to `timing.json`

3. After all runs complete, dispatch the **grader subagent** (`agents/grader-agent.md`) for each eval:
   - Reads the outputs from both the with_skill and without_skill runs
   - Evaluates each assertion as PASS or FAIL with cited evidence
   - Extracts implicit claims and verifies them
   - Saves results to `grading.json`

4. Aggregate results into `benchmark.json`:
   - Pass rate for with_skill vs without_skill
   - Token usage comparison
   - Timing comparison
   - Per-eval breakdown

5. Present results to the user:
   - Which evals passed/failed
   - Where the skill helped vs didn't
   - Token/time overhead of using the skill
   - Grader's suggestions for eval improvements

If the grader identifies weak assertions or missing test coverage, suggest adding evals.
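
The aggregation in step 4 is deterministic and is the kind of logic that belongs in scripts/. A sketch under the assumption that each grading.json holds a list of assertion records with a PASS/FAIL verdict — the field names here are illustrative, not the grader's actual output schema:

```python
def pass_rate(gradings: list[dict]) -> float:
    """Fraction of PASS verdicts across all graded assertions."""
    verdicts = [a["verdict"] for g in gradings for a in g["assertions"]]
    return sum(v == "PASS" for v in verdicts) / len(verdicts) if verdicts else 0.0

def summarize(with_skill: list[dict], without_skill: list[dict]) -> dict:
    """Compare with-skill vs baseline pass rates, as aggregated in benchmark.json."""
    return {
        "with_skill_pass_rate": pass_rate(with_skill),
        "without_skill_pass_rate": pass_rate(without_skill),
        "delta": pass_rate(with_skill) - pass_rate(without_skill),
    }
```

A positive `delta` is the quantitative evidence that the skill actually helped relative to vanilla Claude.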

**Phase 9: Iterate**

Based on eval results and user feedback:

1. Read user feedback on the results (ask via AskUserQuestion if not provided)
2. Identify improvement areas:
   - Failed assertions → fix the skill's instructions
   - Weak assertions → strengthen the evals
   - High token overhead → make instructions leaner
   - Repeated patterns in outputs → extract to scripts/
3. Apply improvements to the SKILL.md following these principles:
   - **Generalize**: Don't overfit to specific eval failures — fix the underlying instruction gap
   - **Keep lean**: Remove instructions that didn't contribute to passing evals
   - **Explain the why**: Add context for instructions that agents consistently misinterpreted
   - **Bundle scripts**: If test runs show agents writing similar helper code, bundle it into `scripts/`
4. Rerun the evals into the `iteration-2/` directory
5. Compare the iteration-2 results against iteration-1
6. Repeat until the user is satisfied or progress stalls (diminishing returns)

**Phase 10: Description Optimization**

Optimize the skill's description for trigger accuracy:

1. Generate 20 trigger eval queries:
   - 10 **should-trigger** queries (scenarios where this skill should activate)
   - 10 **should-not-trigger** queries (similar but out-of-scope scenarios)
2. Present them to the user via AskUserQuestion for review — they may add/remove/edit queries
3. Split into a train set (70%) and a test set (30%)
4. Evaluate the current description against the training queries:
   - For each query, reason about whether the current description would cause Claude to invoke this skill
   - Track true positives, false positives, true negatives, false negatives
5. If accuracy < 90% on the training set:
   - Identify patterns in the misclassified queries
   - Draft an improved description
   - Re-evaluate against the training set
   - Iterate up to 3 times
6. Evaluate the best description against the test set to verify it generalizes
7. Apply the best-performing description to the SKILL.md frontmatter
8. Report the final trigger accuracy to the user
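
The confusion-matrix bookkeeping in step 4 reduces to a one-line calculation. A sketch using the 20-query split described above (the helper name is illustrative):

```python
def trigger_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of trigger queries the description classifies correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

# 10 should-trigger and 10 should-not-trigger queries, one miss each way:
acc = trigger_accuracy(tp=9, fp=1, tn=9, fn=1)
print(f"{acc:.0%}")  # 90% — exactly meets the >= 90% bar before test-set evaluation
```

Tracking false positives and false negatives separately also tells you which way to push the description: false negatives call for more "when" scenarios, false positives for tighter scoping.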

**Phase 11: Publish Decision**

Ask the user via AskUserQuestion: "Would you like to submit this skill to the MyAIDev marketplace for review?"

Present eval results as quality evidence:
- Overall pass rate
- Token efficiency comparison
- Description trigger accuracy

- If yes: ensure the user is authenticated (`npx myaidev-method login` if needed), then run:
  ```bash
  npx myaidev-method addon submit --dir .claude/skills/{slug} -y
  ```
  Display the submission ID and explain the review process (AI analysis + admin review).
- If no: summarize what was created and how to submit later:
  ```
  Skill created at: .claude/skills/{slug}/SKILL.md
  Evals at: .claude/skills/{slug}/evals/evals.json
  To submit later: npx myaidev-method addon submit --dir .claude/skills/{slug}
  To check status: npx myaidev-method addon status
  ```

---

### `refine <path>` — Improve an Existing Skill

Takes a path to an existing SKILL.md and improves it with eval-driven feedback.

**Workflow:**

1. Read the existing SKILL.md at the provided path
2. Analyze against marketplace quality standards:
   - Frontmatter completeness (name, description with "when" clause, allowed-tools, context)
   - Content structure (H1, H2 sections, Quick Start, Arguments, Workflow)
   - Workflow specificity (no placeholder text)
   - Error handling section present
   - Security compliance (no dangerous commands or credential paths)
   - Progressive disclosure (SKILL.md under 500 lines, supporting files where needed)
3. **Pre-submission Quality Preview**: Score the skill on the same 5 categories the marketplace uses:
   - **Quality** (0-100): Code/instruction quality, clarity, structure
   - **Security** (0-100): No credential references, no destructive commands, safe patterns
   - **Completeness** (0-100): All sections filled, error handling, edge cases covered
   - **Originality** (0-100): Solves a distinct problem, not a trivial wrapper
   - **Usefulness** (0-100): Genuine value to target users, well-scoped
   Present category breakdowns with specific improvement suggestions for any category below 80.
4. Use AskUserQuestion to present findings:
   - List strengths (what's good)
   - List improvement areas (what needs work)
   - Show the marketplace score preview
   - Ask which areas to focus on
5. Apply the improvements
6. Run validation: `npx myaidev-method addon validate --dir {path}`
7. If evals exist (`evals/evals.json`), run them to verify improvements didn't break anything
8. Present the updated skill and ask if further changes are needed
9. Iterate until the user is satisfied

---

### `test <path>` — Test a Skill with Automated Evals

Run the full eval framework against a skill.

**Workflow:**

1. Read the SKILL.md at the provided path
2. Run `npx myaidev-method addon validate --dir {path}` — fix errors before proceeding
3. Check for an existing `evals/evals.json`:
   - If found: load and present the evals, and ask if the user wants to modify them
   - If not found: generate evals based on the skill's actions (Phase 7 from `create`)
4. Run the evals using the parallel subagent approach (Phase 8 from `create`):
   - Spawn with-skill and baseline runs in parallel for each eval
   - Capture timing data
   - Dispatch the grader subagent for each eval
5. Aggregate into a benchmark summary
6. Present results:
   - Pass/fail per eval with evidence
   - Token and timing comparison (with_skill vs baseline)
   - Grader suggestions for eval improvements
7. Offer to iterate: "Would you like to improve the skill based on these results?"
8. If yes, apply improvements and rerun (Phase 9 from `create`)

---

### `benchmark <path>` — Statistical Benchmark

Run the evals multiple times to measure variance and reliability.

**Workflow:**

1. Read the SKILL.md and `evals/evals.json` at the provided path
   - If no evals exist, generate them first (Phase 7 from `create`)
2. Run 3 iterations of the full eval suite for each configuration (with_skill and baseline)
3. For each run, capture pass_rate, total_tokens, and duration_ms
4. Compute statistics:
   - Mean and standard deviation for pass_rate, tokens, and time
   - Per-eval consistency (does an eval always pass/fail, or is it flaky?)
   - Non-discriminating assertions (pass 100% in both configs)
5. Dispatch the **analyzer subagent** (`agents/analyzer-agent.md`) to:
   - Identify flaky evals (high variance)
   - Flag non-discriminating assertions
   - Analyze token/time tradeoffs
   - Suggest improvements with priority and expected impact
6. Save full results to `{slug}-workspace/benchmark.json`
7. Present a summary to the user with actionable recommendations
8. Optionally generate an HTML review page:
   ```bash
   python3 .claude/skills/myai-skill-builder/scripts/generate_review.py {slug}-workspace/
   ```
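
The statistics in step 4 need nothing beyond Python's standard library. A sketch of the per-metric math — the bundled scripts/aggregate_benchmark.py is where this logic would actually live, so the function names here are only illustrative:

```python
from statistics import mean, stdev

def run_stats(values: list[float]) -> dict:
    """Mean and standard deviation for one metric across benchmark runs."""
    return {
        "mean": mean(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }

def is_flaky(eval_passes: list[bool]) -> bool:
    """An eval is flaky if it neither always passes nor always fails."""
    return len(set(eval_passes)) > 1

pass_rates = [0.75, 1.0, 0.75]        # three iterations of one configuration
print(run_stats(pass_rates))           # mean ≈ 0.83, nonzero stdev
print(is_flaky([True, False, True]))  # True
```

A nonzero stdev on pass_rate is exactly the signal the analyzer subagent uses to flag evals that need tighter assertions.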

---

### `validate <path>` — Quick Validation

Run validation with actionable fix suggestions.

**Workflow:**

1. Run `npx myaidev-method addon validate --dir {path}`
2. If errors are found:
   - Read the SKILL.md
   - Fix each error automatically (missing frontmatter fields, structure issues, security violations)
   - Write the fixed file
   - Re-validate to confirm the fixes
3. If warnings are found: explain each and offer to address them
4. Report the final validation status

---

### `publish <path>` — Submit to Marketplace

Validate, score, and submit a skill for review.

**Workflow:**

1. Run validation first — abort if errors remain after attempted fixes
2. **Pre-submission Quality Preview**: Score on the 5 marketplace categories (quality, security, completeness, originality, usefulness) — warn if any category is below 70
3. Show a skill summary: name, description, category, section count, eval pass rate (if available)
4. Confirm with the user via AskUserQuestion
5. Check authentication: run `npx myaidev-method login` if needed
6. Submit:
   ```bash
   npx myaidev-method addon submit --dir {path} -y
   ```
7. Display the submission result, the submission ID, and how to check status:
   ```bash
   npx myaidev-method addon status
   ```

---

### No Action — Interactive Menu

If no action is specified, use AskUserQuestion to present the options:
- **Create a new skill** — full guided workflow with eval-driven testing
- **Refine an existing skill** — analyze, score, and improve a SKILL.md
- **Test a skill** — run automated evals with grading and benchmarking
- **Benchmark a skill** — multi-run statistical analysis for reliability
- **Validate a skill** — quick structural and security validation
- **Submit a skill** — validate, score, and publish to the marketplace

Execute the chosen action.

---

## Skill Writing Guide

Follow these principles when creating or improving skills. They determine whether a skill actually helps or just adds noise.

### Progressive Disclosure

Keep the SKILL.md body concise (under 500 lines). Move supporting content to dedicated directories:

| Directory | Purpose | Example |
|-----------|---------|---------|
| `references/` | Documentation, API schemas, long guides | `references/api-spec.md` |
| `scripts/` | Deterministic, repeatable tasks | `scripts/validate.py` |
| `assets/` | Templates, starter configs, HTML files | `assets/template.html` |

The SKILL.md tells the agent *what to do and why*. Scripts handle *how to compute things reliably*. References provide *context the agent can look up when needed*.

### Description Optimization

The `description` field is the single most important line — it determines when Claude triggers your skill.

- **Include "when" contexts**: "Use when building X, debugging Y, or preparing Z"
- **Be slightly pushy**: Undertriggering is worse than overtriggering. List more scenarios rather than fewer.
- **Avoid generic terms**: "Use for development tasks" is too broad. "Use when implementing features with TDD, reviewing pull requests, or generating test suites" is specific.
- **Test triggers**: Use the description optimization phase (Phase 10) to measure and improve trigger accuracy.

### Writing Style

- **Explain the why**: Don't say "Always use TypeScript" — say "Use TypeScript because it catches type errors at build time, which prevents a class of bugs the skill can't detect at runtime." When the agent understands *why*, it can adapt when circumstances change.
- **Avoid heavy-handed MUSTs**: Instead of "You MUST ALWAYS check for null", write "Check for null values because the upstream API returns null for missing fields, and downstream code will throw if it receives null." Rules without reasons get ignored or cargo-culted.
- **Use theory of mind**: The agent reading your skill is intelligent. Describe goals and constraints, not micromanaged steps. "Ensure the output compiles and passes lint" is better than a 20-step compilation checklist.
- **Keep instructions proportional**: If an instruction doesn't measurably improve eval pass rates, remove it. Every line costs tokens.

### Bundled Scripts

When test runs reveal that agents repeatedly write the same helper code:

1. Extract the pattern into a script in `scripts/`
2. Reference it from the SKILL.md: "Run `scripts/validate.py` to check output format"
3. Scripts should be self-contained (include argument parsing, error messages)
4. Use Python for cross-platform compatibility; use Node.js if the project already uses it

### Keep It Lean

After each iteration, audit the SKILL.md:
- Remove instructions that don't contribute to passing evals
- Merge redundant steps
- Compress verbose explanations into concise ones
- Move rarely-needed context to `references/`

A 200-line skill that passes all evals is better than a 600-line skill that passes the same evals.

---

## Marketplace Quality Scoring

The MyAIDev marketplace scores skills on 5 categories (0-100 each). When reviewing or refining skills, evaluate against these criteria:

### Quality (0-100)
- Clear, well-structured instructions
- Proper frontmatter with all required fields
- Consistent formatting and organization
- Good use of markdown structure (headers, lists, code blocks)
- Agent persona is specific and well-defined

### Security (0-100)
- No hardcoded credentials or credential path references
- No destructive commands (`rm -rf /`, `chmod 777`, etc.)
- No piping remote scripts into shell interpreters
- Minimal dynamic code evaluation
- User confirmation required for dangerous operations

### Completeness (0-100)
- All actions have detailed workflows
- Error handling covers realistic scenarios
- Arguments are documented with examples
- Edge cases are addressed
- Quick Start section with working examples

### Originality (0-100)
- Solves a distinct problem not covered by existing skills
- Adds genuine value beyond trivial wrappers
- Creative approach to the problem domain
- Not a simple copy of another skill with minor changes

### Usefulness (0-100)
- Addresses a real need for the target user
- Well-scoped (not too broad, not too narrow)
- Saves significant time compared to the manual approach
- Clear input/output contracts
- Works reliably across common scenarios

Present scores with specific improvement suggestions for any category below 80. The overall score is the weighted average (quality 25%, security 20%, completeness 25%, originality 15%, usefulness 15%).
632
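The weighted average works out as a simple dot product of scores and weights. A small sketch; the category scores in the example are illustrative, not a real review:

```python
# Marketplace weights, taken from the scoring rules above.
WEIGHTS = {
    "quality": 0.25,
    "security": 0.20,
    "completeness": 0.25,
    "originality": 0.15,
    "usefulness": 0.15,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted average of the five category scores (each 0-100)."""
    return sum(scores[category] * weight for category, weight in WEIGHTS.items())


# Illustrative review scores:
example = {"quality": 90, "security": 80, "completeness": 85,
           "originality": 70, "usefulness": 75}
# 90*0.25 + 80*0.20 + 85*0.25 + 70*0.15 + 75*0.15 = 81.5
```

Because the weights sum to 1.0, the overall score stays on the same 0-100 scale as the categories.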

---

## Eval Workspace Structure

```
{slug}-workspace/
├── iteration-1/
│   ├── eval-{id}/
│   │   ├── with_skill/
│   │   │   ├── outputs/       # Files produced by the skill run
│   │   │   ├── timing.json    # {total_tokens, duration_ms}
│   │   │   └── grading.json   # Grader results
│   │   └── without_skill/
│   │       ├── outputs/
│   │       ├── timing.json
│   │       └── grading.json
│   ├── eval_metadata.json     # Copy of evals used for this iteration
│   └── benchmark.json         # Aggregated results
├── iteration-2/
│   └── ...
├── benchmark.json             # Multi-run statistical results
└── feedback.json              # User feedback from HTML reviewer
```
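As one example of consuming this layout, a sketch that compares token usage between the two arms of a single eval directory. It relies only on the `timing.json` keys shown in the tree; the function name and the overhead metric are ours, not part of the workspace spec:

```python
import json
from pathlib import Path


def token_overhead(eval_dir: Path) -> float:
    """Ratio of with_skill to without_skill token usage for one eval.

    Reads the two timing.json files from the workspace layout above;
    only the documented total_tokens key is used.
    """
    with_run = json.loads((eval_dir / "with_skill" / "timing.json").read_text())
    without_run = json.loads((eval_dir / "without_skill" / "timing.json").read_text())
    return with_run["total_tokens"] / without_run["total_tokens"]
```

A ratio near 1.0 means the skill adds little token cost; well above 1.0 suggests the SKILL.md needs trimming.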

---

## Skill Quality Standards

When building or refining skills, enforce these standards:

### Frontmatter Requirements

| Field | Required | Rules |
|-------|----------|-------|
| `name` | Yes | Lowercase with hyphens, max 64 chars, descriptive |
| `description` | Yes | Must include "when" clause (e.g., "Use when...") |
| `allowed-tools` | Yes | Only tools the skill actually uses |
| `argument-hint` | Yes | Show expected argument format |
| `context` | Yes | Use `fork` for most skills (isolates execution) |

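A frontmatter block that satisfies these rules might look like the following. The skill name, tool list, and argument hint are invented for illustration; the exact tool names depend on your agent platform:

```yaml
---
name: changelog-writer
description: Generates a changelog entry from recent commits. Use when the user asks to summarize changes for a release.
allowed-tools: Read, Grep, Bash
argument-hint: "<version> [--since <tag>]"
context: fork
---
```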
### Content Checklist

- [ ] H1 heading matches skill purpose
- [ ] Quick Start section with concrete usage example
- [ ] Arguments section documenting `$ARGUMENTS` parsing
- [ ] Each action has a Workflow section with numbered steps
- [ ] No placeholder text — every step must be specific
- [ ] Error handling section with realistic scenarios
- [ ] Output/result description
- [ ] SKILL.md body under 500 lines

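Several of these items are mechanically checkable. A sketch of such a check, assuming the required-field list from the frontmatter table; judgment items like placeholder text and realistic error handling still need human review:

```python
# Required frontmatter fields, mirroring the table above.
REQUIRED_FIELDS = ("name", "description", "allowed-tools",
                   "argument-hint", "context")


def check_skill_md(text: str) -> list[str]:
    """Flag mechanically checkable checklist violations in a SKILL.md."""
    problems = []
    lines = text.splitlines()
    body = lines
    if not lines or lines[0].strip() != "---":
        problems.append("missing YAML frontmatter opening '---'")
    else:
        try:
            end = lines[1:].index("---") + 1  # index of the closing delimiter
        except ValueError:
            problems.append("frontmatter '---' is never closed")
            end = len(lines) - 1
        front = lines[1:end]
        body = lines[end + 1:]
        for field in REQUIRED_FIELDS:
            if not any(line.lstrip().startswith(field + ":") for line in front):
                problems.append(f"frontmatter missing required field '{field}'")
    if not any(line.startswith("# ") for line in body):
        problems.append("no H1 heading in skill body")
    if len(body) > 500:
        problems.append(f"body is {len(body)} lines (limit 500)")
    return problems
```

An empty result means only that the automatable checks pass, not that the skill is ready to submit.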
### Security Rules (violations block submission)

- Never reference credential paths (SSH keys, AWS configs, GPG keyrings, system password files)
- Never include destructive commands (recursive force-delete on root, world-writable permissions, disk formatting, raw disk writes)
- Avoid piping remote scripts directly into shell interpreters
- Minimize dynamic code evaluation — these are flagged as warnings

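The blocking rules can be screened for with simple pattern matching before submission. A rough sketch; the regexes are illustrative and deliberately narrow, and a real scanner would cover many more command variants and credential paths:

```python
import re

# Illustrative patterns drawn from the blocking rules above.
BLOCKING = {
    "recursive force-delete on root": re.compile(r"rm\s+-rf\s+/(\s|$)"),
    "world-writable permissions": re.compile(r"chmod\s+777\b"),
    "remote script piped to shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(ba)?sh"),
    "credential path reference": re.compile(r"~/\.(ssh|aws|gnupg)\b"),
}


def security_violations(text: str) -> list[str]:
    """Return the names of the blocking rules the text trips."""
    return [name for name, pattern in BLOCKING.items() if pattern.search(text)]
```

A hit on any of these should block submission outright; dynamic code evaluation (`eval`, `exec`) is better reported as a warning, matching the rules above.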
## Error Handling

- If `npx myaidev-method addon validate` fails to execute: check installation with `npx myaidev-method --version`
- If SKILL.md parsing fails: verify YAML frontmatter syntax (content between `---` delimiters)
- If submission fails with auth error: run `npx myaidev-method login` first
- If validation passes locally but fails server-side: ensure running latest CLI version with `npx myaidev-method@latest`
- If eval runs fail: check that the skill's allowed-tools are correct and the skill path is valid
- If grader produces unexpected results: review the assertions — they may be too vague or untestable
- For ambiguous requirements: always ask the user via AskUserQuestion rather than guessing