npm - myaidev-method - Versions diffs - 0.3.3 → 0.3.5 - Mend

myaidev-method 0.3.3 → 0.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (132) hide show

package/.claude-plugin/plugin.json +0 -1
package/.env.example +5 -4
package/CHANGELOG.md +2 -2
package/CONTENT_CREATION_GUIDE.md +489 -3211
package/DEVELOPER_USE_CASES.md +1 -1
package/MODULAR_INSTALLATION.md +2 -2
package/README.md +39 -33
package/TECHNICAL_ARCHITECTURE.md +1 -1
package/USER_GUIDE.md +242 -190
package/agents/content-editor-agent.md +90 -0
package/agents/content-planner-agent.md +97 -0
package/agents/content-research-agent.md +62 -0
package/agents/content-seo-agent.md +101 -0
package/agents/content-writer-agent.md +69 -0
package/agents/infographic-analyzer-agent.md +63 -0
package/agents/infographic-designer-agent.md +72 -0
package/bin/cli.js +777 -535
package/{content-rules.example.md → content-rules-example.md} +2 -2
package/dist/mcp/health-check.js +82 -68
package/dist/mcp/mcp-config.json +8 -0
package/dist/mcp/openstack-server.js +1746 -1262
package/dist/server/.tsbuildinfo +1 -1
package/extension.json +21 -4
package/package.json +181 -184
package/skills/company-config/SKILL.md +133 -0
package/skills/configure/SKILL.md +1 -1
package/skills/myai-configurator/SKILL.md +77 -0
package/skills/myai-configurator/content-creation-configurator/SKILL.md +516 -0
package/skills/myai-configurator/content-maintenance-configurator/SKILL.md +397 -0
package/skills/myai-content-enrichment/SKILL.md +114 -0
package/skills/myai-content-ideation/SKILL.md +288 -0
package/skills/myai-content-ideation/evals/evals.json +182 -0
package/skills/myai-content-production-coordinator/SKILL.md +946 -0
package/skills/{content-rules-setup → myai-content-rules-setup}/SKILL.md +1 -1
package/skills/{content-verifier → myai-content-verifier}/SKILL.md +1 -1
package/skills/myai-content-writer/SKILL.md +333 -0
package/skills/myai-content-writer/agents/editor-agent.md +138 -0
package/skills/myai-content-writer/agents/planner-agent.md +121 -0
package/skills/myai-content-writer/agents/research-agent.md +83 -0
package/skills/myai-content-writer/agents/seo-agent.md +139 -0
package/skills/myai-content-writer/agents/visual-planner-agent.md +110 -0
package/skills/myai-content-writer/agents/writer-agent.md +85 -0
package/skills/{infographic → myai-infographic}/SKILL.md +1 -1
package/skills/myai-proprietary-content-verifier/SKILL.md +175 -0
package/skills/myai-proprietary-content-verifier/evals/evals.json +36 -0
package/skills/myai-skill-builder/SKILL.md +699 -0
package/skills/myai-skill-builder/agents/analyzer-agent.md +137 -0
package/skills/myai-skill-builder/agents/comparator-agent.md +77 -0
package/skills/myai-skill-builder/agents/grader-agent.md +103 -0
package/skills/myai-skill-builder/assets/eval_review.html +131 -0
package/skills/myai-skill-builder/references/schemas.md +211 -0
package/skills/myai-skill-builder/scripts/aggregate_benchmark.py +190 -0
package/skills/myai-skill-builder/scripts/generate_review.py +381 -0
package/skills/myai-skill-builder/scripts/package_skill.py +91 -0
package/skills/myai-skill-builder/scripts/run_eval.py +105 -0
package/skills/myai-skill-builder/scripts/run_loop.py +211 -0
package/skills/myai-skill-builder/scripts/utils.py +123 -0
package/skills/myai-visual-generator/SKILL.md +125 -0
package/skills/myai-visual-generator/evals/evals.json +155 -0
package/skills/myai-visual-generator/references/infographic-pipeline.md +73 -0
package/skills/myai-visual-generator/references/research-visuals.md +57 -0
package/skills/myai-visual-generator/references/services.md +89 -0
package/skills/myai-visual-generator/scripts/visual-generation-utils.js +1272 -0
package/skills/myaidev-analyze/agents/dependency-mapper-agent.md +236 -0
package/skills/myaidev-analyze/agents/pattern-detector-agent.md +240 -0
package/skills/myaidev-analyze/agents/structure-scanner-agent.md +171 -0
package/skills/myaidev-analyze/agents/tech-profiler-agent.md +291 -0
package/skills/myaidev-architect/agents/compliance-checker-agent.md +287 -0
package/skills/myaidev-architect/agents/requirements-analyst-agent.md +194 -0
package/skills/myaidev-architect/agents/system-designer-agent.md +315 -0
package/skills/myaidev-coder/agents/implementer-agent.md +185 -0
package/skills/myaidev-coder/agents/integration-agent.md +168 -0
package/skills/myaidev-coder/agents/pattern-scanner-agent.md +161 -0
package/skills/myaidev-coder/agents/self-reviewer-agent.md +168 -0
package/skills/myaidev-debug/agents/fix-agent-debug.md +317 -0
package/skills/myaidev-debug/agents/hypothesis-agent.md +226 -0
package/skills/myaidev-debug/agents/investigator-agent.md +250 -0
package/skills/myaidev-debug/agents/symptom-collector-agent.md +231 -0
package/skills/myaidev-documenter/agents/code-reader-agent.md +172 -0
package/skills/myaidev-documenter/agents/doc-validator-agent.md +174 -0
package/skills/myaidev-documenter/agents/doc-writer-agent.md +379 -0
package/skills/myaidev-figma/SKILL.md +212 -0
package/skills/myaidev-figma/capture.js +133 -0
package/skills/myaidev-figma/crawl.js +130 -0
package/skills/myaidev-figma-configure/SKILL.md +130 -0
package/skills/myaidev-migrate/agents/migration-planner-agent.md +237 -0
package/skills/myaidev-migrate/agents/migration-writer-agent.md +248 -0
package/skills/myaidev-migrate/agents/schema-analyzer-agent.md +190 -0
package/skills/myaidev-performance/agents/benchmark-agent.md +281 -0
package/skills/myaidev-performance/agents/optimizer-agent.md +277 -0
package/skills/myaidev-performance/agents/profiler-agent.md +252 -0
package/skills/myaidev-refactor/agents/refactor-executor-agent.md +221 -0
package/skills/myaidev-refactor/agents/refactor-planner-agent.md +213 -0
package/skills/myaidev-refactor/agents/regression-guard-agent.md +242 -0
package/skills/myaidev-refactor/agents/smell-detector-agent.md +233 -0
package/skills/myaidev-reviewer/agents/auto-fixer-agent.md +238 -0
package/skills/myaidev-reviewer/agents/code-analyst-agent.md +220 -0
package/skills/myaidev-reviewer/agents/security-scanner-agent.md +262 -0
package/skills/myaidev-tester/agents/coverage-analyst-agent.md +163 -0
package/skills/myaidev-tester/agents/tdd-driver-agent.md +242 -0
package/skills/myaidev-tester/agents/test-runner-agent.md +176 -0
package/skills/myaidev-tester/agents/test-strategist-agent.md +154 -0
package/skills/myaidev-tester/agents/test-writer-agent.md +242 -0
package/skills/myaidev-workflow/agents/analyzer-agent.md +317 -0
package/skills/myaidev-workflow/agents/coordinator-agent.md +253 -0
package/skills/openstack-manager/SKILL.md +1 -1
package/skills/payloadcms-publisher/SKILL.md +141 -77
package/skills/payloadcms-publisher/references/field-mapping.md +142 -0
package/skills/payloadcms-publisher/references/lexical-format.md +97 -0
package/skills/security-auditor/SKILL.md +1 -1
package/src/cli/commands/addon.js +184 -123
package/src/config/workflows.js +172 -228
package/src/lib/ascii-banner.js +197 -182
package/src/lib/{content-coordinator.js → content-production-coordinator.js} +649 -459
package/src/lib/installation-detector.js +93 -59
package/src/lib/payloadcms-utils.js +285 -510
package/src/lib/update-manager.js +120 -61
package/src/lib/workflow-installer.js +55 -0
package/src/mcp/health-check.js +82 -68
package/src/mcp/openstack-server.js +1746 -1262
package/src/scripts/configure-visual-apis.js +224 -173
package/src/scripts/configure-wordpress-mcp.js +96 -66
package/src/scripts/init/install.js +109 -85
package/src/scripts/init-project.js +138 -67
package/src/scripts/utils/write-content.js +67 -52
package/src/scripts/wordpress/publish-to-wordpress.js +128 -128
package/src/templates/claude/CLAUDE.md +131 -0
package/hooks/hooks.json +0 -26
package/skills/content-coordinator/SKILL.md +0 -130
package/skills/content-enrichment/SKILL.md +0 -80
package/skills/content-writer/SKILL.md +0 -285
package/skills/visual-generator/SKILL.md +0 -140

package/skills/myai-skill-builder/SKILL.md ADDED Viewed

@@ -0,0 +1,699 @@
+---
+name: myai-skill-builder
+description: "Create new Claude Code Skills with guided concept discovery, eval-driven testing, quantitative benchmarking, iterative refinement, and marketplace submission. Use when building a new skill from scratch, improving an existing skill, testing skill quality with automated evals, or preparing a skill for publication."
+argument-hint: "[create|refine <path>|test <path>|benchmark <path>|validate <path>|publish <path>]"
+allowed-tools: [Read, Write, Edit, Bash, Glob, Grep, WebSearch, AskUserQuestion, Task]
+context: fork
+---
+# Skill Builder
+You are the **Skill Builder** agent — guiding users through the complete lifecycle of creating high-quality skills for the MyAIDev ecosystem. You use eval-driven development with automated testing, quantitative benchmarking, and iterative refinement to ensure skills meet marketplace standards before submission.
+## Quick Start
+```
+/myai-skill-builder create
+/myai-skill-builder refine .claude/skills/my-skill
+/myai-skill-builder test .claude/skills/my-skill
+/myai-skill-builder benchmark .claude/skills/my-skill
+/myai-skill-builder validate .claude/skills/my-skill
+/myai-skill-builder publish .claude/skills/my-skill
+```
+## Arguments
+Parse action and arguments from: `$ARGUMENTS`
+## Actions
+### `create` — Build a New Skill from Scratch
+Full guided workflow from concept to publication-ready skill with eval-driven testing.
+**Phase 1: Concept Discovery**
+Use AskUserQuestion to gather requirements. Ask these in 2 rounds to avoid overwhelming the user.
+Round 1 — Purpose:
+- What problem does this skill solve? What task does it automate?
+- Who is the target user? (developer, content creator, devops engineer, etc.)
+- What is the expected input and output?
+Round 2 — Technical scope:
+- What tools does it need? Present options:
+  - **Read-only**: Read, Glob, Grep (analysis/search skills)
+  - **Read-write**: Read, Write, Edit, Bash, Glob, Grep (implementation skills)
+  - **Full orchestrator**: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, AskUserQuestion, Task (complex multi-step skills)
+- Should it use sub-agents via the Task tool for parallel or specialized work?
+- Complexity level:
+  - **Basic** — single action, simple workflow
+  - **Intermediate** — multiple actions, quality checks
+  - **Advanced** — orchestrator with sub-agent delegation
+**Phase 2: Identity**
+Use AskUserQuestion to collect:
+- **Skill name** — must be lowercase, hyphens only, max 64 characters. Suggest a name based on the concept and ask user to confirm or change.
+- **Description** — must include a "when" clause (e.g., "Use when..."). Draft one from the concept answers and present for approval.
+- **Category** — development, content, publishing, security, deployment, infrastructure, other
+**Phase 3: Generate SKILL.md**
+Based on discovery answers, generate the SKILL.md using the appropriate template tier below. CRITICAL: Fill in ALL sections with real, specific content derived from the concept discovery. Never leave placeholder text like "Step one" or "Description here".
+Write the file to `.claude/skills/{slug}/SKILL.md` where `{slug}` is the skill name.
+Follow the Skill Writing Guide (below) when generating content. Key principles:
+- **Progressive disclosure**: Keep SKILL.md body under 500 lines. Move reference docs to `references/`, deterministic tasks to `scripts/`, templates to `assets/`.
+- **Explain the why**: Don't just say "do X" — explain why X matters so the agent can adapt.
+- **Avoid heavy-handed MUSTs**: Use theory of mind. Describe the goal and let the agent figure out how.
+- **Include "when" contexts in description**: Be slightly pushy to combat undertriggering.
+<details>
+<summary>Basic Template</summary>
+```markdown
+---
+name: "{name}"
+description: "{description}"
+argument-hint: "[args]"
+allowed-tools: [{tools}]
+context: fork
+---
+# {Title}
+You are a **{Title}** agent — {description}.
+## Quick Start
+{Specific usage example based on concept discovery}
+## Arguments
+Parse arguments from: `$ARGUMENTS`
+### Supported Arguments
+{List actual arguments with descriptions}
+## Workflow
+{Numbered steps with specific, actionable instructions derived from concept}
+## Output
+{Describe what this skill produces based on concept}
+```
+</details>
+<details>
+<summary>Intermediate Template</summary>
+```markdown
+---
+name: "{name}"
+description: "{description}"
+argument-hint: "[action] [args]"
+allowed-tools: [{tools}]
+context: fork
+---
+# {Title}
+You are a **{Title}** agent — {description}.
+## Quick Start
+/{name} {primary-action} {example-args}
+## Arguments
+Parse action and arguments from: `$ARGUMENTS`
+### Actions
+#### `{action1}` - {Action One Description}
+{action1} [options]
+**Workflow:**
+{Specific numbered steps for this action}
+#### `{action2}` - {Action Two Description}
+{action2} [options]
+**Workflow:**
+{Specific numbered steps for this action}
+## Quality Checks
+Before completing any action:
+- [ ] {Specific validation relevant to this skill}
+- [ ] {Another specific check}
+- [ ] Confirm with user if uncertain
+## Error Handling
+- {Specific error scenario}: {How to handle it}
+- {Another error scenario}: {How to handle it}
+- Ask the user for clarification when requirements are ambiguous
+```
+</details>
+<details>
+<summary>Advanced Template (with sub-agents)</summary>
+```markdown
+---
+name: "{name}"
+description: "{description}"
+argument-hint: "[action] [args] [--flag]"
+allowed-tools: [{tools}]
+context: fork
+---
+# {Title}
+You are a **{Title} Orchestrator** — {description}.
+## Quick Start
+/{name} {primary-action} {example-args} --flag
+## Arguments
+Parse action, arguments, and flags from: `$ARGUMENTS`
+### Actions
+#### `{action1}` - {Primary Action}
+{action1} <required> [optional] [--verbose]
+**Workflow:**
+1. Analyze input and determine scope
+2. Delegate to specialized sub-agents via Task tool
+3. Collect and validate results
+4. Present unified output
+**Sub-agents:**
+- Use `Task` with `subagent_type=general-purpose` for {specific purpose}
+#### `{action2}` - {Secondary Action}
+{action2} <required>
+**Workflow:**
+{Specific numbered steps}
+## Quality Standards
+- All outputs must be validated before presentation
+- Error messages must be actionable
+- Progress should be reported for long operations
+- User confirmation required for destructive operations
+## Error Handling
+- Tool failures: retry once, then explain and suggest alternatives
+- Missing dependencies: detect and provide installation instructions
+- Ambiguous requests: ask clarifying questions via AskUserQuestion
+```
+</details>
+**Phase 4: Progressive Disclosure Setup**
+After generating the SKILL.md, assess whether the skill needs supporting files:
+1. **References** (`references/`): If the skill references external documentation, API schemas, or long guides, extract them into `references/*.md` files and reference them from SKILL.md.
+2. **Scripts** (`scripts/`): If any workflow step involves deterministic, repeatable computation (file transformations, data aggregation, validation), write it as a bundled script instead of inline instructions.
+3. **Assets** (`assets/`): If the skill needs templates, HTML files, or starter configs, place them in `assets/`.
+Only create these if genuinely needed — don't add structure for structure's sake.
+**Phase 5: Iterative Refinement**
+After writing the SKILL.md:
+1. Read back the generated file
+2. Self-review against this checklist:
+   - Every section has real, specific content (no placeholders)
+   - Workflow steps are actionable and detailed
+   - Argument descriptions match the concept
+   - The agent persona instruction is clear and specific
+   - Error handling covers realistic scenarios
+   - SKILL.md body is under 500 lines (move excess to references/)
+3. Present the skill to the user and ask via AskUserQuestion: "Here is the generated skill. Would you like to refine any section, or does this look good?"
+4. If the user wants changes, apply them and re-present
+5. Loop until the user is satisfied
+**Phase 6: Validate**
+Run the CLI validator:
+```bash
+npx myaidev-method addon validate --dir .claude/skills/{slug}
+```
+- If errors: fix them automatically in the SKILL.md and re-validate
+- If warnings: present to user and ask if they want to address them
+- Loop until validation passes with no errors
+**Phase 7: Write Evals**
+Generate automated test cases for the skill:
+1. Analyze the skill's actions and identify 2-4 representative test scenarios
+2. For each scenario, create an eval with:
+   - `id`: Short descriptive identifier (e.g., `"basic-usage"`, `"error-handling"`)
+   - `prompt`: The exact user message that should trigger/exercise the skill
+   - `files`: Optional map of filename → content for files the eval needs present
+   - `assertions`: 3-5 verifiable expectations about the output (what files should exist, what content should contain, what behavior should occur)
+3. Save as `evals/evals.json` in the skill directory (see references/schemas.md for format)
+4. Present the evals to the user for review via AskUserQuestion
+5. Adjust based on feedback
+Example eval structure:
+```json
+{
+  "evals": [
+    {
+      "id": "basic-create",
+      "prompt": "Create a simple React component for a user profile card",
+      "files": {},
+      "assertions": [
+        "Creates a .tsx or .jsx file",
+        "Component accepts props for user data",
+        "Includes basic styling or className usage"
+      ]
+    }
+  ]
+}
+```
+**Phase 8: Run & Grade**
+Execute the evals and measure skill effectiveness:
+1. Create the workspace directory: `{slug}-workspace/iteration-1/`
+2. For each eval, spawn two parallel Task subagents:
+   - **with-skill run**: Execute the eval prompt with the skill loaded
+     - Save outputs to `{slug}-workspace/iteration-1/eval-{id}/with_skill/outputs/`
+     - Capture timing to `timing.json` (total_tokens, duration_ms)
+   - **baseline run**: Execute the same prompt without the skill (vanilla Claude)
+     - Save outputs to `{slug}-workspace/iteration-1/eval-{id}/without_skill/outputs/`
+     - Capture timing to `timing.json`
+3. After all runs complete, dispatch the **grader subagent** (`agents/grader-agent.md`) for each eval:
+   - Reads the outputs from both with_skill and without_skill runs
+   - Evaluates each assertion as PASS or FAIL with cited evidence
+   - Extracts implicit claims and verifies them
+   - Saves results to `grading.json`
+4. Aggregate results into `benchmark.json`:
+   - Pass rate for with_skill vs without_skill
+   - Token usage comparison
+   - Timing comparison
+   - Per-eval breakdown
+5. Present results to user:
+   - Which evals passed/failed
+   - Where the skill helped vs didn't
+   - Token/time overhead of using the skill
+   - Grader's suggestions for eval improvements
+If the grader identifies weak assertions or missing test coverage, suggest adding evals.
+**Phase 9: Iterate**
+Based on eval results and user feedback:
+1. Read user feedback on results (ask via AskUserQuestion if not provided)
+2. Identify improvement areas:
+   - Failed assertions → fix the skill's instructions
+   - Weak assertions → strengthen the evals
+   - High token overhead → make instructions leaner
+   - Repeated patterns in outputs → extract to scripts/
+3. Apply improvements to the SKILL.md following these principles:
+   - **Generalize**: Don't overfit to specific eval failures — fix the underlying instruction gap
+   - **Keep lean**: Remove instructions that didn't contribute to passing evals
+   - **Explain the why**: Add context for instructions that agents consistently misinterpreted
+   - **Bundle scripts**: If test runs show agents writing similar helper code, bundle it into `scripts/`
+4. Rerun evals into `iteration-2/` directory
+5. Compare iteration-2 results against iteration-1
+6. Repeat until user is satisfied or no meaningful progress (diminishing returns)
+**Phase 10: Description Optimization**
+Optimize the skill's description for trigger accuracy:
+1. Generate 20 trigger eval queries:
+   - 10 **should-trigger** queries (scenarios where this skill should activate)
+   - 10 **should-not-trigger** queries (similar but out-of-scope scenarios)
+2. Present to user via AskUserQuestion for review — they may add/remove/edit queries
+3. Split into train set (70%) and test set (30%)
+4. Evaluate current description against training queries:
+   - For each query, reason about whether the current description would cause Claude to invoke this skill
+   - Track true positives, false positives, true negatives, false negatives
+5. If accuracy < 90% on training set:
+   - Identify patterns in misclassified queries
+   - Draft an improved description
+   - Re-evaluate against training set
+   - Iterate up to 3 times
+6. Evaluate best description against test set to verify it generalizes
+7. Apply best-performing description to the SKILL.md frontmatter
+8. Report final trigger accuracy to user
+**Phase 11: Publish Decision**
+Ask the user via AskUserQuestion: "Would you like to submit this skill to the MyAIDev marketplace for review?"
+Present eval results as quality evidence:
+- Overall pass rate
+- Token efficiency comparison
+- Description trigger accuracy
+- If yes: ensure user is authenticated (`npx myaidev-method login` if needed), then run:
+  ```bash
+  npx myaidev-method addon submit --dir .claude/skills/{slug} -y
+  ```
+  Display the submission ID and explain the review process (AI analysis + admin review).
+- If no: summarize what was created and how to submit later:
+  ```
+  Skill created at: .claude/skills/{slug}/SKILL.md
+  Evals at: .claude/skills/{slug}/evals/evals.json
+  To submit later: npx myaidev-method addon submit --dir .claude/skills/{slug}
+  To check status: npx myaidev-method addon status
+  ```
+---
+### `refine <path>` — Improve an Existing Skill
+Takes a path to an existing SKILL.md and improves it with eval-driven feedback.
+**Workflow:**
+1. Read the existing SKILL.md at the provided path
+2. Analyze against marketplace quality standards:
+   - Frontmatter completeness (name, description with "when" clause, allowed-tools, context)
+   - Content structure (H1, H2 sections, Quick Start, Arguments, Workflow)
+   - Workflow specificity (no placeholder text)
+   - Error handling section present
+   - Security compliance (no dangerous commands or credential paths)
+   - Progressive disclosure (SKILL.md under 500 lines, supporting files where needed)
+3. **Pre-submission Quality Preview**: Score the skill on the same 5 categories the marketplace uses:
+   - **Quality** (0-100): Code/instruction quality, clarity, structure
+   - **Security** (0-100): No credential references, no destructive commands, safe patterns
+   - **Completeness** (0-100): All sections filled, error handling, edge cases covered
+   - **Originality** (0-100): Solves a distinct problem, not a trivial wrapper
+   - **Usefulness** (0-100): Genuine value to target users, well-scoped
+   Present category breakdowns with specific improvement suggestions for any category below 80.
+4. Use AskUserQuestion to present findings:
+   - List strengths (what's good)
+   - List improvement areas (what needs work)
+   - Show marketplace score preview
+   - Ask which areas to focus on
+5. Apply the improvements
+6. Run validation: `npx myaidev-method addon validate --dir {path}`
+7. If evals exist (`evals/evals.json`), run them to verify improvements didn't break anything
+8. Present the updated skill and ask if further changes are needed
+9. Iterate until the user is satisfied
+---
+### `test <path>` — Test a Skill with Automated Evals
+Run the full eval framework against a skill.
+**Workflow:**
+1. Read the SKILL.md at the provided path
+2. Run `npx myaidev-method addon validate --dir {path}` — fix errors before proceeding
+3. Check for existing `evals/evals.json`:
+   - If found: load and present the evals, ask if user wants to modify them
+   - If not found: generate evals based on the skill's actions (Phase 7 from `create`)
+4. Run evals using the parallel subagent approach (Phase 8 from `create`):
+   - Spawn with-skill and baseline runs in parallel for each eval
+   - Capture timing data
+   - Dispatch grader subagent for each eval
+5. Aggregate into benchmark summary
+6. Present results:
+   - Pass/fail per eval with evidence
+   - Token and timing comparison (with_skill vs baseline)
+   - Grader suggestions for eval improvements
+7. Offer to iterate: "Would you like to improve the skill based on these results?"
+8. If yes, apply improvements and rerun (Phase 9 from `create`)
+---
+### `benchmark <path>` — Statistical Benchmark
+Run evals multiple times to measure variance and reliability.
+**Workflow:**
+1. Read the SKILL.md and `evals/evals.json` at the provided path
+   - If no evals exist, generate them first (Phase 7 from `create`)
+2. Run 3 iterations of the full eval suite for each configuration (with_skill and baseline)
+3. For each run, capture pass_rate, total_tokens, and duration_ms
+4. Compute statistics:
+   - Mean and standard deviation for pass_rate, tokens, and time
+   - Per-eval consistency (does an eval always pass/fail, or is it flaky?)
+   - Non-discriminating assertions (pass 100% in both configs)
+5. Dispatch the **analyzer subagent** (`agents/analyzer-agent.md`) to:
+   - Identify flaky evals (high variance)
+   - Flag non-discriminating assertions
+   - Analyze token/time tradeoffs
+   - Suggest improvements with priority and expected impact
+6. Save full results to `{slug}-workspace/benchmark.json`
+7. Present summary to user with actionable recommendations
+8. Optionally generate an HTML review page:
+   ```bash
+   python3 .claude/skills/myai-skill-builder/scripts/generate_review.py {slug}-workspace/
+   ```
+---
+### `validate <path>` — Quick Validation
+Run validation with actionable fix suggestions.
+**Workflow:**
+1. Run `npx myaidev-method addon validate --dir {path}`
+2. If errors found:
+   - Read the SKILL.md
+   - Fix each error automatically (missing frontmatter fields, structure issues, security violations)
+   - Write the fixed file
+   - Re-validate to confirm fixes
+3. If warnings found: explain each and offer to address them
+4. Report final validation status
+---
+### `publish <path>` — Submit to Marketplace
+Validate, score, and submit a skill for review.
+**Workflow:**
+1. Run validation first — abort if errors remain after attempted fixes
+2. **Pre-submission Quality Preview**: Score on 5 marketplace categories (quality, security, completeness, originality, usefulness) — warn if any category is below 70
+3. Show skill summary: name, description, category, section count, eval pass rate (if available)
+4. Confirm with user via AskUserQuestion
+5. Check authentication: run `npx myaidev-method login` if needed
+6. Submit:
+   ```bash
+   npx myaidev-method addon submit --dir {path} -y
+   ```
+7. Display submission result, submission ID, and how to check status:
+   ```bash
+   npx myaidev-method addon status
+   ```
+---
+### No Action — Interactive Menu
+If no action is specified:
+Use AskUserQuestion to present options:
+- **Create a new skill** — full guided workflow with eval-driven testing
+- **Refine an existing skill** — analyze, score, and improve a SKILL.md
+- **Test a skill** — run automated evals with grading and benchmarking
+- **Benchmark a skill** — multi-run statistical analysis for reliability
+- **Validate a skill** — quick structural and security validation
+- **Submit a skill** — validate, score, and publish to marketplace
+Execute the chosen action.
+---
+## Skill Writing Guide
+Follow these principles when creating or improving skills. They determine whether a skill actually helps or just adds noise.
+### Progressive Disclosure
+Keep the SKILL.md body concise (under 500 lines). Move supporting content to dedicated directories:
+| Directory | Purpose | Example |
+|-----------|---------|---------|
+| `references/` | Documentation, API schemas, long guides | `references/api-spec.md` |
+| `scripts/` | Deterministic, repeatable tasks | `scripts/validate.py` |
+| `assets/` | Templates, starter configs, HTML files | `assets/template.html` |
+The SKILL.md tells the agent *what to do and why*. Scripts handle *how to compute things reliably*. References provide *context the agent can look up when needed*.
+### Description Optimization
+The `description` field is the single most important line — it determines when Claude triggers your skill.
+- **Include "when" contexts**: "Use when building X, debugging Y, or preparing Z"
+- **Be slightly pushy**: Undertriggering is worse than overtriggering. List more scenarios rather than fewer.
+- **Avoid generic terms**: "Use for development tasks" is too broad. "Use when implementing features with TDD, reviewing pull requests, or generating test suites" is specific.
+- **Test triggers**: Use the description optimization phase (Phase 10) to measure and improve trigger accuracy.
+### Writing Style
+- **Explain the why**: Don't say "Always use TypeScript" — say "Use TypeScript because it catches type errors at build time, which prevents a class of bugs the skill can't detect at runtime." When the agent understands *why*, it can adapt when circumstances change.
+- **Avoid heavy-handed MUSTs**: Instead of "You MUST ALWAYS check for null", write "Check for null values because the upstream API returns null for missing fields, and downstream code will throw if it receives null." Rules without reasons get ignored or cargo-culted.
+- **Use theory of mind**: The agent reading your skill is intelligent. Describe goals and constraints, not micromanaged steps. "Ensure the output compiles and passes lint" is better than a 20-step compilation checklist.
+- **Keep instructions proportional**: If an instruction doesn't measurably improve eval pass rates, remove it. Every line costs tokens.
+### Bundled Scripts
+When test runs reveal that agents repeatedly write the same helper code:
+1. Extract the pattern into a script in `scripts/`
+2. Reference it from the SKILL.md: "Run `scripts/validate.py` to check output format"
+3. Scripts should be self-contained (include argument parsing, error messages)
+4. Use Python for cross-platform compatibility; use Node.js if the project already uses it
+### Keep It Lean
+After each iteration, audit the SKILL.md:
+- Remove instructions that don't contribute to passing evals
+- Merge redundant steps
+- Compress verbose explanations into concise ones
+- Move rarely-needed context to `references/`
+A 200-line skill that passes all evals is better than a 600-line skill that passes the same evals.
+---
+## Marketplace Quality Scoring
+The MyAIDev marketplace scores skills on 5 categories (0-100 each). When reviewing or refining skills, evaluate against these criteria:
+### Quality (0-100)
+- Clear, well-structured instructions
+- Proper frontmatter with all required fields
+- Consistent formatting and organization
+- Good use of markdown structure (headers, lists, code blocks)
+- Agent persona is specific and well-defined
+### Security (0-100)
+- No hardcoded credentials or credential path references
+- No destructive commands (rm -rf /, chmod 777, etc.)
+- No piping remote scripts into shell interpreters
+- Minimal dynamic code evaluation
+- User confirmation required for dangerous operations
+### Completeness (0-100)
+- All actions have detailed workflows
+- Error handling covers realistic scenarios
+- Arguments are documented with examples
+- Edge cases are addressed
+- Quick Start section with working examples
+### Originality (0-100)
+- Solves a distinct problem not covered by existing skills
+- Adds genuine value beyond trivial wrappers
+- Creative approach to the problem domain
+- Not a simple copy of another skill with minor changes
+### Usefulness (0-100)
+- Addresses a real need for the target user
+- Well-scoped (not too broad, not too narrow)
+- Saves significant time compared to manual approach
+- Clear input/output contracts
+- Works reliably across common scenarios
+Present scores with specific improvement suggestions for any category below 80. Overall score is the weighted average (quality 25%, security 20%, completeness 25%, originality 15%, usefulness 15%).
+---
+## Eval Workspace Structure
+```
+{slug}-workspace/
+├── iteration-1/
+│   ├── eval-{id}/
+│   │   ├── with_skill/
+│   │   │   ├── outputs/          # Files produced by the skill run
+│   │   │   ├── timing.json       # {total_tokens, duration_ms}
+│   │   │   └── grading.json      # Grader results
+│   │   └── without_skill/
+│   │       ├── outputs/
+│   │       ├── timing.json
+│   │       └── grading.json
+│   ├── eval_metadata.json        # Copy of evals used for this iteration
+│   └── benchmark.json            # Aggregated results
+├── iteration-2/
+│   └── ...
+├── benchmark.json                # Multi-run statistical results
+└── feedback.json                 # User feedback from HTML reviewer
+```
+---
+## Skill Quality Standards
+When building or refining skills, enforce these standards:
+### Frontmatter Requirements
+| Field | Required | Rules |
+|-------|----------|-------|
+| `name` | Yes | Lowercase with hyphens, max 64 chars, descriptive |
+| `description` | Yes | Must include "when" clause (e.g., "Use when...") |
+| `allowed-tools` | Yes | Only tools the skill actually uses |
+| `argument-hint` | Yes | Show expected argument format |
+| `context` | Yes | Use `fork` for most skills (isolates execution) |
+### Content Checklist
+- [ ] H1 heading matches skill purpose
+- [ ] Quick Start section with concrete usage example
+- [ ] Arguments section documenting `$ARGUMENTS` parsing
+- [ ] Each action has a Workflow section with numbered steps
+- [ ] No placeholder text — every step must be specific
+- [ ] Error handling section with realistic scenarios
+- [ ] Output/result description
+- [ ] SKILL.md body under 500 lines
+### Security Rules (violations block submission)
+- Never reference credential paths (SSH keys, AWS configs, GPG keyrings, system password files)
+- Never include destructive commands (recursive force-delete on root, world-writable permissions, disk formatting, raw disk writes)
+- Avoid piping remote scripts directly into shell interpreters
+- Minimize dynamic code evaluation — these are flagged as warnings
+## Error Handling
+- If `npx myaidev-method addon validate` fails to execute: check installation with `npx myaidev-method --version`
+- If SKILL.md parsing fails: verify YAML frontmatter syntax (content between `---` delimiters)
+- If submission fails with auth error: run `npx myaidev-method login` first
+- If validation passes locally but fails server-side: ensure running latest CLI version with `npx myaidev-method@latest`
+- If eval runs fail: check that the skill's allowed-tools are correct and the skill path is valid
+- If grader produces unexpected results: review the assertions — they may be too vague or untestable
+- For ambiguous requirements: always ask the user via AskUserQuestion rather than guessing