npm - @intentsolutionsio/skill-creator - Versions diffs - 5.0.0 → 5.0.6 - Mend

@intentsolutionsio/skill-creator 5.0.0 → 5.0.6

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.

Files changed (30)

package/skills/skill-creator/references/validation-rules.md CHANGED

@@ -1,4 +1,5 @@
 # Skill & Plugin Validation Rules
 Sources: [Anthropic docs](https://code.claude.com/docs/en/skills) · Intent Solutions enterprise policy
 Universal validation aligned with the Anthropic 2026 spec. Two tiers: Standard (Anthropic minimum) and Enterprise (our marketplace default — all fields required, zero tolerance for non-standard fields).
@@ -57,6 +58,7 @@ Body must contain all 7 sections (hard ERROR if any missing):
 ```
 Supporting files required (gold standard):
 - `PRD.md` must exist in skill root — Product Requirements Document
 - `ARD.md` must exist in skill root — Architecture Requirements Document
 - `references/` directory must exist (plural directory, NOT `reference.md` singular)
@@ -133,11 +135,13 @@ capabilities: []                  # NOTE: valid for agents ONLY, not skills
 ```
 **Plugin agents CANNOT use** (WARN if present):
 - `hooks` — plugin-level only, not agent-level
 - `mcpServers` — plugin-level only
 - `permissionMode` — standalone agent only, not plugin-scoped
 **Invalid for agents** (ERROR):
 - `expertise_level`, `activation_priority`, `color`, `activation_triggers`, `type`, `category` — invented, not Anthropic
 ---
@@ -237,6 +241,7 @@ Plus MCP tools in `ServerName:tool_name` format.
 | Enterprise | Error |
 Valid scoped patterns:
 ```
 Bash(git:*)
 Bash(npm:*)
@@ -284,6 +289,7 @@ Validate MCP server configuration structure.
 ### 7. Roll Up Plugin Score
 Plugin score = weighted average of component scores:
 - Skills: 50% weight
 - Agents: 20% weight
 - Commands: 15% weight
@@ -311,11 +317,13 @@ Anthropic defines 14 valid fields for agents. `name` and `description` are REQUI
 ### Context-Aware Rules
 **Plugin agents** (`plugins/*/agents/*.md`):
 - WARN if `hooks` present (hooks belong at plugin level, not agent level)
 - WARN if `mcpServers` present (plugin-level concern)
 - WARN if `permissionMode` present (standalone-only field)
 **Standalone agents** (`~/.claude/agents/*.md`):
 - All fields valid without restriction
 ### Invalid Agent Fields (ERROR)
@@ -409,6 +417,7 @@ The command runs at skill activation time. Output is injected verbatim into the
 ## String Substitution Validation
 If SKILL.md body contains `$ARGUMENTS` or `$0`, `$1`, etc.:
 - `argument-hint` SHOULD be set in frontmatter (WARNING if missing)
 - Instructions SHOULD handle empty `$ARGUMENTS` case
 - `$ARGUMENTS[N]` indexing should be sequential from 0
@@ -420,12 +429,14 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 ## Validation Process
 ### Pre-flight
 1. File exists and is readable
 2. YAML frontmatter parses without error
 3. Frontmatter separator (`---`) present at start and end
 4. No non-standard fields present (ERROR on any invented/deprecated field)
 ### Field Validation
 1. All 8 required fields present (enterprise) or 2 required fields (standard)
 2. Field types correct (string, array, boolean, semver)
 3. Field constraints met (kebab-case, SPDX, valid tool names)
@@ -434,6 +445,7 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 6. Conditional field logic (`context` requires `agent` and vice versa)
 ### Body Validation
 1. Length within limits (301-500 = WARNING, >500 = ERROR)
 2. All 7 required sections present (enterprise) — hard ERROR if any missing
 3. No absolute paths outside code blocks
@@ -442,15 +454,17 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 6. `references/` directory exists (enterprise)
 ### Resource Validation
 1. All `${CLAUDE_SKILL_DIR}/scripts/*` references exist
 2. All `${CLAUDE_SKILL_DIR}/references/*` references exist
 3. All `${CLAUDE_SKILL_DIR}/templates/*` references exist
 4. All `${CLAUDE_SKILL_DIR}/assets/*` references exist
-5. Relative markdown links (e.g., `[ref](references/api.md)`) point to existing files
+5. Relative markdown links (e.g., `ref`) point to existing files
 6. No path escape attempts (`../`)
 7. No empty (0-byte) supporting files (stub detection)
 ### Report
 - Errors: Must fix (blocks pass)
 - Warnings: Should fix (does not block pass)
 - Info: Optional improvements (includes structural advisor suggestions)
@@ -465,21 +479,25 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 INFO-level suggestions emitted after grading. Not scored — purely advisory.
 ### Split to Commands
 - **Trigger**: 3+ kebab-case `## operation-name` sections without `commands/` directory
 - **Suggestion**: Split into individual `commands/*.md` files
 - **Why**: Each operation becomes a separate slash command; skill stays lean
 ### Offload to References
 - **Trigger**: Body sections >20 lines (Output, Error Handling, Examples) without `references/`
 - **Suggestion**: Move to `references/section-name.md` with relative markdown link
 - **Why**: Reduces token footprint; Claude reads on demand
 ### DCI Opportunities
 - **Trigger**: File existence checks, git operations, or tool version detection without DCI
 - **Suggestion**: Add `` !`command` `` directives for auto-detection at activation
 - **Why**: Eliminates discovery tool calls; Claude starts with context pre-loaded
 ### Migrate Commands to Skills
 - **Trigger**: `commands/*.md` files present without corresponding `skills/` entries
 - **Suggestion**: Consider migrating to SKILL.md format for auto-activation
 - **Why**: Skills activate automatically on context; commands require explicit `/name` invocation
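
The Resource Validation steps listed earlier in this file (relative markdown links must point to existing files, no `../` path escapes, no 0-byte supporting files) could be sketched roughly as follows. This is a minimal illustration, not the package's validator: the `check_resources` helper, the `LINK_RE` pattern, and the message wording are all assumptions introduced here.

```python
import re
from pathlib import Path

# Hypothetical pattern for inline markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def check_resources(skill_dir: Path) -> list[str]:
    """Return error messages for broken links, path escapes, and empty files."""
    errors = []
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")

    for target in LINK_RE.findall(body):
        # External URLs, mail links, and in-page anchors are not file references.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        path_part = target.split("#", 1)[0]
        if ".." in Path(path_part).parts:
            errors.append(f"path escape attempt: {target}")
        elif not (skill_dir / path_part).is_file():
            errors.append(f"missing file: {target}")

    # Stub detection: any 0-byte file under the skill directory is reported.
    for f in sorted(skill_dir.rglob("*")):
        if f.is_file() and f.stat().st_size == 0:
            errors.append(f"empty file: {f.relative_to(skill_dir).as_posix()}")

    return errors
```

In a real validator these findings would feed the Errors bucket of the report; the sketch only collects them as strings.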

package/skills/skill-creator/scripts/aggregate_benchmark.py CHANGED

@@ -60,7 +60,7 @@ def calculate_stats(values: list[float]) -> dict:
         "mean": round(mean, 4),
         "stddev": round(stddev, 4),
         "min": round(min(values), 4),
-        "max": round(max(values), 4)
+        "max": round(max(values), 4),
     }
@@ -157,7 +157,9 @@ def load_run_results(benchmark_dir: Path) -> dict:
                 raw_expectations = grading.get("expectations", [])
                 for exp in raw_expectations:
                     if "text" not in exp or "passed" not in exp:
-                        print(f"Warning: expectation in {grading_file} missing required fields (text, passed, evidence): {exp}")
+                        print(
+                            f"Warning: expectation in {grading_file} missing required fields (text, passed, evidence): {exp}"
+                        )
                 result["expectations"] = raw_expectations
                 # Extract notes from user_notes_summary
@@ -189,7 +191,7 @@ def aggregate_results(results: dict) -> dict:
             run_summary[config] = {
                 "pass_rate": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
                 "time_seconds": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
-                "tokens": {"mean": 0, "stddev": 0, "min": 0, "max": 0}
+                "tokens": {"mean": 0, "stddev": 0, "min": 0, "max": 0},
             }
             continue
@@ -200,7 +202,7 @@ def aggregate_results(results: dict) -> dict:
         run_summary[config] = {
             "pass_rate": calculate_stats(pass_rates),
             "time_seconds": calculate_stats(times),
-            "tokens": calculate_stats(tokens)
+            "tokens": calculate_stats(tokens),
         }
     # Calculate delta between the first two configs (if two exist)
@@ -218,7 +220,7 @@ def aggregate_results(results: dict) -> dict:
     run_summary["delta"] = {
         "pass_rate": f"{delta_pass_rate:+.2f}",
         "time_seconds": f"{delta_time:+.1f}",
-        "tokens": f"{delta_tokens:+.0f}"
+        "tokens": f"{delta_tokens:+.0f}",
     }
     return run_summary
@@ -235,30 +237,28 @@ def generate_benchmark(benchmark_dir: Path, skill_name: str = "", skill_path: st
     runs = []
     for config in results:
         for result in results[config]:
-            runs.append({
-                "eval_id": result["eval_id"],
-                "configuration": config,
-                "run_number": result["run_number"],
-                "result": {
-                    "pass_rate": result["pass_rate"],
-                    "passed": result["passed"],
-                    "failed": result["failed"],
-                    "total": result["total"],
-                    "time_seconds": result["time_seconds"],
-                    "tokens": result.get("tokens", 0),
-                    "tool_calls": result.get("tool_calls", 0),
-                    "errors": result.get("errors", 0)
-                },
-                "expectations": result["expectations"],
-                "notes": result["notes"]
-            })
+            runs.append(
+                {
+                    "eval_id": result["eval_id"],
+                    "configuration": config,
+                    "run_number": result["run_number"],
+                    "result": {
+                        "pass_rate": result["pass_rate"],
+                        "passed": result["passed"],
+                        "failed": result["failed"],
+                        "total": result["total"],
+                        "time_seconds": result["time_seconds"],
+                        "tokens": result.get("tokens", 0),
+                        "tool_calls": result.get("tool_calls", 0),
+                        "errors": result.get("errors", 0),
+                    },
+                    "expectations": result["expectations"],
+                    "notes": result["notes"],
+                }
+            )
     # Determine eval IDs from results
-    eval_ids = sorted(set(
-        r["eval_id"]
-        for config in results.values()
-        for r in config
-    ))
+    eval_ids = sorted(set(r["eval_id"] for config in results.values() for r in config))
     benchmark = {
         "metadata": {
@@ -268,11 +268,11 @@ def generate_benchmark(benchmark_dir: Path, skill_name: str = "", skill_path: st
             "analyzer_model": "<model-name>",
             "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
             "evals_run": eval_ids,
-            "runs_per_configuration": 3
+            "runs_per_configuration": 3,
         },
         "runs": runs,
         "run_summary": run_summary,
-        "notes": []  # To be filled by analyzer
+        "notes": [],  # To be filled by analyzer
     }
     return benchmark
@@ -310,25 +310,27 @@ def generate_markdown(benchmark: dict) -> str:
     # Format pass rate
     a_pr = a_summary.get("pass_rate", {})
     b_pr = b_summary.get("pass_rate", {})
-    lines.append(f"| Pass Rate | {a_pr.get('mean', 0)*100:.0f}% ± {a_pr.get('stddev', 0)*100:.0f}% | {b_pr.get('mean', 0)*100:.0f}% ± {b_pr.get('stddev', 0)*100:.0f}% | {delta.get('pass_rate', '—')} |")
+    lines.append(
+        f"| Pass Rate | {a_pr.get('mean', 0) * 100:.0f}% ± {a_pr.get('stddev', 0) * 100:.0f}% | {b_pr.get('mean', 0) * 100:.0f}% ± {b_pr.get('stddev', 0) * 100:.0f}% | {delta.get('pass_rate', '—')} |"
+    )
     # Format time
     a_time = a_summary.get("time_seconds", {})
     b_time = b_summary.get("time_seconds", {})
-    lines.append(f"| Time | {a_time.get('mean', 0):.1f}s ± {a_time.get('stddev', 0):.1f}s | {b_time.get('mean', 0):.1f}s ± {b_time.get('stddev', 0):.1f}s | {delta.get('time_seconds', '—')}s |")
+    lines.append(
+        f"| Time | {a_time.get('mean', 0):.1f}s ± {a_time.get('stddev', 0):.1f}s | {b_time.get('mean', 0):.1f}s ± {b_time.get('stddev', 0):.1f}s | {delta.get('time_seconds', '—')}s |"
+    )
     # Format tokens
     a_tokens = a_summary.get("tokens", {})
     b_tokens = b_summary.get("tokens", {})
-    lines.append(f"| Tokens | {a_tokens.get('mean', 0):.0f} ± {a_tokens.get('stddev', 0):.0f} | {b_tokens.get('mean', 0):.0f} ± {b_tokens.get('stddev', 0):.0f} | {delta.get('tokens', '—')} |")
+    lines.append(
+        f"| Tokens | {a_tokens.get('mean', 0):.0f} ± {a_tokens.get('stddev', 0):.0f} | {b_tokens.get('mean', 0):.0f} ± {b_tokens.get('stddev', 0):.0f} | {delta.get('tokens', '—')} |"
+    )
     # Notes section
     if benchmark.get("notes"):
-        lines.extend([
-            "",
-            "## Notes",
-            ""
-        ])
+        lines.extend(["", "## Notes", ""])
         for note in benchmark["notes"]:
             lines.append(f"- {note}")
@@ -336,28 +338,12 @@ def generate_markdown(benchmark: dict) -> str:
 def main():
-    parser = argparse.ArgumentParser(
-        description="Aggregate benchmark run results into summary statistics"
-    )
-    parser.add_argument(
-        "benchmark_dir",
-        type=Path,
-        help="Path to the benchmark directory"
-    )
-    parser.add_argument(
-        "--skill-name",
-        default="",
-        help="Name of the skill being benchmarked"
-    )
-    parser.add_argument(
-        "--skill-path",
-        default="",
-        help="Path to the skill being benchmarked"
-    )
+    parser = argparse.ArgumentParser(description="Aggregate benchmark run results into summary statistics")
+    parser.add_argument("benchmark_dir", type=Path, help="Path to the benchmark directory")
+    parser.add_argument("--skill-name", default="", help="Name of the skill being benchmarked")
+    parser.add_argument("--skill-path", default="", help="Path to the skill being benchmarked")
     parser.add_argument(
-        "--output", "-o",
-        type=Path,
-        help="Output path for benchmark.json (default: <benchmark_dir>/benchmark.json)"
+        "--output", "-o", type=Path, help="Output path for benchmark.json (default: <benchmark_dir>/benchmark.json)"
     )
     args = parser.parse_args()
@@ -389,11 +375,11 @@ def main():
     configs = [k for k in run_summary if k != "delta"]
     delta = run_summary.get("delta", {})
-    print(f"\nSummary:")
+    print("\nSummary:")
     for config in configs:
         pr = run_summary[config]["pass_rate"]["mean"]
         label = config.replace("_", " ").title()
-        print(f"  {label}: {pr*100:.1f}% pass rate")
+        print(f"  {label}: {pr * 100:.1f}% pass rate")
     print(f"  Delta:         {delta.get('pass_rate', '—')}")
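
The hunks above only show the tail of `calculate_stats` and the delta formatting, so the following is a hedged sketch of what such a helper might look like. The use of the sample standard deviation, the empty-input fallback, and the `format_delta` helper (including the direction of the subtraction) are assumptions, not taken from the package.

```python
import statistics


def calculate_stats(values: list[float]) -> dict:
    """Summarize a list of numbers with the four keys seen in the diff."""
    if not values:
        # Assumption: mirror the all-zero placeholder used for empty configs.
        return {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0}
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation, 0.0 when only one value exists.
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "mean": round(mean, 4),
        "stddev": round(stddev, 4),
        "min": round(min(values), 4),
        "max": round(max(values), 4),
    }


def format_delta(first_mean: float, second_mean: float) -> str:
    """Hypothetical helper: signed difference formatted like the pass_rate delta."""
    return f"{first_mean - second_mean:+.2f}"
```

For example, `calculate_stats([1.0, 2.0, 3.0])` yields a mean of 2.0 and a sample standard deviation of 1.0.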

package/skills/skill-creator/scripts/generate_report.py CHANGED

@@ -16,7 +16,7 @@ from pathlib import Path
 def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "") -> str:
     """Generate HTML report from loop output data. If auto_refresh is True, adds a meta refresh tag."""
     history = data.get("history", [])
-    holdout = data.get("holdout", 0)
+    data.get("holdout", 0)
     title_prefix = html.escape(skill_name + " \u2014 ") if skill_name else ""
     # Get all unique queries from train and test sets, with should_trigger info
@@ -31,11 +31,16 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     refresh_tag = '    <meta http-equiv="refresh" content="5">\n' if auto_refresh else ""
-    html_parts = ["""<!DOCTYPE html>
+    html_parts = [
+        """<!DOCTYPE html>
 <html>
 <head>
     <meta charset="utf-8">
-""" + refresh_tag + """    <title>""" + title_prefix + """Skill Description Optimization</title>
+"""
+        + refresh_tag
+        + """    <title>"""
+        + title_prefix
+        + """Skill Description Optimization</title>
     <link rel="preconnect" href="https://fonts.googleapis.com">
     <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
     <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
@@ -146,21 +151,24 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     </style>
 </head>
 <body>
-    <h1>""" + title_prefix + """Skill Description Optimization</h1>
+    <h1>"""
+        + title_prefix
+        + """Skill Description Optimization</h1>
     <div class="explainer">
         <strong>Optimizing your skill's description.</strong> This page updates automatically as Claude tests different versions of your skill's description. Each row is an iteration — a new description attempt. The columns show test queries: green checkmarks mean the skill triggered correctly (or correctly didn't trigger), red crosses mean it got it wrong. The "Train" score shows performance on queries used to improve the description; the "Test" score shows performance on held-out queries the optimizer hasn't seen. When it's done, Claude will apply the best-performing description to your skill.
     </div>
-"""]
+"""
+    ]
     # Summary section
-    best_test_score = data.get('best_test_score')
-    best_train_score = data.get('best_train_score')
+    best_test_score = data.get("best_test_score")
+    data.get("best_train_score")
     html_parts.append(f"""
     <div class="summary">
-        <p><strong>Original:</strong> {html.escape(data.get('original_description', 'N/A'))}</p>
-        <p class="best"><strong>Best:</strong> {html.escape(data.get('best_description', 'N/A'))}</p>
-        <p><strong>Best Score:</strong> {data.get('best_score', 'N/A')} {'(test)' if best_test_score else '(train)'}</p>
-        <p><strong>Iterations:</strong> {data.get('iterations_run', 0)} | <strong>Train:</strong> {data.get('train_size', '?')} | <strong>Test:</strong> {data.get('test_size', '?')}</p>
+        <p><strong>Original:</strong> {html.escape(data.get("original_description", "N/A"))}</p>
+        <p class="best"><strong>Best:</strong> {html.escape(data.get("best_description", "N/A"))}</p>
+        <p><strong>Best Score:</strong> {data.get("best_score", "N/A")} {"(test)" if best_test_score else "(train)"}</p>
+        <p><strong>Iterations:</strong> {data.get("iterations_run", 0)} | <strong>Train:</strong> {data.get("train_size", "?")} | <strong>Test:</strong> {data.get("test_size", "?")}</p>
     </div>
 """)
@@ -211,10 +219,10 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     # Add rows for each iteration
     for h in history:
         iteration = h.get("iteration", "?")
-        train_passed = h.get("train_passed", h.get("passed", 0))
-        train_total = h.get("train_total", h.get("total", 0))
-        test_passed = h.get("test_passed")
-        test_total = h.get("test_total")
+        h.get("train_passed", h.get("passed", 0))
+        h.get("train_total", h.get("total", 0))
+        h.get("test_passed")
+        h.get("test_total")
         description = h.get("description", "")
         train_results = h.get("train_results", h.get("results", []))
         test_results = h.get("test_results", [])
@@ -272,7 +280,9 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
             icon = "✓" if did_pass else "✗"
             css_class = "pass" if did_pass else "fail"
-            html_parts.append(f'                <td class="result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
+            html_parts.append(
+                f'                <td class="result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n'
+            )
         # Add result for each test query (with different background)
         for qinfo in test_queries:
@@ -284,7 +294,9 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
             icon = "✓" if did_pass else "✗"
             css_class = "pass" if did_pass else "fail"
-            html_parts.append(f'                <td class="result test-result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
+            html_parts.append(
+                f'                <td class="result test-result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n'
+            )
         html_parts.append("            </tr>\n")

package/skills/skill-creator/scripts/improve_description.py CHANGED

@@ -41,9 +41,7 @@ def _call_claude(prompt: str, model: str | None, timeout: int = 300) -> str:
         timeout=timeout,
     )
     if result.returncode != 0:
-        raise RuntimeError(
-            f"claude -p exited {result.returncode}\nstderr: {result.stderr}"
-        )
+        raise RuntimeError(f"claude -p exited {result.returncode}\nstderr: {result.stderr}")
     return result.stdout
@@ -59,14 +57,8 @@ def improve_description(
     iteration: int | None = None,
 ) -> str:
     """Call Claude to improve the description based on eval results."""
-    failed_triggers = [
-        r for r in eval_results["results"]
-        if r["should_trigger"] and not r["pass"]
-    ]
-    false_triggers = [
-        r for r in eval_results["results"]
-        if not r["should_trigger"] and not r["pass"]
-    ]
+    failed_triggers = [r for r in eval_results["results"] if r["should_trigger"] and not r["pass"]]
+    false_triggers = [r for r in eval_results["results"] if not r["should_trigger"] and not r["pass"]]
     # Build scores summary
     train_score = f"{eval_results['summary']['passed']}/{eval_results['summary']['total']}"
@@ -104,9 +96,11 @@ Current scores ({scores_summary}):
         prompt += "PREVIOUS ATTEMPTS (do NOT repeat these — try something structurally different):\n\n"
         for h in history:
             train_s = f"{h.get('train_passed', h.get('passed', 0))}/{h.get('train_total', h.get('total', 0))}"
-            test_s = f"{h.get('test_passed', '?')}/{h.get('test_total', '?')}" if h.get('test_passed') is not None else None
+            test_s = (
+                f"{h.get('test_passed', '?')}/{h.get('test_total', '?')}" if h.get("test_passed") is not None else None
+            )
             score_str = f"train={train_s}" + (f", test={test_s}" if test_s else "")
-            prompt += f'<attempt {score_str}>\n'
+            prompt += f"<attempt {score_str}>\n"
             prompt += f'Description: "{h["description"]}"\n'
             if "results" in h:
                 prompt += "Train results:\n"
@@ -114,7 +108,7 @@ Current scores ({scores_summary}):
                     status = "PASS" if r["pass"] else "FAIL"
                     prompt += f'  [{status}] "{r["query"][:80]}" (triggered {r["triggers"]}/{r["runs"]})\n'
             if h.get("note"):
-                prompt += f'Note: {h["note"]}\n'
+                prompt += f"Note: {h['note']}\n"
             prompt += "</attempt>\n\n"
     prompt += f"""</scores_summary>
@@ -232,13 +226,16 @@ def main():
     # Output as JSON with both the new description and updated history
     output = {
         "description": new_description,
-        "history": history + [{
-            "description": current_description,
-            "passed": eval_results["summary"]["passed"],
-            "failed": eval_results["summary"]["failed"],
-            "total": eval_results["summary"]["total"],
-            "results": eval_results["results"],
-        }],
+        "history": history
+        + [
+            {
+                "description": current_description,
+                "passed": eval_results["summary"]["passed"],
+                "failed": eval_results["summary"]["failed"],
+                "total": eval_results["summary"]["total"],
+                "results": eval_results["results"],
+            }
+        ],
     }
     print(json.dumps(output, indent=2))
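
To make the two list comprehensions in `improve_description` easier to follow, here is a small standalone sketch of how eval results split into missed triggers and false triggers. The `split_failures` wrapper and the sample queries are illustrative assumptions; the field names (`should_trigger`, `pass`) are the ones used in the diff.

```python
def split_failures(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition failing eval results by the kind of mistake.

    failed_triggers: the skill should have triggered but the query failed.
    false_triggers:  the skill should not have triggered but the query failed.
    """
    failed_triggers = [r for r in results if r["should_trigger"] and not r["pass"]]
    false_triggers = [r for r in results if not r["should_trigger"] and not r["pass"]]
    return failed_triggers, false_triggers


# Hypothetical sample data, for illustration only.
sample = [
    {"query": "make a new skill", "should_trigger": True, "pass": True},
    {"query": "scaffold a skill", "should_trigger": True, "pass": False},
    {"query": "what's the weather", "should_trigger": False, "pass": False},
]
missed, spurious = split_failures(sample)
```

Passing results fall into neither list, so the two groups together cover only the queries the description still gets wrong.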

package/skills/skill-creator/scripts/package_skill.py CHANGED

@@ -88,9 +88,9 @@ def package_skill(skill_path, output_dir=None):
     # Create the .skill file (zip format)
     try:
-        with zipfile.ZipFile(skill_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
+        with zipfile.ZipFile(skill_filename, "w", zipfile.ZIP_DEFLATED) as zipf:
             # Walk through the skill directory, excluding build artifacts
-            for file_path in skill_path.rglob('*'):
+            for file_path in skill_path.rglob("*"):
                 if not file_path.is_file():
                     continue
                 arcname = file_path.relative_to(skill_path.parent)
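
The packaging loop above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: the `package_dir` name and its return value are hypothetical, and the build-artifact exclusion mentioned in the source comment is not shown in the hunk, so it is omitted here.

```python
import zipfile
from pathlib import Path


def package_dir(skill_path: Path, output_file: Path) -> list[str]:
    """Zip every file under skill_path and return the archive names written.

    Archive names are relative to the parent of skill_path, so the skill
    folder itself is the top-level entry inside the archive.
    """
    skill_path = skill_path.resolve()
    names = []
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        for file_path in sorted(skill_path.rglob("*")):
            if not file_path.is_file():
                continue  # directories are implied by the file paths
            arcname = file_path.relative_to(skill_path.parent)
            zipf.write(file_path, arcname)
            names.append(arcname.as_posix())
    return names
```

The output file should live outside `skill_path`; otherwise the archive would pick itself up while walking the directory.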

package/skills/skill-creator/scripts/quick_validate.py CHANGED

@@ -4,27 +4,27 @@ Quick validation script for skills - minimal version
 """
 import sys
-import os
 import re
 import yaml
 from pathlib import Path
 def validate_skill(skill_path):
     """Basic validation of a skill"""
     skill_path = Path(skill_path)
     # Check SKILL.md exists
-    skill_md = skill_path / 'SKILL.md'
+    skill_md = skill_path / "SKILL.md"
     if not skill_md.exists():
         return False, "SKILL.md not found"
     # Read and validate frontmatter
     content = skill_md.read_text()
-    if not content.startswith('---'):
+    if not content.startswith("---"):
         return False, "No YAML frontmatter found"
     # Extract frontmatter
-    match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
+    match = re.match(r"^---\n(.*?)\n---", content, re.DOTALL)
     if not match:
         return False, "Invalid frontmatter format"
@@ -39,7 +39,7 @@ def validate_skill(skill_path):
         return False, f"Invalid YAML in frontmatter: {e}"
     # Define allowed properties
-    ALLOWED_PROPERTIES = {'name', 'description', 'license', 'allowed-tools', 'metadata', 'compatibility'}
+    ALLOWED_PROPERTIES = {"name", "description", "license", "allowed-tools", "metadata", "compatibility"}
     # Check for unexpected properties (excluding nested keys under metadata)
     unexpected_keys = set(frontmatter.keys()) - ALLOWED_PROPERTIES
@@ -50,41 +50,41 @@ def validate_skill(skill_path):
         )
     # Check required fields
-    if 'name' not in frontmatter:
+    if "name" not in frontmatter:
         return False, "Missing 'name' in frontmatter"
-    if 'description' not in frontmatter:
+    if "description" not in frontmatter:
         return False, "Missing 'description' in frontmatter"
     # Extract name for validation
-    name = frontmatter.get('name', '')
+    name = frontmatter.get("name", "")
     if not isinstance(name, str):
         return False, f"Name must be a string, got {type(name).__name__}"
     name = name.strip()
     if name:
         # Check naming convention (kebab-case: lowercase with hyphens)
-        if not re.match(r'^[a-z0-9-]+$', name):
+        if not re.match(r"^[a-z0-9-]+$", name):
             return False, f"Name '{name}' should be kebab-case (lowercase letters, digits, and hyphens only)"
-        if name.startswith('-') or name.endswith('-') or '--' in name:
+        if name.startswith("-") or name.endswith("-") or "--" in name:
             return False, f"Name '{name}' cannot start/end with hyphen or contain consecutive hyphens"
         # Check name length (max 64 characters per spec)
         if len(name) > 64:
             return False, f"Name is too long ({len(name)} characters). Maximum is 64 characters."
     # Extract and validate description
-    description = frontmatter.get('description', '')
+    description = frontmatter.get("description", "")
     if not isinstance(description, str):
         return False, f"Description must be a string, got {type(description).__name__}"
     description = description.strip()
     if description:
         # Check for angle brackets
-        if '<' in description or '>' in description:
+        if "<" in description or ">" in description:
             return False, "Description cannot contain angle brackets (< or >)"
         # Check description length (max 1024 characters per spec)
         if len(description) > 1024:
             return False, f"Description is too long ({len(description)} characters). Maximum is 1024 characters."
     # Validate compatibility field if present (optional)
-    compatibility = frontmatter.get('compatibility', '')
+    compatibility = frontmatter.get("compatibility", "")
     if compatibility:
         if not isinstance(compatibility, str):
             return False, f"Compatibility must be a string, got {type(compatibility).__name__}"
@@ -93,11 +93,12 @@ def validate_skill(skill_path):
     return True, "Skill is valid!"
 if __name__ == "__main__":
     if len(sys.argv) != 2:
         print("Usage: python quick_validate.py <skill_directory>")
         sys.exit(1)
     valid, message = validate_skill(sys.argv[1])
     print(message)
-    sys.exit(0 if valid else 1)
+    sys.exit(0 if valid else 1)
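
The name checks in `validate_skill` can be pulled into a standalone function for quick experimentation. The sketch below reuses the regular expression, hyphen rules, and 64-character limit visible in the diff; the `validate_name` wrapper and the shortened messages are assumptions made for illustration.

```python
import re


def validate_name(name: str) -> tuple[bool, str]:
    """Apply the kebab-case, hyphen-placement, and length rules to a skill name."""
    name = name.strip()
    if not re.match(r"^[a-z0-9-]+$", name):
        return False, f"Name '{name}' should be kebab-case"
    if name.startswith("-") or name.endswith("-") or "--" in name:
        return False, f"Name '{name}' has a leading, trailing, or doubled hyphen"
    if len(name) > 64:
        return False, f"Name is too long ({len(name)} characters)"
    return True, "ok"
```

Unlike the original, this sketch treats an empty name as invalid because the regular expression requires at least one character; the original only runs these checks when the stripped name is non-empty.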

package/skills/skill-creator/scripts/run_eval.py CHANGED

@@ -101,8 +101,10 @@ def run_single_query(
         cmd = [
             "claude",
-            "-p", query,
-            "--output-format", "stream-json",
+            "-p",
+            query,
+            "--output-format",
+            "stream-json",
             "--verbose",
             "--include-partial-messages",
         ]
@@ -265,14 +267,16 @@ def run_eval(
             did_pass = trigger_rate >= trigger_threshold
         else:
             did_pass = trigger_rate < trigger_threshold
-        results.append({
-            "query": query,
-            "should_trigger": should_trigger,
-            "trigger_rate": trigger_rate,
-            "triggers": sum(triggers),
-            "runs": len(triggers),
-            "pass": did_pass,
-        })
+        results.append(
+            {
+                "query": query,
+                "should_trigger": should_trigger,
+                "trigger_rate": trigger_rate,
+                "triggers": sum(triggers),
+                "runs": len(triggers),
+                "pass": did_pass,
+            }
+        )
     passed = sum(1 for r in results if r["pass"])
     total = len(results)
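
The pass/fail decision in `run_eval` depends on whether a query is expected to trigger the skill. A minimal sketch of that rule is shown below; the `judge_query` helper, the way `trigger_rate` is computed from a list of booleans, and the default threshold of 0.5 are assumptions, since those parts fall outside the hunks shown.

```python
def judge_query(
    query: str,
    triggers: list[bool],
    should_trigger: bool,
    trigger_threshold: float = 0.5,  # assumed default, not taken from the diff
) -> dict:
    """Build one result record from repeated runs of a single query."""
    # Assumption: the rate is the fraction of runs in which the skill triggered.
    trigger_rate = sum(triggers) / len(triggers) if triggers else 0.0
    if should_trigger:
        did_pass = trigger_rate >= trigger_threshold
    else:
        did_pass = trigger_rate < trigger_threshold
    return {
        "query": query,
        "should_trigger": should_trigger,
        "trigger_rate": trigger_rate,
        "triggers": sum(triggers),
        "runs": len(triggers),
        "pass": did_pass,
    }
```

With this rule a positive query passes when it triggers often enough, and a negative query passes only when it stays below the same threshold.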