npm - @intentsolutionsio/skill-creator - Versions diffs - 5.0.0 → 5.0.3 - Mend

@intentsolutionsio/skill-creator 5.0.0 → 5.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33) hide show

package/skills/skill-creator/references/source-of-truth.md CHANGED Viewed

@@ -3,6 +3,7 @@
 Canonical reference from [Anthropic docs](https://code.claude.com/docs/en/skills). Last synced: 2026-03-21.
 Additional references:
 - **Claude Code Extensions** — platform-specific fields ([changelog](https://code.claude.com/docs/en/changelog))
 - **Anthropic Engineering Blog** — progressive disclosure, degrees of freedom
 - **anthropics/skills** — [github.com/anthropics/skills](https://github.com/anthropics/skills) (official skill-creator reference implementation)
@@ -100,6 +101,7 @@ tags: [devops, automation]
 MCP tools use `ServerName:tool_name` format.
 Bash scoping patterns:
 ```yaml
 Bash(git:*)       # All git commands
 Bash(npm:*)       # All npm commands
@@ -141,6 +143,7 @@ Agents live in `agents/*.md` and use a different frontmatter schema than skills.
 ### Plugin Agent Restrictions
 When agents are distributed inside plugins, these fields are NOT supported:
 - `hooks`
 - `mcpServers`
 - `permissionMode`
@@ -222,17 +225,20 @@ skill-name/
 Skills use progressive disclosure to minimize context window usage.
 ### Level 1: Metadata (~100 tokens)
 - Frontmatter `name` and `description` only
 - Always loaded at startup for all installed skills
 - Aggregated into skill list in Claude's system prompt
 - **Budget**: ~2% of context window (configurable via `SLASH_COMMAND_TOOL_CHAR_BUDGET`)
 ### Level 2: SKILL.md Body (<5000 tokens / <500 lines)
 - Full instruction body loaded when skill activates
 - Contains workflow steps, examples, edge cases
 - Keep concise — Claude is already capable
 ### Level 3: Bundled Resources (unlimited)
 - `references/`, `scripts/`, `templates/`, `assets/`
 - Loaded only when explicitly needed during execution
 - Use clear section headers for navigability
@@ -307,7 +313,9 @@ description: "A helpful tool for documents"
 ## 6. Core Principles (Anthropic Official)
 ### Concise is Key
 Claude is already smart. Don't over-explain. Provide:
 - Clear workflow steps
 - Concrete examples
 - Edge cases that matter
@@ -343,6 +351,7 @@ Choose the right level. Over-constraining wastes tokens and fights Claude's capa
 ### Checklist Workflow Pattern
 For complex skills, structure the body as a checklist that Claude works through sequentially. Each item should be:
 - A concrete action (not a vague instruction)
 - Independently verifiable (Claude can confirm it's done)
 - Ordered by dependency (prerequisites first)
@@ -352,15 +361,17 @@ This pattern reduces skipped steps and improves consistency across models (Haiku
 ### Observation of Claude Navigation
 Claude navigates SKILL.md and references differently than humans:
 - **Reads top-down on first activation** — front-load the most important instructions
 - **Searches by heading** when returning to a section — use descriptive H2/H3 headers
-- **Follows markdown links eagerly** — a `[reference](./references/foo.md)` link will trigger a Read tool call
+- **Follows markdown links eagerly** — a `reference` link will trigger a Read tool call
 - **Skips content after long code blocks** — keep code examples short, move long ones to references
 - **Loses context in long files** — the 500-line limit exists because Claude's attention degrades past it
 ### Team Feedback
 When multiple authors maintain skills in a shared plugin:
 - Establish a shared glossary of terms used in descriptions (prevents synonym drift)
 - Use PR review checklists that include trigger-eval accuracy checks
 - Rotate skill ownership periodically to catch assumptions baked into instructions
@@ -369,16 +380,19 @@ When multiple authors maintain skills in a shared plugin:
 ### Description Optimization ("Pushy" Pattern)
 Skills frequently undertrigger because descriptions are too passive. Use aggressive claiming language:
 - "Make sure to use this skill whenever..." + specific scenarios
 - Front-load distinctive keywords
 - Include trigger phrases: "Use when...", "Activates for..."
 - Token budget: all descriptions load at startup (~15,000 char total via `SLASH_COMMAND_TOOL_CHAR_BUDGET`)
 ### No Time-Sensitive Information
 - Don't include dates, versions, or URLs that change
 - Reference tools by name, not version
 ### Consistent Terminology
 - Pick terms and stick with them throughout
 - Don't alternate between synonyms
 - Match terminology to the domain
@@ -445,7 +459,7 @@ Available in SKILL.md body for dynamic content:
 | `${CLAUDE_PLUGIN_ROOT}` | Hooks, plugin-level | Resolves to plugin root directory |
 | `${CLAUDE_PLUGIN_DATA}` | Persistent state | Survives updates/reinstalls (v2.1.78+) |
-Relative markdown links (`[API Reference](reference.md)`) work without path variables — Claude follows these with the Read tool on demand.
+Relative markdown links (`API Reference`) work without path variables — Claude follows these with the Read tool on demand.
 ### Usage Examples
@@ -488,59 +502,83 @@ Session tracking: ${CLAUDE_SESSION_ID}
 ## 9. Skill Patterns
 ### Script Automation
 Deterministic scripts that solve specific problems.
 ```
 skill activates -> runs script -> returns result
 ```
 Best for: file conversion, data transformation, API calls.
 ### Read-Process-Write
 Format conversion and transformation pipeline.
 ```
 read input -> process/transform -> write output
 ```
 Best for: document conversion, code generation, data formatting.
 ### Search-Analyze-Report
 Codebase analysis and reporting.
 ```
 search codebase -> analyze findings -> generate report
 ```
 Best for: code review, security audit, dependency analysis.
 ### Template-Based Generation
 Generate output from templates with variable substitution.
 ```
 load template -> fill variables -> validate -> output
 ```
 Best for: boilerplate generation, project scaffolding, config files.
 ### Wizard-Style Workflow
 Interactive multi-step gathering with AskUserQuestion.
 ```
 ask question -> gather input -> ask more -> generate result
 ```
 Best for: complex configuration, multi-option setup.
 ### Conditional Workflow
 Branch based on input or context.
 ```
 analyze input -> choose path -> execute branch -> output
 ```
 Best for: skills that handle multiple related tasks.
 ### Plan-Validate-Execute
 Verifiable intermediates with feedback loops.
 ```
 plan steps -> validate plan -> execute -> verify each step -> report
 ```
 Best for: deployment, migration, refactoring tasks.
 ### Visual Output Generation
 Generate HTML or visual artifacts.
 ```
 gather data -> generate HTML -> render preview
 ```
 Best for: dashboards, reports, documentation sites.
 ---

package/skills/skill-creator/references/validation-rules.md CHANGED Viewed

@@ -1,4 +1,5 @@
 # Skill & Plugin Validation Rules
 Sources: [Anthropic docs](https://code.claude.com/docs/en/skills) · Intent Solutions enterprise policy
 Universal validation aligned with the Anthropic 2026 spec. Two tiers: Standard (Anthropic minimum) and Enterprise (our marketplace default — all fields required, zero tolerance for non-standard fields).
@@ -57,6 +58,7 @@ Body must contain all 7 sections (hard ERROR if any missing):
 ```
 Supporting files required (gold standard):
 - `PRD.md` must exist in skill root — Product Requirements Document
 - `ARD.md` must exist in skill root — Architecture Requirements Document
 - `references/` directory must exist (plural directory, NOT `reference.md` singular)
@@ -133,11 +135,13 @@ capabilities: []                  # NOTE: valid for agents ONLY, not skills
 ```
 **Plugin agents CANNOT use** (WARN if present):
 - `hooks` — plugin-level only, not agent-level
 - `mcpServers` — plugin-level only
 - `permissionMode` — standalone agent only, not plugin-scoped
 **Invalid for agents** (ERROR):
 - `expertise_level`, `activation_priority`, `color`, `activation_triggers`, `type`, `category` — invented, not Anthropic
 ---
@@ -237,6 +241,7 @@ Plus MCP tools in `ServerName:tool_name` format.
 | Enterprise | Error |
 Valid scoped patterns:
 ```
 Bash(git:*)
 Bash(npm:*)
@@ -284,6 +289,7 @@ Validate MCP server configuration structure.
 ### 7. Roll Up Plugin Score
 Plugin score = weighted average of component scores:
 - Skills: 50% weight
 - Agents: 20% weight
 - Commands: 15% weight
@@ -311,11 +317,13 @@ Anthropic defines 14 valid fields for agents. `name` and `description` are REQUI
 ### Context-Aware Rules
 **Plugin agents** (`plugins/*/agents/*.md`):
 - WARN if `hooks` present (hooks belong at plugin level, not agent level)
 - WARN if `mcpServers` present (plugin-level concern)
 - WARN if `permissionMode` present (standalone-only field)
 **Standalone agents** (`~/.claude/agents/*.md`):
 - All fields valid without restriction
 ### Invalid Agent Fields (ERROR)
@@ -409,6 +417,7 @@ The command runs at skill activation time. Output is injected verbatim into the
 ## String Substitution Validation
 If SKILL.md body contains `$ARGUMENTS` or `$0`, `$1`, etc.:
 - `argument-hint` SHOULD be set in frontmatter (WARNING if missing)
 - Instructions SHOULD handle empty `$ARGUMENTS` case
 - `$ARGUMENTS[N]` indexing should be sequential from 0
@@ -420,12 +429,14 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 ## Validation Process
 ### Pre-flight
 1. File exists and is readable
 2. YAML frontmatter parses without error
 3. Frontmatter separator (`---`) present at start and end
 4. No non-standard fields present (ERROR on any invented/deprecated field)
 ### Field Validation
 1. All 8 required fields present (enterprise) or 2 required fields (standard)
 2. Field types correct (string, array, boolean, semver)
 3. Field constraints met (kebab-case, SPDX, valid tool names)
@@ -434,6 +445,7 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 6. Conditional field logic (`context` requires `agent` and vice versa)
 ### Body Validation
 1. Length within limits (301-500 = WARNING, >500 = ERROR)
 2. All 7 required sections present (enterprise) — hard ERROR if any missing
 3. No absolute paths outside code blocks
@@ -442,15 +454,17 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 6. `references/` directory exists (enterprise)
 ### Resource Validation
 1. All `${CLAUDE_SKILL_DIR}/scripts/*` references exist
 2. All `${CLAUDE_SKILL_DIR}/references/*` references exist
 3. All `${CLAUDE_SKILL_DIR}/templates/*` references exist
 4. All `${CLAUDE_SKILL_DIR}/assets/*` references exist
-5. Relative markdown links (e.g., `[ref](references/api.md)`) point to existing files
+5. Relative markdown links (e.g., `ref`) point to existing files
 6. No path escape attempts (`../`)
 7. No empty (0-byte) supporting files (stub detection)
 ### Report
 - Errors: Must fix (blocks pass)
 - Warnings: Should fix (does not block pass)
 - Info: Optional improvements (includes structural advisor suggestions)
@@ -465,21 +479,25 @@ Also recognized: `${CLAUDE_SESSION_ID}` — current session identifier (Anthropi
 INFO-level suggestions emitted after grading. Not scored — purely advisory.
 ### Split to Commands
 - **Trigger**: 3+ kebab-case `## operation-name` sections without `commands/` directory
 - **Suggestion**: Split into individual `commands/*.md` files
 - **Why**: Each operation becomes a separate slash command; skill stays lean
 ### Offload to References
 - **Trigger**: Body sections >20 lines (Output, Error Handling, Examples) without `references/`
 - **Suggestion**: Move to `references/section-name.md` with relative markdown link
 - **Why**: Reduces token footprint; Claude reads on demand
 ### DCI Opportunities
 - **Trigger**: File existence checks, git operations, or tool version detection without DCI
 - **Suggestion**: Add `` !`command` `` directives for auto-detection at activation
 - **Why**: Eliminates discovery tool calls; Claude starts with context pre-loaded
 ### Migrate Commands to Skills
 - **Trigger**: `commands/*.md` files present without corresponding `skills/` entries
 - **Suggestion**: Consider migrating to SKILL.md format for auto-activation
 - **Why**: Skills activate automatically on context; commands require explicit `/name` invocation

package/skills/skill-creator/scripts/__pycache__/__init__.cpython-312.pyc ADDED Viewed

Binary file

package/skills/skill-creator/scripts/__pycache__/run_eval.cpython-312.pyc ADDED Viewed

Binary file

package/skills/skill-creator/scripts/__pycache__/utils.cpython-312.pyc ADDED Viewed

Binary file

package/skills/skill-creator/scripts/aggregate_benchmark.py CHANGED Viewed

@@ -60,7 +60,7 @@ def calculate_stats(values: list[float]) -> dict:
         "mean": round(mean, 4),
         "stddev": round(stddev, 4),
         "min": round(min(values), 4),
-        "max": round(max(values), 4)
+        "max": round(max(values), 4),
     }
@@ -157,7 +157,9 @@ def load_run_results(benchmark_dir: Path) -> dict:
                 raw_expectations = grading.get("expectations", [])
                 for exp in raw_expectations:
                     if "text" not in exp or "passed" not in exp:
-                        print(f"Warning: expectation in {grading_file} missing required fields (text, passed, evidence): {exp}")
+                        print(
+                            f"Warning: expectation in {grading_file} missing required fields (text, passed, evidence): {exp}"
+                        )
                 result["expectations"] = raw_expectations
                 # Extract notes from user_notes_summary
@@ -189,7 +191,7 @@ def aggregate_results(results: dict) -> dict:
             run_summary[config] = {
                 "pass_rate": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
                 "time_seconds": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
-                "tokens": {"mean": 0, "stddev": 0, "min": 0, "max": 0}
+                "tokens": {"mean": 0, "stddev": 0, "min": 0, "max": 0},
             }
             continue
@@ -200,7 +202,7 @@ def aggregate_results(results: dict) -> dict:
         run_summary[config] = {
             "pass_rate": calculate_stats(pass_rates),
             "time_seconds": calculate_stats(times),
-            "tokens": calculate_stats(tokens)
+            "tokens": calculate_stats(tokens),
         }
     # Calculate delta between the first two configs (if two exist)
@@ -218,7 +220,7 @@ def aggregate_results(results: dict) -> dict:
     run_summary["delta"] = {
         "pass_rate": f"{delta_pass_rate:+.2f}",
         "time_seconds": f"{delta_time:+.1f}",
-        "tokens": f"{delta_tokens:+.0f}"
+        "tokens": f"{delta_tokens:+.0f}",
     }
     return run_summary
@@ -235,30 +237,28 @@ def generate_benchmark(benchmark_dir: Path, skill_name: str = "", skill_path: st
     runs = []
     for config in results:
         for result in results[config]:
-            runs.append({
-                "eval_id": result["eval_id"],
-                "configuration": config,
-                "run_number": result["run_number"],
-                "result": {
-                    "pass_rate": result["pass_rate"],
-                    "passed": result["passed"],
-                    "failed": result["failed"],
-                    "total": result["total"],
-                    "time_seconds": result["time_seconds"],
-                    "tokens": result.get("tokens", 0),
-                    "tool_calls": result.get("tool_calls", 0),
-                    "errors": result.get("errors", 0)
-                },
-                "expectations": result["expectations"],
-                "notes": result["notes"]
-            })
+            runs.append(
+                {
+                    "eval_id": result["eval_id"],
+                    "configuration": config,
+                    "run_number": result["run_number"],
+                    "result": {
+                        "pass_rate": result["pass_rate"],
+                        "passed": result["passed"],
+                        "failed": result["failed"],
+                        "total": result["total"],
+                        "time_seconds": result["time_seconds"],
+                        "tokens": result.get("tokens", 0),
+                        "tool_calls": result.get("tool_calls", 0),
+                        "errors": result.get("errors", 0),
+                    },
+                    "expectations": result["expectations"],
+                    "notes": result["notes"],
+                }
+            )
     # Determine eval IDs from results
-    eval_ids = sorted(set(
-        r["eval_id"]
-        for config in results.values()
-        for r in config
-    ))
+    eval_ids = sorted(set(r["eval_id"] for config in results.values() for r in config))
     benchmark = {
         "metadata": {
@@ -268,11 +268,11 @@ def generate_benchmark(benchmark_dir: Path, skill_name: str = "", skill_path: st
             "analyzer_model": "<model-name>",
             "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
             "evals_run": eval_ids,
-            "runs_per_configuration": 3
+            "runs_per_configuration": 3,
         },
         "runs": runs,
         "run_summary": run_summary,
-        "notes": []  # To be filled by analyzer
+        "notes": [],  # To be filled by analyzer
     }
     return benchmark
@@ -310,25 +310,27 @@ def generate_markdown(benchmark: dict) -> str:
     # Format pass rate
     a_pr = a_summary.get("pass_rate", {})
     b_pr = b_summary.get("pass_rate", {})
-    lines.append(f"| Pass Rate | {a_pr.get('mean', 0)*100:.0f}% ± {a_pr.get('stddev', 0)*100:.0f}% | {b_pr.get('mean', 0)*100:.0f}% ± {b_pr.get('stddev', 0)*100:.0f}% | {delta.get('pass_rate', '—')} |")
+    lines.append(
+        f"| Pass Rate | {a_pr.get('mean', 0) * 100:.0f}% ± {a_pr.get('stddev', 0) * 100:.0f}% | {b_pr.get('mean', 0) * 100:.0f}% ± {b_pr.get('stddev', 0) * 100:.0f}% | {delta.get('pass_rate', '—')} |"
+    )
     # Format time
     a_time = a_summary.get("time_seconds", {})
     b_time = b_summary.get("time_seconds", {})
-    lines.append(f"| Time | {a_time.get('mean', 0):.1f}s ± {a_time.get('stddev', 0):.1f}s | {b_time.get('mean', 0):.1f}s ± {b_time.get('stddev', 0):.1f}s | {delta.get('time_seconds', '—')}s |")
+    lines.append(
+        f"| Time | {a_time.get('mean', 0):.1f}s ± {a_time.get('stddev', 0):.1f}s | {b_time.get('mean', 0):.1f}s ± {b_time.get('stddev', 0):.1f}s | {delta.get('time_seconds', '—')}s |"
+    )
     # Format tokens
     a_tokens = a_summary.get("tokens", {})
     b_tokens = b_summary.get("tokens", {})
-    lines.append(f"| Tokens | {a_tokens.get('mean', 0):.0f} ± {a_tokens.get('stddev', 0):.0f} | {b_tokens.get('mean', 0):.0f} ± {b_tokens.get('stddev', 0):.0f} | {delta.get('tokens', '—')} |")
+    lines.append(
+        f"| Tokens | {a_tokens.get('mean', 0):.0f} ± {a_tokens.get('stddev', 0):.0f} | {b_tokens.get('mean', 0):.0f} ± {b_tokens.get('stddev', 0):.0f} | {delta.get('tokens', '—')} |"
+    )
     # Notes section
     if benchmark.get("notes"):
-        lines.extend([
-            "",
-            "## Notes",
-            ""
-        ])
+        lines.extend(["", "## Notes", ""])
         for note in benchmark["notes"]:
             lines.append(f"- {note}")
@@ -336,28 +338,12 @@ def generate_markdown(benchmark: dict) -> str:
 def main():
-    parser = argparse.ArgumentParser(
-        description="Aggregate benchmark run results into summary statistics"
-    )
-    parser.add_argument(
-        "benchmark_dir",
-        type=Path,
-        help="Path to the benchmark directory"
-    )
-    parser.add_argument(
-        "--skill-name",
-        default="",
-        help="Name of the skill being benchmarked"
-    )
-    parser.add_argument(
-        "--skill-path",
-        default="",
-        help="Path to the skill being benchmarked"
-    )
+    parser = argparse.ArgumentParser(description="Aggregate benchmark run results into summary statistics")
+    parser.add_argument("benchmark_dir", type=Path, help="Path to the benchmark directory")
+    parser.add_argument("--skill-name", default="", help="Name of the skill being benchmarked")
+    parser.add_argument("--skill-path", default="", help="Path to the skill being benchmarked")
     parser.add_argument(
-        "--output", "-o",
-        type=Path,
-        help="Output path for benchmark.json (default: <benchmark_dir>/benchmark.json)"
+        "--output", "-o", type=Path, help="Output path for benchmark.json (default: <benchmark_dir>/benchmark.json)"
     )
     args = parser.parse_args()
@@ -389,11 +375,11 @@ def main():
     configs = [k for k in run_summary if k != "delta"]
     delta = run_summary.get("delta", {})
-    print(f"\nSummary:")
+    print("\nSummary:")
     for config in configs:
         pr = run_summary[config]["pass_rate"]["mean"]
         label = config.replace("_", " ").title()
-        print(f"  {label}: {pr*100:.1f}% pass rate")
+        print(f"  {label}: {pr * 100:.1f}% pass rate")
     print(f"  Delta:         {delta.get('pass_rate', '—')}")

package/skills/skill-creator/scripts/generate_report.py CHANGED Viewed

@@ -16,7 +16,7 @@ from pathlib import Path
 def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "") -> str:
     """Generate HTML report from loop output data. If auto_refresh is True, adds a meta refresh tag."""
     history = data.get("history", [])
-    holdout = data.get("holdout", 0)
+    data.get("holdout", 0)
     title_prefix = html.escape(skill_name + " \u2014 ") if skill_name else ""
     # Get all unique queries from train and test sets, with should_trigger info
@@ -31,11 +31,16 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     refresh_tag = '    <meta http-equiv="refresh" content="5">\n' if auto_refresh else ""
-    html_parts = ["""<!DOCTYPE html>
+    html_parts = [
+        """<!DOCTYPE html>
 <html>
 <head>
     <meta charset="utf-8">
-""" + refresh_tag + """    <title>""" + title_prefix + """Skill Description Optimization</title>
+"""
+        + refresh_tag
+        + """    <title>"""
+        + title_prefix
+        + """Skill Description Optimization</title>
     <link rel="preconnect" href="https://fonts.googleapis.com">
     <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
     <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
@@ -146,21 +151,24 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     </style>
 </head>
 <body>
-    <h1>""" + title_prefix + """Skill Description Optimization</h1>
+    <h1>"""
+        + title_prefix
+        + """Skill Description Optimization</h1>
     <div class="explainer">
         <strong>Optimizing your skill's description.</strong> This page updates automatically as Claude tests different versions of your skill's description. Each row is an iteration — a new description attempt. The columns show test queries: green checkmarks mean the skill triggered correctly (or correctly didn't trigger), red crosses mean it got it wrong. The "Train" score shows performance on queries used to improve the description; the "Test" score shows performance on held-out queries the optimizer hasn't seen. When it's done, Claude will apply the best-performing description to your skill.
     </div>
-"""]
+"""
+    ]
     # Summary section
-    best_test_score = data.get('best_test_score')
-    best_train_score = data.get('best_train_score')
+    best_test_score = data.get("best_test_score")
+    data.get("best_train_score")
     html_parts.append(f"""
     <div class="summary">
-        <p><strong>Original:</strong> {html.escape(data.get('original_description', 'N/A'))}</p>
-        <p class="best"><strong>Best:</strong> {html.escape(data.get('best_description', 'N/A'))}</p>
-        <p><strong>Best Score:</strong> {data.get('best_score', 'N/A')} {'(test)' if best_test_score else '(train)'}</p>
-        <p><strong>Iterations:</strong> {data.get('iterations_run', 0)} | <strong>Train:</strong> {data.get('train_size', '?')} | <strong>Test:</strong> {data.get('test_size', '?')}</p>
+        <p><strong>Original:</strong> {html.escape(data.get("original_description", "N/A"))}</p>
+        <p class="best"><strong>Best:</strong> {html.escape(data.get("best_description", "N/A"))}</p>
+        <p><strong>Best Score:</strong> {data.get("best_score", "N/A")} {"(test)" if best_test_score else "(train)"}</p>
+        <p><strong>Iterations:</strong> {data.get("iterations_run", 0)} | <strong>Train:</strong> {data.get("train_size", "?")} | <strong>Test:</strong> {data.get("test_size", "?")}</p>
     </div>
 """)
@@ -211,10 +219,10 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
     # Add rows for each iteration
     for h in history:
         iteration = h.get("iteration", "?")
-        train_passed = h.get("train_passed", h.get("passed", 0))
-        train_total = h.get("train_total", h.get("total", 0))
-        test_passed = h.get("test_passed")
-        test_total = h.get("test_total")
+        h.get("train_passed", h.get("passed", 0))
+        h.get("train_total", h.get("total", 0))
+        h.get("test_passed")
+        h.get("test_total")
         description = h.get("description", "")
         train_results = h.get("train_results", h.get("results", []))
         test_results = h.get("test_results", [])
@@ -272,7 +280,9 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
             icon = "✓" if did_pass else "✗"
             css_class = "pass" if did_pass else "fail"
-            html_parts.append(f'                <td class="result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
+            html_parts.append(
+                f'                <td class="result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n'
+            )
         # Add result for each test query (with different background)
         for qinfo in test_queries:
@@ -284,7 +294,9 @@ def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "")
             icon = "✓" if did_pass else "✗"
             css_class = "pass" if did_pass else "fail"
-            html_parts.append(f'                <td class="result test-result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
+            html_parts.append(
+                f'                <td class="result test-result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n'
+            )
         html_parts.append("            </tr>\n")