npm - aw-ecc - Versions diffs - 1.4.32 → 1.4.47 - Mend

aw-ecc 1.4.32 → 1.4.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (258)

package/.claude-plugin/plugin.json +1 -1
package/.codex/hooks/aw-post-tool-use.sh +8 -2
package/.codex/hooks/aw-session-start.sh +11 -4
package/.codex/hooks/aw-stop.sh +8 -2
package/.codex/hooks/aw-user-prompt-submit.sh +10 -2
package/.codex/hooks.json +8 -8
package/.cursor/INSTALL.md +7 -5
package/.cursor/hooks/adapter.js +41 -4
package/.cursor/hooks/after-agent-response.js +62 -0
package/.cursor/hooks/before-submit-prompt.js +7 -1
package/.cursor/hooks/post-tool-use-failure.js +21 -0
package/.cursor/hooks/post-tool-use.js +39 -0
package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
package/.cursor/hooks/subagent-start.js +22 -4
package/.cursor/hooks/subagent-stop.js +18 -1
package/.cursor/hooks.json +23 -2
package/.opencode/package.json +1 -1
package/AGENTS.md +3 -3
package/README.md +5 -5
package/commands/adk.md +52 -0
package/commands/build.md +22 -9
package/commands/deploy.md +12 -0
package/commands/execute.md +9 -0
package/commands/feature.md +333 -0
package/commands/investigate.md +18 -5
package/commands/plan.md +23 -9
package/commands/publish.md +65 -0
package/commands/review.md +12 -0
package/commands/ship.md +12 -0
package/commands/test.md +12 -0
package/commands/verify.md +9 -0
package/hooks/hooks.json +36 -0
package/manifests/install-components.json +8 -0
package/manifests/install-modules.json +83 -0
package/manifests/install-profiles.json +7 -0
package/package.json +1 -1
package/scripts/ci/validate-rules.js +51 -0
package/scripts/cursor-aw-home/hooks.json +23 -2
package/scripts/cursor-aw-hooks/adapter.js +41 -4
package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
package/scripts/hooks/aw-usage-commit-created.js +32 -0
package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
package/scripts/hooks/aw-usage-session-start.js +48 -0
package/scripts/hooks/aw-usage-stop.js +182 -0
package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
package/scripts/hooks/cost-tracker.js +3 -23
package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
package/scripts/hooks/shared/aw-phase-runner.js +3 -1
package/scripts/lib/aw-hook-contract.js +2 -2
package/scripts/lib/aw-pricing.js +306 -0
package/scripts/lib/aw-usage-telemetry.js +472 -0
package/scripts/lib/codex-hook-config.js +8 -8
package/scripts/lib/cursor-hook-config.js +25 -10
package/scripts/lib/install-targets/cursor-project.js +3 -0
package/scripts/lib/install-targets/helpers.js +20 -3
package/skills/aw-adk/SKILL.md +317 -0
package/skills/aw-adk/agents/analyzer.md +113 -0
package/skills/aw-adk/agents/comparator.md +113 -0
package/skills/aw-adk/agents/grader.md +115 -0
package/skills/aw-adk/assets/eval_review.html +76 -0
package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
package/skills/aw-adk/eval-viewer/viewer.html +181 -0
package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
package/skills/aw-adk/evals/eval-create-agent.md +90 -0
package/skills/aw-adk/evals/eval-create-command.md +98 -0
package/skills/aw-adk/evals/eval-create-eval.md +89 -0
package/skills/aw-adk/evals/eval-create-rule.md +99 -0
package/skills/aw-adk/evals/eval-create-skill.md +97 -0
package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
package/skills/aw-adk/evals/eval-delete-command.md +89 -0
package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
package/skills/aw-adk/evals/evals.json +96 -0
package/skills/aw-adk/references/artifact-wiring.md +162 -0
package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
package/skills/aw-adk/references/eval-placement-guide.md +183 -0
package/skills/aw-adk/references/external-resources.md +75 -0
package/skills/aw-adk/references/getting-started.md +66 -0
package/skills/aw-adk/references/registry-structure.md +152 -0
package/skills/aw-adk/references/rubric-agent.md +36 -0
package/skills/aw-adk/references/rubric-command.md +36 -0
package/skills/aw-adk/references/rubric-eval.md +36 -0
package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
package/skills/aw-adk/references/rubric-rule.md +36 -0
package/skills/aw-adk/references/rubric-skill.md +36 -0
package/skills/aw-adk/references/schemas.md +222 -0
package/skills/aw-adk/references/template-agent.md +251 -0
package/skills/aw-adk/references/template-command.md +279 -0
package/skills/aw-adk/references/template-eval.md +176 -0
package/skills/aw-adk/references/template-rule.md +119 -0
package/skills/aw-adk/references/template-skill.md +123 -0
package/skills/aw-adk/references/type-classifier.md +98 -0
package/skills/aw-adk/references/writing-good-agents.md +227 -0
package/skills/aw-adk/references/writing-good-commands.md +258 -0
package/skills/aw-adk/references/writing-good-evals.md +271 -0
package/skills/aw-adk/references/writing-good-rules.md +214 -0
package/skills/aw-adk/references/writing-good-skills.md +159 -0
package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
package/skills/aw-adk/scripts/score-artifact.sh +179 -0
package/skills/aw-adk/scripts/trigger-eval.py +192 -0
package/skills/aw-build/SKILL.md +19 -2
package/skills/aw-deploy/SKILL.md +65 -3
package/skills/aw-design/SKILL.md +156 -0
package/skills/aw-design/references/highrise-tokens.md +394 -0
package/skills/aw-design/references/micro-interactions.md +76 -0
package/skills/aw-design/references/prompt-template.md +160 -0
package/skills/aw-design/references/quality-checklist.md +70 -0
package/skills/aw-design/references/self-review.md +497 -0
package/skills/aw-design/references/stitch-workflow.md +127 -0
package/skills/aw-feature/SKILL.md +293 -0
package/skills/aw-investigate/SKILL.md +17 -0
package/skills/aw-plan/SKILL.md +34 -3
package/skills/aw-publish/SKILL.md +300 -0
package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
package/skills/aw-publish/evals/eval-push-modes.md +67 -0
package/skills/aw-publish/evals/eval-rules-push.md +60 -0
package/skills/aw-publish/evals/evals.json +29 -0
package/skills/aw-publish/references/push-modes.md +38 -0
package/skills/aw-review/SKILL.md +88 -9
package/skills/aw-rules-review/SKILL.md +124 -0
package/skills/aw-rules-review/agents/openai.yaml +3 -0
package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
package/skills/aw-ship/SKILL.md +16 -0
package/skills/aw-spec/SKILL.md +15 -0
package/skills/aw-tasks/SKILL.md +15 -0
package/skills/aw-test/SKILL.md +16 -0
package/skills/aw-yolo/SKILL.md +4 -0
package/skills/diagnose/SKILL.md +121 -0
package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
package/skills/finish-only-when-green/SKILL.md +265 -0
package/skills/grill-me/SKILL.md +24 -0
package/skills/grill-with-docs/SKILL.md +92 -0
package/skills/grill-with-docs/adr-format.md +47 -0
package/skills/grill-with-docs/context-format.md +67 -0
package/skills/improve-codebase-architecture/SKILL.md +75 -0
package/skills/improve-codebase-architecture/deepening.md +37 -0
package/skills/improve-codebase-architecture/interface-design.md +44 -0
package/skills/improve-codebase-architecture/language.md +53 -0
package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
package/skills/tdd/SKILL.md +115 -0
package/skills/tdd/deep-modules.md +33 -0
package/skills/tdd/interface-design.md +31 -0
package/skills/tdd/mocking.md +59 -0
package/skills/tdd/refactoring.md +10 -0
package/skills/tdd/tests.md +61 -0
package/skills/to-issues/SKILL.md +62 -0
package/skills/to-prd/SKILL.md +75 -0
package/skills/using-aw-skills/SKILL.md +170 -237
package/skills/using-aw-skills/hooks/session-start.sh +11 -41
package/skills/zoom-out/SKILL.md +24 -0
package/.cursor/rules/common-agents.md +0 -53
package/.cursor/rules/common-aw-routing.md +0 -43
package/.cursor/rules/common-coding-style.md +0 -52
package/.cursor/rules/common-development-workflow.md +0 -33
package/.cursor/rules/common-git-workflow.md +0 -28
package/.cursor/rules/common-hooks.md +0 -34
package/.cursor/rules/common-patterns.md +0 -35
package/.cursor/rules/common-performance.md +0 -59
package/.cursor/rules/common-security.md +0 -33
package/.cursor/rules/common-testing.md +0 -33
package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
package/.cursor/skills/article-writing/SKILL.md +0 -85
package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
package/.cursor/skills/aw-build/SKILL.md +0 -152
package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
package/.cursor/skills/aw-debug/SKILL.md +0 -49
package/.cursor/skills/aw-deploy/SKILL.md +0 -101
package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
package/.cursor/skills/aw-execute/SKILL.md +0 -47
package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
package/.cursor/skills/aw-finish/SKILL.md +0 -111
package/.cursor/skills/aw-investigate/SKILL.md +0 -109
package/.cursor/skills/aw-plan/SKILL.md +0 -368
package/.cursor/skills/aw-prepare/SKILL.md +0 -118
package/.cursor/skills/aw-review/SKILL.md +0 -118
package/.cursor/skills/aw-ship/SKILL.md +0 -115
package/.cursor/skills/aw-spec/SKILL.md +0 -104
package/.cursor/skills/aw-tasks/SKILL.md +0 -138
package/.cursor/skills/aw-test/SKILL.md +0 -118
package/.cursor/skills/aw-verify/SKILL.md +0 -51
package/.cursor/skills/aw-yolo/SKILL.md +0 -111
package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
package/.cursor/skills/bun-runtime/SKILL.md +0 -84
package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
package/.cursor/skills/code-simplification/SKILL.md +0 -74
package/.cursor/skills/content-engine/SKILL.md +0 -88
package/.cursor/skills/context-engineering/SKILL.md +0 -74
package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
package/.cursor/skills/frontend-slides/SKILL.md +0 -184
package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
package/.cursor/skills/idea-refine/SKILL.md +0 -84
package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
package/.cursor/skills/investor-materials/SKILL.md +0 -96
package/.cursor/skills/investor-outreach/SKILL.md +0 -76
package/.cursor/skills/market-research/SKILL.md +0 -75
package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
package/.cursor/skills/performance-optimization/SKILL.md +0 -77
package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
/package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
/package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
/package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
/package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
/package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
/package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
/package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
/package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
/package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
/package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
/package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
/package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
/package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
/package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
/package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
/package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
/package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
/package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
/package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
/package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
/package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
/package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
/package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
/package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
/package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
/package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
/package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
/package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
/package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
/package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0

package/skills/aw-adk/scripts/trigger-eval.py ADDED

@@ -0,0 +1,192 @@
+#!/usr/bin/env python3
+"""
+trigger-eval.py — Tests skill/agent description triggering accuracy
+Usage:
+    python skills/aw-adk/scripts/trigger-eval.py \\
+        --eval-set <path-to-eval-set.json> \\
+        --skill-path <path-to-skill> \\
+        [--model <model-id>] \\
+        [--runner <claude|cursor|codex|auto>] \\
+        [--max-iterations 5] \\
+        [--verbose]
+Evaluates whether a skill's description causes it to trigger correctly:
+- should_trigger queries should activate the skill
+- should_not_trigger queries should NOT activate the skill
+Tests each query against the configured AI runner and checks if the skill was consulted.
+Supports multiple runners: claude, cursor, codex. The default, auto, picks the first runner found and falls back to claude.
+Adapted from skill-creator's run_eval.py + run_loop.py for CASRE context.
+"""
+import argparse
+import json
+import os
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+def load_eval_set(path: str) -> list[dict]:
+    """Load eval set from JSON file."""
+    with open(path, "r") as f:
+        return json.load(f)
+def read_skill_description(skill_path: str) -> str:
+    """Extract the description from a skill's SKILL.md frontmatter."""
+    skill_md = os.path.join(skill_path, "SKILL.md")
+    if not os.path.exists(skill_md):
+        # Maybe it's a single .md file (agent)
+        skill_md = skill_path
+    with open(skill_md, "r") as f:
+        content = f.read()
+    # Parse frontmatter
+    if content.startswith("---"):
+        end = content.index("---", 3)
+        frontmatter = content[3:end]
+        for line in frontmatter.split("\n"):
+            if line.strip().startswith("description:"):
+                return line.split("description:", 1)[1].strip().strip('"').strip("'")
+    return ""
+def detect_runner() -> str:
+    """Auto-detect which AI runner is available."""
+    for runner in ["claude", "cursor", "codex"]:
+        try:
+            subprocess.run([runner, "--version"], capture_output=True, timeout=5)
+            return runner
+        except (FileNotFoundError, subprocess.TimeoutExpired):
+            continue
+    return "claude"  # fallback
+def build_runner_command(runner: str, query: str, skill_path: str, model: str) -> list[str]:
+    """Build the CLI command for the given runner."""
+    if runner == "claude":
+        return ["claude", "-p", query, "--model", model, "--max-turns", "1"]
+    elif runner == "cursor":
+        # Cursor uses --prompt flag in CLI mode
+        return ["cursor", "--prompt", query, "--model", model]
+    elif runner == "codex":
+        # Codex receives the prompt through the -q flag
+        return ["codex", "-q", query, "--model", model]
+    else:
+        return ["claude", "-p", query, "--model", model, "--max-turns", "1"]
+def test_trigger(query: str, skill_path: str, model: str, runner: str = "claude") -> bool:
+    """Test if a query triggers the skill using the configured runner.
+    Returns True if the skill was consulted (triggered).
+    """
+    cmd = build_runner_command(runner, query, skill_path, model)
+    try:
+        result = subprocess.run(
+            cmd,
+            capture_output=True,
+            text=True,
+            timeout=120,
+        )
+        # Check if the skill name appears in the output (indicating it was loaded)
+        output = result.stdout + result.stderr
+        skill_name = os.path.basename(skill_path.rstrip("/"))
+        return skill_name.lower() in output.lower()
+    except (subprocess.TimeoutExpired, FileNotFoundError) as e:
+        print(f"  Warning: {runner} failed: {e}", file=sys.stderr)
+        return False
+def evaluate(eval_set: list[dict], skill_path: str, model: str, verbose: bool = False, runner: str = "claude") -> dict:
+    """Run all eval queries and compute accuracy."""
+    results = []
+    correct = 0
+    total = len(eval_set)
+    for i, item in enumerate(eval_set):
+        query = item["query"]
+        should_trigger = item["should_trigger"]
+        if verbose:
+            print(f"  [{i + 1}/{total}] Testing: {query[:60]}...", file=sys.stderr)
+        triggered = test_trigger(query, skill_path, model, runner)
+        is_correct = triggered == should_trigger
+        if is_correct:
+            correct += 1
+        results.append(
+            {
+                "query": query,
+                "should_trigger": should_trigger,
+                "triggered": triggered,
+                "correct": is_correct,
+            }
+        )
+        if verbose:
+            status = "PASS" if is_correct else "FAIL"
+            print(f"         {status} (should_trigger={should_trigger}, triggered={triggered})", file=sys.stderr)
+    accuracy = correct / total if total > 0 else 0
+    return {
+        "accuracy": round(accuracy, 3),
+        "correct": correct,
+        "total": total,
+        "results": results,
+        "false_positives": [r for r in results if not r["should_trigger"] and r["triggered"]],
+        "false_negatives": [r for r in results if r["should_trigger"] and not r["triggered"]],
+    }
+def main():
+    parser = argparse.ArgumentParser(description="Test skill/agent description triggering accuracy")
+    parser.add_argument("--eval-set", required=True, help="Path to eval set JSON")
+    parser.add_argument("--skill-path", required=True, help="Path to skill directory or agent file")
+    parser.add_argument("--model", default="claude-sonnet-4-6", help="Model ID for testing")
+    parser.add_argument("--runner", default="auto", help="AI runner: claude, cursor, codex, or auto (detect)")
+    parser.add_argument("--max-iterations", type=int, default=1, help="Number of evaluation iterations")
+    parser.add_argument("--verbose", action="store_true", help="Print progress")
+    args = parser.parse_args()
+    runner = args.runner if args.runner != "auto" else detect_runner()
+    print(f"Using runner: {runner}")
+    eval_set = load_eval_set(args.eval_set)
+    print(f"Loaded {len(eval_set)} eval queries ({sum(1 for e in eval_set if e['should_trigger'])} should-trigger, {sum(1 for e in eval_set if not e['should_trigger'])} should-not-trigger)")
+    description = read_skill_description(args.skill_path)
+    if description:
+        print(f"Current description: {description[:100]}...")
+    for iteration in range(1, args.max_iterations + 1):
+        print(f"\n--- Iteration {iteration} ---")
+        result = evaluate(eval_set, args.skill_path, args.model, args.verbose, runner)
+        print(f"Accuracy: {result['accuracy']:.1%} ({result['correct']}/{result['total']})")
+        if result["false_positives"]:
+            print(f"False positives ({len(result['false_positives'])}):")
+            for fp in result["false_positives"]:
+                print(f"  - {fp['query'][:80]}")
+        if result["false_negatives"]:
+            print(f"False negatives ({len(result['false_negatives'])}):")
+            for fn in result["false_negatives"]:
+                print(f"  - {fn['query'][:80]}")
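+    # Write results from the final iteration; skip if no iteration ran,
+    # because `result` is only bound inside the loop above
+    if args.max_iterations < 1:
+        print("No iterations run; nothing to save.", file=sys.stderr)
+        return
+    output_path = os.path.join(os.path.dirname(args.eval_set), "trigger-eval-results.json")
+    with open(output_path, "w") as f:
+        json.dump(result, f, indent=2)
+    print(f"\nResults saved to {output_path}")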
+if __name__ == "__main__":
+    main()
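
For readers of the diff above: `evaluate()` reads only the `query` and `should_trigger` keys from each eval-set entry. The sketch below shows one plausible eval-set shape and repeats the accuracy bookkeeping without calling any AI runner. The sample queries, the `summarize` helper, and the hard-coded trigger flags are assumptions made for illustration; they are not part of the package.

```python
import json

# Hypothetical eval set; only the "query" and "should_trigger" keys
# are taken from trigger-eval.py, the queries themselves are invented.
EVAL_SET_JSON = """
[
  {"query": "create a new skill for linting", "should_trigger": true},
  {"query": "what is the weather today", "should_trigger": false},
  {"query": "score this agent file", "should_trigger": true}
]
"""


def summarize(eval_set, triggered_flags):
    """Mirror the bookkeeping in evaluate() using precomputed trigger flags."""
    results = [
        {
            "query": item["query"],
            "should_trigger": item["should_trigger"],
            "triggered": triggered,
            "correct": triggered == item["should_trigger"],
        }
        for item, triggered in zip(eval_set, triggered_flags)
    ]
    correct = sum(1 for r in results if r["correct"])
    total = len(results)
    return {
        "accuracy": round(correct / total, 3) if total else 0,
        "correct": correct,
        "total": total,
        "false_positives": [r for r in results if not r["should_trigger"] and r["triggered"]],
        "false_negatives": [r for r in results if r["should_trigger"] and not r["triggered"]],
    }


eval_set = json.loads(EVAL_SET_JSON)
# Pretend the runner triggered on the first two queries only.
summary = summarize(eval_set, [True, True, False])
print(summary["accuracy"], len(summary["false_positives"]), len(summary["false_negatives"]))
# prints: 0.333 1 1
```

Note that the script decides `triggered` by searching the runner output for the skill directory name, so a response that merely mentions the name would also count as a trigger.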

package/skills/aw-build/SKILL.md CHANGED

@@ -34,7 +34,8 @@ Do not use for vague ideation, unclear bugs, or release-only work.
    Use `../../references/build-increments.md` to keep changes thin, reversible, and rollback-friendly.
    For multi-file or high-risk work, load `incremental-implementation`.
 5. Build one slice or one bounded parallel wave at a time.
-   For any slice that changes observable behavior, fixes a bug, or refactors live behavior, load `tdd-guide` and require explicit RED-GREEN-REFACTOR (RED -> GREEN -> REFACTOR).
+   For any slice that changes observable behavior, fixes a bug, or refactors live behavior, load `tdd-workflow` and require explicit RED-GREEN-REFACTOR (RED -> GREEN -> REFACTOR).
+   Use `tdd-guide` when a specialist subagent is useful, and use the `tdd` companion skill when the slice needs deeper behavior-test, mocking, or tracer-bullet guidance.
    For config, docs, infra, migration, or other non-behavior slices where test-first is not meaningful, record the best pre-change proof available before editing and the focused post-change validation that will prove the slice.
    During implementation, prefer the simplest change that fits existing patterns.
    Avoid speculative abstractions, unnecessary branching, and adjacent cleanup outside the approved slice.
@@ -149,16 +150,30 @@ Parallel build fan-out must stay within the planned `max_parallel_subagents` cap
 - deferred findings
 - simplification notes
 - `save_point_commits`
+- `html_companion_artifacts`
 - blockers or concerns
 - recommended next commands
+## Human HTML Companion
+Markdown `execution.md` remains canonical for agents.
+When build writes or materially updates `execution.md`, also create or refresh `.aw_docs/features/<feature_slug>/execution.html`. HTML sidecars are required stage outputs, not advisory metadata.
+Delegate to the `aw:echo` subagent with the `implementation-plan` profile.
+Invoking `/aw:build` in default `dual` mode is explicit authorization to spawn exactly one `aw:echo` subagent for HTML companion generation; do not skip HTML only because no direct command is available.
+Resolve output mode as: explicit user request for Markdown-only -> otherwise `dual`. `.aw_docs/config.json` and `AW_DOCS_OUTPUT_MODE` may request `dual` or `html`, but must not silently suppress required SDLC HTML sidecars.
+Pass approved inputs, completed slices, phase progress, file map, validation evidence, save-point commits, deferred findings, and next command as the source bundle.
+Record the colocated sidecar in `state.json` `html_companion_artifacts` with `source_path`, `html_path`, profile, status, `run_ref` when available, publish status, and any explicit Markdown-only skip or fallback reason.
+Spawn exactly one `aw:echo` subagent and wait for the colocated `.html` sidecar before the final handoff unless the user explicitly asks not to wait. If the harness still cannot spawn `aw:echo`, create a conservative self-contained fallback HTML sidecar in the same turn using the `aw:echo` safety and design contract, record `generated_fallback` plus the blocker, and keep Markdown canonical.
 ## Verification
 Before leaving build, confirm:
 - [ ] the change came from approved inputs or a clearly approved direct technical request
 - [ ] the work was split into thin, reversible increments when non-trivial
-- [ ] behavior-changing slices used explicit RED -> GREEN -> REFACTOR via `tdd-guide`
+- [ ] behavior-changing slices used explicit RED -> GREEN -> REFACTOR via `tdd-workflow` and, when useful, `tdd-guide` or `tdd`
 - [ ] non-behavior slices recorded pre-change proof and focused post-change validation
 - [ ] each meaningful completed slice reached green before the next slice started
 - [ ] each meaningful completed slice had a focused review with the right reviewer agent before the next slice started
@@ -169,6 +184,7 @@ Before leaving build, confirm:
 - [ ] phased plans, if used, recorded phase completion plus the next phase transition
 - [ ] meaningful completed slices produced recorded save-point commits
 - [ ] `execution.md` and `state.json` are updated
+- [ ] the HTML companion file exists, or the user explicitly requested Markdown-only
 ## Final Output Shape
@@ -185,5 +201,6 @@ Always end with:
 - `Chunk Reviews`
 - `Simplification`
 - `Save Points`
+- `HTML Companion`
 - `Blockers`
 - `Next`
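
The output-mode rule and the `html_companion_artifacts` record described in the Human HTML Companion section above can be sketched as follows. This is a minimal illustration under stated assumptions: `source_path`, `html_path`, `dual`, and `generated_fallback` come from the diff, while the function names, the `markdown` mode label, and the `profile`, `status`, and `fallback_reason` keys are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_output_mode(user_requested_markdown_only: bool) -> str:
    """Markdown-only needs an explicit user request; otherwise use dual."""
    # "markdown" is an assumed label; the diff only names "dual" and "html".
    return "markdown" if user_requested_markdown_only else "dual"


def companion_record(feature_slug, stage_doc, profile, status, fallback_reason=None):
    """Build one sidecar entry for state.json (key names partly assumed)."""
    source = PurePosixPath(".aw_docs/features") / feature_slug / stage_doc
    record = {
        "source_path": str(source),
        # The sidecar is colocated: same directory, .html suffix.
        "html_path": str(source.with_suffix(".html")),
        "profile": profile,
        "status": status,
    }
    if fallback_reason is not None:
        record["fallback_reason"] = fallback_reason
    return record


mode = resolve_output_mode(user_requested_markdown_only=False)
record = companion_record(
    "checkout-redesign",  # hypothetical feature slug
    "execution.md",
    "implementation-plan",
    "generated_fallback",
    fallback_reason="harness could not spawn aw:echo",
)
print(mode, record["html_path"])
```

Keeping the Markdown path as the source of truth and deriving the HTML path from it reflects the rule that Markdown stays canonical and the sidecar is colocated.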

package/skills/aw-deploy/SKILL.md CHANGED

@@ -25,6 +25,7 @@ Do not use for launch discipline or end-to-end orchestration.
    The required QA and review outputs must exist.
 2. Select one release path.
    PR, branch, staging, or production.
+   **Default deployment environment:** if the user does **not** explicitly name a target (`production`, `prod`, `staging-versions`, green/default/promote, a named cluster/VPN, etc.), assume **`staging`** only. Do not infer production or staging-versions from vague phrases (“deploy”, “ship”, “push it”). Jenkins paths and MCP job folders must use the **`staging/...`** prefix unless an explicit non-staging target is confirmed.
 3. Resolve the org-standard mechanism.
    Use the repo archetype and resolved baseline profile to choose provider and mechanism.
    Load `ci-cd-and-automation` for gate ordering, preview/deploy automation, and rollback-aware pipeline expectations.
@@ -35,6 +36,50 @@ Do not use for launch discipline or end-to-end orchestration.
 5. Hand off to `aw-ship` when requested.
    Use `aw-ship` for rollout safety, rollback readiness, and closeout.
+## Repo routing: Revex membership (`ghl-revex-backend`)
+**Hard rule:** If the deployment is from the **`ghl-revex-backend`** repo **and** it targets **membership** workloads (communities, client-portal, courses, `ghl-revex-backend` server/workers, Debezium, ProxySQL, membership workers, etc.), **do not** treat this as a generic `aw-deploy` + **platform-backend** Jenkins path.
+**Always** use one of these instead:
+- **Cursor command:** `/aw/revex-memberships-infra-deploy` or `/aw/revex-membership-frontend-infra-deploy` (same command surface — argument hint: `staging` / `staging-versions` / `production` + app/worker names; **default env = `staging`** per workflow step 2).
+- **Agent path:** the **infra-release-engineer** / membership **deployment** flow that loads **`backend-deployment-skill`** from the registry (e.g. `revex/memberships/infra/skills/backend-deployment-skill/SKILL.md` when `.aw` resolves to the AW registry).
+**Jenkins MCP** for that stack is still **`user-ghl-ai`** (`jenkins_trigger-build`, `jenkins_list-jobs`, `jenkins_get-build`, …), but **job paths, `DeploymentOption`, green/default/promote**, and **parameter names** must come from **`backend-deployment-skill`**, not from the `platform-backend` worker job list below.
+If the user only said “deploy” without naming the repo, **infer from workspace root / `package.json` / remote** — when it is `ghl-revex-backend`, **hand off** to the command above rather than improvising `staging/revex/...` paths from memory.
+## Deploy execution: Jenkins (GHL platform-backend)
+Use this section **only** when the repo is **`platform-backend`** (or another app explicitly using `deployments/<env>/workers/` under that monorepo’s Jenkins layout), **not** when [Repo routing: Revex membership](#repo-routing-revex-membership-ghl-revex-backend) applies.
+When the resolved mechanism is **Jenkins** (repo has `deployments/<env>/workers/Jenkinsfile.*` pipelines), **execute the deploy**; do not stop at “open Jenkins manually” if the agent session exposes MCP.
+### MCP (preferred when available)
+- **Server id:** `user-ghl-ai` (Cursor often shows the label **ghl-ai** — if `call_mcp_tool` fails, confirm the server identifier from the project MCP metadata, e.g. `.cursor/projects/<repo>/mcps/user-ghl-ai/SERVER_METADATA.json`, or the Cursor MCP panel.)
+- **Tool order (fail-closed discovery):**
+  1. `jenkins_list-jobs` with `folder` path segments (e.g. `staging/common/platform-workers`) until you find the **WorkflowJob**, not a folder.
+  2. `jenkins_get-build` on the job’s **lastSuccessfulBuild** to learn exact **parameter names** and shapes (boolean params still go in as strings in step 3).
+  3. `jenkins_trigger-build` with `action: "trigger"`, full job `path`, and `parameters` as **string key → string value** only (`"true"` / `"false"` for booleans; `SkipBuild` is typically `"Yes"` / `"No"`).
+  4. `jenkins_list-builds` (and optionally `jenkins_get-build-log`) to confirm the new build queued / finished — the trigger API may return success while the build is still pending.
+- **Local schemas (optional):** Cursor may mirror tool contracts under `.cursor/projects/<workspace>/mcps/user-ghl-ai/tools/*.json` — read before calling if the session does not show tool args inline.
+### Common `platform-backend` staging job paths (verify with `jenkins_list-jobs`)
+These are the usual umbrellas; **always** confirm in Jenkins for the current folder layout:
+- **Events / mixed workers** (`deployments/staging/workers/Jenkinsfile.eventsworker`):
+  `staging/common/platform-workers/platform-events-worker`
+  Worker selection uses **slug parameters** matching the Jenkinsfile list (e.g. `events-worker=true`).
+- **Mongo change-stream workers by team** (`deployments/staging/workers/Jenkinsfile.*.mongoEventsWorker`):
+  `staging/common/platform-workers/platform-mongo-events-workers/platform-mongo-events-worker-{automation|crm|leadgen|platform|revex}`
+  Revex flags are **uppercase** boolean params from that Jenkinsfile (e.g. `CLIENTPORTAL_USERS_CONTACTS=true`).
+### If MCP is not callable in this session
+Record a **clear blocker** in the handoff: UI shows `ghl-ai` enabled but the agent has no `call_mcp_tool` route, or auth denied. Still provide **exact** job path(s), branch parameter, and boolean flags so a human can run the same build in the Jenkins UI.
 ## Completion Contract
 Deploy is complete only when one of these is true:
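
Two rules from the Jenkins section added in the hunk above lend themselves to a short sketch: the environment defaults to `staging` unless a target is named explicitly, and `jenkins_trigger-build` takes `parameters` as string keys mapped to string values. The helper names below are hypothetical. The `action`, `path`, and `parameters` fields, the job path, and the `events-worker` and `SkipBuild` parameters are taken from the diff. The code only builds the request payload and does not call any MCP tool.

```python
from typing import Optional


def default_environment(explicit_target: Optional[str]) -> str:
    """Fall back to staging unless the user explicitly named a target."""
    return explicit_target or "staging"


def to_jenkins_parameters(params: dict) -> dict:
    """Coerce every value to a string; booleans become "true" / "false"."""
    coerced = {}
    for key, value in params.items():
        if isinstance(value, bool):
            coerced[str(key)] = "true" if value else "false"
        else:
            coerced[str(key)] = str(value)
    return coerced


def build_trigger_request(job_path: str, params: dict) -> dict:
    """Assemble the arguments described for jenkins_trigger-build."""
    return {
        "action": "trigger",
        "path": job_path,
        "parameters": to_jenkins_parameters(params),
    }


env = default_environment(None)  # the user only said "deploy"
request = build_trigger_request(
    f"{env}/common/platform-workers/platform-events-worker",
    {"events-worker": True, "SkipBuild": "No"},
)
print(request)
```

The exact parameter names still have to be confirmed from the last successful build, as the tool order in the diff requires; the sketch covers only the string coercion and the staging default.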
@@ -52,16 +97,17 @@ Every deploy handoff must make these things obvious:
 ## Common Rationalizations
-| Rationalization | Reality |
-|---|---|
+| Rationalization                           | Reality                                                        |
+| ----------------------------------------- | -------------------------------------------------------------- |
 | "Deploy can also handle launch closeout." | Release action and launch discipline are related but distinct. |
-| "I'll just guess the staging mechanism." | Unknown deployment config must fail closed. |
+| "I'll just guess the staging mechanism."  | Unknown deployment config must fail closed.                    |
 ## Red Flags
 - deploy runs without clear test and review evidence
 - provider or mechanism is guessed
 - deploy silently turns into release orchestration
+- environment is assumed to be **production** or **staging-versions** when the user did not state it (default must remain **staging**)
 ## State File
@@ -77,15 +123,30 @@ Every deploy handoff must make these things obvious:
 - build or release links
 - execution evidence
 - rollback path
+- `html_companion_artifacts`
 - blockers
 - recommended next commands
+## Human HTML Companion
+Markdown `release.md` remains canonical for agents.
+When deploy writes or materially updates `release.md`, also create or refresh `.aw_docs/features/<feature_slug>/release.html`. HTML sidecars are required stage outputs, not advisory metadata.
+Delegate to the `aw:echo` subagent with the `release-report` profile.
+Invoking `/aw:deploy` in default `dual` mode is explicit authorization to spawn exactly one `aw:echo` subagent for HTML companion generation; do not skip HTML only because no direct command is available.
+Resolve the output mode as follows: use Markdown-only when the user explicitly requests it; otherwise use `dual`. `.aw_docs/config.json` and `AW_DOCS_OUTPUT_MODE` may request `dual` or `html`, but must not silently suppress required SDLC HTML sidecars.
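One possible reading of this resolution rule, as a sketch. The source does not say whether the environment variable or the config file wins, so the env-before-config order here is an assumption, as are the function name and the `"markdown-only"` label.

```python
# Modes that config or environment are allowed to request, per the rule above.
REQUESTABLE_MODES = {"dual", "html"}


def resolve_output_mode(user_wants_markdown_only, env_mode=None, config_mode=None):
    """Return the effective output mode.

    Only an explicit user request can turn HTML off. Any other value coming
    from env or config (including attempts to suppress HTML) is ignored.
    """
    if user_wants_markdown_only:
        return "markdown-only"  # hypothetical label for the explicit opt-out
    # Assumed precedence: environment variable first, then config file.
    for requested in (env_mode, config_mode):
        if requested in REQUESTABLE_MODES:
            return requested
    return "dual"
```

In this sketch an unrecognised value falls through to `dual` and does not suppress the sidecar.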
+Pass selected mode, provider, resolved mechanism, release links, execution evidence, rollback path, blockers, and next command as the source bundle.
+Record the colocated sidecar in `state.json` `html_companion_artifacts` with `source_path`, `html_path`, profile, status, `run_ref` when available, publish status, and any explicit Markdown-only skip or fallback reason.
+Spawn exactly one `aw:echo` subagent and wait for the colocated `.html` sidecar before the final handoff unless the user explicitly asks not to wait. If the harness still cannot spawn `aw:echo`, create a conservative self-contained fallback HTML sidecar in the same turn using the `aw:echo` safety and design contract, record `generated_fallback` plus the blocker, and keep Markdown canonical.
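A sketch of what one `html_companion_artifacts` entry might look like. Only `source_path`, `html_path`, `run_ref`, the `release-report` profile, and the `generated_fallback` status are named in the text; the keys `profile`, `status`, `publish_status`, and `reason`, and the helper name itself, are assumptions made for illustration.

```python
def companion_record(feature_slug, status, run_ref=None, publish_status=None, reason=None):
    """Build one sidecar record for state.json (illustrative shape only)."""
    base = f".aw_docs/features/{feature_slug}"
    record = {
        "source_path": f"{base}/release.md",   # canonical Markdown
        "html_path": f"{base}/release.html",   # colocated HTML sidecar
        "profile": "release-report",
        "status": status,                      # e.g. "generated_fallback"
        "publish_status": publish_status,
    }
    # run_ref is recorded only when available.
    if run_ref is not None:
        record["run_ref"] = run_ref
    # Explicit Markdown-only skip or fallback reason, when there is one.
    if reason is not None:
        record["reason"] = reason
    return record
```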
 ## Verification
 - [ ] one release action was selected explicitly
 - [ ] provider and mechanism came from repo archetype and baseline resolution
 - [ ] `release.md` and `state.json` are updated
 - [ ] handoff to `aw-ship` is clear when launch discipline is still needed
+- [ ] the HTML companion file exists, or the user explicitly requested Markdown-only
 ## Final Output Shape
@@ -98,4 +159,5 @@ Always end with:
 - `Execution Evidence`
 - `Rollback Path`
 - `Outcome`
+- `HTML Companion`
 - `Next`

package/skills/aw-design/SKILL.md ADDED

@@ -0,0 +1,156 @@
+---
+name: aw-design
+description: Generate premium SaaS UI designs using the Highrise design system. Produces linked HTML prototypes with all state variants, micro-interactions, dark mode, responsive breakpoints, and an index page that maps the full feature. Stitch MCP first, static HTML fallback.
+trigger: Phase 5 of aw-feature, or when the user asks for UI design, screen mockups, HTML prototypes, or design exploration for a GHL feature.
+---
+# AW Design
+Generate production-grade SaaS screens for GoHighLevel features. Every screen should feel like it belongs in Linear, Vercel, or Stripe — clean, restrained, premium, and alive with subtle motion. Highrise provides the tokens; your job is to make something genuinely good within them.
+## References (read on demand)
+This skill is intentionally lean. Read the reference files as you need them:
+- `references/highrise-tokens.md` — colors (light + dark), typography, spacing, components, restraint rules, responsive breakpoints. **Read before generating any screen.**
+- `references/prompt-template.md` — the full screen prompt, layout hints by screen type, and state variant prompt additions. **Read before writing your first prompt.**
+- `references/stitch-workflow.md` — Stitch tool reference, generation steps, timeout/model fallback, HTML fallback path, iteration patterns. **Read before the first Stitch call.**
+- `references/micro-interactions.md` — all CSS transitions, keyframes, and `prefers-reduced-motion` fallback. **Read when writing HTML directly.**
+- `references/quality-checklist.md` — the pass/fail contract the self-review enforces, plus the mandatory index page spec. **Read before step 5 (index).**
+- `references/self-review.md` — deterministic + visual review tracks, capture matrix, evidence-required REVIEW.md template, per-iteration audit format, status-rule table (✅/⚠️/❌), and the anti-fake sanity checks. **Read at the start of step 6. Non-optional.**
+## Path Precedence
+**Try Stitch MCP first.** Static HTML is the fallback, not a parallel option.
+1. Check if Stitch MCP tools are registered (look for `stitch_*` tools on the `user-ghl-ai` server).
+2. If yes → run the Stitch path. Do not fall back unless a call actually fails.
+3. A **timeout is not a failure** — the default client timeout (~70s) is shorter than Stitch's typical generation time. Poll via `stitch_list-screens` + `stitch_get-screen` before giving up. See timeout handling in `references/stitch-workflow.md`.
+4. If polling confirms the call truly failed (error, auth, quota exhausted, or no screen after 3 min + Flash retry) → document the reason and fall back to hand-written HTML for that screen.
+5. If Stitch tools are missing entirely → skip Stitch and write HTML directly.
+6. If the user explicitly says "offline" or "static HTML only" → skip Stitch and write HTML directly.
+Never silently pick the HTML path just because it's faster. The user asked for design; Stitch produces better output.
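The initial path choice (items 1, 5, and 6 above) can be sketched as a small decision function. The function name and return labels are hypothetical. The mid-run fallback after a confirmed failure (item 4) is a per-screen decision and is not modelled here.

```python
def choose_design_path(registered_tools, user_wants_static=False):
    """Pick the starting path: "stitch" when available, else "html"."""
    # An explicit "offline" / "static HTML only" request skips Stitch.
    if user_wants_static:
        return "html"
    # Stitch is considered available when any stitch_* tool is registered.
    if any(name.startswith("stitch_") for name in registered_tools):
        return "stitch"
    # Tools missing entirely: write HTML directly.
    return "html"
```

Speed is deliberately not an input: only tool availability and an explicit user request can select the HTML path up front.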
+## Design Thinking
+Before generating anything, commit to a clear direction:
+- **Purpose** — What problem does this screen solve? Who's looking at it and what do they need?
+- **Hierarchy** — What's the single most important thing on the page? Build everything around it.
+- **Restraint** — Premium SaaS is defined by what you leave out. Monochrome-first. One accent color. Let content breathe.
+- **Craft** — Spacing, alignment, typography weight, and motion — the details that separate "looks AI-generated" from "looks designed."
+The goal is intentional, polished, and cohesive — not flashy.
+## Workflow
+### 1. Understand what to design
+Read the feature's `requirements.md` or `prd.md`. For each screen, identify:
+- What it shows (purpose, key data, user actions)
+- Which states it needs (default, empty, loading, error, modal)
+- How screens connect (navigation flow)
+Produce `.aw_docs/features/<slug>/designs/SCREEN_PLAN.md` listing every screen and the nav structure linking them. If scope is unclear, ask the user which pages and flows to cover before generating.
+### 2. Prepare the prompt
+Read `references/highrise-tokens.md` and `references/prompt-template.md`. Fill in the bracketed parts of the template for the first screen (screen type, layout, nav items, current page).
+### 3. Generate screens
+Read `references/stitch-workflow.md`. Follow the Stitch path. **A timeout is not a failure.** Stitch's 70s MCP client timeout regularly fires before generation finishes; the screen is still being built server-side.
+**Polling protocol on timeout (mandatory, non-negotiable):**
+1. Capture the `requestId` / note the timestamp.
+2. Call `stitch_list-screens` every **30 seconds** for **6 polls** (total 3 minutes of wall time).
+3. After each poll, check whether a new screen matching the request has appeared — if yes, call `stitch_get-screen` and continue as normal.
+4. If all 6 polls complete with no new screen → retry **once** with `model: "GEMINI_3_FLASH"` and re-apply the polling protocol.
+5. If the Flash retry also produces nothing → document the screen as a Stitch failure in `SCREEN_PLAN.md` and fall back to hand-written HTML **for that screen only**.
+Do not write HTML on the first timeout. Do not skip polls. Do not treat a single 70s wait as "Stitch failed."
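The polling protocol can be sketched as a loop around an injected `list_screens` callable. The response shape (a list of dicts with an `"id"` key) is an assumption, since the real `stitch_list-screens` payload is not shown here, and "matching the request" is simplified to "an id not seen before the call".

```python
import time


def poll_for_new_screen(list_screens, known_ids, polls=6, interval_s=30.0, sleep=time.sleep):
    """Poll after a client timeout; return the new screen or None.

    list_screens: zero-argument callable wrapping the list call (hypothetical).
    known_ids:    ids of screens that existed before the generation request.
    """
    for _ in range(polls):
        sleep(interval_s)  # 6 polls x 30 s = 3 minutes of wall time
        for screen in list_screens():
            if screen["id"] not in known_ids:
                return screen  # caller then fetches it and continues as normal
    # Caller retries once with the Flash model, then falls back for this screen.
    return None
```

Passing `sleep` as a parameter lets the loop be exercised in tests without real waiting.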
+If Stitch is genuinely unavailable (tools not registered, or `stitch_list-screens` errors), read `references/micro-interactions.md` and write self-contained HTML files using the same prompt template as the spec.
+### 4. Generate state variants
+For every screen, produce default + loading + empty + error variants (+ modal if applicable). Use the state variant prompt additions at the bottom of `references/prompt-template.md`. Keep them on `GEMINI_3_FLASH` (the default) — they're simple derivations of an existing screen.
+### 5. Build the index page (mandatory)
+Write `.aw_docs/features/<slug>/designs/index.html` — the map of the entire feature. This is the stakeholder entry point. See the "Index page requirements" section in `references/quality-checklist.md` for what it must contain.
+### 6. Self-review & iterate (until production-ready)
+The skill is not finished when generation ends; it is finished when it has **proven the output is production-ready**. Read `references/self-review.md` and run both tracks.
+**This step is mandatory and non-interactive.** Do not ask the user "should I run the review?" Do not ask "do you want me to iterate?" Do not stop after iteration 1 to check in. The contract is: run the loop until zero findings, or until 3 iterations have elapsed with fixes-on-disk in at least 2 of them. That happens silently — the user sees the result, not the permission prompt. Asking for permission to run this step is a protocol violation.
+- **Track A (deterministic):** regex-level sweep for rogue hex, missing states, missing media queries, missing focus rings, off-scale typography (including `font:` shorthand), sidebar restraint, index completeness, and placeholder data. Always runs, every iteration.
+- **Track B (visual):** navigate each screen via a browser MCP, `browser_resize` through 320 / 768 / 1024 / 1440, toggle dark mode, run cross-screen consistency spot-checks, read `browser_console_messages`. Use **Playwright MCP** (portable across Codex, Claude, and Cursor) or Cursor's `cursor-ide-browser` when running inside Cursor. Both expose the same `browser_*` tool surface. If `file://` URLs are rejected by the MCP, spin up `python3 -m http.server 8765 --directory <designs/>` in the background and use `http://127.0.0.1:8765/...` — never accept the rejection as a reason to skip Track B. Always runs every iteration. Only skipped if neither MCP is registered at all, in which case mark Track B SKIPPED in `REVIEW.md` and downgrade the status.
+- **Never skip Track B just because Track A had findings.** Both tracks run every iteration.
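As an illustration of one Track A check, a rogue-hex sweep might look like the sketch below. The real allowed palette lives in `references/highrise-tokens.md`; the `allowed` set passed in here is a stand-in, and the function name is hypothetical.

```python
import re

# Matches #RGB, #RGBA, #RRGGBB and #RRGGBBAA colour literals.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_rogue_hex(html, allowed):
    """Return hex colours in `html` that are not in the allowed token set."""
    allowed_lower = {color.lower() for color in allowed}
    return [
        match.group(0)
        for match in HEX_COLOR.finditer(html)
        if match.group(0).lower() not in allowed_lower
    ]
```

A sweep like this is text-level only: it can also flag in-page anchors that happen to look like hex (for example `#cafe`), so findings still need a quick human or agent read before a fix is applied.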
+Categorize every finding using the fix-method table in `references/self-review.md`. **Prefer 0-cost direct edits over Stitch regeneration** — never burn quota to fix a rogue hex.
+Loop up to **3 iterations**. Stop conditions and resulting REVIEW.md status are defined in `references/self-review.md`. Summary:
+- Zero findings → ✅ Production-ready
+- 3 iterations run, findings decreasing, fixes applied in ≥2 iterations → ⚠️ Shipped with known issues
+- Findings stopped decreasing, or a BLOCKER surfaced, or only 1 iteration ran with findings remaining → ❌ Blocked
+Each iteration must produce its own evidence section in `designs/REVIEW.md` (exact command strings + output counts for Track A, capture-matrix ratio for Track B, fixes-applied list). A status of ⚠️ without two `## Fixes applied` sections on disk is invalid — downgrade to ❌ instead of lying.
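The summary rules above can be sketched as a status function. The authoritative table is in `references/self-review.md`; this sketch covers only the summary, uses hypothetical ASCII labels in place of the status emoji, and treats any case the summary does not cover as blocked, which is an assumption.

```python
def review_status(findings_per_iteration, fix_iterations, blocker=False):
    """Derive the REVIEW.md status from per-iteration evidence.

    findings_per_iteration: finding count at the end of each iteration, in order.
    fix_iterations: number of iterations with a "## Fixes applied" section on disk.
    Returns "production-ready", "shipped-with-known-issues", or "blocked".
    """
    if not findings_per_iteration:
        raise ValueError("at least one iteration must have run")
    if blocker:
        return "blocked"
    if findings_per_iteration[-1] == 0:
        return "production-ready"
    # Strictly decreasing finding counts across consecutive iterations.
    decreasing = all(
        later < earlier
        for earlier, later in zip(findings_per_iteration, findings_per_iteration[1:])
    )
    if len(findings_per_iteration) == 3 and decreasing and fix_iterations >= 2:
        return "shipped-with-known-issues"
    # Stalled findings, a single iteration with findings left, or missing
    # fix evidence: downgrade instead of over-claiming.
    return "blocked"
```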
+Do not proceed to step 7 until `REVIEW.md` shows ✅, or until you've exhausted 3 iterations and explicitly documented the remaining blockers and their severity.
+### 7. Present to user
+Share `designs/index.html` as the entry point and `designs/REVIEW.md` as the audit trail. If `REVIEW.md` status is ⚠️ or ❌, **call it out explicitly** — don't bury the known issues. Take user feedback here; any change request re-enters step 6 before re-presenting.
+For user-driven revisions: use `stitch_edit-screens` for targeted fixes, `stitch_generate-variants` for alternatives, and only regenerate for major rethinks. See the iteration table in `references/stitch-workflow.md`.
+### 8. Document the design
+Write `.aw_docs/features/<slug>/design.md` **at the feature root, not inside `designs/`**:
+- Screen-by-screen walkthrough (what each screen does, how users navigate between them)
+- Component inventory (which existing HL components to reuse vs. what's new)
+- Key design decisions and rationale
+- Link to `designs/index.html` as the entry point
+- Link to `designs/SCREEN_PLAN.md` for the flow map
+- Link to `designs/REVIEW.md` for the self-review audit trail
+## Output Structure
+```
+.aw_docs/features/<slug>/
+├── requirements.md          (from earlier phase)
+├── prd.md                   (from earlier phase)
+├── design.md                ← design decisions, component inventory
+└── designs/
+    ├── index.html           ← entry point: links to every screen + state
+    ├── SCREEN_PLAN.md       ← flow map + nav structure
+    ├── REVIEW.md            ← self-review audit trail (step 6 output)
+    ├── <screen-1>/
+    │   ├── default.html
+    │   ├── empty.html
+    │   ├── loading.html
+    │   ├── error.html
+    │   └── modal-<name>.html
+    ├── <screen-2>/
+    │   └── ...
+    └── screenshots/
+        ├── <screen>-desktop-light.png   (from stitch_get-screen if used)
+        └── <screen>-<state>-<width>.png (from step 6 visual sweep)
+```
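A quick way to check the tree above for missing state variants, as a sketch. It assumes every directory under `designs/` other than `screenshots/` is a screen directory, and it treats `modal-<name>.html` as optional because modals apply only to some screens. The helper name is hypothetical.

```python
from pathlib import Path

# State files every screen directory is expected to contain.
REQUIRED_STATES = ("default", "empty", "loading", "error")


def missing_variants(designs_dir):
    """Map each screen directory name to the state files it lacks."""
    missing = {}
    for screen_dir in sorted(Path(designs_dir).iterdir()):
        # Skip top-level files (index.html, REVIEW.md, ...) and screenshots/.
        if not screen_dir.is_dir() or screen_dir.name == "screenshots":
            continue
        absent = [
            state for state in REQUIRED_STATES
            if not (screen_dir / f"{state}.html").is_file()
        ]
        if absent:
            missing[screen_dir.name] = absent
    return missing
```

An empty result means every screen directory has its four base states. It says nothing about whether `index.html` links to them, which the index requirements cover separately.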
+## Platform Skills to Reference
+These exist in the platform registry and contain deeper guidance — reference specific sections when relevant, don't re-read them whole:
+- **`platform-design:md`** — anti-pattern catalog (cheap vs premium patterns), DESIGN.md output format, dark mode token mapping, responsive breakpoint table
+- **`platform-design:stitch-screen-generation`** — multi-select theme consistency, competitive benchmarking, Ralph Loop iteration
+- **`platform-design:pixel-fidelity-review`** — 6-layer audit, scoring rubric, Highrise-specific CSS override gotchas (use when implementation review is needed)
+- **`platform-design:system`** — HL component catalog, design token CSS properties, WCAG 2.1 AA rules