aw-ecc 1.4.32 → 1.4.47
- package/.claude-plugin/plugin.json +1 -1
- package/.codex/hooks/aw-post-tool-use.sh +8 -2
- package/.codex/hooks/aw-session-start.sh +11 -4
- package/.codex/hooks/aw-stop.sh +8 -2
- package/.codex/hooks/aw-user-prompt-submit.sh +10 -2
- package/.codex/hooks.json +8 -8
- package/.cursor/INSTALL.md +7 -5
- package/.cursor/hooks/adapter.js +41 -4
- package/.cursor/hooks/after-agent-response.js +62 -0
- package/.cursor/hooks/before-submit-prompt.js +7 -1
- package/.cursor/hooks/post-tool-use-failure.js +21 -0
- package/.cursor/hooks/post-tool-use.js +39 -0
- package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
- package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
- package/.cursor/hooks/subagent-start.js +22 -4
- package/.cursor/hooks/subagent-stop.js +18 -1
- package/.cursor/hooks.json +23 -2
- package/.opencode/package.json +1 -1
- package/AGENTS.md +3 -3
- package/README.md +5 -5
- package/commands/adk.md +52 -0
- package/commands/build.md +22 -9
- package/commands/deploy.md +12 -0
- package/commands/execute.md +9 -0
- package/commands/feature.md +333 -0
- package/commands/investigate.md +18 -5
- package/commands/plan.md +23 -9
- package/commands/publish.md +65 -0
- package/commands/review.md +12 -0
- package/commands/ship.md +12 -0
- package/commands/test.md +12 -0
- package/commands/verify.md +9 -0
- package/hooks/hooks.json +36 -0
- package/manifests/install-components.json +8 -0
- package/manifests/install-modules.json +83 -0
- package/manifests/install-profiles.json +7 -0
- package/package.json +1 -1
- package/scripts/ci/validate-rules.js +51 -0
- package/scripts/cursor-aw-home/hooks.json +23 -2
- package/scripts/cursor-aw-hooks/adapter.js +41 -4
- package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
- package/scripts/hooks/aw-usage-commit-created.js +32 -0
- package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
- package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
- package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
- package/scripts/hooks/aw-usage-session-start.js +48 -0
- package/scripts/hooks/aw-usage-stop.js +182 -0
- package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
- package/scripts/hooks/cost-tracker.js +3 -23
- package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
- package/scripts/hooks/shared/aw-phase-runner.js +3 -1
- package/scripts/lib/aw-hook-contract.js +2 -2
- package/scripts/lib/aw-pricing.js +306 -0
- package/scripts/lib/aw-usage-telemetry.js +472 -0
- package/scripts/lib/codex-hook-config.js +8 -8
- package/scripts/lib/cursor-hook-config.js +25 -10
- package/scripts/lib/install-targets/cursor-project.js +3 -0
- package/scripts/lib/install-targets/helpers.js +20 -3
- package/skills/aw-adk/SKILL.md +317 -0
- package/skills/aw-adk/agents/analyzer.md +113 -0
- package/skills/aw-adk/agents/comparator.md +113 -0
- package/skills/aw-adk/agents/grader.md +115 -0
- package/skills/aw-adk/assets/eval_review.html +76 -0
- package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
- package/skills/aw-adk/eval-viewer/viewer.html +181 -0
- package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
- package/skills/aw-adk/evals/eval-create-agent.md +90 -0
- package/skills/aw-adk/evals/eval-create-command.md +98 -0
- package/skills/aw-adk/evals/eval-create-eval.md +89 -0
- package/skills/aw-adk/evals/eval-create-rule.md +99 -0
- package/skills/aw-adk/evals/eval-create-skill.md +97 -0
- package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
- package/skills/aw-adk/evals/eval-delete-command.md +89 -0
- package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
- package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
- package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
- package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
- package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
- package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
- package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
- package/skills/aw-adk/evals/evals.json +96 -0
- package/skills/aw-adk/references/artifact-wiring.md +162 -0
- package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
- package/skills/aw-adk/references/eval-placement-guide.md +183 -0
- package/skills/aw-adk/references/external-resources.md +75 -0
- package/skills/aw-adk/references/getting-started.md +66 -0
- package/skills/aw-adk/references/registry-structure.md +152 -0
- package/skills/aw-adk/references/rubric-agent.md +36 -0
- package/skills/aw-adk/references/rubric-command.md +36 -0
- package/skills/aw-adk/references/rubric-eval.md +36 -0
- package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
- package/skills/aw-adk/references/rubric-rule.md +36 -0
- package/skills/aw-adk/references/rubric-skill.md +36 -0
- package/skills/aw-adk/references/schemas.md +222 -0
- package/skills/aw-adk/references/template-agent.md +251 -0
- package/skills/aw-adk/references/template-command.md +279 -0
- package/skills/aw-adk/references/template-eval.md +176 -0
- package/skills/aw-adk/references/template-rule.md +119 -0
- package/skills/aw-adk/references/template-skill.md +123 -0
- package/skills/aw-adk/references/type-classifier.md +98 -0
- package/skills/aw-adk/references/writing-good-agents.md +227 -0
- package/skills/aw-adk/references/writing-good-commands.md +258 -0
- package/skills/aw-adk/references/writing-good-evals.md +271 -0
- package/skills/aw-adk/references/writing-good-rules.md +214 -0
- package/skills/aw-adk/references/writing-good-skills.md +159 -0
- package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
- package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
- package/skills/aw-adk/scripts/score-artifact.sh +179 -0
- package/skills/aw-adk/scripts/trigger-eval.py +192 -0
- package/skills/aw-build/SKILL.md +19 -2
- package/skills/aw-deploy/SKILL.md +65 -3
- package/skills/aw-design/SKILL.md +156 -0
- package/skills/aw-design/references/highrise-tokens.md +394 -0
- package/skills/aw-design/references/micro-interactions.md +76 -0
- package/skills/aw-design/references/prompt-template.md +160 -0
- package/skills/aw-design/references/quality-checklist.md +70 -0
- package/skills/aw-design/references/self-review.md +497 -0
- package/skills/aw-design/references/stitch-workflow.md +127 -0
- package/skills/aw-feature/SKILL.md +293 -0
- package/skills/aw-investigate/SKILL.md +17 -0
- package/skills/aw-plan/SKILL.md +34 -3
- package/skills/aw-publish/SKILL.md +300 -0
- package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
- package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
- package/skills/aw-publish/evals/eval-push-modes.md +67 -0
- package/skills/aw-publish/evals/eval-rules-push.md +60 -0
- package/skills/aw-publish/evals/evals.json +29 -0
- package/skills/aw-publish/references/push-modes.md +38 -0
- package/skills/aw-review/SKILL.md +88 -9
- package/skills/aw-rules-review/SKILL.md +124 -0
- package/skills/aw-rules-review/agents/openai.yaml +3 -0
- package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
- package/skills/aw-ship/SKILL.md +16 -0
- package/skills/aw-spec/SKILL.md +15 -0
- package/skills/aw-tasks/SKILL.md +15 -0
- package/skills/aw-test/SKILL.md +16 -0
- package/skills/aw-yolo/SKILL.md +4 -0
- package/skills/diagnose/SKILL.md +121 -0
- package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
- package/skills/finish-only-when-green/SKILL.md +265 -0
- package/skills/grill-me/SKILL.md +24 -0
- package/skills/grill-with-docs/SKILL.md +92 -0
- package/skills/grill-with-docs/adr-format.md +47 -0
- package/skills/grill-with-docs/context-format.md +67 -0
- package/skills/improve-codebase-architecture/SKILL.md +75 -0
- package/skills/improve-codebase-architecture/deepening.md +37 -0
- package/skills/improve-codebase-architecture/interface-design.md +44 -0
- package/skills/improve-codebase-architecture/language.md +53 -0
- package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
- package/skills/tdd/SKILL.md +115 -0
- package/skills/tdd/deep-modules.md +33 -0
- package/skills/tdd/interface-design.md +31 -0
- package/skills/tdd/mocking.md +59 -0
- package/skills/tdd/refactoring.md +10 -0
- package/skills/tdd/tests.md +61 -0
- package/skills/to-issues/SKILL.md +62 -0
- package/skills/to-prd/SKILL.md +75 -0
- package/skills/using-aw-skills/SKILL.md +170 -237
- package/skills/using-aw-skills/hooks/session-start.sh +11 -41
- package/skills/zoom-out/SKILL.md +24 -0
- package/.cursor/rules/common-agents.md +0 -53
- package/.cursor/rules/common-aw-routing.md +0 -43
- package/.cursor/rules/common-coding-style.md +0 -52
- package/.cursor/rules/common-development-workflow.md +0 -33
- package/.cursor/rules/common-git-workflow.md +0 -28
- package/.cursor/rules/common-hooks.md +0 -34
- package/.cursor/rules/common-patterns.md +0 -35
- package/.cursor/rules/common-performance.md +0 -59
- package/.cursor/rules/common-security.md +0 -33
- package/.cursor/rules/common-testing.md +0 -33
- package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
- package/.cursor/skills/article-writing/SKILL.md +0 -85
- package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
- package/.cursor/skills/aw-build/SKILL.md +0 -152
- package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
- package/.cursor/skills/aw-debug/SKILL.md +0 -49
- package/.cursor/skills/aw-deploy/SKILL.md +0 -101
- package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
- package/.cursor/skills/aw-execute/SKILL.md +0 -47
- package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
- package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
- package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
- package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
- package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
- package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
- package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
- package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
- package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
- package/.cursor/skills/aw-finish/SKILL.md +0 -111
- package/.cursor/skills/aw-investigate/SKILL.md +0 -109
- package/.cursor/skills/aw-plan/SKILL.md +0 -368
- package/.cursor/skills/aw-prepare/SKILL.md +0 -118
- package/.cursor/skills/aw-review/SKILL.md +0 -118
- package/.cursor/skills/aw-ship/SKILL.md +0 -115
- package/.cursor/skills/aw-spec/SKILL.md +0 -104
- package/.cursor/skills/aw-tasks/SKILL.md +0 -138
- package/.cursor/skills/aw-test/SKILL.md +0 -118
- package/.cursor/skills/aw-verify/SKILL.md +0 -51
- package/.cursor/skills/aw-yolo/SKILL.md +0 -111
- package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
- package/.cursor/skills/bun-runtime/SKILL.md +0 -84
- package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
- package/.cursor/skills/code-simplification/SKILL.md +0 -74
- package/.cursor/skills/content-engine/SKILL.md +0 -88
- package/.cursor/skills/context-engineering/SKILL.md +0 -74
- package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
- package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
- package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
- package/.cursor/skills/frontend-slides/SKILL.md +0 -184
- package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
- package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
- package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
- package/.cursor/skills/idea-refine/SKILL.md +0 -84
- package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
- package/.cursor/skills/investor-materials/SKILL.md +0 -96
- package/.cursor/skills/investor-outreach/SKILL.md +0 -76
- package/.cursor/skills/market-research/SKILL.md +0 -75
- package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
- package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
- package/.cursor/skills/performance-optimization/SKILL.md +0 -77
- package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
- package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
- package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
- package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
- package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
- package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
- package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
- package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
- /package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
- /package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
- /package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
- /package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
- /package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
- /package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
- /package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
- /package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
- /package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
- /package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
- /package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
- /package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
- /package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
- /package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
- /package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
- /package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
- /package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
- /package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
- /package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
- /package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
- /package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
- /package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
- /package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
- /package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
- /package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
- /package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
- /package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
- /package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
- /package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
- /package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0
@@ -0,0 +1,192 @@
+#!/usr/bin/env python3
+"""
+trigger-eval.py — Tests skill/agent description triggering accuracy
+
+Usage:
+    python skills/aw-adk/scripts/trigger-eval.py \\
+        --eval-set <path-to-eval-set.json> \\
+        --skill-path <path-to-skill> \\
+        [--model <model-id>] \\
+        [--max-iterations 5] \\
+        [--verbose]
+
+Evaluates whether a skill's description causes it to trigger correctly:
+- should_trigger queries should activate the skill
+- should_not_trigger queries should NOT activate the skill
+
+Tests each query against the configured AI runner and checks if the skill was consulted.
+Supports multiple runners: claude (default), cursor, codex.
+
+Adapted from skill-creator's run_eval.py + run_loop.py for CASRE context.
+"""
+
+import argparse
+import json
+import os
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+
+
+def load_eval_set(path: str) -> list[dict]:
+    """Load eval set from JSON file."""
+    with open(path, "r") as f:
+        return json.load(f)
+
+
+def read_skill_description(skill_path: str) -> str:
+    """Extract the description from a skill's SKILL.md frontmatter."""
+    skill_md = os.path.join(skill_path, "SKILL.md")
+    if not os.path.exists(skill_md):
+        # Maybe it's a single .md file (agent)
+        skill_md = skill_path
+
+    with open(skill_md, "r") as f:
+        content = f.read()
+
+    # Parse frontmatter
+    if content.startswith("---"):
+        end = content.index("---", 3)
+        frontmatter = content[3:end]
+        for line in frontmatter.split("\n"):
+            if line.strip().startswith("description:"):
+                return line.split("description:", 1)[1].strip().strip('"').strip("'")
+
+    return ""
+
+
+def detect_runner() -> str:
+    """Auto-detect which AI runner is available."""
+    for runner in ["claude", "cursor", "codex"]:
+        try:
+            subprocess.run([runner, "--version"], capture_output=True, timeout=5)
+            return runner
+        except (FileNotFoundError, subprocess.TimeoutExpired):
+            continue
+    return "claude"  # fallback
+
+
+def build_runner_command(runner: str, query: str, skill_path: str, model: str) -> list[str]:
+    """Build the CLI command for the given runner."""
+    if runner == "claude":
+        return ["claude", "-p", query, "--model", model, "--max-turns", "1"]
+    elif runner == "cursor":
+        # Cursor uses --prompt flag in CLI mode
+        return ["cursor", "--prompt", query, "--model", model]
+    elif runner == "codex":
+        # Codex uses positional prompt
+        return ["codex", "-q", query, "--model", model]
+    else:
+        return ["claude", "-p", query, "--model", model, "--max-turns", "1"]
+
+
+def test_trigger(query: str, skill_path: str, model: str, runner: str = "claude") -> bool:
+    """Test if a query triggers the skill using the configured runner.
+
+    Returns True if the skill was consulted (triggered).
+    """
+    cmd = build_runner_command(runner, query, skill_path, model)
+    try:
+        result = subprocess.run(
+            cmd,
+            capture_output=True,
+            text=True,
+            timeout=120,
+        )
+        # Check if the skill name appears in the output (indicating it was loaded)
+        output = result.stdout + result.stderr
+        skill_name = os.path.basename(skill_path.rstrip("/"))
+        return skill_name.lower() in output.lower()
+    except (subprocess.TimeoutExpired, FileNotFoundError) as e:
+        print(f" Warning: {runner} failed: {e}", file=sys.stderr)
+        return False
+
+
+def evaluate(eval_set: list[dict], skill_path: str, model: str, verbose: bool = False, runner: str = "claude") -> dict:
+    """Run all eval queries and compute accuracy."""
+    results = []
+    correct = 0
+    total = len(eval_set)
+
+    for i, item in enumerate(eval_set):
+        query = item["query"]
+        should_trigger = item["should_trigger"]
+
+        if verbose:
+            print(f" [{i + 1}/{total}] Testing: {query[:60]}...", file=sys.stderr)
+
+        triggered = test_trigger(query, skill_path, model, runner)
+        is_correct = triggered == should_trigger
+
+        if is_correct:
+            correct += 1
+
+        results.append(
+            {
+                "query": query,
+                "should_trigger": should_trigger,
+                "triggered": triggered,
+                "correct": is_correct,
+            }
+        )
+
+        if verbose:
+            status = "PASS" if is_correct else "FAIL"
+            print(f" {status} (should_trigger={should_trigger}, triggered={triggered})", file=sys.stderr)
+
+    accuracy = correct / total if total > 0 else 0
+
+    return {
+        "accuracy": round(accuracy, 3),
+        "correct": correct,
+        "total": total,
+        "results": results,
+        "false_positives": [r for r in results if not r["should_trigger"] and r["triggered"]],
+        "false_negatives": [r for r in results if r["should_trigger"] and not r["triggered"]],
+    }
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Test skill/agent description triggering accuracy")
+    parser.add_argument("--eval-set", required=True, help="Path to eval set JSON")
+    parser.add_argument("--skill-path", required=True, help="Path to skill directory or agent file")
+    parser.add_argument("--model", default="claude-sonnet-4-6", help="Model ID for testing")
+    parser.add_argument("--runner", default="auto", help="AI runner: claude, cursor, codex, or auto (detect)")
+    parser.add_argument("--max-iterations", type=int, default=1, help="Number of evaluation iterations")
+    parser.add_argument("--verbose", action="store_true", help="Print progress")
+    args = parser.parse_args()
+
+    runner = args.runner if args.runner != "auto" else detect_runner()
+    print(f"Using runner: {runner}")
+
+    eval_set = load_eval_set(args.eval_set)
+    print(f"Loaded {len(eval_set)} eval queries ({sum(1 for e in eval_set if e['should_trigger'])} should-trigger, {sum(1 for e in eval_set if not e['should_trigger'])} should-not-trigger)")
+
+    description = read_skill_description(args.skill_path)
+    if description:
+        print(f"Current description: {description[:100]}...")
+
+    for iteration in range(1, args.max_iterations + 1):
+        print(f"\n--- Iteration {iteration} ---")
+        result = evaluate(eval_set, args.skill_path, args.model, args.verbose, runner)
+
+        print(f"Accuracy: {result['accuracy']:.1%} ({result['correct']}/{result['total']})")
+        if result["false_positives"]:
+            print(f"False positives ({len(result['false_positives'])}):")
+            for fp in result["false_positives"]:
+                print(f" - {fp['query'][:80]}")
+        if result["false_negatives"]:
+            print(f"False negatives ({len(result['false_negatives'])}):")
+            for fn in result["false_negatives"]:
+                print(f" - {fn['query'][:80]}")
+
+    # Write results
+    output_path = os.path.join(os.path.dirname(args.eval_set), "trigger-eval-results.json")
+    with open(output_path, "w") as f:
+        json.dump(result, f, indent=2)
+    print(f"\nResults saved to {output_path}")
+
+
+if __name__ == "__main__":
+    main()
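A minimal sketch of the eval-set shape that `evaluate()` in `trigger-eval.py` appears to read (a JSON list of objects with `query` and `should_trigger` keys), scored the same way. The sample queries and the `fake_trigger` and `score` helpers are hypothetical stand-ins: no runner CLI is invoked, so this only illustrates the accuracy and false-positive/false-negative bookkeeping.

```python
import json

# Hypothetical eval set in the shape evaluate() reads:
# a list of {"query": str, "should_trigger": bool} objects.
EVAL_SET_JSON = """
[
  {"query": "create a new skill for linting", "should_trigger": true},
  {"query": "what's the weather today", "should_trigger": false},
  {"query": "scaffold an agent definition", "should_trigger": true}
]
"""


def fake_trigger(query: str) -> bool:
    """Stand-in for test_trigger(); an assumption, not the real runner call."""
    return "skill" in query.lower()


def score(eval_set, trigger):
    """Mirror the accuracy / false-positive / false-negative summary."""
    results = [{**item, "triggered": trigger(item["query"])} for item in eval_set]
    correct = sum(r["triggered"] == r["should_trigger"] for r in results)
    return {
        "accuracy": round(correct / len(results), 3) if results else 0,
        "false_positives": [r for r in results if r["triggered"] and not r["should_trigger"]],
        "false_negatives": [r for r in results if r["should_trigger"] and not r["triggered"]],
    }


summary = score(json.loads(EVAL_SET_JSON), fake_trigger)
print(summary["accuracy"], len(summary["false_negatives"]))
```

With the stub, the third query counts as a false negative because the stub only matches the word "skill"; a real runner would decide that from the skill description.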
package/skills/aw-build/SKILL.md
CHANGED
@@ -34,7 +34,8 @@ Do not use for vague ideation, unclear bugs, or release-only work.
    Use `../../references/build-increments.md` to keep changes thin, reversible, and rollback-friendly.
    For multi-file or high-risk work, load `incremental-implementation`.
 5. Build one slice or one bounded parallel wave at a time.
-   For any slice that changes observable behavior, fixes a bug, or refactors live behavior, load `tdd-
+   For any slice that changes observable behavior, fixes a bug, or refactors live behavior, load `tdd-workflow` and require explicit RED-GREEN-REFACTOR (RED -> GREEN -> REFACTOR).
+   Use `tdd-guide` when a specialist subagent is useful, and use the `tdd` companion skill when the slice needs deeper behavior-test, mocking, or tracer-bullet guidance.
    For config, docs, infra, migration, or other non-behavior slices where test-first is not meaningful, record the best pre-change proof available before editing and the focused post-change validation that will prove the slice.
    During implementation, prefer the simplest change that fits existing patterns.
    Avoid speculative abstractions, unnecessary branching, and adjacent cleanup outside the approved slice.
@@ -149,16 +150,30 @@ Parallel build fan-out must stay within the planned `max_parallel_subagents` cap
 - deferred findings
 - simplification notes
 - `save_point_commits`
+- `html_companion_artifacts`
 - blockers or concerns
 - recommended next commands
 
+## Human HTML Companion
+
+Markdown `execution.md` remains canonical for agents.
+When build writes or materially updates `execution.md`, also create or refresh `.aw_docs/features/<feature_slug>/execution.html`. HTML sidecars are required stage outputs, not advisory metadata.
+
+Delegate to the `aw:echo` subagent with the `implementation-plan` profile.
+Invoking `/aw:build` in default `dual` mode is explicit authorization to spawn exactly one `aw:echo` subagent for HTML companion generation; do not skip HTML only because no direct command is available.
+Resolve output mode as: explicit user request for Markdown-only -> otherwise `dual`. `.aw_docs/config.json` and `AW_DOCS_OUTPUT_MODE` may request `dual` or `html`, but must not silently suppress required SDLC HTML sidecars.
+
+Pass approved inputs, completed slices, phase progress, file map, validation evidence, save-point commits, deferred findings, and next command as the source bundle.
+Record the colocated sidecar in `state.json` `html_companion_artifacts` with `source_path`, `html_path`, profile, status, `run_ref` when available, publish status, and any explicit Markdown-only skip or fallback reason.
+Spawn exactly one `aw:echo` subagent and wait for the colocated `.html` sidecar before the final handoff unless the user explicitly asks not to wait. If the harness still cannot spawn `aw:echo`, create a conservative self-contained fallback HTML sidecar in the same turn using the `aw:echo` safety and design contract, record `generated_fallback` plus the blocker, and keep Markdown canonical.
+
 ## Verification
 
 Before leaving build, confirm:
 
 - [ ] the change came from approved inputs or a clearly approved direct technical request
 - [ ] the work was split into thin, reversible increments when non-trivial
-- [ ] behavior-changing slices used explicit RED -> GREEN -> REFACTOR via `tdd-guide`
+- [ ] behavior-changing slices used explicit RED -> GREEN -> REFACTOR via `tdd-workflow` and, when useful, `tdd-guide` or `tdd`
 - [ ] non-behavior slices recorded pre-change proof and focused post-change validation
 - [ ] each meaningful completed slice reached green before the next slice started
 - [ ] each meaningful completed slice had a focused review with the right reviewer agent before the next slice started
@@ -169,6 +184,7 @@ Before leaving build, confirm:
 - [ ] phased plans, if used, recorded phase completion plus the next phase transition
 - [ ] meaningful completed slices produced recorded save-point commits
 - [ ] `execution.md` and `state.json` are updated
+- [ ] the HTML companion file exists, or the user explicitly requested Markdown-only
 
 ## Final Output Shape
 
@@ -185,5 +201,6 @@ Always end with:
 - `Chunk Reviews`
 - `Simplification`
 - `Save Points`
+- `HTML Companion`
 - `Blockers`
 - `Next`
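A rough sketch of how the output-mode rule and the `html_companion_artifacts` record added to `aw-build/SKILL.md` could be read, under stated assumptions. Only `source_path`, `html_path`, `run_ref`, the `implementation-plan` profile, the `dual`/`html` modes, and the `generated_fallback` status come from the diff; the function names, the `"markdown"` mode label, the other status strings, and the exact key spellings are hypothetical, and the skill itself is prose for an agent, not code.

```python
from typing import Optional


def resolve_output_mode(user_wants_markdown_only: bool,
                        configured: Optional[str] = None) -> str:
    """Assumed reading of the rule: only an explicit user request yields
    Markdown-only; config or env may pick "dual" or "html" but cannot
    suppress the HTML sidecar."""
    if user_wants_markdown_only:
        return "markdown"  # hypothetical label for the Markdown-only case
    return configured if configured in ("dual", "html") else "dual"


def companion_record(feature_slug: str, mode: str,
                     status: str = "generated",
                     reason: Optional[str] = None,
                     run_ref: Optional[str] = None) -> dict:
    """Build one hypothetical html_companion_artifacts entry."""
    base = f".aw_docs/features/{feature_slug}/execution"
    record = {
        "source_path": f"{base}.md",  # assumes the Markdown file is colocated
        "html_path": None if mode == "markdown" else f"{base}.html",
        "profile": "implementation-plan",
        "status": "skipped_markdown_only" if mode == "markdown" else status,
    }
    if run_ref:
        record["run_ref"] = run_ref
    if reason:
        record["reason"] = reason
    return record


mode = resolve_output_mode(False, configured="markdown")  # config cannot suppress HTML
print(mode, companion_record("checkout", mode)["html_path"])
print(companion_record("checkout", "dual", "generated_fallback",
                       reason="could not spawn aw:echo")["status"])
```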
package/skills/aw-deploy/SKILL.md
CHANGED
@@ -25,6 +25,7 @@ Do not use for launch discipline or end-to-end orchestration.
    The required QA and review outputs must exist.
 2. Select one release path.
    PR, branch, staging, or production.
+   **Default deployment environment:** if the user does **not** explicitly name a target (`production`, `prod`, `staging-versions`, green/default/promote, a named cluster/VPN, etc.), assume **`staging`** only. Do not infer production or staging-versions from vague phrases (“deploy”, “ship”, “push it”). Jenkins paths and MCP job folders must use the **`staging/...`** prefix unless an explicit non-staging target is confirmed.
 3. Resolve the org-standard mechanism.
    Use the repo archetype and resolved baseline profile to choose provider and mechanism.
    Load `ci-cd-and-automation` for gate ordering, preview/deploy automation, and rollback-aware pipeline expectations.
@@ -35,6 +36,50 @@ Do not use for launch discipline or end-to-end orchestration.
|
|
|
35
36
|
5. Hand off to `aw-ship` when requested.
|
|
36
37
|
Use `aw-ship` for rollout safety, rollback readiness, and closeout.
|
|
37
38
|
|
|
39
|
+
## Repo routing: Revex membership (`ghl-revex-backend`)
|
|
40
|
+
|
|
41
|
+
**Hard rule:** If the deployment is from the **`ghl-revex-backend`** repo **and** it targets **membership** workloads (communities, client-portal, courses, `ghl-revex-backend` server/workers, Debezium, ProxySQL, membership workers, etc.), **do not** treat this as a generic `aw-deploy` + **platform-backend** Jenkins path.
|
|
42
|
+
|
|
43
|
+
**Always** use one of these instead:
|
|
44
|
+
|
|
45
|
+
- **Cursor command:** `/aw/revex-memberships-infra-deploy` or `/aw/revex-membership-frontend-infra-deploy` (same command surface — argument hint: `staging` / `staging-versions` / `production` + app/worker names; **default env = `staging`** per workflow step 2).
|
|
46
|
+
- **Agent path:** the **infra-release-engineer** / membership **deployment** flow that loads **`backend-deployment-skill`** from the registry (e.g. `revex/memberships/infra/skills/backend-deployment-skill/SKILL.md` when `.aw` resolves to the AW registry).
|
|
47
|
+
|
|
48
|
+
**Jenkins MCP** for that stack is still **`user-ghl-ai`** (`jenkins_trigger-build`, `jenkins_list-jobs`, `jenkins_get-build`, …), but **job paths, `DeploymentOption`, green/default/promote**, and **parameter names** must come from **`backend-deployment-skill`**, not from the `platform-backend` worker job list below.
|
|
49
|
+
|
|
50
|
+
If the user only said “deploy” without naming the repo, **infer from workspace root / `package.json` / remote** — when it is `ghl-revex-backend`, **hand off** to the command above rather than improvising `staging/revex/...` paths from memory.
|
|
51
|
+
|
|
52
|
+
## Deploy execution: Jenkins (GHL platform-backend)
|
|
53
|
+
|
|
54
|
+
Use this section **only** when the repo is **`platform-backend`** (or another app explicitly using `deployments/<env>/workers/` under that monorepo’s Jenkins layout), **not** when [Repo routing: Revex membership](#repo-routing-revex-membership-ghl-revex-backend) applies.
|
|
55
|
+
|
|
56
|
+
When the resolved mechanism is **Jenkins** (repo has `deployments/<env>/workers/Jenkinsfile*.` pipelines), **execute the deploy**, do not stop at “open Jenkins manually” if the agent session exposes MCP.
|
|
57
|
+
|
|
58
|
+
### MCP (preferred when available)
|
|
59
|
+
|
|
60
|
+
- **Server id:** `user-ghl-ai` (Cursor often shows the label **ghl-ai** — if `call_mcp_tool` fails, confirm the server identifier from the project MCP metadata, e.g. `.cursor/projects/<repo>/mcps/user-ghl-ai/SERVER_METADATA.json`, or the Cursor MCP panel.)
|
|
61
|
+
- **Tool order (fail-closed discovery):**
|
|
62
|
+
1. `jenkins_list-jobs` with `folder` path segments (e.g. `staging/common/platform-workers`) until you find the **WorkflowJob**, not a folder.
|
|
63
|
+
2. `jenkins_get-build` on the job’s **lastSuccessfulBuild** to learn exact **parameter names** and shapes (boolean params still go in as strings in step 3).
|
|
64
|
+
3. `jenkins_trigger-build` with `action: "trigger"`, full job `path`, and `parameters` as **string key → string value** only (`"true"` / `"false"` for booleans; `SkipBuild` is typically `"Yes"` / `"No"`).
|
|
65
|
+
4. `jenkins_list-builds` (and optionally `jenkins_get-build-log`) to confirm the new build queued / finished — the trigger API may return success while the build is still pending.
|
|
66
|
+
- **Local schemas (optional):** Cursor may mirror tool contracts under `.cursor/projects/<workspace>/mcps/user-ghl-ai/tools/*.json` — read before calling if the session does not show tool args inline.
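
The string-only parameter rule in step 3 can be sketched as a small helper. This is a minimal illustration, not the MCP server's own contract: the function names `to_jenkins_parameters` and `build_trigger_request` are hypothetical, and the request is assumed to be a plain mapping with the `action`, `path`, and `parameters` keys named above.

```python
from typing import Dict, Mapping


def to_jenkins_parameters(params: Mapping[str, object]) -> Dict[str, str]:
    """Coerce every value to a string; booleans become "true" / "false"."""
    coerced: Dict[str, str] = {}
    for name, value in params.items():
        if isinstance(value, bool):
            coerced[name] = "true" if value else "false"
        else:
            coerced[name] = str(value)
    return coerced


def build_trigger_request(job_path: str, params: Mapping[str, object]) -> dict:
    """Assemble the arguments for a `jenkins_trigger-build` call (assumed shape)."""
    return {
        "action": "trigger",
        "path": job_path,
        "parameters": to_jenkins_parameters(params),
    }


request = build_trigger_request(
    "staging/common/platform-workers/platform-events-worker",
    {"events-worker": True, "SkipBuild": "No"},
)
```

The parameter names used here come from the examples in this section; the real names should still be read from `jenkins_get-build` in step 2.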
|
|
67
|
+
|
|
68
|
+
### Common `platform-backend` staging job paths (verify with `jenkins_list-jobs`)
|
|
69
|
+
|
|
70
|
+
These are the usual umbrellas; **always** confirm in Jenkins for the current folder layout:
|
|
71
|
+
|
|
72
|
+
- **Events / mixed workers** (`deployments/staging/workers/Jenkinsfile.eventsworker`):
|
|
73
|
+
`staging/common/platform-workers/platform-events-worker`
|
|
74
|
+
Worker selection uses **slug parameters** matching the Jenkinsfile list (e.g. `events-worker=true`).
|
|
75
|
+
- **Mongo change-stream workers by team** (`deployments/staging/workers/Jenkinsfile.*.mongoEventsWorker`):
|
|
76
|
+
`staging/common/platform-workers/platform-mongo-events-workers/platform-mongo-events-worker-{automation|crm|leadgen|platform|revex}`
|
|
77
|
+
Revex flags are **uppercase** boolean params from that Jenkinsfile (e.g. `CLIENTPORTAL_USERS_CONTACTS=true`).
|
|
78
|
+
|
|
79
|
+
### If MCP is not callable in this session
|
|
80
|
+
|
|
81
|
+
Record a **clear blocker** in the handoff, for example that the UI shows `ghl-ai` enabled but the agent has no `call_mcp_tool` route, or that auth was denied. Still provide **exact** job path(s), branch parameter, and boolean flags so a human can run the same build in the Jenkins UI.
|
|
82
|
+
|
|
38
83
|
## Completion Contract
|
|
39
84
|
|
|
40
85
|
Deploy is complete only when one of these is true:
|
|
@@ -52,16 +97,17 @@ Every deploy handoff must make these things obvious:
|
|
|
52
97
|
|
|
53
98
|
## Common Rationalizations
|
|
54
99
|
|
|
55
|
-
| Rationalization
|
|
56
|
-
|
|
100
|
+
| Rationalization | Reality |
|
|
101
|
+
| ----------------------------------------- | -------------------------------------------------------------- |
|
|
57
102
|
| "Deploy can also handle launch closeout." | Release action and launch discipline are related but distinct. |
|
|
58
|
-
| "I'll just guess the staging mechanism."
|
|
103
|
+
| "I'll just guess the staging mechanism." | Unknown deployment config must fail closed. |
|
|
59
104
|
|
|
60
105
|
## Red Flags
|
|
61
106
|
|
|
62
107
|
- deploy runs without clear test and review evidence
|
|
63
108
|
- provider or mechanism is guessed
|
|
64
109
|
- deploy silently turns into release orchestration
|
|
110
|
+
- environment is assumed to be **production** or **staging-versions** when the user did not state it (default must remain **staging**)
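
The last red flag amounts to a default-to-staging rule. A minimal sketch, assuming the three environment names from the command's argument hint and a hypothetical helper name `resolve_environment`:

```python
from typing import Optional

# Environment names taken from the argument hint; anything else fails closed.
ALLOWED_ENVIRONMENTS = ("staging", "staging-versions", "production")


def resolve_environment(stated: Optional[str]) -> str:
    """Return the stated environment, or "staging" when none was given."""
    if stated is None or not stated.strip():
        return "staging"
    environment = stated.strip().lower()
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {stated!r}")
    return environment
```

Raising on an unknown name, rather than guessing, mirrors the fail-closed stance elsewhere in this file.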
|
|
65
111
|
|
|
66
112
|
## State File
|
|
67
113
|
|
|
@@ -77,15 +123,30 @@ Every deploy handoff must make these things obvious:
|
|
|
77
123
|
- build or release links
|
|
78
124
|
- execution evidence
|
|
79
125
|
- rollback path
|
|
126
|
+
- `html_companion_artifacts`
|
|
80
127
|
- blockers
|
|
81
128
|
- recommended next commands
|
|
82
129
|
|
|
130
|
+
## Human HTML Companion
|
|
131
|
+
|
|
132
|
+
Markdown `release.md` remains canonical for agents.
|
|
133
|
+
When deploy writes or materially updates `release.md`, also create or refresh `.aw_docs/features/<feature_slug>/release.html`. HTML sidecars are required stage outputs, not advisory metadata.
|
|
134
|
+
|
|
135
|
+
Delegate to the `aw:echo` subagent with the `release-report` profile.
|
|
136
|
+
Invoking `/aw:deploy` in default `dual` mode is explicit authorization to spawn exactly one `aw:echo` subagent for HTML companion generation; do not skip HTML only because no direct command is available.
|
|
137
|
+
Resolve output mode as follows: use Markdown-only when the user explicitly requests it; otherwise use `dual`. `.aw_docs/config.json` and `AW_DOCS_OUTPUT_MODE` may request `dual` or `html`, but must not silently suppress required SDLC HTML sidecars.
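
The output-mode rule can be sketched as a small resolver. This is an illustration under assumptions: the function name, the `"markdown"` label for the Markdown-only mode, and the choice to consult the environment variable before the config file are all hypothetical, since the text does not state a precedence between the two.

```python
from typing import Optional


def resolve_output_mode(
    user_markdown_only: bool,
    config_mode: Optional[str] = None,
    env_mode: Optional[str] = None,
) -> str:
    """Only an explicit user request can turn HTML sidecars off."""
    if user_markdown_only:
        return "markdown"  # hypothetical label for the Markdown-only mode
    # Config and environment may pick `dual` or `html`, never suppress HTML.
    for requested in (env_mode, config_mode):  # assumed precedence: env first
        if requested in ("dual", "html"):
            return requested
    return "dual"
```

A config value such as `"markdown"` is ignored here on purpose: per the rule above, configuration must not silently suppress the required sidecars.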
|
|
138
|
+
|
|
139
|
+
Pass selected mode, provider, resolved mechanism, release links, execution evidence, rollback path, blockers, and next command as the source bundle.
|
|
140
|
+
Record the colocated sidecar in `state.json` `html_companion_artifacts` with `source_path`, `html_path`, profile, status, `run_ref` when available, publish status, and any explicit Markdown-only skip or fallback reason.
|
|
141
|
+
Spawn exactly one `aw:echo` subagent and wait for the colocated `.html` sidecar before the final handoff unless the user explicitly asks not to wait. If the harness still cannot spawn `aw:echo`, create a conservative self-contained fallback HTML sidecar in the same turn using the `aw:echo` safety and design contract, record `generated_fallback` plus the blocker, and keep Markdown canonical.
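
As a rough sketch of the `state.json` bookkeeping described above: only `source_path`, `html_path`, `run_ref`, and the `generated_fallback` status are named in this file. The remaining key names (`profile`, `status`, `publish_status`, `skip_or_fallback_reason`), the `"unpublished"` value, the list shape of `html_companion_artifacts`, and the helper name are assumptions for illustration.

```python
from typing import Optional


def record_html_companion(
    state: dict,
    feature_slug: str,
    status: str,
    run_ref: Optional[str] = None,
    reason: Optional[str] = None,
) -> dict:
    """Append one sidecar record to state["html_companion_artifacts"]."""
    base = f".aw_docs/features/{feature_slug}"
    entry = {
        "source_path": f"{base}/release.md",
        "html_path": f"{base}/release.html",
        "profile": "release-report",
        "status": status,
        "publish_status": "unpublished",  # hypothetical value
    }
    if run_ref is not None:  # recorded only when available
        entry["run_ref"] = run_ref
    if reason is not None:  # Markdown-only skip or fallback reason
        entry["skip_or_fallback_reason"] = reason
    state.setdefault("html_companion_artifacts", []).append(entry)
    return state
```

For the fallback case, a call such as `record_html_companion(state, slug, "generated_fallback", reason="...")` keeps the blocker next to the artifact it explains.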
|
|
142
|
+
|
|
83
143
|
## Verification
|
|
84
144
|
|
|
85
145
|
- [ ] one release action was selected explicitly
|
|
86
146
|
- [ ] provider and mechanism came from repo archetype and baseline resolution
|
|
87
147
|
- [ ] `release.md` and `state.json` are updated
|
|
88
148
|
- [ ] handoff to `aw-ship` is clear when launch discipline is still needed
|
|
149
|
+
- [ ] the HTML companion file exists, or the user explicitly requested Markdown-only
|
|
89
150
|
|
|
90
151
|
## Final Output Shape
|
|
91
152
|
|
|
@@ -98,4 +159,5 @@ Always end with:
|
|
|
98
159
|
- `Execution Evidence`
|
|
99
160
|
- `Rollback Path`
|
|
100
161
|
- `Outcome`
|
|
162
|
+
- `HTML Companion`
|
|
101
163
|
- `Next`
|
|
@@ -0,0 +1,156 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: aw-design
|
|
3
|
+
description: Generate premium SaaS UI designs using the Highrise design system. Produces linked HTML prototypes with all state variants, micro-interactions, dark mode, responsive breakpoints, and an index page that maps the full feature. Stitch MCP first, static HTML fallback.
|
|
4
|
+
trigger: Phase 5 of aw-feature, or when the user asks for UI design, screen mockups, HTML prototypes, or design exploration for a GHL feature.
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# AW Design
|
|
8
|
+
|
|
9
|
+
Generate production-grade SaaS screens for GoHighLevel features. Every screen should feel like it belongs in Linear, Vercel, or Stripe — clean, restrained, premium, and alive with subtle motion. Highrise provides the tokens; your job is to make something genuinely good within them.
|
|
10
|
+
|
|
11
|
+
## References (read on demand)
|
|
12
|
+
|
|
13
|
+
This skill is intentionally lean. Read the reference files as you need them:
|
|
14
|
+
|
|
15
|
+
- `references/highrise-tokens.md` — colors (light + dark), typography, spacing, components, restraint rules, responsive breakpoints. **Read before generating any screen.**
|
|
16
|
+
- `references/prompt-template.md` — the full screen prompt, layout hints by screen type, and state variant prompt additions. **Read before writing your first prompt.**
|
|
17
|
+
- `references/stitch-workflow.md` — Stitch tool reference, generation steps, timeout/model fallback, HTML fallback path, iteration patterns. **Read before the first Stitch call.**
|
|
18
|
+
- `references/micro-interactions.md` — all CSS transitions, keyframes, and `prefers-reduced-motion` fallback. **Read when writing HTML directly.**
|
|
19
|
+
- `references/quality-checklist.md` — the pass/fail contract the self-review enforces, plus the mandatory index page spec. **Read before step 5 (index).**
|
|
20
|
+
- `references/self-review.md` — deterministic + visual review tracks, capture matrix, evidence-required REVIEW.md template, per-iteration audit format, status-rule table (✅/⚠️/❌), and the anti-fake sanity checks. **Read at the start of step 6. Non-optional.**
|
|
21
|
+
|
|
22
|
+
## Path Precedence
|
|
23
|
+
|
|
24
|
+
**Try Stitch MCP first.** Static HTML is the fallback, not a parallel option.
|
|
25
|
+
|
|
26
|
+
1. Check if Stitch MCP tools are registered (look for `stitch_*` tools on the `user-ghl-ai` server).
|
|
27
|
+
2. If yes → run the Stitch path. Do not fall back unless a call actually fails.
|
|
28
|
+
3. A **timeout is not a failure** — the default client timeout (~70s) is shorter than Stitch's typical generation time. Poll via `stitch_list-screens` + `stitch_get-screen` before giving up. See timeout handling in `references/stitch-workflow.md`.
|
|
29
|
+
4. If polling confirms the call truly failed (error, auth, quota exhausted, or no screen after 3 min + Flash retry) → document the reason and fall back to hand-written HTML for that screen.
|
|
30
|
+
5. If Stitch tools are missing entirely → skip Stitch and write HTML directly.
|
|
31
|
+
6. If the user explicitly says "offline" or "static HTML only" → skip Stitch and write HTML directly.
|
|
32
|
+
|
|
33
|
+
Never silently pick the HTML path just because it's faster. The user asked for design; Stitch produces better output.
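
The up-front part of this precedence (steps 1, 5, and 6) can be sketched as one decision function. The function name and the `"stitch"` / `"html"` return labels are hypothetical; the runtime fallback in steps 3 and 4 is a separate concern and is not covered here.

```python
from typing import Iterable


def choose_design_path(registered_tools: Iterable[str], user_wants_static: bool = False) -> str:
    """Pick the starting path: Stitch whenever its tools exist, HTML otherwise."""
    if user_wants_static:
        # The user explicitly said "offline" or "static HTML only".
        return "html"
    if any(name.startswith("stitch_") for name in registered_tools):
        return "stitch"
    # Stitch tools are missing entirely.
    return "html"
```

Note that speed is deliberately not an input: nothing in the signature lets the caller prefer HTML because it is faster.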
|
|
34
|
+
|
|
35
|
+
## Design Thinking
|
|
36
|
+
|
|
37
|
+
Before generating anything, commit to a clear direction:
|
|
38
|
+
|
|
39
|
+
- **Purpose** — What problem does this screen solve? Who's looking at it and what do they need?
|
|
40
|
+
- **Hierarchy** — What's the single most important thing on the page? Build everything around it.
|
|
41
|
+
- **Restraint** — Premium SaaS is defined by what you leave out. Monochrome-first. One accent color. Let content breathe.
|
|
42
|
+
- **Craft** — Spacing, alignment, typography weight, and motion — the details that separate "looks AI-generated" from "looks designed."
|
|
43
|
+
|
|
44
|
+
The goal is intentional, polished, and cohesive — not flashy.
|
|
45
|
+
|
|
46
|
+
## Workflow
|
|
47
|
+
|
|
48
|
+
### 1. Understand what to design
|
|
49
|
+
|
|
50
|
+
Read the feature's `requirements.md` or `prd.md`. For each screen, identify:
|
|
51
|
+
|
|
52
|
+
- What it shows (purpose, key data, user actions)
|
|
53
|
+
- Which states it needs (default, empty, loading, error, modal)
|
|
54
|
+
- How screens connect (navigation flow)
|
|
55
|
+
|
|
56
|
+
Produce `.aw_docs/features/<slug>/designs/SCREEN_PLAN.md` listing every screen and the nav structure linking them. If scope is unclear, ask the user which pages and flows to cover before generating.
|
|
57
|
+
|
|
58
|
+
### 2. Prepare the prompt
|
|
59
|
+
|
|
60
|
+
Read `references/highrise-tokens.md` and `references/prompt-template.md`. Fill in the bracketed parts of the template for the first screen (screen type, layout, nav items, current page).
|
|
61
|
+
|
|
62
|
+
### 3. Generate screens
|
|
63
|
+
|
|
64
|
+
Read `references/stitch-workflow.md`. Follow the Stitch path. **A timeout is not a failure.** Stitch's 70s MCP client timeout regularly fires before generation finishes; the screen is still being built server-side.
|
|
65
|
+
|
|
66
|
+
**Polling protocol on timeout (mandatory, non-negotiable):**
|
|
67
|
+
|
|
68
|
+
1. Capture the `requestId` / note the timestamp.
|
|
69
|
+
2. Call `stitch_list-screens` every **30 seconds** for **6 polls** (total 3 minutes of wall time).
|
|
70
|
+
3. After each poll, check whether a new screen matching the request has appeared — if yes, call `stitch_get-screen` and continue as normal.
|
|
71
|
+
4. If all 6 polls complete with no new screen → retry **once** with `model: "GEMINI_3_FLASH"` and re-apply the polling protocol.
|
|
72
|
+
5. If the Flash retry also produces nothing → document the screen as a Stitch failure in `SCREEN_PLAN.md` and fall back to hand-written HTML **for that screen only**.
|
|
73
|
+
|
|
74
|
+
Do not write HTML on the first timeout. Do not skip polls. Do not treat a single 70s wait as "Stitch failed."
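
The polling protocol above can be sketched as a loop with injected callables, so it stays independent of any particular MCP client. Everything here is an assumption for illustration: `generate` stands in for the Stitch generation call and returns a screen id or `None` on timeout, and `find_new_screen` stands in for a `stitch_list-screens` check that returns a matching screen id or `None`.

```python
import time
from typing import Callable, Optional, Tuple

POLL_INTERVAL_SECONDS = 30
MAX_POLLS = 6  # 6 polls x 30 s = 3 minutes of wall time
RETRY_MODEL = "GEMINI_3_FLASH"


def poll_for_screen(
    find_new_screen: Callable[[], Optional[str]],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Wait, then check for the new screen, up to MAX_POLLS times."""
    for _ in range(MAX_POLLS):
        sleep(POLL_INTERVAL_SECONDS)
        screen_id = find_new_screen()
        if screen_id is not None:
            return screen_id
    return None


def generate_screen(
    generate: Callable[[Optional[str]], Optional[str]],
    find_new_screen: Callable[[], Optional[str]],
    first_model: Optional[str] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Optional[str]]:
    """One normal attempt, one Flash retry, then HTML fallback for this screen."""
    for model in (first_model, RETRY_MODEL):
        screen_id = generate(model)
        if screen_id is None:  # timeout: the screen may still be building
            screen_id = poll_for_screen(find_new_screen, sleep)
        if screen_id is not None:
            return ("stitch", screen_id)
    return ("html_fallback", None)
```

Passing `sleep` as a parameter keeps the loop testable without waiting three real minutes.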
|
|
75
|
+
|
|
76
|
+
If Stitch is genuinely unavailable (tools not registered, or `stitch_list-screens` errors), read `references/micro-interactions.md` and write self-contained HTML files using the same prompt template as the spec.
|
|
77
|
+
|
|
78
|
+
### 4. Generate state variants
|
|
79
|
+
|
|
80
|
+
For every screen, produce default + loading + empty + error variants (+ modal if applicable). Use the state variant prompt additions at the bottom of `references/prompt-template.md`. Keep them on `GEMINI_3_FLASH` (the default) — they're simple derivations of an existing screen.
|
|
81
|
+
|
|
82
|
+
### 5. Build the index page (mandatory)
|
|
83
|
+
|
|
84
|
+
Write `.aw_docs/features/<slug>/designs/index.html` — the map of the entire feature. This is the stakeholder entry point. See the "Index page requirements" section in `references/quality-checklist.md` for what it must contain.
|
|
85
|
+
|
|
86
|
+
### 6. Self-review & iterate (until production-ready)
|
|
87
|
+
|
|
88
|
+
The skill is not done when generation finishes; it is done when it has **proven the output is production-ready**. Read `references/self-review.md` and run both tracks.
|
|
89
|
+
|
|
90
|
+
**This step is mandatory and non-interactive.** Do not ask the user "should I run the review?" Do not ask "do you want me to iterate?" Do not stop after iteration 1 to check in. The contract is: run the loop until zero findings, or until 3 iterations have elapsed with fixes-on-disk in at least 2 of them. That happens silently — the user sees the result, not the permission prompt. Asking for permission to run this step is a protocol violation.
|
|
91
|
+
|
|
92
|
+
- **Track A (deterministic):** regex-level sweep for rogue hex, missing states, missing media queries, missing focus rings, off-scale typography (including `font:` shorthand), sidebar restraint, index completeness, and placeholder data. Always runs, every iteration.
|
|
93
|
+
- **Track B (visual):** navigate each screen via a browser MCP, `browser_resize` through 320 / 768 / 1024 / 1440, toggle dark mode, run cross-screen consistency spot-checks, read `browser_console_messages`. Use **Playwright MCP** (portable across Codex, Claude, and Cursor) or Cursor's `cursor-ide-browser` when running inside Cursor. Both expose the same `browser_*` tool surface. If `file://` URLs are rejected by the MCP, spin up `python3 -m http.server 8765 --directory <designs/>` in the background and use `http://127.0.0.1:8765/...` — never accept the rejection as a reason to skip Track B. Always runs every iteration. Only skipped if neither MCP is registered at all, in which case mark Track B SKIPPED in `REVIEW.md` and downgrade the status.
|
|
94
|
+
- **Never skip Track B just because Track A had findings.** Both tracks run every iteration.
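
For the `file://` workaround in Track B, the same local server can also be started from Python's standard library instead of the shell. This is a minimal sketch with a hypothetical helper name; it binds to `127.0.0.1` only and serves the given directory on a background thread.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_designs(directory: str, port: int = 8765) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP on localhost in a daemon thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `server.shutdown()` and `server.server_close()` once the visual sweep is finished; passing `port=0` lets the OS pick a free port, which is then available as `server.server_address[1]`.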
|
|
95
|
+
|
|
96
|
+
Categorize every finding using the fix-method table in `references/self-review.md`. **Prefer 0-cost direct edits over Stitch regeneration** — never burn quota to fix a rogue hex.
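
One Track A check, the rogue-hex sweep, might look roughly like the sketch below. The real rules live in `references/self-review.md`, which is not shown here, so the regular expression, the function name, and the idea of comparing against a caller-supplied allow-list of token colors are all assumptions.

```python
import re
from typing import Iterable, List

# 3-, 4-, 6-, or 8-digit hex colors; the lookbehind skips entities like &#8212;
HEX_COLOR = re.compile(r"(?<!&)#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b")


def find_rogue_hex(html: str, allowed: Iterable[str]) -> List[str]:
    """Return hex colors present in `html` that are not in the allow-list."""
    allowed_normalized = {color.lower() for color in allowed}
    found = {match.group(0).lower() for match in HEX_COLOR.finditer(html)}
    return sorted(found - allowed_normalized)
```

A sweep this simple can also flag fragment links such as `href="#abc"`, so findings are best treated as candidates to inspect rather than verdicts.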
|
|
97
|
+
|
|
98
|
+
Loop up to **3 iterations**. Stop conditions and resulting REVIEW.md status are defined in `references/self-review.md`. Summary:
|
|
99
|
+
|
|
100
|
+
- Zero findings → ✅ Production-ready
|
|
101
|
+
- 3 iterations run, findings decreasing, fixes applied in ≥2 iterations → ⚠️ Shipped with known issues
|
|
102
|
+
- Findings stopped decreasing, or a BLOCKER surfaced, or only 1 iteration ran with findings remaining → ❌ Blocked
|
|
103
|
+
|
|
104
|
+
Each iteration must produce its own evidence section in `designs/REVIEW.md` (exact command strings + output counts for Track A, capture-matrix ratio for Track B, fixes-applied list). A status of ⚠️ without two `## Fixes applied` sections on disk is invalid — downgrade to ❌ instead of lying.
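
The status summary, including the downgrade rule for an unsupported ⚠️, can be sketched as a pure function. The authoritative table is in `references/self-review.md`; the function name and inputs here are hypothetical, and "findings decreasing" is interpreted as strictly decreasing between consecutive iterations.

```python
from typing import Sequence

READY = "✅"
KNOWN_ISSUES = "⚠️"
BLOCKED = "❌"
MAX_ITERATIONS = 3


def review_status(
    findings_per_iteration: Sequence[int],
    iterations_with_fixes: int,
    blocker: bool = False,
) -> str:
    """Map the review loop's history to a REVIEW.md status."""
    if not findings_per_iteration:
        raise ValueError("at least one review iteration must have run")
    if blocker:
        return BLOCKED
    if findings_per_iteration[-1] == 0:
        return READY
    decreasing = all(
        later < earlier
        for earlier, later in zip(findings_per_iteration, findings_per_iteration[1:])
    )
    ran_all = len(findings_per_iteration) >= MAX_ITERATIONS
    # A warning status without fixes in at least two iterations is invalid.
    if ran_all and decreasing and iterations_with_fixes >= 2:
        return KNOWN_ISSUES
    return BLOCKED
```

Keeping the rule in one function makes the "downgrade instead of lying" clause hard to bypass: there is no path to ⚠️ without the fix count.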
|
|
105
|
+
|
|
106
|
+
Do not proceed to step 7 until `REVIEW.md` shows ✅, or until you've exhausted 3 iterations and explicitly documented the remaining blockers and their severity.
|
|
107
|
+
|
|
108
|
+
### 7. Present to user
|
|
109
|
+
|
|
110
|
+
Share `designs/index.html` as the entry point and `designs/REVIEW.md` as the audit trail. If `REVIEW.md` status is ⚠️ or ❌, **call it out explicitly** — don't bury the known issues. Take user feedback here; any change request re-enters step 6 before re-presenting.
|
|
111
|
+
|
|
112
|
+
For user-driven revisions: use `stitch_edit-screens` for targeted fixes, `stitch_generate-variants` for alternatives, and only regenerate for major rethinks. See the iteration table in `references/stitch-workflow.md`.
|
|
113
|
+
|
|
114
|
+
### 8. Document the design
|
|
115
|
+
|
|
116
|
+
Write `.aw_docs/features/<slug>/design.md` **at the feature root, not inside `designs/`**:
|
|
117
|
+
|
|
118
|
+
- Screen-by-screen walkthrough (what each screen does, how users navigate between them)
|
|
119
|
+
- Component inventory (which existing HL components to reuse vs. what's new)
|
|
120
|
+
- Key design decisions and rationale
|
|
121
|
+
- Link to `designs/index.html` as the entry point
|
|
122
|
+
- Link to `designs/SCREEN_PLAN.md` for the flow map
|
|
123
|
+
- Link to `designs/REVIEW.md` for the self-review audit trail
|
|
124
|
+
|
|
125
|
+
## Output Structure
|
|
126
|
+
|
|
127
|
+
```
|
|
128
|
+
.aw_docs/features/<slug>/
|
|
129
|
+
├── requirements.md (from earlier phase)
|
|
130
|
+
├── prd.md (from earlier phase)
|
|
131
|
+
├── design.md ← design decisions, component inventory
|
|
132
|
+
└── designs/
|
|
133
|
+
├── index.html ← entry point: links to every screen + state
|
|
134
|
+
├── SCREEN_PLAN.md ← flow map + nav structure
|
|
135
|
+
├── REVIEW.md ← self-review audit trail (step 6 output)
|
|
136
|
+
├── <screen-1>/
|
|
137
|
+
│ ├── default.html
|
|
138
|
+
│ ├── empty.html
|
|
139
|
+
│ ├── loading.html
|
|
140
|
+
│ ├── error.html
|
|
141
|
+
│ └── modal-<name>.html
|
|
142
|
+
├── <screen-2>/
|
|
143
|
+
│ └── ...
|
|
144
|
+
└── screenshots/
|
|
145
|
+
├── <screen>-desktop-light.png (from stitch_get-screen if used)
|
|
146
|
+
└── <screen>-<state>-<width>.png (from step 6 visual sweep)
|
|
147
|
+
```
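
The layout above lends itself to a quick completeness check for the four state files every screen needs. A minimal sketch, assuming every subdirectory of `designs/` other than `screenshots/` is a screen; the function name is hypothetical, and modal files are left out because they apply only to some screens.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_STATES = ("default", "empty", "loading", "error")
NON_SCREEN_DIRS = {"screenshots"}


def missing_state_files(designs_dir: str) -> Dict[str, List[str]]:
    """Map each screen directory to the required state files it lacks."""
    missing: Dict[str, List[str]] = {}
    for screen_dir in sorted(Path(designs_dir).iterdir()):
        if not screen_dir.is_dir() or screen_dir.name in NON_SCREEN_DIRS:
            continue
        absent = [
            f"{state}.html"
            for state in REQUIRED_STATES
            if not (screen_dir / f"{state}.html").is_file()
        ]
        if absent:
            missing[screen_dir.name] = absent
    return missing
```

An empty result means every screen directory carries all four variants; it says nothing about the quality of their contents.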
|
|
148
|
+
|
|
149
|
+
## Platform Skills to Reference
|
|
150
|
+
|
|
151
|
+
These exist in the platform registry and contain deeper guidance — reference specific sections when relevant, don't re-read them whole:
|
|
152
|
+
|
|
153
|
+
- **`platform-design:md`** — anti-pattern catalog (cheap vs premium patterns), DESIGN.md output format, dark mode token mapping, responsive breakpoint table
|
|
154
|
+
- **`platform-design:stitch-screen-generation`** — multi-select theme consistency, competitive benchmarking, Ralph Loop iteration
|
|
155
|
+
- **`platform-design:pixel-fidelity-review`** — 6-layer audit, scoring rubric, Highrise-specific CSS override gotchas (use when implementation review is needed)
|
|
156
|
+
- **`platform-design:system`** — HL component catalog, design token CSS properties, WCAG 2.1 AA rules
|