npm - aw-ecc - Versions diffs - 1.4.31 → 1.4.47 - Mend

aw-ecc 1.4.31 → 1.4.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (259)

package/.claude-plugin/plugin.json +1 -1
package/.codex/hooks/aw-post-tool-use.sh +8 -2
package/.codex/hooks/aw-session-start.sh +11 -4
package/.codex/hooks/aw-stop.sh +8 -2
package/.codex/hooks/aw-user-prompt-submit.sh +10 -2
package/.codex/hooks.json +8 -8
package/.cursor/INSTALL.md +7 -5
package/.cursor/hooks/adapter.js +41 -4
package/.cursor/hooks/after-agent-response.js +62 -0
package/.cursor/hooks/before-submit-prompt.js +7 -1
package/.cursor/hooks/post-tool-use-failure.js +21 -0
package/.cursor/hooks/post-tool-use.js +39 -0
package/.cursor/hooks/shared/aw-phase-definitions.js +53 -0
package/.cursor/hooks/shared/aw-phase-runner.js +3 -1
package/.cursor/hooks/subagent-start.js +22 -4
package/.cursor/hooks/subagent-stop.js +18 -1
package/.cursor/hooks.json +23 -2
package/.opencode/package.json +1 -1
package/AGENTS.md +3 -3
package/README.md +5 -5
package/commands/adk.md +52 -0
package/commands/build.md +22 -9
package/commands/deploy.md +12 -0
package/commands/execute.md +9 -0
package/commands/feature.md +333 -0
package/commands/investigate.md +18 -5
package/commands/plan.md +23 -9
package/commands/publish.md +65 -0
package/commands/review.md +12 -0
package/commands/ship.md +12 -0
package/commands/test.md +12 -0
package/commands/verify.md +9 -0
package/hooks/hooks.json +36 -0
package/manifests/install-components.json +8 -0
package/manifests/install-modules.json +83 -0
package/manifests/install-profiles.json +7 -0
package/package.json +1 -1
package/scripts/ci/validate-rules.js +51 -0
package/scripts/cursor-aw-home/hooks.json +23 -2
package/scripts/cursor-aw-hooks/adapter.js +41 -4
package/scripts/cursor-aw-hooks/before-submit-prompt.js +7 -1
package/scripts/hooks/aw-usage-commit-created.js +32 -0
package/scripts/hooks/aw-usage-post-tool-use-failure.js +56 -0
package/scripts/hooks/aw-usage-post-tool-use.js +242 -0
package/scripts/hooks/aw-usage-prompt-submit.js +112 -0
package/scripts/hooks/aw-usage-session-start.js +48 -0
package/scripts/hooks/aw-usage-stop.js +182 -0
package/scripts/hooks/aw-usage-telemetry-send.js +84 -0
package/scripts/hooks/cost-tracker.js +3 -23
package/scripts/hooks/shared/aw-phase-definitions.js +53 -0
package/scripts/hooks/shared/aw-phase-runner.js +3 -1
package/scripts/lib/aw-hook-contract.js +2 -2
package/scripts/lib/aw-pricing.js +306 -0
package/scripts/lib/aw-usage-telemetry.js +472 -0
package/scripts/lib/codex-hook-config.js +8 -8
package/scripts/lib/cursor-hook-config.js +25 -10
package/scripts/lib/install-targets/codex-home.js +7 -0
package/scripts/lib/install-targets/cursor-project.js +3 -0
package/scripts/lib/install-targets/helpers.js +20 -3
package/skills/aw-adk/SKILL.md +317 -0
package/skills/aw-adk/agents/analyzer.md +113 -0
package/skills/aw-adk/agents/comparator.md +113 -0
package/skills/aw-adk/agents/grader.md +115 -0
package/skills/aw-adk/assets/eval_review.html +76 -0
package/skills/aw-adk/eval-viewer/generate_review.py +164 -0
package/skills/aw-adk/eval-viewer/viewer.html +181 -0
package/skills/aw-adk/evals/eval-colocated-placement.md +84 -0
package/skills/aw-adk/evals/eval-create-agent.md +90 -0
package/skills/aw-adk/evals/eval-create-command.md +98 -0
package/skills/aw-adk/evals/eval-create-eval.md +89 -0
package/skills/aw-adk/evals/eval-create-rule.md +99 -0
package/skills/aw-adk/evals/eval-create-skill.md +97 -0
package/skills/aw-adk/evals/eval-delete-agent.md +79 -0
package/skills/aw-adk/evals/eval-delete-command.md +89 -0
package/skills/aw-adk/evals/eval-delete-rule.md +86 -0
package/skills/aw-adk/evals/eval-delete-skill.md +90 -0
package/skills/aw-adk/evals/eval-meta-eval-coverage.md +78 -0
package/skills/aw-adk/evals/eval-meta-eval-determinism.md +81 -0
package/skills/aw-adk/evals/eval-meta-eval-false-pass.md +81 -0
package/skills/aw-adk/evals/eval-score-accuracy.md +95 -0
package/skills/aw-adk/evals/eval-type-redirect.md +68 -0
package/skills/aw-adk/evals/evals.json +96 -0
package/skills/aw-adk/references/artifact-wiring.md +162 -0
package/skills/aw-adk/references/cross-ide-mapping.md +71 -0
package/skills/aw-adk/references/eval-placement-guide.md +183 -0
package/skills/aw-adk/references/external-resources.md +75 -0
package/skills/aw-adk/references/getting-started.md +66 -0
package/skills/aw-adk/references/registry-structure.md +152 -0
package/skills/aw-adk/references/rubric-agent.md +36 -0
package/skills/aw-adk/references/rubric-command.md +36 -0
package/skills/aw-adk/references/rubric-eval.md +36 -0
package/skills/aw-adk/references/rubric-meta-eval.md +132 -0
package/skills/aw-adk/references/rubric-rule.md +36 -0
package/skills/aw-adk/references/rubric-skill.md +36 -0
package/skills/aw-adk/references/schemas.md +222 -0
package/skills/aw-adk/references/template-agent.md +251 -0
package/skills/aw-adk/references/template-command.md +279 -0
package/skills/aw-adk/references/template-eval.md +176 -0
package/skills/aw-adk/references/template-rule.md +119 -0
package/skills/aw-adk/references/template-skill.md +123 -0
package/skills/aw-adk/references/type-classifier.md +98 -0
package/skills/aw-adk/references/writing-good-agents.md +227 -0
package/skills/aw-adk/references/writing-good-commands.md +258 -0
package/skills/aw-adk/references/writing-good-evals.md +271 -0
package/skills/aw-adk/references/writing-good-rules.md +214 -0
package/skills/aw-adk/references/writing-good-skills.md +159 -0
package/skills/aw-adk/scripts/aggregate-benchmark.py +190 -0
package/skills/aw-adk/scripts/lint-artifact.sh +211 -0
package/skills/aw-adk/scripts/score-artifact.sh +179 -0
package/skills/aw-adk/scripts/trigger-eval.py +192 -0
package/skills/aw-build/SKILL.md +19 -2
package/skills/aw-deploy/SKILL.md +65 -3
package/skills/aw-design/SKILL.md +156 -0
package/skills/aw-design/references/highrise-tokens.md +394 -0
package/skills/aw-design/references/micro-interactions.md +76 -0
package/skills/aw-design/references/prompt-template.md +160 -0
package/skills/aw-design/references/quality-checklist.md +70 -0
package/skills/aw-design/references/self-review.md +497 -0
package/skills/aw-design/references/stitch-workflow.md +127 -0
package/skills/aw-feature/SKILL.md +293 -0
package/skills/aw-investigate/SKILL.md +17 -0
package/skills/aw-plan/SKILL.md +34 -3
package/skills/aw-publish/SKILL.md +300 -0
package/skills/aw-publish/evals/eval-confirmation-gate.md +60 -0
package/skills/aw-publish/evals/eval-intent-detection.md +111 -0
package/skills/aw-publish/evals/eval-push-modes.md +67 -0
package/skills/aw-publish/evals/eval-rules-push.md +60 -0
package/skills/aw-publish/evals/evals.json +29 -0
package/skills/aw-publish/references/push-modes.md +38 -0
package/skills/aw-review/SKILL.md +88 -9
package/skills/aw-rules-review/SKILL.md +124 -0
package/skills/aw-rules-review/agents/openai.yaml +3 -0
package/skills/aw-rules-review/scripts/generate-review-template.mjs +323 -0
package/skills/aw-ship/SKILL.md +16 -0
package/skills/aw-spec/SKILL.md +15 -0
package/skills/aw-tasks/SKILL.md +15 -0
package/skills/aw-test/SKILL.md +16 -0
package/skills/aw-yolo/SKILL.md +4 -0
package/skills/diagnose/SKILL.md +121 -0
package/skills/diagnose/scripts/hitl-loop.template.sh +41 -0
package/skills/finish-only-when-green/SKILL.md +265 -0
package/skills/grill-me/SKILL.md +24 -0
package/skills/grill-with-docs/SKILL.md +92 -0
package/skills/grill-with-docs/adr-format.md +47 -0
package/skills/grill-with-docs/context-format.md +67 -0
package/skills/improve-codebase-architecture/SKILL.md +75 -0
package/skills/improve-codebase-architecture/deepening.md +37 -0
package/skills/improve-codebase-architecture/interface-design.md +44 -0
package/skills/improve-codebase-architecture/language.md +53 -0
package/skills/local-ghl-setup-from-screenshot/SKILL.md +538 -0
package/skills/tdd/SKILL.md +115 -0
package/skills/tdd/deep-modules.md +33 -0
package/skills/tdd/interface-design.md +31 -0
package/skills/tdd/mocking.md +59 -0
package/skills/tdd/refactoring.md +10 -0
package/skills/tdd/tests.md +61 -0
package/skills/to-issues/SKILL.md +62 -0
package/skills/to-prd/SKILL.md +75 -0
package/skills/using-aw-skills/SKILL.md +170 -237
package/skills/using-aw-skills/hooks/session-start.sh +11 -41
package/skills/zoom-out/SKILL.md +24 -0
package/.cursor/rules/common-agents.md +0 -53
package/.cursor/rules/common-aw-routing.md +0 -43
package/.cursor/rules/common-coding-style.md +0 -52
package/.cursor/rules/common-development-workflow.md +0 -33
package/.cursor/rules/common-git-workflow.md +0 -28
package/.cursor/rules/common-hooks.md +0 -34
package/.cursor/rules/common-patterns.md +0 -35
package/.cursor/rules/common-performance.md +0 -59
package/.cursor/rules/common-security.md +0 -33
package/.cursor/rules/common-testing.md +0 -33
package/.cursor/skills/api-and-interface-design/SKILL.md +0 -75
package/.cursor/skills/article-writing/SKILL.md +0 -85
package/.cursor/skills/aw-brainstorm/SKILL.md +0 -115
package/.cursor/skills/aw-build/SKILL.md +0 -152
package/.cursor/skills/aw-build/evals/build-stage-cases.json +0 -28
package/.cursor/skills/aw-debug/SKILL.md +0 -49
package/.cursor/skills/aw-deploy/SKILL.md +0 -101
package/.cursor/skills/aw-deploy/evals/deploy-stage-cases.json +0 -32
package/.cursor/skills/aw-execute/SKILL.md +0 -47
package/.cursor/skills/aw-execute/references/mode-code.md +0 -47
package/.cursor/skills/aw-execute/references/mode-docs.md +0 -28
package/.cursor/skills/aw-execute/references/mode-infra.md +0 -44
package/.cursor/skills/aw-execute/references/mode-migration.md +0 -58
package/.cursor/skills/aw-execute/references/worker-implementer.md +0 -26
package/.cursor/skills/aw-execute/references/worker-parallel-worker.md +0 -23
package/.cursor/skills/aw-execute/references/worker-quality-reviewer.md +0 -23
package/.cursor/skills/aw-execute/references/worker-spec-reviewer.md +0 -23
package/.cursor/skills/aw-execute/scripts/build-worker-bundle.js +0 -229
package/.cursor/skills/aw-finish/SKILL.md +0 -111
package/.cursor/skills/aw-investigate/SKILL.md +0 -109
package/.cursor/skills/aw-plan/SKILL.md +0 -368
package/.cursor/skills/aw-prepare/SKILL.md +0 -118
package/.cursor/skills/aw-review/SKILL.md +0 -118
package/.cursor/skills/aw-ship/SKILL.md +0 -115
package/.cursor/skills/aw-spec/SKILL.md +0 -104
package/.cursor/skills/aw-tasks/SKILL.md +0 -138
package/.cursor/skills/aw-test/SKILL.md +0 -118
package/.cursor/skills/aw-verify/SKILL.md +0 -51
package/.cursor/skills/aw-yolo/SKILL.md +0 -111
package/.cursor/skills/browser-testing-with-devtools/SKILL.md +0 -81
package/.cursor/skills/bun-runtime/SKILL.md +0 -84
package/.cursor/skills/ci-cd-and-automation/SKILL.md +0 -71
package/.cursor/skills/code-simplification/SKILL.md +0 -74
package/.cursor/skills/content-engine/SKILL.md +0 -88
package/.cursor/skills/context-engineering/SKILL.md +0 -74
package/.cursor/skills/deprecation-and-migration/SKILL.md +0 -75
package/.cursor/skills/documentation-and-adrs/SKILL.md +0 -75
package/.cursor/skills/documentation-lookup/SKILL.md +0 -90
package/.cursor/skills/frontend-slides/SKILL.md +0 -184
package/.cursor/skills/frontend-slides/STYLE_PRESETS.md +0 -330
package/.cursor/skills/frontend-ui-engineering/SKILL.md +0 -68
package/.cursor/skills/git-workflow-and-versioning/SKILL.md +0 -75
package/.cursor/skills/idea-refine/SKILL.md +0 -84
package/.cursor/skills/incremental-implementation/SKILL.md +0 -75
package/.cursor/skills/investor-materials/SKILL.md +0 -96
package/.cursor/skills/investor-outreach/SKILL.md +0 -76
package/.cursor/skills/market-research/SKILL.md +0 -75
package/.cursor/skills/mcp-server-patterns/SKILL.md +0 -67
package/.cursor/skills/nextjs-turbopack/SKILL.md +0 -44
package/.cursor/skills/performance-optimization/SKILL.md +0 -77
package/.cursor/skills/security-and-hardening/SKILL.md +0 -70
package/.cursor/skills/using-aw-skills/SKILL.md +0 -290
package/.cursor/skills/using-aw-skills/evals/skill-trigger-cases.tsv +0 -25
package/.cursor/skills/using-aw-skills/evals/test-skill-triggers.sh +0 -171
package/.cursor/skills/using-aw-skills/hooks/hooks.json +0 -9
package/.cursor/skills/using-aw-skills/hooks/session-start.sh +0 -67
package/.cursor/skills/using-platform-skills/SKILL.md +0 -163
package/.cursor/skills/using-platform-skills/evals/platform-selection-cases.json +0 -52
/package/.cursor/rules/{golang-coding-style.md → golang-coding-style.mdc} +0 -0
/package/.cursor/rules/{golang-hooks.md → golang-hooks.mdc} +0 -0
/package/.cursor/rules/{golang-patterns.md → golang-patterns.mdc} +0 -0
/package/.cursor/rules/{golang-security.md → golang-security.mdc} +0 -0
/package/.cursor/rules/{golang-testing.md → golang-testing.mdc} +0 -0
/package/.cursor/rules/{kotlin-coding-style.md → kotlin-coding-style.mdc} +0 -0
/package/.cursor/rules/{kotlin-hooks.md → kotlin-hooks.mdc} +0 -0
/package/.cursor/rules/{kotlin-patterns.md → kotlin-patterns.mdc} +0 -0
/package/.cursor/rules/{kotlin-security.md → kotlin-security.mdc} +0 -0
/package/.cursor/rules/{kotlin-testing.md → kotlin-testing.mdc} +0 -0
/package/.cursor/rules/{php-coding-style.md → php-coding-style.mdc} +0 -0
/package/.cursor/rules/{php-hooks.md → php-hooks.mdc} +0 -0
/package/.cursor/rules/{php-patterns.md → php-patterns.mdc} +0 -0
/package/.cursor/rules/{php-security.md → php-security.mdc} +0 -0
/package/.cursor/rules/{php-testing.md → php-testing.mdc} +0 -0
/package/.cursor/rules/{python-coding-style.md → python-coding-style.mdc} +0 -0
/package/.cursor/rules/{python-hooks.md → python-hooks.mdc} +0 -0
/package/.cursor/rules/{python-patterns.md → python-patterns.mdc} +0 -0
/package/.cursor/rules/{python-security.md → python-security.mdc} +0 -0
/package/.cursor/rules/{python-testing.md → python-testing.mdc} +0 -0
/package/.cursor/rules/{swift-coding-style.md → swift-coding-style.mdc} +0 -0
/package/.cursor/rules/{swift-hooks.md → swift-hooks.mdc} +0 -0
/package/.cursor/rules/{swift-patterns.md → swift-patterns.mdc} +0 -0
/package/.cursor/rules/{swift-security.md → swift-security.mdc} +0 -0
/package/.cursor/rules/{swift-testing.md → swift-testing.mdc} +0 -0
/package/.cursor/rules/{typescript-coding-style.md → typescript-coding-style.mdc} +0 -0
/package/.cursor/rules/{typescript-hooks.md → typescript-hooks.mdc} +0 -0
/package/.cursor/rules/{typescript-patterns.md → typescript-patterns.mdc} +0 -0
/package/.cursor/rules/{typescript-security.md → typescript-security.mdc} +0 -0
/package/.cursor/rules/{typescript-testing.md → typescript-testing.mdc} +0 -0

package/skills/aw-adk/references/template-command.md ADDED

@@ -0,0 +1,279 @@
+# Command Template
+Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
+---
+## Scaffold
+````markdown
+---
+name: <namespace>:<command-slug>
+description: "<1-2 sentences. What workflow this automates and when to use it.>"
+argument-hint: "[target] [--flag]"
+mcp: []
+---
+# <Command Display Name>
+<1-2 sentence purpose. What end-to-end workflow does this command automate?>
+## Protocol
+> **AW-PROTOCOL**: This command follows the AW orchestration protocol.
+> All phases execute sequentially. Each phase has defined inputs, outputs,
+> and checkpoints. Failure at any checkpoint triggers the on-failure handler
+> before proceeding.
+### Skill Loading Gate
+> **BLOCKING**: Before executing ANY phase, resolve and load the following skills.
+> Do not proceed until all skills are confirmed loaded.
+| Skill | Purpose | Required |
+|-------|---------|----------|
+| `<namespace>-<skill-1>` | <what it provides> | Yes |
+| `<namespace>-<skill-2>` | <what it provides> | Yes |
+| `<namespace>-<skill-3>` | <what it provides> | No |
+```
+Resolve: <skill-1>, <skill-2>
+Confirm: all loaded
+Proceed: Phase 0
+```
+## Core Principles
+1. **<Principle 1>** — <Why this principle matters for this workflow>
+2. **<Principle 2>** — <Why this principle matters>
+3. **<Principle 3>** — <Why this principle matters>
+## Agent Roster
+| Agent | Role | Phase(s) | Model |
+|-------|------|----------|-------|
+| `<agent-1>` | <what it does> | <phase numbers> | sonnet |
+| `<agent-2>` | <what it does> | <phase numbers> | sonnet |
+| `<agent-3>` | <what it does> | <phase numbers> | haiku |
+## Phase 0: Initialize
+**Purpose:** Validate inputs, resolve paths, establish workspace.
+1. Parse arguments: `<expected arguments>`
+2. Validate target exists: `<validation command>`
+3. Create workspace directory:
+```bash
+mkdir -p <workspace-path>
+```
+4. Snapshot current state (for rollback):
+```bash
+<snapshot command>
+```
+**Output:** Validated inputs, workspace path, snapshot reference
+**Checkpoint:** All inputs valid, workspace exists, snapshot saved
+**On-failure:** Report missing inputs with usage example, exit
+---
+## Phase 1: <Phase Name>
+**Purpose:** <What this phase accomplishes and why it comes first>
+**Agent:** `<agent-name>`
+**Input:** <What this phase receives from Phase 0 or prior phase>
+### Steps
+1. <Step with concrete action>
+2. <Step with concrete action>
+3. <Step with concrete action>
+```bash
+# Example command for this phase
+<command>
+```
+**Output:** <Specific artifacts this phase produces>
+**Checkpoint:** <Verifiable criteria — how to confirm this phase succeeded>
+**On-failure:** <What to do if the checkpoint fails — retry, skip, escalate>
+---
+## Phase 2: <Phase Name>
+**Purpose:** <What this phase accomplishes>
+**Agent:** `<agent-name>`
+**Input:** <Output from Phase 1>
+### Steps
+1. <Step with concrete action>
+2. <Step with concrete action>
+**Output:** <Artifacts produced>
+**Checkpoint:** <Success criteria>
+**On-failure:** <Recovery strategy>
+---
+## Phase 3: <Phase Name>
+**Purpose:** <What this phase accomplishes>
+**Agent:** `<agent-name>`
+**Input:** <Output from Phase 2>
+### Steps
+1. <Step with concrete action>
+2. <Step with concrete action>
+**Output:** <Artifacts produced>
+**Checkpoint:** <Success criteria>
+**On-failure:** <Recovery strategy>
+---
+## Phase N: <Human Checkpoint> (optional)
+**Purpose:** Pause for human review before irreversible actions.
+**Input:** <Summary of all prior phase outputs>
+### Review Prompt
+```
+The following changes are ready for <action>:
+<summary of changes>
+Proceed? [y/n]
+```
+**On-approve:** Continue to next phase
+**On-reject:** <What to do — rollback, revise, or exit>
+---
+## Phase N+1: Deliver
+**Purpose:** Produce final deliverables and report results.
+### Steps
+1. Aggregate outputs from all phases
+2. Generate summary report
+3. Clean up workspace (if applicable)
+**Output:** Final deliverables (see table below)
+**Checkpoint:** All required deliverables exist and pass validation
+## Compound Learnings
+<Patterns discovered across phases that should be captured for future runs.
+This section is populated after the first few executions.>
+- <Learning 1 — e.g., "Phase 2 consistently takes 3x longer than Phase 1; consider parallelizing sub-steps">
+- <Learning 2 — e.g., "Agent X produces better output when given Phase 1 output as structured JSON, not prose">
+## Output Format
+```
+## <Command Name> Results
+**Status:** <COMPLETE | PARTIAL | FAILED>
+**Duration:** <time>
+**Phases completed:** <N/M>
+### Phase Summary
+| Phase | Status | Key Output |
+|-------|--------|------------|
+| 0: Init | PASS | Workspace at <path> |
+| 1: <name> | PASS | <output summary> |
+| 2: <name> | PASS | <output summary> |
+### Deliverables
+<list of produced artifacts with paths>
+### Issues
+<any failures, warnings, or items needing follow-up>
+```
+## Error Handling
+| Error | Phase | Recovery |
+|-------|-------|----------|
+| <error-type-1> | <phase> | <what to do> |
+| <error-type-2> | <phase> | <what to do> |
+| <error-type-3> | Any | <what to do> |
+| Unrecoverable failure | Any | Rollback to Phase 0 snapshot, report error |
+## References
+- [<skill-name>](../skills/<slug>/SKILL.md) — <what it provides>
+- [<reference-name>](references/<file>.md) — <what it covers>
+````
+---
+## Section-by-Section Guide
+### Frontmatter
+- `name` — follows naming convention: `aw:platform-<domain>-<slug>` for platform, `aw:<team>-<sub_team>-<slug>` for teams (or `aw:<team>-<sub_team>-<domain>-<slug>` when domain nesting is used), `aw:<slug>` for stage commands. All hyphens, no colons (except the `aw:` prefix). See [registry-structure.md](registry-structure.md) for the full naming table.
+- `description` — front-load the workflow being automated
+- `argument-hint` — shown in help text; keep it short
+- `mcp` — list of MCP servers this command requires (empty if none)
+### Protocol & Skill Loading Gate
+The AW-PROTOCOL reference signals that this is a managed pipeline. The skill loading gate is BLOCKING — the command must not execute any phase until all required skills are confirmed loaded. This prevents partial execution with missing context.
+### Core Principles
+Three to five principles that shape decision-making across all phases. These are not rules (those go in rules/); they are workflow-specific values. Example: "Prefer incremental rollout over big-bang deployment."
+### Agent Roster
+Declares all agents used across phases upfront. This lets the reader understand the full cast before diving into phases. Include the model tier — it affects cost and capability expectations.
+### Phase Structure
+Every phase follows the same contract:
+- **Purpose** — Why this phase exists (not what it does — the steps cover that)
+- **Agent** — Which agent executes this phase
+- **Input** — Explicit data dependency on prior phases
+- **Steps** — Concrete, numbered actions
+- **Output** — What this phase produces (consumed by later phases)
+- **Checkpoint** — Verifiable success criteria (binary: pass or fail)
+- **On-failure** — Recovery strategy (retry, skip, escalate, rollback)
+This structure makes commands debuggable. When Phase 3 fails, you check Phase 3's checkpoint and on-failure handler.
+### Human Checkpoints
+Insert before irreversible actions (deployments, data migrations, external API calls). The command pauses, presents a summary, and waits for approval. Include a clear on-reject path.
+### Compound Learnings
+Populated after real executions. This is where operational wisdom accumulates. Review after the first 5-10 runs and update the command based on observed patterns.
+### Error Handling Table
+Exhaustive mapping of known failure modes to recovery strategies. The "Any" phase row catches unexpected failures with a universal rollback strategy.
+## Anti-Patterns
+| Pattern | Problem | Fix |
+|---|---|---|
+| No checkpoints between phases | Cascading failures — Phase 3 fails because Phase 1 silently produced bad output | Add checkpoint to every phase |
+| Monolithic single-phase command | Not a command — it's a script. Commands orchestrate multiple agents through phases | Break into 3+ phases or make it a skill |
+| No skill loading gate | Agents execute without required context, producing shallow output | Add BLOCKING gate with required skills |
+| Phase depends on implicit state | Breaks when phases are re-run or reordered | Make all inputs explicit in the Input field |

package/skills/aw-adk/references/template-eval.md ADDED

@@ -0,0 +1,176 @@
+# Eval Template
+Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
+---
+## Scaffold
+````markdown
+---
+name: eval-<eval-slug>
+target: <parent-artifact-name>
+category: <functional | structural | behavioral | integration>
+difficulty: <basic | intermediate | advanced>
+---
+# Eval: <Eval Display Name>
+## Task
+<2-4 sentences describing what the model should do when given this eval's prompt.
+Be specific about the scenario, inputs, and expected workflow. This is the "user request"
+that the executor receives.>
+### Prompt
+```
+<The exact prompt to give the executor. This must be realistic — something a real user
+would actually ask. Include enough context for the executor to act without follow-up questions.>
+```
+## Context
+| Field | Value |
+|-------|-------|
+| **Namespace** | `<namespace where the parent artifact lives>` |
+| **Domain** | `<domain: backend, frontend, data, infra, etc.>` |
+| **Target artifact** | `<path to the artifact being tested>` |
+| **Target type** | `<command \| agent \| skill \| rule \| eval>` |
+| **Related work** | `<links to related artifacts, PRs, or docs>` |
+## Expected Outcomes
+The executor's output must satisfy ALL of the following:
+- [ ] <Outcome 1 — specific, verifiable assertion about the output>
+- [ ] <Outcome 2 — structural check: "file exists at X", "section Y is present">
+- [ ] <Outcome 3 — content check: "contains at least N items", "references skill Z">
+- [ ] <Outcome 4 — quality check: "examples are concrete, not placeholder">
+- [ ] <Outcome 5 — negative check: "does NOT contain X" or "does NOT skip Y">
+### Assertion Quality Criteria
+Each assertion above must be:
+- **Verifiable** — A grader can determine pass/fail from the output alone
+- **Discriminating** — A clearly wrong output would fail this assertion
+- **Stable** — Minor formatting changes don't cause false failures
+## Grading Criteria
+### PASS (all conditions met)
+- All expected outcomes checked
+- Output is production-ready (not placeholder/stub content)
+- No critical errors in execution
+### PARTIAL (some conditions met)
+- <N>+ of <M> expected outcomes met
+- Output has correct structure but thin content
+- OR output has rich content but wrong structure
+### FAIL (below threshold)
+- Fewer than <N> expected outcomes met
+- Output is structurally wrong (missing required sections, wrong artifact type)
+- OR executor failed to complete the task
+## Evaluation Method
+**Type:** <deterministic | model-based | hybrid>
+### Deterministic Checks
+<Checks that can be performed by a script — file existence, section headers,
+frontmatter fields, naming patterns.>
+```bash
+# Example: verify file exists and has required sections
+test -f "<expected-path>" || echo "FAIL: file not found"
+grep -q "## Core Mission" "<expected-path>" || echo "FAIL: missing Core Mission"
+```
+### Model-Based Checks
+<Checks that require judgment — content quality, example relevance, reasoning depth.
+These are evaluated by the grader agent.>
+- Does the output explain WHY, not just WHAT?
+- Are examples concrete and domain-specific (not generic foo/bar)?
+- Would a domain expert find the content useful?
+## Variants (optional)
+<Alternative scenarios that test the same artifact from different angles.>
+| Variant | Difference | Tests |
+|---------|------------|-------|
+| `eval-<slug>-minimal` | Minimal input, no context | Handles missing info gracefully |
+| `eval-<slug>-complex` | Multi-step request with constraints | Handles complexity without losing accuracy |
+| `eval-<slug>-adversarial` | Intentionally ambiguous or misleading input | Doesn't hallucinate or guess |
+## Baseline Expectations
+<What should happen when the executor runs WITHOUT the target artifact loaded.
+This establishes the value-add of the artifact.>
+- Without artifact: <expected behavior — generic output, missed requirements, etc.>
+- With artifact: <expected behavior — specific, structured, complete output>
+- **Expected delta:** <quantified improvement, e.g., "+40% pass rate">
+````
+---
+## Section-by-Section Guide
+### Frontmatter
+- `name` — Always prefixed with `eval-`. Lives in the colocated `evals/` directory of the parent artifact.
+- `target` — The artifact this eval tests. Must reference an existing artifact.
+- `category` — What aspect is being tested:
+  - `functional` — Does the artifact produce correct output?
+  - `structural` — Does the output have the right shape?
+  - `behavioral` — Does the artifact handle edge cases correctly?
+  - `integration` — Does the artifact work with other artifacts?
+- `difficulty` — Affects grading tolerance. Basic evals expect straightforward success. Advanced evals allow more nuanced partial results.
+### Task & Prompt
+The prompt is the most critical field. It must be:
+1. **Realistic** — something a real user would type
+2. **Self-contained** — the executor shouldn't need to ask follow-up questions
+3. **Unambiguous** — one clear correct interpretation
+Bad prompts produce unreliable evals. If the eval flakes, the prompt is usually the problem.
+### Expected Outcomes
+Four or more assertions, each independently verifiable. Mix structural checks (file exists, section present) with content checks (examples are concrete, references are valid) and at least one negative check (does NOT contain placeholder text).
+Weak assertions that pass for both good and bad output provide false confidence. Each assertion should discriminate: a clearly wrong output must fail it.
+### Grading Criteria
+Three tiers with clear thresholds. PASS/PARTIAL/FAIL must be unambiguous — the grader should not need judgment to classify a result into a tier. Use specific counts ("4+ of 5 outcomes") rather than vague language ("most outcomes").
+### Evaluation Method
+Three options:
+- **Deterministic** — Script-based checks only. Fast, reliable, but can't assess quality.
+- **Model-based** — Grader agent evaluates. Can assess quality, but slower and potentially inconsistent.
+- **Hybrid** — Deterministic for structure, model-based for content. Best of both worlds. Recommended default.
+### Baseline Expectations
+The with/without comparison is how you measure the artifact's value-add. Without a baseline, you can't distinguish "the artifact helped" from "the model would have done this anyway." Always specify expected delta.
+## Anti-Patterns
+| Pattern | Problem | Fix |
+|---|---|---|
+| Assertions that always pass | False confidence — bad output also passes | Test assertions against a known-bad output |
+| Ambiguous prompt | Eval flakes — different runs interpret differently | Make prompt self-contained with concrete details |
+| No negative assertions | Doesn't catch hallucination or extra content | Add "does NOT contain" checks |
+| No baseline expectation | Can't measure artifact value-add | Specify without-artifact behavior |
+| Only structural checks | Correct shape with garbage content passes | Add content quality assertions |
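
The "Grading Criteria" guidance above asks for tiers defined by specific counts so that the grader needs no judgment. A minimal sketch of that count-based classification follows; the function name `classify_result` and its parameters are hypothetical, and the qualitative PASS conditions in the template (production-ready output, no critical errors) are deliberately left out because they cannot be reduced to a count.

```python
def classify_result(outcomes_met: int, outcomes_total: int, partial_threshold: int) -> str:
    """Classify an eval run as PASS, PARTIAL, or FAIL from outcome counts.

    partial_threshold is the template's <N> in "<N>+ of <M> expected
    outcomes met"; outcomes_total is <M>.
    """
    if outcomes_total <= 0:
        raise ValueError("outcomes_total must be positive")
    if not 0 <= outcomes_met <= outcomes_total:
        raise ValueError("outcomes_met must be between 0 and outcomes_total")
    if not 0 < partial_threshold <= outcomes_total:
        raise ValueError("partial_threshold must be between 1 and outcomes_total")

    if outcomes_met == outcomes_total:
        return "PASS"      # every expected outcome is satisfied
    if outcomes_met >= partial_threshold:
        return "PARTIAL"   # at least N of M, but not all
    return "FAIL"          # fewer than N outcomes met
```

With the guide's own example of "4+ of 5 outcomes", `classify_result(4, 5, 4)` falls in the PARTIAL tier and `classify_result(3, 5, 4)` in FAIL.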

package/skills/aw-adk/references/template-rule.md ADDED

@@ -0,0 +1,119 @@
+# Rule Template
+Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens. Rules are intentionally shorter than other artifact types — a rule that needs 1000 words to explain is probably a skill.
+---
+## Scaffold
+````markdown
+---
+id: <domain>/<rule-slug>
+severity: <MUST | SHOULD | MAY>
+domains: [<domain-1>, <domain-2>]
+paths: ["<glob-pattern-1>", "<glob-pattern-2>"]
+---
+# <Rule Title>
+## Rule
+<requirement-statement> [<MUST|SHOULD|MAY>]
+**Why:** <1-2 sentences explaining the consequence of violating this rule. What breaks, degrades, or becomes vulnerable? This is the most important part — a model that understands "why" handles edge cases better than one following a directive.>
+## WRONG
+<Real violation — not a toy example. Show code or config that a developer would actually write.>
+```<language>
+// <Brief comment explaining what's wrong>
+<violating code>
+```
+**Impact:** <What happens if this ships — runtime error, security vulnerability, data corruption, etc.>
+## RIGHT
+<Verified fix — the correct way to write the same code. Must compile/run.>
+```<language>
+// <Brief comment explaining why this is correct>
+<correct code>
+```
+## Exceptions
+<When this rule does NOT apply. Be specific — vague exceptions become loopholes.>
+- <Exception 1>: <specific condition and why the rule doesn't apply>
+- <Exception 2>: <specific condition>
+If no exceptions exist, write: "No exceptions. This rule applies universally."
+## Enforcement
+- **Automated:** <How this can be caught automatically — linter rule, CI check, grep pattern>
+- **Manual:** <What a reviewer should look for during code review>
+## Severity Justification
+**<MUST|SHOULD|MAY>** because <reason tied to impact>:
+- **MUST** — Violation causes correctness failures, security vulnerabilities, or data loss
+- **SHOULD** — Violation degrades quality, maintainability, or developer experience
+- **MAY** — Violation is suboptimal but acceptable in some contexts
+## References
+- [<skill-name>](../skills/<slug>/SKILL.md) — <deeper guidance on the practice>
+- [<external-doc>](<url>) — <authoritative source>
+````
+---
+## Section-by-Section Guide
+### Frontmatter
+- `id` — Unique identifier in `<domain>/<slug>` format. Used in rule-manifest.json and AGENTS.md references.
+- `severity` — One of MUST (violation = defect), SHOULD (violation = code smell), MAY (recommendation).
+- `domains` — Which platform domains this applies to. Use `["universal"]` for cross-cutting rules.
+- `paths` — Glob patterns for files this rule applies to. Enables automated scoping.
+### Rule Statement
+One sentence. Active voice. Ends with the severity tag in brackets. The model reads this as the primary constraint.
+**Good:** `All database queries must be scoped by locationId from auth context. [MUST]`
+**Bad:** `It is recommended that queries should generally include location scoping when possible. [SHOULD]`
+### Why
+The single most important section. Models follow rules more reliably when they understand consequences. "Because the style guide says so" is not a reason. "Because unscoped queries return data from other tenants, creating a data leak" is.
+### WRONG / RIGHT Examples
+Real code, not pseudocode. The WRONG example should be something a developer would plausibly write — not a strawman. The RIGHT example must be a direct fix of the WRONG example, not a different scenario.
+### Exceptions
+Explicit exceptions prevent false positives and reduce rule fatigue. If a rule has no exceptions, say so explicitly — ambiguity about exceptions leads to inconsistent enforcement.
+### Enforcement
+Split into automated (CI/linter) and manual (code review). Every rule should have at least one enforcement path. Rules that can only be enforced manually are expensive — prioritize automatable rules.
+### Severity Justification
+Explains why this severity level was chosen, not what the levels mean. "MUST because unscoped queries create cross-tenant data leaks in production" connects the severity to the specific consequence.
+## Anti-Patterns
+| Pattern | Problem | Fix |
+|---|---|---|
+| No "Why" section | Model follows rule mechanically, fails on edge cases | Add consequence-driven explanation |
+| Pseudocode examples | Developer can't map to real code | Use real language, real patterns |
+| WRONG example is a strawman | Nobody would write that; rule feels patronizing | Use a plausible violation from real code |
+| Vague exceptions | "Sometimes this doesn't apply" — when? | List specific conditions or write "No exceptions" |
+| MUST severity without justification | Everything feels critical; severity loses meaning | Justify with specific impact |
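
The frontmatter and rule-statement conventions described in the template above lend themselves to an automated check. The sketch below is one possible way to do that in plain JavaScript. It is not the package's `scripts/ci/validate-rules.js`, whose contents are not shown here. The names `parseFrontmatter` and `validateRule`, the sample rule id, and the simplified frontmatter parsing are all assumptions.

```javascript
// Hypothetical validator for the rule scaffold; not the package's validate-rules.js.
const SEVERITIES = ["MUST", "SHOULD", "MAY"];

// Minimal frontmatter reader: handles `key: value` and `key: [a, b]` only.
// Assumption: list items contain no commas (a real YAML parser would be safer).
function parseFrontmatter(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    if (raw.startsWith("[") && raw.endsWith("]")) {
      fields[key] = raw.slice(1, -1).split(",")
        .map((s) => s.trim().replace(/^"|"$/g, ""))
        .filter(Boolean);
    } else {
      fields[key] = raw;
    }
  }
  return fields;
}

function validateRule(markdown) {
  const fm = parseFrontmatter(markdown);
  if (!fm) return ["missing frontmatter"];
  const errors = [];

  // id uses the <domain>/<slug> format: exactly one slash, no whitespace.
  if (!/^[^\/\s]+\/[^\/\s]+$/.test(fm.id || "")) errors.push("id must look like <domain>/<slug>");
  if (!SEVERITIES.includes(fm.severity)) errors.push("severity must be MUST, SHOULD, or MAY");
  for (const key of ["domains", "paths"]) {
    if (!Array.isArray(fm[key]) || fm[key].length === 0) errors.push(`${key} must be a non-empty list`);
  }

  // The rule statement is taken to be the first non-empty line after "## Rule".
  const lines = markdown.split("\n");
  const start = lines.indexOf("## Rule");
  const statement = start === -1 ? "" : lines.slice(start + 1).find((l) => l.trim() !== "") || "";
  const tag = statement.match(/\[(MUST|SHOULD|MAY)\]\s*$/);
  if (!tag) errors.push("rule statement must end with a severity tag");
  else if (tag[1] !== fm.severity) errors.push("severity tag does not match frontmatter");

  for (const heading of ["## WRONG", "## RIGHT", "## Exceptions", "## Enforcement"]) {
    if (!lines.includes(heading)) errors.push(`missing section: ${heading}`);
  }
  return errors;
}

// Assumed sample: frontmatter says MUST, but the statement is tagged SHOULD.
const sample = [
  "---",
  "id: data/tenant-scoping",
  "severity: MUST",
  "domains: [universal]",
  'paths: ["src/**/*.ts"]',
  "---",
  "# Tenant Scoping",
  "## Rule",
  "All database queries must be scoped by locationId from auth context. [SHOULD]",
  "## WRONG",
  "## RIGHT",
  "## Exceptions",
  "## Enforcement",
].join("\n");
console.log(validateRule(sample));
```

A check like this covers only the mechanical parts of the template. Whether the "Why" is consequence-driven or the WRONG example is plausible still needs a reviewer.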

package/skills/aw-adk/references/template-skill.md ADDED

@@ -0,0 +1,123 @@
+# Skill Template
+Copy the scaffold below as your starting point. Replace all `<placeholder>` tokens.
+---
+## Scaffold
+````markdown
+---
+name: <namespace>-<skill-slug>
+description: "<1-2 sentences. State primary capability first, then 'Use when <trigger scenario>'.>"
+trigger: when the user <trigger condition>
+---
+# <Skill Display Name>
+<1-2 sentence purpose statement. What does this skill teach, and why does it matter?>
+## When to Use
+- <Trigger scenario 1 — specific user intent or request pattern>
+- <Trigger scenario 2 — a different angle or adjacent need>
+- <Trigger scenario 3 — an edge case that should still match>
+## Quick Start
+<Minimal example showing the skill in action. This is the "show, don't tell" section.
+Give a concrete, copy-pasteable example — not a description of what to do.>
+```bash
+# Example: a concrete invocation or code snippet
+<command or code>
+```
+## Detailed Guide
+### <Topic 1>
+<Step-by-step instructions with concrete actions. Each step should be:
+1. Numbered
+2. Actionable (starts with a verb)
+3. Specific (includes file paths, commands, or code)>
+### <Topic 2>
+<More guidance. Add as many topic sections as needed, but each should
+earn its place — if a section doesn't change behavior, remove it.>
+### <Topic 3 — Common Pitfalls>
+<What goes wrong and how to fix it. Real failure modes, not hypothetical ones.>
+## Checklist
+- [ ] <Check item 1> — <pass/fail criteria: what to look for and what "done" means>
+- [ ] <Check item 2> — <pass/fail criteria>
+- [ ] <Check item 3> — <pass/fail criteria>
+## Output Format
+<Show the exact structure of what this skill produces. If it produces a file,
+show the file. If it produces a checklist, show the checklist. Be concrete.>
+```
+<output structure>
+```
+## References
+- [<reference-name>](references/<file>.md) — <what it covers>
+- [<external-link>](<url>) — <why it's relevant>
+````
+---
+## Section-by-Section Guide
+### Frontmatter
+The three fields (`name`, `description`, `trigger`) control discoverability. The description is what the model reads to decide whether to load this skill. Front-load the capability; put the trigger scenario second.
+**Good:** `"MongoDB query optimization patterns for Mongoose and native driver. Use when debugging slow queries, reviewing aggregation pipelines, or designing indexes."`
+**Bad:** `"This skill helps with MongoDB."` (too vague, no trigger signals)
+### Purpose Statement
+One to two sentences below the H1. This is the first thing a reader sees. It should answer: "Why does this skill exist?" and "What outcome does it produce?"
+### When to Use
+Three or more trigger scenarios. These help the model (and human readers) decide if this skill matches their situation. Be specific about user intent, not about the skill's internal mechanics.
+### Quick Start
+The most important section for adoption. A developer should be able to copy-paste this and get a working result. If your Quick Start requires reading the Detailed Guide first, it's too complex.
+### Detailed Guide
+Progressive disclosure. Only readers who need depth will reach here. Organize by task or topic, not by internal architecture. Each subsection should be independently useful.
+### Checklist
+Actionable verification items. Each item must have clear pass/fail criteria — "looks good" is not a criterion. These are used by graders and reviewers to validate the skill was applied correctly.
+### Output Format
+Show, don't describe. If the skill produces JSON, show JSON. If it produces a markdown report, show the markdown. The model uses this section to format its output correctly.
+### References
+Link to deeper material. Reference files for detailed patterns, external docs for vendor APIs. Keep the skill itself lean; push depth into references.
+## Anti-Patterns
+| Pattern | Problem | Fix |
+|---|---|---|
+| 5000+ word SKILL.md | Model wastes context loading it | Split into SKILL.md (overview) + references/ (depth) |
+| No Quick Start | Low adoption — readers leave before learning | Add a copy-pasteable example |
+| Vague trigger description | Model loads the skill for wrong requests | Add 3+ specific trigger scenarios |
+| Checklist without criteria | Unverifiable — "did I do this?" has no answer | Add pass/fail criteria to every item |
+| Generic examples | Model produces generic output | Use real domain examples, not `foo`/`bar` |
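
Several of the anti-patterns in the table above are mechanical enough to lint for. The following is a rough sketch in plain JavaScript, assuming a draft that follows the scaffold's headings exactly. `lintSkill`, the sample draft, and the exact warning strings are hypothetical and not part of the package.

```javascript
// Hypothetical lint for a SKILL.md draft; thresholds mirror the guide above.
function lintSkill(markdown) {
  const warnings = [];
  const lines = markdown.split("\n");

  // Frontmatter: the three discoverability fields must be present.
  const end = lines.indexOf("---", 1);
  const front = lines[0] === "---" && end > 0 ? lines.slice(1, end) : [];
  for (const field of ["name", "description", "trigger"]) {
    if (!front.some((l) => l.startsWith(`${field}:`))) {
      warnings.push(`missing frontmatter field: ${field}`);
    }
  }
  const description = front.find((l) => l.startsWith("description:")) || "";
  if (!/Use when/i.test(description)) warnings.push('description lacks a "Use when" trigger');

  // Collect the lines under a given H2 heading, up to the next H2.
  const section = (title) => {
    const start = lines.indexOf(`## ${title}`);
    if (start === -1) return null;
    const rest = lines.slice(start + 1);
    const next = rest.findIndex((l) => l.startsWith("## "));
    return next === -1 ? rest : rest.slice(0, next);
  };

  const whenToUse = section("When to Use");
  if (!whenToUse || whenToUse.filter((l) => l.startsWith("- ")).length < 3) {
    warnings.push("When to Use needs 3+ trigger scenarios");
  }
  if (!section("Quick Start")) warnings.push("missing Quick Start");

  // The scaffold separates a checklist item from its criteria with an em dash (U+2014).
  const checklist = section("Checklist") || [];
  for (const item of checklist.filter((l) => l.startsWith("- [ ]"))) {
    if (!item.includes(" \u2014 ")) warnings.push(`checklist item lacks criteria: ${item}`);
  }

  // The anti-pattern table flags SKILL.md files of 5000+ words.
  if (markdown.split(/\s+/).filter(Boolean).length >= 5000) {
    warnings.push("SKILL.md is 5000+ words; move depth into references/");
  }
  return warnings;
}

// Assumed draft that commits several of the listed anti-patterns.
const draft = [
  "---",
  "name: platform-mongo-queries",
  'description: "This skill helps with MongoDB."',
  "trigger: when the user asks about slow queries",
  "---",
  "# Mongo Queries",
  "## When to Use",
  "- Debugging slow queries",
  "## Checklist",
  "- [ ] Indexes reviewed",
].join("\n");
console.log(lintSkill(draft));
```

The draft trips four warnings: the vague description, too few trigger scenarios, no Quick Start, and a checklist item with no pass/fail criteria. Content-level problems such as generic examples are left to human review.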