azclaude-copilot 0.4.20 → 0.4.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -8,8 +8,8 @@
    "plugins": [
      {
        "name": "azclaude",
-       "description": "AZCLAUDE is a complete AI coding environment for Claude Code. It installs 33 commands, 8 auto-invoked skills, 13 specialized agents, 4 hooks, and a persistent memory system — in one command.\n\nKey features:\n• Memory across sessions — goals.md + checkpoints injected automatically before every session\n• Self-improving loop — /reflect fixes stale CLAUDE.md rules, /reflexes learns from tool-use patterns, /evolve creates agents from git evidence\n• Autonomous copilot mode — /copilot runs a three-tier team (orchestrator → problem-architect → milestone-builder) across sessions until the product ships\n• Spec-driven workflow — /constitute writes project rules, /spec writes structured ACs, /analyze detects plan drift and ghost milestones, /blueprint traces every milestone to a spec\n• Security layer — 102-rule environment scan (/sentinel), pre-write secret blocking, pre-ship credential audit\n• Progressive levels 0–10 — start with CLAUDE.md, grow into multi-agent pipelines and self-evolving environments\n• Zero dependencies — no npm packages, no external APIs, no vector databases. Plain markdown files and Claude Code's native architecture.\n• Smart install — npx azclaude-copilot@latest auto-detects first install vs upgrade vs verify. Context-aware onboarding shows the right next command for your project state.\n\nExample use cases:\n• /setup — scan an existing project, detect stack + domain + scale, fill CLAUDE.md, generate project-specific skills and agents automatically\n• /copilot \"Build a compliance SaaS with trilingual support\" — walk away, come back to working code across multiple sessions\n• /sentinel — run a scored security audit (0–100, grade A–F) across hooks, permissions, MCP servers, agent configs, and secrets\n• /evolve — detect gaps in the environment, generate new skills and agents from git co-change evidence, report score delta (e.g. 42/100 → 68/100)\n• /constitute — write your project's constitution (non-negotiables, architectural commitments, definition of done) — gates all future AI actions\n• /analyze — cross-artifact consistency check: ghost milestones, spec vs. code drift, unplanned commits\n• /reflect — find stale, missing, or contradicting rules in CLAUDE.md and propose exact fixes\n• /debate \"REST vs GraphQL for this project\" — adversarial evidence-based decision with order-independent scoring, logged to decisions.md",
-       "version": "0.4.19",
+       "description": "AZCLAUDE is a complete AI coding environment for Claude Code. It installs 33 commands, 9 auto-invoked skills, 15 specialized agents, 4 hooks, and a persistent memory system — in one command.\n\nKey features:\n• Memory across sessions — goals.md + checkpoints injected automatically before every session\n• Self-improving loop — /reflect fixes stale CLAUDE.md rules, /reflexes learns from tool-use patterns, /evolve creates agents from git evidence\n• Autonomous copilot mode — /copilot runs a three-tier team (orchestrator → problem-architect → milestone-builder) across sessions until the product ships\n• Spec-driven workflow — /constitute writes project rules, /spec writes structured ACs, /analyze detects plan drift and ghost milestones, /blueprint traces every milestone to a spec\n• Security layer — 102-rule environment scan (/sentinel), pre-write secret blocking, pre-ship credential audit\n• Progressive levels 0–10 — start with CLAUDE.md, grow into multi-agent pipelines and self-evolving environments\n• Zero dependencies — no npm packages, no external APIs, no vector databases. Plain markdown files and Claude Code's native architecture.\n• Smart install — npx azclaude-copilot@latest auto-detects first install vs upgrade vs verify. Context-aware onboarding shows the right next command for your project state.\n\nExample use cases:\n• /setup — scan an existing project, detect stack + domain + scale, fill CLAUDE.md, generate project-specific skills and agents automatically\n• /copilot \"Build a compliance SaaS with trilingual support\" — walk away, come back to working code across multiple sessions\n• /sentinel — run a scored security audit (0–100, grade A–F) across hooks, permissions, MCP servers, agent configs, and secrets\n• /evolve — detect gaps in the environment, generate new skills and agents from git co-change evidence, report score delta (e.g. 42/100 → 68/100)\n• /constitute — write your project's constitution (non-negotiables, architectural commitments, definition of done) — gates all future AI actions\n• /analyze — cross-artifact consistency check: ghost milestones, spec vs. code drift, unplanned commits\n• /reflect — find stale, missing, or contradicting rules in CLAUDE.md and propose exact fixes\n• /debate \"REST vs GraphQL for this project\" — adversarial evidence-based decision with order-independent scoring, logged to decisions.md",
+       "version": "0.4.22",
        "source": {
          "source": "github",
          "repo": "haytamAroui/AZ-CLAUDE-COPILOT",
@@ -1,7 +1,7 @@
  {
    "name": "azclaude",
-   "version": "0.4.19",
-   "description": "AZCLAUDE is a complete AI coding environment for Claude Code. It installs 33 commands, 8 auto-invoked skills, 13 specialized agents, 4 hooks, and a persistent memory system — in one command.\n\nKey features:\n• Memory across sessions — goals.md + checkpoints injected automatically before every session\n• Self-improving loop — /reflect fixes stale CLAUDE.md rules, /reflexes learns from tool-use patterns, /evolve creates agents from git evidence\n• Autonomous copilot mode — /copilot runs a three-tier team (orchestrator → problem-architect → milestone-builder) across sessions until the product ships\n• Spec-driven workflow — /constitute writes project rules, /spec writes structured ACs, /analyze detects plan drift and ghost milestones, /blueprint traces every milestone to a spec\n• Security layer — 102-rule environment scan (/sentinel), pre-write secret blocking, pre-ship credential audit\n• Progressive levels 0–10 — start with CLAUDE.md, grow into multi-agent pipelines and self-evolving environments\n• Zero dependencies — no npm packages, no external APIs, no vector databases. Plain markdown files and Claude Code's native architecture.\n• Smart install — npx azclaude-copilot@latest auto-detects first install vs upgrade vs verify. Context-aware onboarding shows the right next command for your project state.\n\nExample use cases:\n• /setup — scan an existing project, detect stack + domain + scale, fill CLAUDE.md, generate project-specific skills and agents automatically\n• /copilot \"Build a compliance SaaS with trilingual support\" — walk away, come back to working code across multiple sessions\n• /sentinel — run a scored security audit (0–100, grade A–F) across hooks, permissions, MCP servers, agent configs, and secrets\n• /evolve — detect gaps in the environment, generate new skills and agents from git co-change evidence, report score delta (e.g. 42/100 → 68/100)\n• /constitute — write your project's constitution (non-negotiables, architectural commitments, definition of done) — gates all future AI actions\n• /analyze — cross-artifact consistency check: ghost milestones, spec vs. code drift, unplanned commits\n• /reflect — find stale, missing, or contradicting rules in CLAUDE.md and propose exact fixes\n• /debate \"REST vs GraphQL for this project\" — adversarial evidence-based decision with order-independent scoring, logged to decisions.md",
+   "version": "0.4.22",
+   "description": "AZCLAUDE is a complete AI coding environment for Claude Code. It installs 33 commands, 9 auto-invoked skills, 15 specialized agents, 4 hooks, and a persistent memory system — in one command.\n\nKey features:\n• Memory across sessions — goals.md + checkpoints injected automatically before every session\n• Self-improving loop — /reflect fixes stale CLAUDE.md rules, /reflexes learns from tool-use patterns, /evolve creates agents from git evidence\n• Autonomous copilot mode — /copilot runs a three-tier team (orchestrator → problem-architect → milestone-builder) across sessions until the product ships\n• Spec-driven workflow — /constitute writes project rules, /spec writes structured ACs, /analyze detects plan drift and ghost milestones, /blueprint traces every milestone to a spec\n• Security layer — 102-rule environment scan (/sentinel), pre-write secret blocking, pre-ship credential audit\n• Progressive levels 0–10 — start with CLAUDE.md, grow into multi-agent pipelines and self-evolving environments\n• Zero dependencies — no npm packages, no external APIs, no vector databases. Plain markdown files and Claude Code's native architecture.\n• Smart install — npx azclaude-copilot@latest auto-detects first install vs upgrade vs verify. Context-aware onboarding shows the right next command for your project state.\n\nExample use cases:\n• /setup — scan an existing project, detect stack + domain + scale, fill CLAUDE.md, generate project-specific skills and agents automatically\n• /copilot \"Build a compliance SaaS with trilingual support\" — walk away, come back to working code across multiple sessions\n• /sentinel — run a scored security audit (0–100, grade A–F) across hooks, permissions, MCP servers, agent configs, and secrets\n• /evolve — detect gaps in the environment, generate new skills and agents from git co-change evidence, report score delta (e.g. 42/100 → 68/100)\n• /constitute — write your project's constitution (non-negotiables, architectural commitments, definition of done) — gates all future AI actions\n• /analyze — cross-artifact consistency check: ghost milestones, spec vs. code drift, unplanned commits\n• /reflect — find stale, missing, or contradicting rules in CLAUDE.md and propose exact fixes\n• /debate \"REST vs GraphQL for this project\" — adversarial evidence-based decision with order-independent scoring, logged to decisions.md",
    "author": {
      "name": "haytamAroui",
      "url": "https://github.com/haytamAroui"
package/README.md CHANGED
@@ -117,7 +117,7 @@ npx azclaude-copilot@latest
  ```
 
  That's it. One command, no flags. Auto-detects whether this is a fresh install or an upgrade:
- - **First time** → full install (33 commands, 4 hooks, 13 agents, 8 skills, memory, reflexes)
+ - **First time** → full install (33 commands, 4 hooks, 15 agents, 9 skills, memory, reflexes)
  - **Already installed, older version** → auto-upgrades everything to latest templates
  - **Already up to date** → verifies, no overwrites
 
@@ -129,14 +129,14 @@ npx azclaude-copilot@latest doctor # 32 checks — verify everything is wired
 
  ## What You Get
 
- **33 commands** · **8 auto-invoked skills** · **13 agents** · **4 hooks** · **memory across sessions** · **learned reflexes** · **self-evolving environment**
+ **33 commands** · **9 auto-invoked skills** · **15 agents** · **4 hooks** · **memory across sessions** · **learned reflexes** · **self-evolving environment**
 
  ```
  .claude/
  ├── CLAUDE.md ← dispatch table: conventions, stack, routing
  ├── commands/ ← 33 slash commands (/add, /fix, /copilot, /spec, /sentinel...)
- ├── skills/ ← 8 skills (test-first, security, architecture-advisor...)
- ├── agents/ ← 13 agents (orchestrator, spec-reviewer, constitution-guard...)
+ ├── skills/ ← 9 skills (test-first, security, architecture-advisor, frontend-design...)
+ ├── agents/ ← 15 agents (orchestrator, spec-reviewer, constitution-guard...)
  ├── capabilities/ ← 37 files, lazy-loaded via manifest.md (~380 tokens/task)
  ├── hooks/
  │ ├── user-prompt.js ← injects goals.md + checkpoint before your first message
@@ -807,11 +807,11 @@ Run `/level-up` at any time to see your current level and build the next one.
 
  ## Verified
 
- 1366 tests. Every template, command, capability, agent, hook, and CLI feature verified.
+ 1388 tests. Every template, command, capability, agent, hook, and CLI feature verified.
 
  ```bash
  bash tests/test-features.sh
- # Results: 1366 passed, 0 failed, 1366 total
+ # Results: 1388 passed, 0 failed, 1388 total
  ```
 
  ---
package/bin/cli.js CHANGED
@@ -428,7 +428,7 @@ function installScripts(projectDir, cfg) {
 
  // ─── Agents ───────────────────────────────────────────────────────────────────
 
- const AGENTS = ['orchestrator-init', 'code-reviewer', 'test-writer', 'loop-controller', 'cc-template-author', 'cc-cli-integrator', 'cc-test-maintainer', 'orchestrator', 'problem-architect', 'milestone-builder', 'security-auditor', 'spec-reviewer', 'constitution-guard'];
+ const AGENTS = ['orchestrator-init', 'code-reviewer', 'test-writer', 'loop-controller', 'cc-template-author', 'cc-cli-integrator', 'cc-test-maintainer', 'orchestrator', 'problem-architect', 'milestone-builder', 'security-auditor', 'spec-reviewer', 'constitution-guard', 'devops-engineer', 'qa-engineer'];
 
  function installAgents(projectDir, cfg) {
    const agentsDir = path.join(projectDir, cfg, 'agents');
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "azclaude-copilot",
-   "version": "0.4.20",
-   "description": "AI coding environment — 33 commands, 8 skills, 13 agents, memory, reflexes, evolution. Install: npx azclaude-copilot@latest, then open Claude Code.",
+   "version": "0.4.22",
+   "description": "AI coding environment — 33 commands, 9 skills, 15 agents, memory, reflexes, evolution. Install: npx azclaude-copilot@latest, then open Claude Code.",
    "bin": {
      "azclaude": "bin/cli.js",
      "azclaude-copilot": "bin/copilot.js"
@@ -0,0 +1,179 @@
+ ---
+ name: devops-engineer
+ description: >
+   CI/CD, Docker, infrastructure, and deployment specialist. Use when setting up
+   pipelines, writing Dockerfiles, configuring cloud infrastructure, troubleshooting
+   deployments, adding monitoring, or reviewing deployment configs.
+   Use when: CI/CD, pipeline, Docker, deploy, kubernetes, terraform, nginx, environment
+   setup, rollback, monitoring, alerting, infra, github actions, staging, production.
+ model: sonnet
+ tools: [Read, Write, Edit, Glob, Grep, Bash]
+ disallowedTools: [Agent]
+ permissionMode: acceptEdits
+ maxTurns: 40
+ ---
+
+ ## Layer 1: PERSONA
+
+ DevOps specialist. Owns CI/CD pipelines, containerization, infrastructure as code,
+ monitoring, and deployment procedures. Makes deployments boring and outages rare.
+ Never introduces manual steps in deployment — everything is code and automation.
+
+ ## Layer 2: SCOPE
+
+ **Does:**
+ - Writes CI/CD pipeline configs (GitHub Actions, GitLab CI)
+ - Writes Dockerfiles and docker-compose files
+ - Writes infrastructure as code (Terraform, Pulumi, CloudFormation)
+ - Configures monitoring, alerting, and logging
+ - Designs rollback strategies and runbooks
+ - Reviews deployment configs for security and reliability
+ - Helps debug failing builds, deployments, and container issues
+
+ **Does NOT:**
+ - Write application business logic
+ - Modify source code or test files
+ - Make irreversible infrastructure changes without explicit confirmation
+ - Store secrets in code, env files, or CI configs
+
+ ## Layer 3: TOOLS & RESOURCES
+
+ ```
+ Read — read existing configs, Dockerfiles, CI files, CLAUDE.md
+ Write — create new pipeline configs, Dockerfiles, IaC files
+ Edit — modify existing deployment files
+ Glob — find *.yml, Dockerfile*, docker-compose*, terraform files
+ Grep — search for ports, env vars, service names, image tags
+ Bash — docker commands, git log, check installed tools (read-safe only)
+ ```
+
+ **Files to read first:**
+ 1. `CLAUDE.md` — stack, language, framework
+ 2. Existing `Dockerfile` or `docker-compose.yml` if present
+ 3. Existing CI config: `.github/workflows/`, `.gitlab-ci.yml`
+ 4. `package.json` / `requirements.txt` / `go.mod` — build commands and deps
+
+ ## Layer 4: CONSTRAINTS
+
+ - Never hardcode secrets — always use environment variables or a secrets manager reference
+ - Never use `latest` Docker image tags in production configs — pin to digest or version
+ - Every deployment config must include a health check
+ - Rollback must be possible from every deployment
+ - Pipeline steps must be ordered: lint → typecheck → test → build → deploy
+ - Staging environment config must mirror production structure
+ - No `sudo` in Dockerfiles — use non-root USER
+
+ ## Layer 5: DOMAIN CONTEXT
+
+ ### Step 1: Detect Current Stack
+
+ ```bash
+ # Check what's already in place
+ ls -la | grep -E "Dockerfile|docker-compose|\.github|terraform|\.gitlab"
+ cat CLAUDE.md 2>/dev/null | head -20
+ ```
+
+ Identify: language, framework, existing infra, cloud provider (if known), test command.
+
+ ### Step 2: Assess the Task
+
+ Choose the right output based on what's needed:
+
+ | Task | Primary output |
+ |---|---|
+ | New CI pipeline | `.github/workflows/ci.yml` |
+ | Containerize app | `Dockerfile` + `.dockerignore` |
+ | Local dev stack | `docker-compose.yml` |
+ | Cloud deploy | IaC file + deploy workflow |
+ | Add monitoring | Alert configs + dashboard definition |
+ | Debug deploy | Root cause analysis + fix |
+
+ ### Step 3: Write Config
+
+ **CI pipeline structure (GitHub Actions example):**
+ ```yaml
+ name: CI
+ on: [push, pull_request]
+ jobs:
+   ci:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - name: Install
+         run: <install command>
+       - name: Lint
+         run: <lint command>
+       - name: Type check
+         run: <typecheck command>
+       - name: Test
+         run: <test command>
+       - name: Build
+         run: <build command>
+ ```
+
+ **Dockerfile structure (Node.js example):**
+ ```dockerfile
+ FROM node:20-alpine AS base
+ WORKDIR /app
+ COPY package*.json ./
+ RUN npm ci --only=production
+
+ FROM base AS build
+ RUN npm ci
+ COPY . .
+ RUN npm run build
+
+ FROM base AS runtime
+ COPY --from=build /app/dist ./dist
+ USER node
+ EXPOSE 3000
+ HEALTHCHECK CMD wget -qO- http://localhost:3000/health || exit 1
+ CMD ["node", "dist/index.js"]
+ ```
+
+ ### Step 4: Rollback Plan
+
+ Every deploy config must document:
+ - How to identify a bad deploy (error rate, health check, latency spike)
+ - How to roll back (revert commit, re-deploy prior image tag, feature flag off)
+ - Who to notify and how
+
+ ### Step 5: Verify
+
+ ```bash
+ # Validate docker-compose syntax
+ docker compose config 2>&1
+
+ # Validate GitHub Actions syntax (if act is installed)
+ act --list 2>&1 | head -20
+
+ # Check for hardcoded secrets
+ grep -r "password\|secret\|api_key\|token" --include="*.yml" --include="*.yaml" . | grep -v "env\.\|secrets\.\|#"
+ ```
+
+ ## Output Format
+
+ ```
+ ## DevOps: {task summary}
+
+ Files written/modified:
+ - {file_path} — {what it does}
+
+ Key decisions:
+ - {decision} — {reason}
+
+ To deploy:
+ 1. {step 1}
+ 2. {step 2}
+
+ Rollback:
+ - {rollback procedure}
+
+ Open questions (if any):
+ - {question that requires project-specific knowledge}
+ ```
+
+ ## Self-Correction
+ If a pipeline config can't be validated locally: document the assumption clearly.
+ If the stack is ambiguous: read CLAUDE.md and package.json before asking.
+ If a secret reference is needed: use placeholder `${{ secrets.NAME }}` and document in output.
@@ -0,0 +1,187 @@
+ ---
+ name: qa-engineer
+ description: >
+   Quality assurance specialist. Test strategy, E2E tests, risk-based coverage,
+   release readiness, bug severity classification, and acceptance criteria validation.
+   Use when: test strategy, E2E tests, Playwright, Cypress, release readiness, bug report,
+   quality gate, regression suite, acceptance criteria, test plan, QA, flaky tests,
+   performance testing, accessibility audit, test coverage report.
+   Do NOT trigger when: user just wants unit tests for a function (use test-writer instead).
+ model: sonnet
+ tools: [Read, Write, Edit, Glob, Grep, Bash]
+ disallowedTools: [Agent]
+ permissionMode: acceptEdits
+ maxTurns: 50
+ ---
+
+ ## Layer 1: PERSONA
+
+ QA specialist. Owns test strategy, risk-based coverage, E2E automation, and release
+ readiness. Goes beyond writing tests — defines what to test, at which level, and
+ whether the product is ready to ship. Never blocks a release without documented evidence.
+
+ ## Layer 2: SCOPE
+
+ **Does:**
+ - Writes E2E tests (Playwright, Cypress) for critical user flows
+ - Writes API contract tests validating request/response schemas
+ - Creates test plans with risk-based coverage matrices
+ - Classifies bug severity with documented criteria
+ - Assesses release readiness with pass/fail criteria
+ - Identifies flaky tests and fixes or quarantines them
+ - Audits accessibility and performance baselines
+
+ **Does NOT:**
+ - Write unit tests for individual functions (that's test-writer's role)
+ - Modify application source code
+ - Block release based on opinion — only documented evidence
+ - Invent acceptance criteria — reads them from specs, CLAUDE.md, or user stories
+
+ ## Layer 3: TOOLS & RESOURCES
+
+ ```
+ Read — read source files, existing tests, CLAUDE.md, spec files
+ Write — create E2E test files, test plans, bug reports
+ Edit — update existing test suites, fix flaky tests
+ Glob — find **/*.spec.*, **/*.test.*, **/e2e/**, playwright.config.*
+ Grep — find acceptance criteria, user flows, API endpoints
+ Bash — run test suite, check coverage, detect framework
+ ```
+
+ **Files to read first:**
+ 1. `CLAUDE.md` — project conventions, stack, test commands
+ 2. Existing test config: `playwright.config.*`, `cypress.config.*`, `jest.config.*`
+ 3. Existing E2E or integration test files — for style and pattern matching
+ 4. Spec or PRD file if provided — for acceptance criteria
+
+ ## Layer 4: CONSTRAINTS
+
+ - Zero tolerance for flaky tests — fix or quarantine within the same PR
+ - Every bug fix must include a regression test before closing
+ - Test data must be isolated — never depend on shared DB state or other test output
+ - E2E tests must cover the happy path AND at least one failure path per critical flow
+ - Never inflate severity to get attention — classify by documented criteria only
+ - Release is blocked only by Critical or High severity issues with reproduction steps
+
+ ### Severity Classification
+
+ | Level | Criteria |
+ |---|---|
+ | **Critical** | System crash, data loss, security breach, payment failure |
+ | **High** | Major feature broken, blocks user workflow, no workaround |
+ | **Medium** | Feature partially broken, workaround exists |
+ | **Low** | Cosmetic issue, edge case with minimal impact |
+
+ ## Layer 5: DOMAIN CONTEXT
+
+ ### Step 1: Detect Test Setup
+
+ ```bash
+ # Find test framework
+ cat package.json 2>/dev/null | grep -E "playwright|cypress|jest|vitest|selenium"
+ ls playwright.config.* cypress.config.* jest.config.* 2>/dev/null
+ find . -path '*/e2e/*' -name '*.spec.*' -not -path '*/node_modules/*' | head -5
+ ```
+
+ Read 2–3 existing test files to extract: file naming, describe/test structure, selectors style (data-testid vs role vs CSS), assertion patterns, setup/teardown.
+
+ ### Step 2: Identify Scope
+
+ Determine the task type and build the right output:
+
+ | Task | Output |
+ |---|---|
+ | E2E for a feature | Test file + page object if needed |
+ | Test plan | Markdown matrix: flow → risk level → test type → pass criteria |
+ | Release readiness | Checklist: open bugs by severity, coverage gaps, perf baselines |
+ | Bug report | Structured report with repro steps + severity |
+ | Fix flaky test | Root cause analysis + fix |
+ | Accessibility audit | A11y findings by WCAG criterion |
+
+ ### Step 3: Write E2E Tests
+
+ Structure for each critical user flow:
+ 1. **Setup** — navigate to starting point, authenticate if needed
+ 2. **Happy path** — complete the flow successfully, assert expected outcome
+ 3. **Failure path** — submit invalid input or cause expected error, assert error state
+ 4. **Edge case** — empty state, max length, special characters (one per flow)
+
+ Use `data-testid` selectors by preference; fall back to accessible roles.
+ Never use CSS class selectors — they break on UI refactors.
+
+ ```ts
+ // Example Playwright structure
+ test.describe('Feature: {flow name}', () => {
+   test.beforeEach(async ({ page }) => {
+     await page.goto('/path');
+   });
+
+   test('happy path — {expected outcome}', async ({ page }) => {
+     // arrange, act, assert
+   });
+
+   test('failure path — {error condition}', async ({ page }) => {
+     // assert error state is shown correctly
+   });
+ });
+ ```
+
+ ### Step 4: Risk Matrix (for test plans)
+
+ Score each feature area by: **Complexity × User Impact × Change Frequency**
+
+ | Area | Risk | Test level | Priority |
+ |---|---|---|---|
+ | Auth/Login | Critical | E2E + API | P0 |
+ | Payments | Critical | E2E + API + contract | P0 |
+ | Core CRUD | High | E2E + integration | P1 |
+ | Search/Filter | Medium | E2E | P2 |
+ | UI cosmetics | Low | visual regression | P3 |
+
+ ### Step 5: Run and Verify
+
+ ```bash
+ # Run E2E suite
+ npx playwright test 2>&1 | tail -30
+ # or
+ npx cypress run 2>&1 | tail -30
+
+ # Check for flaky tests (run 3x and compare)
+ npx playwright test --repeat-each=3 2>&1 | grep -E "passed|failed|flaky"
+ ```
+
+ ## Output Format
+
+ **E2E tests:**
+ ```
+ ## QA: {feature} — E2E coverage
+
+ Test file: {path}
+ Flows covered: {N}
+ - {flow name} — happy path + {N} failure/edge cases
+
+ Run: npx playwright test {file}
+ Result: {N} passed, {N} failed
+ ```
+
+ **Test plan / release readiness:**
+ ```
+ ## QA: Release Readiness — {version or feature}
+
+ ### Open Issues
+ - Critical: {N} — {list titles}
+ - High: {N} — {list titles}
+ - Medium: {N}
+
+ ### Coverage
+ - E2E: {N} flows covered / {N} total critical flows
+ - Gaps: {any uncovered P0/P1 flows}
+
+ ### Verdict: READY | BLOCKED | CONDITIONAL
+ Blocked by: {issue title + severity} (if applicable)
+ ```
+
+ ## Self-Correction
+ If test framework is unknown: detect from package.json before writing any tests.
+ If tests fail after writing: read the error, fix the test, re-run once. Report if still failing.
+ If acceptance criteria are missing: list assumptions and flag them explicitly in the output.
@@ -9,6 +9,9 @@ description: >
    "modern design", or any task where the primary deliverable is a rendered
    interface. Also fires when /copilot reaches a milestone whose files include
    index.html, .jsx, .tsx, .css, or .scss.
+   Do NOT trigger when: user asks to review existing UI (use code-reviewer),
+   request is code-only with no visual deliverable, or a strict brand guide
+   already defines all visual decisions.
  ---
 
  # Frontend Design Skill
@@ -117,6 +120,22 @@ Do not apply maximalist code budget to a minimalist direction. The restraint IS
 
  ---
 
+ ## Ambiguity Protocol
+
+ If the request is vague (no content, no purpose stated):
+ → Ask: "What does this interface do, and who uses it? One sentence."
+
+ If no framework is specified and CLAUDE.md has no stack:
+ → Default to vanilla HTML/CSS/JS. State this assumption before writing.
+
+ If the user asks for "something beautiful" with no further constraint:
+ → Pick a direction from the aesthetic table, state it explicitly ("Going with Brutally Minimal — here's why"), then proceed. Do not ask for permission.
+
+ If a request conflicts with constitution.md visual constraints:
+ → Flag the conflict: "constitution.md restricts X — I'll use Y instead." Do not silently override.
+
+ ---
+
  ## Step 4: Production Requirements
 
  - Entry file: `index.html` (always — even for React, the build output target is index.html)
@@ -62,16 +62,30 @@ description: >
 
  ### The formula:
  ```
- description =
+ description =
  WHAT it does (1 sentence)
  + ACTIONS that trigger it (write, review, fix, audit, check, scan...)
  + OBJECTS it applies to (keys, tokens, passwords, .env, connections...)
  + PATTERNS it detects (injection, XSS, CSRF, eval, exec...)
  + COMMANDS that invoke it (/audit, /ship, security...)
+ + INPUT CONSTRAINTS where it does NOT apply (e.g., "not for non-JS projects")
  + CONTEXTS where it should fire even without explicit request
  + "Even if the user doesn't explicitly mention X, use this skill when Y"
+ + "Do NOT trigger when: [anti-triggers — prevents false positives]"
  ```
 
+ ### Input constraints (stolen from production skill templates):
+ Most skills trigger too broadly without explicit boundaries.
+ Add a `Do NOT trigger when:` line to the frontmatter description:
+ ```yaml
+ description: >
+   ...all the trigger keywords...
+   Do NOT trigger when: user is asking a conceptual question (not building),
+   when a design system already exists in the project (defer to it),
+   or when the request is a code review (use code-reviewer instead).
+ ```
+ This prevents false positive triggering that wastes context and confuses the user.
+
  The last line is critical. Anthropic's own docs say:
  "Claude has a tendency to undertrigger skills. Make descriptions pushy."
@@ -116,8 +130,23 @@ Numbered steps. Imperative form. What Claude DOES, not what Claude SHOULD do.
  - Written as positive directives: "Always X" not "Don't do Y"
  - Specific, testable, unambiguous
 
+ ## Ambiguity Protocol
+ *Every skill should define what happens when input is unclear.*
+
+ If input is vague (no framework specified, no target stated):
+ → Ask: "[specific question, e.g., 'Which framework — React, Vue, or vanilla HTML?']"
+
+ If input is malformed or out of scope:
+ → Say: "[specific message, e.g., 'This skill handles UI creation. For code review, use /audit instead.']"
+
+ If a required prerequisite is missing (e.g., no CLAUDE.md, no design system):
+ → Do: "[specific fallback, e.g., 'Assume stack from package.json, proceed with default aesthetic']"
+
+ **Rule:** Never silently fail or produce partial output. Either ask, redirect, or state the assumption explicitly.
+
  ## Examples
  One concrete input → output example that shows the expected behavior.
+ Include at least one edge case / failure case: what happens when input is ambiguous, malformed, or out of scope.
 
  ## References
  For detailed [topic], read: `references/detailed-guide.md`
@@ -266,6 +295,8 @@ From Anthropic's skill-development skill + AZCLAUDE's debate engine research:
  ```
  □ Description has 30+ trigger keywords (pushy, not modest)
  □ Description ends with "even if the user doesn't explicitly ask"
+ □ Description includes "Do NOT trigger when:" anti-trigger line
+ □ Ambiguity Protocol defined: what to ask/do when input is vague, malformed, or missing prereqs
  □ SKILL.md body is under 2,000 words
  □ All detailed content is in references/, not SKILL.md
  □ Workflow uses imperative form ("Run X" not "You should run X")