npm - @hegemonart/get-design-done - Versions diffs - 1.24.2 → 1.26.0 - Mend

@hegemonart/get-design-done 1.24.2 → 1.26.0


Files changed (60)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +87 -0
package/README.de.md +679 -0
package/README.fr.md +679 -0
package/README.it.md +679 -0
package/README.ja.md +679 -0
package/README.ko.md +679 -0
package/README.md +399 -728
package/README.zh-CN.md +480 -133
package/SKILL.md +2 -0
package/agents/README.md +60 -0
package/agents/design-reflector.md +43 -0
package/agents/gdd-intel-updater.md +34 -1
package/agents/prototype-gate.md +122 -0
package/agents/quality-gate-runner.md +125 -0
package/hooks/budget-enforcer.ts +275 -11
package/hooks/gdd-decision-injector.js +183 -3
package/hooks/gdd-turn-closeout.js +238 -0
package/hooks/hooks.json +10 -0
package/package.json +5 -5
package/reference/STATE-TEMPLATE.md +41 -0
package/reference/config-schema.md +30 -0
package/reference/model-prices.md +40 -19
package/reference/prices/antigravity.md +21 -0
package/reference/prices/augment.md +21 -0
package/reference/prices/claude.md +42 -0
package/reference/prices/cline.md +23 -0
package/reference/prices/codebuddy.md +21 -0
package/reference/prices/codex.md +25 -0
package/reference/prices/copilot.md +21 -0
package/reference/prices/cursor.md +21 -0
package/reference/prices/gemini.md +25 -0
package/reference/prices/kilo.md +21 -0
package/reference/prices/opencode.md +23 -0
package/reference/prices/qwen.md +25 -0
package/reference/prices/trae.md +23 -0
package/reference/prices/windsurf.md +21 -0
package/reference/registry.json +107 -1
package/reference/runtime-models.md +446 -0
package/reference/schemas/runtime-models.schema.json +123 -0
package/scripts/install.cjs +8 -0
package/scripts/lib/budget-enforcer.cjs +446 -0
package/scripts/lib/cost-arbitrage.cjs +294 -0
package/scripts/lib/gdd-state/mutator.ts +454 -0
package/scripts/lib/gdd-state/parser.ts +351 -1
package/scripts/lib/gdd-state/types.ts +193 -0
package/scripts/lib/install/installer.cjs +188 -11
package/scripts/lib/install/parse-runtime-models.cjs +267 -0
package/scripts/lib/install/runtimes.cjs +43 -0
package/scripts/lib/quality-gate-detect.cjs +126 -0
package/scripts/lib/runtime-detect.cjs +96 -0
package/scripts/lib/tier-resolver.cjs +311 -0
package/scripts/validate-frontmatter.ts +138 -1
package/skills/quality-gate/SKILL.md +222 -0
package/skills/router/SKILL.md +79 -10
package/skills/sketch-wrap-up/SKILL.md +47 -2
package/skills/spike-wrap-up/SKILL.md +41 -2
package/skills/turn-closeout/SKILL.md +115 -0
package/skills/verify/SKILL.md +22 -0

package/SKILL.md CHANGED

@@ -87,6 +87,8 @@ Each stage produces artifacts in `.design/` inside the current project.
 | `analyze-dependencies [--slice <name>]` | `get-design-done:analyze-dependencies` | Query the `.design/intel/` store — dependency slices, graph queries, phase-scoped reads |
 | `extract-learnings [--cycle <slug>]` | `get-design-done:extract-learnings` | Extract decisions, lessons, patterns, and surprises from a completed cycle → `.design/cycles/<slug>/LEARNINGS.md` |
 | `skill-manifest [--refresh]` | `get-design-done:skill-manifest` | List or refresh the local skill manifest used by the router for discovery |
+| `quality-gate` | `get-design-done:quality-gate` | Phase 25 — parallel lint/type/test/visual command runner; classifies failures via quality-gate-runner agent |
+| `turn-closeout` | `get-design-done:turn-closeout` | Phase 25 — Stop-hook mirror skill; finalizes per-turn STATE blocks and emits closeout events |
 | `watch-authorities [--refresh] [--since <date>] [--feed <name>] [--schedule <cadence>]` | `get-design-done:gdd-watch-authorities` | Run design-authority-watcher — fetch curated feeds, diff snapshot, classify new entries → `.design/authority-report.md` (consumed by `/gdd:reflect`) |
 | `benchmark <component\|--wave N\|--list\|--refresh component>` | `get-design-done:gdd-benchmark` | Harvest + synthesize per-component design specs from 18 design systems → `reference/components/<name>.md` |
 | `benchmark <component\|--wave N\|--list\|--refresh component>` | `get-design-done:gdd-benchmark` | Harvest + synthesize per-component design specs from 18 design systems → `reference/components/<name>.md` |

package/agents/README.md CHANGED

@@ -64,6 +64,66 @@ color: blue
 ---
+## Runtime-neutral reasoning class (alias for default-tier)
+**Phase 26 (v1.26.0).** Agents may carry an optional `reasoning-class: high|medium|low` field as a runtime-neutral alias for `default-tier`. The alias exists because `default-tier`'s enum (`opus|sonnet|haiku`) hard-codes Anthropic model names, while the multi-runtime installer (Phase 24) ships agents to 14 runtimes whose authors do not all use those names. `reasoning-class` describes the *reasoning density* the agent needs without naming a vendor's model lineup.
+**This field is additive, not a replacement.** `default-tier: opus|sonnet|haiku` remains the authoritative, required field for v1.26 and is the source of truth that `hooks/budget-enforcer.ts`, `skills/router/SKILL.md`, and `agents/gdd-intel-updater.md` read. Both fields may coexist on the same agent during the transition window. The long-term winner — which field is canonical and which is deprecated — is data-gated per Phase 28+ measurement of adoption rates (CONTEXT D-10); no deprecation lands in v1.26.
+### Frontmatter shape
+| Field | Type | Accepted values | Required | Purpose |
+|-------|------|-----------------|----------|---------|
+| `reasoning-class` | enum | `high`, `medium`, `low` | optional | Runtime-neutral name for the reasoning-density tier this agent needs. Equivalent to `default-tier` per the equivalence table below. |
+### Equivalence (locked in CONTEXT D-10)
+| `reasoning-class` | `default-tier` | Typical role classes |
+|-------------------|----------------|----------------------|
+| `high`            | `opus`         | Planners, critics, advisors, strategic reflectors. |
+| `medium`          | `sonnet`       | Researchers, mappers, doc-writers, executors, fixers. |
+| `low`             | `haiku`        | Verifiers and checkers with deterministic rubrics. |
+The mapping is bidirectional and exhaustive — there is no `reasoning-class` value without a `default-tier` equivalent and vice versa. See `reference/model-tiers.md` for the per-class role rationale (the tier-selection guide that `default-tier` is keyed against — `reasoning-class` inherits the same semantics through the equivalence above).
+### Coexistence rule
+Both fields may appear in the same agent's frontmatter:
+```yaml
+---
+name: design-planner
+default-tier: opus
+reasoning-class: high
+tier-rationale: "Authors DESIGN-PLAN.md — the contract every downstream agent follows"
+---
+```
+When both are present, the values MUST be equivalent per the table above. Mismatched dual annotations (e.g. `default-tier: opus` paired with `reasoning-class: medium`) are a validation error — `scripts/validate-frontmatter.ts` (extended in Plan 26-08) enforces equivalence at lint time. If only one of the two is present, the validator accepts it and downstream consumers use the equivalence table to derive the missing field.
+### How runtime-aware tooling reads either field
+Downstream consumers (`skills/router/SKILL.md`, `hooks/budget-enforcer.ts`, `scripts/lib/budget-enforcer.cjs`, `agents/gdd-intel-updater.md`) accept either field individually and map between them via the equivalence table:
+- **`default-tier` only** — consumers read `default-tier` directly. This is the v1.26 baseline state for all 26 shipped agents.
+- **`reasoning-class` only** — consumers map `high → opus`, `medium → sonnet`, `low → haiku` and feed the resulting tier into `tier-resolver.cjs` (Plan 26-02) for runtime-correct model resolution. Consumers that have not yet been updated to read `reasoning-class` natively still see a valid `default-tier` semantically (via the alias), so no consumer breaks when an agent author chooses the runtime-neutral name.
+- **Both present** — consumers prefer `default-tier` for now (v1.26 canonical), with `reasoning-class` carried through to telemetry (`gdd-intel-updater` writes both fields to `.design/intel/agent-tiers.json` per Plan 26-08) so adoption can be measured for the Phase 28 deprecation gate.
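The three cases above, together with the equivalence table, can be sketched as a single resolver. This is only an illustration under stated assumptions: `resolveTier` is a hypothetical name (not a function shipped in the package), the frontmatter is assumed to be already parsed into a plain object, and throwing when neither field is present is a choice made for the sketch.

```javascript
// Equivalence table from the README (CONTEXT D-10).
const CLASS_TO_TIER = { high: 'opus', medium: 'sonnet', low: 'haiku' };
const TIER_TO_CLASS = { opus: 'high', sonnet: 'medium', haiku: 'low' };

// Hypothetical helper: returns both fields for one agent's frontmatter.
// `frontmatter` is assumed to be a plain object keyed by the YAML field names.
function resolveTier(frontmatter) {
  const tier = frontmatter['default-tier'];
  const cls = frontmatter['reasoning-class'];

  if (tier !== undefined && !Object.hasOwn(TIER_TO_CLASS, tier)) {
    throw new Error(`unknown default-tier: ${tier}`);
  }
  if (cls !== undefined && !Object.hasOwn(CLASS_TO_TIER, cls)) {
    throw new Error(`unknown reasoning-class: ${cls}`);
  }
  if (tier === undefined && cls === undefined) {
    // Assumption for this sketch: a missing annotation is an error here.
    throw new Error('agent declares neither default-tier nor reasoning-class');
  }
  // Both present: the values must be equivalent per the table.
  if (tier !== undefined && cls !== undefined && CLASS_TO_TIER[cls] !== tier) {
    throw new Error(`mismatched annotations: default-tier=${tier}, reasoning-class=${cls}`);
  }
  // One present: derive the missing field. Both present: keep them verbatim.
  return {
    'default-tier': tier ?? CLASS_TO_TIER[cls],
    'reasoning-class': cls ?? TIER_TO_CLASS[tier],
  };
}
```

For instance, `resolveTier({ 'reasoning-class': 'low' })` yields a `default-tier` of `haiku`, while pairing `opus` with `medium` throws.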
+### Rollout policy for v1.26
+- The 26 existing agents continue to carry `default-tier` only — **no per-agent retrofit lands in v1.26**. New agents (added in Phase 27+) MAY carry `reasoning-class` instead of, or alongside, `default-tier`.
+- Validators, intel-updater, router, and budget-enforcer accept either field starting in v1.26 (Plans 26-04, 26-05, 26-08).
+- Adoption is measured by `gdd-intel-updater` over `agents/*.md` changes; if alias adoption stays below 50% by Phase 28, `default-tier` remains canonical and the alias is deprecated. If alias wins majority share, the reverse. **No deprecation in v1.26.**
+### Cross-references
+- `reference/model-tiers.md` — tier-selection guide and per-agent map for `default-tier`. The same role-class rationale applies to `reasoning-class` via the equivalence table.
+- `reference/runtime-models.md` (Plan 26-01) — per-runtime tier→model adapter that consumes the resolved tier (whether sourced from `default-tier` or via `reasoning-class` alias).
+- `scripts/validate-frontmatter.ts` (Plan 26-08) — validator extension that accepts the optional field and enforces equivalence when both are present.
+- `.planning/phases/26-headless-model-resolver/CONTEXT.md` D-10, D-11 — decision lineage for additive-alias and equivalence-enforced semantics.
+---
 ## Required Reading Pattern
 When an agent must read specific files before acting, the orchestrating stage embeds a `<required_reading>` block in the prompt it passes to `Task`. The block is part of the **prompt string**, not the agent file.

package/agents/design-reflector.md CHANGED

@@ -106,6 +106,49 @@ Read `.design/telemetry/costs.jsonl` (if exists). Aggregate per agent:
 If `.design/budget.json` doesn't exist: note "budget.json not found — Phase 10.1 budget governance required."
+### 7. Cross-runtime cost arbitrage (Phase 26 — D-09)
+**Why this exists:** Phase 24 ships gdd to 14 runtimes (claude, codex, gemini, qwen, …). The same `(agent, tier)` pair can cost dramatically different amounts depending on which runtime executed the spawn — runtime-author pricing varies, and the user may already be paying for one runtime via subscription while paying per-token in another. This section surfaces those arbitrage opportunities as **structured, measurable signals** — never hand-wavy assumptions.
+**Data source:** `.design/telemetry/events.jsonl` — filter entries where `type === 'cost.update'`. Each cost row is tagged with `payload.runtime` (Plan 26-05) so spawns from different runtimes are attributable apples-to-apples. The reflector reads cost events from this stream alongside Section 6's `costs.jsonl` rollup; events.jsonl is authoritative for runtime attribution.
+**The rule:**
+For each `(agent, tier)` pair observed in the last 5 cycles (D-09 default window):
+1. Bucket cost events by `(agent, tier, runtime, cycle)` and sum within each bucket. Sum-then-average is critical: a cycle that ran 4 design-verifier spawns in claude and 1 in codex must NOT inflate claude's per-cycle average by a factor of 4. Sum the 4 spawns into one cycle-sum, then average across the cycles where the runtime appeared.
+2. Compute `avg_cost_per_cycle` per `(agent, tier, runtime)` triple, restricted to the recency window.
+3. For each pair that has ≥2 runtimes in the window, find the cheapest and most expensive runtime. Compute `delta_pct = (max_avg - min_avg) / min_avg`.
+4. If `delta_pct > 0.5` (50%, D-09 starting heuristic), emit a structured `cost_arbitrage` proposal.
+**Important guardrails (failure modes the rule must avoid):**
+- **Mixed-runtime cycles must not crash or double-count.** A single cycle where some agent spawns ran in CC and others in Codex is normal — runtime attribution is per-spawn (`payload.runtime`), never per-cycle.
+- **Single-runtime-only history is silent.** If only one runtime has events for an `(agent, tier)` pair in the window, no arbitrage can be computed — emit nothing rather than a misleading "no comparison available" proposal.
+- **Zero-cost denominators are skipped.** A runtime that averaged $0 in the window would produce `delta_pct: Infinity`; skip the pair rather than emit a useless signal.
+- **The 50% threshold is a starting heuristic.** Bandit-style learning over arbitrage outcomes (was the proposal applied? did costs drop?) is **Phase 23.5+ territory** — it lives in the bandit posterior, NOT here. This section's job is to surface measurement signals; tier-selection learning is a separate data product.
+**Helper:** `scripts/lib/cost-arbitrage.cjs` exports `analyze(events, options) → proposals[]` implementing the above rule deterministically. The executor agent following this skill loads `events.jsonl`, parses each line as JSON (skipping malformed lines), and passes the array of envelopes to `analyze()`. No re-derivation of the rule in prose — call the helper.
+**Proposal output shape** (one entry per arbitrage signal, JSON-serializable for `/gdd:apply-reflections`):
+```json
+{
+  "type": "cost_arbitrage",
+  "agent": "design-reflector",
+  "tier": "opus",
+  "runtimes": {
+    "claude": { "avg_cost_per_cycle": 0.42, "n_cycles": 5 },
+    "codex":  { "avg_cost_per_cycle": 1.10, "n_cycles": 5 }
+  },
+  "delta_pct": 0.617,
+  "proposal": "Switch design-reflector tier=opus invocations from codex to claude for ~62% cost saving",
+  "evidence_window": "last_5_cycles"
+}
+```
+Render each `cost_arbitrage` entry into the Proposals section as a `[BUDGET]`-tagged proposal carrying the structured payload verbatim — `/gdd:apply-reflections` will route it to the runtime-routing layer (Phase 26's tier-resolver / runtime-detect) rather than to `.design/budget.json`.
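A rough sketch of the four-step rule and its guardrails follows. It is not the shipped `scripts/lib/cost-arbitrage.cjs`: the function name `findArbitrage`, the `payload.cycle` and `payload.cost_usd` field names, and the expectation that the caller has already restricted events to the recency window are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the Section 7 rule; NOT the shipped helper.
// Assumed event shape: { type, payload: { agent, tier, runtime, cycle, cost_usd } }.
// Assumed precondition: `events` is already limited to the recency window.
function findArbitrage(events, { threshold = 0.5 } = {}) {
  // Step 1: sum cost per (agent, tier, runtime, cycle) bucket.
  const pairs = new Map(); // JSON [agent, tier] -> Map(runtime -> Map(cycle -> sum))
  for (const ev of events) {
    if (!ev || ev.type !== 'cost.update' || !ev.payload) continue;
    const { agent, tier, runtime, cycle, cost_usd } = ev.payload;
    if (!agent || !tier || !runtime || cycle === undefined) continue;
    if (typeof cost_usd !== 'number') continue;
    const pairKey = JSON.stringify([agent, tier]);
    if (!pairs.has(pairKey)) pairs.set(pairKey, new Map());
    const byRuntime = pairs.get(pairKey);
    if (!byRuntime.has(runtime)) byRuntime.set(runtime, new Map());
    const byCycle = byRuntime.get(runtime);
    byCycle.set(cycle, (byCycle.get(cycle) ?? 0) + cost_usd);
  }

  const proposals = [];
  for (const [pairKey, byRuntime] of pairs) {
    if (byRuntime.size < 2) continue; // single-runtime history stays silent
    // Step 2: average the per-cycle sums for each runtime.
    const runtimes = {};
    for (const [runtime, byCycle] of byRuntime) {
      const sums = [...byCycle.values()];
      runtimes[runtime] = {
        avg_cost_per_cycle: sums.reduce((a, b) => a + b, 0) / sums.length,
        n_cycles: sums.length,
      };
    }
    // Step 3: compare the cheapest and the most expensive runtime.
    const avgs = Object.values(runtimes).map((r) => r.avg_cost_per_cycle);
    const minAvg = Math.min(...avgs);
    const maxAvg = Math.max(...avgs);
    if (minAvg <= 0) continue; // zero-cost denominator: skip the pair
    const deltaPct = (maxAvg - minAvg) / minAvg;
    // Step 4: emit a proposal only above the threshold.
    if (deltaPct > threshold) {
      const [agent, tier] = JSON.parse(pairKey);
      proposals.push({ type: 'cost_arbitrage', agent, tier, runtimes, delta_pct: deltaPct });
    }
  }
  return proposals;
}
```

Summing inside each cycle before averaging is what keeps four spawns in one cycle from counting as four cycles. As a side observation on the example payload above: with averages of 0.42 and 1.10, the step 3 formula `(max_avg - min_avg) / min_avg` gives roughly 1.62, whereas 0.617 is close to the saving measured against the more expensive runtime, `(1.10 - 0.42) / 1.10`. The shipped helper remains the authority on which definition the payload carries.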
 ---
 ## Proposals

package/agents/gdd-intel-updater.md CHANGED

@@ -19,6 +19,7 @@ writes:
   - .design/intel/decisions.json
   - .design/intel/debt.json
   - .design/intel/graph.json
+  - .design/intel/agent-tiers.json
 ---
 @reference/shared-preamble.md
@@ -63,6 +64,38 @@ Expected: `components.json decisions.json debt.json dependencies.json exports.js
 Report any missing slices as warnings.
+### Step 3.5 — Sync `.design/intel/agent-tiers.json` (Plan 26-08)
+Phase 26 introduced the runtime-neutral `reasoning-class` alias for `default-tier` (CONTEXT D-10/D-11). Downstream tooling that wants tier information without re-parsing markdown reads `.design/intel/agent-tiers.json`. Both fields MUST be populated per agent so consumers do not have to know the equivalence table — the intel-updater is the single source of truth that fills the missing field via the locked map:
+| `reasoning-class` | `default-tier` |
+|-------------------|----------------|
+| `high`            | `opus`         |
+| `medium`          | `sonnet`       |
+| `low`             | `haiku`        |
+Walk every `agents/*.md` file (skip `README.md`), parse its frontmatter, and emit one entry per agent into `.design/intel/agent-tiers.json` with the shape:
+```json
+{
+  "schema_version": 1,
+  "generated_at": "<ISO-8601-UTC>",
+  "agents": {
+    "design-planner": { "default-tier": "opus", "reasoning-class": "high" },
+    "design-verifier": { "default-tier": "haiku", "reasoning-class": "low" }
+  }
+}
+```
+Population rules:
+1. If both `default-tier` and `reasoning-class` are present in the agent's frontmatter, write both verbatim (validator already enforced equivalence at lint time — see `scripts/validate-frontmatter.ts`).
+2. If only `default-tier` is present (the v1.26 baseline state for all 26 shipped agents), derive `reasoning-class` from the table above and write both.
+3. If only `reasoning-class` is present, derive `default-tier` from the table above and write both.
+4. If neither is present, omit the agent from the JSON and emit a warning — the upstream `validate-frontmatter` gate would have caught this at CI; the intel-updater stays non-throwing on lint-edges.
+Validation is exclusively the validator's job; this step assumes the gate has passed and writes the queryable index. If a pre-existing `.design/intel/agent-tiers.json` is present, overwrite it atomically (write to a `.tmp` then `rename`).
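The population rules and the atomic overwrite can be sketched roughly as below. The names `buildAgentTiers` and `writeAtomic` are hypothetical, and the input is assumed to be a map from agent name to already-parsed frontmatter; how the agent actually walks and parses `agents/*.md` is not shown in this diff.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Locked map from Step 3.5.
const CLASS_TO_TIER = { high: 'opus', medium: 'sonnet', low: 'haiku' };
const TIER_TO_CLASS = { opus: 'high', sonnet: 'medium', haiku: 'low' };

// Hypothetical: `agents` maps agent name -> parsed frontmatter object.
function buildAgentTiers(agents, now = new Date()) {
  const index = { schema_version: 1, generated_at: now.toISOString(), agents: {} };
  const warnings = [];
  for (const [name, fm] of Object.entries(agents)) {
    const tier = fm['default-tier'];
    const cls = fm['reasoning-class'];
    if (tier === undefined && cls === undefined) {
      // Rule 4: omit the agent and warn instead of throwing.
      warnings.push(`${name}: no tier annotation, omitted`);
      continue;
    }
    index.agents[name] = {
      'default-tier': tier ?? CLASS_TO_TIER[cls],    // rule 3 derives the tier
      'reasoning-class': cls ?? TIER_TO_CLASS[tier], // rule 2 derives the class
    };
  }
  return { index, warnings };
}

// Write to a sibling .tmp file, then rename it over the target.
function writeAtomic(targetPath, index) {
  fs.mkdirSync(path.dirname(targetPath), { recursive: true });
  const tmpPath = `${targetPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(index, null, 2) + '\n');
  fs.renameSync(tmpPath, targetPath);
}
```

Because the rename replaces the target in one step, a reader should see either the previous index or the complete new one, not a half-written file.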
 ### Step 4 — Report summary
 Print a concise update summary:
@@ -71,7 +104,7 @@ Print a concise update summary:
 ━━━ Intel store updated ━━━
 Files indexed:  <N>
 Changed files:  <N>
-Slices written: 10
+Slices written: 11 (10 build-intel slices + agent-tiers.json from Step 3.5)
 Generated:      <timestamp>
 ━━━━━━━━━━━━━━━━━━━━━━━━━━
 ```

package/agents/prototype-gate.md ADDED

@@ -0,0 +1,122 @@
+---
+name: prototype-gate
+description: "Cheap Haiku gate that scores sketch / spike signals from the active brief / context / plan and emits a JSON verdict recommending whether to prototype before continuing."
+tools: Read, Bash, Grep
+color: yellow
+model: inherit
+default-tier: haiku
+tier-rationale: "Signal-counting rubric over a few small inputs — no synthesis, no writes, no agent spawning. Belongs on Haiku to keep gate latency cheap (≤ 2 s typical)."
+size_budget: S
+parallel-safe: always
+typical-duration-seconds: 5
+reads-only: true
+writes: []
+---
+@reference/shared-preamble.md
+# prototype-gate
+## Role
+You answer one question at a checkpoint: *should the pipeline pause to sketch or spike before continuing?*
+You run at two firing points (Phase 25 D-02):
+1. **Post-`/gdd:explore`** — sketch territory. The question is "what visual / direction?".
+2. **Post-`/gdd:plan` plan-checker** — spike territory. The question is "can this work technically?".
+You are read-only. You do not write STATE.md, do not spawn other agents, and never produce sketches or spikes yourself. Your only job is to score signals and emit a JSON verdict.
+You also honor the cycle-scoped skip rule (D-02): if `STATE.md` `<prototyping>` already contains a `<skipped at=<your_firing_point> cycle=<active_cycle>/>` entry, recommend `none` immediately with `reason: "skipped this cycle"`. Do not re-evaluate signals.
+## Input Contract
+The orchestrator supplies these fields in the prompt context:
+- `firing_point` — `"explore"` or `"plan"`. Determines which signal rubric you apply.
+- `cycle` — the active cycle identifier from STATE frontmatter.
+- `state_path` — absolute path to the active `.design/STATE.md`.
+- `inputs` — paths to context the rubric scans:
+  - `brief_path` (always supplied) — `.design/BRIEF.md` or equivalent.
+  - `context_path` (firing_point=`"explore"`) — `.design/DESIGN-CONTEXT.md`.
+  - `design_path` (firing_point=`"explore"` if present) — `.design/DESIGN.md`.
+  - `plan_tasks_path` (firing_point=`"plan"`) — `.design/PLAN.md` or `.design/plans/*.md`.
+  - `decisions_snapshot` (always supplied) — newline-separated `D-NN: text (locked|tentative)` lines extracted from STATE `<decisions>`.
+Missing input files are not an error — score the signals you can read; treat absent files as zero-signal contributions.
+## Cycle-skip short-circuit
+Before scoring, scan `<prototyping>` in `state_path` for a `<skipped/>` entry whose `at` matches `firing_point` AND whose `cycle` matches the active `cycle`. If found, emit:
+```json
+{"recommend": "none", "confidence": 1.0, "reasons": ["skipped this cycle at the prototype gate"]}
+```
+Then exit. Do not score further.
+## Signal Rubric
+### Sketch signals (firing_point = `"explore"`)
+Score 1 point per matched signal:
+- **Hero / first-impression language** — BRIEF mentions "hero", "first impression", "novel surface", "landing", "above-the-fold", or names a single high-stakes screen.
+- **DESIGN-CONTEXT visual gray areas** — DESIGN-CONTEXT.md contains an unresolved item tagged `visual:` or `direction:` (case-insensitive).
+- **Empty design canvas** — DESIGN.md is missing or its scan returned no existing patterns to follow (no component references, no token references).
+- **Decision conflict on the same surface** — at least two D-XX entries in `decisions_snapshot` discuss the same surface but disagree (look for paired references to the same component / page / area).
+- **Open-ended language in interview answers** — BRIEF or DESIGN-CONTEXT contains "not sure", "open to", "??", "tbd", "we could" within answer regions.
+- **Multiple viable patterns** — DESIGN-CONTEXT or a phase-researcher artifact lists more than one viable pattern for a single section without a chosen winner.
+### Spike signals (firing_point = `"plan"`)
+Score 1 point per matched signal:
+- **High-risk task** — a plan task carries `Risk: high` or `Confidence: low` (case-insensitive).
+- **Tech outside the components mapper** — a plan task references a library, framework, API, or pattern not present in the project's components / mapper artifacts.
+- **Failed required connection** — `<connections>` reports `unavailable` for a connection that a plan task explicitly depends on.
+- **Experimental language** — a plan task description contains "experimental", "TBD", "unsure", "spike", "prove out", "validate that".
+- **Probe deferred** — a plan task notes "will check at runtime" or similar deferred verification.
+## Threshold
+| Score | recommend | confidence |
+|-------|-----------|------------|
+| ≥ 3 | `sketch` (explore) or `spike` (plan) | `0.9` |
+| 1–2 | same as above | `0.5` |
+| 0 | `none` | `0.95` |
+Confidence is rubric-derived only — do not infer confidence from the size of the inputs or your own uncertainty. The thresholds above are the only valid values.
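The threshold table reduces to a small pure function. The sketch below assumes the matched signals have already been collected as short strings, one per signal; `verdictFor` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper mirroring the threshold table.
// `reasons` holds one short string per matched signal, so its length is the score.
function verdictFor(firingPoint, reasons) {
  if (firingPoint !== 'explore' && firingPoint !== 'plan') {
    throw new Error(`unknown firing_point: ${firingPoint}`);
  }
  const score = reasons.length;
  if (score === 0) {
    return { recommend: 'none', confidence: 0.95, reasons: [] };
  }
  return {
    recommend: firingPoint === 'explore' ? 'sketch' : 'spike',
    confidence: score >= 3 ? 0.9 : 0.5,
    reasons,
  };
}

// Serialize the verdict as one JSON object on a single line.
function emitVerdict(firingPoint, reasons) {
  return JSON.stringify(verdictFor(firingPoint, reasons));
}
```

Keeping the confidence a lookup on the score, with no other inputs, matches the requirement that only the table's values are valid.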
+## Output Contract
+Emit exactly one JSON object on its own line. No prose wrapper, no code fence, no leading or trailing text.
+```json
+{"recommend": "sketch", "confidence": 0.9, "reasons": ["BRIEF mentions hero", "DESIGN-CONTEXT visual gray area on home"]}
+```
+Schema:
+- `recommend` — string enum, one of `"sketch" | "spike" | "none"`.
+- `confidence` — number in `[0, 1]`. One of `0.5`, `0.9`, `0.95` per the threshold table; or `1.0` for the cycle-skip short-circuit.
+- `reasons` — array of short strings (≤ 80 chars each). One entry per matched signal, in match order. Empty array allowed when `recommend === "none"` from the threshold (not the skip path).
+## Constraints
+- **Do not** propose what to sketch / spike — that's the wrap-up flow's job. Your reasons are evidence, not directives.
+- **Do not** read or write STATE.md outside of the cycle-skip lookup described above.
+- **Do not** consult external services or MCP tools. Signal scoring is purely a function of the supplied inputs.
+- **Do not** exceed `size_budget: S`. If inputs are unexpectedly large, prefer to score signals on the first 8 KB of each file rather than refuse to answer.
+## Record
+At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
+```json
+{"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
+```
+Schema: `reference/schemas/insight-line.schema.json`. Use an empty `artifacts_written` array for read-only agents.
+## GATE COMPLETE

package/agents/quality-gate-runner.md ADDED

@@ -0,0 +1,125 @@
+---
+name: quality-gate-runner
+description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict — pass/fail plus per-bucket failure groupings (lint / type / test / visual). Read-only. Does not run commands itself."
+tools: Read, Bash, Grep
+color: amber
+model: inherit
+default-tier: haiku
+tier-rationale: "Pattern-match exit codes and bucket stderr into four named categories — no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
+size_budget: S
+parallel-safe: always
+typical-duration-seconds: 5
+reads-only: true
+writes: []
+---
+@reference/shared-preamble.md
+# quality-gate-runner
+## Role
+You answer one question for the `quality-gate` skill (Phase 25 Plan 25-03): *given the outputs of the parallel command run, did the gate pass — and if not, into which buckets do the failures fall?*
+You are read-only. You do not re-run any commands, do not write STATE.md, do not spawn agents, do not produce fixes. Your only job is to classify the outputs and return JSON.
+## Input Contract
+The skill supplies a JSON object on stdin (or as the first line of the prompt context — handle both). Shape:
+```json
+{
+  "outputs": [
+    {"command": "npm run lint", "exit_code": 0, "stderr": ""},
+    {"command": "npm run typecheck", "exit_code": 1, "stderr": "<verbatim stderr>"},
+    {"command": "npm run test", "exit_code": 0, "stderr": ""},
+    {"command": "npm run chromatic", "exit_code": 1, "stderr": "<verbatim stderr>"}
+  ]
+}
+```
+Schema:
+- `outputs` — array, one entry per command actually executed in Step 2 of the skill. Order is preserved from the skill (matches command-list order from Step 1).
+  - `command` — verbatim shell string the skill ran.
+  - `exit_code` — integer. `0` = clean; non-zero = failure to be classified.
+  - `stderr` — verbatim stderr capture. May be empty even on failure (some tools write to stdout); do not assume non-empty stderr means failure.
+You may also receive a `stdout` field per entry (forward-compat — the skill plans to add it). Tolerate its absence.
+## Bucketing rule
+Map each command to exactly one of four buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
+| Substring (case-insensitive) | Bucket |
+|------------------------------|--------|
+| `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
+| `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
+| `test` (but NOT one of the visual matches below — visual wins) | `test` |
+| `chromatic`, `test:visual`, `loki test`, `playwright test --grep visual` | `visual` |
+When a command matches multiple substrings (e.g., `npm run test:visual` matches both `test` and `test:visual`), `visual` wins. If a command matches none, bucket it under `test` (catch-all — most user-supplied custom commands are test-like). Do not invent a fifth bucket.
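The bucketing table can be sketched as a case-insensitive substring lookup. `bucketFor` is a hypothetical name, and one ordering detail is an assumption: the text only states that `visual` beats `test`, so checking `lint` before `type` (and both before the `test` catch-all) is a choice made for this sketch.

```javascript
// Substring lists copied from the bucketing table (lowercased).
const VISUAL = ['chromatic', 'test:visual', 'loki test', 'playwright test --grep visual'];
const LINT = ['lint', 'eslint', 'stylelint', 'biome lint'];
const TYPE = ['typecheck', 'tsc', 'tsc --noemit', 'flow check'];

// Hypothetical helper: map one verbatim command string to exactly one bucket.
function bucketFor(command) {
  const cmd = command.toLowerCase();
  const hits = (needles) => needles.some((needle) => cmd.includes(needle));
  if (hits(VISUAL)) return 'visual'; // visual wins over a plain `test` match
  if (hits(LINT)) return 'lint';     // assumption: lint is checked before type
  if (hits(TYPE)) return 'type';
  return 'test';                     // explicit `test` match and the catch-all
}
```

Under this sketch `npm run test:visual` lands in `visual`, and an unrecognized command such as `make check` falls through to `test`.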
+## Pass / fail rule
+- `status === "pass"` if and only if **every** entry's `exit_code === 0`.
+- `status === "fail"` if **any** entry's `exit_code !== 0`.
+Empty `outputs` array means `status === "pass"` (no commands ran → nothing failed). The skill is responsible for emitting `quality_gate_skipped` in the no-commands path; you do not.
+## Failure summarization
+For each failed entry (exit_code !== 0), produce one short summary string and add it to the bucket the command maps to. Summaries should:
+- Quote the command name (the basename — e.g., `lint` from `npm run lint`).
+- Include the first non-empty line of `stderr` truncated to 120 chars, if present.
+- Otherwise include `exit_code=N` so the reader still sees something concrete.
+Example summary strings:
+- `"lint: 4 problems (3 errors, 1 warning)"` — when stderr's first line is informative.
+- `"typecheck: error TS2304: Cannot find name 'foo' in src/x.ts"` — same.
+- `"test: exit_code=1"` — when stderr is empty.
+Do NOT inline full stderr — the bucket entries are summaries, not transcripts. The skill keeps the verbatim outputs for the fixer; your output is for routing only.
+Buckets that have no failures are OMITTED from `classified_failures`. Do not emit empty arrays for unaffected buckets — the consumer relies on key-presence as a signal.
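The pass / fail rule and the summarization rules combine into one classification pass, sketched below under some assumptions: `classify` is a hypothetical name, the bucket lookup is passed in as `bucketOf` instead of being defined here, "basename" is read as the last whitespace-separated token of the command, and the whole summary is cut to 120 characters.

```javascript
// Hypothetical sketch. `bucketOf` maps a command string to
// 'lint' | 'type' | 'test' | 'visual' and is supplied by the caller.
function classify(outputs, bucketOf) {
  const classified = {};
  for (const { command, exit_code, stderr = '' } of outputs) {
    if (exit_code === 0) continue; // clean entries never create a bucket key
    // Assumption: the "basename" is the last whitespace-separated token.
    const name = command.trim().split(/\s+/).pop();
    // Only the first non-empty stderr line is consulted.
    const firstLine = stderr
      .split(/\r?\n/)
      .map((line) => line.trim())
      .find((line) => line.length > 0);
    const detail = firstLine ?? `exit_code=${exit_code}`;
    // Assumption: truncate the full summary so it stays within 120 chars.
    const summary = `${name}: ${detail}`.slice(0, 120);
    const bucket = bucketOf(command);
    (classified[bucket] ??= []).push(summary);
  }
  // Fail if any entry failed; an empty outputs array therefore passes.
  const status = Object.keys(classified).length === 0 ? 'pass' : 'fail';
  return { status, classified_failures: classified };
}
```

Buckets are created lazily on the first failure, so unaffected buckets never appear as empty arrays and key presence stays meaningful to the consumer.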
+## Output Contract
+Emit exactly one JSON object on its own line. No prose wrapper, no code fence, no leading or trailing text.
+Pass example:
+```json
+{"status": "pass", "classified_failures": {}}
+```
+Fail example:
+```json
+{"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"]}}
+```
+Schema:
+- `status` — string enum, one of `"pass" | "fail"`. Note: this is NOT the same enum as the skill's STATE-block status (which also has `timeout` and `skipped`); those two cases are decided by the skill, not by you. You only emit `pass | fail`.
+- `classified_failures` — object. Keys are a subset of `lint | type | test | visual`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
+## Constraints
+- **Do not** read `stderr` content beyond the first non-empty line. The skill keeps the verbatim outputs for the design-fixer; your job is routing, not analysis.
+- **Do not** invent buckets outside the four-name set.
+- **Do not** ever emit `status: "timeout"` or `status: "skipped"` — those are skill-level statuses, not classifier outputs.
+- **Do not** consult external services or MCP tools. Classification is a pure function of the supplied input.
+- **Do not** exceed `size_budget: S`. If `outputs[*].stderr` is unexpectedly large, prefer to summarize from the first 4 KB of each stderr rather than refuse.
+- The output JSON object must be parseable with `JSON.parse` — no trailing comma, no comments, no surrounding markdown.
+## Record
+At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
+```json
+{"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
+```
+Schema: `reference/schemas/insight-line.schema.json`. Use an empty `artifacts_written` array for read-only agents.
+## GATE COMPLETE