npm - @hegemonart/get-design-done - Versions diffs - 1.57.1 → 1.57.2 - Mend

@hegemonart/get-design-done 1.57.1 → 1.57.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (113)

package/.claude-plugin/marketplace.json +26 -41
package/.claude-plugin/plugin.json +23 -48
package/CHANGELOG.md +91 -0
package/README.md +166 -511
package/SKILL.md +2 -0
package/agents/README.md +33 -36
package/agents/a11y-mapper.md +3 -3
package/agents/component-benchmark-harvester.md +6 -6
package/agents/component-benchmark-synthesizer.md +3 -3
package/agents/compose-executor.md +3 -3
package/agents/cost-forecaster.md +2 -2
package/agents/design-auditor.md +7 -7
package/agents/design-authority-watcher.md +15 -15
package/agents/design-context-builder.md +4 -4
package/agents/design-context-checker-gate.md +1 -1
package/agents/design-discussant.md +2 -2
package/agents/design-doc-writer.md +1 -1
package/agents/design-executor.md +2 -2
package/agents/design-figma-writer.md +2 -2
package/agents/design-fixer.md +7 -7
package/agents/design-integration-checker-gate.md +1 -1
package/agents/design-integration-checker.md +1 -1
package/agents/design-paper-writer.md +3 -3
package/agents/design-pencil-writer.md +1 -1
package/agents/design-planner.md +21 -0
package/agents/design-reflector.md +39 -39
package/agents/design-research-synthesizer.md +1 -0
package/agents/design-start-writer.md +1 -1
package/agents/design-update-checker.md +5 -5
package/agents/design-verifier-gate.md +1 -1
package/agents/design-verifier.md +52 -48
package/agents/ds-generator.md +2 -2
package/agents/ds-migration-planner.md +4 -4
package/agents/email-executor.md +9 -9
package/agents/experiment-result-ingester.md +3 -3
package/agents/flutter-executor.md +5 -5
package/agents/gdd-graph-refresh.md +3 -3
package/agents/gdd-intel-updater.md +2 -2
package/agents/motion-mapper.md +2 -2
package/agents/motion-verifier.md +4 -4
package/agents/pdf-executor.md +8 -8
package/agents/perf-analyzer.md +17 -17
package/agents/pr-commenter.md +9 -9
package/agents/prototype-gate.md +2 -2
package/agents/quality-gate-runner.md +1 -1
package/agents/rollout-coordinator.md +3 -3
package/agents/swift-executor.md +4 -4
package/agents/ticket-sync-agent.md +6 -6
package/agents/user-research-synthesizer.md +2 -2
package/connections/connections.md +44 -45
package/connections/cursor.md +73 -0
package/connections/preview.md +3 -3
package/dist/claude-code/.claude/skills/cache-manager/SKILL.md +3 -3
package/dist/claude-code/.claude/skills/cache-manager/cache-policy.md +1 -1
package/dist/claude-code/.claude/skills/design/SKILL.md +19 -0
package/dist/claude-code/.claude/skills/explore/SKILL.md +11 -0
package/dist/claude-code/.claude/skills/figma-write/SKILL.md +13 -2
package/dist/claude-code/.claude/skills/paper-write/SKILL.md +54 -0
package/dist/claude-code/.claude/skills/pencil-write/SKILL.md +54 -0
package/dist/claude-code/.claude/skills/report-issue/SKILL.md +2 -2
package/dist/claude-code/.claude/skills/router/SKILL.md +2 -2
package/dist/claude-code/.claude/skills/verify/verify-procedure.md +10 -11
package/dist/claude-code/.claude/skills/warm-cache/SKILL.md +1 -1
package/hooks/first-run-nudge.cjs +171 -0
package/hooks/gdd-intel-trigger.js +243 -0
package/hooks/gdd-mcp-circuit-breaker.js +62 -7
package/hooks/gdd-precompact-snapshot.js +50 -29
package/hooks/gdd-protected-paths.js +150 -18
package/hooks/gdd-risk-gate.js +93 -1
package/hooks/gdd-sessionstart-recap.js +59 -24
package/hooks/hooks.json +13 -4
package/hooks/inject-using-gdd.cjs +188 -0
package/hooks/update-check.cjs +511 -0
package/package.json +9 -2
package/reference/STATE-TEMPLATE.md +10 -13
package/reference/audit-scoring.md +1 -1
package/reference/cache-tier-doctrine.md +46 -0
package/reference/config-schema.md +9 -9
package/reference/i18n.md +1 -1
package/reference/intel-schema.md +37 -2
package/reference/meta-rules.md +4 -4
package/reference/model-tiers.md +2 -2
package/reference/registry.json +101 -94
package/reference/runtime-models.md +11 -1
package/reference/shared-preamble.md +13 -14
package/reference/skill-graph.md +24 -1
package/scripts/bootstrap.cjs +373 -0
package/scripts/injection-patterns.cjs +58 -0
package/scripts/lib/apply-reflections/incubator-proposals.cjs +57 -26
package/scripts/lib/install/converters/codex-plugin.cjs +5 -2
package/scripts/lib/install/converters/cursor.cjs +20 -0
package/scripts/lib/issue-reporter/report-flow.cjs +1 -1
package/scripts/lib/manifest/skills.json +80 -13
package/scripts/lib/state/query-surface.cjs +67 -9
package/scripts/lib/state/state-store.cjs +68 -26
package/sdk/cli/commands/stage.ts +17 -0
package/sdk/cli/index.js +14 -0
package/skills/cache-manager/SKILL.md +3 -3
package/skills/cache-manager/cache-policy.md +1 -1
package/skills/design/SKILL.md +19 -0
package/skills/explore/SKILL.md +11 -0
package/skills/figma-write/SKILL.md +13 -2
package/skills/paper-write/SKILL.md +54 -0
package/skills/pencil-write/SKILL.md +54 -0
package/skills/report-issue/SKILL.md +2 -2
package/skills/router/SKILL.md +2 -2
package/skills/verify/verify-procedure.md +10 -11
package/skills/warm-cache/SKILL.md +1 -1
package/hooks/first-run-nudge.sh +0 -82
package/hooks/inject-using-gdd.sh +0 -72
package/hooks/update-check.sh +0 -251
package/scripts/lib/audit-aggregator/index.cjs +0 -219
package/scripts/lib/hedge-ensemble.cjs +0 -217

package/agents/design-executor.md CHANGED

@@ -6,7 +6,7 @@ color: yellow
 default-tier: sonnet
 tier-rationale: "Follows an Opus-authored plan; executes rather than plans"
 size_budget: XXL
-size_budget_rationale: "Phase 17 added Benchmark Spec Pre-Flight section for type:components (+17 lines)"
+size_budget_rationale: "Benchmark Spec Pre-Flight section for type:components adds ~17 lines"
 parallel-safe: conditional-on-touches
 typical-duration-seconds: 60
 reads-only: false
@@ -34,7 +34,7 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 - `.design/STATE.md` - pipeline state (decisions, blockers, must-haves)
 - `.design/DESIGN-PLAN.md` - full task list (your task is identified by task_id)
 - `.design/DESIGN-CONTEXT.md` - brand decisions, constraints, locked choices
-- The reference file(s) relevant to the task type (e.g., `reference/typography.md` for a typography task). The 7 domain-index entry-points `reference/{typography,color,spatial,motion,interaction,responsive,ux-writing}.md` (Phase 45) are the navigation start: load the index, drill into the fragments it lists only as the task needs them.
+- The reference file(s) relevant to the task type (e.g., `reference/typography.md` for a typography task). The 7 domain-index entry-points `reference/{typography,color,spatial,motion,interaction,responsive,ux-writing}.md` are the navigation start: load the index, drill into the fragments it lists only as the task needs them.
 **Invariant:** read all listed files FIRST, before making any changes.

package/agents/design-figma-writer.md CHANGED

@@ -153,8 +153,8 @@ Build a numbered operation list based on mode. Do not execute yet.
 ```
 Proposed annotations (N operations):
-1. Layer "Button/Primary" → add comment: "Background: brand-primary-500 (#1A73E8) per D-03"
-2. Layer "Typography/H1" → add comment: "Font: Inter 32/40 per D-07"
+1. Layer "Button/Primary" → add comment: "Background: brand-primary-500 (#1A73E8) per color decision"
+2. Layer "Typography/H1" → add comment: "Font: Inter 32/40 per typography decision"
 ... (one line per annotation)
 ```

package/agents/design-fixer.md CHANGED

@@ -23,7 +23,7 @@ You fix design gaps atomically. One agent invocation = fix all in-scope gaps fro
 You have zero session memory. Every invocation starts fresh. The orchestrating stage supplies all context via the `<required_reading>` block and prompt context fields - you rely entirely on those inputs.
-**Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Phase 5 — Gaps`. You commit one fix per gap. You do nothing else.
+**Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Stage 5 — Gaps`. You commit one fix per gap. You do nothing else.
 **Accessibility failures route here too.** When the quality-gate skill classifies a failure into the `a11y` bucket (sourced from axe / pa11y / lighthouse / jsx-a11y runs), it spawns you with that failure exactly like a `lint`, `type`, `test`, or `visual` failure. Treat an `a11y` classified failure as a normal in-scope fix: read the cited rule, apply the minimal source change that clears the violation (a missing label, an aria attribute, a contrast token), confirm the fix, and commit one fix per gap. No special handling beyond the standard fix sequence below.
@@ -43,7 +43,7 @@ You have zero session memory. Every invocation starts fresh. The orchestrating s
 The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every listed file before acting - this is mandatory. Minimum expected files:
 - `.design/STATE.md` - pipeline state, blockers, decisions
-- `.design/DESIGN-VERIFICATION.md` - gaps to fix (## Phase 5 - Gaps section)
+- `.design/DESIGN-VERIFICATION.md` - gaps to fix (## Stage 5 - Gaps section)
 - `.design/DESIGN-CONTEXT.md` - locked D-XX decisions; do not contradict them
 **Invariant:** read all listed files FIRST, before making any changes.
@@ -64,11 +64,11 @@ The stage embeds the following fields in the prompt:
 ## Gap Input Format
-Gaps are produced by design-verifier Phase 5 and written to the `## Phase 5 — Gaps` section of `.design/DESIGN-VERIFICATION.md`. The format is locked:
+Gaps are produced by design-verifier Stage 5 and written to the `## Stage 5 — Gaps` section of `.design/DESIGN-VERIFICATION.md`. The format is locked:
 ```
 ### [BLOCKER|MAJOR|MINOR|COSMETIC] G-NN: [title]
-- Phase: [1|2|3|4]
+- Stage: [1|2|3|4]
 - Description: [what is broken]
 - Expected: [what should be true]
 - Actual: [what is true]
@@ -85,12 +85,12 @@ Parse every entry in that section. The `G-NN` identifier, severity classificatio
 ### Step 1 - Read gaps and filter by scope
 1. Read `.design/DESIGN-VERIFICATION.md`.
-2. Locate the `## Phase 5 — Gaps` section (or `## GAPS FOUND` if verifier used that heading).
+2. Locate the `## Stage 5 — Gaps` section (or `## GAPS FOUND` if verifier used that heading).
 3. Parse all gap entries in locked G-NN format.
 4. Filter by severity based on `auto_mode`:
    - Always include: `BLOCKER`, `MAJOR`
    - Include only if `auto_mode=true`: `MINOR`, `COSMETIC`
-5. **Confidence routing filter (Phase 49, see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
+5. **Confidence routing filter (see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
 6. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
 If no in-scope gaps are found (e.g., verifier found only MINOR gaps and `auto_mode=false`), emit `## FIX COMPLETE` immediately with "No in-scope gaps to fix."
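
The Step 1 filtering in this hunk (severity filter, confidence routing, then ordering) can be sketched as below. This is a minimal illustration and not the contents of `scripts/lib/confidence-route.cjs`, which this diff does not show. The `selectGaps` helper and the gap object shape are hypothetical, and the behaviour for low-confidence `MINOR`/`COSMETIC` gaps is an assumption, since the text only gates `BLOCKER` and `MAJOR`.

```javascript
// Hypothetical sketch; NOT the real scripts/lib/confidence-route.cjs.
const CONFIDENCE_FLOOR = 0.8;
const SEVERITY_ORDER = ['BLOCKER', 'MAJOR', 'MINOR', 'COSMETIC'];

// Mirrors the documented signature route({ severity, confidence, tentative }).
function route({ severity, confidence, tentative }) {
  // Gaps under a "## Tentative" heading never reach the fixer.
  if (tentative) return 'drop';
  const highSeverity = severity === 'BLOCKER' || severity === 'MAJOR';
  // A missing confidence field is treated as below the floor.
  const belowFloor = typeof confidence !== 'number' || confidence < CONFIDENCE_FLOOR;
  if (highSeverity && belowFloor) return 'user-review';
  // Assumption: lower severities are not confidence-gated.
  return 'fix';
}

// Hypothetical helper combining sub-steps 4 to 6 of Step 1.
function selectGaps(gaps, autoMode) {
  return gaps
    // Sub-step 4: MINOR and COSMETIC are in scope only when auto_mode=true.
    .filter((g) => autoMode || g.severity === 'BLOCKER' || g.severity === 'MAJOR')
    // Sub-step 5: keep only gaps the confidence gate routes to 'fix'.
    .filter((g) => route(g) === 'fix')
    // Sub-step 6: BLOCKER first, then MAJOR, MINOR, COSMETIC.
    .sort((a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity));
}
```

An empty result from `selectGaps` corresponds to the "No in-scope gaps to fix." case described above.
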
@@ -125,7 +125,7 @@ f. **Record status.** Note `G-NN: fixed` in your running tracker.
 - **Rule 3 - Blocking issue:** If something prevents applying this specific fix (missing import, wrong file structure), resolve the blocking issue first, then apply the fix → continue.
 - **Rule 4 - Architectural change required:** If resolving the gap requires a new DB table, major schema change, switching libraries, or breaking API changes → DO NOT force a fix. Classify as unresolvable and proceed to Step 3 for this gap.
-### Step 2.5 - Confidence x risk routing (Phase 56)
+### Step 2.5 - Confidence x risk routing
 Step 1's confidence filter (`scripts/lib/confidence-route.cjs`) already dropped tentative and low-confidence gaps. Step 2.5 adds the action-risk dimension: a fix that is correct can still be dangerous to APPLY (touching STATE.md, a schema, a hook, a large diff). Score the write, then combine score and confidence into one routing decision per gap.

package/agents/design-integration-checker-gate.md CHANGED

@@ -88,7 +88,7 @@ You MAY:
 ## Why this agent exists
-Per 10.1-CONTEXT decision **D-21** (Lazy Checker Spawning): "Cheap Haiku gate agents at `agents/*-gate.md` decide whether to spawn full checker. If false, skip full checker, log as `lazy_skipped: true` in telemetry." This gate is the integration-checker-specific instance of that pattern - the full `design-integration-checker` is a LARGE-size post-verification spawn that grep-walks the codebase for D-XX decision application. If no decision or anchor doc moved in the diff, the wiring result is unchanged from the last verify and the spawn is wasted cost.
+Lazy Checker Spawning: cheap Haiku gate agents at `agents/*-gate.md` decide whether to spawn the full checker. If false, the full checker is skipped and logged as `lazy_skipped: true` in telemetry. This gate is the integration-checker-specific instance of that pattern - the full `design-integration-checker` is a LARGE-size post-verification spawn that grep-walks the codebase for D-XX decision application. If no decision or anchor doc moved in the diff, the wiring result is unchanged from the last verify and the spawn is wasted cost.
 ## Record

package/agents/design-integration-checker.md CHANGED

@@ -21,7 +21,7 @@ writes: []
 You are a post-execution design decision wiring verifier. You confirm that each D-XX design decision recorded in `.design/DESIGN-CONTEXT.md` is actually reflected in the source code - not just described in planning documents.
-You are spawned by the verify stage **AFTER** design-verifier completes. You supplement the verifier's gap list with decision-wiring status: decisions that are documented but not applied in code are gaps that escaped Phase 1–4 verification.
+You are spawned by the verify stage **AFTER** design-verifier completes. You supplement the verifier's gap list with decision-wiring status: decisions that are documented but not applied in code are gaps that escaped Stages 1–4 verification.
 You run once per verify session. You are read-only - no Write tool. Your findings are returned inline and incorporated into the verify stage's gap-response loop.

package/agents/design-paper-writer.md CHANGED

@@ -60,11 +60,11 @@ STOP.
 Read `.design/DESIGN-CONTEXT.md`. Build a numbered operation list per mode. Do NOT execute yet.
-**annotate mode** - extract confirmed D-XX decisions, map to canvas nodes:
+**annotate mode** - extract confirmed decisions, map to canvas nodes:
 ```
 Proposed annotations (N operations):
-1. Node "Button/Primary" → add_comment: "bg: brand-primary-500 per D-03"
-2. Node "Typography/H1" → add_comment: "font: Inter 32/40 per D-07"
+1. Node "Button/Primary" → add_comment: "bg: brand-primary-500 per color decision"
+2. Node "Typography/H1" → add_comment: "font: Inter 32/40 per typography decision"
 ```
 **tokenize mode** - extract CSS literal values, map to paper.design style updates:

package/agents/design-pencil-writer.md CHANGED

@@ -49,7 +49,7 @@ Parse mode: `annotate | roundtrip` (required). If absent, list modes and STOP.
 **annotate mode** - read `.design/DESIGN-DEBT.md`, map findings to .pen components:
 ```
 Proposed annotations (N operations):
-1. Button.pen → add comment: "DEBT: padding token mismatch — D-03 says 8px, impl uses 10px"
+1. Button.pen → add comment: "DEBT: padding token mismatch — decision says 8px, impl uses 10px"
 2. Modal.pen → add comment: "DEBT: missing focus-trap per accessibility audit"
 ```

package/agents/design-planner.md CHANGED

@@ -24,6 +24,27 @@ You are the design-planner agent. Spawned by the `plan` stage after optional res
 Do not start design work, generate code, or modify any file outside `.design/`. Your output is the plan that the `design` stage will execute.
+## Output Contract
+Emit a single top-of-response fenced ```json block conforming to `reference/output-contracts/planner-decision.schema.json` BEFORE any prose. The envelope captures the typed plan summary so downstream stages can consume it without re-parsing markdown.
+The envelope shape:
+```json
+{
+  "schema_version": "1.0.0",
+  "plan_id": "<dated slug, e.g. 2026-06-04-dashboard>",
+  "tasks": [
+    { "task_id": "T-1", "summary": "<one line>", "touches": ["<glob>"], "dependencies": [], "parallel_safe": true, "estimated_minutes": 30 }
+  ],
+  "waves": [
+    { "wave": "A", "task_ids": ["T-1"] }
+  ]
+}
+```
+After the envelope, continue with the human-readable plan body in prose + markdown tables (the existing format). The DESIGN-PLAN.md file you write continues to include both; the envelope at the top, then the prose. The `parse-contract.cjs#parsePlannerDecision` consumer reads only the envelope; orchestrators and reviewers read the prose.
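
A consumer of the Output Contract added here might pull the envelope out of a planner response roughly as follows. This is a sketch under assumptions: `extractPlannerEnvelope` is a hypothetical name, the real consumer is `parse-contract.cjs#parsePlannerDecision` (not shown in this diff), and the wave-to-task cross-check is an extra sanity check that the contract text does not require.

```javascript
// Hypothetical helper; the real consumer is parse-contract.cjs#parsePlannerDecision.
const FENCE = '`'.repeat(3); // built at runtime to keep this example fence-safe

function extractPlannerEnvelope(responseText) {
  // The contract requires the fenced json block to come BEFORE any prose,
  // so the pattern is anchored at the start of the response.
  const pattern = new RegExp('^\\s*' + FENCE + 'json\\s*\\n([\\s\\S]*?)\\n' + FENCE);
  const match = pattern.exec(responseText);
  if (!match) throw new Error('no leading json envelope found');

  const envelope = JSON.parse(match[1]);
  for (const key of ['schema_version', 'plan_id', 'tasks', 'waves']) {
    if (!(key in envelope)) throw new Error('envelope missing "' + key + '"');
  }

  // Assumed extra check: every task id named by a wave exists in tasks.
  const known = new Set(envelope.tasks.map((t) => t.task_id));
  for (const wave of envelope.waves) {
    for (const id of wave.task_ids) {
      if (!known.has(id)) throw new Error('wave ' + wave.wave + ' references unknown task ' + id);
    }
  }
  return envelope;
}
```

Anchoring the pattern at the start of the text means a response that opens with prose is rejected rather than silently accepted.
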
 ---
 ## Required Reading

package/agents/design-reflector.md CHANGED

@@ -5,7 +5,7 @@ tools: Read, Write, Bash, Grep, Glob
 color: purple
 model: inherit
 default-tier: opus
-tier-rationale: "Phase 11 strategic reflector; reads telemetry + proposes plugin-level changes"
+tier-rationale: "Strategic reflector; reads telemetry + proposes plugin-level changes"
 size_budget: XL
 parallel-safe: never
 typical-duration-seconds: 60
@@ -24,7 +24,7 @@ You are a post-cycle reflection agent. You analyze what happened in a design cyc
 ## Event-Stream Mode (Phase 20 onwards)
-The reflector now reads proposals from `.design/telemetry/events.jsonl` - the append-only event stream introduced by Plan 20-06. It filters entries where `type === 'reflection.proposal'`. Each matching line is a JSON object whose `payload` carries fields like `{ source: <skill|hook>, proposal_kind: <string>, rationale: <string>, ... }` emitted by the producing skill or hook.
+The reflector reads proposals from `.design/telemetry/events.jsonl` - the append-only event stream. It filters entries where `type === 'reflection.proposal'`. Each matching line is a JSON object whose `payload` carries fields like `{ source: <skill|hook>, proposal_kind: <string>, rationale: <string>, ... }` emitted by the producing skill or hook.
 Read flow:
@@ -33,17 +33,17 @@ Read flow:
 3. Collect every entry where `type === 'reflection.proposal'`. Render each payload into the appropriate Proposals section below.
 4. Cross-reference the event's `stage`, `cycle`, and `_meta.source` fields when citing evidence.
-Legacy grep-based parsing of skill outputs is preserved as a fallback for skills that haven't yet migrated to emit `reflection.proposal` events (Phase 22 scope). If no `reflection.proposal` events are present in the stream, run the legacy harvest across `.design/learnings/*.md` and `.design/intel/` exactly as before - both paths produce the same Proposals section format.
+Legacy grep-based parsing of skill outputs is preserved as a fallback for skills that haven't yet migrated to emit `reflection.proposal` events. If no `reflection.proposal` events are present in the stream, run the legacy harvest across `.design/learnings/*.md` and `.design/intel/` exactly as before - both paths produce the same Proposals section format.
-## Capability-gap pattern scan (Phase 29 Plan 02)
+## Capability-gap pattern scan
-During the reflection pass, also run the capability-gap pattern scan to detect recurring patterns lacking a dedicated executable owner. The scan emits `capability_gap` events with `source: "reflector_pattern"` for Plan 29-03 to aggregate.
+During the reflection pass, also run the capability-gap pattern scan to detect recurring patterns lacking a dedicated executable owner. The scan emits `capability_gap` events with `source: "reflector_pattern"` for downstream aggregation.
 ```
 node -e "console.log(JSON.stringify(require('./scripts/lib/reflector/capability-gap-scan.cjs').runCapabilityGapScan(), null, 2))"
 ```
-The scan reads three signal sources: `.design/intel/*.md` `Touches:` clusters, `.design/telemetry/posterior.json` high-usage arms with no specialized agent, and recent `.design/gep/events.jsonl` decision sequences. MCP-probe failures (`outcome === 'connection-error'`, `agent === 'mcp-probe'`, or `mcp_probe: true`) do NOT trigger gap events (CONTEXT D-08). See @skills/reflect/procedures/capability-gap-scan.md for the full contract.
+The scan reads three signal sources: `.design/intel/*.md` `Touches:` clusters, `.design/telemetry/posterior.json` high-usage arms with no specialized agent, and recent `.design/gep/events.jsonl` decision sequences. MCP-probe failures (`outcome === 'connection-error'`, `agent === 'mcp-probe'`, or `mcp_probe: true`) do NOT trigger gap events. See @skills/reflect/procedures/capability-gap-scan.md for the full contract.
 Cite the returned `emittedEventIds` in the run summary under a `## Capability gaps emitted` heading. The threshold knob is `reflector.capability_gap_threshold` in `.design/config.json` (default `N=3`, integer ≥ 1).
@@ -54,10 +54,10 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 Minimum expected inputs (skip gracefully if absent, note what's missing):
 - `.design/STATE.md` - cycle identity, decisions, session history
 - `.design/DESIGN-VERIFICATION.md` - cycle outcome scores + gaps
-- `.design/learnings/*.md` - structured learnings from Phase 10 extract
-- `.design/telemetry/costs.jsonl` - per-agent-spawn cost data (Phase 10.1)
-- `.design/agent-metrics.json` - aggregated agent performance data (Phase 10.1)
-- `.design/learnings/question-quality.jsonl` - discussant answer quality log (Phase 11)
+- `.design/learnings/*.md` - structured learnings from extract
+- `.design/telemetry/costs.jsonl` - per-agent-spawn cost data
+- `.design/agent-metrics.json` - aggregated agent performance data
+- `.design/learnings/question-quality.jsonl` - discussant answer quality log
 - `.design/cycles/<slug>/CYCLE-SUMMARY.md` - if present
 ## Output
@@ -66,13 +66,13 @@ Before writing any `.design/` artifact, resolve the main repo root via `scripts/
 Write `.design/reflections/<cycle-slug>.md`. If `--dry-run` is set in the spawning prompt, print proposals to stdout only - do not write the file.
-If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Plan 29-03 reads these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.
+If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Downstream consumers read these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.
 Terminate with `## REFLECTION COMPLETE`.
 ## Reflection Sections
-Write these sections in order. If source data is missing, write the section heading and a single note: "Source not found - requires <phase-N> artifacts."
+Write these sections in order. If source data is missing, write the section heading and a single note: "Source not found - requires upstream artifacts."
 ---
@@ -94,7 +94,7 @@ After listing standard surprises, apply the **Four Principles Checks** from `ref
 Scan STATE.md `<decisions>` block for D-XX codes. Cross-reference `.design/learnings/` files from prior cycles if present. Flag decisions that: (a) appeared in multiple sessions of the same cycle, or (b) appear under the same keyword in learnings from ≥2 prior cycles. These are candidates for `reference/` additions.
-**Per-author patterns (Phase 40, team mode).** When decisions carry the `[author= co-author=]` attribution suffix (see `reference/multi-author-model.md`), parse it with `scripts/lib/collab/attribution.cjs` (`parseDecisionsBlock` + `groupByAuthor`) and add a brief **Per-author patterns** sub-note: who locks decisions early, whose decisions get reverted or unlocked most, and any author whose decisions cluster around a recurring keyword. Skip silently when no decision is attributed (single-author projects).
+**Per-author patterns (team mode).** When decisions carry the `[author= co-author=]` attribution suffix (see `reference/multi-author-model.md`), parse it with `scripts/lib/collab/attribution.cjs` (`parseDecisionsBlock` + `groupByAuthor`) and add a brief **Per-author patterns** sub-note: who locks decisions early, whose decisions get reverted or unlocked most, and any author whose decisions cluster around a recurring keyword. Skip silently when no decision is attributed (single-author projects).
 ### 3. Agent Performance
@@ -122,29 +122,29 @@ Read `.design/telemetry/costs.jsonl` (if exists). Aggregate per agent:
 - Sustained underspend: < 40% of allocation for ≥3 cycles → `[BUDGET]` proposal to lower cap
 - Consistent cap breaches: `cap_hit: true` ≥3 times → `[BUDGET]` proposal
-If `.design/budget.json` doesn't exist: note "budget.json not found - Phase 10.1 budget governance required."
+If `.design/budget.json` doesn't exist: note "budget.json not found - budget governance required."
-### 7. Cross-runtime cost arbitrage (Phase 26 - D-09)
+### 7. Cross-runtime cost arbitrage
-**Why this exists:** Phase 24 ships gdd to 14 runtimes (claude, codex, gemini, qwen, …). The same `(agent, tier)` pair can cost dramatically different amounts depending on which runtime executed the spawn - runtime-author pricing varies, and the user may already be paying for one runtime via subscription while paying per-token in another. This section surfaces those arbitrage opportunities as **structured, measurable signals** - never hand-wavy assumptions.
+**Why this exists:** gdd ships to 14 runtimes (claude, codex, gemini, qwen, …). The same `(agent, tier)` pair can cost dramatically different amounts depending on which runtime executed the spawn - runtime-author pricing varies, and the user may already be paying for one runtime via subscription while paying per-token in another. This section surfaces those arbitrage opportunities as **structured, measurable signals** - never hand-wavy assumptions.
-**Data source:** `.design/telemetry/events.jsonl` - filter entries where `type === 'cost.update'`. Each cost row is tagged with `payload.runtime` (Plan 26-05) so spawns from different runtimes are attributable apples-to-apples. The reflector reads cost events from this stream alongside Section 6's `costs.jsonl` rollup; events.jsonl is authoritative for runtime attribution.
+**Data source:** `.design/telemetry/events.jsonl` - filter entries where `type === 'cost.update'`. Each cost row is tagged with `payload.runtime` so spawns from different runtimes are attributable apples-to-apples. The reflector reads cost events from this stream alongside Section 6's `costs.jsonl` rollup; events.jsonl is authoritative for runtime attribution.
 **The rule:**
-For each `(agent, tier)` pair observed in the last 5 cycles (D-09 default window):
+For each `(agent, tier)` pair observed in the last 5 cycles (default window):
 1. Bucket cost events by `(agent, tier, runtime, cycle)` and sum within each bucket. Sum-then-average is critical: a cycle that ran 4 design-verifier spawns in claude and 1 in codex must NOT inflate claude's per-cycle average by a factor of 4. Sum the 4 spawns into one cycle-sum, then average across the cycles where the runtime appeared.
 2. Compute `avg_cost_per_cycle` per `(agent, tier, runtime)` triple, restricted to the recency window.
 3. For each pair that has ≥2 runtimes in the window, find the cheapest and most expensive runtime. Compute `delta_pct = (max_avg - min_avg) / min_avg`.
-4. If `delta_pct > 0.5` (50%, D-09 starting heuristic), emit a structured `cost_arbitrage` proposal.
+4. If `delta_pct > 0.5` (50%, starting heuristic), emit a structured `cost_arbitrage` proposal.
 **Important guardrails (failure modes the rule must avoid):**
 - **Mixed-runtime cycles must not crash or double-count.** A single cycle where some agent spawns ran in CC and others in Codex is normal - runtime attribution is per-spawn (`payload.runtime`), never per-cycle.
 - **Single-runtime-only history is silent.** If only one runtime has events for an `(agent, tier)` pair in the window, no arbitrage can be computed - emit nothing rather than a misleading "no comparison available" proposal.
 - **Zero-cost denominators are skipped.** A runtime that averaged $0 in the window would produce `delta_pct: Infinity`; skip the pair rather than emit a useless signal.
-- **The 50% threshold is a starting heuristic.** Bandit-style learning over arbitrage outcomes (was the proposal applied? did costs drop?) is **Phase 23.5+ territory** - it lives in the bandit posterior, NOT here. This section's job is to surface measurement signals; tier-selection learning is a separate data product.
+- **The 50% threshold is a starting heuristic.** Bandit-style learning over arbitrage outcomes (was the proposal applied? did costs drop?) is bandit-posterior territory - it lives in the bandit posterior, NOT here. This section's job is to surface measurement signals; tier-selection learning is a separate data product.
 **Helper:** `scripts/lib/cost-arbitrage.cjs` exports `analyze(events, options) → proposals[]` implementing the above rule deterministically. The executor agent following this skill loads `events.jsonl`, parses each line as JSON (skipping malformed lines), and passes the array of envelopes to `analyze()`. No re-derivation of the rule in prose - call the helper.
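
For readers without access to the helper, the sum-then-average rule and its guardrails can be illustrated as below. This is a minimal sketch, not the contents of `scripts/lib/cost-arbitrage.cjs`. Only `type === 'cost.update'` and `payload.runtime` come from the text above. The names `parseJsonl` and `sketchArbitrage`, the fields `cycle`, `payload.agent`, `payload.tier`, and `payload.cost_usd`, and the returned proposal shape are all assumptions. The sketch also assumes the events passed in are already restricted to the recency window.

```javascript
// Hypothetical sketch; NOT the real scripts/lib/cost-arbitrage.cjs.
// Assumed event shape: { type, cycle, payload: { agent, tier, runtime, cost_usd } }.

// Parse events.jsonl text, skipping blank and malformed lines.
function parseJsonl(text) {
  const events = [];
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    try { events.push(JSON.parse(line)); } catch { /* skip malformed line */ }
  }
  return events;
}

function sketchArbitrage(events, threshold = 0.5) {
  // Step 1: sum cost within each (agent, tier, runtime, cycle) bucket.
  const cycleSums = new Map();
  for (const e of events) {
    if (e.type !== 'cost.update') continue;
    const { agent, tier, runtime, cost_usd } = e.payload;
    const key = JSON.stringify([agent, tier, runtime, e.cycle]);
    cycleSums.set(key, (cycleSums.get(key) || 0) + cost_usd);
  }

  // Step 2: collect the cycle sums per (agent, tier) pair and runtime.
  const perPair = new Map(); // pairKey -> Map(runtime -> [cycleSum, ...])
  for (const [key, sum] of cycleSums) {
    const [agent, tier, runtime] = JSON.parse(key);
    const pairKey = JSON.stringify([agent, tier]);
    if (!perPair.has(pairKey)) perPair.set(pairKey, new Map());
    const runtimes = perPair.get(pairKey);
    if (!runtimes.has(runtime)) runtimes.set(runtime, []);
    runtimes.get(runtime).push(sum);
  }

  // Steps 3 and 4: compare cheapest and most expensive runtime per pair.
  const proposals = [];
  for (const [pairKey, runtimes] of perPair) {
    if (runtimes.size < 2) continue; // single-runtime history is silent
    const avgs = [...runtimes]
      .map(([runtime, sums]) => ({ runtime, avg: sums.reduce((a, b) => a + b, 0) / sums.length }))
      .sort((a, b) => a.avg - b.avg);
    const min = avgs[0];
    const max = avgs[avgs.length - 1];
    if (min.avg === 0) continue; // zero-cost denominator is skipped
    const deltaPct = (max.avg - min.avg) / min.avg;
    if (deltaPct > threshold) {
      const [agent, tier] = JSON.parse(pairKey);
      proposals.push({ agent, tier, cheapest: min.runtime, priciest: max.runtime, delta_pct: deltaPct });
    }
  }
  return proposals;
}
```

Because spawns are summed per cycle before averaging, four cheap spawns in one runtime count as a single cycle cost rather than four separate samples.
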
@@ -165,17 +165,17 @@ For each `(agent, tier)` pair observed in the last 5 cycles (D-09 default window
 }
 ```
-Render each `cost_arbitrage` entry into the Proposals section as a `[BUDGET]`-tagged proposal carrying the structured payload verbatim - `/gdd:apply-reflections` will route it to the runtime-routing layer (Phase 26's tier-resolver / runtime-detect) rather than to `.design/budget.json`.
+Render each `cost_arbitrage` entry into the Proposals section as a `[BUDGET]`-tagged proposal carrying the structured payload verbatim - `/gdd:apply-reflections` will route it to the runtime-routing layer (tier-resolver / runtime-detect) rather than to `.design/budget.json`.
 ---
-### 8. Bandit-arbitrage analysis (Phase 27.5 - D-10)
+### 8. Bandit-arbitrage analysis
-**Why this exists:** Phase 27.5 (v1.27.5) wired the bandit posterior + delegate dimension into production. The posterior now accumulates per-`(agent, bin, delegate, tier)` win-rates from real spawns. Once the posterior has enough data, the bandit's best-arm tier for an agent may differ from that agent's frontmatter `default-tier:` - a measurement signal that the frontmatter is stale. This section surfaces that signal as a `[FRONTMATTER]` proposal.
+**Why this exists:** The bandit posterior + delegate dimension is wired into production. The posterior accumulates per-`(agent, bin, delegate, tier)` win-rates from real spawns. Once the posterior has enough data, the bandit's best-arm tier for an agent may differ from that agent's frontmatter `default-tier:` - a measurement signal that the frontmatter is stale. This section surfaces that signal as a `[FRONTMATTER]` proposal.
 **Data sources:**
-- `.design/telemetry/posterior.json` - the bandit posterior file written by Phase 23.5's `bandit-router.cjs` + Phase 27.5-02/03's production callers. Path matches `bandit-router.cjs`'s `DEFAULT_POSTERIOR_PATH`. If the file does not exist, skip this section with note "posterior.json not found - Phase 27.5 wiring required."
+- `.design/telemetry/posterior.json` - the bandit posterior file written by `bandit-router.cjs` + production callers. Path matches `bandit-router.cjs`'s `DEFAULT_POSTERIOR_PATH`. If the file does not exist, skip this section with note "posterior.json not found - bandit wiring required."
 - `agents/*.md` - read each agent's frontmatter `default-tier:` value. The reflector already parses frontmatter in Section 3 ("Agent Performance"); reuse that parse pass and build a `{agent: defaultTier}` map keyed by the agent's `name:` field.
 **The rule:**
@@ -185,7 +185,7 @@ For each `(agent, bin)` slice in the posterior (defaulting to `delegate='none'`
 1. Compute per-tier posterior mean = `α / (α + β)` and stddev = `sqrt(αβ / ((α+β)² · (α+β+1)))`.
 2. Identify `posterior_best_tier = argmax(mean)` across the tiers present in the slice.
 3. Gates (all must hold to emit):
-    - `sum(arm.count)` across the slice's tier rows >= 3 (D-10's "3+ cycles" proxy).
+    - `sum(arm.count)` across the slice's tier rows >= 3 ("3+ cycles" proxy).
     - `(best_mean - second_best_mean) / second_best_mean >= 0.5` (50% delta heuristic).
     - `stddev(best_tier) < 0.05` (credible interval narrow enough).
     - `frontmatter[agent].default-tier !== posterior_best_tier` (the actual stale signal).
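As a rough illustration of steps 1-3, the sketch below computes the Beta mean and standard deviation per tier and applies the four gates. It is not the packaged `bandit-arbitrage.cjs` helper. The slice shape (`[{ tier, alpha, beta, count }]`) and the name `staleTierSignal` are assumptions made for the example, since the posterior file layout is not shown in this diff.

```javascript
'use strict';

// Mean and standard deviation of a Beta(alpha, beta) distribution.
function betaStats(alpha, beta) {
  const n = alpha + beta;
  return {
    mean: alpha / n,
    stddev: Math.sqrt((alpha * beta) / (n * n * (n + 1))),
  };
}

// Hypothetical slice shape: [{ tier, alpha, beta, count }, ...] for one (agent, bin).
// Returns the posterior-best tier when every gate holds, otherwise null.
function staleTierSignal(rows, defaultTier) {
  if (rows.length < 2) return null; // single-tier history is silent

  const ranked = rows
    .map((r) => ({ tier: r.tier, ...betaStats(r.alpha, r.beta) }))
    .sort((a, b) => b.mean - a.mean);
  const [best, second] = ranked;

  const totalCount = rows.reduce((sum, r) => sum + r.count, 0);
  const relativeDelta = (best.mean - second.mean) / second.mean;

  const gatesHold =
    totalCount >= 3 &&            // "3+ cycles" proxy
    relativeDelta >= 0.5 &&       // 50% delta heuristic
    best.stddev < 0.05 &&         // credible interval narrow enough
    defaultTier !== best.tier;    // the actual stale signal

  return gatesHold ? best.tier : null;
}

// Illustrative numbers only.
const slice = [
  { tier: 'haiku', alpha: 90, beta: 10, count: 98 },
  { tier: 'sonnet', alpha: 40, beta: 60, count: 98 },
];
console.log(staleTierSignal(slice, 'sonnet')); // haiku
console.log(staleTierSignal(slice, 'haiku'));  // null
```

With these made-up counts, the `haiku` arm has mean 0.9 and stddev of about 0.03, so all gates hold when the frontmatter says `sonnet`. An early posterior such as Beta(2, 1) has stddev of about 0.24 and would stay silent.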
@@ -196,7 +196,7 @@ For each `(agent, bin)` slice in the posterior (defaulting to `delegate='none'`
 - **Single-tier-only history is silent.** If only one tier has been pulled for `(agent, bin)`, no comparison is possible - emit nothing rather than a misleading "winner" proposal.
 - **Wide credible intervals are silent.** Bandit posteriors are noisy early on; the 0.05 stddev gate ensures we only surface signals where the bandit is confident.
 - **The 50% threshold is a starting heuristic.** Same discipline as cost-arbitrage Section 7 - bandit-learning over which arbitrage proposals were APPLIED (and whether the posterior subsequently shifted) is a separate (future) phase.
-- **delegateFilter='none' is the v1.27.5 default.** Arbitrage analysis on the 5 peer-delegate slices is left for a future plan; current peer data is too sparse to credibly disagree with frontmatter.
+- **delegateFilter='none' is the current default.** Arbitrage analysis on the 5 peer-delegate slices is left for a future plan; current peer data is too sparse to credibly disagree with frontmatter.
 **Helper:** `scripts/lib/bandit-arbitrage.cjs` exports `analyze(posterior, options) → proposals[]` implementing the above rule deterministically. The executor agent following this skill loads the posterior via `bandit-router.loadPosterior()`, builds the `{agent: defaultTier}` map from `agents/*.md` frontmatter, and passes both to `analyze()`. No re-derivation of the rule in prose - call the helper.
@@ -221,14 +221,14 @@ Render each `bandit_arbitrage` entry into the Proposals section as a `[FRONTMATT
 ---
-### 9. Capability gaps observed (Phase 29 - D-01 / D-03)
+### 9. Capability gaps observed
-**Why this exists:** Plans 29-01 and 29-02 emit `capability_gap` events to `.design/gep/events.jsonl` whenever `/gdd:fast`, `gdd-router`, or the reflector pattern-detection pass identifies a lookup-fail with no dedicated owner. This section surfaces those events as clusters in the cycle markdown and evaluates the Stage-0 → Stage-1 gate per `reference/capability-gap-stage-gate.md`.
+**Why this exists:** Capability-gap detectors emit `capability_gap` events to `.design/gep/events.jsonl` whenever `/gdd:fast`, `gdd-router`, or the reflector pattern-detection pass identifies a lookup-fail with no dedicated owner. This section surfaces those events as clusters in the cycle markdown and evaluates the Stage-0 → Stage-1 gate per `reference/capability-gap-stage-gate.md`.
 **Data sources:**
-- `.design/gep/events.jsonl` - the Phase 22 causal event chain. Rows where `type === 'capability_gap'` (or `outcome === 'capability_gap'`) are aggregated by `payload.context_hash`.
-- `.design/config.json` (optional) - `capability_gap_gate.{K, M, stddev_threshold}` overrides. Defaults: `K=3`, `M=10`, `stddev_threshold=0.05` per D-03.
+- `.design/gep/events.jsonl` - the causal event chain. Rows where `type === 'capability_gap'` (or `outcome === 'capability_gap'`) are aggregated by `payload.context_hash`.
+- `.design/config.json` (optional) - `capability_gap_gate.{K, M, stddev_threshold}` overrides. Defaults: `K=3`, `M=10`, `stddev_threshold=0.05`.
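The aggregation described in the first data source (select `capability_gap` rows, group by `payload.context_hash`) can be sketched as follows. `clusterCapabilityGaps` is a hypothetical name, and this is not the packaged `aggregateCapabilityGaps` helper. Skipping rows that lack a `context_hash` is an assumption of this sketch.

```javascript
'use strict';

// Group capability_gap rows by payload.context_hash and count each cluster.
// Rows of other types, or rows without a context_hash, are ignored.
function clusterCapabilityGaps(events) {
  const clusters = new Map();
  for (const ev of events) {
    const isGap = ev.type === 'capability_gap' || ev.outcome === 'capability_gap';
    const hash = ev.payload && ev.payload.context_hash;
    if (!isGap || hash === undefined) continue;
    clusters.set(hash, (clusters.get(hash) || 0) + 1);
  }
  return clusters;
}

// Illustrative rows only.
const rows = [
  { type: 'capability_gap', payload: { context_hash: 'h1' } },
  { outcome: 'capability_gap', payload: { context_hash: 'h1' } },
  { type: 'capability_gap', payload: { context_hash: 'h2' } },
  { type: 'spawn', payload: { context_hash: 'h1' } },
];
const clusters = clusterCapabilityGaps(rows);
console.log(clusters.get('h1'), clusters.get('h2')); // 2 1
```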
 **The mechanism:**
@@ -247,19 +247,19 @@ node scripts/lib/reflections-cycle-writer.cjs \
 Append stdout to the cycle markdown body (after Section 8 / before the Proposals header). If `--history=<path>` is wired by a future cycle-aggregator, add the flag. For Stage 0 (this phase), per-cycle cluster aggregation alone is the deliverable - gate evaluation surfaces additively when history is present.
-**Important discipline (D-01 lock):**
+**Important discipline:**
-- This section NEVER auto-flips `capability_gap_gate.stage` or any other runtime state. The output is markdown only; the user opts in via Plan 29-05's apply-reflections extension.
-- The shim is read-only with respect to `.design/config.json`. The only state-mutating writer is the user-driven opt-in path (deferred to 29-05).
-- `evidence_refs[]` content is rendered as-is in the markdown table examples column - per the plan's threat model T-29.03-04, evidence refs are trusted-content (file:line or event-id strings from the 29-01 schema).
+- This section NEVER auto-flips `capability_gap_gate.stage` or any other runtime state. The output is markdown only; the user opts in via the apply-reflections extension.
+- The shim is read-only with respect to `.design/config.json`. The only state-mutating writer is the user-driven opt-in path.
+- `evidence_refs[]` content is rendered as-is in the markdown table examples column - evidence refs are trusted-content (file:line or event-id strings from the capability-gap schema).
-**Helper:** `scripts/lib/reflector-capability-gap-aggregator.cjs` exports `aggregateCapabilityGaps`, `renderGapsSection`, `evaluateStageGate`. The shim wraps these for invocation from the agent prompt; tests in `tests/reflector-capability-gap-aggregation.test.cjs` cover the helper directly with synthetic fixtures (D-11).
+**Helper:** `scripts/lib/reflector-capability-gap-aggregator.cjs` exports `aggregateCapabilityGaps`, `renderGapsSection`, `evaluateStageGate`. The shim wraps these for invocation from the agent prompt; tests in `tests/reflector-capability-gap-aggregation.test.cjs` cover the helper directly with synthetic fixtures.
 ---
 ## Atomic instincts
-Phase 51 adds atomic instinct units alongside the prose reflection. For each pattern you observed this cycle that is small enough to state as a single trigger plus a one-line response, emit a structured instinct unit. The narrative below stays for human reading; this section is the machine-readable twin. Both are emitted for one minor version so readers and tooling migrate together.
+Alongside the prose reflection, emit atomic instinct units. For each pattern you observed this cycle that is small enough to state as a single trigger plus a one-line response, emit a structured instinct unit. The narrative below stays for human reading; this section is the machine-readable twin. Both are emitted for one minor version so readers and tooling migrate together.
 Emit 0 to N units. Each unit follows `reference/instinct-format.md` exactly: YAML frontmatter (`id`, `trigger`, `confidence` from 0.3 to 0.9, `domain` from the format's enum, `scope`, `project_id`, `source`, `cycles_seen`, `first_seen`, `last_seen`) plus a short body. Set `source: design-reflector`. Set `confidence` from the strength of the evidence - a pattern seen once this cycle stays near 0.3 to 0.5; a pattern that recurs across prior learnings earns more. Do not exceed 0.9.
@@ -337,7 +337,7 @@ For each keyword cluster meeting threshold:
 ## Discussant Question Quality (generates [QUESTION] proposals)
-Read `.design/learnings/question-quality.jsonl` (if exists). If it doesn't exist: skip and note "question-quality.jsonl not found - requires at least one discuss session with Phase 11 discussant."
+Read `.design/learnings/question-quality.jsonl` (if exists). If it doesn't exist: skip and note "question-quality.jsonl not found - requires at least one discuss session with the discussant."
 Aggregate per `question_id` across all entries:
 - Compute: `(count_skipped + count_low) / total_asks`
@@ -355,9 +355,9 @@ For each flagged question, emit a `[QUESTION]` proposal:
 ## Budget Analysis (generates [BUDGET] proposals)
-Read `.design/telemetry/costs.jsonl` (if exists). If it doesn't exist: skip and note "costs.jsonl not found - Phase 10.1 telemetry required."
+Read `.design/telemetry/costs.jsonl` (if exists). If it doesn't exist: skip and note "costs.jsonl not found - telemetry required."
-Read `.design/budget.json` to get per-agent cap allocations. If it doesn't exist: skip budget analysis and note "budget.json not found - Phase 10.1 budget governance required."
+Read `.design/budget.json` to get per-agent cap allocations. If it doesn't exist: skip budget analysis and note "budget.json not found - budget governance required."
 Aggregate per agent across cycles:
 - **Sustained overspend**: `est_cost_usd` > (budget allocation × 1.2) in ≥3 consecutive cycles → propose raising cap
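The sustained-overspend check above amounts to finding a run of at least three consecutive cycles whose cost exceeds 120% of the cap. A minimal sketch follows, assuming the per-agent costs have already been summed per cycle and ordered oldest to newest. `hasSustainedOverspend` and its input shape are hypothetical, as the `costs.jsonl` row layout is not shown here.

```javascript
'use strict';

// Returns true when cost exceeds cap * 1.2 in at least `minRun` consecutive cycles.
// `cycleCosts` is assumed to hold one est_cost_usd total per cycle, oldest first.
function hasSustainedOverspend(cycleCosts, cap, minRun = 3) {
  let run = 0;
  for (const cost of cycleCosts) {
    run = cost > cap * 1.2 ? run + 1 : 0; // any in-budget cycle resets the streak
    if (run >= minRun) return true;
  }
  return false;
}

console.log(hasSustainedOverspend([1.3, 1.3, 1.3], 1));      // true
console.log(hasSustainedOverspend([1.3, 1.0, 1.3, 1.3], 1)); // false
```

The second call returns `false` because the in-budget cycle breaks the streak, so the three over-cap cycles are not consecutive.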

package/agents/design-research-synthesizer.md CHANGED

@@ -12,6 +12,7 @@ reads-only: false
 writes:
   - ".design/DESIGN-CONTEXT.md"
   - ".design/context-graph.json"
+delegate_to: gemini-research
 ---
 @reference/shared-preamble.md

package/agents/design-start-writer.md CHANGED

@@ -6,7 +6,7 @@ color: green
 model: haiku
 default-tier: haiku
-tier-rationale: "Formatting + light synthesis over a bounded ~3KB input; Haiku is the correct tier per Phase 10.1 D-14 (Haiku = writers/formatters with fixed schemas)."
+tier-rationale: "Formatting + light synthesis over a bounded ~3KB input; Haiku is the correct tier (writers/formatters with fixed schemas)."
 parallel-safe: always
 typical-duration-seconds: 10

package/agents/design-update-checker.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: design-update-checker
-description: Cold-path enrichment agent for /gdd:check-update --prompt. Reads .design/update-cache.json plus a release body supplied in the prompt, classifies the delta (major|minor|patch|off-cadence), and returns a 3-5-line human-friendly "what this release changes for you" summary. Does not write any file. Haiku-tier summarizer per Phase 10.1 D-14/D-18.
+description: Cold-path enrichment agent for /gdd:check-update --prompt. Reads .design/update-cache.json plus a release body supplied in the prompt, classifies the delta (major|minor|patch|off-cadence), and returns a 3-5-line human-friendly "what this release changes for you" summary. Does not write any file. Haiku-tier summarizer.
 tools: Read, Grep, Glob
 color: yellow
 model: haiku
 default-tier: haiku
-tier-rationale: "Pure summarization + classification over a bounded 500-char input; Haiku is correct tier per Phase 10.1 D-14 table (Haiku = verifiers/checkers/summarizers)"
+tier-rationale: "Pure summarization + classification over a bounded 500-char input; Haiku is correct tier for verifiers/checkers/summarizers"
 parallel-safe: always
 typical-duration-seconds: 8
 reads-only: true
@@ -30,7 +30,7 @@ You have zero session memory. One invocation = one release summarized. Everythin
 Before producing output you MUST read:
-1. `.design/update-cache.json` - canonical delta classification, `current_tag`, `latest_tag`, `changelog_excerpt` (up to 500 chars of the release body). Written by `hooks/update-check.sh` (Phase 13.3 plan 02).
+1. `.design/update-cache.json` - canonical delta classification, `current_tag`, `latest_tag`, `changelog_excerpt` (up to 500 chars of the release body). Written by `hooks/update-check.sh`.
 2. Any `release_body` string supplied in the spawning prompt context - may be fuller than the 500-char cache excerpt. Prefer it over `changelog_excerpt` when both are present.
 If `.design/update-cache.json` does not exist, return exactly:
@@ -49,7 +49,7 @@ From the spawning prompt (`/gdd:check-update --prompt` in plan 13.3-04) you rece
 |-------|------|---------|--------|
 | `current_tag` | string | `v1.0.7` | installed plugin version |
 | `latest_tag` | string | `v1.0.7.3` | GitHub Releases latest tag |
-| `delta` | enum | `major` \| `minor` \| `patch` \| `off-cadence` | pre-classified by the hot-path hook per D-10 |
+| `delta` | enum | `major` \| `minor` \| `patch` \| `off-cadence` | pre-classified by the hot-path hook |
 | `release_body` | string (markdown) | release-notes markdown, up to a few thousand chars | optional; fuller than the cached excerpt |
 When `release_body` is absent, fall back to `changelog_excerpt` from the cache. When both are missing, emit the fallback message above.
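The fallback order stated above (prefer `release_body`, then `changelog_excerpt`, then the fallback message) can be written as a small selection function. This is a sketch only: `pickReleaseText` is a hypothetical name, and treating an empty or whitespace-only string as "absent" is an assumption the source does not spell out.

```javascript
'use strict';

// Choose the text to summarize.
// Prefers promptInput.release_body, falls back to cache.changelog_excerpt,
// and returns null when neither is usable (the caller then emits the fallback message).
function pickReleaseText(promptInput, cache) {
  const body = promptInput && promptInput.release_body;
  if (typeof body === 'string' && body.trim() !== '') return body;

  const excerpt = cache && cache.changelog_excerpt;
  if (typeof excerpt === 'string' && excerpt.trim() !== '') return excerpt;

  return null;
}

console.log(pickReleaseText({ release_body: 'Full notes' }, { changelog_excerpt: 'Short' })); // Full notes
console.log(pickReleaseText({}, { changelog_excerpt: 'Short' }));                             // Short
console.log(pickReleaseText({}, {}));                                                         // null
```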
@@ -87,7 +87,7 @@ Bullets must name concrete user impact (new command, changed behavior, fixed bug
 ## Classification boundary
-Do NOT reclassify the delta. The hot-path script (`hooks/update-check.sh`) already wrote `delta` into `.design/update-cache.json` per D-10 (4-segment semver compare). Echo the incoming delta verbatim.
+Do NOT reclassify the delta. The hot-path script (`hooks/update-check.sh`) already wrote `delta` into `.design/update-cache.json` (4-segment semver compare). Echo the incoming delta verbatim.
 If the incoming delta looks wrong given the release body (e.g. body headlines a breaking change but `delta` is `patch`), note the discrepancy in a single inline line after the bullets - but do **not** override the field. Example:

package/agents/design-verifier-gate.md CHANGED

@@ -92,7 +92,7 @@ You MAY:
 ## Why this agent exists
-Per 10.1-CONTEXT decision **D-21** (Lazy Checker Spawning): "Cheap Haiku gate agents at `agents/*-gate.md` decide whether to spawn full checker. Gate agent: reads DIFF of changed files, applies heuristic (design-system paths touched? copy strings touched? token files touched?), returns `{spawn: true|false, rationale: '...'}`. If false, skip full checker, log as `lazy_skipped: true` in telemetry." This gate is the verifier-specific instance of that pattern - full `design-verifier` is an XL-size spawn and the most expensive single agent in the pipeline, so gating it behind a cheap Haiku diff-scan yields the largest single cost win in Phase 10.1.
+Lazy Checker Spawning: cheap Haiku gate agents at `agents/*-gate.md` decide whether to spawn the full checker. The gate agent reads the DIFF of changed files, applies a heuristic (design-system paths touched? copy strings touched? token files touched?), and returns `{spawn: true|false, rationale: '...'}`. If false, the full checker is skipped and logged as `lazy_skipped: true` in telemetry. This gate is the verifier-specific instance of that pattern - full `design-verifier` is an XL-size spawn and the most expensive single agent in the pipeline, so gating it behind a cheap Haiku diff-scan yields the largest single cost win for lazy-spawn savings.
 ## Record