npm - @hegemonart/get-design-done - Versions diffs - 1.19.5 → 1.20.0 - Mend

@hegemonart/get-design-done 1.19.5 → 1.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (103)

package/.claude-plugin/marketplace.json +4 -4
package/.claude-plugin/plugin.json +2 -2
package/CHANGELOG.md +90 -0
package/README.md +12 -0
package/agents/design-auditor.md +12 -0
package/agents/design-discussant.md +14 -0
package/agents/design-reflector.md +23 -0
package/connections/connections.md +3 -0
package/connections/figma.md +2 -0
package/connections/gdd-state.md +186 -0
package/hooks/budget-enforcer.ts +716 -0
package/hooks/context-exhaustion.ts +251 -0
package/hooks/gdd-read-injection-scanner.ts +172 -0
package/hooks/hooks.json +3 -3
package/package.json +29 -7
package/reference/authority-feeds.md +4 -2
package/reference/checklists.md +30 -0
package/reference/component-authoring.md +184 -0
package/reference/config-schema.md +2 -2
package/reference/emotional-design.md +124 -0
package/reference/error-recovery.md +58 -0
package/reference/first-principles.md +89 -0
package/reference/heuristics.md +70 -0
package/reference/motion-advanced.md +192 -3
package/reference/registry.json +28 -0
package/reference/schemas/budget.schema.json +42 -0
package/reference/schemas/events.schema.json +55 -0
package/reference/schemas/generated.d.ts +419 -0
package/reference/schemas/iteration-budget.schema.json +36 -0
package/reference/schemas/mcp-gdd-state-tools.schema.json +89 -0
package/reference/schemas/rate-limits.schema.json +31 -0
package/reference/shared-preamble.md +10 -0
package/scripts/aggregate-agent-metrics.ts +282 -0
package/scripts/codegen-schema-types.ts +149 -0
package/scripts/lib/error-classifier.cjs +232 -0
package/scripts/lib/error-classifier.d.cts +44 -0
package/scripts/lib/event-stream/emitter.ts +88 -0
package/scripts/lib/event-stream/index.ts +154 -0
package/scripts/lib/event-stream/types.ts +127 -0
package/scripts/lib/event-stream/writer.ts +154 -0
package/scripts/lib/gdd-errors/classification.ts +124 -0
package/scripts/lib/gdd-errors/index.ts +218 -0
package/scripts/lib/gdd-state/gates.ts +216 -0
package/scripts/lib/gdd-state/index.ts +167 -0
package/scripts/lib/gdd-state/lockfile.ts +232 -0
package/scripts/lib/gdd-state/mutator.ts +574 -0
package/scripts/lib/gdd-state/parser.ts +523 -0
package/scripts/lib/gdd-state/types.ts +179 -0
package/scripts/lib/iteration-budget.cjs +205 -0
package/scripts/lib/iteration-budget.d.cts +32 -0
package/scripts/lib/jittered-backoff.cjs +112 -0
package/scripts/lib/jittered-backoff.d.cts +38 -0
package/scripts/lib/lockfile.cjs +177 -0
package/scripts/lib/lockfile.d.cts +21 -0
package/scripts/lib/prompt-sanitizer/index.ts +435 -0
package/scripts/lib/prompt-sanitizer/patterns.ts +173 -0
package/scripts/lib/rate-guard.cjs +365 -0
package/scripts/lib/rate-guard.d.cts +38 -0
package/scripts/mcp-servers/gdd-state/schemas/add_blocker.schema.json +67 -0
package/scripts/mcp-servers/gdd-state/schemas/add_decision.schema.json +68 -0
package/scripts/mcp-servers/gdd-state/schemas/add_must_have.schema.json +68 -0
package/scripts/mcp-servers/gdd-state/schemas/checkpoint.schema.json +51 -0
package/scripts/mcp-servers/gdd-state/schemas/frontmatter_update.schema.json +62 -0
package/scripts/mcp-servers/gdd-state/schemas/get.schema.json +51 -0
package/scripts/mcp-servers/gdd-state/schemas/probe_connections.schema.json +75 -0
package/scripts/mcp-servers/gdd-state/schemas/resolve_blocker.schema.json +66 -0
package/scripts/mcp-servers/gdd-state/schemas/set_status.schema.json +47 -0
package/scripts/mcp-servers/gdd-state/schemas/transition_stage.schema.json +70 -0
package/scripts/mcp-servers/gdd-state/schemas/update_progress.schema.json +58 -0
package/scripts/mcp-servers/gdd-state/server.ts +288 -0
package/scripts/mcp-servers/gdd-state/tools/add_blocker.ts +72 -0
package/scripts/mcp-servers/gdd-state/tools/add_decision.ts +89 -0
package/scripts/mcp-servers/gdd-state/tools/add_must_have.ts +113 -0
package/scripts/mcp-servers/gdd-state/tools/checkpoint.ts +60 -0
package/scripts/mcp-servers/gdd-state/tools/frontmatter_update.ts +91 -0
package/scripts/mcp-servers/gdd-state/tools/get.ts +51 -0
package/scripts/mcp-servers/gdd-state/tools/index.ts +51 -0
package/scripts/mcp-servers/gdd-state/tools/probe_connections.ts +73 -0
package/scripts/mcp-servers/gdd-state/tools/resolve_blocker.ts +84 -0
package/scripts/mcp-servers/gdd-state/tools/set_status.ts +54 -0
package/scripts/mcp-servers/gdd-state/tools/shared.ts +194 -0
package/scripts/mcp-servers/gdd-state/tools/transition_stage.ts +80 -0
package/scripts/mcp-servers/gdd-state/tools/update_progress.ts +81 -0
package/scripts/validate-frontmatter.ts +114 -0
package/scripts/validate-schemas.ts +401 -0
package/skills/brief/SKILL.md +15 -6
package/skills/design/SKILL.md +31 -13
package/skills/explore/SKILL.md +41 -17
package/skills/health/SKILL.md +15 -4
package/skills/optimize/SKILL.md +3 -3
package/skills/pause/SKILL.md +16 -10
package/skills/plan/SKILL.md +33 -17
package/skills/progress/SKILL.md +15 -11
package/skills/resume/SKILL.md +19 -10
package/skills/settings/SKILL.md +11 -3
package/skills/todo/SKILL.md +12 -3
package/skills/verify/SKILL.md +65 -29
package/hooks/budget-enforcer.js +0 -329
package/hooks/context-exhaustion.js +0 -127
package/hooks/gdd-read-injection-scanner.js +0 -39
package/scripts/aggregate-agent-metrics.js +0 -173
package/scripts/validate-frontmatter.cjs +0 -68
package/scripts/validate-schemas.cjs +0 -242

package/reference/component-authoring.md ADDED

@@ -0,0 +1,184 @@
+# Component Authoring Principles
+Source: Emil Kowalski's work on Sonner, Vaul, and cmdk — synthesised from his published writing and talks. See also: `reference/framer-motion-patterns.md`, `reference/motion-advanced.md`.
+Use this file when authoring, reviewing, or auditing UI components. The 6 principles apply as a lens during code review and design verification. Each principle has a grep-able audit signal.
+---
+## The 6 Principles
+### P-01: Minimal API Surface
+> Expose only what the consumer needs. Every prop is a contract you must maintain forever.
+A component with the right API surface works in 1 line for 80% of cases and 3 lines for 95% of cases. A component that requires 7 props for basic usage has too much surface.
+**Audit signal:**
+```bash
+# Count required (non-optional) props in a component interface
+grep -E "^\s+\w+: " src/components/Button.tsx | grep -v "?" | wc -l
+```
+**Thresholds:**
+- ≤5 props total: excellent
+- 6–9 props total: acceptable if logically grouped
+- ≥10 props: flag for decomposition
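As a rough illustration of the thresholds above, the following TypeScript sketch counts prop declarations in an interface body and maps the total to the three bands. The helper names (`countProps`, `classifyApiSurface`) and the line-based regex are assumptions made for this example and are not part of the package. Like the grep signal, it is deliberately naive; a real audit would more likely walk the TypeScript AST.

```typescript
// Hypothetical helpers for illustration; not part of @hegemonart/get-design-done.
type SurfaceRating = "excellent" | "acceptable" | "flag";

interface PropCount {
  total: number;
  required: number;
}

// Counts `name: Type` and `name?: Type` lines in an interface body.
// Line-based on purpose, mirroring the grep audit signal.
function countProps(interfaceBody: string): PropCount {
  let total = 0;
  let required = 0;
  for (const line of interfaceBody.split("\n")) {
    const match = /^\s*(\w+)(\?)?\s*:/.exec(line);
    if (!match) continue;
    total += 1;
    if (!match[2]) required += 1; // no `?` means the prop is required
  }
  return { total, required };
}

// Bands taken from the thresholds list: <=5, 6-9, >=10.
function classifyApiSurface(total: number): SurfaceRating {
  if (total <= 5) return "excellent";
  if (total <= 9) return "acceptable";
  return "flag";
}

const body = `
  variant?: "primary" | "secondary";
  size?: "sm" | "md" | "lg";
  onClick: () => void;
`;
const counts = countProps(body);
console.log(counts, classifyApiSurface(counts.total));
```

Separating the count from the classification keeps the thresholds in one place, so a team that prefers different bands only has to change `classifyApiSurface`.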
+**Pattern — variant over prop explosion:**
+```tsx
+// BAD — prop explosion
+<Button color="blue" size="md" rounded={true} shadow={true} uppercase={false} />
+// GOOD — variant collapses 5 props to 1
+<Button variant="primary" size="md" />
+```
+---
+### P-02: Composability Over Configuration
+> Components should compose, not configure. Accepting `backgroundColor` or `textColor` is a sign the abstraction boundary is wrong.
+The right abstraction lets consumers build what they need by combining small pieces, not by passing increasingly specific configuration. Slot-based APIs > configuration-object APIs.
+**Audit signal:**
+```bash
+# Configuration props that should be design tokens instead
+grep -rE "(backgroundColor|textColor|borderRadius|fontSize)=" src/components/ --include="*.tsx"
+```
+**Pattern — slot composition:**
+```tsx
+// BAD — too much configuration
+<Card title="Hello" subtitle="World" icon="user" rightContent={<Badge />} />
+// GOOD — slot composition; parent controls layout
+<Card>
+  <Card.Header>
+    <Card.Icon><UserIcon /></Card.Icon>
+    <Card.Title>Hello</Card.Title>
+    <Card.Subtitle>World</Card.Subtitle>
+    <Badge />
+  </Card.Header>
+</Card>
+```
+---
+### P-03: Sensible Defaults
+> The zero-config case should work and look correct. Options are for exceptions.
+A component with sensible defaults doesn't require the consumer to know its internals. The defaults encode the design system's opinion about what "normal" looks like.
+**Audit signal:**
+```bash
+# Props with no default values (every prop is required = red flag)
+grep -E "^\s+\w+: " src/components/Toast.tsx | grep -v "?" | wc -l
+```
+**The Sonner model:** `<Toaster />` with zero props renders a toast system that follows the OS color scheme, positions correctly on all viewports, stacks properly, and auto-dismisses at a sensible duration. All options exist — but the zero-prop case works.
+**Anti-pattern:** Required props for things with a logical default (e.g., `position` on a modal that should always default to `center`).
+---
+### P-04: Animation as State Communication
+> Transitions communicate state change. They are not decoration. An animation that fires without a state change is noise.
+Every motion in a component should answer: "What did the system just do?" If the animation doesn't have a clear answer, remove it.
+**Audit signal:**
+```bash
+# Animations not tied to state change (decorative loops = red flag)
+grep -rE "animate.*loop|animation.*infinite|keyframes.*repeat" src/components/ --include="*.tsx"
+```
+**The Sonner model:**
+- Toast enters → communicates: "event occurred"
+- Toast stacks → communicates: "multiple events queued"
+- Toast exits → communicates: "event acknowledged or expired"
+- No animation fires without a state change triggering it
+**Thresholds:**
+- All animations tied to state: excellent
+- ≤1 decorative animation (e.g., a loading shimmer): acceptable
+- ≥2 decorative animations: flag for review
+See `reference/motion-advanced.md` for implementation patterns.
+---
+### P-05: Accessibility Before Visuals
+> If a component isn't accessible by keyboard and screen reader, it isn't done. Accessibility is not a post-processing step.
+Build the ARIA contract before the visual. The visual is a rendering of the semantic layer, not the other way around.
+**Audit signal:**
+```bash
+# Interactive divs/spans without ARIA role (missing semantic contract)
+grep -rE "<(div|span)[^>]*onClick" src/components/ --include="*.tsx" | grep -v "role="
+```
+**Required ARIA contracts by component type:**
+| Component | Required attributes |
+|---|---|
+| Dialog / Modal | `role="dialog"`, `aria-modal="true"`, `aria-labelledby` |
+| Combobox / Select | `role="combobox"`, `aria-expanded`, `aria-controls` |
+| Menu | `role="menu"`, `role="menuitem"`, arrow-key focus management |
+| Toast / Alert | `role="status"` (polite) or `role="alert"` (assertive) |
+| Toggle / Switch | `role="switch"`, `aria-checked` |
+| Tabs | `role="tablist"`, `role="tab"`, `role="tabpanel"`, `aria-selected` |
+| Tooltip | `role="tooltip"`, `aria-describedby` on trigger |
+---
+### P-06: Edge Case Honesty
+> Document the cases where the component fails. If you know it breaks with strings longer than 30 characters, say so.
+Undocumented edge cases become production bugs. A component spec that acknowledges failure modes is more trustworthy than one that claims universal correctness.
+**Audit signal:**
+```bash
+# Look for documented edge cases
+grep -rE "// KNOWN:|// EDGE:|// LIMIT:" src/components/ --include="*.tsx"
+```
+**Template for component edge case documentation:**
+```tsx
+// KNOWN: Toast title truncates at ~80 chars on 320px viewport; use short titles
+// KNOWN: Stacking > 5 toasts simultaneously is unsupported; excess toasts are queued
+// EDGE: If `duration` is set to Infinity, a visible dismiss button is required
+```
+The presence of these comments is a quality signal, not a code smell. It means the author thought about the boundary conditions.
+---
+## Lens Application in Audits
+When auditing a component, apply these six principles as a checklist:
+| Principle | Pass condition | Fail condition |
+|---|---|---|
+| P-01 Minimal API | ≤9 props; zero-config case works | ≥10 props; required props for things with defaults |
+| P-02 Composability | Slot-based composition; no style props | `backgroundColor`/`textColor` props present |
+| P-03 Defaults | Zero-prop case renders correctly | Most props required for basic usage |
+| P-04 Animation | All motion tied to state change | Decorative loops or orphaned animations present |
+| P-05 Accessibility | ARIA contract complete; keyboard navigable | `onClick` on `div`; missing `role`; focus not managed |
+| P-06 Edge honesty | Known limits documented with `// KNOWN:` | No documentation of failure modes |
+---
+## Wiring
+**design-auditor:** Apply as an optional sub-check within Pillar 7 (Micro-Polish) for component-heavy UIs. Cite principle ID (P-01 through P-06) in findings.
+**design-discussant:** In `--spec` mode, ask one component-authoring question per component under review: "For [ComponentName]: does the API surface expose only what consumers need, or are there configuration props that reveal implementation details?"
+**design-verifier:** When verifying component-library phases, include P-01 through P-06 in the must-have checklist as a component quality gate.

package/reference/config-schema.md CHANGED

@@ -185,7 +185,7 @@ If `.design/budget.json` is missing when any `/gdd:*` command runs, `scripts/boo
 ## .design/telemetry/costs.jsonl + .design/agent-metrics.json (Phase 10.1)
-Phase 10.1 introduces two measurement artifacts written by `hooks/budget-enforcer.js` (PreToolUse on `Agent` spawns) and `scripts/aggregate-agent-metrics.js` (detached child of the hook + refresh step of `/gdd:optimize`). Both files live under the gitignored `.design/` directory — they are local session state, not committed.
+Phase 10.1 introduces two measurement artifacts written by `hooks/budget-enforcer.js` (PreToolUse on `Agent` spawns) and `scripts/aggregate-agent-metrics.ts` (detached child of the hook + refresh step of `/gdd:optimize`). Both files live under the gitignored `.design/` directory — they are local session state, not committed.
 ### .design/telemetry/costs.jsonl
@@ -223,7 +223,7 @@ Append-only ledger. One JSON object per line. Written by `hooks/budget-enforcer.
 ### .design/agent-metrics.json
-Per-agent aggregate derived from `costs.jsonl` by `scripts/aggregate-agent-metrics.js`. Written atomically via tmp-file + rename. Overwritten in full on every refresh — not append-only. Consumers should treat it as a snapshot.
+Per-agent aggregate derived from `costs.jsonl` by `scripts/aggregate-agent-metrics.ts`. Written atomically via tmp-file + rename. Overwritten in full on every refresh — not append-only. Consumers should treat it as a snapshot.
 **Schema:**

package/reference/emotional-design.md ADDED

@@ -0,0 +1,124 @@
+# Emotional Design — Norman's Three Levels
+Source: Don Norman, *Emotional Design: Why We Love (or Hate) Everyday Things* (2004). See also: `jnd.org`.
+Use this file as a **cross-cutting scoring lens** in design audits and reflections. The three levels apply simultaneously to every design decision — they are not sequential stages.
+---
+## The Three Levels
+### Visceral Level
+> The immediate, pre-cognitive, automatic reaction to sensory input.
+The visceral level operates before the user thinks. It is driven by appearance, proportion, colour, texture — the aesthetic surface. A user who says "I don't know why, but this feels cheap" is responding at the visceral level.
+**What it governs:**
+- First impression within 50ms of page load (aesthetic-usability effect)
+- Color palettes and their emotional valence
+- Typography weight, roundness, and whitespace generosity
+- Illustration style, photography tone, iconography personality
+- Whether motion feels fluid or mechanical
+**Audit signals:**
+- Does the visual system convey the intended emotional register within 3 seconds?
+- Are there conflicting visceral signals? (e.g., playful illustration + harsh red error banners)
+- Does the `style-vocabulary.md` aesthetic type match the product's emotional promise?
+**Scoring rubric (visceral):**
+| Score | Evidence |
+|---|---|
+| 4 | Emotional register clear within 3s; no conflicting signals; references a named design authority |
+| 3 | Clear emotional intent; 1–2 minor conflicts (e.g., one off-brand icon) |
+| 2 | Ambiguous emotional register; mixed signals across surfaces |
+| 1 | No discernible emotional intent; generic or template appearance |
+---
+### Behavioral Level
+> The experience of use — whether actions feel controllable, predictable, and rewarding.
+The behavioral level is what UX heuristics primarily address. It covers usability, feedback, error recovery, and responsiveness. A user who says "This is frustrating to use" is responding at the behavioral level.
+**What it governs:**
+- Interaction feedback (loading states, error messages, success confirmations)
+- Control and reversibility (undo, cancel, back navigation)
+- Response latency — the Doherty Threshold: feedback within 400ms
+- Error prevention and recovery
+- Learnability and consistency across screens
+**Audit signals:**
+- Can the user always tell what the system is doing? (H-01 Visibility)
+- Are errors expressed as human problems with solutions? (H-09 Error Recovery)
+- Is every action reversible within 5 seconds?
+- Does the system respond within 400ms (Doherty Threshold)?
+**Scoring rubric (behavioral):**
+| Score | Evidence |
+|---|---|
+| 4 | All states visible; errors human + actionable; Doherty Threshold met; all destructive actions reversible |
+| 3 | Most states covered; 1–2 feedback gaps that don't block task completion |
+| 2 | Notable feedback gaps; some irreversible actions without warning |
+| 1 | System status invisible; errors developer-facing; no undo for destructive actions |
+---
+### Reflective Level
+> The conscious, post-use evaluation — meaning, narrative, and self-image.
+The reflective level is where brand identity, storytelling, and pride of ownership live. A user who says "I love showing this tool to colleagues" is responding at the reflective level. This level takes the longest to build and the longest to repair when damaged.
+**What it governs:**
+- Whether the product aligns with the user's self-image
+- Brand narrative consistency across all touchpoints
+- Delight and surprise moments — not gimmicks; earned moments
+- The **Peak** moment in the Peak-End Rule: the highest positive moment in the flow
+- Long-term loyalty and word-of-mouth referral
+**Audit signals:**
+- Is there an identifiable "peak" moment in the primary user flow?
+- Does the brand voice (from `brand-voice.md`) carry through to microcopy and empty states?
+- Is there a designed completion state that communicates personality?
+- Does the product give users a story to tell? (e.g., a completion screen, an achievement, a shareable output)
+**Scoring rubric (reflective):**
+| Score | Evidence |
+|---|---|
+| 4 | Identifiable designed peak moment; brand voice consistent from entry to completion; users have a story to tell |
+| 3 | Brand voice present in most surfaces; peak moment implicit but not deliberately designed |
+| 2 | Brand voice inconsistent; no designed peak; product is functional but forgettable |
+| 1 | Generic experience; no emotional arc; could be any product in the category |
+---
+## Cross-Cutting Lens Application
+Apply this lens as a **secondary overlay** after scoring the primary audit pillars. For each of the three levels:
+1. Identify 1–2 evidence items from the primary audit (e.g., Pillar 3 Color → Visceral Level)
+2. Note conflicts between levels (e.g., Behavioral score 4 but Visceral score 1 = technically functional but aesthetically repellent)
+3. Flag the weakest level as the highest-leverage improvement opportunity
+**Common cross-level conflict patterns:**
+| Conflict pattern | Diagnosis | Remedy |
+|---|---|---|
+| High behavioral, low visceral | Technically usable but aesthetically generic | Audit against `style-vocabulary.md`; commit to a stronger aesthetic type |
+| High visceral, low behavioral | Beautiful but broken | Fix H-01, H-09 violations first — UX before aesthetics |
+| High visceral + behavioral, low reflective | Polished but forgettable | Design a peak moment; review `brand-voice.md` emotional arc |
+---
+## Wiring
+**design-auditor:** After pillar scoring, apply emotional-design lens as a cross-cutting overlay. Add `## Emotional Design Overlay` section to `DESIGN-AUDIT.md` with scores for all three levels and any cross-level conflict notes.
+**design-reflector:** In Section 1 (What Surprised Us), flag if visceral vs behavioral scores diverge by ≥2 points — this is a leading indicator of the "beautiful but broken" pattern.
+**design-discussant:** In `--spec` mode, include one reflective-level confidence-scored question: "What story does this product help the user tell about themselves?"
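The overlay logic described in this file (flag the weakest level, and flag a visceral/behavioral divergence of 2 or more points) is small enough to sketch. The TypeScript below is an illustrative assumption, not code from the package: the names `emotionalOverlay`, `Scores`, and `divergent` are invented here, and ties for the weakest level are resolved in visceral, behavioral, reflective order, which the source does not specify.

```typescript
// Hypothetical sketch of the emotional-design overlay; names are illustrative.
type Level = "visceral" | "behavioral" | "reflective";
type Score = 1 | 2 | 3 | 4;
type Scores = Record<Level, Score>;

interface OverlayResult {
  weakest: Level;     // highest-leverage improvement opportunity
  divergent: boolean; // visceral vs behavioral differ by >= 2 points
}

function emotionalOverlay(scores: Scores): OverlayResult {
  const levels: Level[] = ["visceral", "behavioral", "reflective"];
  let weakest: Level = "visceral";
  for (const level of levels) {
    // Strict comparison: on a tie, the earlier level in the list wins (assumed).
    if (scores[level] < scores[weakest]) weakest = level;
  }
  const divergent = Math.abs(scores.visceral - scores.behavioral) >= 2;
  return { weakest, divergent };
}

const result = emotionalOverlay({ visceral: 4, behavioral: 1, reflective: 3 });
console.log(result);
```

With these sample scores the behavioral level is the weakest and the divergence flag is set, which corresponds to the "high visceral, low behavioral" row of the conflict table.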

package/reference/error-recovery.md ADDED

@@ -0,0 +1,58 @@
+# Error recovery
+This is the recovery-action protocol for low-level errors inside the GDD pipeline. It sits on top of `scripts/lib/error-classifier.cjs` (Plan 20-14) and references the rate-guard, jittered-backoff, and iteration-budget primitives.
+## Recovery protocol
+On `status=413` or `context_overflow`, re-emit with compressed context (drop oldest non-system turns, target 50% reduction, retry once).
+On `status=429`, consult `scripts/lib/rate-guard.cjs` → `blockUntilReady(provider)` before retry.
+On network-transient (5xx, ECONNRESET), use jittered backoff (`scripts/lib/jittered-backoff.cjs`); max 3 retries.
+On auth-error, surface to user — do not retry.
+## Recovery-action table
+The `FailoverReason` enum in `scripts/lib/error-classifier.cjs` has eight values. Each row below is the canonical recovery action for one of those values. The classifier's `suggestedAction` field returns a one-liner drawn from this table; this doc is the authoritative long form.
+| FailoverReason      | Retryable | Action                                                                                                                                                     |
+| ------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `rate_limited`      | yes       | Call `rate-guard.ingestHeaders(provider, response.headers)` to record the rate-limit signal, then `rate-guard.blockUntilReady(provider)` before retrying. The blocker waits until `resetAt` on disk — synchronized siblings (watch-authorities + update-check) therefore share the backoff boundary. After the block returns, retry with jittered backoff at attempt 0. |
+| `context_overflow`  | yes       | Compress context — drop the oldest non-system turns (or the oldest attachments) targeting roughly 50 % token reduction. Retry **once** with the compressed payload. If the retry also raises `context_overflow`, escalate to the user as an unrecoverable block — further compression destroys information. |
+| `auth_error`        | no        | Surface the error to the user with actionable text: which credential, which provider, and the renewal path (OAuth re-auth URL, API-key environment variable, etc.). Do not retry automatically — a loop would just multiply the failure. |
+| `network_transient` | yes       | Retry with `scripts/lib/jittered-backoff.cjs` — `await sleep(attempt)` inside a bounded loop. Cap at 3 attempts before giving up. When retries exhaust, reclassify as `network_permanent` and surface to the user. |
+| `network_permanent` | no        | Surface to user. The endpoint is wrong, DNS is broken, or the resource was removed. A retry without operator action will just re-fail. |
+| `tool_not_found`    | no        | Surface to user. Either the tool name drifted (common for MCP servers whose prefixes change across sessions) or the MCP is not registered. Reprobe via the connection's probe sequence before retrying anything. |
+| `validation`        | no        | Surface the validation detail to the caller. Do not retry the same input — 4xx is the server saying "your payload is wrong". Fixing the payload is caller work. |
+| `unknown`           | no        | Surface the raw error to the user. Do not retry — we can't tell whether it's safe. Add a telemetry row so we can tighten the classifier over time. |
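To make the table concrete, here is a minimal TypeScript sketch of the `Retryable` column plus the bounded retry loop from the `network_transient` row. Everything in it is an assumption for illustration: the diff does not show the real signatures of `error-classifier.cjs` or `jittered-backoff.cjs`, so `RETRYABLE`, `backoffMs`, `SurfacedError`, and `retryTransient` are hypothetical names. The base and cap delays are invented values, full-jitter exponential backoff is swapped in as one common strategy, and the cap is read as 3 attempts in total.

```typescript
// Hypothetical sketch; these names are not the package's actual API.
type FailoverReason =
  | "rate_limited" | "context_overflow" | "auth_error" | "network_transient"
  | "network_permanent" | "tool_not_found" | "validation" | "unknown";

// Retryable flags copied from the recovery-action table.
const RETRYABLE: Record<FailoverReason, boolean> = {
  rate_limited: true,
  context_overflow: true,
  auth_error: false,
  network_transient: true,
  network_permanent: false,
  tool_not_found: false,
  validation: false,
  unknown: false,
};

// Full-jitter exponential backoff. baseMs and capMs are assumed values.
function backoffMs(
  attempt: number,
  baseMs = 250,
  capMs = 8000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Raised when the error must be handed to the user or to another handler.
class SurfacedError extends Error {
  reason: FailoverReason;
  constructor(reason: FailoverReason) {
    super(`surface to user: ${reason}`);
    this.reason = reason;
  }
}

// Retries only `network_transient` failures, at most `maxAttempts` times.
// Any other reason is rethrown immediately so the caller can apply its own
// row of the table (for example, rate-guard for `rate_limited`).
async function retryTransient<T>(
  op: () => Promise<T>,
  classify: (err: unknown) => FailoverReason,
  maxAttempts = 3,
  wait: (attempt: number) => Promise<void> = (a) => sleep(backoffMs(a)),
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      const reason = classify(err);
      if (reason !== "network_transient") throw new SurfacedError(reason);
      if (attempt < maxAttempts - 1) await wait(attempt);
    }
  }
  // Retries exhausted: reclassify as permanent, as the table prescribes.
  throw new SurfacedError("network_permanent");
}
```

Passing `wait` and `rand` as parameters keeps the loop deterministic under test; production callers would leave the defaults in place.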
+## Integration points
+| Caller                    | When to classify                             | What to do with `reason`                                                                                           |
+| ------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
+| `hooks/budget-enforcer.ts` | pre-spawn rate-guard check (Plan 20-14)      | If upstream state already shows `rate_limited`, emit `decision: 'rate-limited'` and short-circuit before any spawn. |
+| Figma MCP probe           | live `get_metadata` call errors              | `network_transient` → jittered-backoff retry. `auth_error` → STOP with a reauth note. `rate_limited` → block then retry. |
+| Watch-authorities fetcher | per-feed HTTP fetch                          | Same policy as Figma probe; `validation` also possible on ETag stalemate (304). |
+| Update-check HTTP curl    | GitHub `releases/latest` fetch               | Silent failure by D-04 of Plan 13.3 — classify but don't surface; log and exit 0. |
+| MCP transport             | tool-call errors (gdd-state, figma, 21st-dev)| Map `tool_not_found` to a probe-reissue; map `auth_error` to STOP; retry transient classes via the caller's own loop. |
+## Fix-loop iteration interaction
+Retries consume iteration budget when paired with the Layer-B cache:
+1. On cache hit, `iteration-budget.refund(1)` preserves the iteration that would otherwise have been spent.
+2. On each actual retry that does real work (no cache hit), the caller calls `iteration-budget.consume(1)` before the spawn.
+3. When the budget's `remaining === 0`, further retries throw `IterationBudgetExhaustedError` and the caller must surface to user — a retry cycle has become pathological.
+This protects the "infinite fix loop" case — a blocker that regenerates after every fix — from burning unbounded context.
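The consume/refund accounting above can be sketched with an in-memory stand-in. This is an assumption-laden illustration only: the diff does not show how `scripts/lib/iteration-budget.cjs` stores its state or what its exact signatures are, so the `IterationBudget` class, its constructor argument, and the rule that a refund never exceeds the original limit are all hypothetical.

```typescript
// In-memory stand-in for the iteration-budget primitive (hypothetical shape).
class IterationBudgetExhaustedError extends Error {
  constructor() {
    super("iteration budget exhausted");
    this.name = "IterationBudgetExhaustedError";
  }
}

class IterationBudget {
  private readonly limit: number;
  remaining: number;

  constructor(limit: number) {
    this.limit = limit;
    this.remaining = limit;
  }

  // Spend iterations before a spawn that does real work.
  consume(n = 1): void {
    if (this.remaining < n) throw new IterationBudgetExhaustedError();
    this.remaining -= n;
  }

  // Give iterations back on a cache hit. Capping at the original limit
  // is an assumption of this sketch.
  refund(n = 1): void {
    this.remaining = Math.min(this.limit, this.remaining + n);
  }
}

const budget = new IterationBudget(2);
budget.consume(); // retry 1 does real work
budget.consume(); // retry 2 is charged up front
budget.refund();  // retry 2 turned out to be a cache hit
budget.consume(); // retry 3 does real work
console.log(budget.remaining);
```

After this sequence the budget is empty, so one more `consume()` throws and the caller has to surface the problem to the user instead of looping.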
+## Telemetry
+Every classification result that leads to a retry or a surfaced error should append an event to `.design/telemetry/events.jsonl`:
+```json
+{ "type": "error.classified", "timestamp": "…", "sessionId": "…", "payload": { "reason": "rate_limited", "retryable": true, "caller": "figma-probe" } }
+```
+The event subtype is defined in `scripts/lib/event-stream/types.ts`. Consumers (`gdd-reflector`, dashboard) aggregate by `reason` to detect classifier drift — if `unknown` spikes, the classifier needs tightening.

package/reference/first-principles.md ADDED

@@ -0,0 +1,89 @@
+# First Principles — Invariant Design Constraints
+> These are the three invariants that no design decision can override. They are facts about human biology and cognition, not preferences or conventions. Every design choice is downstream of these three constraints.
+Use during the brief/discover stage (`design-discussant`) as a sanity check: does the proposed direction respect all three invariants? Use during verify as a reducibility check: can each element be removed without breaking the user's ability to complete their goal?
+---
+## Invariant 1: Body
+The user has a physical body with physiological limits. No amount of design skill overrides human motor physiology.
+**Principle → Code pairs:**
+| Principle | Code Pattern |
+|---|---|
+| Touch targets must accommodate tremor and fat-finger error | `min-h-[44px] min-w-[44px]` on all interactive elements |
+| Precision degrades with distance (Fitts's Law) | Destructive actions separated from primary actions by ≥24px or significantly smaller |
+| Scroll fatigue is real | Sticky headers + back-to-top anchor for content > 3 viewport heights |
+| Physical feedback confirms action | `scale(0.96)` press feedback on all clickable surfaces |
+| Eye strain limits reading distance | Body text ≥16px; line-length 60–75ch |
+**Reducibility check:** Can the user complete this task without a mouse? Without a precise pointer? On a 4-inch screen in a moving vehicle?
+---
+## Invariant 2: Attention
+Attention is finite and non-renewable per unit of time. Every element on screen competes for the same fixed budget.
+**Principle → Code pairs:**
+| Principle | Code Pattern |
+|---|---|
+| Attention capacity: 5–9 items (Miller's Law) | Navigation: ≤7 top-level items; dropdown > 7 items gets search |
+| Decision cost grows with choice count (Hick's Law) | Pricing: ≤3 tiers; feature lists: ≤4 items per group |
+| One primary action per screen | Single `.btn-primary` per viewport; all others `.btn-secondary` or `.btn-ghost` |
+| Animation hijacks attention | Motion only on state change; no decorative looping animations |
+| Progressive disclosure reduces overload | Advanced options behind disclosure trigger, not visible by default |
+**Reducibility check:** If you removed 30% of the elements on this screen, would task completion rate drop? If not, remove them.
+---
+## Invariant 3: Memory
+Working memory holds approximately 7 items and degrades rapidly within seconds. Design that requires users to remember things between screens fails.
+**Principle → Code pairs:**
+| Principle | Code Pattern |
+|---|---|
+| Recognition over recall (H-06) | Visible navigation labels, not icon-only; breadcrumbs on deep paths |
+| Context must be preserved | Multi-step forms: prior-step summary visible; form state not cleared on back-navigate |
+| Error memory fades fast | Inline validation: errors adjacent to the field that caused them |
+| Completion status reduces anxiety | Progress indicators: `Step 2 of 4`; Zeigarnik Effect — show percentage done |
+| Last action should be reversible | Undo available for destructive/irreversible actions within 5 seconds |
+**Reducibility check:** Does this screen require the user to remember something from a previous screen? If yes, surface that context inline.
+---
+## The Reducibility Test
+For any proposed design element, apply in order:
+1. **Body test** — Is this element reachable by a person with limited motor precision on a small screen?
+2. **Attention test** — Does this element earn its place by directly supporting the primary task?
+3. **Memory test** — Does this element surface context the user would otherwise need to remember?
+If an element fails all three tests, it is purely decorative. Decorative elements are not forbidden — but they are not invariant-justified, and they are the first candidates for removal when performance or clarity is at risk.
+---
+## Wiring to Design Discussant
+When `design-discussant` runs the brief stage, it prepends this invariants question before the main interview:
+> "Before we discuss the design direction, let me confirm three constraints: (1) Are there any accessibility requirements for motor-impaired users? (2) Is the primary use case on mobile or desktop — or both? (3) Are there any multi-step flows where the user must carry context between screens?"
+Answers are recorded as D-XX decisions prefixed `[Invariant]` in STATE.md.
+---
+## Relationship to Other References
+- `reference/heuristics.md` — H-01 through H-10 are the behavioral-level expression of Invariants 2 and 3
+- `reference/emotional-design.md` — Invariant 1 (Body) maps to the Visceral level; Invariants 2–3 map to the Behavioral level
+- `reference/component-authoring.md` — P-01 through P-06 are the component-level expression of all three invariants

package/reference/heuristics.md CHANGED

@@ -181,6 +181,76 @@ People **remember incomplete tasks** better than completed ones. Implications:
 ---
+## Peak-End Rule
+Users judge an experience primarily by how it felt at its most intense moment (the **peak**) and how it ended — not by the average across the whole session. Implications:
+- Design a deliberate positive peak in every primary flow (e.g., a celebratory completion screen, an instant result, a delightful empty state).
+- The **end state** of a flow matters disproportionately: the last screen the user sees shapes their memory of the whole interaction.
+- Reduce negative peaks first (error states, loading hangs) — they weigh heavier than neutral moments.
+- A long frustrating form followed by a satisfying completion screen is remembered more positively than a mildly annoying end to an otherwise smooth flow.
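The last implication can be made concrete with a common simplification of the Peak-End Rule: estimate the remembered experience as the average of the peak and final moments. The sketch below is illustrative only. The rating scale (negative for unpleasant, positive for pleasant), the choice of "largest absolute value" as the peak, and the function names are assumptions, not something the package defines.

```typescript
// Ratings are per-step affect scores on an assumed scale, e.g. -5 (awful) to +5 (delightful).

// Simplified peak-end estimate: average of the most intense moment and the last moment.
function peakEndEstimate(ratings: number[]): number {
  if (ratings.length === 0) throw new RangeError("ratings must not be empty");
  let peak = ratings[0];
  for (const r of ratings) {
    // "Peak" here means the most intense moment in either direction.
    if (Math.abs(r) > Math.abs(peak)) peak = r;
  }
  const end = ratings[ratings.length - 1];
  return (peak + end) / 2;
}

// Plain average across the session, for comparison.
function sessionMean(ratings: number[]): number {
  if (ratings.length === 0) throw new RangeError("ratings must not be empty");
  return ratings.reduce((sum, r) => sum + r, 0) / ratings.length;
}

const frustratingFormGreatEnding = [-2, -2, -2, -2, 4];
const smoothFlowAnnoyingEnding = [2, 2, 2, 2, -1];
```

With these made-up ratings, the smooth flow has the higher session mean, but the frustrating form with the satisfying ending has the higher peak-end estimate, which is the asymmetry the bullet describes.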
+---
+## Loss Aversion
+Users feel the pain of loss approximately **twice as strongly** as the pleasure of an equivalent gain (Kahneman & Tversky). Implications:
+- Frame CTAs around what users keep/save, not what they gain: "Don't lose your progress" over "Save your work."
+- Subscription cancellation flows that show what the user will lose (features, data, streak) leverage loss aversion ethically to reduce churn — but only if the stated losses are real.
+- Free trial countdowns ("3 days left") trigger loss aversion more effectively than benefit reminders.
+- Destructive action confirmations should name what is lost: "Delete this project and all 47 files?" not just "Are you sure?"
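As an illustration of the last point, a confirmation prompt can be built from the actual loss rather than a generic question. This is a hedged sketch: `confirmDeleteMessage` and its parameters are hypothetical names, and the wording follows the example in the bullet above rather than any copy shipped with the package.

```typescript
// Builds a destructive-action prompt that names what will be lost.
// The function name and parameters are assumptions for illustration.
function confirmDeleteMessage(projectName: string, fileCount: number): string {
  if (!Number.isInteger(fileCount) || fileCount < 0) {
    throw new RangeError("fileCount must be a non-negative integer");
  }
  if (fileCount === 0) {
    return `Delete project "${projectName}"?`;
  }
  const files = fileCount === 1 ? "its 1 file" : `all ${fileCount} files`;
  return `Delete project "${projectName}" and ${files}?`;
}
```

For example, `confirmDeleteMessage("Atlas", 47)` yields `Delete project "Atlas" and all 47 files?`. The count should come from real data, in line with the caveat above that stated losses must be real.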
+---
+## Cognitive Load Theory
+Working memory holds only a few chunks at once: Miller (1956) put the limit at approximately **7 ± 2**, and later estimates (Cowan, 2001) put it closer to 4. Capacity degrades further under stress, distraction, or novelty. Cognitive Load Theory (Sweller, 1988) distinguishes three types:
+| Type | Definition | Design implication |
+|---|---|---|
+| **Intrinsic** | Load inherent to the task itself (complexity of the domain) | Cannot be reduced; must be scaffolded |
+| **Extraneous** | Load imposed by poor design (navigation, unclear labels, visual noise) | Eliminate this completely |
+| **Germane** | Load that builds understanding (learning, pattern recognition) | Preserve and support |
+Practical rules:
+- Every element of visual noise is extraneous load — remove it.
+- New UI patterns create extraneous load (Jakob's Law); use platform conventions.
+- Chunk complex tasks into steps of ≤3 decisions each.
+- Error messages that require decoding ("Error 422") create extraneous load; plain language removes it.
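The chunking rule above (steps of ≤3 decisions each) can be sketched as a simple grouping function. The name `chunkDecisions` and the representation of a decision as an arbitrary value are assumptions for illustration. Real flows would also group by topic, not just by count.

```typescript
// Splits an ordered list of decisions into steps of at most `maxPerStep` items.
// Default of 3 mirrors the "≤3 decisions each" rule of thumb.
function chunkDecisions<T>(decisions: T[], maxPerStep: number = 3): T[][] {
  if (!Number.isInteger(maxPerStep) || maxPerStep < 1) {
    throw new RangeError("maxPerStep must be a positive integer");
  }
  const steps: T[][] = [];
  for (let i = 0; i < decisions.length; i += maxPerStep) {
    steps.push(decisions.slice(i, i + maxPerStep));
  }
  return steps;
}

// Hypothetical checkout flow with seven decisions.
const checkoutSteps = chunkDecisions([
  "shipping address",
  "delivery speed",
  "gift wrap",
  "payment method",
  "billing address",
  "promo code",
  "confirm order",
]);
```

Here the seven decisions become three steps of sizes 3, 3, and 1, preserving the original order.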
+---
+## Aesthetic-Usability Effect
+Users perceive **aesthetically pleasing designs as more usable**, even when functionality is identical — and this perception persists through initial usability problems. Implications:
+- A polished visual appearance buys tolerance for minor UX rough edges in early releases.
+- This effect is strongest on first impression; it degrades over time as behavioral friction compounds.
+- The effect can mask genuine usability problems in user testing if participants rate overall satisfaction rather than task completion.
+- Do NOT use the aesthetic-usability effect as a reason to defer fixing usability problems — it explains tolerance, not satisfaction.
+---
+## Doherty Threshold
+A system that responds within **400ms** keeps users in a state of flow. Response times above 400ms cause users to shift attention, leading to a productivity drop that compounds with task complexity. Named after W.J. Doherty, who reported the finding with A.J. Thadani (IBM, 1982). Implications:
+- Interactive responses (button click → visible feedback) must be ≤400ms.
+- For operations > 400ms, show optimistic UI immediately and settle in the background.
+- For operations > 1000ms, use a progress indicator.
+- For operations > 10s, provide a way to continue other tasks (async notification on complete).
+- Loading spinners that appear within 400ms reduce perceived wait; those that appear late increase it.
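The thresholds in the list above map naturally onto a lookup by expected duration. The sketch below is one possible encoding, not the package's implementation: `feedbackStrategy`, the `FeedbackStrategy` labels, and the assumption that an expected duration is known up front are all hypothetical.

```typescript
type FeedbackStrategy =
  | "immediate"                     // <= 400ms: respond directly
  | "optimistic-ui"                 // > 400ms: show the result now, settle in the background
  | "progress-indicator"            // > 1000ms: show progress
  | "background-with-notification"; // > 10s: let the user continue, notify on completion

// Picks a feedback strategy from the expected operation time in milliseconds.
function feedbackStrategy(expectedMs: number): FeedbackStrategy {
  if (!Number.isFinite(expectedMs) || expectedMs < 0) {
    throw new RangeError("expectedMs must be a non-negative finite number");
  }
  if (expectedMs > 10000) return "background-with-notification";
  if (expectedMs > 1000) return "progress-indicator";
  if (expectedMs > 400) return "optimistic-ui";
  return "immediate";
}
```

Checking the largest threshold first keeps each branch a single comparison. Boundary values fall into the lower tier, so exactly 400ms is still `"immediate"`, matching the "≤400ms" wording above.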
+---
+## Flow (Csikszentmihalyi)
+Users enter a **flow state** when task difficulty matches their skill level exactly — high enough to engage, low enough to feel achievable. Flow is characterized by complete absorption, loss of time awareness, and intrinsic motivation. Implications:
+- Progressive difficulty: onboarding tasks should be trivially easy; expert tasks should provide just enough challenge.
+- Interruptions can break flow for the rest of the session, and recovery is slow; avoid modal interruptions in high-focus workflows.
+- Clear goals + immediate feedback are the two design levers for inducing flow (H-01, H-05).
+- Forms designed for flow: one question per screen, immediate validation, visible progress.
+- Notification design: distinguish ambient notifications (don't break flow) from critical interruptions (must break flow).
+---
 ## How to Score During Verification
 For each NNG heuristic (H-01 through H-10), rate 0–4: