npm - @hegemonart/get-design-done - Versions diffs - 1.37.2 → 1.38.5 - Mend

@hegemonart/get-design-done 1.37.2 → 1.38.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +45 -0
package/README.md +8 -0
package/SKILL.md +1 -0
package/agents/design-verifier.md +1 -1
package/agents/experiment-result-ingester.md +61 -0
package/agents/rollout-coordinator.md +71 -0
package/agents/user-research-synthesizer.md +65 -0
package/connections/connections.md +7 -1
package/connections/growthbook.md +110 -0
package/connections/hotjar.md +110 -0
package/connections/launchdarkly.md +83 -0
package/connections/maze.md +130 -0
package/connections/statsig.md +83 -0
package/connections/usertesting.md +99 -0
package/package.json +1 -1
package/reference/design-variants.md +56 -0
package/reference/registry.json +14 -0
package/reference/rollout-coordination.md +63 -0
package/reference/schemas/events.schema.json +1 -1
package/scripts/lib/ds-arms/design-arms-store.cjs +119 -0
package/scripts/lib/rollout/rollout-status.cjs +60 -0
package/skills/brief/SKILL.md +8 -0
package/skills/connections/SKILL.md +4 -4
package/skills/connections/connections-onboarding.md +58 -4
package/skills/design/SKILL.md +2 -1
package/skills/rollout-status/SKILL.md +35 -0

package/connections/launchdarkly.md ADDED

@@ -0,0 +1,83 @@
+# LaunchDarkly — Connection Specification
+This file is the connection specification for LaunchDarkly within the get-design-done pipeline. It lives in `connections/` alongside other connection specs (see [`connections/slack.md`](slack.md) for the structural sibling — an API/env-based connection with a three-value probe and degrade-to-noop).
+---
+LaunchDarkly is an **experiment-source** for the outcome-learning layer (Phase 38). GDD **reads** A/B experiment results from LaunchDarkly and feeds each variant→outcome into the `design_arms` posterior, so shipped design decisions get reinforced or discounted by what actually performed in production. GDD never runs, creates, edits, or stops experiments — it is strictly **read-only** (D-04). Reads degrade to a noop when unconfigured or disabled; outcome learning simply pauses and the pipeline never blocks.
+---
+## Setup
+**Prerequisites:** read-only access to a LaunchDarkly project's experiment results — either a LaunchDarkly **API key** (a reader/viewer-scoped token, not a writer token) **or** an SDK key, **or** the LaunchDarkly MCP if it is installed in your runtime.
+**Token (env, never committed):**
+```bash
+export LAUNCHDARKLY_API_KEY="<reader-scoped-api-or-sdk-key>"
+```
+Use the narrowest scope LaunchDarkly offers (reader/viewer). The key is a credential — never commit it (not in source, not in `.env`, not in config), never log it, rotate if exposed. GDD reads it from env only and never requests a write scope.
+**Verification:**
+```bash
+test -n "${LAUNCHDARKLY_API_KEY}" && echo "launchdarkly key present" || echo "launchdarkly key absent"
+```
+---
+## Availability Probe
+Probe is **MCP-first**, env-fallback, kill-switch-aware:
+1. If `GDD_DISABLE_LAUNCHDARKLY=1` → short-circuit to `not_configured` (treated as disabled; never probe further).
+2. Run `ToolSearch({ query: "launchdarkly" })`. If a LaunchDarkly MCP tool resolves → `launchdarkly: available`.
+3. Else check the env key: `test -n "${LAUNCHDARKLY_API_KEY}"`.
+   - Non-empty → `launchdarkly: available`
+   - Empty → `launchdarkly: not_configured`
+4. Source present (MCP or key) but a read errored at fetch time → `launchdarkly: unavailable`.
+Write the `launchdarkly` status to `.design/STATE.md` `<connections>` after probing:
+```xml
+<connections>
+launchdarkly: not_configured
+</connections>
+```
+| Value | Meaning |
+|---|---|
+| `available` | LaunchDarkly MCP resolves OR `LAUNCHDARKLY_API_KEY` set, AND not disabled |
+| `unavailable` | source present but a result read errored |
+| `not_configured` | no MCP and no `LAUNCHDARKLY_API_KEY`, or `GDD_DISABLE_LAUNCHDARKLY=1` |
+The kill-switch `GDD_DISABLE_LAUNCHDARKLY=1` forces `not_configured` regardless of MCP/key presence (mirrors the Phase 30 / 35.1 disable convention). `gsd-health` surfaces the state.
+---
+## Pipeline Integration
+LaunchDarkly contributes the **experiment-source** capability. The flow is read-only and one-directional (results in, never experiments out):
+1. The probe marks `launchdarkly: available` in `.design/STATE.md`.
+2. The experiment-result ingester (`agents/experiment-result-ingester.md`) reads completed A/B results from LaunchDarkly — variant identifiers plus their measured metric outcomes.
+3. It maps each variant to the matching `design_arms` arm and records the outcome (win / loss / lift) against that arm's posterior, so the next design decision is informed by production evidence.
+4. For each mapped result it emits an `experiment_result` event into the pipeline's event stream for downstream learning and audit.
+The ingester reads results only; it issues no experiment-creation, assignment, or mutation calls against LaunchDarkly (D-04).
+**Injectable fetch (hermetic tests):** the ingester takes an injectable `fetchImpl` (defaulting to the resolved MCP tool or global `fetch`). Tests pass a stub `fetchImpl` so `npm test` exercises the variant→outcome mapping with no real egress — no live LaunchDarkly call in CI. There is **no bundled LaunchDarkly SDK and no new dependency**; reads go through the MCP tool or the injectable `fetchImpl`.
+---
+## Fallback Behavior
+`not_configured` (no MCP, no key) or disabled (`GDD_DISABLE_LAUNCHDARKLY=1`) → the experiment-source **degrades to a noop**: the ingester is skipped, no `experiment_result` events are emitted, and the `design_arms` posterior simply does not get the outcome update this cycle. Design decisions still ship — they just rely on prior evidence instead of fresh experiment results.
+A read failure when a source *is* present → `launchdarkly: unavailable`; that cycle's ingestion is skipped (no error surfaced to the pipeline) and retried on the next probe. The ingester returns a skipped/empty result and never throws, so outcome learning is best-effort and **never blocks the pipeline** (mirrors the notify degrade-to-noop in [`connections/slack.md`](slack.md)).
+---
+Do NOT edit the connection index here — the 38 wiring plan adds the Active-Connections row + the experiment-source matrix column.
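
Reviewer sketch (not part of the package): the three-value probe described in `launchdarkly.md` can be read as a small pure function. This is a minimal illustration under stated assumptions. The function name and the `mcpResolved` / `readErrored` inputs are hypothetical stand-ins for the `ToolSearch` lookup and a failed fetch, and the package's real probe may be structured differently.

```javascript
// Hypothetical helper: maps the documented probe steps onto the three status
// values. `mcpResolved` and `readErrored` are assumed inputs, not package API.
function probeLaunchDarkly({ env, mcpResolved = false, readErrored = false }) {
  // Step 1: the kill-switch short-circuits before anything else is checked.
  if (env.GDD_DISABLE_LAUNCHDARKLY === '1') return 'not_configured';

  // Steps 2-3: MCP first, then the env key as a fallback.
  const hasKey =
    typeof env.LAUNCHDARKLY_API_KEY === 'string' &&
    env.LAUNCHDARKLY_API_KEY.length > 0;
  if (!mcpResolved && !hasKey) return 'not_configured';

  // Step 4: a source is present, so a failed read downgrades to unavailable.
  return readErrored ? 'unavailable' : 'available';
}
```

Passing `env` in explicitly (instead of reading `process.env` inside the function) keeps the sketch testable without touching real credentials.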

package/connections/maze.md ADDED

@@ -0,0 +1,130 @@
+# Maze — Connection Specification
+This file is the connection specification for Maze within the get-design-done pipeline. It lives in `connections/` alongside other connection specs (the structural template is [`connections/slack.md`](slack.md)). See `connections/connections.md` for the full connection index and capability matrix (the maze row is added at the Phase 38 wiring closeout).
+---
+Maze is a **user-research source** for the Outcome-Driven Adaptation layer (Phase 38). It is the usability-testing / prototype-testing counterpart to a notification surface: GDD does not *push* to Maze — it **reads** completed test reports and their aggregate metrics (read-only), then feeds the findings into the brief as prior research. The signal answers "how did real users actually do on this flow?" — misclick rate, time-on-task, completion rate — so the design stage starts from evidence instead of assumption.
+**CRITICAL — PII guard (D-05):** every payload read from Maze MUST pass through `scripts/lib/pseudonymize.cjs` **before** it reaches any agent context. Test reports can carry participant-identifying fields (tester names, emails, free-text answers, session URLs). Pseudonymization is the single mandatory gate between Maze and the LLM; there is no path that skips it. This is read-only *plus* identity-scrubbed — the two properties together are the contract.
+---
+## Setup
+**Prerequisites:** a Maze account with at least one completed test (a usability or prototype test that has collected responses), and a Maze **API token** with read access.
+**Token (env, never committed):**
+```bash
+export MAZE_API_KEY="<your-maze-api-token>"
+```
+`MAZE_API_KEY` is a **read-only** credential — GDD uses it solely to GET indexed insights and aggregate metrics (report summaries, per-task rates). GDD never writes to Maze, never creates or modifies tests, and never pulls **raw session recordings** (video/click-stream) — only the indexed, already-aggregated insights and metrics. Treat the token like a password: never commit it (not in source, not in `.env`, not in config), never log it, rotate it if exposed. GDD reads it from env only.
+**Verification:**
+```bash
+test -n "${MAZE_API_KEY}" && echo "maze token present" || echo "maze token absent"
+```
+---
+## Availability Probe
+Maze may be reachable either via an MCP (if one is registered in the host) or via its read-only HTTP API keyed by `MAZE_API_KEY`. Probe **MCP-first**, then fall back to the env check.
+**Step M1 — MCP presence (ToolSearch-only, no tool call, no API cost):**
+```
+ToolSearch({ query: "maze", max_results: 5 })
+  → Non-empty result → maze: available
+  → Empty result     → proceed to Step M2
+```
+**Step M2 — API token check:**
+```bash
+test -n "${MAZE_API_KEY}"
+```
+- Non-empty AND not disabled → `maze: available`
+- Empty → `maze: not_configured`
+- Present but a read errored at fetch time → `maze: unavailable`
+**Kill-switch:** Maze reads are a noop when `GDD_DISABLE_MAZE=1` (env), regardless of token/MCP presence — the probe resolves to `not_configured`. `gsd-health` surfaces the state (mirrors the Phase 30 / 35.1 health-mirror pattern).
+**Write `maze` status to `.design/STATE.md` `<connections>` after probing:**
+```xml
+<connections>
+maze: not_configured
+</connections>
+```
+| Value | Meaning |
+|---|---|
+| `available` | `maze` MCP registered, OR `MAZE_API_KEY` set — AND not disabled |
+| `unavailable` | reachable but a read errored at fetch time |
+| `not_configured` | no `maze` MCP and no `MAZE_API_KEY`, or `GDD_DISABLE_MAZE=1` |
+---
+## What GDD reads
+Read-only, indexed surface only — the parity inverse of the slack.md "What GDD sends" table:
+| Surface | Read? | Pipeline use |
+|---|---|---|
+| Report summary (per-test insights) | yes | synthesized into `<prior-research>` |
+| Aggregate metrics (misclick rate, time-on-task, completion) | yes | the headline outcome signal for the brief |
+| Tester names / emails / free-text answers | only via pseudonymize | identity fields scrubbed to placeholders before any agent sees them |
+| Raw session recordings (video / click-stream) | **never** | out of scope — highest-PII surface, not fetched |
+## Pipeline Integration
+Maze is a **user-research** input to the discover/plan boundary, consumed only when `maze: available`. The flow is strictly one-directional and identity-scrubbed:
+```
+Maze read (read-only)
+  reports + metrics:
+    - misclick rate      (wrong first/early taps per task)
+    - time-on-task       (median seconds to complete)
+    - completion rate    (% of testers who reached the goal)
+  →  scripts/lib/pseudonymize.cjs   ← MANDATORY, runs FIRST, before any agent sees the payload
+  →  agents/user-research-synthesizer.md
+  →  brief-grade insights (ranked, de-duplicated, cited)
+  →  the brief's <prior-research> block
+```
+1. **Read** — fetch indexed report summaries + aggregate metrics for the relevant test(s) via the MCP tool (verify the name via ToolSearch) or the read-only API. Never fetch raw recordings.
+2. **Pseudonymize FIRST** — pass the *entire* fetched payload through `scripts/lib/pseudonymize.cjs` (the Phase 30 R1..R7 rule set: git-identity, paths, hostname, repo-origin, env-values, emails, IPs). Names, emails, and identity-correlatable strings in tester free-text become placeholders. **No payload reaches the synthesizer un-pseudonymized — this is the single egress chokepoint, the same discipline `connections/slack.md` applies via `redact.cjs`.**
+3. **Synthesize** — [`agents/user-research-synthesizer.md`](../agents/user-research-synthesizer.md) turns the scrubbed metrics + report text into brief-grade insights: which tasks failed, where misclicks cluster, which steps are slow, framed against the method guidance in `reference/user-research.md` (so a low-n test is reported as directional, not as a statistically reliable rate).
+4. **Inject** — the synthesized insights land in the brief's `<prior-research>` block, so the design stage opens with observed evidence ("testers misclicked the secondary CTA 41% of the time; completion held at 78%") rather than assumption.
+Honest-framing note: pseudonymization reduces identity correlation, it does not eliminate it (writing style and free-text content can still re-identify). Findings are presented as **directional research signal**, never as anonymized fact, and sample size is always carried through from `reference/user-research.md`'s sample-size heuristics.
+---
+## Fallback Behavior
+Maze is an **enhancement, never a gate** (degrade-to-noop, D-03). When `maze: not_configured`, `maze: unavailable`, or `GDD_DISABLE_MAZE=1`:
+- The discover/plan boundary proceeds with **no** `<prior-research>` Maze block — the brief simply omits the prior-research signal (or carries other research sources if present).
+- The synthesizer is **not** invoked for Maze; nothing is fetched, nothing is pseudonymized, nothing is logged.
+- The pipeline **never blocks** on Maze availability — a missing user-research signal is a quality reduction, not an error. The design stage falls back to the assumption-driven path it would have used before any Maze data existed.
+A read that errors mid-fetch downgrades to `unavailable` and is treated exactly like `not_configured`: skip, note the absence, continue.
+---
+## PII + Privacy
+- **Pseudonymize before context — non-negotiable.** Every byte read from Maze passes through `scripts/lib/pseudonymize.cjs` before it is placed in any prompt, brief, or agent input. There is no bypass path; the synthesizer only ever receives scrubbed text. This mirrors the data-minimization + pseudonymization ethics in `reference/user-research.md` (names replaced with participant IDs, identifying details removed from quotes before sharing).
+- **Read-only — no write-back.** GDD never sends anything to Maze. The token is GET-only; no test, response, or annotation is ever created or modified.
+- **No raw recordings.** Only indexed insights and aggregate metrics are read. Session-recording video and raw click-streams are never fetched — they are the highest-risk PII surface and are out of scope by design.
+- **No PII in logs or events.** The `maze` connection emits no participant identifiers into logs, the event stream, or `.design/STATE.md`. STATE.md carries only the three-value status token (`available` / `unavailable` / `not_configured`) — never report contents. The pseudonymization replacements log is itself length-truncated so a stray un-scrubbed value cannot leak at full length.
+---
+Do NOT edit the connection index here — the 38 wiring plan adds the Active-Connections row + the experiment-source matrix column.
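
Reviewer sketch (not part of the package): the "pseudonymize FIRST" ordering in `maze.md` can be illustrated by scrubbing the serialized payload before any field is read. The diff does not show the real `scripts/lib/pseudonymize.cjs` API, so `scrubEmails` below is a deliberately tiny, hypothetical stand-in that only handles email addresses. The report shape (`metrics`, `answers`) is likewise assumed for illustration.

```javascript
// Hypothetical stand-in for scripts/lib/pseudonymize.cjs (real API not shown
// in the diff). It only replaces email addresses with a placeholder.
function scrubEmails(text) {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '<email>');
}

// Scrub the whole payload first; only the scrubbed copy is ever read.
// The `metrics` / `answers` field names are assumptions for this sketch.
function readMazeReport(rawReport, pseudonymize = scrubEmails) {
  const scrubbed = JSON.parse(pseudonymize(JSON.stringify(rawReport)));
  return {
    metrics: scrubbed.metrics,
    notes: (scrubbed.answers || []).map((answer) => answer.text),
  };
}

const sampleReport = {
  metrics: { misclickRate: 0.41, completionRate: 0.78 },
  answers: [{ text: 'I could not find checkout, mail me at ana@example.com.' }],
};

const scrubbedReport = readMazeReport(sampleReport);
console.log(scrubbedReport.notes[0]);
```

Scrubbing the serialized form means nested or unexpected fields are covered too, at the cost of treating the payload as opaque text.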

package/connections/statsig.md ADDED

@@ -0,0 +1,83 @@
+# Statsig — Connection Specification
+This file is the connection specification for Statsig within the get-design-done pipeline. It lives in `connections/` alongside other connection specs (see [`connections/slack.md`](slack.md) for the structural sibling — an API/env-based connection with a three-value probe and degrade-to-noop, and [`connections/launchdarkly.md`](launchdarkly.md) for the experiment-source sibling this file mirrors).
+---
+Statsig is an **experiment-source** for the outcome-learning layer (Phase 38). GDD **reads** A/B experiment results and feature-gate / Pulse metric outcomes from Statsig and feeds each variant→outcome into the `design_arms` posterior, so shipped design decisions get reinforced or discounted by what actually performed in production. GDD never runs, creates, edits, starts, or stops experiments or gates — it is strictly **read-only** (D-04). Reads degrade to a noop when unconfigured or disabled; outcome learning simply pauses and the pipeline never blocks.
+---
+## Setup
+**Prerequisites:** read-only access to a Statsig project's experiment / Pulse results — either a Statsig **console API key** (a read-scoped key, not a server-write key) exported as `STATSIG_API_KEY`, **or** the Statsig MCP if it is installed in your runtime.
+**Token (env, never committed):**
+```bash
+export STATSIG_API_KEY="<read-scoped-console-api-key>"
+```
+Use the narrowest scope Statsig offers (a read-only console key). The key is a credential — never commit it (not in source, not in `.env`, not in config), never log it, rotate if exposed. GDD reads it from env only and never requests a write scope.
+**Verification:**
+```bash
+test -n "${STATSIG_API_KEY}" && echo "statsig key present" || echo "statsig key absent"
+```
+---
+## Availability Probe
+Probe is **MCP-first**, env-fallback, kill-switch-aware:
+1. If `GDD_DISABLE_STATSIG=1` → short-circuit to `not_configured` (treated as disabled; never probe further).
+2. Run `ToolSearch({ query: "statsig" })`. If a Statsig MCP tool resolves → `statsig: available`.
+3. Else check the env key: `test -n "${STATSIG_API_KEY}"`.
+   - Non-empty → `statsig: available`
+   - Empty → `statsig: not_configured`
+4. Source present (MCP or key) but a read errored at fetch time → `statsig: unavailable`.
+Write the `statsig` status to `.design/STATE.md` `<connections>` after probing:
+```xml
+<connections>
+statsig: not_configured
+</connections>
+```
+| Value | Meaning |
+|---|---|
+| `available` | Statsig MCP resolves OR `STATSIG_API_KEY` set, AND not disabled |
+| `unavailable` | source present but a result read errored |
+| `not_configured` | no MCP and no `STATSIG_API_KEY`, or `GDD_DISABLE_STATSIG=1` |
+The kill-switch `GDD_DISABLE_STATSIG=1` forces `not_configured` regardless of MCP/key presence (mirrors the Phase 30 / 35.1 disable convention). `gsd-health` surfaces the state.
+---
+## Pipeline Integration
+Statsig contributes the **experiment-source** capability. The flow is read-only and one-directional (results in, never experiments out):
+1. The probe marks `statsig: available` in `.design/STATE.md`.
+2. The experiment-result ingester (`agents/experiment-result-ingester.md`) reads completed experiment results and feature-gate / Pulse metric outcomes from Statsig — variant (group) identifiers plus their measured metric lift.
+3. It maps each variant to the matching `design_arms` arm and records the outcome (win / loss / lift) against that arm's posterior, so the next design decision is informed by production evidence.
+4. For each mapped result it emits an `experiment_result` event into the pipeline's event stream for downstream learning and audit.
+The ingester reads results only; it issues no experiment-creation, gate-toggle, assignment, or mutation calls against Statsig (D-04).
+**Injectable fetch (hermetic tests):** the ingester takes an injectable `fetchImpl` (defaulting to the resolved MCP tool or global `fetch`). Tests pass a stub `fetchImpl` so `npm test` exercises the variant→outcome mapping with no real egress — no live Statsig call in CI. There is **no bundled Statsig SDK and no new dependency**; reads go through the MCP tool or the injectable `fetchImpl`.
+---
+## Fallback Behavior
+`not_configured` (no MCP, no key) or disabled (`GDD_DISABLE_STATSIG=1`) → the experiment-source **degrades to a noop**: the ingester is skipped, no `experiment_result` events are emitted, and the `design_arms` posterior simply does not get the outcome update this cycle. Design decisions still ship — they just rely on prior evidence instead of fresh experiment results.
+A read failure when a source *is* present → `statsig: unavailable`; that cycle's ingestion is skipped (no error surfaced to the pipeline) and retried on the next probe. The ingester returns a skipped/empty result and never throws, so outcome learning is best-effort and **never blocks the pipeline** (mirrors the notify degrade-to-noop in [`connections/slack.md`](slack.md)). Statsig and [`connections/launchdarkly.md`](launchdarkly.md) are parity experiment-sources — the read-only contract is identical; only the metric-payload shape the ingester maps differs.
+---
+Do NOT edit the connection index here — the 38 wiring plan adds the Active-Connections row + the experiment-source matrix column.
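
Reviewer sketch (not part of the package): the injectable `fetchImpl` and the never-throws contract described for the experiment-source ingester might look roughly like this. The function name, the `url` parameter, and the `results` field of the response body are assumptions made for illustration. No real Statsig endpoint or payload shape is implied.

```javascript
// Hypothetical reader: `url` and the `results` body field are assumed.
// Any failure is folded into a skipped result, so callers never see a throw.
async function readExperimentResults({ url, fetchImpl = globalThis.fetch }) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) return { status: 'unavailable', results: [] };
    const body = await response.json();
    return { status: 'available', results: body.results ?? [] };
  } catch {
    // Network errors and malformed JSON both degrade to a noop.
    return { status: 'unavailable', results: [] };
  }
}

// Stub fetch for hermetic tests: no network egress.
const stubFetch = async () => ({
  ok: true,
  json: async () => ({ results: [{ variant: 'A', won: true }] }),
});
```

A test can then call `readExperimentResults({ url: 'stub://results', fetchImpl: stubFetch })` and assert on the mapped results without any live call.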

package/connections/usertesting.md ADDED

@@ -0,0 +1,99 @@
+# UserTesting — Connection Specification
+This file is the connection specification for UserTesting within the get-design-done pipeline. It lives in `connections/` alongside other connection specs (see [`connections/slack.md`](slack.md) for the structural sibling — an API/env-based connection with a three-value probe and degrade-to-noop, and [`connections/launchdarkly.md`](launchdarkly.md) for the Phase-38 read-only source pattern).
+---
+UserTesting is a **user-research source** for the outcome-learning layer (Phase 38). GDD **reads** completed test reports and study insights from UserTesting and feeds brief-grade findings into the design brief, so design decisions are grounded in what real participants did and said. GDD never schedules, launches, edits, or stops studies — it is strictly **read-only**. Reads degrade to a noop when unconfigured or disabled; the brief simply ships without a prior-research block and the pipeline never blocks.
+**CRITICAL (PII guard, D-05): every user-research payload MUST pass through [`scripts/lib/pseudonymize.cjs`](../scripts/lib/pseudonymize.cjs) BEFORE it reaches any agent context.** Participant identities, emails, and faces/voices captured in transcripts are PII. No raw report, transcript, or recording text enters an agent prompt, an event, or a log until it has been pseudonymized. This is mandatory and non-negotiable — see the dedicated PII + Privacy section below.
+---
+## Setup
+**Prerequisites:** read-only access to a UserTesting workspace's completed reports and study insights — either a UserTesting **API key** (a reader/viewer-scoped token, not a writer/admin token) or read-only OAuth, **or** the UserTesting MCP if it is installed in your runtime.
+**Token (env, never committed):**
+```bash
+export USERTESTING_API_KEY="<reader-scoped-api-or-oauth-token>"
+```
+Use the narrowest scope UserTesting offers (reader/viewer). The key is a credential — never commit it (not in source, not in `.env`, not in config), never log it, rotate if exposed. GDD reads it from env only and never requests a write scope.
+**No raw session-replay video storage.** GDD reads **indexed insights** — text findings, tagged highlights, severity/frequency annotations — never the underlying session-replay video. Recordings are never downloaded, never cached, and never stored locally. Only the derived, pseudonymized text crosses into the pipeline.
+**Verification:**
+```bash
+test -n "${USERTESTING_API_KEY}" && echo "usertesting key present" || echo "usertesting key absent"
+```
+---
+## Availability Probe
+Probe is **MCP-first**, env-fallback, kill-switch-aware:
+1. If `GDD_DISABLE_USERTESTING=1` → short-circuit to `not_configured` (treated as disabled; never probe further).
+2. Run `ToolSearch({ query: "usertesting" })`. If a UserTesting MCP tool resolves → `usertesting: available`.
+3. Else check the env key: `test -n "${USERTESTING_API_KEY}"`.
+   - Non-empty → `usertesting: available`
+   - Empty → `usertesting: not_configured`
+4. Source present (MCP or key) but a read errored at fetch time → `usertesting: unavailable`.
+Write the `usertesting` status to `.design/STATE.md` `<connections>` after probing:
+```xml
+<connections>
+usertesting: not_configured
+</connections>
+```
+| Value | Meaning |
+|---|---|
+| `available` | UserTesting MCP resolves OR `USERTESTING_API_KEY` set, AND not disabled |
+| `unavailable` | source present but a result read errored |
+| `not_configured` | no MCP and no `USERTESTING_API_KEY`, or `GDD_DISABLE_USERTESTING=1` |
+The kill-switch `GDD_DISABLE_USERTESTING=1` forces `not_configured` regardless of MCP/key presence (mirrors the Phase 30 / 35.1 disable convention). `gsd-health` surfaces the state.
+---
+## Pipeline Integration
+UserTesting contributes the **user-research source** capability. The flow is read-only and one-directional (insights in, never studies out):
+1. The probe marks `usertesting: available` in `.design/STATE.md`.
+2. The reader pulls completed test reports / study insights from UserTesting (indexed text only — never video).
+3. **PII pseudonymization runs FIRST.** Every report payload is passed through [`scripts/lib/pseudonymize.cjs`](../scripts/lib/pseudonymize.cjs) before anything else touches it — participant names, emails, and identity-correlatable fields are scrubbed to placeholders. The raw payload is never forwarded.
+4. The pseudonymized payload is handed to [`agents/user-research-synthesizer.md`](../agents/user-research-synthesizer.md), which distills it into **brief-grade insights** — each shaped as `{ finding, frequency, severity }` (the finding text, how many participants hit it, and how impactful it was).
+5. Those insights are written into the design brief's `<prior-research>` block, so the brief carries real-participant evidence alongside design direction.
+The reader and synthesizer read insights only; they issue no study-creation, scheduling, or mutation calls against UserTesting.
+**Injectable fetch (hermetic tests):** the reader takes an injectable `fetchImpl` (defaulting to the resolved MCP tool or global `fetch`). Tests pass a stub `fetchImpl` so `npm test` exercises the report → pseudonymize → synthesize path with no real egress — no live UserTesting call in CI. There is **no bundled UserTesting SDK and no new dependency**; reads go through the MCP tool or the injectable `fetchImpl`.
+---
+## Fallback Behavior
+`not_configured` (no MCP, no key) or disabled (`GDD_DISABLE_USERTESTING=1`) → the user-research source **degrades to a noop**: the reader is skipped, no payload is fetched or pseudonymized, the synthesizer is not invoked, and the brief's `<prior-research>` block is simply omitted. Design decisions still ship — they just proceed without fresh participant evidence this cycle.
+A read failure when a source *is* present → `usertesting: unavailable`; that cycle's read is skipped (no error surfaced to the pipeline) and retried on the next probe. The reader returns a skipped/empty result and never throws, so user-research enrichment is best-effort and **never blocks the pipeline** (mirrors the notify degrade-to-noop in [`connections/slack.md`](slack.md)).
+---
+## PII + Privacy
+User-research data is the highest-sensitivity input GDD ingests. Treat every payload as PII until proven otherwise.
+- **Pseudonymize before context (mandatory).** No raw UserTesting payload — report body, transcript text, highlight note — enters an agent prompt, the synthesizer, the event stream, or any STATE/brief artifact until it has passed through [`scripts/lib/pseudonymize.cjs`](../scripts/lib/pseudonymize.cjs). Pseudonymization is the first step after the read, before any other handling. There is no bypass path.
+- **No PII in logs or events.** Participant identities, emails, and identity-correlatable fields never appear in logs, the `experiment_result`/research event stream, or error output. Only pseudonymized, brief-grade text is ever emitted downstream.
+- **Indexed insights, not raw recordings.** GDD reads derived text insights only — never session-replay video, audio, or screen captures of participants' faces/voices. Recordings are never downloaded, cached, or stored; the source of truth stays in UserTesting.
+- **Pseudonymization is not anonymization.** Identity correlation is reduced, not eliminated — side-channel signals may still re-identify. The synthesizer keeps only the `{ finding, frequency, severity }` shape it needs and discards the rest, minimizing what is retained.
+---
+Do NOT edit the connection index here — the 38 wiring plan adds the Active-Connections row + the experiment-source matrix column.
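
Reviewer sketch (not part of the package): `usertesting.md` shapes each insight as `{ finding, frequency, severity }` and writes them into the brief's `<prior-research>` block. One way to picture that last step is below. The severity labels (`high` / `medium` / `low`), the sort order, and the line format are all assumptions, since the diff does not specify them.

```javascript
// Assumed severity scale; the diff only says severity is "how impactful it was".
const SEVERITY_RANK = { high: 3, medium: 2, low: 1 };

// Renders already-pseudonymized insights, most severe and most frequent first.
// Returns an empty string when there is nothing to report, so the block is omitted.
function renderPriorResearch(insights) {
  if (insights.length === 0) return '';
  const ranked = [...insights].sort(
    (a, b) =>
      SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity] ||
      b.frequency - a.frequency
  );
  const lines = ranked.map(
    (i) => `- [${i.severity}] ${i.finding} (${i.frequency} participants)`
  );
  return ['<prior-research>', ...lines, '</prior-research>'].join('\n');
}
```

Copying the array before sorting keeps the caller's list untouched.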

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hegemonart/get-design-done",
-  "version": "1.37.2",
+  "version": "1.38.5",
   "description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
   "author": "Hegemon",
   "homepage": "https://github.com/hegemonart/get-design-done",

package/reference/design-variants.md ADDED

@@ -0,0 +1,56 @@
+# Design Variants + the `design_arms` Outcome Loop
+How `/gdd:design --variants N` generates competing, hypothesis-tagged design variants, and how external outcomes (A/B experiments + user research) feed the `design_arms` posterior so the design stage learns **which patterns win with users** — not just which pass lint/test. The posterior math lives in `scripts/lib/ds-arms/design-arms-store.cjs`; the ingest agents are `agents/experiment-result-ingester.md` (A/B) + `agents/user-research-synthesizer.md` (research).
+---
+## The variant tag
+When `--variants N` is set (default **N = 2**, the A/B baseline), the design stage emits N competing variants, each carrying an **explicit, testable hypothesis**:
+```html
+<variant id="A" component="primary-cta" pattern="cta-bold-filled"
+         hypothesis="A bolder, filled primary CTA raises checkout conversion" />
+<variant id="B" component="primary-cta" pattern="cta-outline-secondary"
+         hypothesis="A lower-pressure outline CTA reduces accidental taps" />
+```
+- **`id`** — A, B, C… (stable within a cycle).
+- **`component`** — the `component_type` the arm is keyed on (e.g. `primary-cta`, `pricing-card`, `signup-form`).
+- **`pattern`** — a short pattern slug → hashed to `variant_pattern_hash` via `variantKey(component, pattern)`.
+- **`hypothesis`** — a falsifiable prediction in user-outcome terms (conversion, completion, error rate). This is the contract an A/B test or research finding later resolves.
+A variant without a hypothesis is not a variant — it's an opinion. The stage refuses to tag one without it.
+## The `design_arms` posterior (advisory)
+Each `(component_type, variant_pattern_hash)` is an **arm** with a Beta posterior, conservative **Beta(2, 8)** prior (posterior mean 0.2 — a pattern must EARN trust from real outcomes; the Phase 29 fairness-gate pattern). Distinct from the routing bandit's `routing_arms` (`scripts/lib/bandit-router.cjs`) — design_arms learn from **users**, not from internal pass/fail.
+**Before generation**, the design stage may consult the posterior:
+```js
+const { variantKey, pull } = require('scripts/lib/ds-arms/design-arms-store.cjs');
+const arm = pull('primary-cta', variantKey('primary-cta', 'cta-bold-filled'));
+// arm.mean = 0.70  → "cta-bold-filled has a 70% win rate — bias toward it"
+```
+**D-03 — advisory, never directive.** The posterior *biases* which patterns the stage proposes (and how it orders variants), but the **user always wins**: if the posterior favors A and the user asks for B, generate B. Surface the posterior as a note ("heads-up: pattern A has won 7/10 prior experiments"), never as a veto.
+## Closing the loop (outcome ingest)
+1. **A/B** — `experiment-result-ingester` reads a LaunchDarkly / Statsig / GrowthBook result, maps each variant to a win/lose, and calls `observe(component, hash, { won, source: 'ab' })`. Emits an `experiment_result` event (Phase 22 chain).
+2. **Research** — `user-research-synthesizer` reads UserTesting / Maze / Hotjar reports (**pseudonymized first** — D-05), extracts findings, and folds qualitative signal as `observe(..., { source: 'research', weight })`.
+3. **Dev-time** (Phase 47, later) — live-accepted variants observe with `source: 'dev_time'` under a conservative discount.
+`observe(..., { won: true })` increments `alpha`; `won: false` increments `beta`. Over many experiments the posterior mean converges on the pattern's true win rate, and the design stage's bias tracks reality.
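The update rule above can be sketched in a few lines. This is a minimal illustration, not the code in `design-arms-store.cjs`: the names `DESIGN_PRIOR`, `foldOutcome`, and `posteriorMean` are hypothetical, and scaling the increment by `weight` (default 1) is an assumption based on the `weight` option in the store API.

```javascript
// Hypothetical sketch of a Beta-posterior update; not the package's implementation.
const DESIGN_PRIOR = { alpha: 2, beta: 8 }; // the Beta(2, 8) prior described above

// Fold one outcome: a win adds to alpha, a loss adds to beta.
// Scaling by `weight` is an assumption (default 1 = one full observation).
function foldOutcome(arm, won, weight = 1) {
  return won
    ? { alpha: arm.alpha + weight, beta: arm.beta }
    : { alpha: arm.alpha, beta: arm.beta + weight };
}

// Mean of Beta(alpha, beta) is alpha / (alpha + beta).
function posteriorMean(arm) {
  return arm.alpha / (arm.alpha + arm.beta);
}

let demoArm = { ...DESIGN_PRIOR };
console.log(posteriorMean(demoArm)); // 0.2 (the prior mean)
for (let i = 0; i < 5; i++) demoArm = foldOutcome(demoArm, true);
console.log(demoArm); // { alpha: 7, beta: 8 }
```

Five straight wins move the mean from 0.2 to 7/15, still below 0.5, which is the sense in which the prior makes a pattern earn trust.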
+## Store API (summary)
+| Function | Purpose |
+|---|---|
+| `variantKey(componentType, pattern)` | stable arm key (inline FNV-1a; dependency-free) |
+| `pull(componentType, hash)` | the arm's `{ alpha, beta, mean, count, seen }` (Beta(2,8) prior if unseen) |
+| `observe(componentType, hash, { won, weight?, source? })` | fold one outcome; persists atomically |
+| `all()` | every arm + posterior mean (for ranking) |
+Persists to `.design/telemetry/design-arms.json` (atomic write); never touches `posterior.json` (the routing bandit).
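For readers unfamiliar with FNV-1a, the following is a rough sketch of how a dependency-free arm key could be derived. The function names, the `component:pattern` join, and the hex output format are all assumptions; the real `variantKey` may hash different input or format the result differently.

```javascript
// Hypothetical sketch of a 32-bit FNV-1a hash; the real variantKey may differ.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const byte of Buffer.from(text, 'utf8')) {
    hash ^= byte;                        // XOR the byte in first (the "1a" order)
    hash = Math.imul(hash, 0x01000193);  // multiply by the FNV prime, wrapping at 32 bits
  }
  // Convert to unsigned and render as 8 hex digits.
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Assumed key shape: hash of "componentType:pattern".
function sketchVariantKey(componentType, pattern) {
  return fnv1a32(`${componentType}:${pattern}`);
}

console.log(sketchVariantKey('primary-cta', 'cta-bold-filled'));
```

The property that matters for the store is stability: the same `(componentType, pattern)` pair always yields the same key, so outcomes from separate runs fold into the same arm.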

package/reference/registry.json CHANGED

@@ -979,6 +979,20 @@
       "type": "heuristic",
       "phase": 37.2,
       "description": "Phase 37.2 greenfield DS emission rules for /gdd:bootstrap-ds + agents/ds-generator.md: primary→9 OKLCH tints (native oklch(), no color library), never >2 brand colors, neutrals + semantic colors, modular type scale (ratio 1.2/1.25/1.333), 4pt/8pt spacing, radius + motion defaults, the 3 variants (conservative/balanced/bold), role-named CSS-custom-property emission + framework mapping, and button/input/card proof scaffolding. Deterministic math in scripts/lib/ds/token-scale.cjs."
+    },
+    {
+      "name": "design-variants",
+      "path": "reference/design-variants.md",
+      "type": "heuristic",
+      "phase": 38,
+      "description": "Phase 38 design-variants schema + the design_arms outcome loop: /gdd:design --variants N emits N hypothesis-tagged competing variants (<variant id component pattern hypothesis>); each (component_type, variant_pattern_hash) is a Beta(2,8) arm in scripts/lib/ds-arms/design-arms-store.cjs that learns which patterns win with USERS from A/B (experiment-result-ingester) + user-research (user-research-synthesizer) outcomes. Advisory not directive (the user always wins, D-03). Distinct from the routing bandit."
+    },
+    {
+      "name": "rollout-coordination",
+      "path": "reference/rollout-coordination.md",
+      "type": "heuristic",
+      "phase": 38.5,
+      "description": "Phase 38.5 rollout-coordination contract: the <rollout_status> STATE block (unrolled/staging-only/canary-N%/prod-100%), stuck detection (default 14 days), linear deployed_pct weighting feeding design_arms via verify_outcome (a 10%-rolled variant counts 0.1), and the rollout_started/advanced/stuck/verify_outcome events. Classifier scripts/lib/rollout/rollout-status.cjs; agent agents/rollout-coordinator.md; skill /gdd:rollout-status. Read-only — GDD never drives the rollout."
     }
   ]
 }

package/reference/rollout-coordination.md ADDED

@@ -0,0 +1,63 @@
+# Rollout Coordination — the `<rollout_status>` contract + the verify→prod loop
+How GDD tracks a design from "PR merged" to "live for 100% of users", and feeds the **actual deployment percentage** back into the `design_arms` posterior so a variant's reward reflects how widely it was really shipped. The deterministic classifier is `scripts/lib/rollout/rollout-status.cjs`; the orchestration is `agents/rollout-coordinator.md` + `/gdd:rollout-status`. GDD **reads** the feature-flag service (via the Phase 38 LaunchDarkly/Statsig/GrowthBook connections) — it never drives the rollout.
+---
+## Rollout states
+A cycle's rollout is classified from a normalized flag state (`{ stagingEnabled, prodEnabled, prodPercent }`):
+| State | Meaning |
+|---|---|
+| `unrolled` | not in staging or prod |
+| `staging-only` | enabled in staging (or prod-enabled at 0%) — no prod traffic |
+| `canary-N%` | live to N% of prod (0 < N < 100) |
+| `prod-100%` | fully rolled out |
+`deployedPct(flagState)` returns the live prod percentage (0 when not in prod).
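The state table can be read as a small decision procedure. The sketch below is one way to write it, assuming the normalized flag shape shown above; `sketchDeployedPct` and `classifyRollout` are hypothetical names, and clamping to 0..100 is an assumption. It is not the code in `rollout-status.cjs`.

```javascript
// Hypothetical sketch of the state table; not the package's classifier.
function sketchDeployedPct(flagState) {
  if (!flagState.prodEnabled) return 0; // not in prod means no live traffic
  // Clamping to 0..100 is an assumption for robustness.
  return Math.min(100, Math.max(0, flagState.prodPercent));
}

function classifyRollout(flagState) {
  const pct = sketchDeployedPct(flagState);
  if (pct >= 100) return 'prod-100%';
  if (pct > 0) return `canary-${pct}%`;
  // Enabled in staging, or prod-enabled at 0%: no prod traffic yet.
  if (flagState.stagingEnabled || flagState.prodEnabled) return 'staging-only';
  return 'unrolled';
}

console.log(classifyRollout({ stagingEnabled: true, prodEnabled: true, prodPercent: 10 }));
```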
+## The `<rollout_status>` STATE block
+The coordinator writes one block per cycle into `.design/STATE.md`:
+```xml
+<rollout_status>
+cycle: 2026-06-checkout-redesign
+state: canary-10%
+deployed_pct: 10
+flag_service: launchdarkly
+last_changed: 2026-05-20
+stuck: false
+</rollout_status>
+```
+`state` ∈ the four states above; `deployed_pct` 0–100; `stuck` is `true` when a **partial** rollout (`staging-only`/`canary-N%`) has not advanced for ≥ the threshold.
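Because the block is plain `key: value` text between two tags, reading it back is straightforward. The parser below is a hypothetical sketch (the name `parseRolloutStatus` and the returned shape are assumptions), intended only to show how the fields map to typed values.

```javascript
// Hypothetical sketch: read a <rollout_status> block out of STATE.md text.
function parseRolloutStatus(markdown) {
  const match = markdown.match(/<rollout_status>\n([\s\S]*?)\n<\/rollout_status>/);
  if (!match) return null; // no block present

  const fields = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip lines that are not key: value
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // Coerce the two non-string fields.
  return {
    ...fields,
    deployed_pct: Number(fields.deployed_pct),
    stuck: fields.stuck === 'true',
  };
}

const sampleState = [
  '<rollout_status>',
  'cycle: 2026-06-checkout-redesign',
  'state: canary-10%',
  'deployed_pct: 10',
  'flag_service: launchdarkly',
  'last_changed: 2026-05-20',
  'stuck: false',
  '</rollout_status>',
].join('\n');
console.log(parseRolloutStatus(sampleState));
```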
+## Stuck detection
+`isStuck(state, daysSinceChange, threshold)` returns true for a partial rollout that has not progressed for ≥ `threshold` days (default **14**, configurable via `rollout.stuck_days` in `.design/config.json`). `prod-100%` and `unrolled` are never stuck. `/gdd:rollout-status` surfaces stuck cycles ("canary-10% for 18 days: advance or roll back?"). GDD **notifies**; it does not auto-advance or roll back (read-only, per D-02).
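The stuck rule is small enough to sketch directly. This is an illustration under the stated rule, not the package's function: `sketchIsStuck` is a hypothetical name, and detecting a canary by its `canary-` prefix is an assumption about the state string.

```javascript
// Hypothetical sketch of the stuck rule; not the package's isStuck.
function sketchIsStuck(state, daysSinceChange, threshold = 14) {
  // Only partial rollouts can be stuck; 'unrolled' and 'prod-100%' never are.
  const partial = state === 'staging-only' || state.startsWith('canary-');
  return partial && daysSinceChange >= threshold;
}

console.log(sketchIsStuck('canary-10%', 18)); // true
console.log(sketchIsStuck('prod-100%', 90));  // false
```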
+## Feeding `design_arms` (deployed_pct weighting)
+When a cycle's variant reaches prod, the coordinator folds the outcome into the `design_arms` posterior weighted by how widely it deployed (D-03, linear):
+```js
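+// Paths assume the caller sits at the package root; adjust otherwise.
+const { deployedWeight } = require('./scripts/lib/rollout/rollout-status.cjs');
+const { variantKey, observe } = require('./scripts/lib/ds-arms/design-arms-store.cjs');
+// component, pattern, won, and deployed_pct come from the cycle being verified.
+observe(component, variantKey(component, pattern),
+        { won, weight: deployedWeight(deployed_pct), source: 'verify_outcome' });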
+```
+`deployedWeight(pct) = pct/100` — a variant rolled to 10% contributes a 0.1-weight observation; a fully-rolled variant contributes 1.0. This keeps the bandit honest: a "win" that only reached 10% of users is weak evidence. `verify_outcome` is a **slow-loop** reward, distinct from internal lint/test signals.
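To make the effect of the weighting concrete, the sketch below applies a weighted win to a fresh Beta(2, 8) arm. The helper names are hypothetical, the clamp to 0..100 is an assumption, and so is the idea that a weighted win adds `weight` to `alpha`.

```javascript
// Hypothetical sketch of linear deployed_pct weighting; not the package's code.
function sketchDeployedWeight(pct) {
  const clamped = Math.min(100, Math.max(0, pct)); // clamp is an assumption
  return clamped / 100;
}

// Mean of a Beta(2, 8) arm after a single win of the given weight
// (assumes a weighted win adds `weight` to alpha).
function meanAfterWeightedWin(pct) {
  const alpha = 2 + sketchDeployedWeight(pct);
  const beta = 8;
  return alpha / (alpha + beta);
}

console.log(meanAfterWeightedWin(10).toFixed(3));  // "0.208"
console.log(meanAfterWeightedWin(100).toFixed(3)); // "0.273"
```

Under these assumptions a win at 10% exposure moves the mean from 0.200 to about 0.208, while the same win at full rollout moves it to about 0.273.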
+## Events (Phase 22 chain)
+The coordinator emits free-form `type` events (registered in `reference/schemas/events.schema.json`):
+- `rollout_started` — first prod exposure (`unrolled`/`staging-only` → `canary-N%`).
+- `rollout_advanced` — canary % increased, or → `prod-100%`.
+- `rollout_stuck` — a partial rollout crossed the stuck threshold.
+- `verify_outcome` — the outcome fed to `design_arms` (carries `deployed_pct` + `weight`).
+All event payloads are PII-free (cycle id, component, pattern slug, percentages — no user data).
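The first two event types follow from comparing consecutive rollout states. The function below is a hypothetical sketch of that mapping (`rolloutEventType` is not a name from the package), following the bullet definitions literally: entering a canary from no prod exposure is `rollout_started`, and any other increase, including reaching `prod-100%`, is `rollout_advanced`.

```javascript
// Hypothetical sketch: derive the event type from a state transition.
function rolloutEventType(prevState, nextState) {
  // Extract the prod percentage encoded in a state string.
  const pctOf = (s) => {
    if (s === 'prod-100%') return 100;
    if (s.startsWith('canary-')) return Number(s.slice('canary-'.length, -1));
    return 0; // 'unrolled' and 'staging-only' carry no prod traffic
  };
  const before = pctOf(prevState);
  const after = pctOf(nextState);

  if (before === 0 && after > 0 && after < 100) return 'rollout_started';
  if (after > before) return 'rollout_advanced';
  return null; // no forward progress, so nothing to emit here
}

console.log(rolloutEventType('staging-only', 'canary-10%')); // rollout_started
console.log(rolloutEventType('canary-10%', 'canary-50%'));   // rollout_advanced
```

`rollout_stuck` and `verify_outcome` depend on elapsed time and outcome data rather than on a state transition, so they are left out of this sketch.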

package/reference/schemas/events.schema.json CHANGED

@@ -10,7 +10,7 @@
     "type": {
       "type": "string",
       "minLength": 1,
-      "description": "Free-form event type identifier. Pre-registered seeds: state.mutation, state.transition, stage.entered, stage.exited, hook.fired, error, capability_gap, kfm-candidate, router_pick."
+      "description": "Free-form event type identifier. Pre-registered seeds: state.mutation, state.transition, stage.entered, stage.exited, hook.fired, error, capability_gap, kfm-candidate, router_pick, verify_outcome, rollout_started, rollout_advanced, rollout_stuck."
     },
     "timestamp": {
       "type": "string",