npm - @hegemonart/get-design-done - Versions diffs - 1.32.0 → 1.33.5 - Mend

@hegemonart/get-design-done 1.32.0 → 1.33.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (49)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +57 -0
package/NOTICE +43 -5
package/README.md +13 -0
package/package.json +4 -2
package/reference/gdd-runtime-audit.md +111 -0
package/reference/gdd-threat-model.md +336 -0
package/reference/registry.json +14 -0
package/reference/schemas/pressure-scenario.schema.json +69 -0
package/scripts/lib/peer-cli/acp-client.cjs +9 -1
package/scripts/lib/peer-cli/asp-client.cjs +10 -1
package/scripts/lib/peer-cli/sanitize-env.cjs +198 -0
package/scripts/lib/redact.cjs +20 -1
package/scripts/lib/skill-behavior/runner.cjs +187 -0
package/scripts/lib/skill-behavior/stub-invoker.cjs +95 -0
package/scripts/lib/skill-behavior/telemetry.cjs +379 -0
package/scripts/lib/transports/ws.cjs +67 -3
package/sdk/mcp/gdd-state/schemas/add_blocker.schema.json +2 -0
package/sdk/mcp/gdd-state/schemas/add_decision.schema.json +1 -0
package/sdk/mcp/gdd-state/schemas/add_must_have.schema.json +1 -0
package/sdk/mcp/gdd-state/schemas/checkpoint.schema.json +1 -0
package/sdk/mcp/gdd-state/schemas/frontmatter_update.schema.json +1 -1
package/sdk/mcp/gdd-state/schemas/get.schema.json +2 -1
package/sdk/mcp/gdd-state/schemas/probe_connections.schema.json +2 -0
package/sdk/mcp/gdd-state/schemas/resolve_blocker.schema.json +1 -0
package/sdk/mcp/gdd-state/server.js +137 -48
package/sdk/mcp/gdd-state/tools/add_blocker.ts +2 -0
package/sdk/mcp/gdd-state/tools/add_decision.ts +2 -0
package/sdk/mcp/gdd-state/tools/add_must_have.ts +2 -0
package/sdk/mcp/gdd-state/tools/checkpoint.ts +2 -0
package/sdk/mcp/gdd-state/tools/frontmatter_update.ts +2 -0
package/sdk/mcp/gdd-state/tools/get.ts +2 -0
package/sdk/mcp/gdd-state/tools/probe_connections.ts +2 -0
package/sdk/mcp/gdd-state/tools/resolve_blocker.ts +2 -0
package/sdk/mcp/gdd-state/tools/set_status.ts +2 -0
package/sdk/mcp/gdd-state/tools/shared.ts +117 -7
package/sdk/mcp/gdd-state/tools/transition_stage.ts +2 -0
package/sdk/mcp/gdd-state/tools/update_progress.ts +2 -0
package/scripts/lib/cli/index.ts +0 -29
package/scripts/lib/error-classifier.cjs +0 -29
package/scripts/lib/event-stream/index.ts +0 -29
package/scripts/lib/gdd-errors/index.ts +0 -29
package/scripts/lib/gdd-state/index.ts +0 -29
package/scripts/lib/iteration-budget.cjs +0 -29
package/scripts/lib/jittered-backoff.cjs +0 -29
package/scripts/lib/lockfile.cjs +0 -29
package/scripts/mcp-servers/gdd-mcp/server.ts +0 -35
package/scripts/mcp-servers/gdd-state/server.ts +0 -34

package/.claude-plugin/marketplace.json CHANGED

@@ -5,14 +5,14 @@
   },
   "metadata": {
     "description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
-    "version": "1.32.0"
+    "version": "1.33.5"
   },
   "plugins": [
     {
       "name": "get-design-done",
       "source": "./",
       "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
-      "version": "1.32.0",
+      "version": "1.33.5",
       "author": {
         "name": "hegemonart"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "get-design-done",
   "short_name": "gdd",
-  "version": "1.32.0",
+  "version": "1.33.5",
   "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain. v1.27.7 ships gdd-mcp (Phase 27.7): 12 read-only MCP tools for sub-3s priming. 
v1.28.0 (Phase 28): Foundational References Tier 2 — 5 new reference files (color-theory, composition, proportion-systems, i18n, contrast-advanced), 2 verifier i18n probes + 1 explore i18n-readiness probe, 12 additive cross-link insertions across 10 existing references, 2 orthogonal audit-scoring lens-tags (composition_alignment + i18n_readiness).",
   "author": {
     "name": "hegemonart",

package/CHANGELOG.md CHANGED

@@ -4,6 +4,63 @@ All notable changes to get-design-done are documented here. Versions follow [sem
 ---
+## [1.33.5] - 2026-05-31
+### Phase 33.5 — GDD Runtime Security Hardening
+Audits and hardens GDD's **own** runtime attack surface — the multi-MCP-server, peer-CLI-spawning, WebSocket-transport-emitting SDK that grew across Phases 20–27 without a formalized security model. Phase 14.5 ships a safety floor for *user code being audited*; this phase is the equivalent for GDD's *own* runtime. Ships a STRIDE threat model + static runtime audit, an outbound-network CI gate, WebSocket bind hardening, gdd-state MCP input validation, peer-CLI env sandboxing, a secret-scan extension + fuzz, a published disclosure policy, and a regression baseline. A decimal release on the v1.33.x arc (CHANGELOG-only, D-01); no new runtime dependency (built-ins only, D-12). 6 plans across Waves A–C.
+### Added
+- **STRIDE threat model + runtime audit.** `reference/gdd-threat-model.md` models the five in-scope runtime components (hooks, the `gdd-state` MCP server, the peer-CLI broker, the WebSocket transport, the issue-reporter outbound path) with a per-component Assets / Entry-points / STRIDE / Mitigations / Residual treatment and a residual-risk → closing-plan map; `reference/gdd-runtime-audit.md` records the static audit that backs it. Both are registry-tracked.
+- **Outbound-network CI gate.** `scripts/scan-outbound-network.cjs` + `npm run scan:outbound`, wired into the CI `security` job, statically gates **active egress** (require/import of `node:http(s)`/`ws`/`node-fetch`/`axios`/`undici`, `fetch(`, `http(s).get/.request`, `new WebSocket(Server)`, and child-process spawn of `gh`/`curl`/`wget`/`nc`/`ssh`/`scp`) under `hooks/ scripts/ sdk/ bin/`, failing on any active-egress site not under an allowlisted glob in `scripts/security/outbound-allowlist.json` (D-06). Comment-only matches and bare-URL literals are skipped (a URL cannot exfiltrate without a call). This forces Phase 33.6's OpenRouter REST client to explicitly allowlist its new egress.
+- **WebSocket localhost-default bind + timing-safe token.** `scripts/lib/transports/ws.cjs` now defaults its bind host to **127.0.0.1** (was the implicit `0.0.0.0` all-interfaces bind); a remote bind is opt-in via `.design/config.json` `event_stream.bind_host` or the `GDD_WS_BIND_HOST` env override, and `scripts/scan-ws-bind.cjs` gates that the default config would never bind `0.0.0.0`. The Bearer-token compare is upgraded to `crypto.timingSafeEqual` (was `!==`, timing-unsafe); the existing ≥8-char-token rule is retained (D-04).
+- **gdd-state MCP input validation.** `sdk/mcp/gdd-state/tools/shared.ts` gains a `resolveStatePath()` path-traversal guard (rejects `..`-escape / absolute-outside / best-effort symlink-escape on the `GDD_STATE_PATH` override) and an `assertInputWithinLimits()` payload cap (64 KiB input / 8192-char string / depth-32 JSON-bomb guard) wired into all 11 tool handlers; the 11 tool input schemas are tightened with `maxLength`/`maxItems` (and already carried `additionalProperties:false`) (D-08).
+- **Peer-CLI env sandbox (allowlist-forward / default-deny).** A shared `scripts/lib/peer-cli/sanitize-env.cjs` helper builds the spawned child's environment from an OS-essential baseline **plus** an explicit allowlist read from `.design/config.json` `peer_cli.env_allowlist`, applied to both the acp and asp clients. GDD's `ANTHROPIC_API_KEY` / `GH_TOKEN` / `GDD_*` and anything secret-shaped are never forwarded to a spawned peer unless explicitly allowlisted (D-03).
+- **Secret-scan extension + fuzz.** `scripts/lib/redact.cjs` adds three modern token formats — Gemini/GCP `AIza…`, GitHub fine-grained `github_pat_…`, and GitHub server/oauth/user/refresh `gh[sour]_…` — bringing the redaction set to **11 patterns**, with a synthetic-secret fuzz test asserting zero leak per provider format (D-07). The existing PEM/JWT/anthropic/stripe/slack/`ghp_`/AWS/`sk-` patterns are retained.
+- **`SECURITY.md` disclosure policy.** A repo-root `SECURITY.md` documents the supported-versions stance and routes vulnerability reports through **GitHub private security advisories** (the repo Security tab → "Report a vulnerability"); it publishes no email / no PII (D-02) and notes that enabling private vulnerability reporting is a one-line repo setting the maintainer must toggle (D-11).
+- **Regression baseline.** `test/fixtures/baselines/phase-33-5/` freezes the hardened surface — a STRIDE-checklist snapshot, a hardening-surface invariant manifest (ws default host, redact pattern count, sanitize-env module + allowlist key, 11 gdd-state schemas, allowlist/threat-model/audit paths), and a synthetic secret-fuzz corpus — pinned by `test/suite/phase-33-5-baseline.test.cjs` so a future change cannot silently undo a hardened surface.
+### Notes
+- **WebSocket event-stream now binds `127.0.0.1` by default (was `0.0.0.0`).** Opt into a remote bind via `.design/config.json` `event_stream.bind_host` or the `GDD_WS_BIND_HOST` env override. This is **not marked BREAKING** — the event stream is a token-gated observability surface (Bearer auth ships on every upgrade), and the safe-by-default localhost bind only restricts an unintended off-box exposure; the opt-in escape hatch preserves the remote-bind workflow for anyone who relied on it.
+- All Phase 33.5 tests are hermetic (no network, no live peer; the WS test binds ephemeral localhost; the outbound + bind gates are static scans), so the default `npm test` stays green (D-10).
+- 6-manifest lockstep at **v1.33.5** (`package.json` + `package-lock.json` (root + `packages.""`) + `.claude-plugin/plugin.json` + `.claude-plugin/marketplace.json` (metadata.version + plugins[0].version) + `.cursor-plugin/plugin.json` + `.codex-plugin/plugin.json`). Version-sync hygiene done upfront (D-09): `OFF_CADENCE_VERSIONS.add('1.33.5')` + the 13 live-pinned `manifests-version.txt` baselines forward-propagated 1.33.0 → 1.33.5.
+---
+## [1.33.0] - 2026-05-30
+### Phase 33 — Skill Behavior Tests (Pressure-Scenario Harness)
+Adds a **behavior-test category** that complements the static validators (Phase 28.5 line/frontmatter) and static guardrails (Phase 32 `<HARD-GATE>` presence) with tests that verify skills hold UNDER PRESSURE. A manifest-driven runner drives a pressure scenario (time / sunk-cost / authority / exhaustion / scope-minimization) through an injectable agent-invoker and validates the response against a compliance/violation rubric with N-attempts + majority rule. Ships the harness + 8 baseline scenarios + synthetic RED baselines + the description-format A/B methodology + reflector telemetry integration. Ports the TDD-for-skills methodology + the pressure-scenario pattern from [`obra/superpowers/skills/writing-skills`](https://github.com/obra/superpowers) (MIT). 6 plans across Waves A–C.
+### Added
+- **Manifest-driven pressure-scenario runner** — `scripts/lib/skill-behavior/runner.cjs` exposes an INJECTABLE `invokeAgent(prompt, opts) -> { text }` seam (no `@anthropic-ai/sdk` dependency — D-03): a deterministic STUB invoker (`scripts/lib/skill-behavior/stub-invoker.cjs`) for CI/tests, plus a documented real-invoker adapter for the opt-in keyed run. Runs each scenario N times and decides compliance by majority.
+- **Pressure-scenario schema** — `reference/schemas/pressure-scenario.schema.json` (wired into `validate:schemas`), with conformance tests for the 8 scenario manifests.
+- **8 pressure scenarios + synthetic RED baselines** — `test/suite/skill-behavior/scenarios/` (7 stage skills + `using-gdd`) with synthetic-from-observed-cycle-drift RED baselines at `test/fixtures/skill-behavior-baseline/` (D-02 — ROADMAP-sanctioned).
+- **Description-format A/B methodology** — `docs/research/description-format-ab.md` documents the trigger-only vs `<what>. Use when` counterfactual + the 7/10-run threshold (D-08), with a `pending: keyed run` marker. The empirical result is an opt-in maintainer follow-up (no API key in CI).
+- **Reflector telemetry** — `scripts/lib/skill-behavior/telemetry.cjs` emits to `.design/telemetry/skill-behavior.jsonl`; a sustained-failure signal (≥3 of last 10 runs failing for a scenario) feeds an `apply-reflections` proposal (stub-tested integration — D-07).
+- **`npm run test:behavior` (opt-in, D-06).** A new script that runs the behavior tests ONLY when `ANTHROPIC_API_KEY` is set (a clear skip message + exit 0 otherwise). The default `npm test` is UNCHANGED — the structural stub tests stay CI-green (LLM non-determinism keeps live behavior runs out of the default suite).
+- **Docs** — `CONTRIBUTING.md` gains a "How to add a pressure scenario" section + the keyed `ANTHROPIC_API_KEY=… npm run test:behavior` procedure; `README.md` gains a "Skill behavior tests" subsection.
+### Removed
+- **BREAKING: the Phase-31.5 deprecation shims are removed (D-04).** The 10 `GDD-DEPRECATION-SHIM` re-exports re-created at the OLD SDK paths in v1.31.5 — `scripts/lib/{cli,event-stream,gdd-state,gdd-errors}/index.ts`, `scripts/lib/{error-classifier,iteration-budget,jittered-backoff,lockfile}.cjs`, and `scripts/mcp-servers/{gdd-state,gdd-mcp}/server.ts` — are deleted. The grace window elapsed (v1.31.5 shipped with shims → v1.32.0 still had them → v1.33.0 removes them). The now-empty `scripts/mcp-servers/` is dropped from the `package.json` `files` allowlist. **If you imported `scripts/lib/…` or `scripts/mcp-servers/…` directly, import from `sdk/…` instead** (e.g. `scripts/lib/cli` → `sdk/cli`, `scripts/lib/error-classifier.cjs` → `sdk/primitives/error-classifier.cjs`, `scripts/mcp-servers/gdd-state/server.ts` → `sdk/mcp/gdd-state/server.ts`). Internal callers were all repointed to `sdk/` in 31.5 + the Phase-32 gdd-events fix; the `gdd-state-mcp` / `gdd-mcp` bins target `sdk/`, so deletion drops only the external re-export — proven by the `no-stale-internal-refs` guard + the full suite + the 31.5 headless pack→install→run E2E.
+### Attribution
+- **Methodology + pattern ported from [`obra/superpowers/skills/writing-skills`](https://github.com/obra/superpowers) (MIT).** The TDD-for-skills cycle (RED: agent fails without the skill → GREEN: skill counters the rationalizations → REFACTOR: close new loopholes) and the pressure-scenario pattern. See `NOTICE`. We port the methodology, not the content — GDD's scenarios, rubrics, and skills are GDD-specific.
+### Notes
+- The behavioral evidence (real RED baselines from live agent runs + the empirical A/B result) is NOT capturable autonomously (no API key / SDK in CI). RED baselines are authored synthetic-from-observed-cycle-drift (D-02); the A/B evidence file documents methodology + expected-signal + a `pending: keyed run` marker. A Phase-28.5 feedback note points at `docs/research/description-format-ab.md`; **Phase 28.5's description-format validator regex is unchanged** (33-06 emits the pointer only — D-08).
+- The 31.5 tarball golden (`test/fixtures/baselines/phase-31-5/tarball-manifest.txt`) was regenerated as a reviewed delta: **+4** skill-behavior paths (`reference/schemas/pressure-scenario.schema.json` + the 3 `scripts/lib/skill-behavior/*.cjs`) and **−10** removed shim paths (618 paths).
+- 6-manifest lockstep at **v1.33.0** (`package.json` + `package-lock.json` + `.claude-plugin/plugin.json` + `.claude-plugin/marketplace.json` (metadata.version + plugins[0].version) + `.cursor-plugin/plugin.json` + `.codex-plugin/plugin.json`). Version-sync hygiene done upfront (D-09): `OFF_CADENCE_VERSIONS.add('1.33.0')` + prior `manifests-version.txt` baselines forward-propagated 1.32.0 → 1.33.0.
+---
 ## [1.32.0] - 2026-05-30
 ### Phase 32 — Skill Auto-Trigger Discipline + Defensive Guardrails

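The 1.33.5 changelog entry above says the Bearer-token compare in `scripts/lib/transports/ws.cjs` moved from `!==` to `crypto.timingSafeEqual`. The diff for that file is not reproduced in this excerpt, so the following is only a minimal sketch of the general technique, not the package's code. `tokensMatch` and `bearerToken` are hypothetical names, and hashing both sides before comparing is an assumption made here because `crypto.timingSafeEqual` throws when its two inputs differ in length.

```javascript
'use strict';
const crypto = require('node:crypto');

// Hypothetical helper (name and shape are assumptions, not taken from ws.cjs).
// Compares a presented token against the expected one in constant time.
function tokensMatch(presented, expected) {
  if (typeof presented !== 'string' || typeof expected !== 'string') return false;
  // Mirror the changelog's minimum-length rule (tokens of at least 8 chars).
  if (expected.length < 8) return false;
  // Hash both sides so the two buffers always have equal length;
  // crypto.timingSafeEqual throws a RangeError on a length mismatch.
  const a = crypto.createHash('sha256').update(presented, 'utf8').digest();
  const b = crypto.createHash('sha256').update(expected, 'utf8').digest();
  return crypto.timingSafeEqual(a, b);
}

// Hypothetical helper: pull the token out of an "Authorization: Bearer <token>" value.
function bearerToken(headerValue) {
  if (typeof headerValue !== 'string') return null;
  const m = /^Bearer\s+(\S+)$/i.exec(headerValue.trim());
  return m ? m[1] : null;
}

module.exports = { tokensMatch, bearerToken };
```

A plain `!==` on strings can return as soon as the first differing character is found, which is what makes it timing-unsafe; the constant-time compare removes that early exit.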
package/NOTICE CHANGED

@@ -249,14 +249,52 @@ Three ported artifacts:
 The mechanism is the contribution being attributed; the discipline content is
 original to get-design-done.
+──────────────────────────────────────────────────────────────────────────────
+Phase 33 — Skill Behavior Tests (Pressure-Scenario Harness) (v1.33.0, 2026-05-30)
+──────────────────────────────────────────────────────────────────────────────
+The skill-behavior pressure-scenario harness shipped in v1.33.0 ports the
+TDD-for-skills METHODOLOGY and the pressure-scenario PATTERN (not the content)
+from:
+  obra/superpowers/skills/writing-skills (https://github.com/obra/superpowers)
+  License: MIT
+writing-skills codifies the TDD-for-skills cycle (RED: an agent fails the task
+without the skill → GREEN: the skill counters those specific rationalizations →
+REFACTOR: close newly-discovered loopholes) and the pattern of testing a skill
+UNDER PRESSURE (time / sunk-cost / authority / exhaustion / scope-minimization)
+rather than only statically. We re-derive the methodology + pattern in GDD's own
+runtime and skill set:
+  scripts/lib/skill-behavior/runner.cjs
+    └─ The manifest-driven pressure-scenario runner (injectable agent-invoker
+       seam, N-attempts + majority rule, RED→GREEN structured result) adapts
+       writing-skills' TDD-for-skills test loop. GDD content: the injectable
+       invoker seam (no SDK dependency — D-03), the scenario-manifest schema,
+       and the stub-LLM CI path.
+  test/suite/skill-behavior/scenarios/*.json
+    └─ The pressure-scenario manifest pattern (a scenario applies a named
+       pressure to a skill and scores compliance vs violation against a rubric)
+       adapts writing-skills' pressure-test pattern. The specific scenarios,
+       pressures, rubrics, and the 8 covered skills are GDD-specific.
+  reference/schemas/pressure-scenario.schema.json
+    └─ The scenario-manifest contract formalizing the pattern. GDD original.
+The methodology + pattern are the contribution being attributed; the scenarios,
+rubrics, runner implementation, and skills are original to get-design-done.
 ────────────────────────────────────────────────────────────────────────
 Note on the broader codebase: get-design-done as a whole is licensed under
 the MIT License (see LICENSE). The Apache 2.0 attribution above applies
 specifically to the cc-multi-cli-derived files listed under the Phase 27
-block. The MIT attributions under Phase 28.5, Phase 28.7, and Phase 32 cover
-content/mechanism adapted from mattpocock/skills (MIT), gsd-build/get-shit-done
-(MIT), and obra/superpowers (MIT) respectively — the MIT-to-MIT re-licensing is
-straightforward and the attributions above provide the required source
-citation. The MIT and Apache 2.0 licenses are compatible — see
+block. The MIT attributions under Phase 28.5, Phase 28.7, Phase 32, and
+Phase 33 cover content/mechanism/methodology adapted from mattpocock/skills
+(MIT), gsd-build/get-shit-done (MIT), obra/superpowers (MIT), and
+obra/superpowers/skills/writing-skills (MIT) respectively — the MIT-to-MIT
+re-licensing is straightforward and the attributions above provide the
+required source citation. The MIT and Apache 2.0 licenses are compatible — see
 https://www.apache.org/legal/resolved.html#category-a.

package/README.md CHANGED

@@ -288,6 +288,19 @@ GDD ships 70+ skills, but a description-match skill router consults them opportu
 See [`skills/using-gdd/SKILL.md`](skills/using-gdd/SKILL.md) and the `NOTICE` attribution for details.
+### Skill behavior tests (v1.33.0+)
+Static validators check a skill's shape; **behavior tests** check that it holds under pressure. v1.33.0 adds a manifest-driven pressure-scenario harness (porting the TDD-for-skills methodology + pressure-scenario pattern from [`obra/superpowers/skills/writing-skills`](https://github.com/obra/superpowers), MIT): a runner drives a scenario (time / sunk-cost / authority / exhaustion / scope-minimization) through an injectable agent-invoker and scores the response against a compliance/violation rubric with N-attempts + majority rule. Ships 8 scenarios (7 stage skills + `using-gdd`) with synthetic RED baselines.
+Behavior tests are **opt-in** and key-gated — the default `npm test` stub suite covers the harness structurally and stays CI-green (LLM non-determinism keeps live runs out of CI). To run the live pass:
+```bash
+# Skips + exits 0 when ANTHROPIC_API_KEY is unset.
+ANTHROPIC_API_KEY=sk-... GDD_BEHAVIOR_INVOKER=./path/to/invoker.cjs npm run test:behavior
+```
+See [`docs/research/description-format-ab.md`](docs/research/description-format-ab.md) for the description-format A/B methodology and [`CONTRIBUTING.md`](CONTRIBUTING.md) ("How to add a pressure scenario").
 ## How It Works

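The README hunk above describes a runner that drives a scenario through an injectable `invokeAgent(prompt, opts) -> { text }` seam and decides compliance by N attempts plus majority rule. As a rough illustration of that idea only (not the contents of `scripts/lib/skill-behavior/runner.cjs`), the sketch below assumes a hypothetical scenario shape with `prompt`, `compliance`, and `violation` fields and a simple substring rubric; the real manifest schema and scoring are not shown in this excerpt.

```javascript
'use strict';

// Hypothetical scenario shape (an assumption for illustration):
//   { prompt: string, compliance: string[], violation: string[] }

// Score one response: compliant when it contains at least one compliance
// marker and no violation marker (case-insensitive substring match).
function isCompliant(text, scenario) {
  const lower = text.toLowerCase();
  const hit = (markers) => markers.some((m) => lower.includes(m.toLowerCase()));
  return hit(scenario.compliance) && !hit(scenario.violation);
}

// Run the scenario `attempts` times through the injected invoker and
// decide the overall verdict by strict majority.
async function runScenario(scenario, invokeAgent, attempts = 3) {
  let passes = 0;
  for (let i = 0; i < attempts; i++) {
    const { text } = await invokeAgent(scenario.prompt, { attempt: i });
    if (isCompliant(text, scenario)) passes++;
  }
  return { attempts, passes, compliant: passes * 2 > attempts };
}

// Deterministic stub invoker for tests: replays canned responses by attempt index.
function makeStubInvoker(responses) {
  return async (_prompt, opts) => ({ text: responses[opts.attempt % responses.length] });
}

module.exports = { isCompliant, runScenario, makeStubInvoker };
```

Because the invoker is passed in, the same `runScenario` call can take a stub in CI and a key-gated real adapter in an opt-in run, which matches the split the README describes between the default `npm test` and `npm run test:behavior`.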
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hegemonart/get-design-done",
-  "version": "1.32.0",
+  "version": "1.33.5",
   "description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
   "author": "Hegemon",
   "homepage": "https://github.com/hegemonart/get-design-done",
@@ -24,7 +24,6 @@
     "recipes/",
     "docs/i18n/",
     "scripts/lib/",
-    "scripts/mcp-servers/",
     "scripts/cli/",
     "scripts/install.cjs",
     "SKILL.md",
@@ -51,6 +50,7 @@
     "prepack": "npm run build:sdk",
     "postpack": "node scripts/build-sdk-bins.cjs --clean",
     "test": "node --test --experimental-strip-types \"test/suite/**/*.test.cjs\" \"test/suite/**/*.test.ts\"",
+    "test:behavior": "node scripts/run-behavior-tests.cjs",
     "typecheck": "tsc --noEmit",
     "codegen:schemas": "node --experimental-strip-types scripts/codegen-schema-types.ts",
     "lint:md": "npx --yes markdownlint-cli2 \"**/*.md\" \"#node_modules\" \"#.planning\" \"#.claude\" \"#test/fixtures/baselines\"",
@@ -60,6 +60,8 @@
     "validate:frontmatter": "node --experimental-strip-types scripts/validate-frontmatter.ts agents/",
     "detect:stale-refs": "node scripts/detect-stale-refs.cjs",
     "scan:injection": "node scripts/run-injection-scanner-ci.cjs",
+    "scan:outbound": "node scripts/scan-outbound-network.cjs",
+    "scan:ws-bind": "node scripts/scan-ws-bind.cjs",
     "test:size-budget": "node --test test/suite/agent-size-budget.test.cjs",
     "release:extract-changelog": "node scripts/extract-changelog-section.cjs",
     "verify:version-sync": "node scripts/verify-version-sync.cjs",

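The `scan:outbound` script added above points at `scripts/scan-outbound-network.cjs`, which the changelog describes as a static gate on active egress that skips comment-only matches and bare URL literals. That script is not part of this excerpt, so the following is a deliberately naive sketch of what such a line-based scan could look like. `ACTIVE_EGRESS` and `findActiveEgress` are hypothetical names, the regular expressions only approximate the patterns the changelog lists, and the allowlist check against `scripts/security/outbound-allowlist.json` is omitted.

```javascript
'use strict';

// Hypothetical pattern table approximating the changelog's "active egress" list.
const ACTIVE_EGRESS = [
  { kind: 'require-or-import', re: /(?:require\(\s*|from\s+)['"](?:node:)?(?:https?|ws|node-fetch|axios|undici)['"]/ },
  { kind: 'fetch-call', re: /\bfetch\s*\(/ },
  { kind: 'http-request', re: /\bhttps?\.(?:get|request)\s*\(/ },
  { kind: 'websocket', re: /\bnew\s+WebSocket(?:Server)?\b/ },
  { kind: 'spawn-network-cli', re: /\b(?:spawn|spawnSync|exec\w*)\s*\(\s*['"](?:gh|curl|wget|nc|ssh|scp)['"]/ },
];

// Return one finding per matching pattern per line. Comment-only lines are
// skipped, and a bare URL string matches none of the patterns above.
// Naive by design: it does not parse the source, so block comments that
// share a line with code and method calls such as `cache.fetch(` are not
// handled precisely.
function findActiveEgress(source) {
  const findings = [];
  source.split(/\r?\n/).forEach((line, i) => {
    const trimmed = line.trim();
    if (trimmed.startsWith('//') || trimmed.startsWith('/*') || trimmed.startsWith('*')) return;
    for (const { kind, re } of ACTIVE_EGRESS) {
      if (re.test(line)) findings.push({ line: i + 1, kind });
    }
  });
  return findings;
}

module.exports = { ACTIVE_EGRESS, findActiveEgress };
```

A gate built on this idea would fail CI when any finding comes from a file not covered by an allowlisted glob, which is how a new REST client would be forced to declare its egress explicitly.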
package/reference/gdd-runtime-audit.md ADDED

@@ -0,0 +1,111 @@
+# GDD Runtime Static Security Audit
+**Phase:** 33.5 — GDD Runtime Security Hardening · **Plan:** 33.5-02 (Wave A) · **Date:** 2026-05-31
+**Reference:** branch `phase/33-5-runtime-security`
+**Scope:** the SHIPPED runtime surface — `hooks/`, `scripts/`, `sdk/`, `bin/`.
+**Method:** static enumeration (read-only source inspection, no execution, no network).
+**Companions:**
+- `scripts/security/outbound-allowlist.json` — the machine-readable, CANONICAL active-egress
+  allowlist that Phase 33.5-04's `scripts/scan-outbound-network.cjs` gate `JSON.parse`s (this
+  report is its human-readable rationale).
+- `reference/gdd-threat-model.md` — the STRIDE threat model (Phase 33.5-01).
+This audit freezes the egress / secret / external-input picture so any *future* surface that is
+not pre-approved trips the 33.5-04 gate at CI time. It is grounded in the CONTEXT real-tree sweep
+("no raw unexpected egress found"), re-verified against HEAD by reading each cited file.
+---
+## Outbound-network call sites
+GDD's runtime makes outbound calls (or local IPC/server binds) from exactly the sites below. Each
+maps to an `outbound-allowlist.json` directory glob (see the **Allowlisted** column). The 33.5-04
+gate is **ACTIVE-egress-only** — it matches `require`/`import` of `node:http(s)`/`http(s)`/`ws`/
+`node-fetch`/`axios`/`undici`, `fetch(`, `http(s).get/.request(`, `new WebSocket`/`new
+WebSocketServer`/`WebSocketServer(`, and child-process `spawn`/`spawnSync`/`exec*` of
+`gh|curl|wget|nc|ssh|scp` — so a bare URL string alone never trips it.
+| File | What it calls | Transport | Legitimacy | Allowlisted by |
+| --- | --- | --- | --- | --- |
+| `scripts/lib/figma-extract/pull.cjs` | `fetch` → `https://api.figma.com/v1` (read-only) | REST (global `fetch`, injectable) | User-initiated Figma extract; token from `FIGMA_TOKEN`/`FIGMA_PERSONAL_ACCESS_TOKEN`, never persisted/logged | `scripts/lib/figma-extract/**` |
+| `scripts/lib/figma-extract/styles-resolver.cjs` | Figma REST styles lookup (shares `pull.cjs` fetch path) | REST | Same extract flow; read-only | `scripts/lib/figma-extract/**` |
+| `scripts/lib/figma-extract/receiver.cjs` | `require('node:http')` → `http.createServer` + `server.listen(RECEIVER_PORT, RECEIVER_HOST)` | local HTTP server | **EPHEMERAL** handshake server on `127.0.0.1`; lives for one extract run then exits (timeout-armed) | `scripts/lib/figma-extract/**` |
+| `scripts/lib/transports/ws.cjs` | `require('node:http')` + `new WebSocketServer({ noServer })` + `httpServer.listen(opts.port)` | WebSocket over HTTP upgrade | Event-stream transport; **hardened in 33.5-03** (127.0.0.1 default bind, Bearer token ≥8 chars → timing-safe compare) | `scripts/lib/transports/ws.cjs` |
+| `scripts/lib/issue-reporter/gh-submit.cjs` | `spawn('gh', …)` / `spawnSync` | `gh` CLI spawn | Rides `gh`'s own logged-in auth; **no raw HTTP**; frozen destination + kill-switch | `scripts/lib/issue-reporter/**` |
+| `scripts/lib/issue-reporter/dedup.cjs` | `spawn('gh', …)` (3 call sites) | `gh` CLI spawn | Duplicate-issue lookup before submit; same frozen destination | `scripts/lib/issue-reporter/**` |
+| `scripts/lib/issue-reporter/gh-absent-fallback.cjs` | `spawnSync('gh', ['gh'], {stdio:'ignore'})` (presence probe) | `gh` CLI spawn | Detects whether `gh` is installed; no payload | `scripts/lib/issue-reporter/**` |
+| `scripts/lib/peer-cli/acp-client.cjs` | `spawn(command, args, …)` (no shell) | child-process / stdio IPC | Spawns a LOCAL peer binary over stdio (JSON-RPC); **IPC, not network**; env sandboxed (33.5-04) | `scripts/lib/peer-cli/**` |
+| `scripts/lib/peer-cli/asp-client.cjs` | `spawn(...)` (no shell) | child-process / stdio IPC | Local peer over stdio; same env-allowlist sandbox | `scripts/lib/peer-cli/**` |
+| `scripts/e2e/run-headless.ts` | live Anthropic API run | REST (key-gated) | Test infrastructure only; gated on `ANTHROPIC_API_KEY` + main-branch; never in default `npm test` | `scripts/e2e/**` |
+| `scripts/lib/authority-watcher/index.cjs` | authority-feed classification of already-fetched records | (delegated) | The live article fetch is delegated to `agents/design-authority-watcher.md`; `index.cjs` is the pure-CommonJS classifier. Allowlisted so any direct feed fetch added here stays pre-approved | `scripts/lib/authority-watcher/**` |
+**Documented but NOT hard-gated** (the scanner scope is `.js`-family + active-egress only —
+these cannot exfiltrate on their own and are covered by gitleaks + the threat model):
+| File / surface | What it is | Why not gated |
+| --- | --- | --- |
+| `scripts/lib/easings.cjs` (line 6) | React-Native `Easing.js` spec link in a comment | bare URL string, no call |
+| `scripts/lib/spring.cjs` (line 6) | React-Native `SpringConfig.js` spec link in a comment | bare URL string, no call |
+| `scripts/lib/install/merge.cjs` (line 85) | "Plugin repository: https://github.com/hegemonart/get-design-done" printed string | bare URL string, no call |
+| `scripts/lib/issue-reporter/destination.cjs` (lines 29–30) | frozen `DESTINATION_URL` / `ISSUE_TEMPLATE_URL` constants | constants consumed only via the `gh` spawn above |
+| `scripts/lint-agentskills-spec.cjs` (line 76) | spec-example URL in a comment | bare URL string, no call |
+| `hooks/update-check.sh` | shell update-check egress | shell script — outside the `.{js,cjs,mjs,ts}` scanner scope |
+| `scripts/bootstrap.sh` | shell bootstrap egress | shell script — outside the `.{js,cjs,mjs,ts}` scanner scope |
+---
+## Secret-handling sites
+Where a token/key/credential is read, forwarded, or scrubbed across the shipped runtime.
+| File | Secret touched | Action | Risk note |
+| --- | --- | --- | --- |
+| `scripts/lib/redact.cjs` | PEM, JWT, Anthropic `sk-ant-`, Stripe `sk_live_`, Slack `xox*`, GitHub `ghp_`, AWS `AKIA`, generic `sk-` (8 patterns) | **scrub** | Deep-walks event-stream payloads at serialize time so everything that hits disk / a bus subscriber is `[REDACTED:<type>]`. **D-07 extends** with Gemini `AIza…`, GitHub fine-grained `github_pat_`, and GitHub `ghs_/gho_/ghu_/ghr_` (currently uncovered). |
+| `scripts/lib/peer-cli/acp-client.cjs` (line 102) | full `process.env` | **forward** (default) | `const env = opts.env … : process.env` → GDD's `ANTHROPIC_API_KEY`/`GH_TOKEN`/`GDD_*` leak to a spawned peer when `opts.env` is absent. **Fixed by 33.5-04** (allowlist-forward, default-deny, shared `sanitize-env`). |
+| `scripts/lib/peer-cli/asp-client.cjs` (line 122) | full `process.env` | **forward** (default) | Same default-inherit gap; same 33.5-04 fix. |
+| `scripts/lib/figma-extract/pull.cjs` | `FIGMA_TOKEN` / `FIGMA_PERSONAL_ACCESS_TOKEN` | **read** | Lives only inside caller-provided `headers`; NEVER written to disk, NEVER logged (diagnostics read a short body prefix, never the token — D-10 of the figma sub-plan). |
+| `scripts/lib/issue-reporter/gh-submit.cjs` / `dedup.cjs` | none held by GDD | **delegate** | Authentication is `gh`'s own logged-in credential store; GDD never reads or forwards a GitHub token for these spawns. |
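The allowlist-forward, default-deny behaviour described for the peer-CLI rows can be sketched as follows. This is a minimal illustration, not the shipped `sanitize-env.cjs`: the function name `sanitizeEnv`, the `BASE_ALLOWLIST` contents, and the sample values are all assumptions made for the example.

```javascript
'use strict';

// Hypothetical baseline: variables a child process usually needs just to start.
const BASE_ALLOWLIST = ['PATH', 'HOME', 'LANG', 'TERM'];

/**
 * Build a child environment by copying ONLY allowlisted keys (default-deny).
 * Anything not named in the allowlist, including API keys, is never forwarded.
 */
function sanitizeEnv(sourceEnv, extraAllowlist = []) {
  const allowed = new Set([...BASE_ALLOWLIST, ...extraAllowlist]);
  const out = {};
  for (const key of allowed) {
    const value = sourceEnv[key];
    if (typeof value === 'string') out[key] = value;
  }
  return out;
}

// Sample parent environment with made-up secret values.
const parentEnv = {
  PATH: '/usr/bin',
  HOME: '/home/dev',
  ANTHROPIC_API_KEY: 'sk-ant-example',
  GH_TOKEN: 'ghp_example',
  GDD_STATE_PATH: '.design/STATE.md',
};

const childEnv = sanitizeEnv(parentEnv);
console.log(Object.keys(childEnv).sort());
```

Iterating over the allowlist rather than over the parent environment keeps the deny path implicit: a newly introduced secret variable is dropped without anyone having to remember to block it.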
+---
+## External-input surfaces
+Where untrusted data crosses a trust boundary into GDD's runtime.
+| Surface | Untrusted source | Boundary | Risk note |
+| --- | --- | --- | --- |
+| WebSocket upgrade request | a remote client connecting to the event-stream WS | `scripts/lib/transports/ws.cjs` HTTP `upgrade` handler | Bearer-token gate (≥8 chars) already ships; **33.5-03** adds 127.0.0.1-default bind + timing-safe compare so the default config does not expose `0.0.0.0`. |
+| gdd-state MCP tool inputs | the MCP client / model | `sdk/mcp/gdd-state/tools/*.ts` (11 tools) + `tools/shared.ts` | Each tool has a JSON schema under `sdk/mcp/gdd-state/schemas/`; **33.5-08** tightens them (`additionalProperties:false` + `maxLength`) and adds a payload-size cap (JSON-bomb guard). |
+| `GDD_STATE_PATH` env override | the launching environment | `sdk/mcp/gdd-state/tools/shared.ts:resolveStatePath()` (lines 60–61) | `process.env['GDD_STATE_PATH'] ?? .design/STATE.md` with **no path-traversal guard** today: a `..` segment or an absolute path outside the project is not checked. **Closed by 33.5-08** (resolve + assert within project root / `.design/`). |
+| `.design/config.json` | a repo-local config file (potentially attacker-influenced in a malicious clone) | 14 modules incl. `scripts/lib/peer-cli/registry.cjs` (line 154) | Drives `peer_cli.enabled_peers`, the 33.5 `peer_cli.env_allowlist`, the WS `event_stream.bind_host`, and the issue-reporter kill-switch. Parsed defensively; opt-in by design (a peer must be explicitly enabled). |
+| peer child `stdout` | a spawned LOCAL peer CLI | `scripts/lib/peer-cli/acp-client.cjs` (JSON-RPC frame parser) | A malicious/buggy peer could flood stdout — **already capped at 16 MiB un-newlined** (DoS guard). JSON-RPC frames are parsed, not `eval`'d. |
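The "resolve + assert within project root" guard named in the `GDD_STATE_PATH` row might look roughly like the sketch below. The helper name `resolveWithinRoot` and the sample paths are hypothetical, and this version does not resolve symlinks, which a real guard would probably also need to consider.

```javascript
'use strict';
const path = require('node:path');

/**
 * Resolve `candidate` against `projectRoot` and reject anything that lands
 * outside the root (a `..` escape or an absolute path elsewhere).
 */
function resolveWithinRoot(projectRoot, candidate) {
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, candidate);
  const rel = path.relative(root, resolved);
  const escapes =
    rel === '..' ||
    rel.startsWith('..' + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`state path escapes project root: ${candidate}`);
  }
  return resolved;
}

// An in-root override resolves normally; traversal attempts throw.
const ok = resolveWithinRoot('/proj', '.design/STATE.md');
console.log(ok === path.resolve('/proj', '.design/STATE.md'));

for (const bad of ['../outside/STATE.md', '/etc/passwd']) {
  try {
    resolveWithinRoot('/proj', bad);
  } catch (err) {
    console.log('rejected:', bad);
  }
}
```

Comparing via `path.relative` rather than a string-prefix check avoids the classic mistake where `/proj-evil` passes a `startsWith('/proj')` test.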
+---
+## Finding
+The legitimate active-egress set is fully enumerated above and **no raw unexpected egress was
+found** — every outbound call, server bind, or child spawn maps to one of six trusted modules,
+each frozen into `scripts/security/outbound-allowlist.json` with a justification. The allowlist
+uses **directory globs** so a new helper in an already-trusted module (e.g. another
+`figma-extract` file) does not silently trip the gate, and every glob is asserted by
+`test/suite/phase-33-5-audit.test.cjs` to resolve to ≥1 real file — so the 33.5-04 gate cannot be
+defeated by a stale entry that matches nothing.
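The "every glob resolves to at least one real file" assertion can be illustrated with a small in-memory check. This is only a sketch: the matcher handles just the `<dir>/**` shape, the file list is supplied by hand instead of read from disk, and the `figma-extract` and `removed-module` globs are assumed entries, not copied from the real `outbound-allowlist.json`.

```javascript
'use strict';

// Minimal matcher for the only pattern shape assumed here: "<dir>/**".
// Any other string is treated as an exact file path.
function matchesDirGlob(glob, filePath) {
  if (!glob.endsWith('/**')) return glob === filePath;
  const prefix = glob.slice(0, -2); // "a/b/**" -> "a/b/"
  return filePath.startsWith(prefix);
}

// Return every allowlist glob that matches no file at all (a stale entry).
function findStaleGlobs(globs, files) {
  return globs.filter((g) => !files.some((f) => matchesDirGlob(g, f)));
}

// Hand-written stand-in for a directory walk of the package.
const files = [
  'scripts/lib/authority-watcher/index.cjs',
  'scripts/lib/figma-extract/pull.cjs',
];

const globs = [
  'scripts/lib/authority-watcher/**',
  'scripts/lib/figma-extract/**', // assumed entry
  'scripts/lib/removed-module/**', // hypothetical stale entry
];

const stale = findStaleGlobs(globs, files);
console.log(stale);
```

A test built on this idea would fail whenever `stale` is non-empty, so an allowlist entry cannot outlive the module it was written for.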
+The residual gaps surfaced here are **closed by the remaining Phase 33.5 plans**:
+- **33.5-03** — WebSocket bind hardening: 127.0.0.1 default bind (no more `0.0.0.0`), opt-in remote
+  via `event_stream.bind_host`, and a timing-safe (`crypto.timingSafeEqual`) token compare. Closes
+  the WS-upgrade external-input row.
+- **33.5-04** — Outbound-network static CI gate: `scripts/scan-outbound-network.cjs` +
+  `npm run scan:outbound`, consuming THIS plan's `outbound-allowlist.json`. Also lands the
+  peer-CLI env-allowlist sandbox (shared `sanitize-env`) that closes the `acp-client.cjs` /
+  `asp-client.cjs` full-`process.env` forward rows.
+- **33.5-05** (and 33.5-07/08) — secret-scan extension (Gemini + GitHub fine-grained/server
+  tokens) with a synthetic-secret fuzz, the gdd-state path-traversal guard + payload cap +
+  tightened schemas, `SECURITY.md`, and the regression baseline.
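For the timing-safe token compare listed under 33.5-03, one common pattern is shown below. It is a sketch under assumptions, not the code in `ws.cjs`: the helper name `tokensMatch` is hypothetical, and hashing both sides first is a design choice made here so that `crypto.timingSafeEqual` always receives equal-length buffers (it throws otherwise).

```javascript
'use strict';
const crypto = require('node:crypto');

/**
 * Compare a presented bearer token with the expected one without leaking
 * how many leading characters matched through response timing.
 */
function tokensMatch(presented, expected) {
  if (typeof presented !== 'string' || typeof expected !== 'string') {
    return false;
  }
  // Fixed-length SHA-256 digests satisfy timingSafeEqual's same-length rule
  // and also avoid revealing the expected token's length.
  const a = crypto.createHash('sha256').update(presented).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

console.log(tokensMatch('correct-horse-battery', 'correct-horse-battery'));
console.log(tokensMatch('correct-horse', 'correct-horse-battery'));
```

A plain `===` comparison can return as soon as the first differing character is found, which is the timing signal this approach is meant to remove.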
+This report + the canonical allowlist satisfy **SEC-02** and unblock ROADMAP **SC#5** (the
+outbound-network gate), as amended by **D-05** (corrected paths `reference/` + `test/suite/`) and
+**D-06** (the gate mirrors the injection-scanner: a data file + a scanner that loads it).