@hegemonart/get-design-done 1.27.1 → 1.27.5

@@ -5,14 +5,14 @@
5
5
  },
6
6
  "metadata": {
7
7
  "description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
8
- "version": "1.27.1"
8
+ "version": "1.27.5"
9
9
  },
10
10
  "plugins": [
11
11
  {
12
12
  "name": "get-design-done",
13
13
  "source": "./",
14
14
  "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
15
- "version": "1.27.1",
15
+ "version": "1.27.5",
16
16
  "author": {
17
17
  "name": "hegemonart"
18
18
  },
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "get-design-done",
3
3
  "short_name": "gdd",
4
- "version": "1.27.1",
4
+ "version": "1.27.5",
5
5
  "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
6
6
  "author": {
7
7
  "name": "hegemonart",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,51 @@ All notable changes to get-design-done are documented here. Versions follow [sem
4
4
 
5
5
  ---
6
6
 
7
+ ## [1.27.5] — 2026-05-17
8
+
9
+ ### Added
10
+
11
+ - **Phase 27.5 — Bandit Production Integration** (6 plans). Wires Phase 23.5's bandit posterior + Phase 27-07's `delegate?` dimension into a real production routing path. After v1.27.5, `default-tier:` becomes a default (cold-start prior), not a final answer — the bandit picks the final tier from measurement when `adaptive_mode: full`.
12
+ - `scripts/lib/bandit-router/integration.cjs` (Plan 27.5-01) — thin shim exposing `consultBandit({agent, bin, delegate, agentFrontmatter, adaptiveMode}) → {tier, decision_log}` and `recordOutcome({agent, bin, delegate, tier, status, costUsd, adaptiveMode}) → void`. Hides `pull` vs `pullWithDelegate` choice. Best-effort posterior write per D-04.
13
+ - `hooks/budget-enforcer.ts` (Plan 27.5-02) — bandit consultation per Agent spawn after `resolved_models` is computed, before SDK call. Overrides `resolved_models[agent]` via `tier-resolver.cjs` when the bandit picks a different tier than the router emitted. Emits `bandit.tier_selected` event per spawn. Respects `tier_override:` frontmatter bypass (D-05), `adaptive_mode` gate (D-07), and the 80% auto-downgrade guard.
14
+ - `scripts/lib/session-runner/index.ts` (Plan 27.5-03) — calls `recordOutcome()` after every `emit('session.completed', ...)` site (4 call sites: rate-limited, peer-success, turn-cap-zero, terminal retry-exit). Adds 3 optional fields to `SessionRunnerOptions`: `agent`, `bin`, `tier`. Posterior write is best-effort; missing fields silent.
15
+ - `agents/design-reflector.md` Section 8 (Plan 27.5-04) — bandit-arbitrage analysis surfaces "agent X frontmatter says sonnet but bandit picks opus" as `[FRONTMATTER]` proposals after 3+ pulls with credible interval < 0.05 and ≥ 50% mean delta vs second-best tier (D-10). New module `scripts/lib/bandit-arbitrage.cjs` mirrors Phase 26-06's cost-arbitrage shape.
16
+ - `skills/peers/SKILL.md` Step 5 + new `skills/bandit-status/SKILL.md` (Plan 27.5-05) — `/gdd:peers` now reads canonical posterior path `.design/telemetry/posterior.json` and renders real per-peer reward-delta when posterior is populated. New read-only `/gdd:bandit-status` skill surfaces per-`(agent, bin, delegate, tier)` posterior snapshots (alpha/beta/mean/stddev/count/last-used). Strictly read-only per D-11.
17
+ - `docs/BANDIT-INTEGRATION.md` + `reference/bandit-integration.md` (Plan 27.5-06) — operator guide + developer cheat sheet.
18
+
19
+ ### Decisions locked
20
+
21
+ - D-01: `hooks/budget-enforcer.ts` is the bandit consultation site (single canonical routing decision point).
22
+ - D-02: Per-spawn timing, after `resolved_models` computed, before SDK call.
23
+ - D-03: Override `resolved_models[agent]` with bandit tier through `tier-resolver.cjs`. Preserve `model_tier_overrides[agent]` unchanged (back-compat).
24
+ - D-04: `update()` called in session-runner's terminal-emit path after `session.completed`. Best-effort posterior write — errors swallowed.
25
+ - D-05: `tier_override:` frontmatter is the explicit per-agent bandit-bypass surface.
26
+ - D-06: Posterior path stays at `.design/telemetry/posterior.json` (Phase 23.5 D-08 unchanged).
27
+ - D-07: Bandit consultation gated by `adaptive_mode` (static + hedge silent; full active).
28
+ - D-08: Reward function unchanged from Phase 23.5 (two-stage lexicographic correctness + cost).
29
+ - D-09: Cold-start prior for the 5 peer delegate arms uses neutral `TIER_PRIOR` (no bias toward any peer).
30
+ - D-10: Reflector bandit-arbitrage reuses `cost-arbitrage.cjs` shape (50% threshold, mirror Phase 26-06).
31
+ - D-11: `/gdd:bandit-status` is read-only (use `/gdd:bandit-reset` from Phase 23.5 to mutate).
32
+ - D-12: All 6 plans land together with one CHANGELOG block; 4 manifests bump lockstep.
33
+
34
+ ### Out of scope (deferred)
35
+
36
+ - Auto-failover when bandit recommends a delegate not in `enabled_peers` — bandit stays advisory.
37
+ - Cross-cycle posterior decay — Phase 23.5 D-12 already specifies discounted Thompson sampling.
38
+ - Per-task bandit dimensions beyond `(agent, bin, delegate)` — needs convergence proof first.
39
+ - Removing frontmatter `default-tier:` — additive only; deprecation is Phase 30+.
40
+ - Bandit-driven complexity_class selection — different decision domain.
41
+
42
+ ### Test coverage
43
+
44
+ - `tests/bandit-router-integration.test.cjs` — 25+ tests covering all 5 paths × adaptive_mode × tier_override × delegate (Plan 27.5-01).
45
+ - `tests/budget-enforcer-bandit.test.cjs` — 8+ tests for hook consultation branches (Plan 27.5-02).
46
+ - `tests/session-runner-bandit-outcome.test.cjs` — 6+ tests for recordOutcome paths (Plan 27.5-03).
47
+ - `tests/bandit-arbitrage.test.cjs` — 6+ tests for reflector analyzer (Plan 27.5-04).
48
+ - `tests/phase-27-5-baseline.test.cjs` — manifests + baseline + integration-exports regression (Plan 27.5-06).
49
+
50
+ ---
51
+
7
52
  ## [1.27.1] — 2026-04-30
8
53
 
9
54
  Phase 27 wiring patch — closes the production-integration gaps left by v1.27.0's "structural ship". v1.27.0 landed all peer-CLI library code + tests + docs but the helpers were exported without callers, so `delegate_to:` on agent frontmatter was validated and then ignored at runtime. v1.27.1 wires the four integration points so delegation actually fires for users who set `delegate_to:` AND allowlist the peer.
package/SKILL.md CHANGED
@@ -89,6 +89,7 @@ Each stage produces artifacts in `.design/` inside the current project.
89
89
  | `skill-manifest [--refresh]` | `get-design-done:skill-manifest` | List or refresh the local skill manifest used by the router for discovery |
90
90
  | `quality-gate` | `get-design-done:quality-gate` | Phase 25 — parallel lint/type/test/visual command runner; classifies failures via quality-gate-runner agent |
91
91
  | `turn-closeout` | `get-design-done:turn-closeout` | Phase 25 — Stop-hook mirror skill; finalizes per-turn STATE blocks and emits closeout events |
92
+ | `bandit-status` | `get-design-done:bandit-status` | Phase 27.5 — read-only diagnostic surface for the bandit posterior; per-(agent, bin, delegate, tier) snapshots (alpha, beta, mean, stddev, count, last-used). Use `/gdd:bandit-reset` to mutate. |
92
93
  | `peers` | `get-design-done:peers` | Phase 27 — `/gdd:peers` capability matrix command; shows installed peer-CLIs (codex/gemini/cursor/copilot/qwen), allowlist status, claimed roles, posterior delta vs local |
93
94
  | `peer-cli-customize` | `get-design-done:peer-cli-customize` | Phase 27 — rewire role→peer mappings on a per-agent basis (edits frontmatter `delegate_to:` directly) |
94
95
  | `peer-cli-add` | `get-design-done:peer-cli-add` | Phase 27 — guided ladder for adding a brand-new peer (verification ladder + adapter scaffolding + capability-matrix update) |
@@ -151,6 +151,58 @@ Render each `cost_arbitrage` entry into the Proposals section as a `[BUDGET]`-ta
151
151
 
152
152
  ---
153
153
 
154
+ ### 8. Bandit-arbitrage analysis (Phase 27.5 — D-10)
155
+
156
+ **Why this exists:** Phase 27.5 (v1.27.5) wired the bandit posterior + delegate dimension into production. The posterior now accumulates per-`(agent, bin, delegate, tier)` win-rates from real spawns. Once the posterior has enough data, the bandit's best-arm tier for an agent may differ from that agent's frontmatter `default-tier:` — a measurement signal that the frontmatter is stale. This section surfaces that signal as a `[FRONTMATTER]` proposal.
157
+
158
+ **Data sources:**
159
+
160
+ - `.design/telemetry/posterior.json` — the bandit posterior file written by Phase 23.5's `bandit-router.cjs` + Phase 27.5-02/03's production callers. Path matches `bandit-router.cjs`'s `DEFAULT_POSTERIOR_PATH`. If the file does not exist, skip this section with note "posterior.json not found — Phase 27.5 wiring required."
161
+ - `agents/*.md` — read each agent's frontmatter `default-tier:` value. The reflector already parses frontmatter in Section 3 ("Agent Performance"); reuse that parse pass and build a `{agent: defaultTier}` map keyed by the agent's `name:` field.
162
+
163
+ **The rule:**
164
+
165
+ For each `(agent, bin)` slice in the posterior (defaulting to `delegate='none'` arms — focuses on local-call routing):
166
+
167
+ 1. Compute per-tier posterior mean = `α / (α + β)` and stddev = `sqrt(αβ / ((α+β)² · (α+β+1)))`.
168
+ 2. Identify `posterior_best_tier = argmax(mean)` across the tiers present in the slice.
169
+ 3. Gates (all must hold to emit):
170
+ - `sum(arm.count)` across the slice's tier rows >= 3 (D-10's "3+ cycles" proxy).
171
+ - `(best_mean - second_best_mean) / second_best_mean >= 0.5` (50% delta heuristic).
172
+ - `stddev(best_tier) < 0.05` (credible interval narrow enough).
173
+ - `frontmatter[agent].default-tier !== posterior_best_tier` (the actual stale signal).
174
+ 4. If all gates hold, emit a structured `bandit_arbitrage` proposal.
175
+
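The rule above can be sketched in a few lines. This is an illustrative sketch only, not the code in `scripts/lib/bandit-arbitrage.cjs`: the function names `betaStats` and `staleFrontmatterTier` and the slice shape `{ tier: { alpha, beta, count } }` are assumptions made for the example.

```javascript
// Hypothetical helpers; names and the slice shape are assumptions for
// illustration, not the real bandit-arbitrage.cjs API.

// Mean and standard deviation of a Beta(alpha, beta) posterior.
function betaStats(alpha, beta) {
  const n = alpha + beta;
  const mean = alpha / n;
  const stddev = Math.sqrt((alpha * beta) / (n * n * (n + 1)));
  return { mean, stddev };
}

// slice: { [tier]: { alpha, beta, count } } for one (agent, bin) pair with
// delegate = 'none'. Returns the best tier when every gate holds, else null.
function staleFrontmatterTier(slice, frontmatterTier) {
  const tiers = Object.keys(slice);
  if (tiers.length < 2) return null; // single-tier history stays silent

  // Rank tiers by posterior mean, highest first.
  const ranked = tiers
    .map((tier) => ({ tier, ...betaStats(slice[tier].alpha, slice[tier].beta) }))
    .sort((a, b) => b.mean - a.mean);
  const [best, second] = ranked;

  const totalCount = tiers.reduce((sum, t) => sum + slice[t].count, 0);
  const relativeDelta = (best.mean - second.mean) / second.mean;

  const gatesHold =
    totalCount >= 3 &&               // enough pulls
    relativeDelta >= 0.5 &&          // 50% delta heuristic
    best.stddev < 0.05 &&            // narrow credible interval
    frontmatterTier !== best.tier;   // frontmatter actually disagrees

  return gatesHold ? best.tier : null;
}
```

Returning `null` for every failed gate keeps the "silent" guardrails in one place: a caller only builds a proposal when a tier name comes back.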
176
+ **Important guardrails (failure modes the rule must avoid):**
177
+
178
+ - **Single-tier-only history is silent.** If only one tier has been pulled for `(agent, bin)`, no comparison is possible — emit nothing rather than a misleading "winner" proposal.
179
+ - **Wide credible intervals are silent.** Bandit posteriors are noisy early on; the 0.05 stddev gate ensures we only surface signals where the bandit is confident.
180
+ - **The 50% threshold is a starting heuristic.** Same discipline as cost-arbitrage Section 7 — bandit-learning over which arbitrage proposals were APPLIED (and whether the posterior subsequently shifted) is a separate (future) phase.
181
+ - **delegateFilter='none' is the v1.27.5 default.** Arbitrage analysis on the 5 peer-delegate slices is left for a future plan; current peer data is too sparse to credibly disagree with frontmatter.
182
+
183
+ **Helper:** `scripts/lib/bandit-arbitrage.cjs` exports `analyze(posterior, options) → proposals[]` implementing the above rule deterministically. The executor agent following this skill loads the posterior via `bandit-router.loadPosterior()`, builds the `{agent: defaultTier}` map from `agents/*.md` frontmatter, and passes both to `analyze()`. No re-derivation of the rule in prose — call the helper.
184
+
185
+ **Proposal output shape** (one entry per stale-frontmatter signal, JSON-serializable for `/gdd:apply-reflections`):
186
+
+ ```json
+ {
+   "type": "bandit_arbitrage",
+   "agent": "design-verifier",
+   "bin": "medium",
+   "current_frontmatter_tier": "sonnet",
+   "posterior_best_tier": "opus",
+   "posterior_mean": { "haiku": 0.50, "sonnet": 0.62, "opus": 0.95 },
+   "posterior_stddev": { "haiku": 0.04, "sonnet": 0.03, "opus": 0.02 },
+   "pull_count": 18,
+   "proposal": "design-verifier (medium bin) frontmatter says sonnet but bandit picks opus (posterior mean 0.950 vs 0.620, 18 pulls, stddev 0.020) — update frontmatter or add tier_override: sonnet if intentional",
+   "evidence": "posterior_cred_int_narrow"
+ }
+ ```
201
+
202
+ Render each `bandit_arbitrage` entry into the Proposals section as a `[FRONTMATTER]`-tagged proposal carrying the structured payload verbatim. `/gdd:apply-reflections` routes the proposal to either (a) an `agents/<name>.md` frontmatter `default-tier:` update OR (b) a new `tier_override: <existing-tier>` add when the operator explicitly wants to keep the existing default-tier despite the measured drift.
203
+
204
+ ---
205
+
154
206
  ## Proposals
155
207
 
156
208
  After all sections, write a **Proposals** section. Number proposals sequentially. Every proposal must include evidence — no vague observations.
@@ -105,6 +105,76 @@ interface RuntimeDetectModule {
105
105
  }
106
106
  const runtimeDetect = nodeRequire('../scripts/lib/runtime-detect.cjs') as RuntimeDetectModule;
107
107
 
108
+ // Plan 27.5-01: bandit production-integration shim. Hides pull /
109
+ // pullWithDelegate choice from the hook; reads adaptive_mode + frontmatter
110
+ // tier_override under the same gating discipline as Phase 23.5 D-07 and
111
+ // Phase 27.5 D-05.
112
+ interface BanditIntegrationModule {
113
+ consultBandit(args: {
114
+ agent: string;
115
+ bin: string;
116
+ delegate: string;
117
+ agentFrontmatter: { tier_override?: string; default_tier?: string };
118
+ adaptiveMode?: 'static' | 'hedge' | 'full';
119
+ baseDir?: string;
120
+ posteriorPath?: string;
121
+ }): {
122
+ tier: 'haiku' | 'sonnet' | 'opus';
123
+ decision_log: {
124
+ source:
125
+ | 'frontmatter'
126
+ | 'tier_override_bypass'
127
+ | 'bandit_pull'
128
+ | 'bandit_pull_with_delegate';
129
+ samples?: Record<string, number> | Record<string, Record<string, number>>;
130
+ delegate?: string;
131
+ adaptive_mode: 'static' | 'hedge' | 'full';
132
+ reason?: string;
133
+ };
134
+ };
135
+ recordOutcome(args: unknown): void;
136
+ DELEGATE_NONE: 'none';
137
+ }
138
+ const banditIntegration = nodeRequire(
139
+ '../scripts/lib/bandit-router/integration.cjs',
140
+ ) as BanditIntegrationModule;
141
+
142
+ // Plan 27.5-02: adaptive-mode module surfaces the single gating predicate.
143
+ interface AdaptiveModeModule {
144
+ getMode(opts?: {
145
+ baseDir?: string;
146
+ budgetPath?: string;
147
+ quiet?: boolean;
148
+ }): 'static' | 'hedge' | 'full';
149
+ isBanditEnabled(opts?: { baseDir?: string; budgetPath?: string }): boolean;
150
+ }
151
+ const adaptiveMode = nodeRequire(
152
+ '../scripts/lib/adaptive-mode.cjs',
153
+ ) as AdaptiveModeModule;
154
+
155
+ // Plan 27.5-02: bin selection helper for bandit (agent, bin) addressing.
156
+ // budget-enforcer doesn't currently surface glob_count; default to 'medium'
157
+ // as a safe per-agent partition until a future plan wires the real count.
158
+ interface BanditRouterCoreModule {
159
+ binForGlobCount(n: number): 'tiny' | 'small' | 'medium' | 'large';
160
+ DEFAULT_DELEGATES: readonly string[];
161
+ }
162
+ const banditRouterCore = nodeRequire(
163
+ '../scripts/lib/bandit-router.cjs',
164
+ ) as BanditRouterCoreModule;
165
+
166
+ // Plan 27.5-02: tier-resolver translates bandit tier → concrete model.
167
+ interface TierResolverModule {
168
+ resolve(
169
+ runtime: string,
170
+ tier: string,
171
+ opts?: { silent?: boolean },
172
+ ): string | null;
173
+ }
174
+ const tierResolver = nodeRequire(
175
+ '../scripts/lib/tier-resolver.cjs',
176
+ ) as TierResolverModule;
177
+
108
178
  // ── Types ───────────────────────────────────────────────────────────────────
109
179
 
110
180
  /**
@@ -618,6 +688,50 @@ function emitCostRecorded(
618
688
  }
619
689
  }
620
690
 
691
+ /**
692
+ * Plan 27.5-02 / D-03: emit `bandit.tier_selected` event when the bandit
693
+ * is consulted (regardless of whether it overrode the prior tier). The
694
+ * event captures the prior tier, the bandit's pick, the sampled posterior
695
+ * (when applicable), the delegate dimension, and the runtime tag so
696
+ * Phase 11 reflector (27.5-04) and `/gdd:bandit-status` (27.5-05) can
697
+ * reconstruct decision history without re-reading the posterior file.
698
+ *
699
+ * Fail-open like every other emit in this hook.
700
+ */
701
+ function emitBanditTierSelected(
702
+ payload: {
703
+ agent: string;
704
+ bin: string;
705
+ prior_tier: string;
706
+ selected_tier: 'haiku' | 'sonnet' | 'opus';
707
+ source:
708
+ | 'frontmatter'
709
+ | 'tier_override_bypass'
710
+ | 'bandit_pull'
711
+ | 'bandit_pull_with_delegate';
712
+ delegate: string;
713
+ adaptive_mode: 'static' | 'hedge' | 'full';
714
+ samples?: unknown;
715
+ runtime: string;
716
+ model_id: string | null;
717
+ reason?: string;
718
+ },
719
+ cycle?: string,
720
+ ): void {
721
+ const ev = {
722
+ type: 'bandit.tier_selected',
723
+ timestamp: new Date().toISOString(),
724
+ sessionId: getSessionId(),
725
+ ...(cycle !== undefined && cycle !== 'unknown' ? { cycle } : {}),
726
+ payload,
727
+ };
728
+ try {
729
+ appendEvent(ev as unknown as HookFiredEvent);
730
+ } catch {
731
+ // Fail open.
732
+ }
733
+ }
734
+
621
735
  // ── main ────────────────────────────────────────────────────────────────────
622
736
 
623
737
  async function readStdin(): Promise<string> {
@@ -905,12 +1019,142 @@ export async function main(): Promise<void> {
905
1019
  ? routerDecision.runtime
906
1020
  : runtimeDetect.detect()) ?? 'claude';
907
1021
 
1022
+ // ── Plan 27.5-02 — bandit consultation ────────────────────────────────────
1023
+ //
1024
+ // D-01 / D-02 / D-03 / D-07: per-spawn after `resolved_models` is computed,
1025
+ // before the SDK call. Skip conditions (all silent — no event, no override):
1026
+ // - adaptive_mode !== 'full' (D-07)
1027
+ // - toolInput._tier_downgraded === true (80% downgrade fired upstream —
1028
+ // bandit must not undo budget)
1029
+ //
1030
+ // When bandit fires, override resolved_models[agent] through tier-resolver
1031
+ // so downstream consumers see the bandit's pick as the actual model.
1032
+ // model_tier_overrides[agent] is preserved (D-03 back-compat).
1033
+ const currentMode = adaptiveMode.getMode({ quiet: true });
1034
+ const priorTier = resolvedTier; // captured before bandit override
1035
+ // Mutable references for the cost/telemetry path; bandit may rewrite.
1036
+ let effectiveTier: string = resolvedTier;
1037
+ let effectiveModelId: string | null = resolvedModelId;
1038
+
1039
+ if (currentMode === 'full' && toolInput._tier_downgraded !== true) {
1040
+ // Bin defaults to 'medium' — budget-enforcer doesn't currently surface
1041
+ // glob_count; future plan can wire it. Per-agent bandit arms still
1042
+ // converge correctly under a fixed bin (Phase 23.5 D-08). The function
1043
+ // call below makes the integration point explicit for future plans.
1044
+ void banditRouterCore.binForGlobCount(0);
1045
+ const bin = 'medium';
1046
+
1047
+ // Source the frontmatter view from the in-flight toolInput. The hook
1048
+ // reads frontmatter indirectly: _default_tier carries the agent's
1049
+ // declared default-tier, _tier_override (if any) carries an explicit
1050
+ // override the router emitted. For bandit purposes, _tier_override
1051
+ // means "operator has already taken control" — the shim returns
1052
+ // source='tier_override_bypass' (no posterior side effect).
1053
+ const agentFrontmatter: {
1054
+ tier_override?: string;
1055
+ default_tier?: string;
1056
+ } = {};
1057
+ if (
1058
+ typeof toolInput._tier_override === 'string' &&
1059
+ toolInput._tier_override.length > 0
1060
+ ) {
1061
+ agentFrontmatter.tier_override = toolInput._tier_override;
1062
+ }
1063
+ if (
1064
+ typeof toolInput._default_tier === 'string' &&
1065
+ toolInput._default_tier.length > 0
1066
+ ) {
1067
+ agentFrontmatter.default_tier = toolInput._default_tier;
1068
+ }
1069
+
1070
+ // Delegate dimension: budget-enforcer doesn't currently see the
1071
+ // agent's delegate_to: frontmatter (session-runner does). For 27.5-02
1072
+ // we always consult the local-call slice (delegate='none'); 27.5-03
1073
+ // wires delegate=<peer> for the recordOutcome side.
1074
+ const banditDelegate = banditIntegration.DELEGATE_NONE;
1075
+
1076
+ let banditResult: ReturnType<
1077
+ BanditIntegrationModule['consultBandit']
1078
+ > | null = null;
1079
+ try {
1080
+ banditResult = banditIntegration.consultBandit({
1081
+ agent,
1082
+ bin,
1083
+ delegate: banditDelegate,
1084
+ agentFrontmatter,
1085
+ adaptiveMode: currentMode,
1086
+ });
1087
+ } catch {
1088
+ // Fail open — never let a bandit error block a spawn.
1089
+ }
1090
+
1091
+ if (banditResult !== null) {
1092
+ // Translate the bandit tier into a concrete model. The tier-resolver
1093
+ // emits its own fallback events (tier_resolution_fallback /
1094
+ // tier_resolution_failed) when the runtime row is incomplete, so we
1095
+ // don't need to re-emit those here.
1096
+ const banditModel = tierResolver.resolve(
1097
+ runtimeId,
1098
+ banditResult.tier,
1099
+ { silent: true },
1100
+ );
1101
+
1102
+ // Apply override only when:
1103
+ // 1. bandit actually picked a different tier than priorTier
1104
+ // (no-op write avoided)
1105
+ // 2. tier-resolver returned a non-null model (fall back to
1106
+ // existing resolvedModelId on null)
1107
+ // 3. source is 'bandit_pull' or 'bandit_pull_with_delegate'
1108
+ // (frontmatter/bypass paths don't override resolved_models)
1109
+ if (
1110
+ banditResult.tier !== priorTier &&
1111
+ banditModel !== null &&
1112
+ (banditResult.decision_log.source === 'bandit_pull' ||
1113
+ banditResult.decision_log.source === 'bandit_pull_with_delegate')
1114
+ ) {
1115
+ // Override resolved_models[agent] without touching
1116
+ // model_tier_overrides[agent] (D-03 back-compat).
1117
+ if (routerDecision !== undefined) {
1118
+ const rm = routerDecision.resolved_models ?? {};
1119
+ rm[agent] = banditModel;
1120
+ routerDecision.resolved_models = rm;
1121
+ }
1122
+ // Also stamp _tier_override on toolInput so downstream readers
1123
+ // see the bandit's pick.
1124
+ toolInput._tier_override = banditResult.tier;
1125
+ effectiveTier = banditResult.tier;
1126
+ effectiveModelId = banditModel;
1127
+ }
1128
+
1129
+ // Emit one bandit.tier_selected event regardless of override outcome
1130
+ // (the event captures the decision, not the override side effect).
1131
+ emitBanditTierSelected(
1132
+ {
1133
+ agent,
1134
+ bin,
1135
+ prior_tier: priorTier,
1136
+ selected_tier: banditResult.tier,
1137
+ source: banditResult.decision_log.source,
1138
+ delegate: banditResult.decision_log.delegate ?? banditDelegate,
1139
+ adaptive_mode: banditResult.decision_log.adaptive_mode,
1140
+ samples: banditResult.decision_log.samples,
1141
+ runtime: runtimeId,
1142
+ model_id: effectiveModelId ?? resolvedModelId,
1143
+ ...(banditResult.decision_log.reason !== undefined
1144
+ ? { reason: banditResult.decision_log.reason }
1145
+ : {}),
1146
+ },
1147
+ cycle,
1148
+ );
1149
+ }
1150
+ }
1151
+
908
1152
  // Compute runtime-aware cost via the shared backend. Failures return
909
1153
  // null cost; we emit the event regardless so the cost-aggregator sees
910
1154
  // the lookup attempt (Phase 22 events.jsonl tagging).
911
1155
  const costLookup = budgetBackend.computeCost({
912
- model_id: resolvedModelId,
913
- tier: resolvedTier,
1156
+ model_id: effectiveModelId,
1157
+ tier: effectiveTier,
914
1158
  runtime: runtimeId,
915
1159
  tokens_in: Number(toolInput._tokens_in_est ?? 0),
916
1160
  tokens_out: Number(toolInput._tokens_out_est ?? 0),
@@ -920,8 +1164,8 @@ export async function main(): Promise<void> {
920
1164
  {
921
1165
  runtime: runtimeId,
922
1166
  agent,
923
- model_id: resolvedModelId ?? costLookup.model,
924
- tier: costLookup.tier ?? resolvedTier,
1167
+ model_id: effectiveModelId ?? costLookup.model,
1168
+ tier: costLookup.tier ?? effectiveTier,
925
1169
  tokens_in: Number(toolInput._tokens_in_est ?? 0),
926
1170
  tokens_out: Number(toolInput._tokens_out_est ?? 0),
927
1171
  cost_usd: costLookup.cost_usd,
@@ -932,7 +1176,7 @@ export async function main(): Promise<void> {
932
1176
  // Branch E: standard spawn-allowed (includes tier-downgraded path).
933
1177
  writeTelemetry({
934
1178
  agent,
935
- tier: resolvedTier,
1179
+ tier: effectiveTier,
936
1180
  tokens_in: Number(toolInput._tokens_in_est ?? 0),
937
1181
  tokens_out: Number(toolInput._tokens_out_est ?? 0),
938
1182
  cache_hit: false,
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@hegemonart/get-design-done",
3
- "version": "1.27.1",
3
+ "version": "1.27.5",
4
4
  "description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
5
5
  "author": "Hegemon",
6
6
  "homepage": "https://github.com/hegemonart/get-design-done",
@@ -0,0 +1,163 @@
1
+ ---
2
+ name: bandit-integration
3
+ phase: 27.5
4
+ version: 1.0.0
5
+ type: meta-rules
6
+ description: Bandit posterior + production-integration shim cheat sheet — signatures, reward function semantics, adaptive_mode gate, posterior path conventions.
7
+ ---
8
+
9
+ # Bandit Integration — Developer Cheat Sheet
10
+
11
+ **Phase 27.5 (v1.27.5).** Reference for the bandit production-integration surface. Authoring or modifying a caller of the bandit posterior? Debugging a routing decision at the code level? Start here.
12
+
13
+ For ops-level guidance (when bandit fires, how to disable, posterior inspection), see `docs/BANDIT-INTEGRATION.md`.
14
+
15
+ In-scope modules:
16
+
17
+ - `scripts/lib/bandit-router.cjs` (Phase 23.5 primitives).
18
+ - `scripts/lib/bandit-router/integration.cjs` (Phase 27.5 shim).
19
+
20
+ ---
21
+
22
+ ## The two-stage architecture
23
+
24
+ Phase 23.5 ships the bandit primitives — Thompson-sampling pull, posterior update, computeReward, atomic persistence. Phase 27-07 added the `delegate?` arm dimension (5 peer-CLI arms + the local `none` arm). Both phases shipped library-only with no production callers.
25
+
26
+ Phase 27.5 ships the production-integration shim that wraps the primitives behind two purpose-built entry points and hides the `pull` vs `pullWithDelegate` choice. Callers pass a `delegate` argument and the shim routes internally.
27
+
28
+ ### Phase 23.5 + 27-07 surface — `scripts/lib/bandit-router.cjs`
29
+
30
+ Exports: `pull`, `update`, `pullWithDelegate`, `updateWithDelegate`, `computeReward`, `loadPosterior`, `savePosterior`, `reset`, `decayArm`, `sampleBeta`, `priorFor`, `binForGlobCount`, `DEFAULT_DELEGATES`, `DELEGATE_NONE`, `TIER_PRIOR`, `PRIOR_STRENGTH`, `TOUCHES_BINS`, `DEFAULT_POSTERIOR_PATH`, `SCHEMA_VERSION`.
31
+
32
+ The two-pair primitive split:
33
+
34
+ - `pull({agent, bin, ...})` / `update({agent, bin, tier, reward, ...})` — operate on the `(agent, bin, tier)` arm slice. Equivalent to `delegate='none'`.
35
+ - `pullWithDelegate({agent, bin, delegates, ...})` / `updateWithDelegate({agent, bin, tier, delegate, reward, ...})` — operate on the `(agent, bin, tier, delegate)` arm slice for any `delegate ∈ DEFAULT_DELEGATES`.
36
+
37
+ ### Phase 27.5 surface — `scripts/lib/bandit-router/integration.cjs`
38
+
39
+ Exports: `consultBandit`, `recordOutcome`, `DELEGATE_NONE`.
40
+
41
+ Routing rules (D-05, D-07):
42
+
43
+ 1. `agentFrontmatter.tier_override` set → bypass bandit, return `tier_override`.
44
+ 2. `adaptiveMode !== 'full'` → bandit silent, return `frontmatter.default_tier`.
45
+ 3. `adaptiveMode === 'full'` + delegate `'none'` or undefined → call `pull()`.
46
+ 4. `adaptiveMode === 'full'` + delegate is a peer name → call `pullWithDelegate({delegates: [delegate]})`.
47
+
48
+ `recordOutcome` is symmetric on the adaptive-mode gate.
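As a rough illustration of the four routing rules, the sketch below mirrors their order in plain JavaScript. It is not the shim's source: `routeTier` is a hypothetical name, the `primitives` object is an injected stand-in for `pull` / `pullWithDelegate`, and the assumption that each primitive returns an object with a `tier` field is made only for this example.

```javascript
// Hypothetical sketch of the routing order; `primitives` stands in for the
// real pull / pullWithDelegate, whose return shape is assumed here.
function routeTier({ agent, bin, delegate, agentFrontmatter, adaptiveMode }, primitives) {
  const fm = agentFrontmatter || {};

  // Rule 1: an explicit tier_override bypasses the bandit entirely.
  if (fm.tier_override) {
    return { tier: fm.tier_override, source: 'tier_override_bypass' };
  }

  // Rule 2: outside 'full' mode the bandit is silent; frontmatter wins.
  if (adaptiveMode !== 'full') {
    return { tier: fm.default_tier, source: 'frontmatter' };
  }

  // Rule 3: local-call slice.
  if (delegate === undefined || delegate === 'none') {
    const picked = primitives.pull({ agent, bin });
    return { tier: picked.tier, source: 'bandit_pull' };
  }

  // Rule 4: peer delegate slice, restricted to the single requested peer.
  const picked = primitives.pullWithDelegate({ agent, bin, delegates: [delegate] });
  return { tier: picked.tier, source: 'bandit_pull_with_delegate' };
}
```

Injecting the primitives keeps the sketch runnable without a posterior file and makes each branch easy to assert on through the returned `source`.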
49
+
50
+ ---
51
+
52
+ ## `consultBandit` signature
53
+
+ ```javascript
+ consultBandit({
+   agent: string, // required
+   bin: string, // required: 'tiny' | 'small' | 'medium' | 'large'
+   delegate: string, // 'none' or one of DEFAULT_DELEGATES
+   agentFrontmatter: {
+     tier_override?: string,
+     default_tier?: string,
+   },
+   adaptiveMode?: 'static' | 'hedge' | 'full', // omit to read on-disk
+   baseDir?: string, // override workspace root (test-injection)
+   posteriorPath?: string, // override posterior file path (test-injection)
+ }) → {
+   tier: 'haiku' | 'sonnet' | 'opus',
+   decision_log: {
+     source: 'frontmatter' | 'tier_override_bypass' | 'bandit_pull' | 'bandit_pull_with_delegate',
+     samples?: { haiku?: number, sonnet?: number, opus?: number },
+     delegate?: string,
+     adaptive_mode: string,
+     reason?: string,
+   },
+ }
+ ```
77
+
78
+ `decision_log.source` is the audit trail — it tells observability tools which routing branch ran. Tests use it to assert the correct path was taken.
79
+
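A test asserting the routing branch might look roughly like the sketch below. The `checkDecision` helper and the sample result object are hypothetical; the allowed `tier` and `source` values are copied from the signature above, and the sample is hand-written rather than real shim output.

```javascript
// Allowed values, copied from the documented return type.
const TIERS = ['haiku', 'sonnet', 'opus'];
const SOURCES = ['frontmatter', 'tier_override_bypass', 'bandit_pull', 'bandit_pull_with_delegate'];

// Hypothetical checker: returns a list of problems, empty when the result
// has the documented shape and took the expected branch.
function checkDecision(result, expectedSource) {
  const problems = [];
  if (!TIERS.includes(result.tier)) problems.push(`unexpected tier: ${result.tier}`);
  const log = result.decision_log || {};
  if (!SOURCES.includes(log.source)) problems.push(`unknown source: ${log.source}`);
  if (log.source !== expectedSource) problems.push(`expected ${expectedSource}, got ${log.source}`);
  if (typeof log.adaptive_mode !== 'string') problems.push('adaptive_mode missing');
  return problems;
}

// Hand-written sample, not real output from consultBandit.
const sample = { tier: 'sonnet', decision_log: { source: 'frontmatter', adaptive_mode: 'static' } };
console.log(checkDecision(sample, 'frontmatter'));
```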
80
+ ---
81
+
82
+ ## `recordOutcome` signature
83
+
84
+ ```javascript
85
+ recordOutcome({
86
+ agent: string,
87
+ bin: string,
88
+ delegate: string,
89
+ tier: string,
90
+ status: string, // SessionResult.status — only 'completed' triggers reward.solidify_pass
91
+ costUsd?: number,
92
+ adaptiveMode?: 'static' | 'hedge' | 'full',
93
+ baseDir?: string,
94
+ posteriorPath?: string,
95
+ }) → void // best-effort per D-04 — write errors are swallowed
96
+ ```
97
+
98
+ Reward semantics:
99
+
100
+ - `solidify_pass = (status === 'completed')`.
101
+ - If `!solidify_pass`, reward is `0`. If true, reward is `1 - lambda * normalize(costUsd + epsilon * wallTimeMs)`.
102
+
103
+ Phase 27.5 passes `wallTimeMs: 0` always (D-08 unchanged from Phase 23.5).
104
+
105
+ ---
106
+
107
+ ## `adaptive_mode` gate semantics
108
+
109
+ Phase 23.5 ladder (D-07):
110
+
111
+ - `static` — default. Bandit silent. `default-tier:` is authoritative. No reads, no writes.
112
+ - `hedge` — intended as measurement-only: bandit silent on reads, with `recordOutcome` still writing to seed the posterior. In Phase 27.5 it behaves identically to `static` (no reads, no writes); the measurement-only behavior is reserved for an explicit "hedge mode" in Phase 28+.
113
+ - `full` — bandit active. Reads pick via Thompson sampling; writes update posterior.
114
+
115
+ The shim respects the gate transparently. Operators flip via `.design/budget.json#adaptive_mode`.
116
+
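As a rough sketch of the gate, assuming the mode lives under the `adaptive_mode` key of `.design/budget.json` as described above. Both helper names are hypothetical, and falling back to `'static'` on a missing or malformed file is an assumption based on `static` being the documented default.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const MODES = ['static', 'hedge', 'full'];

// Hypothetical reader: returns 'static' when the file is missing,
// unparsable, or holds an unknown value.
function readAdaptiveMode(baseDir = process.cwd()) {
  try {
    const raw = fs.readFileSync(path.join(baseDir, '.design', 'budget.json'), 'utf8');
    const mode = JSON.parse(raw).adaptive_mode;
    return MODES.includes(mode) ? mode : 'static';
  } catch {
    return 'static';
  }
}

// Phase 27.5 behavior per the ladder above: only 'full' reads or writes,
// because 'hedge' currently behaves like 'static'.
function gateFor(mode) {
  const active = mode === 'full';
  return { reads: active, writes: active };
}

console.log(gateFor(readAdaptiveMode()));
```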
117
+ ---
118
+
119
+ ## Reward function
120
+
121
+ `computeReward({solidify_pass, cost_usd, wall_time_ms, lambda?, epsilon?, costNormalizer?}) → number`
122
+
123
+ Two-stage lexicographic (D-08, unchanged from Phase 23.5):
124
+
125
+ - Stage 1 — correctness: if `solidify_pass !== true`, return `0`.
126
+ - Stage 2 — cost: return `1 - lambda * normalize(cost_usd + epsilon * wall_time_ms)`.
127
+
128
+ Defaults: `lambda = 0.3`, `epsilon = 0.05`. `normalize` maps `[0, $5]` linearly to `[0, 1]`, clamped.
129
+
130
+ Cheaper successful spawns get higher reward. Failed spawns always score zero. Lower `lambda` to weight cost less.
131
+
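The two stages can be sketched directly from the formula and defaults above. `rewardSketch` and `normalizeCost` are hypothetical names for illustration; this is not the exported `computeReward`, and the optional `costNormalizer` parameter is left out.

```javascript
// Upper end of the documented [0, $5] normalization range.
const COST_CEILING_USD = 5;

// Linear map from [0, 5] to [0, 1], clamped at both ends.
function normalizeCost(x) {
  return Math.min(1, Math.max(0, x / COST_CEILING_USD));
}

// Hypothetical re-implementation of the two-stage lexicographic reward.
function rewardSketch({ solidify_pass, cost_usd = 0, wall_time_ms = 0, lambda = 0.3, epsilon = 0.05 }) {
  if (solidify_pass !== true) return 0; // stage 1: correctness
  return 1 - lambda * normalizeCost(cost_usd + epsilon * wall_time_ms); // stage 2: cost
}

console.log(rewardSketch({ solidify_pass: true, cost_usd: 0.5 }));  // about 0.97
console.log(rewardSketch({ solidify_pass: true, cost_usd: 10 }));   // about 0.7, cost clamped at $5
console.log(rewardSketch({ solidify_pass: false, cost_usd: 0.01 })); // 0
```

With the default `lambda` a successful spawn never scores below roughly 0.7, so any success outranks any failure regardless of cost, which is what makes the ordering lexicographic.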
132
+ ---
133
+
134
+ ## Posterior path
135
+
136
+ Canonical path: `.design/telemetry/posterior.json` (Phase 23.5 D-08, Phase 27.5 D-06 unchanged). The path is owned by the `DEFAULT_POSTERIOR_PATH` constant in `scripts/lib/bandit-router.cjs`.
137
+
138
+ Test injection: pass `baseDir` (anchors path under a different workspace root) or `posteriorPath` (overrides the file path directly). Both `consultBandit` and `recordOutcome` accept these options.
139
+
140
+ Write discipline: atomic via `.tmp` + rename. Read failures yield an empty posterior; subsequent writes overwrite. Concurrent writers within the same process are not synchronized, which is acceptable because gdd's session-runner is single-threaded.
141
+
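The write discipline above might be sketched as follows. The helper names are hypothetical and the exported `loadPosterior` / `savePosterior` may differ; modelling the "empty posterior" as `{}` and naming the temp file `<file>.tmp` are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical loader: any read or parse failure yields an empty posterior.
function loadPosteriorSketch(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch {
    return {}; // assumption: empty posterior modelled as an empty object
  }
}

// Hypothetical writer: write the full payload to a sibling .tmp file,
// then rename it over the target so readers never see a partial file.
function savePosteriorSketch(file, posterior) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(posterior, null, 2));
  fs.renameSync(tmp, file);
}
```

Because a failed read returns an empty object and the next save replaces the file wholesale, a corrupted posterior is discarded rather than repaired, which matches the "subsequent writes overwrite" behavior described above.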
142
+ ---
143
+
144
+ ## Call sites
145
+
146
+ Phase 27.5 wires these consumers:
147
+
148
+ - **`hooks/budget-enforcer.ts`** (Plan 27.5-02) — per Agent spawn, after `resolved_models` is computed, before SDK call. Calls `consultBandit({agent, bin, delegate, agentFrontmatter, adaptiveMode})`. Overrides `resolved_models[agent]` with the bandit tier via `tier-resolver.cjs`. Emits `bandit.tier_selected` event for observability.
149
+ - **`scripts/lib/session-runner/index.ts`** (Plan 27.5-03) — terminal-emit path. Calls `recordOutcome({agent, bin, delegate, tier, status, costUsd})` after every `emit('session.completed', ...)` site (4 sites: rate-limited, peer-success, turn-cap-zero, terminal retry-exit). The posterior write is best-effort; missing optional fields are silently ignored.
150
+ - **`agents/design-reflector.md` Section 8** (Plan 27.5-04) — bandit-arbitrage analysis. `scripts/lib/bandit-arbitrage.cjs` reads `.design/telemetry/posterior.json` and surfaces stale-frontmatter proposals. Mirrors Phase 26-06's `cost-arbitrage.cjs` shape.
151
+ - **`skills/peers/SKILL.md` Step 5 + `skills/bandit-status/SKILL.md`** (Plan 27.5-05) — read-only diagnostic surfaces. `/gdd:peers` posterior delta column populated; `/gdd:bandit-status` renders per-`(agent, bin, delegate, tier)` snapshots.
152
+
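The read/write pairing between the first two call sites can be illustrated with injected stand-ins. This is a simplified sketch: in the package the consult and record steps live in separate files, `runWithBandit` and `spawn` are hypothetical, the real spawn is presumably asynchronous, and the stubs below are not the real shim functions.

```javascript
// Hypothetical glue: consultBandit and recordOutcome are injected so the
// sketch runs without the package; spawn stands in for the SDK call.
function runWithBandit({ consultBandit, recordOutcome, spawn }, ctx) {
  // Consultation side (budget-enforcer): pick the tier before the spawn.
  const { tier, decision_log } = consultBandit(ctx);
  const result = spawn({ ...ctx, tier });
  // Outcome side (session-runner): report what happened for that tier.
  recordOutcome({
    agent: ctx.agent,
    bin: ctx.bin,
    delegate: ctx.delegate,
    tier,
    status: result.status,
    costUsd: result.costUsd,
  });
  return { tier, decision_log, status: result.status };
}

// Stub wiring for demonstration only.
const recorded = [];
const outcome = runWithBandit(
  {
    consultBandit: () => ({ tier: 'haiku', decision_log: { source: 'bandit_pull', adaptive_mode: 'full' } }),
    recordOutcome: (o) => recorded.push(o),
    spawn: () => ({ status: 'completed', costUsd: 0.02 }),
  },
  { agent: 'example-agent', bin: 'tiny', delegate: 'none' }
);
console.log(outcome.tier, recorded.length);
```

The point of the pairing is that the `tier` passed to `recordOutcome` is the one `consultBandit` returned, so the reward lands on the arm that was actually pulled.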
153
+ ---
154
+
155
+ ## Cross-references
156
+
157
+ - `docs/BANDIT-INTEGRATION.md` — operator guide (when bandit fires, how to disable, troubleshooting).
158
+ - `reference/peer-protocols.md` — Phase 27 ACP/ASP cheat sheet (peer-CLI delegation transport).
159
+ - `scripts/lib/bandit-router.cjs` — Phase 23.5 primitives surface.
160
+ - `scripts/lib/bandit-router/integration.cjs` — Phase 27.5 production shim.
161
+ - `scripts/lib/bandit-arbitrage.cjs` — Phase 27.5 reflector analyzer (Section 8 of `design-reflector.md`).
162
+ - `hooks/budget-enforcer.ts` — bandit consultation site.
163
+ - `scripts/lib/session-runner/index.ts` — `recordOutcome` site.
@@ -95,6 +95,13 @@
95
95
  "type": "authority-feed",
96
96
  "description": "Whitelist of design-authority feed sources for the watcher"
97
97
  },
98
+ {
99
+ "name": "bandit-integration",
100
+ "path": "reference/bandit-integration.md",
101
+ "type": "meta-rules",
102
+ "phase": 27.5,
103
+ "description": "Phase 27.5 bandit production-integration cheat sheet — consultBandit + recordOutcome shim signatures, adaptive_mode gate semantics, reward function (Phase 23.5 D-08 unchanged), posterior path .design/telemetry/posterior.json, call-site map (budget-enforcer + session-runner + design-reflector Section 8)"
104
+ },
98
105
  {
99
106
  "name": "brand-voice",
100
107
  "path": "reference/brand-voice.md",