@hegemonart/get-design-done 1.27.0 → 1.27.5
- package/.claude-plugin/marketplace.json +2 -2
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +69 -0
- package/SKILL.md +1 -0
- package/agents/design-reflector.md +52 -0
- package/hooks/budget-enforcer.ts +249 -5
- package/package.json +2 -2
- package/reference/bandit-integration.md +163 -0
- package/reference/peer-protocols.md +1 -1
- package/reference/registry.json +7 -0
- package/scripts/install.cjs +100 -1
- package/scripts/lib/bandit-arbitrage.cjs +423 -0
- package/scripts/lib/bandit-router/integration.cjs +309 -0
- package/scripts/lib/peer-cli/spawn-cmd.cjs +2 -2
- package/scripts/lib/session-runner/index.ts +381 -28
- package/skills/bandit-status/SKILL.md +129 -0
- package/skills/peers/SKILL.md +27 -8
package/.claude-plugin/marketplace.json
CHANGED
|
@@ -5,14 +5,14 @@
|
|
|
5
5
|
},
|
|
6
6
|
"metadata": {
|
|
7
7
|
"description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
|
|
8
|
-
"version": "1.27.
|
|
8
|
+
"version": "1.27.5"
|
|
9
9
|
},
|
|
10
10
|
"plugins": [
|
|
11
11
|
{
|
|
12
12
|
"name": "get-design-done",
|
|
13
13
|
"source": "./",
|
|
14
14
|
"description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
|
|
15
|
-
"version": "1.27.
|
|
15
|
+
"version": "1.27.5",
|
|
16
16
|
"author": {
|
|
17
17
|
"name": "hegemonart"
|
|
18
18
|
},
|
package/.claude-plugin/plugin.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "get-design-done",
|
|
3
3
|
"short_name": "gdd",
|
|
4
|
-
"version": "1.27.
|
|
4
|
+
"version": "1.27.5",
|
|
5
5
|
"description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
|
|
6
6
|
"author": {
|
|
7
7
|
"name": "hegemonart",
|
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,75 @@ All notable changes to get-design-done are documented here. Versions follow [sem
|
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
+
## [1.27.5] — 2026-05-17
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **Phase 27.5 — Bandit Production Integration** (6 plans). Wires Phase 23.5's bandit posterior + Phase 27-07's `delegate?` dimension into a real production routing path. After v1.27.5, `default-tier:` becomes a default (cold-start prior), not a final answer — the bandit picks the final tier from measurement when `adaptive_mode: full`.
|
|
12
|
+
- `scripts/lib/bandit-router/integration.cjs` (Plan 27.5-01) — thin shim exposing `consultBandit({agent, bin, delegate, agentFrontmatter, adaptiveMode}) → {tier, decision_log}` and `recordOutcome({agent, bin, delegate, tier, status, costUsd, adaptiveMode}) → void`. Hides `pull` vs `pullWithDelegate` choice. Best-effort posterior write per D-04.
|
|
13
|
+
- `hooks/budget-enforcer.ts` (Plan 27.5-02) — bandit consultation per Agent spawn after `resolved_models` is computed, before SDK call. Overrides `resolved_models[agent]` via `tier-resolver.cjs` when the bandit picks a different tier than the router emitted. Emits `bandit.tier_selected` event per spawn. Respects `tier_override:` frontmatter bypass (D-05), `adaptive_mode` gate (D-07), and the 80% auto-downgrade guard.
|
|
14
|
+
- `scripts/lib/session-runner/index.ts` (Plan 27.5-03) — calls `recordOutcome()` after every `emit('session.completed', ...)` site (4 call sites: rate-limited, peer-success, turn-cap-zero, terminal retry-exit). Adds 3 optional fields to `SessionRunnerOptions`: `agent`, `bin`, `tier`. Posterior write is best-effort; when any of these fields is missing, the outcome is skipped silently.
|
|
15
|
+
- `agents/design-reflector.md` Section 8 (Plan 27.5-04) — bandit-arbitrage analysis surfaces "agent X frontmatter says sonnet but bandit picks opus" as `[FRONTMATTER]` proposals after 3+ pulls with credible interval < 0.05 and ≥ 50% mean delta vs second-best tier (D-10). New module `scripts/lib/bandit-arbitrage.cjs` mirrors Phase 26-06's cost-arbitrage shape.
|
|
16
|
+
- `skills/peers/SKILL.md` Step 5 + new `skills/bandit-status/SKILL.md` (Plan 27.5-05) — `/gdd:peers` now reads canonical posterior path `.design/telemetry/posterior.json` and renders real per-peer reward-delta when posterior is populated. New read-only `/gdd:bandit-status` skill surfaces per-`(agent, bin, delegate, tier)` posterior snapshots (alpha/beta/mean/stddev/count/last-used). Strictly read-only per D-11.
|
|
17
|
+
- `docs/BANDIT-INTEGRATION.md` + `reference/bandit-integration.md` (Plan 27.5-06) — operator guide + developer cheat sheet.
|
|
18
|
+
|
|
19
|
+
### Decisions locked
|
|
20
|
+
|
|
21
|
+
- D-01: `hooks/budget-enforcer.ts` is the bandit consultation site (single canonical routing decision point).
|
|
22
|
+
- D-02: Per-spawn timing, after `resolved_models` computed, before SDK call.
|
|
23
|
+
- D-03: Override `resolved_models[agent]` with bandit tier through `tier-resolver.cjs`. Preserve `model_tier_overrides[agent]` unchanged (back-compat).
|
|
24
|
+
- D-04: `update()` called in session-runner's terminal-emit path after `session.completed`. Best-effort posterior write — errors swallowed.
|
|
25
|
+
- D-05: `tier_override:` frontmatter is the explicit per-agent bandit-bypass surface.
|
|
26
|
+
- D-06: Posterior path stays at `.design/telemetry/posterior.json` (Phase 23.5 D-08 unchanged).
|
|
27
|
+
- D-07: Bandit consultation gated by `adaptive_mode` (static + hedge silent; full active).
|
|
28
|
+
- D-08: Reward function unchanged from Phase 23.5 (two-stage lexicographic correctness + cost).
|
|
29
|
+
- D-09: Cold-start prior for the 5 peer delegate arms uses neutral `TIER_PRIOR` (no bias toward any peer).
|
|
30
|
+
- D-10: Reflector bandit-arbitrage reuses `cost-arbitrage.cjs` shape (50% threshold, mirror Phase 26-06).
|
|
31
|
+
- D-11: `/gdd:bandit-status` is read-only (use `/gdd:bandit-reset` from Phase 23.5 to mutate).
|
|
32
|
+
- D-12: All 6 plans land together with one CHANGELOG block; 4 manifests bump lockstep.
|
|
33
|
+
|
|
34
|
+
### Out of scope (deferred)
|
|
35
|
+
|
|
36
|
+
- Auto-failover when bandit recommends a delegate not in `enabled_peers` — bandit stays advisory.
|
|
37
|
+
- Cross-cycle posterior decay — Phase 23.5 D-12 already specifies discounted Thompson sampling.
|
|
38
|
+
- Per-task bandit dimensions beyond `(agent, bin, delegate)` — needs convergence proof first.
|
|
39
|
+
- Removing frontmatter `default-tier:` — additive only; deprecation is Phase 30+.
|
|
40
|
+
- Bandit-driven complexity_class selection — different decision domain.
|
|
41
|
+
|
|
42
|
+
### Test coverage
|
|
43
|
+
|
|
44
|
+
- `tests/bandit-router-integration.test.cjs` — 25+ tests covering all 5 paths × adaptive_mode × tier_override × delegate (Plan 27.5-01).
|
|
45
|
+
- `tests/budget-enforcer-bandit.test.cjs` — 8+ tests for hook consultation branches (Plan 27.5-02).
|
|
46
|
+
- `tests/session-runner-bandit-outcome.test.cjs` — 6+ tests for recordOutcome paths (Plan 27.5-03).
|
|
47
|
+
- `tests/bandit-arbitrage.test.cjs` — 6+ tests for reflector analyzer (Plan 27.5-04).
|
|
48
|
+
- `tests/phase-27-5-baseline.test.cjs` — manifests + baseline + integration-exports regression (Plan 27.5-06).
|
|
49
|
+
|
|
50
|
+
---
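
Editor's sketch (not part of the package): the gating that the 1.27.5 entry describes for `consultBandit` (the `tier_override:` bypass in D-05 and the `adaptive_mode` gate in D-07) might be pictured roughly as below. The `source` labels are taken from the `decision_log` type in the `hooks/budget-enforcer.ts` hunk. Everything else is an assumption for illustration: the function name `chooseTier`, the order of the two checks, the `'sonnet'` fallback when no default tier is declared, and the injected `sample` callback that stands in for the real Thompson-sampling pull.

```typescript
// Hypothetical sketch of the D-05 / D-07 gating; not the shipped integration.cjs.
type Tier = 'haiku' | 'sonnet' | 'opus';
type Mode = 'static' | 'hedge' | 'full';
type Source =
  | 'frontmatter'
  | 'tier_override_bypass'
  | 'bandit_pull'
  | 'bandit_pull_with_delegate';

interface Frontmatter {
  tier_override?: Tier;
  default_tier?: Tier;
}

function chooseTier(
  fm: Frontmatter,
  mode: Mode,
  delegate: string,
  sample: () => Tier, // stand-in for the posterior pull (assumption)
): { tier: Tier; source: Source } {
  // D-05: an explicit tier_override bypasses the bandit entirely.
  if (fm.tier_override !== undefined) {
    return { tier: fm.tier_override, source: 'tier_override_bypass' };
  }
  // D-07: static and hedge modes keep the frontmatter default.
  if (mode !== 'full') {
    // The 'sonnet' fallback is an assumption of this sketch.
    return { tier: fm.default_tier ?? 'sonnet', source: 'frontmatter' };
  }
  // Full mode: the bandit picks; the label depends on the delegate dimension.
  const source: Source =
    delegate === 'none' ? 'bandit_pull' : 'bandit_pull_with_delegate';
  return { tier: sample(), source };
}
```

Under these assumptions, `chooseTier({ default_tier: 'sonnet' }, 'hedge', 'none', sample)` never calls `sample`, which matches the changelog's statement that `default-tier:` remains the answer outside `adaptive_mode: full`.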
|
|
51
|
+
|
|
52
|
+
## [1.27.1] — 2026-04-30
|
|
53
|
+
|
|
54
|
+
Phase 27 wiring patch — closes the production-integration gaps left by v1.27.0's "structural ship". v1.27.0 landed all peer-CLI library code + tests + docs but the helpers were exported without callers, so `delegate_to:` on agent frontmatter was validated and then ignored at runtime. v1.27.1 wires the four integration points so delegation actually fires for users who set `delegate_to:` AND allowlist the peer.
|
|
55
|
+
|
|
56
|
+
### Fixed
|
|
57
|
+
|
|
58
|
+
- **`session-runner.run()` now invokes `tryDelegate` (Plan 27-06 wiring)** — when `opts.delegateTo` is set to a `<peer>-<role>` value AND the registry can route AND the peer is in `.design/config.json#peer_cli.enabled_peers`, the prompt runs on the peer-CLI and `run()` returns the peer result. On peer-absent / peer-error / null result, falls through transparently to the local Anthropic SDK loop (D-07). Previously the `tryDelegate` helper existed in the file but `run()` never called it.
|
|
59
|
+
|
|
60
|
+
- **Real `appendEvent('peer_call_started|complete|failed', ...)` emission (Plan 27-08 wiring)** — replaced the stderr-only placeholder in session-runner with three real event-emission calls. Events flow through Phase 22's `appendEvent()` API using the constants registered in v1.27.0, tagged with `runtime_role: 'peer'` and `peer_id`. Reflector cross-runtime cost-arbitrage (Plan 26-06) now sees peer telemetry. `GDD_PEER_DEBUG=1` continues to mirror the failed events to stderr for live tailing.
|
|
61
|
+
|
|
62
|
+
- **`install.cjs` interactive peer-detection nudge (Plan 27-11 wiring)** — after a successful (non-uninstall, non-dry-run) install in a TTY, scans `peerBinary` paths via `detectInstalledPeers()`. If 1+ peer detected, prompts via `@clack/prompts` with `confirm()` (default: NO). On yes, writes `.design/config.json#peer_cli.enabled_peers`. New `--no-peer-prompt` flag suppresses the prompt entirely (CI-friendly). Silent skip when zero peers detected. Default-NO preserves the opt-in trust contract (D-11).
|
|
63
|
+
|
|
64
|
+
### Out of scope (known, deferred)
|
|
65
|
+
|
|
66
|
+
- **Bandit `pullWithDelegate` caller (Plan 27-07 wiring)** — `pullWithDelegate` and `updateWithDelegate` ship in the bandit module surface (v1.27.0) but no production caller invokes them yet. Wiring requires `gdd-router` SKILL.md change (procedural, not code) which is out of scope for a wiring patch. Phase 28+ territory once the integration shape is decided. The `delegate?` dimension stays exported as a library extension for ad-hoc use.
|
|
67
|
+
|
|
68
|
+
### Tests
|
|
69
|
+
|
|
70
|
+
- Existing 23 peer-CLI session-runner / events / end-to-end tests pass after wiring.
|
|
71
|
+
- Existing 33 install.cjs + peer-detect tests pass after the nudge addition.
|
|
72
|
+
- Full Phase 27 surface tests stay green; no new test files (this is a wiring patch, not a new surface).
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
7
76
|
## [1.27.0] — 2026-04-30
|
|
8
77
|
|
|
9
78
|
Phase 27 Peer-CLI Delegation Layer milestone — closes the **outbound** half of multi-runtime. Phase 24 made gdd installable on 14 runtimes; Phase 21 made the same pipeline run on each; Phase 26 made tier→model resolve correctly per runtime. v1.27.0 adds the missing piece: gdd agents OPTIONALLY delegate to local peer CLIs (Codex via App Server Protocol; Gemini/Cursor/Copilot/Qwen via Agent Client Protocol) when measurably cheaper or higher-quality for the role. Falls back to in-process Anthropic SDK when peer is unavailable. Honors Phase 26 tier maps + Phase 22 event chain + Phase 23.5 bandit posterior — `delegate?` becomes another arm in `(agent_type × tier × delegate)` Thompson sampling, no new ML.
|
package/SKILL.md
CHANGED
|
@@ -89,6 +89,7 @@ Each stage produces artifacts in `.design/` inside the current project.
|
|
|
89
89
|
| `skill-manifest [--refresh]` | `get-design-done:skill-manifest` | List or refresh the local skill manifest used by the router for discovery |
|
|
90
90
|
| `quality-gate` | `get-design-done:quality-gate` | Phase 25 — parallel lint/type/test/visual command runner; classifies failures via quality-gate-runner agent |
|
|
91
91
|
| `turn-closeout` | `get-design-done:turn-closeout` | Phase 25 — Stop-hook mirror skill; finalizes per-turn STATE blocks and emits closeout events |
|
|
92
|
+
| `bandit-status` | `get-design-done:bandit-status` | Phase 27.5 — read-only diagnostic surface for the bandit posterior; per-(agent, bin, delegate, tier) snapshots (alpha, beta, mean, stddev, count, last-used). Use `/gdd:bandit-reset` to mutate. |
|
|
92
93
|
| `peers` | `get-design-done:peers` | Phase 27 — `/gdd:peers` capability matrix command; shows installed peer-CLIs (codex/gemini/cursor/copilot/qwen), allowlist status, claimed roles, posterior delta vs local |
|
|
93
94
|
| `peer-cli-customize` | `get-design-done:peer-cli-customize` | Phase 27 — rewire role→peer mappings on a per-agent basis (edits frontmatter `delegate_to:` directly) |
|
|
94
95
|
| `peer-cli-add` | `get-design-done:peer-cli-add` | Phase 27 — guided ladder for adding a brand-new peer (verification ladder + adapter scaffolding + capability-matrix update) |
|
package/agents/design-reflector.md
CHANGED
|
@@ -151,6 +151,58 @@ Render each `cost_arbitrage` entry into the Proposals section as a `[BUDGET]`-ta
|
|
|
151
151
|
|
|
152
152
|
---
|
|
153
153
|
|
|
154
|
+
### 8. Bandit-arbitrage analysis (Phase 27.5 — D-10)
|
|
155
|
+
|
|
156
|
+
**Why this exists:** Phase 27.5 (v1.27.5) wired the bandit posterior + delegate dimension into production. The posterior now accumulates per-`(agent, bin, delegate, tier)` win-rates from real spawns. Once the posterior has enough data, the bandit's best-arm tier for an agent may differ from that agent's frontmatter `default-tier:` — a measurement signal that the frontmatter is stale. This section surfaces that signal as a `[FRONTMATTER]` proposal.
|
|
157
|
+
|
|
158
|
+
**Data sources:**
|
|
159
|
+
|
|
160
|
+
- `.design/telemetry/posterior.json` — the bandit posterior file written by Phase 23.5's `bandit-router.cjs` + Phase 27.5-02/03's production callers. Path matches `bandit-router.cjs`'s `DEFAULT_POSTERIOR_PATH`. If the file does not exist, skip this section with note "posterior.json not found — Phase 27.5 wiring required."
|
|
161
|
+
- `agents/*.md` — read each agent's frontmatter `default-tier:` value. The reflector already parses frontmatter in Section 3 ("Agent Performance"); reuse that parse pass and build a `{agent: defaultTier}` map keyed by the agent's `name:` field.
|
|
162
|
+
|
|
163
|
+
**The rule:**
|
|
164
|
+
|
|
165
|
+
For each `(agent, bin)` slice in the posterior (defaulting to `delegate='none'` arms — focuses on local-call routing):
|
|
166
|
+
|
|
167
|
+
1. Compute per-tier posterior mean = `α / (α + β)` and stddev = `sqrt(αβ / ((α+β)² · (α+β+1)))`.
|
|
168
|
+
2. Identify `posterior_best_tier = argmax(mean)` across the tiers present in the slice.
|
|
169
|
+
3. Gates (all must hold to emit):
|
|
170
|
+
- `sum(arm.count)` across the slice's tier rows >= 3 (D-10's "3+ cycles" proxy).
|
|
171
|
+
- `(best_mean - second_best_mean) / second_best_mean >= 0.5` (50% delta heuristic).
|
|
172
|
+
- `stddev(best_tier) < 0.05` (credible interval narrow enough).
|
|
173
|
+
- `frontmatter[agent].default-tier !== posterior_best_tier` (the actual stale signal).
|
|
174
|
+
4. If all gates hold, emit a structured `bandit_arbitrage` proposal.
|
|
175
|
+
|
|
176
|
+
**Important guardrails (failure modes the rule must avoid):**
|
|
177
|
+
|
|
178
|
+
- **Single-tier-only history is silent.** If only one tier has been pulled for `(agent, bin)`, no comparison is possible — emit nothing rather than a misleading "winner" proposal.
|
|
179
|
+
- **Wide credible intervals are silent.** Bandit posteriors are noisy early on; the 0.05 stddev gate ensures we only surface signals where the bandit is confident.
|
|
180
|
+
- **The 50% threshold is a starting heuristic.** Same discipline as cost-arbitrage Section 7 — bandit-learning over which arbitrage proposals were APPLIED (and whether the posterior subsequently shifted) is a separate (future) phase.
|
|
181
|
+
- **delegateFilter='none' is the v1.27.5 default.** Arbitrage analysis on the 5 peer-delegate slices is left for a future plan; current peer data is too sparse to credibly disagree with frontmatter.
|
|
182
|
+
|
|
183
|
+
**Helper:** `scripts/lib/bandit-arbitrage.cjs` exports `analyze(posterior, options) → proposals[]` implementing the above rule deterministically. The executor agent following this skill loads the posterior via `bandit-router.loadPosterior()`, builds the `{agent: defaultTier}` map from `agents/*.md` frontmatter, and passes both to `analyze()`. No re-derivation of the rule in prose — call the helper.
|
|
184
|
+
|
|
185
|
+
**Proposal output shape** (one entry per stale-frontmatter signal, JSON-serializable for `/gdd:apply-reflections`):
|
|
186
|
+
|
|
187
|
+
```json
|
|
188
|
+
{
|
|
189
|
+
"type": "bandit_arbitrage",
|
|
190
|
+
"agent": "design-verifier",
|
|
191
|
+
"bin": "medium",
|
|
192
|
+
"current_frontmatter_tier": "sonnet",
|
|
193
|
+
"posterior_best_tier": "opus",
|
|
194
|
+
"posterior_mean": { "haiku": 0.50, "sonnet": 0.62, "opus": 0.95 },
|
|
195
|
+
"posterior_stddev": { "haiku": 0.04, "sonnet": 0.03, "opus": 0.02 },
|
|
196
|
+
"pull_count": 18,
|
|
197
|
+
"proposal": "design-verifier (medium bin) frontmatter says sonnet but bandit picks opus (posterior mean 0.950 vs 0.620, 18 pulls, stddev 0.020) — update frontmatter or add tier_override: sonnet if intentional",
|
|
198
|
+
"evidence": "posterior_cred_int_narrow"
|
|
199
|
+
}
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
Render each `bandit_arbitrage` entry into the Proposals section as a `[FRONTMATTER]`-tagged proposal carrying the structured payload verbatim. `/gdd:apply-reflections` routes the proposal to either (a) an `agents/<name>.md` frontmatter `default-tier:` update OR (b) a new `tier_override: <existing-tier>` add when the operator explicitly wants to keep the existing default-tier despite the measured drift.
|
|
203
|
+
|
|
204
|
+
---
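
Editor's sketch (not part of the package): the Section 8 rule above can be illustrated with a small self-contained function. This is not the shipped `scripts/lib/bandit-arbitrage.cjs`, whose `analyze(posterior, options)` internals are not shown in this diff. The `Arm` shape, the `staleTier` name, and the guard against a zero second-best mean are assumptions made for illustration; the formulas and thresholds are the ones stated in the rule.

```typescript
// Hypothetical sketch of the Section 8 gates; field and function names are assumptions.
interface Arm {
  alpha: number;
  beta: number;
  count: number;
}

// Mean and standard deviation of a Beta(alpha, beta) distribution.
function betaStats(alpha: number, beta: number): { mean: number; stddev: number } {
  const n = alpha + beta;
  return {
    mean: alpha / n,
    stddev: Math.sqrt((alpha * beta) / (n * n * (n + 1))),
  };
}

// Returns the posterior-best tier when every gate holds, otherwise null.
function staleTier(slice: Record<string, Arm>, defaultTier: string): string | null {
  const rows = Object.entries(slice)
    .map(([tier, arm]) => ({ tier, count: arm.count, ...betaStats(arm.alpha, arm.beta) }))
    .sort((x, y) => y.mean - x.mean);

  // Single-tier-only history is silent: no comparison is possible.
  if (rows.length < 2) return null;
  const best = rows[0]!;
  const second = rows[1]!;

  // Gate 1: at least 3 pulls across the slice.
  const pulls = rows.reduce((sum, row) => sum + row.count, 0);
  if (pulls < 3) return null;

  // Gate 2: relative mean delta of at least 50% (zero guard is an assumption).
  if (second.mean <= 0 || (best.mean - second.mean) / second.mean < 0.5) return null;

  // Gate 3: the best tier's posterior must be narrow.
  if (best.stddev >= 0.05) return null;

  // Gate 4: only report when the frontmatter disagrees with the posterior.
  return best.tier === defaultTier ? null : best.tier;
}
```

For instance, a slice with `opus` at Beta(190, 10) and `sonnet` at Beta(60, 40) has means 0.95 and 0.60, a relative delta of about 0.58, and an `opus` stddev of about 0.015, so an agent whose frontmatter says `sonnet` would be flagged with `opus` under this sketch.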
|
|
205
|
+
|
|
154
206
|
## Proposals
|
|
155
207
|
|
|
156
208
|
After all sections, write a **Proposals** section. Number proposals sequentially. Every proposal must include evidence — no vague observations.
|
package/hooks/budget-enforcer.ts
CHANGED
|
@@ -105,6 +105,76 @@ interface RuntimeDetectModule {
|
|
|
105
105
|
}
|
|
106
106
|
const runtimeDetect = nodeRequire('../scripts/lib/runtime-detect.cjs') as RuntimeDetectModule;
|
|
107
107
|
|
|
108
|
+
// Plan 27.5-01: bandit production-integration shim. Hides pull /
|
|
109
|
+
// pullWithDelegate choice from the hook; reads adaptive_mode + frontmatter
|
|
110
|
+
// tier_override under the same gating discipline as Phase 23.5 D-07 and
|
|
111
|
+
// Phase 27.5 D-05.
|
|
112
|
+
interface BanditIntegrationModule {
|
|
113
|
+
consultBandit(args: {
|
|
114
|
+
agent: string;
|
|
115
|
+
bin: string;
|
|
116
|
+
delegate: string;
|
|
117
|
+
agentFrontmatter: { tier_override?: string; default_tier?: string };
|
|
118
|
+
adaptiveMode?: 'static' | 'hedge' | 'full';
|
|
119
|
+
baseDir?: string;
|
|
120
|
+
posteriorPath?: string;
|
|
121
|
+
}): {
|
|
122
|
+
tier: 'haiku' | 'sonnet' | 'opus';
|
|
123
|
+
decision_log: {
|
|
124
|
+
source:
|
|
125
|
+
| 'frontmatter'
|
|
126
|
+
| 'tier_override_bypass'
|
|
127
|
+
| 'bandit_pull'
|
|
128
|
+
| 'bandit_pull_with_delegate';
|
|
129
|
+
samples?: Record<string, number> | Record<string, Record<string, number>>;
|
|
130
|
+
delegate?: string;
|
|
131
|
+
adaptive_mode: 'static' | 'hedge' | 'full';
|
|
132
|
+
reason?: string;
|
|
133
|
+
};
|
|
134
|
+
};
|
|
135
|
+
recordOutcome(args: unknown): void;
|
|
136
|
+
DELEGATE_NONE: 'none';
|
|
137
|
+
}
|
|
138
|
+
const banditIntegration = nodeRequire(
|
|
139
|
+
'../scripts/lib/bandit-router/integration.cjs',
|
|
140
|
+
) as BanditIntegrationModule;
|
|
141
|
+
|
|
142
|
+
// Plan 27.5-02: adaptive-mode module surfaces the single gating predicate.
|
|
143
|
+
interface AdaptiveModeModule {
|
|
144
|
+
getMode(opts?: {
|
|
145
|
+
baseDir?: string;
|
|
146
|
+
budgetPath?: string;
|
|
147
|
+
quiet?: boolean;
|
|
148
|
+
}): 'static' | 'hedge' | 'full';
|
|
149
|
+
isBanditEnabled(opts?: { baseDir?: string; budgetPath?: string }): boolean;
|
|
150
|
+
}
|
|
151
|
+
const adaptiveMode = nodeRequire(
|
|
152
|
+
'../scripts/lib/adaptive-mode.cjs',
|
|
153
|
+
) as AdaptiveModeModule;
|
|
154
|
+
|
|
155
|
+
// Plan 27.5-02: bin selection helper for bandit (agent, bin) addressing.
|
|
156
|
+
// budget-enforcer doesn't currently surface glob_count; default to 'medium'
|
|
157
|
+
// as a safe per-agent partition until a future plan wires the real count.
|
|
158
|
+
interface BanditRouterCoreModule {
|
|
159
|
+
binForGlobCount(n: number): 'tiny' | 'small' | 'medium' | 'large';
|
|
160
|
+
DEFAULT_DELEGATES: readonly string[];
|
|
161
|
+
}
|
|
162
|
+
const banditRouterCore = nodeRequire(
|
|
163
|
+
'../scripts/lib/bandit-router.cjs',
|
|
164
|
+
) as BanditRouterCoreModule;
|
|
165
|
+
|
|
166
|
+
// Plan 27.5-02: tier-resolver translates bandit tier → concrete model.
|
|
167
|
+
interface TierResolverModule {
|
|
168
|
+
resolve(
|
|
169
|
+
runtime: string,
|
|
170
|
+
tier: string,
|
|
171
|
+
opts?: { silent?: boolean },
|
|
172
|
+
): string | null;
|
|
173
|
+
}
|
|
174
|
+
const tierResolver = nodeRequire(
|
|
175
|
+
'../scripts/lib/tier-resolver.cjs',
|
|
176
|
+
) as TierResolverModule;
|
|
177
|
+
|
|
108
178
|
// ── Types ───────────────────────────────────────────────────────────────────
|
|
109
179
|
|
|
110
180
|
/**
|
|
@@ -618,6 +688,50 @@ function emitCostRecorded(
|
|
|
618
688
|
}
|
|
619
689
|
}
|
|
620
690
|
|
|
691
|
+
/**
|
|
692
|
+
* Plan 27.5-02 / D-03: emit `bandit.tier_selected` event when the bandit
|
|
693
|
+
* is consulted (regardless of whether it overrode the prior tier). The
|
|
694
|
+
* event captures the prior tier, the bandit's pick, the sampled posterior
|
|
695
|
+
* (when applicable), the delegate dimension, and the runtime tag so
|
|
696
|
+
* Phase 11 reflector (27.5-04) and `/gdd:bandit-status` (27.5-05) can
|
|
697
|
+
* reconstruct decision history without re-reading the posterior file.
|
|
698
|
+
*
|
|
699
|
+
* Fail-open like every other emit in this hook.
|
|
700
|
+
*/
|
|
701
|
+
function emitBanditTierSelected(
|
|
702
|
+
payload: {
|
|
703
|
+
agent: string;
|
|
704
|
+
bin: string;
|
|
705
|
+
prior_tier: string;
|
|
706
|
+
selected_tier: 'haiku' | 'sonnet' | 'opus';
|
|
707
|
+
source:
|
|
708
|
+
| 'frontmatter'
|
|
709
|
+
| 'tier_override_bypass'
|
|
710
|
+
| 'bandit_pull'
|
|
711
|
+
| 'bandit_pull_with_delegate';
|
|
712
|
+
delegate: string;
|
|
713
|
+
adaptive_mode: 'static' | 'hedge' | 'full';
|
|
714
|
+
samples?: unknown;
|
|
715
|
+
runtime: string;
|
|
716
|
+
model_id: string | null;
|
|
717
|
+
reason?: string;
|
|
718
|
+
},
|
|
719
|
+
cycle?: string,
|
|
720
|
+
): void {
|
|
721
|
+
const ev = {
|
|
722
|
+
type: 'bandit.tier_selected',
|
|
723
|
+
timestamp: new Date().toISOString(),
|
|
724
|
+
sessionId: getSessionId(),
|
|
725
|
+
...(cycle !== undefined && cycle !== 'unknown' ? { cycle } : {}),
|
|
726
|
+
payload,
|
|
727
|
+
};
|
|
728
|
+
try {
|
|
729
|
+
appendEvent(ev as unknown as HookFiredEvent);
|
|
730
|
+
} catch {
|
|
731
|
+
// Fail open.
|
|
732
|
+
}
|
|
733
|
+
}
|
|
734
|
+
|
|
621
735
|
// ── main ────────────────────────────────────────────────────────────────────
|
|
622
736
|
|
|
623
737
|
async function readStdin(): Promise<string> {
|
|
@@ -905,12 +1019,142 @@ export async function main(): Promise<void> {
|
|
|
905
1019
|
? routerDecision.runtime
|
|
906
1020
|
: runtimeDetect.detect()) ?? 'claude';
|
|
907
1021
|
|
|
1022
|
+
// ── Plan 27.5-02 — bandit consultation ────────────────────────────────────
|
|
1023
|
+
//
|
|
1024
|
+
// D-01 / D-02 / D-03 / D-07: per-spawn after `resolved_models` is computed,
|
|
1025
|
+
// before the SDK call. Skip conditions (all silent — no event, no override):
|
|
1026
|
+
// - adaptive_mode !== 'full' (D-07)
|
|
1027
|
+
// - toolInput._tier_downgraded === true (80% downgrade fired upstream —
|
|
1028
|
+
// bandit must not undo budget)
|
|
1029
|
+
//
|
|
1030
|
+
// When bandit fires, override resolved_models[agent] through tier-resolver
|
|
1031
|
+
// so downstream consumers see the bandit's pick as the actual model.
|
|
1032
|
+
// model_tier_overrides[agent] is preserved (D-03 back-compat).
|
|
1033
|
+
const currentMode = adaptiveMode.getMode({ quiet: true });
|
|
1034
|
+
const priorTier = resolvedTier; // captured before bandit override
|
|
1035
|
+
// Mutable references for the cost/telemetry path; bandit may rewrite.
|
|
1036
|
+
let effectiveTier: string = resolvedTier;
|
|
1037
|
+
let effectiveModelId: string | null = resolvedModelId;
|
|
1038
|
+
|
|
1039
|
+
if (currentMode === 'full' && toolInput._tier_downgraded !== true) {
|
|
1040
|
+
// Bin defaults to 'medium' — budget-enforcer doesn't currently surface
|
|
1041
|
+
// glob_count; future plan can wire it. Per-agent bandit arms still
|
|
1042
|
+
// converge correctly under a fixed bin (Phase 23.5 D-08). The function
|
|
1043
|
+
// call below makes the integration point explicit for future plans.
|
|
1044
|
+
void banditRouterCore.binForGlobCount(0);
|
|
1045
|
+
const bin = 'medium';
|
|
1046
|
+
|
|
1047
|
+
// Source the frontmatter view from the in-flight toolInput. The hook
|
|
1048
|
+
// reads frontmatter indirectly: _default_tier carries the agent's
|
|
1049
|
+
// declared default-tier, _tier_override (if any) carries an explicit
|
|
1050
|
+
// override the router emitted. For bandit purposes, _tier_override
|
|
1051
|
+
// means "operator has already taken control" — the shim returns
|
|
1052
|
+
// source='tier_override_bypass' (no posterior side effect).
|
|
1053
|
+
const agentFrontmatter: {
|
|
1054
|
+
tier_override?: string;
|
|
1055
|
+
default_tier?: string;
|
|
1056
|
+
} = {};
|
|
1057
|
+
if (
|
|
1058
|
+
typeof toolInput._tier_override === 'string' &&
|
|
1059
|
+
toolInput._tier_override.length > 0
|
|
1060
|
+
) {
|
|
1061
|
+
agentFrontmatter.tier_override = toolInput._tier_override;
|
|
1062
|
+
}
|
|
1063
|
+
if (
|
|
1064
|
+
typeof toolInput._default_tier === 'string' &&
|
|
1065
|
+
toolInput._default_tier.length > 0
|
|
1066
|
+
) {
|
|
1067
|
+
agentFrontmatter.default_tier = toolInput._default_tier;
|
|
1068
|
+
}
|
|
1069
|
+
|
|
1070
|
+
// Delegate dimension: budget-enforcer doesn't currently see the
|
|
1071
|
+
// agent's delegate_to: frontmatter (session-runner does). For 27.5-02
|
|
1072
|
+
// we always consult the local-call slice (delegate='none'); 27.5-03
|
|
1073
|
+
// wires delegate=<peer> for the recordOutcome side.
|
|
1074
|
+
const banditDelegate = banditIntegration.DELEGATE_NONE;
|
|
1075
|
+
|
|
1076
|
+
let banditResult: ReturnType<
|
|
1077
|
+
BanditIntegrationModule['consultBandit']
|
|
1078
|
+
> | null = null;
|
|
1079
|
+
try {
|
|
1080
|
+
banditResult = banditIntegration.consultBandit({
|
|
1081
|
+
agent,
|
|
1082
|
+
bin,
|
|
1083
|
+
delegate: banditDelegate,
|
|
1084
|
+
agentFrontmatter,
|
|
1085
|
+
adaptiveMode: currentMode,
|
|
1086
|
+
});
|
|
1087
|
+
} catch {
|
|
1088
|
+
// Fail open — never let a bandit error block a spawn.
|
|
1089
|
+
}
|
|
1090
|
+
|
|
1091
|
+
if (banditResult !== null) {
|
|
1092
|
+
// Translate the bandit tier into a concrete model. The tier-resolver
|
|
1093
|
+
// emits its own fallback events (tier_resolution_fallback /
|
|
1094
|
+
// tier_resolution_failed) when the runtime row is incomplete, so we
|
|
1095
|
+
// don't need to re-emit those here.
|
|
1096
|
+
const banditModel = tierResolver.resolve(
|
|
1097
|
+
runtimeId,
|
|
1098
|
+
banditResult.tier,
|
|
1099
|
+
{ silent: true },
|
|
1100
|
+
);
|
|
1101
|
+
|
|
1102
|
+
// Apply override only when:
|
|
1103
|
+
// 1. bandit actually picked a different tier than priorTier
|
|
1104
|
+
// (no-op write avoided)
|
|
1105
|
+
// 2. tier-resolver returned a non-null model (fall back to
|
|
1106
|
+
// existing resolvedModelId on null)
|
|
1107
|
+
// 3. source is 'bandit_pull' or 'bandit_pull_with_delegate'
|
|
1108
|
+
// (frontmatter/bypass paths don't override resolved_models)
|
|
1109
|
+
if (
|
|
1110
|
+
banditResult.tier !== priorTier &&
|
|
1111
|
+
banditModel !== null &&
|
|
1112
|
+
(banditResult.decision_log.source === 'bandit_pull' ||
|
|
1113
|
+
banditResult.decision_log.source === 'bandit_pull_with_delegate')
|
|
1114
|
+
) {
|
|
1115
|
+
// Override resolved_models[agent] without touching
|
|
1116
|
+
// model_tier_overrides[agent] (D-03 back-compat).
|
|
1117
|
+
if (routerDecision !== undefined) {
|
|
1118
|
+
const rm = routerDecision.resolved_models ?? {};
|
|
1119
|
+
rm[agent] = banditModel;
|
|
1120
|
+
routerDecision.resolved_models = rm;
|
|
1121
|
+
}
|
|
1122
|
+
// Also stamp _tier_override on toolInput so downstream readers
|
|
1123
|
+
// see the bandit's pick.
|
|
1124
|
+
toolInput._tier_override = banditResult.tier;
|
|
1125
|
+
effectiveTier = banditResult.tier;
|
|
1126
|
+
effectiveModelId = banditModel;
|
|
1127
|
+
}
|
|
1128
|
+
|
|
1129
|
+
// Emit one bandit.tier_selected event regardless of override outcome
|
|
1130
|
+
// (the event captures the decision, not the override side effect).
|
|
1131
|
+
emitBanditTierSelected(
|
|
1132
|
+
{
|
|
1133
|
+
agent,
|
|
1134
|
+
bin,
|
|
1135
|
+
prior_tier: priorTier,
|
|
1136
|
+
selected_tier: banditResult.tier,
|
|
1137
|
+
source: banditResult.decision_log.source,
|
|
1138
|
+
delegate: banditResult.decision_log.delegate ?? banditDelegate,
|
|
1139
|
+
adaptive_mode: banditResult.decision_log.adaptive_mode,
|
|
1140
|
+
samples: banditResult.decision_log.samples,
|
|
1141
|
+
runtime: runtimeId,
|
|
1142
|
+
model_id: effectiveModelId ?? resolvedModelId,
|
|
1143
|
+
...(banditResult.decision_log.reason !== undefined
|
|
1144
|
+
? { reason: banditResult.decision_log.reason }
|
|
1145
|
+
: {}),
|
|
1146
|
+
},
|
|
1147
|
+
cycle,
|
|
1148
|
+
);
|
|
1149
|
+
}
|
|
1150
|
+
}
|
|
1151
|
+
|
|
908
1152
|
// Compute runtime-aware cost via the shared backend. Failures return
|
|
909
1153
|
// null cost; we emit the event regardless so the cost-aggregator sees
|
|
910
1154
|
// the lookup attempt (Phase 22 events.jsonl tagging).
|
|
911
1155
|
const costLookup = budgetBackend.computeCost({
|
|
912
|
-
model_id:
|
|
913
|
-
tier:
|
|
1156
|
+
model_id: effectiveModelId,
|
|
1157
|
+
tier: effectiveTier,
|
|
914
1158
|
runtime: runtimeId,
|
|
915
1159
|
tokens_in: Number(toolInput._tokens_in_est ?? 0),
|
|
916
1160
|
tokens_out: Number(toolInput._tokens_out_est ?? 0),
|
|
@@ -920,8 +1164,8 @@ export async function main(): Promise<void> {
|
|
|
920
1164
|
{
|
|
921
1165
|
runtime: runtimeId,
|
|
922
1166
|
agent,
|
|
923
|
-
model_id:
|
|
924
|
-
tier: costLookup.tier ??
|
|
1167
|
+
model_id: effectiveModelId ?? costLookup.model,
|
|
1168
|
+
tier: costLookup.tier ?? effectiveTier,
|
|
925
1169
|
tokens_in: Number(toolInput._tokens_in_est ?? 0),
|
|
926
1170
|
tokens_out: Number(toolInput._tokens_out_est ?? 0),
|
|
927
1171
|
cost_usd: costLookup.cost_usd,
|
|
@@ -932,7 +1176,7 @@ export async function main(): Promise<void> {
|
|
|
932
1176
|
// Branch E: standard spawn-allowed (includes tier-downgraded path).
|
|
933
1177
|
writeTelemetry({
|
|
934
1178
|
agent,
|
|
935
|
-
tier:
|
|
1179
|
+
tier: effectiveTier,
|
|
936
1180
|
tokens_in: Number(toolInput._tokens_in_est ?? 0),
|
|
937
1181
|
tokens_out: Number(toolInput._tokens_out_est ?? 0),
|
|
938
1182
|
cache_hit: false,
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@hegemonart/get-design-done",
|
|
3
|
-
"version": "1.27.
|
|
3
|
+
"version": "1.27.5",
|
|
4
4
|
"description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
|
|
5
5
|
"author": "Hegemon",
|
|
6
6
|
"homepage": "https://github.com/hegemonart/get-design-done",
|
|
@@ -37,7 +37,7 @@
|
|
|
37
37
|
"provenance": true
|
|
38
38
|
},
|
|
39
39
|
"scripts": {
|
|
40
|
-
"test": "node --test --experimental-strip-types \"tests/**/*.cjs\" \"tests/**/*.ts\"",
|
|
40
|
+
"test": "node --test --experimental-strip-types \"tests/**/*.test.cjs\" \"tests/**/*.test.ts\"",
|
|
41
41
|
"typecheck": "tsc --noEmit",
|
|
42
42
|
"codegen:schemas": "node --experimental-strip-types scripts/codegen-schema-types.ts",
|
|
43
43
|
"lint:md": "npx --yes markdownlint-cli2 \"**/*.md\" \"#node_modules\" \"#.planning\" \"#.claude\" \"#test-fixture/baselines\"",
|
|
@@ -0,0 +1,163 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: bandit-integration
|
|
3
|
+
phase: 27.5
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
type: meta-rules
|
|
6
|
+
description: Bandit posterior + production-integration shim cheat sheet — signatures, reward function semantics, adaptive_mode gate, posterior path conventions.
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Bandit Integration — Developer Cheat Sheet
|
|
10
|
+
|
|
11
|
+
**Phase 27.5 (v1.27.5).** Reference for the bandit production-integration surface. Authoring or modifying a caller of the bandit posterior? Debugging a routing decision at the code level? Start here.
|
|
12
|
+
|
|
13
|
+
For ops-level guidance (when bandit fires, how to disable, posterior inspection), see `docs/BANDIT-INTEGRATION.md`.
|
|
14
|
+
|
|
15
|
+
In-scope modules:
|
|
16
|
+
|
|
17
|
+
- `scripts/lib/bandit-router.cjs` (Phase 23.5 primitives).
|
|
18
|
+
- `scripts/lib/bandit-router/integration.cjs` (Phase 27.5 shim).
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## The two-stage architecture
|
|
23
|
+
|
|
24
|
+
Phase 23.5 ships the bandit primitives — Thompson-sampling pull, posterior update, computeReward, atomic persistence. Phase 27-07 added the `delegate?` arm dimension (5 peer-CLI arms + the local `none` arm). Both phases shipped library-only with no production callers.
|
|
25
|
+
|
|
26
|
+
Phase 27.5 ships the production-integration shim that wraps the primitives behind two purpose-built entry points and hides the `pull` vs `pullWithDelegate` choice. Callers pass a `delegate` argument and the shim routes internally.
|
|
27
|
+
|
|
28
|
+
### Phase 23.5 + 27-07 surface — `scripts/lib/bandit-router.cjs`
|
|
29
|
+
|
|
30
|
+
Exports: `pull`, `update`, `pullWithDelegate`, `updateWithDelegate`, `computeReward`, `loadPosterior`, `savePosterior`, `reset`, `decayArm`, `sampleBeta`, `priorFor`, `binForGlobCount`, `DEFAULT_DELEGATES`, `DELEGATE_NONE`, `TIER_PRIOR`, `PRIOR_STRENGTH`, `TOUCHES_BINS`, `DEFAULT_POSTERIOR_PATH`, `SCHEMA_VERSION`.
|
|
31
|
+
|
|
32
|
+
The two-pair primitive split:
|
|
33
|
+
|
|
34
|
+
- `pull({agent, bin, ...})` / `update({agent, bin, tier, reward, ...})` — operate on the `(agent, bin, tier)` arm slice. Equivalent to `delegate='none'`.
|
|
35
|
+
- `pullWithDelegate({agent, bin, delegates, ...})` / `updateWithDelegate({agent, bin, tier, delegate, reward, ...})` — operate on the `(agent, bin, tier, delegate)` arm slice for any `delegate ∈ DEFAULT_DELEGATES`.
|
|
36
|
+
|
|
37
|
+
### Phase 27.5 surface — `scripts/lib/bandit-router/integration.cjs`
|
|
38
|
+
|
|
39
|
+
Exports: `consultBandit`, `recordOutcome`, `DELEGATE_NONE`.
|
|
40
|
+
|
|
41
|
+
Routing rules (D-05, D-07):
|
|
42
|
+
|
|
43
|
+
1. `agentFrontmatter.tier_override` set → bypass bandit, return `tier_override`.
|
|
44
|
+
2. `adaptiveMode !== 'full'` → bandit silent, return `frontmatter.default_tier`.
|
|
45
|
+
3. `adaptiveMode === 'full'` + delegate `'none'` or undefined → call `pull()`.
|
|
46
|
+
4. `adaptiveMode === 'full'` + delegate is a peer name → call `pullWithDelegate({delegates: [delegate]})`.
|
|
47
|
+
|
|
48
|
+
`recordOutcome` is symmetric on the adaptive-mode gate.
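
The four routing rules above can be read as a single decision function. The following is a minimal sketch, not the shim's actual code: `sketchRoute` is a hypothetical name, the pairing of each rule with a `decision_log.source` label is an assumption inferred from the label names, and `'peer-a'` in the usage note stands in for whatever peer names `DEFAULT_DELEGATES` really contains.

```javascript
'use strict';

// Hypothetical helper (not exported by integration.cjs). It only mirrors
// routing rules 1-4 and reports which primitive would be called.
function sketchRoute({ agentFrontmatter = {}, adaptiveMode, delegate }) {
  // Rule 1: an explicit tier_override bypasses the bandit.
  if (agentFrontmatter.tier_override) {
    return { source: 'tier_override_bypass', tier: agentFrontmatter.tier_override };
  }
  // Rule 2: any mode other than 'full' keeps the bandit silent.
  if (adaptiveMode !== 'full') {
    return { source: 'frontmatter', tier: agentFrontmatter.default_tier };
  }
  // Rule 3: no delegate (or 'none') routes to the plain pull() primitive.
  if (delegate === undefined || delegate === 'none') {
    return { source: 'bandit_pull', primitive: 'pull' };
  }
  // Rule 4: a peer name routes to pullWithDelegate with a single-entry list.
  return {
    source: 'bandit_pull_with_delegate',
    primitive: 'pullWithDelegate',
    delegates: [delegate],
  };
}
```

For example, `sketchRoute({ adaptiveMode: 'full', delegate: 'peer-a' })` takes the rule 4 branch, while the same call with `adaptiveMode: 'static'` falls through to rule 2 regardless of the delegate.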
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## `consultBandit` signature
|
|
53
|
+
|
|
54
|
+
```javascript
|
|
55
|
+
consultBandit({
|
|
56
|
+
agent: string, // required
|
|
57
|
+
bin: string, // required: 'tiny' | 'small' | 'medium' | 'large'
|
|
58
|
+
delegate: string, // 'none' or one of DEFAULT_DELEGATES
|
|
59
|
+
agentFrontmatter: {
|
|
60
|
+
tier_override?: string,
|
|
61
|
+
default_tier?: string,
|
|
62
|
+
},
|
|
63
|
+
adaptiveMode?: 'static' | 'hedge' | 'full', // omit to read on-disk
|
|
64
|
+
baseDir?: string, // override workspace root (test-injection)
|
|
65
|
+
posteriorPath?: string, // override posterior file path (test-injection)
|
|
66
|
+
}) → {
|
|
67
|
+
tier: 'haiku' | 'sonnet' | 'opus',
|
|
68
|
+
decision_log: {
|
|
69
|
+
source: 'frontmatter' | 'tier_override_bypass' | 'bandit_pull' | 'bandit_pull_with_delegate',
|
|
70
|
+
samples?: { haiku?: number, sonnet?: number, opus?: number },
|
|
71
|
+
delegate?: string,
|
|
72
|
+
adaptive_mode: string,
|
|
73
|
+
reason?: string,
|
|
74
|
+
},
|
|
75
|
+
}
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
`decision_log.source` is the audit trail — it tells observability tools which routing branch ran. Tests use it to assert the correct path was taken.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## `recordOutcome` signature
|
|
83
|
+
|
|
84
|
+
```javascript
|
|
85
|
+
recordOutcome({
|
|
86
|
+
agent: string,
|
|
87
|
+
bin: string,
|
|
88
|
+
delegate: string,
|
|
89
|
+
tier: string,
|
|
90
|
+
status: string, // SessionResult.status — only 'completed' triggers reward.solidify_pass
|
|
91
|
+
costUsd?: number,
|
|
92
|
+
adaptiveMode?: 'static' | 'hedge' | 'full',
|
|
93
|
+
baseDir?: string,
|
|
94
|
+
posteriorPath?: string,
|
|
95
|
+
}) → void // best-effort per D-04 — write errors are swallowed
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
Reward semantics:
|
|
99
|
+
|
|
100
|
+
- `solidify_pass = (status === 'completed')`.
|
|
101
|
+
- If `solidify_pass` is false, reward is `0`. Otherwise, reward is `1 - lambda * normalize(costUsd + epsilon * wallTimeMs)`.
|
|
102
|
+
|
|
103
|
+
Phase 27.5 passes `wallTimeMs: 0` always (D-08 unchanged from Phase 23.5).
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## `adaptive_mode` gate semantics
|
|
108
|
+
|
|
109
|
+
Phase 23.5 ladder (D-07):
|
|
110
|
+
|
|
111
|
+
- `static` — default. Bandit silent. `default-tier:` is authoritative. No reads, no writes.
|
|
112
|
+
- `hedge` — measurement-only. Bandit silent on reads, but `recordOutcome` may still write to seed the posterior. Currently identical to `static` in Phase 27.5; reserved for Phase 28+ explicit "hedge mode".
|
|
113
|
+
- `full` — bandit active. Reads pick via Thompson sampling; writes update posterior.
|
|
114
|
+
|
|
115
|
+
The shim respects the gate transparently. Operators flip via `.design/budget.json#adaptive_mode`.
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## Reward function
|
|
120
|
+
|
|
121
|
+
`computeReward({solidify_pass, cost_usd, wall_time_ms, lambda?, epsilon?, costNormalizer?}) → number`
|
|
122
|
+
|
|
123
|
+
Two-stage lexicographic (D-08, unchanged from Phase 23.5):
|
|
124
|
+
|
|
125
|
+
- Stage 1 — correctness: if `solidify_pass !== true`, return `0`.
|
|
126
|
+
- Stage 2 — cost: return `1 - lambda * normalize(cost_usd + epsilon * wall_time_ms)`.
|
|
127
|
+
|
|
128
|
+
Defaults: `lambda = 0.3`, `epsilon = 0.05`. `normalize` maps `[0, $5]` linearly to `[0, 1]`, clamped.
|
|
129
|
+
|
|
130
|
+
Cheaper successful spawns get higher reward. Failed spawns are flat zero. Lower `lambda` to weight cost less; raise it to penalize cost more.
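
The two-stage reward described above can be sketched in a few lines. This is an illustration under stated assumptions, not the body of `computeReward`: `rewardSketch`, `normalizeSketch`, and `COST_CEILING_USD` are hypothetical names, and the linear clamp is written directly from the "`[0, $5]` to `[0, 1]`" description rather than taken from the library.

```javascript
'use strict';

// Assumed ceiling for the linear normalizer: $5 maps to 1.
const COST_CEILING_USD = 5;

// Map [0, 5] linearly onto [0, 1], clamping anything outside that range.
function normalizeSketch(x) {
  return Math.min(1, Math.max(0, x / COST_CEILING_USD));
}

// Hypothetical stand-in for computeReward using the documented defaults.
function rewardSketch({
  solidify_pass,
  cost_usd = 0,
  wall_time_ms = 0,
  lambda = 0.3,
  epsilon = 0.05,
}) {
  // Stage 1 (correctness): anything other than a strict pass earns zero.
  if (solidify_pass !== true) return 0;
  // Stage 2 (cost): subtract a penalty proportional to the normalized cost.
  return 1 - lambda * normalizeSketch(cost_usd + epsilon * wall_time_ms);
}
```

With the defaults, a passing spawn that costs $2.50 scores about `0.85`, and any passing spawn at or above the $5 ceiling bottoms out at about `0.7`, so a success always outranks a failure.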
|
|
131
|
+
|
|
132
|
+
---
|
|
133
|
+
|
|
134
|
+
## Posterior path
|
|
135
|
+
|
|
136
|
+
Canonical path: `.design/telemetry/posterior.json` (Phase 23.5 D-08, Phase 27.5 D-06 unchanged). The path is owned by the `DEFAULT_POSTERIOR_PATH` constant in `scripts/lib/bandit-router.cjs`.
|
|
137
|
+
|
|
138
|
+
Test injection: pass `baseDir` (anchors path under a different workspace root) or `posteriorPath` (overrides the file path directly). Both `consultBandit` and `recordOutcome` accept these options.
|
|
139
|
+
|
|
140
|
+
Write discipline: atomic via `.tmp` + rename. Read failures yield an empty posterior; subsequent writes overwrite. Concurrent writers within the same process are not synchronized — gdd's session-runner is single-threaded.
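
The write discipline above (temp file plus rename, empty posterior on read failure) can be sketched with Node's built-in `fs` module. This is a minimal illustration, not the code behind `savePosterior` / `loadPosterior`: the function names are hypothetical, and representing an "empty posterior" as `{}` is an assumption.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical writer: serialize to "<file>.tmp", then rename into place.
// Readers see either the old file or the new one, never a partial write.
function savePosteriorSketch(filePath, posterior) {
  const tmpPath = `${filePath}.tmp`;
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(tmpPath, JSON.stringify(posterior, null, 2), 'utf8');
  fs.renameSync(tmpPath, filePath);
}

// Hypothetical reader: a missing or corrupt file yields an empty posterior
// (assumed here to be a plain empty object); the next write overwrites it.
function loadPosteriorSketch(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch {
    return {};
  }
}
```

Writing the temp file next to the target keeps both on the same filesystem, which is what lets the rename replace the target in one step.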
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Call sites
|
|
145
|
+
|
|
146
|
+
Phase 27.5 wires these consumers:
|
|
147
|
+
|
|
148
|
+
- **`hooks/budget-enforcer.ts`** (Plan 27.5-02) — per Agent spawn, after `resolved_models` is computed, before SDK call. Calls `consultBandit({agent, bin, delegate, agentFrontmatter, adaptiveMode})`. Overrides `resolved_models[agent]` with the bandit tier via `tier-resolver.cjs`. Emits `bandit.tier_selected` event for observability.
|
|
149
|
+
- **`scripts/lib/session-runner/index.ts`** (Plan 27.5-03): terminal-emit path. Calls `recordOutcome({agent, bin, delegate, tier, status, costUsd})` after each of the 4 `emit('session.completed', ...)` sites (rate-limited, peer-success, turn-cap-zero, terminal retry-exit). The posterior write is best-effort, and missing optional fields are tolerated silently.
|
|
150
|
+
- **`agents/design-reflector.md` Section 8** (Plan 27.5-04) — bandit-arbitrage analysis. `scripts/lib/bandit-arbitrage.cjs` reads `.design/telemetry/posterior.json` and surfaces stale-frontmatter proposals. Mirrors Phase 26-06's `cost-arbitrage.cjs` shape.
|
|
151
|
+
- **`skills/peers/SKILL.md` Step 5 + `skills/bandit-status/SKILL.md`** (Plan 27.5-05) — read-only diagnostic surfaces. `/gdd:peers` posterior delta column populated; `/gdd:bandit-status` renders per-`(agent, bin, delegate, tier)` snapshots.
|
|
152
|
+
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## Cross-references
|
|
156
|
+
|
|
157
|
+
- `docs/BANDIT-INTEGRATION.md` — operator guide (when bandit fires, how to disable, troubleshooting).
|
|
158
|
+
- `reference/peer-protocols.md` — Phase 27 ACP/ASP cheat sheet (peer-CLI delegation transport).
|
|
159
|
+
- `scripts/lib/bandit-router.cjs` — Phase 23.5 primitives surface.
|
|
160
|
+
- `scripts/lib/bandit-router/integration.cjs` — Phase 27.5 production shim.
|
|
161
|
+
- `scripts/lib/bandit-arbitrage.cjs` — Phase 27.5 reflector analyzer (Section 8 of `design-reflector.md`).
|
|
162
|
+
- `hooks/budget-enforcer.ts` — bandit consultation site.
|
|
163
|
+
- `scripts/lib/session-runner/index.ts` — `recordOutcome` site.
|