opengstack 0.13.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (73)
  1. package/AGENTS.md +47 -0
  2. package/CLAUDE.md +370 -0
  3. package/LICENSE +21 -0
  4. package/README.md +80 -0
  5. package/SKILL.md +226 -0
  6. package/autoplan/SKILL.md +96 -0
  7. package/autoplan/SKILL.md.tmpl +694 -0
  8. package/benchmark/SKILL.md +358 -0
  9. package/benchmark/SKILL.md.tmpl +222 -0
  10. package/browse/SKILL.md +396 -0
  11. package/browse/SKILL.md.tmpl +131 -0
  12. package/canary/SKILL.md +89 -0
  13. package/canary/SKILL.md.tmpl +212 -0
  14. package/careful/SKILL.md +58 -0
  15. package/careful/SKILL.md.tmpl +56 -0
  16. package/codex/SKILL.md +90 -0
  17. package/codex/SKILL.md.tmpl +417 -0
  18. package/connect-chrome/SKILL.md +87 -0
  19. package/connect-chrome/SKILL.md.tmpl +195 -0
  20. package/cso/SKILL.md +93 -0
  21. package/cso/SKILL.md.tmpl +606 -0
  22. package/design-consultation/SKILL.md +94 -0
  23. package/design-consultation/SKILL.md.tmpl +415 -0
  24. package/design-review/SKILL.md +94 -0
  25. package/design-review/SKILL.md.tmpl +290 -0
  26. package/design-shotgun/SKILL.md +91 -0
  27. package/design-shotgun/SKILL.md.tmpl +285 -0
  28. package/docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md +84 -0
  29. package/docs/designs/CONDUCTOR_CHROME_SIDEBAR_INTEGRATION.md +57 -0
  30. package/docs/designs/CONDUCTOR_SESSION_API.md +108 -0
  31. package/docs/designs/DESIGN_SHOTGUN.md +451 -0
  32. package/docs/designs/DESIGN_TOOLS_V1.md +622 -0
  33. package/docs/skills.md +880 -0
  34. package/document-release/SKILL.md +91 -0
  35. package/document-release/SKILL.md.tmpl +359 -0
  36. package/freeze/SKILL.md +78 -0
  37. package/freeze/SKILL.md.tmpl +77 -0
  38. package/gstack-upgrade/SKILL.md +224 -0
  39. package/gstack-upgrade/SKILL.md.tmpl +222 -0
  40. package/guard/SKILL.md +78 -0
  41. package/guard/SKILL.md.tmpl +77 -0
  42. package/investigate/SKILL.md +105 -0
  43. package/investigate/SKILL.md.tmpl +194 -0
  44. package/land-and-deploy/SKILL.md +88 -0
  45. package/land-and-deploy/SKILL.md.tmpl +881 -0
  46. package/office-hours/SKILL.md +96 -0
  47. package/office-hours/SKILL.md.tmpl +645 -0
  48. package/package.json +43 -0
  49. package/plan-ceo-review/SKILL.md +94 -0
  50. package/plan-ceo-review/SKILL.md.tmpl +811 -0
  51. package/plan-design-review/SKILL.md +92 -0
  52. package/plan-design-review/SKILL.md.tmpl +446 -0
  53. package/plan-eng-review/SKILL.md +93 -0
  54. package/plan-eng-review/SKILL.md.tmpl +303 -0
  55. package/qa/SKILL.md +95 -0
  56. package/qa/SKILL.md.tmpl +316 -0
  57. package/qa-only/SKILL.md +89 -0
  58. package/qa-only/SKILL.md.tmpl +101 -0
  59. package/retro/SKILL.md +89 -0
  60. package/retro/SKILL.md.tmpl +820 -0
  61. package/review/SKILL.md +92 -0
  62. package/review/SKILL.md.tmpl +281 -0
  63. package/scripts/cleanup.py +100 -0
  64. package/scripts/filter-skills.sh +114 -0
  65. package/scripts/filter_skills.py +140 -0
  66. package/setup-browser-cookies/SKILL.md +216 -0
  67. package/setup-browser-cookies/SKILL.md.tmpl +81 -0
  68. package/setup-deploy/SKILL.md +92 -0
  69. package/setup-deploy/SKILL.md.tmpl +215 -0
  70. package/ship/SKILL.md +90 -0
  71. package/ship/SKILL.md.tmpl +636 -0
  72. package/unfreeze/SKILL.md +37 -0
  73. package/unfreeze/SKILL.md.tmpl +36 -0
@@ -0,0 +1,622 @@
1
+ # Design: gstack Visual Design Generation (`design` binary)
2
+
3
+ Generated by /office-hours on 2026-03-26
4
+ Branch: garrytan/agent-design-tools
5
+ Repo: gstack
6
+ Status: DRAFT
7
+ Mode: Intrapreneurship
8
+
9
+ ## Context
10
+
11
+ gstack's design skills (/office-hours, /design-consultation, /plan-design-review, /design-review) all produce **text descriptions** of design — DESIGN.md files with hex codes, plan docs with pixel specs in prose, ASCII art wireframes. The creator is a designer who hand-designed HelloSign in OmniGraffle and finds this embarrassing.
12
+
13
+ The unit of value is wrong. Users don't need richer design language — they need an executable visual artifact that changes the conversation from "do you like this spec?" to "is this the screen?"
14
+
15
+ ## Problem Statement
16
+
17
+ Design skills describe design in text instead of showing it. The Argus UX overhaul plan is the example: 487 lines of detailed emotional arc specs, typography choices, animation timing — zero visual artifacts. An AI coding agent that "designs" should produce something you can look at and react to viscerally.
18
+
19
+ ## Demand Evidence
20
+
21
+ The creator/primary user finds the current output embarrassing. Every design skill session ends with prose where a mockup should be. GPT Image API now generates pixel-perfect UI mockups with accurate text rendering — the capability gap that justified text-only output no longer exists.
22
+
23
+ ## Narrowest Wedge
24
+
25
+ A compiled TypeScript binary (`design/dist/design`) that wraps the OpenAI Images/Responses API, callable from skill templates via `$D` (mirroring the existing `$B` browse binary pattern). Priority integration order: /office-hours → /plan-design-review → /design-consultation → /design-review.
26
+
27
+ ## Agreed Premises
28
+
29
+ 1. GPT Image API (via OpenAI Responses API) is the right engine. Google Stitch SDK is backup.
30
+ 2. **Visual mockups are default-on for design skills** with an easy skip path — not opt-in. (Revised per Codex challenge.)
31
+ 3. The integration is a shared utility (not per-skill reimplementation) — a `design` binary that any skill can call.
32
+ 4. Priority: /office-hours first, then /plan-design-review, /design-consultation, /design-review.
33
+
34
+ ## Cross-Model Perspective (Codex)
35
+
36
+ Codex independently validated the core thesis: "The failure is not output quality within markdown; it is that the current unit of value is wrong." Key contributions:
37
+ - Challenged premise #2 (opt-in → default-on) — accepted
38
+ - Proposed vision-based quality gate: use GPT-4o vision to check generated mockups for unreadable text, missing sections, and broken layout, then auto-retry once on failure
39
+ - Scoped 48-hour prototype: shared `visual_mockup.ts` utility, /office-hours + /plan-design-review only, hero mockup + 2 variants
40
+
41
+ ## Recommended Approach: `design` Binary (Approach B)
42
+
43
+ ### Architecture
44
+
45
+ **Shares the browse binary's compilation and distribution pattern** (bun build --compile, setup script, $VARIABLE resolution in skill templates) but is architecturally simpler — no persistent daemon server, no Chromium, no health checks, no token auth. The design binary is a stateless CLI that makes OpenAI API calls and writes PNGs to disk. Session state (for multi-turn iteration) is a JSON file.
46
+
47
+ **New dependency:** `openai` npm package (add to `devDependencies`, NOT runtime deps). Design binary compiled separately from browse so openai doesn't bloat the browse binary.
48
+
49
+ ```
50
+ design/
51
+ ├── src/
52
+ │ ├── cli.ts # Entry point, command dispatch
53
+ │ ├── commands.ts # Command registry (source of truth for docs + validation)
54
+ │ ├── generate.ts # Generate mockups from structured brief
55
+ │ ├── iterate.ts # Multi-turn iteration on existing mockups
56
+ │ ├── variants.ts # Generate N design variants from brief
57
+ │ ├── check.ts # Vision-based quality gate (GPT-4o)
58
+ │ ├── brief.ts # Structured brief type + assembly helpers
59
+ │ └── session.ts # Session state (response IDs for multi-turn)
60
+ ├── dist/
61
+ │ ├── design # Compiled binary
62
+ │ └── .version # Git hash
63
+ └── test/
64
+ └── design.test.ts # Integration tests
65
+ ```
66
+
67
+ ### Commands
68
+
69
+ ```bash
70
+ # Generate a hero mockup from a structured brief
71
+ $D generate --brief "Dashboard for a coding assessment tool. Dark theme, cream accents. Shows: builder name, score badge, narrative letter, score cards. Target: technical users." --output /tmp/mockup-hero.png
72
+
73
+ # Generate 3 design variants
74
+ $D variants --brief "..." --count 3 --output-dir /tmp/mockups/
75
+
76
+ # Iterate on an existing mockup with feedback
77
+ $D iterate --session /tmp/design-session.json --feedback "Make the score cards larger, move the narrative above the scores" --output /tmp/mockup-v2.png
78
+
79
+ # Vision-based quality check (returns PASS/FAIL + issues)
80
+ $D check --image /tmp/mockup-hero.png --brief "Dashboard with builder name, score badge, narrative"
81
+
82
+ # One-shot with quality gate + auto-retry
83
+ $D generate --brief "..." --output /tmp/mockup.png --check --retry 1
84
+
85
+ # Pass a structured brief via JSON file
86
+ $D generate --brief-file /tmp/brief.json --output /tmp/mockup.png
87
+
88
+ # Generate comparison board HTML for user review
89
+ $D compare --images /tmp/mockups/variant-*.png --output /tmp/design-board.html
90
+
91
+ # Guided API key setup + smoke test
92
+ $D setup
93
+ ```
94
+
95
+ **Brief input modes:**
96
+ - `--brief "plain text"` — free-form text prompt (simple mode)
97
+ - `--brief-file path.json` — structured JSON matching the `DesignBrief` interface (rich mode)
98
+ - Skills construct a JSON brief file, write it to /tmp, and pass `--brief-file`
99
+
100
+ **All commands are registered in `commands.ts`** including `--check` and `--retry` as flags on `generate`.
101
+
102
+ ### Design Exploration Workflow (from eng review)
103
+
104
+ The workflow is sequential, not parallel. PNGs are for visual exploration (human-facing), HTML wireframes are for implementation (agent-facing):
105
+
106
+ ```
107
+ 1. $D variants --brief "..." --count 3 --output-dir /tmp/mockups/
108
+ → Generates --count PNG mockup variations (3 in this example; typically 2-5)
109
+
110
+ 2. $D compare --images /tmp/mockups/*.png --output /tmp/design-board.html
111
+ → Generates HTML comparison board (spec below)
112
+
113
+ 3. $B goto file:///tmp/design-board.html
114
+ → User reviews all variants in headed Chrome
115
+
116
+ 4. User picks favorite, rates, comments, clicks [Submit]
117
+ Agent polls: $B eval "document.getElementById('status').textContent"
117
+ Agent reads: $B eval "document.getElementById('feedback-result').textContent"
119
+ → No clipboard, no pasting. Agent reads feedback directly from the page.
120
+
121
+ 5. Claude generates HTML wireframe via DESIGN_SKETCH matching approved direction
122
+ → Agent implements from the inspectable HTML, not the opaque PNG
123
+ ```
124
+
125
+ ### Comparison Board Design Spec (from /plan-design-review)
126
+
127
+ **Classifier: APP UI** (task-focused, utility page). No product branding.
128
+
129
+ **Layout: Single column, full-width mockups.** Each variant gets the full viewport
130
+ width for maximum image fidelity. Users scroll vertically through variants.
131
+
132
+ ```
133
+ ┌─────────────────────────────────────────────────────────────┐
134
+ │ HEADER BAR │
135
+ │ "Design Exploration" . project name . "3 variants" │
136
+ │ Mode indicator: [Wide exploration] | [Matching DESIGN.md] │
137
+ ├─────────────────────────────────────────────────────────────┤
138
+ │ │
139
+ │ ┌───────────────────────────────────────────────────────┐ │
140
+ │ │ VARIANT A (full width) │ │
141
+ │ │ [ mockup PNG, max-width: 1200px ] │ │
142
+ │ ├───────────────────────────────────────────────────────┤ │
143
+ │ │ (●) Pick ★★★★☆ [What do you like/dislike?____] │ │
144
+ │ │ [More like this] │ │
145
+ │ └───────────────────────────────────────────────────────┘ │
146
+ │ │
147
+ │ ┌───────────────────────────────────────────────────────┐ │
148
+ │ │ VARIANT B (full width) │ │
149
+ │ │ [ mockup PNG, max-width: 1200px ] │ │
150
+ │ ├───────────────────────────────────────────────────────┤ │
151
+ │ │ ( ) Pick ★★★☆☆ [What do you like/dislike?____] │ │
152
+ │ │ [More like this] │ │
153
+ │ └───────────────────────────────────────────────────────┘ │
154
+ │ │
155
+ │ ... (scroll for more variants) │
156
+ │ │
157
+ │ ─── separator ───────────────────────────────────────── │
158
+ │ Overall direction (optional, collapsed by default) │
159
+ │ [textarea, 3 lines, expand on focus] │
160
+ │ │
161
+ │ ─── REGENERATE BAR (#f7f7f7 bg) ─────────────────────── │
162
+ │ "Want to explore more?" │
163
+ │ [Totally different] [Match my design] [Custom: ______] │
164
+ │ [Regenerate ->] │
165
+ │ ───────────────────────────────────────────────────────── │
166
+ │ [ ✓ Submit ] │
167
+ └─────────────────────────────────────────────────────────────┘
168
+ ```
169
+
170
+ **Visual spec:**
171
+ - Background: #fff. No shadows, no card borders. Variant separation: 1px #e5e5e5 line.
172
+ - Typography: system font stack. Header: 16px semibold. Labels: 14px semibold. Feedback placeholder: 13px regular #999.
173
+ - Star rating: 5 clickable stars, filled=#000, unfilled=#ddd. Not colored, not animated.
174
+ - Radio button "Pick": explicit favorite selection. One per variant, mutually exclusive.
175
+ - "More like this" button: per-variant, triggers regeneration with that variant's style as seed.
176
+ - Submit button: #000 background, white text, right-aligned. Single CTA.
177
+ - Regenerate bar: #f7f7f7 background, visually distinct from feedback area.
178
+ - Max-width: 1200px centered for mockup images. Margins: 24px sides.
179
+
180
+ **Interaction states:**
181
+ - Loading (page opens before images are ready): skeleton pulse with "Generating variant A..." per card. Stars, textarea, and pick are disabled.
182
+ - Partial failure (2 of 3 succeed): show good ones, error card for failed with per-variant [Retry].
183
+ - Post-submit: "Feedback submitted! Return to your coding agent." Page stays open.
184
+ - Regeneration: smooth transition, fade out old variants, skeleton pulses, fade in new. Scroll resets to top. Previous feedback cleared.
185
+
186
+ **Feedback JSON structure** (written to hidden #feedback-result element):
187
+ ```json
188
+ {
189
+ "preferred": "A",
190
+ "ratings": { "A": 4, "B": 3, "C": 2 },
191
+ "comments": {
192
+ "A": "Love the spacing, header feels right",
193
+ "B": "Too busy, but good color palette",
194
+ "C": "Wrong mood entirely"
195
+ },
196
+ "overall": "Go with A, make the CTA bigger",
197
+ "regenerated": false
198
+ }
199
+ ```
200
+
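The feedback payload above could be read on the agent side roughly as follows. This is a minimal sketch, not the package's implementation: `BoardFeedback` and `parseFeedback` are hypothetical names, and the 0-5 integer rating range (0 meaning unrated) is an assumption based on the five-star control.

```typescript
// Hypothetical shape mirroring the feedback JSON in the spec.
interface BoardFeedback {
  preferred: string;
  ratings: Record<string, number>;
  comments: Record<string, string>;
  overall: string;
  regenerated: boolean;
}

// Parse the text read from #feedback-result and validate the fields the agent relies on.
function parseFeedback(raw: string): BoardFeedback {
  const data = JSON.parse(raw) as Partial<BoardFeedback>;
  if (typeof data.preferred !== "string") {
    throw new Error("feedback is missing 'preferred'");
  }
  const ratings = data.ratings ?? {};
  for (const [variant, stars] of Object.entries(ratings)) {
    // Assumption: ratings are whole stars from 0 (unrated) to 5.
    if (!Number.isInteger(stars) || stars < 0 || stars > 5) {
      throw new Error(`rating for variant ${variant} must be an integer from 0 to 5`);
    }
  }
  return {
    preferred: data.preferred,
    ratings,
    comments: data.comments ?? {},
    overall: data.overall ?? "",
    regenerated: data.regenerated ?? false,
  };
}
```

Validating before use keeps a half-filled board (for example, a pick with no comments) from crashing the agent, while still rejecting payloads that lack a chosen variant.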
201
+ **Accessibility:** Star ratings keyboard navigable (arrow keys). Textareas labeled ("Feedback for Variant A"). Submit/Regenerate keyboard accessible with visible focus ring. All text #333 or darker on white.
202
+
203
+ **Responsive:** >1200px: comfortable margins. 768-1200px: tighter margins. <768px: full-width, no horizontal scroll.
204
+
205
+ **Screenshot consent (first-time only for $D evolve):** "This will send a screenshot of your live site to OpenAI for design evolution. [Proceed] [Don't ask again]" Stored in ~/.gstack/config.yaml as design_screenshot_consent.
206
+
207
+ Why sequential: Codex adversarial review identified that raster PNGs are opaque to agents (no DOM, no states, no diffable structure). HTML wireframes preserve a bridge back to code. The PNG is for the human to say "yes, that's right." The HTML is for the agent to say "I know how to build this."
208
+
209
+ ### Key Design Decisions
210
+
211
+ **1. Stateless CLI, not daemon**
212
+ Browse needs a persistent Chromium instance. Design is just API calls — no reason for a server. Session state for multi-turn iteration is a JSON file written to `/tmp/design-session-{id}.json` containing `previous_response_id`.
213
+ - **Session ID:** generated from `${PID}-${timestamp}`, passed via `--session` flag
214
+ - **Discovery:** the `generate` command creates the session file and prints its path; `iterate` reads it via `--session`
215
+ - **Cleanup:** session files in /tmp are ephemeral (OS cleans up); no explicit cleanup needed
216
+
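The session file described above might be written and read like this. It is a sketch under stated assumptions: `DesignSession`, `createSession`, and `loadSession` are hypothetical names, and `os.tmpdir()` stands in for `/tmp` (the two can differ on some systems).

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical session shape: the doc only specifies `previous_response_id`.
interface DesignSession {
  previous_response_id: string;
}

// Called by `generate`: writes the session file and returns its path for printing.
function createSession(responseId: string, dir: string = tmpdir()): string {
  const id = `${process.pid}-${Date.now()}`; // ${PID}-${timestamp}
  const path = join(dir, `design-session-${id}.json`);
  const session: DesignSession = { previous_response_id: responseId };
  writeFileSync(path, JSON.stringify(session, null, 2));
  return path;
}

// Called by `iterate`: reads the file passed via --session.
function loadSession(path: string): DesignSession {
  const parsed = JSON.parse(readFileSync(path, "utf8")) as Partial<DesignSession>;
  if (typeof parsed.previous_response_id !== "string") {
    throw new Error(`Invalid session file: ${path}`);
  }
  return { previous_response_id: parsed.previous_response_id };
}
```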
217
+ **2. Structured brief input**
218
+ The brief is the interface between skill prose and image generation. Skills construct it from design context:
219
+ ```typescript
220
+ interface DesignBrief {
221
+ goal: string; // "Dashboard for coding assessment tool"
222
+ audience: string; // "Technical users, YC partners"
223
+ style: string; // "Dark theme, cream accents, minimal"
224
+ elements: string[]; // ["builder name", "score badge", "narrative letter"]
225
+ constraints?: string; // "Max width 1024px, mobile-first"
226
+ reference?: string; // Path to existing screenshot or DESIGN.md excerpt
227
+ screenType: string; // "desktop-dashboard" | "mobile-app" | "landing-page" | etc.
228
+ }
229
+ ```
230
+
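The API section of this doc calls a `briefToPrompt(brief)` helper without showing it. One possible shape is sketched below; the prompt wording is an assumption, and `reference` is deliberately left out of the prompt because the doc treats it as a local path that is not uploaded.

```typescript
interface DesignBrief {
  goal: string;
  audience: string;
  style: string;
  elements: string[];
  constraints?: string;
  reference?: string;
  screenType: string;
}

// Flatten a structured brief into a single text prompt (wording is illustrative only).
function briefToPrompt(brief: DesignBrief): string {
  const lines = [
    `Generate a UI mockup for a ${brief.screenType} screen.`,
    `Goal: ${brief.goal}`,
    `Audience: ${brief.audience}`,
    `Style: ${brief.style}`,
    `Required elements: ${brief.elements.join(", ")}`,
  ];
  if (brief.constraints) {
    lines.push(`Constraints: ${brief.constraints}`);
  }
  return lines.join("\n");
}
```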
231
+ **3. Default-on in design skills**
232
+ Skills generate mockups by default. The template includes skip language:
233
+ ```
234
+ Generating visual mockup of the proposed design... (say "skip" if you don't need visuals)
235
+ ```
236
+
237
+ **4. Vision quality gate**
238
+ After generating, optionally pass the image through GPT-4o vision to check:
239
+ - Text readability (are labels/headings legible?)
240
+ - Layout completeness (are all requested elements present?)
241
+ - Visual coherence (does it look like a real UI, not a collage?)
242
+ Auto-retry once on failure. If still fails, present anyway with a warning.
243
+
244
+ **5. Output location: explorations in /tmp, approved finals in `docs/designs/`**
245
+ - Exploration variants go to `/tmp/gstack-mockups-{session}/` (ephemeral, not committed)
246
+ - Only the **user-approved final** mockup gets saved to `docs/designs/` (checked in)
247
+ - Default output directory configurable via CLAUDE.md `design_output_dir` setting
248
+ - Filename pattern: `{skill}-{description}-{timestamp}.png`
249
+ - Create `docs/designs/` if it doesn't exist (mkdir -p)
250
+ - Design doc references the committed image path
251
+ - Always show to user via the Read tool (which renders images inline in Claude Code)
252
+ - This avoids repo bloat: only approved designs are committed, not every exploration variant
253
+ - Fallback: if not in a git repo, save to `/tmp/gstack-mockup-{timestamp}.png`
254
+
255
+ **6. Trust boundary acknowledgment**
256
+ Default-on generation sends design brief text to OpenAI. This is a new external data flow compared with the existing HTML wireframe path, which is entirely local. The brief contains only abstract design descriptions (goal, style, elements), never source code or user data. Screenshots from $B are NOT sent to OpenAI in the core workflow (the `reference` field in `DesignBrief` is a local file path used by the agent, not uploaded to the API); the `$D evolve` expansion is the exception and asks for consent first. Document this in CLAUDE.md.
257
+
258
+ **7. Rate limit mitigation**
259
+ Variant generation uses staggered parallel requests: each API call starts 1 second after the previous one, and the batch is awaited via `Promise.allSettled()`. This reduces the chance of hitting the 5-7 RPM rate limit on image generation while still being faster than fully serial. If any call returns 429, retry with exponential backoff (2s, 4s, 8s).
260
+
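The staggered start and 429 backoff could look roughly like the sketch below. `withBackoff`, `runStaggered`, and the injectable `Sleep` type are hypothetical names, and the code assumes a rate-limit error exposes a numeric `status` of 429.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a task only when it fails with status 429, waiting 2s, 4s, 8s between attempts.
async function withBackoff<T>(
  task: () => Promise<T>,
  sleep: Sleep = realSleep,
  delaysMs: number[] = [2000, 4000, 8000],
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const status = (err as { status?: number }).status;
      const delay = delaysMs[attempt];
      // Give up on non-429 errors or once the delay list is exhausted.
      if (status !== 429 || delay === undefined) throw err;
      await sleep(delay);
    }
  }
}

// Start task i after i * staggerMs, then wait for every task to settle.
function runStaggered<T>(
  tasks: Array<() => Promise<T>>,
  staggerMs = 1000,
  sleep: Sleep = realSleep,
): Promise<PromiseSettledResult<T>[]> {
  return Promise.allSettled(
    tasks.map(async (task, i) => {
      await sleep(i * staggerMs);
      return withBackoff(task, sleep);
    }),
  );
}
```

Because `Promise.allSettled()` never rejects, one failed variant does not discard the others, which matches the partial-failure state on the comparison board.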
261
+ ### Template Integration
262
+
263
+ **Add to existing resolver:** `scripts/resolvers/design.ts` (NOT a new file)
264
+ - Add `generateDesignSetup()` for `{{DESIGN_SETUP}}` placeholder (mirrors `generateBrowseSetup()`)
265
+ - Add `generateDesignMockup()` for `{{DESIGN_MOCKUP}}` placeholder (full exploration workflow)
266
+ - Keeps all design resolvers in one file (consistent with existing codebase convention)
267
+
268
+ **New HostPaths entry:** `types.ts`
269
+ ```typescript
270
+ // claude host:
271
+ designDir: '~/.claude/skills/gstack/design/dist'
272
+ // codex host:
273
+ designDir: '$GSTACK_DESIGN'
274
+ ```
275
+ Note: Codex runtime setup (`setup` script) must also export `GSTACK_DESIGN` env var, similar to how `GSTACK_BROWSE` is set.
276
+
277
+ **`$D` resolution bash block** (generated by `{{DESIGN_SETUP}}`):
278
+ ```bash
279
+ _ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
280
+ D=""
281
+ [ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/design/dist/design" ] && D="$_ROOT/.claude/skills/gstack/design/dist/design"
282
+ [ -z "$D" ] && D=~/.claude/skills/gstack/design/dist/design
283
+ if [ -x "$D" ]; then
284
+ echo "DESIGN_READY: $D"
285
+ else
286
+ echo "DESIGN_NOT_AVAILABLE"
287
+ fi
288
+ ```
289
+ If `DESIGN_NOT_AVAILABLE`: skills fall back to HTML wireframe generation (existing `DESIGN_SKETCH` pattern). Design mockup is a progressive enhancement, not a hard requirement.
290
+
291
+ **New functions in existing resolver:** `scripts/resolvers/design.ts`
292
+ - Add `generateDesignSetup()` for `{{DESIGN_SETUP}}` — mirrors `generateBrowseSetup()` pattern
293
+ - Add `generateDesignMockup()` for `{{DESIGN_MOCKUP}}` — the full generate+check+present workflow
294
+ - Keeps all design resolvers in one file (consistent with existing codebase convention)
295
+
296
+ ### Skill Integration (Priority Order)
297
+
298
+ **1. /office-hours** — Replace the Visual Sketch section
299
+ - After approach selection (Phase 4), generate hero mockup + 2 variants
300
+ - Present all three via Read tool, ask user to pick
301
+ - Iterate if requested
302
+ - Save chosen mockup alongside design doc
303
+
304
+ **2. /plan-design-review** — "What better looks like"
305
+ - When rating a design dimension <7/10, generate a mockup showing what 10/10 would look like
306
+ - Side-by-side: current (screenshot via $B) vs. proposed (mockup via $D)
307
+
308
+ **3. /design-consultation** — Design system preview
309
+ - Generate visual preview of proposed design system (typography, colors, components)
310
+ - Replace the /tmp HTML preview page with a proper mockup
311
+
312
+ **4. /design-review** — Design intent comparison
313
+ - Generate "design intent" mockup from the plan/DESIGN.md specs
314
+ - Compare against live site screenshot for visual delta
315
+
316
+ ### Files to Create
317
+
318
+ | File | Purpose |
319
+ |------|---------|
320
+ | `design/src/cli.ts` | Entry point, command dispatch |
321
+ | `design/src/commands.ts` | Command registry |
322
+ | `design/src/generate.ts` | GPT Image generation via Responses API |
323
+ | `design/src/iterate.ts` | Multi-turn iteration with session state |
324
+ | `design/src/variants.ts` | Generate N design variants |
325
+ | `design/src/check.ts` | Vision-based quality gate |
326
+ | `design/src/brief.ts` | Structured brief types + helpers |
327
+ | `design/src/session.ts` | Session state management |
328
+ | `design/src/compare.ts` | HTML comparison board generator |
329
+ | `design/test/design.test.ts` | Integration tests (mock OpenAI API) |
330
+ | (none — add to existing `scripts/resolvers/design.ts`) | `{{DESIGN_SETUP}}` + `{{DESIGN_MOCKUP}}` resolvers |
331
+
332
+ ### Files to Modify
333
+
334
+ | File | Change |
335
+ |------|--------|
336
+ | `scripts/resolvers/types.ts` | Add `designDir` to `HostPaths` |
337
+ | `scripts/resolvers/index.ts` | Register DESIGN_SETUP + DESIGN_MOCKUP resolvers |
338
+ | `package.json` | Add `design` build command |
339
+ | `setup` | Build design binary alongside browse |
340
+ | `scripts/resolvers/preamble.ts` | Add `GSTACK_DESIGN` env var export for Codex host |
341
+ | `test/gen-skill-docs.test.ts` | Update DESIGN_SKETCH test suite for new resolvers |
342
+ | `setup` | Add design binary build + Codex/Kiro asset linking |
343
+ | `office-hours/SKILL.md.tmpl` | Replace Visual Sketch section with `{{DESIGN_MOCKUP}}` |
344
+ | `plan-design-review/SKILL.md.tmpl` | Add `{{DESIGN_SETUP}}` + mockup generation for low-scoring dimensions |
345
+
346
+ ### Existing Code to Reuse
347
+
348
+ | Code | Location | Used For |
349
+ |------|----------|----------|
350
+ | Browse CLI pattern | `browse/src/cli.ts` | Command dispatch architecture |
351
+ | `commands.ts` registry | `browse/src/commands.ts` | Single source of truth pattern |
352
+ | `generateBrowseSetup()` | `scripts/resolvers/browse.ts` | Template for `generateDesignSetup()` |
353
+ | `DESIGN_SKETCH` resolver | `scripts/resolvers/design.ts` | Template for `DESIGN_MOCKUP` resolver |
354
+ | HostPaths system | `scripts/resolvers/types.ts` | Multi-host path resolution |
355
+ | Build pipeline | `package.json` build script | `bun build --compile` pattern |
356
+
357
+ ### API Details
358
+
359
+ **Generate:** OpenAI Responses API with `image_generation` tool
360
+ ```typescript
361
+ const response = await openai.responses.create({
362
+ model: "gpt-4o",
363
+ input: briefToPrompt(brief),
364
+ tools: [{ type: "image_generation", size: "1536x1024", quality: "high" }],
365
+ });
366
+ // Extract image from response output items
367
+ const imageItem = response.output.find(item => item.type === "image_generation_call");
368
+ const base64Data = imageItem.result; // base64-encoded PNG
369
+ fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
370
+ ```
371
+
372
+ **Iterate:** Same API with `previous_response_id`
373
+ ```typescript
374
+ const response = await openai.responses.create({
375
+ model: "gpt-4o",
376
+ input: feedback,
377
+ previous_response_id: session.lastResponseId,
378
+ tools: [{ type: "image_generation" }],
379
+ });
380
+ ```
381
+ **NOTE:** Multi-turn image iteration via `previous_response_id` is an assumption that needs prototype validation. The Responses API supports conversation threading, but whether it retains visual context of generated images for edit-style iteration is not confirmed in docs. **Fallback:** if multi-turn doesn't work, `iterate` falls back to re-generating with the original brief + accumulated feedback in a single prompt.
382
+
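If the fallback path is needed, the combined prompt might be assembled as below. This is only a sketch: `buildFallbackPrompt` is a hypothetical helper, and the "Apply these revisions" wording is an assumption.

```typescript
// Combine the original prompt with every piece of feedback gathered so far,
// so a single fresh generation call carries the whole iteration history.
function buildFallbackPrompt(originalPrompt: string, feedbackHistory: string[]): string {
  if (feedbackHistory.length === 0) {
    return originalPrompt;
  }
  const revisions = feedbackHistory
    .map((feedback, index) => `${index + 1}. ${feedback}`)
    .join("\n");
  return `${originalPrompt}\n\nApply these revisions, in order:\n${revisions}`;
}
```

This variant needs the session file to store the original brief and the feedback list in addition to `previous_response_id`.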
383
+ **Check:** GPT-4o vision
384
+ ```typescript
385
+ const check = await openai.chat.completions.create({
386
+ model: "gpt-4o",
387
+ messages: [{
388
+ role: "user",
389
+ content: [
390
+ { type: "image_url", image_url: { url: `data:image/png;base64,${imageData}` } },
391
+ { type: "text", text: `Check this UI mockup. Brief: ${brief}. Is text readable? Are all elements present? Does it look like a real UI? Return PASS or FAIL with issues.` }
392
+ ]
393
+ }]
394
+ });
395
+ ```
396
+
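The check prompt asks for "PASS or FAIL with issues" as free text, so the CLI needs some way to turn the reply into a verdict. A minimal sketch follows, assuming the model puts the verdict at the start of the first line and one issue per following line; `CheckResult` and `parseCheckVerdict` are hypothetical names, and a real implementation might request structured output instead.

```typescript
interface CheckResult {
  pass: boolean;
  issues: string[];
}

// Interpret the model's free-text reply. Anything that does not start with PASS counts as a failure.
function parseCheckVerdict(reply: string): CheckResult {
  const trimmed = reply.trim();
  const pass = /^PASS\b/i.test(trimmed);
  if (pass) {
    return { pass: true, issues: [] };
  }
  const issues = trimmed
    .split("\n")
    .slice(1) // skip the verdict line
    .map((line) => line.replace(/^[-*\s]+/, "").trim()) // drop bullet markers
    .filter((line) => line.length > 0);
  return { pass: false, issues };
}
```

Treating an unparseable reply as FAIL is the conservative choice here: it triggers the single auto-retry rather than silently presenting a mockup that was never verified.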
397
+ **Cost:** ~$0.10-$0.40 per design session (1 hero + 2 variants + 1 quality check + 1 iteration). Negligible next to the LLM costs already in each skill invocation.
398
+
399
+ ### Auth (validated via smoke test)
400
+
401
+ **Codex OAuth tokens DO NOT work for image generation.** Tested 2026-03-26: both the Images API and Responses API reject `~/.codex/auth.json` access_token with "Missing scopes: api.model.images.request". Codex CLI also has no native imagegen capability.
402
+
403
+ **Auth resolution order:**
404
+ 1. Read `~/.gstack/openai.json` → `{ "api_key": "sk-..." }` (file permissions 0600)
405
+ 2. Fall back to `OPENAI_API_KEY` environment variable
406
+ 3. If neither exists → guided setup flow:
407
+ - Tell user: "Design mockups need an OpenAI API key with image generation permissions. Get one at platform.openai.com/api-keys"
408
+ - Prompt user to paste the key
409
+ - Write to `~/.gstack/openai.json` with 0600 permissions
410
+ - Run a smoke test (generate a 1024x1024 test image) to verify the key works
411
+ - If smoke test passes, proceed. If it fails, show the error and fall back to DESIGN_SKETCH.
412
+ 4. If auth exists but API call fails → fall back to DESIGN_SKETCH (existing HTML wireframe approach). Design mockups are a progressive enhancement, never a hard requirement.
413
+
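Steps 1 and 2 of the resolution order could be sketched as follows. `resolveApiKey` is a hypothetical name; a `null` return stands for "start the guided setup flow" (step 3). The sketch does not verify the 0600 file permissions.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Returns the API key, or null when neither source provides one.
function resolveApiKey(
  configPath: string = join(homedir(), ".gstack", "openai.json"),
  env: Record<string, string | undefined> = process.env,
): string | null {
  // 1. Config file: { "api_key": "sk-..." }
  if (existsSync(configPath)) {
    try {
      const parsed: unknown = JSON.parse(readFileSync(configPath, "utf8"));
      const key = (parsed as { api_key?: unknown }).api_key;
      if (typeof key === "string" && key.length > 0) return key;
    } catch {
      // Unreadable or malformed file: fall through to the environment variable.
    }
  }
  // 2. Environment variable
  const fromEnv = env["OPENAI_API_KEY"];
  return fromEnv ? fromEnv : null;
}
```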
414
+ **New command:** `$D setup` — guided API key setup + smoke test. Can be run anytime to update the key.
415
+
416
+ ## Assumptions to Validate in Prototype
417
+
418
+ 1. **Image quality:** "Pixel-perfect UI mockups" is aspirational. GPT Image generation may not reliably produce accurate text rendering, alignment, and spacing at true UI fidelity. The vision quality gate helps, but success criterion "good enough to implement from" needs prototype validation before full skill integration.
419
+ 2. **Multi-turn iteration:** Whether `previous_response_id` retains visual context is unproven (see API Details section).
420
+ 3. **Cost model:** Estimated $0.10-$0.40/session needs real-world validation.
421
+
422
+ **Prototype validation plan:** Build Commit 1 (core generate + check), run 10 design briefs across different screen types, evaluate output quality before proceeding to skill integration.
423
+
424
+ ## CEO Expansion Scope (accepted via /plan-ceo-review SCOPE EXPANSION)
425
+
426
+ ### 1. Design Memory + Exploration Width Control
427
+ - Auto-extract visual language from approved mockups into DESIGN.md
428
+ - If DESIGN.md exists, constrain future mockups to established design language
429
+ - If no DESIGN.md (bootstrap), explore WIDE across diverse directions
430
+ - Progressive constraint: more established design = narrower exploration band
431
+ - Comparison board gets REGENERATE section with exploration controls:
432
+ - "Something totally different" (wide exploration)
433
+ - "More like option ___" (narrow around a favorite)
434
+ - "Match my existing design" (constrain to DESIGN.md)
435
+ - Free text input for specific direction changes
436
+ - Regenerate refreshes the page, agent polls for new submission
437
+
438
+ ### 2. Mockup Diffing
439
+ - `$D diff --before old.png --after new.png` generates visual diff
440
+ - Side-by-side with changed regions highlighted
441
+ - Uses GPT-4o vision to identify differences
442
+ - Used in: /design-review, iteration feedback, PR review
443
+
444
+ ### 3. Screenshot-to-Mockup Evolution
445
+ - `$D evolve --screenshot current.png --brief "make it calmer"`
446
+ - Takes live site screenshot, generates mockup showing how it SHOULD look
447
+ - Starts from reality, not blank canvas
448
+ - Bridge between /design-review critique and visual fix proposal
449
+
450
+ ### 4. Design Intent Verification
451
+ - During /design-review, overlay approved mockup (docs/designs/) onto live screenshot
452
+ - Highlight divergence: "You designed X, you built Y, here's the gap"
453
+ - Closes the full loop: design -> implement -> verify visually
454
+ - Combines $B screenshot + $D diff + vision analysis
455
+
456
+ ### 5. Responsive Variants
457
+ - `$D variants --brief "..." --viewports desktop,tablet,mobile`
458
+ - Auto-generates mockups at multiple viewport sizes
459
+ - Comparison board shows responsive grid for simultaneous approval
460
+ - Makes responsive design a first-class concern from mockup stage
461
+
462
+ ### 6. Design-to-Code Prompt
463
+ - After comparison board approval, auto-generate structured implementation prompt
464
+ - Extracts colors, typography, layout from approved PNG via vision analysis
465
+ - Combines with DESIGN.md and HTML wireframe as structured spec
466
+ - Bridges "approved design" to "agent starts coding" with zero interpretation gap
467
+
468
+ ### Future Engines (NOT in this plan's scope)
469
+ - Magic Patterns integration (extract patterns from existing designs)
470
+ - Variant API (when they ship it, multi-variation React code + preview)
471
+ - Figma MCP (bidirectional design file access)
472
+ - Google Stitch SDK (free TypeScript alternative)
473
+
474
+ ## Open Questions
475
+
476
+ 1. When Variant ships an API, what's the integration path? (Separate engine in the design binary, or a standalone Variant binary?)
477
+ 2. How should Magic Patterns integrate? (Another engine in $D, or a separate tool?)
478
+ 3. At what point does the design binary need a plugin/engine architecture to support multiple generation backends?
479
+
480
+ ## Success Criteria
481
+
482
+ - Running `/office-hours` on a UI idea produces actual PNG mockups alongside the design doc
483
+ - Running `/plan-design-review` shows "what better looks like" as a mockup, not prose
484
+ - Mockups are good enough that a developer could implement from them
485
+ - The quality gate catches obviously broken mockups and retries
486
+ - Cost per design session stays under $0.50
487
+
488
+ ## Distribution Plan
489
+
490
+ The design binary is compiled and distributed alongside the browse binary:
491
+ - `bun build --compile design/src/cli.ts --outfile design/dist/design`
492
+ - Built during `./setup` and `bun run build`
493
+ - Symlinked via existing `~/.claude/skills/gstack/` install path
494
+
495
+ ## Next Steps (Implementation Order)
496
+
497
+ ### Commit 0: Prototype validation (MUST PASS before building infrastructure)
498
+ - Single-file prototype script (~50 lines) that sends 3 different design briefs to the GPT Image API
499
+ - Validates: text rendering quality, layout accuracy, visual coherence
500
+ - If output is "embarrassingly bad AI art" for UI mockups, STOP. Re-evaluate approach.
501
+ - This is the cheapest way to validate the core assumption before building 8 files of infrastructure.
502
+
503
+ ### Commit 1: Design binary core (generate + check + compare)
504
+ - `design/src/` with cli.ts, commands.ts, generate.ts, check.ts, brief.ts, session.ts, compare.ts
505
+ - Auth module (read ~/.gstack/openai.json, fallback to env var, guided setup flow)
506
+ - `compare` command generates HTML comparison board with per-variant feedback textareas
507
+ - `package.json` build command (separate `bun build --compile` from browse)
508
+ - `setup` script integration (including Codex + Kiro asset linking)
509
+ - Unit tests with mock OpenAI API server
510
+
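The auth lookup order described in Commit 1 (config file first, then environment variable, then guided setup) might look something like the sketch below. The `api_key` field name and the `OPENAI_API_KEY` variable name are assumptions; the plan names the file `~/.gstack/openai.json` but does not specify its contents. File reading is injected so the logic stays independent of the runtime.

```typescript
// Hypothetical shape of ~/.gstack/openai.json; the plan does not define its fields.
interface KeyFile { api_key?: string }

type KeySource = "config-file" | "env" | "none";

interface ResolvedKey { key: string | null; source: KeySource }

function resolveApiKey(
  readConfig: () => string | null, // file contents, or null if the file is missing
  env: Record<string, string | undefined>,
): ResolvedKey {
  const raw = readConfig();
  if (raw !== null) {
    try {
      const parsed = JSON.parse(raw) as KeyFile;
      if (typeof parsed.api_key === "string" && parsed.api_key.trim() !== "") {
        return { key: parsed.api_key.trim(), source: "config-file" };
      }
    } catch {
      // Malformed or unexpected JSON: fall through to the environment variable.
    }
  }
  const fromEnv = env["OPENAI_API_KEY"]; // assumed variable name
  if (fromEnv && fromEnv.trim() !== "") {
    return { key: fromEnv.trim(), source: "env" };
  }
  // Nothing found: the caller would start the guided setup flow here.
  return { key: null, source: "none" };
}
```

Returning the `source` alongside the key lets the CLI tell the user where the credential came from, which helps when debugging a stale config file.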
511
+ ### Commit 2: Variants + iterate
512
+ - `design/src/variants.ts`, `design/src/iterate.ts`
513
+ - Staggered parallel generation (1s delay between starts, exponential backoff on 429)
514
+ - Session state management for multi-turn
515
+ - Tests for iteration flow + rate limit handling
516
+
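A rough sketch of the staggered parallel generation in Commit 2, assuming a 1-second gap between starts and a doubling backoff on HTTP 429. The function names, the `status` field on thrown errors, the 30-second cap, and the retry limit are illustrative assumptions rather than details from the plan.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Assumption: the API client throws an object carrying the HTTP status code.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: number }).status === 429;
}

// Delay before retry number `retry` (0-based): 1s, 2s, 4s, ... capped at maxMs.
function backoffDelayMs(retry: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** retry, maxMs);
}

async function withBackoff<T>(
  task: () => Promise<T>,
  sleep: Sleep,
  maxRetries = 4, // assumed limit
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await task();
    } catch (err) {
      // Only rate-limit errors are retried; anything else propagates at once.
      if (!isRateLimited(err) || retry >= maxRetries) throw err;
      await sleep(backoffDelayMs(retry));
    }
  }
}

async function generateVariants<T>(
  count: number,
  generateOne: (index: number) => Promise<T>,
  sleep: Sleep,
  staggerMs = 1000,
): Promise<T[]> {
  const jobs = Array.from({ length: count }, async (_, i) => {
    await sleep(i * staggerMs); // variant i starts i * staggerMs after variant 0
    return withBackoff(() => generateOne(i), sleep);
  });
  return Promise.all(jobs); // results keep the variant order
}
```

In real use `sleep` would wrap `setTimeout`; passing it in means the rate-limit tests mentioned above can run instantly with a fake clock.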
517
+ ### Commit 3: Template integration
518
+ - Add `generateDesignSetup()` + `generateDesignMockup()` to existing `scripts/resolvers/design.ts`
519
+ - Add `designDir` to `HostPaths` in `scripts/resolvers/types.ts`
520
+ - Register DESIGN_SETUP + DESIGN_MOCKUP in `scripts/resolvers/index.ts`
521
+ - Add GSTACK_DESIGN env var export to `scripts/resolvers/preamble.ts` (Codex host)
522
+ - Update `test/gen-skill-docs.test.ts` (DESIGN_SKETCH test suite)
523
+ - Regenerate SKILL.md files
524
+
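To make the template wiring in Commit 3 concrete, here is a simplified sketch of how `{{DESIGN_SETUP}}` and `{{DESIGN_MOCKUP}}` placeholders could be expanded. The placeholder names and `designDir` come from the plan; the `HostPaths` shape, the `RESOLVERS` map, `renderTemplate`, and the resolver bodies are hypothetical stand-ins, not the contents of `scripts/resolvers/`.

```typescript
// Hypothetical, simplified stand-in for HostPaths in scripts/resolvers/types.ts.
interface HostPaths { designDir: string }

type Resolver = (paths: HostPaths) => string;

// Placeholder names are from the plan; the bodies below are illustrative only.
const RESOLVERS: Record<string, Resolver> = {
  DESIGN_SETUP: (p) => `D="${p.designDir}/design"`,
  DESIGN_MOCKUP: () => `$D variants --brief "..." --count 3 --output-dir /tmp/variants/`,
};

function renderTemplate(template: string, paths: HostPaths): string {
  // A replacer function is used so "$" in resolver output is inserted literally.
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const resolver = RESOLVERS[name];
    if (!resolver) throw new Error(`Unknown placeholder: ${match}`);
    return resolver(paths);
  });
}
```

Failing loudly on an unknown placeholder is a deliberate choice in this sketch: a typo in a `.tmpl` file then breaks the doc-generation step instead of shipping a literal `{{...}}` into a generated SKILL.md.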
525
+ ### Commit 4: /office-hours integration
526
+ - Replace Visual Sketch section with `{{DESIGN_MOCKUP}}`
527
+ - Sequential workflow: generate variants → $D compare → user feedback → DESIGN_SKETCH HTML wireframe
528
+ - Save approved mockup to docs/designs/ (only the approved one, not explorations)
529
+
530
+ ### Commit 5: /plan-design-review integration
531
+ - Add `{{DESIGN_SETUP}}` and mockup generation for low-scoring dimensions
532
+ - "What 10/10 looks like" mockup comparison
533
+
534
+ ### Commit 6: Design Memory + Exploration Width Control (CEO expansion)
535
+ - After mockup approval, extract visual language via GPT-4o vision
536
+ - Write/update DESIGN.md with extracted colors, typography, spacing, layout patterns
537
+ - If DESIGN.md exists, feed it as constraint context to all future mockup prompts
538
+ - Add REGENERATE section to comparison board HTML (chiclets + free text + refresh loop)
539
+ - Progressive constraint logic in brief construction
540
+
541
+ ### Commit 7: Mockup Diffing + Design Intent Verification (CEO expansion)
542
+ - `$D diff` command: takes two PNGs, uses GPT-4o vision to identify differences, generates overlay
543
+ - `$D verify` command: screenshots live site via $B, diffs against approved mockup from docs/designs/
544
+ - Integration into /design-review template: auto-verify when approved mockup exists
545
+
546
+ ### Commit 8: Screenshot-to-Mockup Evolution (CEO expansion)
547
+ - `$D evolve` command: takes screenshot + brief, generates "how it should look" mockup
548
+ - Sends the screenshot as a reference image to the GPT Image API
549
+ - Integration into /design-review: "Here's what the fix should look like" visual proposals
550
+
551
+ ### Commit 9: Responsive Variants + Design-to-Code Prompt (CEO expansion)
552
+ - `--viewports` flag on `$D variants` for multi-size generation
553
+ - Comparison board responsive grid layout
554
+ - Auto-generate structured implementation prompt after approval
555
+ - Vision analysis of approved PNG to extract colors, typography, layout for the prompt
556
+
557
+ ## The Assignment
558
+
559
+ Tell Variant to build an API. As their investor: "I'm building a workflow where AI agents generate visual designs programmatically. GPT Image API works today — but I'd rather use Variant because the multi-variation approach is better for design exploration. Ship an API endpoint: prompt in, React code + preview image out. I'll be your first integration partner."
560
+
561
+ ## Verification
562
+
563
+ 1. `bun run build` compiles `design/dist/design` binary
564
+ 2. `$D generate --brief "Landing page for a developer tool" --output /tmp/test.png` produces a real PNG
565
+ 3. `$D check --image /tmp/test.png --brief "Landing page"` returns PASS/FAIL
566
+ 4. `$D variants --brief "..." --count 3 --output-dir /tmp/variants/` produces 3 PNGs
567
+ 5. Running `/office-hours` on a UI idea produces mockups inline
568
+ 6. `bun test` passes (skill validation, gen-skill-docs)
569
+ 7. `bun run test:evals` passes (E2E tests)
570
+
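Verification step 2 asks for "a real PNG". One cheap way to check that, sketched here as an assumption about how a test might do it, is to compare the file's first 8 bytes with the fixed PNG signature. The helper name `looksLikePng` is hypothetical.

```typescript
// The 8-byte signature that begins every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function looksLikePng(bytes: Uint8Array): boolean {
  if (bytes.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((expected, i) => bytes[i] === expected);
}
```

This only confirms the container format, not that the image is a usable mockup; that judgment stays with `$D check`. In a test, the bytes could come from `readFileSync` in `node:fs`, whose `Buffer` result is a `Uint8Array`.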
571
+ ## What I noticed about how you think
572
+
573
+ - You said "that isn't design" about text descriptions and ASCII art. That's a designer's instinct — you know the difference between describing a thing and showing a thing. Most people building AI tools don't notice this gap because they were never designers.
574
+ - You prioritized /office-hours first — the upstream leverage point. If the brainstorm produces real mockups, every downstream skill (/plan-design-review, /design-review) has a visual artifact to reference instead of re-interpreting prose.
575
+ - You funded Variant and immediately thought "they should have an API." That's investor-as-user thinking — you're not just evaluating the company, you're designing how their product fits into your workflow.
576
+ - When Codex challenged the opt-in premise, you accepted it immediately. No ego defense. That's the fastest path to the right answer.
577
+
578
+ ## Spec Review Results
579
+
580
+ Doc survived 1 round of adversarial review; 11 issues were caught and fixed.
581
+ Quality score: 7/10 → estimated 8.5/10 after fixes.
582
+
583
+ Issues fixed:
584
+ 1. OpenAI SDK dependency declared
585
+ 2. Image data extraction path specified (response.output item shape)
586
+ 3. --check and --retry flags formally registered in command registry
587
+ 4. Brief input modes specified (plain text vs JSON file)
588
+ 5. Resolver file contradiction fixed (add to existing design.ts)
589
+ 6. HostPaths Codex env var setup noted
590
+ 7. "Mirrors browse" reframed to "shares compilation/distribution pattern"
591
+ 8. Session state specified (ID generation, discovery, cleanup)
592
+ 9. "Pixel-perfect" flagged as assumption needing prototype validation
593
+ 10. Multi-turn iteration flagged as unproven with fallback plan
594
+ 11. $D discovery bash block fully specified with fallback to DESIGN_SKETCH
595
+
596
+ ## Eng Review Completion Summary
597
+
598
+ - Step 0: Scope Challenge — scope accepted as-is (full binary, user overrode reduction recommendation)
599
+ - Architecture Review: 5 issues found (openai dep separation, graceful degrade, output dir config, auth model, trust boundary)
600
+ - Code Quality Review: 1 issue found (8 files vs 5, kept 8)
601
+ - Test Review: diagram produced, 42 gaps identified, test plan written
602
+ - Performance Review: 1 issue found (parallel variants with staggered start)
603
+ - NOT in scope: Google Stitch SDK integration, Figma MCP, Variant API (deferred)
604
+ - What already exists: browse CLI pattern, DESIGN_SKETCH resolver, HostPaths system, gen-skill-docs pipeline
605
+ - Outside voice: 4 passes (Claude structured 12 issues, Codex structured 8 issues, Claude adversarial 1 fatal flaw, Codex adversarial 1 fatal flaw). Key insight: sequential PNG→HTML workflow resolved the "opaque raster" fatal flaw.
606
+ - Failure modes: 0 critical gaps (all identified failure modes have error handling + tests planned)
607
+ - Lake Score: 7/7 recommendations chose the complete option
608
+
609
+ ## GSTACK REVIEW REPORT
610
+
611
+ | Review | Trigger | Why | Runs | Status | Findings |
612
+ |--------|---------|-----|------|--------|----------|
613
+ | Office Hours | `/office-hours` | Design brainstorm | 1 | DONE | 4 premises, 1 revised (Codex: opt-in->default-on) |
614
+ | CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | EXPANSION: 6 proposed, 6 accepted, 0 deferred |
615
+ | Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR | 7 issues, 0 critical gaps, 4 outside voices |
616
+ | Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR | score: 2/10 -> 8/10, 5 decisions made |
617
+ | Outside Voice | structured + adversarial | Independent challenge | 4 | DONE | Sequential PNG->HTML workflow, trust boundary noted |
618
+
619
+ **CEO EXPANSIONS:** Design Memory + Exploration Width, Mockup Diffing, Screenshot Evolution, Design Intent Verification, Responsive Variants, Design-to-Code Prompt.
620
+ **DESIGN DECISIONS:** Single-column full-width layout, per-card "More like this", explicit radio Pick, smooth fade regeneration, skeleton loading states.
621
+ **UNRESOLVED:** 0
622
+ **VERDICT:** CEO + ENG + DESIGN CLEARED. Ready to implement. Start with Commit 0 (prototype validation).