opengstack 0.13.4

Files changed (73)
  1. package/AGENTS.md +47 -0
  2. package/CLAUDE.md +370 -0
  3. package/LICENSE +21 -0
  4. package/README.md +80 -0
  5. package/SKILL.md +226 -0
  6. package/autoplan/SKILL.md +96 -0
  7. package/autoplan/SKILL.md.tmpl +694 -0
  8. package/benchmark/SKILL.md +358 -0
  9. package/benchmark/SKILL.md.tmpl +222 -0
  10. package/browse/SKILL.md +396 -0
  11. package/browse/SKILL.md.tmpl +131 -0
  12. package/canary/SKILL.md +89 -0
  13. package/canary/SKILL.md.tmpl +212 -0
  14. package/careful/SKILL.md +58 -0
  15. package/careful/SKILL.md.tmpl +56 -0
  16. package/codex/SKILL.md +90 -0
  17. package/codex/SKILL.md.tmpl +417 -0
  18. package/connect-chrome/SKILL.md +87 -0
  19. package/connect-chrome/SKILL.md.tmpl +195 -0
  20. package/cso/SKILL.md +93 -0
  21. package/cso/SKILL.md.tmpl +606 -0
  22. package/design-consultation/SKILL.md +94 -0
  23. package/design-consultation/SKILL.md.tmpl +415 -0
  24. package/design-review/SKILL.md +94 -0
  25. package/design-review/SKILL.md.tmpl +290 -0
  26. package/design-shotgun/SKILL.md +91 -0
  27. package/design-shotgun/SKILL.md.tmpl +285 -0
  28. package/docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md +84 -0
  29. package/docs/designs/CONDUCTOR_CHROME_SIDEBAR_INTEGRATION.md +57 -0
  30. package/docs/designs/CONDUCTOR_SESSION_API.md +108 -0
  31. package/docs/designs/DESIGN_SHOTGUN.md +451 -0
  32. package/docs/designs/DESIGN_TOOLS_V1.md +622 -0
  33. package/docs/skills.md +880 -0
  34. package/document-release/SKILL.md +91 -0
  35. package/document-release/SKILL.md.tmpl +359 -0
  36. package/freeze/SKILL.md +78 -0
  37. package/freeze/SKILL.md.tmpl +77 -0
  38. package/gstack-upgrade/SKILL.md +224 -0
  39. package/gstack-upgrade/SKILL.md.tmpl +222 -0
  40. package/guard/SKILL.md +78 -0
  41. package/guard/SKILL.md.tmpl +77 -0
  42. package/investigate/SKILL.md +105 -0
  43. package/investigate/SKILL.md.tmpl +194 -0
  44. package/land-and-deploy/SKILL.md +88 -0
  45. package/land-and-deploy/SKILL.md.tmpl +881 -0
  46. package/office-hours/SKILL.md +96 -0
  47. package/office-hours/SKILL.md.tmpl +645 -0
  48. package/package.json +43 -0
  49. package/plan-ceo-review/SKILL.md +94 -0
  50. package/plan-ceo-review/SKILL.md.tmpl +811 -0
  51. package/plan-design-review/SKILL.md +92 -0
  52. package/plan-design-review/SKILL.md.tmpl +446 -0
  53. package/plan-eng-review/SKILL.md +93 -0
  54. package/plan-eng-review/SKILL.md.tmpl +303 -0
  55. package/qa/SKILL.md +95 -0
  56. package/qa/SKILL.md.tmpl +316 -0
  57. package/qa-only/SKILL.md +89 -0
  58. package/qa-only/SKILL.md.tmpl +101 -0
  59. package/retro/SKILL.md +89 -0
  60. package/retro/SKILL.md.tmpl +820 -0
  61. package/review/SKILL.md +92 -0
  62. package/review/SKILL.md.tmpl +281 -0
  63. package/scripts/cleanup.py +100 -0
  64. package/scripts/filter-skills.sh +114 -0
  65. package/scripts/filter_skills.py +140 -0
  66. package/setup-browser-cookies/SKILL.md +216 -0
  67. package/setup-browser-cookies/SKILL.md.tmpl +81 -0
  68. package/setup-deploy/SKILL.md +92 -0
  69. package/setup-deploy/SKILL.md.tmpl +215 -0
  70. package/ship/SKILL.md +90 -0
  71. package/ship/SKILL.md.tmpl +636 -0
  72. package/unfreeze/SKILL.md +37 -0
  73. package/unfreeze/SKILL.md.tmpl +36 -0
@@ -0,0 +1,451 @@
1
+ # Design: Design Shotgun — Browser-to-Agent Feedback Loop
2
+
3
+ Generated on 2026-03-27
4
+ Branch: garrytan/agent-design-tools
5
+ Status: LIVING DOCUMENT — update as bugs are found and fixed
6
+
7
+ ## What This Feature Does
8
+
9
+ Design Shotgun generates multiple AI design mockups, opens them side-by-side in the
10
+ user's real browser as a comparison board, and collects structured feedback (pick a
11
+ favorite, rate alternatives, leave notes, request regeneration). The feedback flows
12
+ back to the coding agent, which acts on it: either proceeding with the approved
13
+ variant or generating new variants and reloading the board.
14
+
15
+ The user never leaves their browser tab. The agent never asks redundant questions.
16
+ The board is the feedback mechanism.
17
+
18
+ ## The Core Problem: Two Worlds That Must Talk
19
+
20
+ ```
21
+ ┌─────────────────────┐ ┌──────────────────────┐
22
+ │ USER'S BROWSER │ │ CODING AGENT │
23
+ │ (real Chrome) │ │ (Claude Code / │
24
+ │ │ │ Conductor) │
25
+ │ Comparison board │ │ │
26
+ │ with buttons: │ ??? │ Needs to know: │
27
+ │ - Submit │ ──────── │ - What was picked │
28
+ │ - Regenerate │ │ - Star ratings │
29
+ │ - More like this │ │ - Comments │
30
+ │ - Remix │ │ - Regen requested? │
31
+ └─────────────────────┘ └──────────────────────┘
32
+ ```
33
+
34
+ The "???" is the hard part. The user clicks a button in Chrome. The agent running in
35
+ a terminal needs to know about it. These are two completely separate processes with
36
+ no shared memory, no shared event bus, no WebSocket connection.
37
+
38
+ ## Architecture: How the Linkage Works
39
+
40
+ ```
41
+ USER'S BROWSER $D serve (Bun HTTP) AGENT
42
+ ═══════════════ ═══════════════════ ═════
43
+ │ │ │
44
+ │ GET / │ │
45
+ │ ◄─────── serves board HTML ──────►│ │
46
+ │ (with __GSTACK_SERVER_URL │ │
47
+ │ injected into <head>) │ │
48
+ │ │ │
49
+ │ [user rates, picks, comments] │ │
50
+ │ │ │
51
+ │ POST /api/feedback │ │
52
+ │ ─────── {preferred:"A",...} ─────►│ │
53
+ │ │ │
54
+ │ ◄── {received:true} ────────────│ │
55
+ │ │── writes feedback.json ──►│
56
+ │ [inputs disabled, │ (or feedback-pending │
57
+ │ "Return to agent" shown] │ .json for regen) │
58
+ │ │ │
59
+ │ │ [agent polls
60
+ │ │ every 5s,
61
+ │ │ reads file]
62
+ ```
63
+
64
+ ### The Three Files
65
+
66
+ | File | Written when | Means | Agent action |
67
+ |------|-------------|-------|-------------|
68
+ | `feedback.json` | User clicks Submit | Final selection, done | Read it, proceed |
69
+ | `feedback-pending.json` | User clicks Regenerate/More Like This | Wants new options | Read it, delete it, generate new variants, reload board |
70
+ | `feedback.json` (round 2+) | User clicks Submit after regeneration | Final selection after iteration | Read it, proceed |
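The table above can be read as a small decision function on the agent side. The following is a minimal sketch, not code from the package: `decideAgentAction` and the `AgentAction` type are hypothetical names, and giving `feedback.json` priority when both files exist is an assumption the document does not state.

```typescript
// Hypothetical agent-side helper; names and return shape are illustrative only.
type AgentAction =
  | { kind: "proceed"; file: "feedback.json" }
  | { kind: "regenerate"; file: "feedback-pending.json" }
  | { kind: "keep-polling" };

function hasFile(filesOnDisk: string[], name: string): boolean {
  return filesOnDisk.indexOf(name) !== -1;
}

function decideAgentAction(filesOnDisk: string[]): AgentAction {
  // Assumption: a final submission takes priority over a pending regeneration.
  if (hasFile(filesOnDisk, "feedback.json")) {
    return { kind: "proceed", file: "feedback.json" };
  }
  // Per the table: read it, delete it, generate new variants, reload the board.
  if (hasFile(filesOnDisk, "feedback-pending.json")) {
    return { kind: "regenerate", file: "feedback-pending.json" };
  }
  // Neither file exists yet, so the polling loop continues.
  return { kind: "keep-polling" };
}
```

The caller would pass the current directory listing on each poll and act on the returned `kind`.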
71
+
72
+ ### The State Machine
73
+
74
+ ```
75
+ $D serve starts
76
+
77
+
78
+ ┌──────────┐
79
+ │ SERVING │◄──────────────────────────────────────┐
80
+ │ │ │
81
+ │ Board is │ POST /api/feedback │
82
+ │ live, │ {regenerated: true} │
83
+ │ waiting │──────────────────►┌──────────────┐ │
84
+ │ │ │ REGENERATING │ │
85
+ │ │ │ │ │
86
+ └────┬─────┘ │ Agent has │ │
87
+ │ │ 10 min to │ │
88
+ │ POST /api/feedback │ POST new │ │
89
+ │ {regenerated: false} │ board HTML │ │
90
+ │ └──────┬───────┘ │
91
+ ▼ │ │
92
+ ┌──────────┐ POST /api/reload │
93
+ │ DONE │ {html: "/new/board"} │
94
+ │ │ │ │
95
+ │ exit 0 │ ▼ │
96
+ └──────────┘ ┌──────────────┐ │
97
+ │ RELOADING │─────┘
98
+ │ │
99
+ │ Board auto- │
100
+ │ refreshes │
101
+ │ (same tab) │
102
+ └──────────────┘
103
+ ```
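The diagram above can also be written as a pure transition function. This is a sketch under assumptions, not the logic in `serve.ts`: the event names (`feedback`, `reload`, `boardSwapped`) are hypothetical, and ignoring any transition the diagram does not draw is a modeling choice.

```typescript
// Hypothetical model of the diagram; event names are illustrative only.
type ServeState = "serving" | "regenerating" | "reloading" | "done";

type ServeEvent =
  | { type: "feedback"; regenerated: boolean } // POST /api/feedback
  | { type: "reload" }                         // POST /api/reload
  | { type: "boardSwapped" };                  // server finished swapping in the new HTML

function nextState(state: ServeState, event: ServeEvent): ServeState {
  if (state === "serving" && event.type === "feedback") {
    return event.regenerated ? "regenerating" : "done";
  }
  if (state === "regenerating" && event.type === "reload") {
    return "reloading";
  }
  if (state === "reloading" && event.type === "boardSwapped") {
    return "serving";
  }
  // Assumption: any transition not drawn in the diagram leaves the state unchanged.
  return state;
}
```

Treating `done` as terminal here is part of the model; whether the real handler rejects late requests is a separate question.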
104
+
105
+ ### Port Discovery
106
+
107
+ The agent backgrounds `$D serve` and reads stderr for the port:
108
+
109
+ ```
110
+ SERVE_STARTED: port=54321 html=/path/to/board.html
111
+ SERVE_BROWSER_OPENED: url=http://127.0.0.1:54321
112
+ ```
113
+
114
+ The agent parses `port=XXXXX` from stderr. This port is needed later to POST
115
+ `/api/reload` when the user requests regeneration. If the agent loses the port
116
+ number, it cannot reload the board.
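As an illustration of the parsing step, here is a minimal sketch that extracts the port and HTML path from a `SERVE_STARTED` line. The function name `parseServeStarted` and the exact regular expression are assumptions based only on the sample output shown above, not code from the package.

```typescript
// Hypothetical parser for the SERVE_STARTED stderr line shown above.
interface ServeStarted {
  port: number;
  html: string;
}

function parseServeStarted(stderrLine: string): ServeStarted | null {
  // Assumed format: "SERVE_STARTED: port=<digits> html=<path>"
  const match = /^SERVE_STARTED: port=(\d+) html=(.+)$/.exec(stderrLine.trim());
  if (match === null) {
    return null; // some other line, e.g. SERVE_BROWSER_OPENED
  }
  const port = parseInt(match[1], 10);
  if (port < 1 || port > 65535) {
    return null; // not a valid TCP port
  }
  return { port: port, html: match[2] };
}
```

An agent would typically run this over each stderr line until it returns a non-null result, then keep the port for the later `/api/reload` call.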
117
+
118
+ ### Why 127.0.0.1, Not localhost
119
+
120
+ `localhost` can resolve to IPv6 `::1` on some systems while Bun.serve() listens
121
+ on IPv4 only. More importantly, browsers scope cookies by host, not port, so a
121
+ request to `localhost` carries the cookies set by every other local dev server. On a
122
+ machine with many active sessions, this can exceed
123
+ Bun's default header size limit (HTTP 431 error). `127.0.0.1` avoids
124
+ both issues.
125
+
126
+ ## Every Edge Case and Pitfall
127
+
128
+ ### 1. The Zombie Form Problem
129
+
130
+ **What:** User submits feedback, the POST succeeds, the server exits. But the HTML
131
+ page is still open in Chrome. It looks interactive. The user might edit their
132
+ feedback and click Submit again. Nothing happens because the server is gone.
133
+
134
+ **Fix:** After successful POST, the board JS:
135
+ - Disables ALL inputs (buttons, radios, textareas, star ratings)
136
+ - Hides the Regenerate bar entirely
137
+ - Replaces the Submit button with: "Feedback received! Return to your coding agent."
138
+ - Shows: "Want to make more changes? Run `/design-shotgun` again."
139
+ - The page becomes a read-only record of what was submitted
140
+
141
+ **Implemented in:** `compare.ts:showPostSubmitState()` (line 484)
142
+
143
+ ### 2. The Dead Server Problem
144
+
145
+ **What:** The server times out (10 min default) or crashes while the user still has
146
+ the board open. User clicks Submit. The fetch() fails silently.
147
+
148
+ **Fix:** The `postFeedback()` function has a `.catch()` handler. On network failure:
149
+ - Shows red error banner: "Connection lost"
150
+ - Displays the collected feedback JSON in a copyable `<pre>` block
151
+ - User can copy-paste it directly into their coding agent
152
+
153
+ **Implemented in:** `compare.ts:showPostFailure()` (line 546)
154
+
155
+ ### 3. The Stale Regeneration Spinner
156
+
157
+ **What:** User clicks Regenerate. Board shows spinner and polls `/api/progress`
158
+ every 2 seconds. Agent crashes or takes too long to generate new variants. The
159
+ spinner spins forever.
160
+
161
+ **Fix:** Progress polling has a hard 5-minute timeout (150 polls x 2s interval).
162
+ After 5 minutes:
163
+ - Spinner replaced with: "Something went wrong."
164
+ - Shows: "Run `/design-shotgun` again in your coding agent."
165
+ - Polling stops. Page becomes informational.
166
+
167
+ **Implemented in:** `compare.ts:startProgressPolling()` (line 511)
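The timeout arithmetic and the per-poll decision can be sketched as follows. This is not the code in `compare.ts`: the constant and function names are hypothetical, and the only facts taken from this document are the 2-second interval, the 150-poll cap, and the `"serving"` status that triggers a reload.

```typescript
// Hypothetical sketch of the polling budget described above.
const POLL_INTERVAL_MS = 2000; // one poll every 2 seconds
const MAX_POLLS = 150;         // hard cap on the number of polls

// 150 polls * 2000 ms = 300000 ms = 5 minutes
const TOTAL_TIMEOUT_MS = POLL_INTERVAL_MS * MAX_POLLS;

type PollOutcome = "reload" | "timeout" | "continue";

// pollsSoFar counts the poll that just completed; status is the value
// returned by GET /api/progress.
function evaluatePoll(pollsSoFar: number, status: string): PollOutcome {
  if (status === "serving") {
    return "reload"; // new board is ready, refresh the page
  }
  if (pollsSoFar >= MAX_POLLS) {
    return "timeout"; // stop polling and show the error message
  }
  return "continue";
}
```

Counting polls rather than measuring wall-clock time means the real deadline stretches if the browser throttles the timer.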
168
+
169
+ ### 4. The file:// URL Problem (THE ORIGINAL BUG)
170
+
171
+ **What:** The skill template originally used `$B goto file:///path/to/board.html`.
172
+ But `browse/src/url-validation.ts:71` blocks `file://` URLs for security. The
173
+ fallback `open file://...` opens the user's macOS browser, but `$B eval` polls
174
+ Playwright's headless browser (different process, never loaded the page).
175
+ Agent polls empty DOM forever.
176
+
177
+ **Fix:** `$D serve` serves over HTTP. Never use `file://` for the board. The
178
+ `--serve` flag on `$D compare` combines board generation and HTTP serving in
179
+ one command.
180
+
181
+ **Evidence:** See `.context/attachments/image-v2.png` — a real user hit this exact
182
+ bug. The agent correctly diagnosed: (1) `$B goto` rejects `file://` URLs,
183
+ (2) no polling loop even with the browse daemon.
184
+
185
+ ### 5. The Double-Click Race
186
+
187
+ **What:** User clicks Submit twice rapidly. Two POST requests arrive at the server.
188
+ First one sets state to "done" and schedules exit(0) in 100ms. Second one arrives
189
+ during that 100ms window.
190
+
191
+ **Current state:** NOT fully guarded. The `handleFeedback()` function doesn't check
192
+ if state is already "done" before processing. The second POST would succeed and
193
+ write a second `feedback.json` (harmless, same data). The exit still fires after
194
+ 100ms.
195
+
196
+ **Risk:** Low. The board disables all inputs on the first successful POST response,
197
+ so a second click would need to arrive within ~1ms. And both writes would contain
198
+ the same feedback data.
199
+
200
+ **Potential fix:** Add `if (state === 'done') return Response.json({error: 'already submitted'}, {status: 409})` at the top of `handleFeedback()`.
201
+
202
+ ### 6. The Port Coordination Problem
203
+
204
+ **What:** Agent backgrounds `$D serve` and parses `port=54321` from stderr. Agent
205
+ needs this port later to POST `/api/reload` during regeneration. If the agent
206
+ loses context (conversation compresses, context window fills up), it may not
207
+ remember the port.
208
+
209
+ **Current state:** The port is printed to stderr once. The agent must remember it.
210
+ There is no port file written to disk.
211
+
212
+ **Potential fix:** Write a `serve.pid` or `serve.port` file next to the board HTML
213
+ on startup. Agent can read it anytime:
214
+ ```bash
215
+ cat "$_DESIGN_DIR/serve.port" # → 54321
216
+ ```
217
+
218
+ ### 7. The Feedback File Cleanup Problem
219
+
220
+ **What:** `feedback-pending.json` from a regeneration round is left on disk. If the
221
+ agent crashes before reading it, the next `$D serve` session finds a stale file.
222
+
223
+ **Current state:** The polling loop in the resolver template says to delete
224
+ `feedback-pending.json` after reading it. But this depends on the agent following
225
+ instructions perfectly. Stale files could confuse a new session.
226
+
227
+ **Potential fix:** `$D serve` could check for and delete stale feedback files on
228
+ startup. Or: name files with timestamps (`feedback-pending-1711555200.json`).
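Both options can be sketched as pure helpers. These are illustrations, not code from `serve.ts`: `selectStaleFeedbackFiles` and `pendingFileName` are hypothetical names, and the filename pattern is an assumption derived from the examples in this section.

```typescript
// Hypothetical helpers for the two cleanup options described above.

// Option 1: pick out leftover feedback files from a directory listing so the
// server can delete them before it starts serving.
function selectStaleFeedbackFiles(dirEntries: string[]): string[] {
  // Matches feedback.json, feedback-pending.json, feedback-pending-<digits>.json
  const pattern = /^feedback(-pending)?(-\d+)?\.json$/;
  return dirEntries.filter(function (name) {
    return pattern.test(name);
  });
}

// Option 2: build a timestamped name (epoch seconds) so files from different
// sessions cannot collide.
function pendingFileName(nowMs: number): string {
  return "feedback-pending-" + Math.floor(nowMs / 1000) + ".json";
}
```

With timestamped names, the agent's polling loop would also need to know which name to look for, which is a trade-off against simply cleaning on startup.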
229
+
230
+ ### 8. Sequential Generate Rule
231
+
232
+ **What:** The underlying OpenAI GPT Image API rate-limits concurrent image generation
233
+ requests. When 3 `$D generate` calls run in parallel, 1 succeeds and 2 get aborted.
234
+
235
+ **Fix:** The skill template must explicitly say: "Generate mockups ONE AT A TIME.
236
+ Do not parallelize `$D generate` calls." This is a prompt-level instruction, not
237
+ a code-level lock. The design binary does not enforce sequential execution.
238
+
239
+ **Risk:** Agents are trained to parallelize independent work. Without an explicit
240
+ instruction, they will try to run 3 generates simultaneously. This wastes API calls
241
+ and money.
242
+
243
+ ### 9. The AskUserQuestion Redundancy
244
+
245
+ **What:** After the user submits feedback via the board (with preferred variant,
246
+ ratings, comments all in the JSON), the agent asks them again: "Which variant do
247
+ you prefer?" This is annoying. The whole point of the board is to avoid this.
248
+
249
+ **Fix:** The skill template must say: "Do NOT use AskUserQuestion to ask the user's
250
+ preference. Read `feedback.json`; it contains their selection. Use AskUserQuestion only
251
+ to confirm you understood correctly, not to re-ask."
252
+
253
+ ### 10. The CORS Problem
254
+
255
+ **What:** If the board HTML references external resources (fonts, images from CDN),
256
+ the browser sends CORS-mode requests (web fonts, for example) with `Origin: http://127.0.0.1:PORT`. Most CDNs allow
257
+ this, but some might block it.
258
+
259
+ **Current state:** The server does not set CORS headers. The board HTML is
260
+ self-contained (images base64-encoded, styles inline), so this hasn't been an
261
+ issue in practice.
262
+
263
+ **Risk:** Low for current design. Would matter if the board loaded external
264
+ resources.
265
+
266
+ ### 11. The Large Payload Problem
267
+
268
+ **What:** No size limit on POST bodies to `/api/feedback`. If the board somehow
269
+ sends a multi-MB payload, `req.json()` will parse it all into memory.
270
+
271
+ **Current state:** In practice, feedback JSON is ~500 bytes to ~2KB. The risk is
272
+ theoretical, not practical. The board JS constructs a fixed-shape JSON object.
273
+
274
+ ### 12. The fs.writeFileSync Error
275
+
276
+ **What:** The `feedback.json` write at `serve.ts:138` uses `fs.writeFileSync()` with no
277
+ try/catch. If the disk is full or the directory is read-only, this throws and
278
+ crashes the server. The user sees a spinner forever (server is dead, but board
279
+ doesn't know).
280
+
281
+ **Risk:** Low in practice (the board HTML was just written to the same directory,
282
+ proving it's writable). But a try/catch with a 500 response would be cleaner.
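A guarded write could look like the sketch below. It is not the code in `serve.ts`: `writeFeedbackSafely`, `WriteFn`, and the response shape are hypothetical, and the writer is passed in so the example runs without touching the disk. In the real server the writer would presumably be `fs.writeFileSync`.

```typescript
// Hypothetical wrapper showing the try/catch-with-500 idea described above.
type WriteFn = (path: string, data: string) => void;

interface WriteResult {
  status: number;
  body: { received?: boolean; error?: string };
}

function writeFeedbackSafely(write: WriteFn, path: string, feedback: unknown): WriteResult {
  try {
    write(path, JSON.stringify(feedback, null, 2));
    return { status: 200, body: { received: true } };
  } catch (err) {
    // Disk full, read-only directory, and similar failures land here instead
    // of crashing the server process.
    const message = err instanceof Error ? err.message : String(err);
    return { status: 500, body: { error: "could not write feedback: " + message } };
  }
}
```

Returning a 500 lets the board show an error instead of leaving the user waiting on a dead server.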
283
+
284
+ ## The Complete Flow (Step by Step)
285
+
286
+ ### Happy Path: User Picks on First Try
287
+
288
+ ```
289
+ 1. Agent runs: $D compare --images "A.png,B.png,C.png" --output board.html --serve &
290
+ 2. $D serve starts Bun.serve() on random port (e.g. 54321)
291
+ 3. $D serve opens http://127.0.0.1:54321 in user's browser
292
+ 4. $D serve prints to stderr: SERVE_STARTED: port=54321 html=/path/board.html
293
+ 5. $D serve writes board HTML with injected __GSTACK_SERVER_URL
294
+ 6. User sees comparison board with 3 variants side by side
295
+ 7. User picks Option B, rates A: 3/5, B: 5/5, C: 2/5
296
+ 8. User writes "B has better spacing, go with that" in overall feedback
297
+ 9. User clicks Submit
298
+ 10. Board JS POSTs to http://127.0.0.1:54321/api/feedback
299
+ Body: {"preferred":"B","ratings":{"A":3,"B":5,"C":2},"overall":"B has better spacing, go with that","regenerated":false}
300
+ 11. Server writes feedback.json to disk (next to board.html)
301
+ 12. Server prints feedback JSON to stdout
302
+ 13. Server responds {received:true, action:"submitted"}
303
+ 14. Board disables all inputs, shows "Return to your coding agent"
304
+ 15. Server exits with code 0 after 100ms
305
+ 16. Agent's polling loop finds feedback.json
306
+ 17. Agent reads it, summarizes to user, proceeds
307
+ ```
308
+
309
+ ### Regeneration Path: User Wants Different Options
310
+
311
+ ```
312
+ 1-6. Same as above
313
+ 7. User clicks "Totally different" chiclet
314
+ 8. User clicks Regenerate
315
+ 9. Board JS POSTs to /api/feedback
316
+ Body: {"regenerated":true,"regenerateAction":"different","preferred":"","ratings":{},...}
317
+ 10. Server writes feedback-pending.json to disk
318
+ 11. Server state → "regenerating"
319
+ 12. Server responds {received:true, action:"regenerate"}
320
+ 13. Board shows spinner: "Generating new designs..."
321
+ 14. Board starts polling GET /api/progress every 2s
322
+
323
+ Meanwhile, in the agent:
324
+ 15. Agent's polling loop finds feedback-pending.json
325
+ 16. Agent reads it, deletes it
326
+ 17. Agent runs: $D variants --brief "totally different direction" --count 3
327
+ (ONE AT A TIME, not parallel)
328
+ 18. Agent runs: $D compare --images "new-A.png,new-B.png,new-C.png" --output board-v2.html
329
+ 19. Agent POSTs: curl -X POST http://127.0.0.1:54321/api/reload -d '{"html":"/path/board-v2.html"}'
330
+ 20. Server swaps htmlContent to new board
331
+ 21. Server state → "serving" (from reloading)
332
+ 22. Board's next /api/progress poll returns {"status":"serving"}
333
+ 23. Board auto-refreshes: window.location.reload()
334
+ 24. User sees new board with 3 fresh variants
335
+ 25. User picks one, clicks Submit → happy path from step 10
336
+ ```
337
+
338
+ ### "More Like This" Path
339
+
340
+ ```
341
+ Same as regeneration, except:
342
+ - regenerateAction is "more_like_B" (references the variant)
343
+ - Agent uses $D iterate --image B.png --brief "more like this, keep the spacing"
344
+ instead of $D variants
345
+ ```
346
+
347
+ ### Fallback Path: $D serve Fails
348
+
349
+ ```
350
+ 1. Agent tries $D compare --serve, it fails (binary missing, port error, etc.)
351
+ 2. Agent falls back to: open file:///path/board.html
352
+ 3. Agent uses AskUserQuestion: "I've opened the design board. Which variant
353
+ do you prefer? Any feedback?"
354
+ 4. User responds in text
355
+ 5. Agent proceeds with text feedback (no structured JSON)
356
+ ```
357
+
358
+ ## Files That Implement This
359
+
360
+ | File | Role |
361
+ |------|------|
362
+ | `design/src/serve.ts` | HTTP server, state machine, file writing, browser launch |
363
+ | `design/src/compare.ts` | Board HTML generation, JS for ratings/picks/regen, POST logic, post-submit lifecycle |
364
+ | `design/src/cli.ts` | CLI entry point, wires `serve` and `compare --serve` commands |
365
+ | `design/src/commands.ts` | Command registry, defines `serve` and `compare` with their args |
366
+ | `scripts/resolvers/design.ts` | `generateDesignShotgunLoop()` — template resolver that outputs the polling loop and reload instructions |
367
+ | `design-shotgun/SKILL.md.tmpl` | Skill template that orchestrates the full flow: context gathering, variant generation, `{{DESIGN_SHOTGUN_LOOP}}`, feedback confirmation |
368
+ | `design/test/serve.test.ts` | Unit tests for HTTP endpoints and state transitions |
369
+ | `design/test/feedback-roundtrip.test.ts` | E2E test: browser click → JS fetch → HTTP POST → file on disk |
370
+ | `browse/test/compare-board.test.ts` | DOM-level tests for the comparison board UI |
371
+
372
+ ## What Could Still Go Wrong
373
+
374
+ ### Known Risks (ordered by likelihood)
375
+
376
+ 1. **Agent doesn't follow sequential generate rule** — most LLMs want to parallelize. Without enforcement in the binary, this is a prompt-level instruction that can be ignored.
377
+
378
+ 2. **Agent loses port number** — context compression drops the stderr output. Agent can't reload the board. Mitigation: write port to a file.
379
+
380
+ 3. **Stale feedback files** — leftover `feedback-pending.json` from a crashed session confuses the next run. Mitigation: clean on startup.
381
+
382
+ 4. **fs.writeFileSync crash** — no try/catch on the feedback file write. Silent server death if disk is full. User sees infinite spinner.
383
+
384
+ 5. **Progress polling drift** — `setInterval(fn, 2000)` over 5 minutes. In practice, JavaScript timers are accurate enough. But if the browser tab is backgrounded, Chrome may throttle intervals to once per minute.
385
+
386
+ ### Things That Work Well
387
+
388
+ 1. **Dual-channel feedback** — stdout for foreground mode, files for background mode. Both always active. Agent can use whichever works.
389
+
390
+ 2. **Self-contained HTML** — board has all CSS, JS, and base64-encoded images inline. No external dependencies. Works offline.
391
+
392
+ 3. **Same-tab regeneration** — user stays in one tab. Board auto-refreshes via `/api/progress` polling + `window.location.reload()`. No tab explosion.
393
+
394
+ 4. **Graceful degradation** — POST failure shows copyable JSON. Progress timeout shows clear error message. No silent failures.
395
+
396
+ 5. **Post-submit lifecycle** — board becomes read-only after submit. No zombie forms. Clear "what to do next" message.
397
+
398
+ ## Test Coverage
399
+
400
+ ### What's Tested
401
+
402
+ | Flow | Test | File |
403
+ |------|------|------|
404
+ | Submit → feedback.json on disk | browser click → file | `feedback-roundtrip.test.ts` |
405
+ | Post-submit UI lockdown | inputs disabled, success shown | `feedback-roundtrip.test.ts` |
406
+ | Regenerate → feedback-pending.json | chiclet + regen click → file | `feedback-roundtrip.test.ts` |
407
+ | "More like this" → specific action | more_like_B in JSON | `feedback-roundtrip.test.ts` |
408
+ | Spinner after regenerate | DOM shows loading text | `feedback-roundtrip.test.ts` |
409
+ | Full regen → reload → submit | 2-round trip | `feedback-roundtrip.test.ts` |
410
+ | Server starts on random port | port 0 binding | `serve.test.ts` |
411
+ | HTML injection of server URL | __GSTACK_SERVER_URL check | `serve.test.ts` |
412
+ | Invalid JSON rejection | 400 response | `serve.test.ts` |
413
+ | HTML file validation | exit 1 if missing | `serve.test.ts` |
414
+ | Timeout behavior | exit 1 after timeout | `serve.test.ts` |
415
+ | Board DOM structure | radios, stars, chiclets | `compare-board.test.ts` |
416
+
417
+ ### What's NOT Tested
418
+
419
+ | Gap | Risk | Priority |
420
+ |-----|------|----------|
421
+ | Double-click submit race | Low — inputs disable on first response | P3 |
422
+ | Progress polling timeout (150 iterations) | Medium — 5 min is long to wait in a test | P2 |
423
+ | Server crash during regeneration | Medium — user sees infinite spinner | P2 |
424
+ | Network timeout during POST | Low — localhost is fast | P3 |
425
+ | Backgrounded Chrome tab throttling intervals | Medium — could extend 5-min timeout to 30+ min | P2 |
426
+ | Large feedback payload | Low — board constructs fixed-shape JSON | P3 |
427
+ | Concurrent sessions (two boards, one server) | Low — each $D serve gets its own port | P3 |
428
+ | Stale feedback file from prior session | Medium — could confuse new polling loop | P2 |
429
+
430
+ ## Potential Improvements
431
+
432
+ ### Short-term (this branch)
433
+
434
+ 1. **Write port to file** — `serve.ts` writes `serve.port` to disk on startup. Agent reads it anytime. 5 lines.
435
+ 2. **Clean stale files on startup** — `serve.ts` deletes `feedback*.json` before starting. 3 lines.
436
+ 3. **Guard double-click** — check `state === 'done'` at top of `handleFeedback()`. 2 lines.
437
+ 4. **try/catch file write** — wrap `fs.writeFileSync` in try/catch, return 500 on failure. 5 lines.
438
+
439
+ ### Medium-term (follow-up)
440
+
441
+ 5. **WebSocket instead of polling** — replace `setInterval` + `GET /api/progress` with a WebSocket connection. Board gets instant notification when new HTML is ready. Eliminates polling drift and backgrounded-tab throttling. ~50 lines in serve.ts + ~20 lines in compare.ts.
442
+
443
+ 6. **Port file for agent** — write `{"port": 54321, "pid": 12345, "html": "/path/board.html"}` to `$_DESIGN_DIR/serve.json`. Agent reads this instead of parsing stderr. Makes the system more robust to context loss.
444
+
445
+ 7. **Feedback schema validation** — validate the POST body against a JSON schema before writing. Catch malformed feedback early instead of confusing the agent downstream.
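Item 7 could start as a hand-written check before adopting a full JSON Schema library. The sketch below is an assumption-heavy illustration: `validateFeedback` is a hypothetical name, only the fields that appear in this document's example bodies are checked, and the 1 to 5 rating range is inferred from the "3/5" style ratings rather than stated anywhere.

```typescript
// Hypothetical validator for the feedback body; field rules are assumptions.
function validateFeedback(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }
  const errors: string[] = [];
  const fields = body as { [key: string]: unknown };

  if (typeof fields["preferred"] !== "string") {
    errors.push("preferred must be a string");
  }
  if (typeof fields["regenerated"] !== "boolean") {
    errors.push("regenerated must be a boolean");
  }

  const ratings = fields["ratings"];
  if (typeof ratings !== "object" || ratings === null || Array.isArray(ratings)) {
    errors.push("ratings must be an object");
  } else {
    const byVariant = ratings as { [key: string]: unknown };
    for (const variant of Object.keys(byVariant)) {
      const value = byVariant[variant];
      // Assumed range: star ratings from 1 to 5.
      if (typeof value !== "number" || value < 1 || value > 5) {
        errors.push("ratings." + variant + " must be a number from 1 to 5");
      }
    }
  }
  return errors; // an empty array means the body passed every check
}
```

A server could reject the request with a 400 and the error list whenever the returned array is non-empty.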
446
+
447
+ ### Long-term (design direction)
448
+
449
+ 8. **Persistent design server** — instead of launching `$D serve` per session, run a long-lived design daemon (like the browse daemon). Multiple boards share one server. Eliminates cold start. But adds daemon lifecycle management complexity.
450
+
451
+ 9. **Real-time collaboration** — two agents (or one agent + one human) working on the same board simultaneously. Server broadcasts state changes via WebSocket. Requires conflict resolution on feedback.