@tekyzinc/gsd-t 2.31.17 → 2.33.11

package/CHANGELOG.md CHANGED
@@ -2,6 +2,43 @@
2
2
 
3
3
  All notable changes to GSD-T are documented here. Updated with each release.
4
4
 
5
+ ## [2.33.11] - 2026-03-05
6
+
7
+ ### Added
8
+ - `.gitignore` excludes `.claude/worktrees/` (Claude Code internal) and `nul` (Windows artifact)
9
+ - `ai-evals-analysis.md`, `gsd-t-command-doc-matrix.csv` — development reference documents
10
+ - `scripts/gsd-t-dashboard-mockup.html` — interactive mockup from M15 brainstorm (historical reference)
11
+ - `.gsd-t/brainstorm-2026-02-18.md` — brainstorm notes from Feb 18 ideation session
12
+
13
+ ## [2.33.10] - 2026-03-04
14
+
15
+ ### Added
16
+ - **Milestone 15: Real-Time Agent Dashboard** — Zero-dependency live browser dashboard for GSD-T execution:
17
+ - **`scripts/gsd-t-dashboard-server.js`** (141 lines): Node.js HTTP+SSE server (zero external deps). Watches `.gsd-t/events/*.jsonl`, streams up to 500 existing events on connect, tails for new events, keepalive every 15s. Runs detached with PID file. All functions exported for testability (23 unit tests in `test/dashboard-server.test.js`).
18
+ - **`scripts/gsd-t-dashboard.html`** (194 lines): Browser dashboard using React 17 + React Flow v11.11.4 + Dagre via CDN (no build step, no npm deps). Dark theme. Renders agent hierarchy as directed graph from `parent_agent_id` relationships. Live event feed (max 200 events, outcome color-coded: green=success, red=failure, yellow=learning). Auto-reconnects on disconnect.
19
+ - **`commands/gsd-t-visualize`**: 48th GSD-T command. Starts server via `--detach`, polls `/ping` up to 5s, opens browser cross-platform (win32/darwin/linux). Accepts `stop` argument. Includes Step 0 self-spawn with OBSERVABILITY LOGGING.
20
+ - Both `gsd-t-dashboard-server.js` and `gsd-t-dashboard.html` automatically installed to `~/.claude/scripts/` during `npx @tekyzinc/gsd-t install/update`
21
+ - 23 new tests in `test/dashboard-server.test.js` — total: 176/176 passing
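
The streaming behaviour described for the dashboard server can be illustrated with the Server-Sent Events wire format. This is a minimal sketch, not code from `gsd-t-dashboard-server.js`: the helper names (`toSseFrame`, `replayFrames`) are hypothetical, and keeping the most recent events when more than 500 exist is an assumption, since the changelog only says "up to 500".

```javascript
// Hypothetical helpers illustrating the SSE wire format; not the package's code.
const MAX_REPLAY = 500;                     // replay cap quoted in the changelog
const KEEPALIVE_MS = 15000;                 // keepalive interval quoted in the changelog
const KEEPALIVE_FRAME = ": keepalive\n\n";  // SSE comment line; clients ignore it

// Wrap one JSONL line (already a JSON string) as a single SSE message.
function toSseFrame(jsonLine) {
  return `data: ${jsonLine}\n\n`;
}

// Turn the text of an events file into the frames replayed on connect.
// Assumption: when more than MAX_REPLAY events exist, the most recent are kept.
function replayFrames(jsonlText) {
  const lines = jsonlText.split("\n").filter((line) => line.trim() !== "");
  return lines.slice(-MAX_REPLAY).map(toSseFrame);
}
```

A server built this way would write `replayFrames(...)` to each new connection and then call something like `setInterval(() => res.write(KEEPALIVE_FRAME), KEEPALIVE_MS)` so idle connections are not dropped.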
22
+
23
+ ### Changed
24
+ - Total command count: 47 → **48** (44 GSD-T workflow + 4 utility)
25
+
26
+ ## [2.32.10] - 2026-03-04
27
+
28
+ ### Added
29
+ - **Milestone 14: Execution Intelligence Layer** — Structured observability, learning, and reflection:
30
+ - **`scripts/gsd-t-event-writer.js`**: New zero-dependency CLI + module.exports. Writes structured JSONL events to `.gsd-t/events/YYYY-MM-DD.jsonl`. Validates 8 event_type values and 5 outcome values. Symlink-safe. Resolves events dir from `GSD_T_PROJECT_DIR` or cwd. 26 new tests.
31
+ - **Heartbeat enrichment**: `scripts/gsd-t-heartbeat.js` maps `SubagentStart`/`SubagentStop`/`PostToolUse` hook events to the events/ schema, appending them to daily JSONL files alongside existing heartbeat writes.
32
+ - **Outcome-tagged Decision Log**: `execute`, `debug`, and `wave` now prefix all new Decision Log entries with `[success]`, `[failure]`, `[learning]`, or `[deferred]`.
33
+ - **Pre-task experience retrieval (Reflexion pattern)**: `execute` and `debug` grep the Decision Log for `[failure]`/`[learning]` entries matching the current domain before spawning subagents. Relevant past failures prepended as `⚠️ Past Failures` block in subagent prompt.
34
+ - **Phase transition events**: `wave` writes `phase_transition` event with outcome:success/failure at each phase boundary.
35
+ - **Distillation step** (Step 2.5 in `complete-milestone`): Scans event stream for patterns seen ≥3 times, proposes CLAUDE.md / constraints.md rule additions, requires user confirmation before any write.
36
+ - **`commands/gsd-t-reflect`** (134 lines, 47th command): On-demand retrospective from event stream. Generates `.gsd-t/retrospectives/YYYY-MM-DD-{milestone}.md` with What Worked / What Failed / Patterns Found / Proposed Memory Updates. Includes Step 0 self-spawn with OBSERVABILITY LOGGING.
37
+ - `gsd-t-event-writer.js` installed to `~/.claude/scripts/` during install/update
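
As a rough illustration of the event writer's validate-then-append behaviour, the sketch below builds one JSONL line and the daily file path. It is an assumption-laden sketch, not the script itself: `buildEvent` and the `ts` field are hypothetical, only the three event types named elsewhere in this diff are listed (the changelog says eight exist), the five outcomes are inferred from the tags and the `--outcome null` calls shown in the command files, and dating the file in UTC is a guess.

```javascript
// Assumed values: the changelog does not list all 8 event types or the 5 outcomes.
const KNOWN_EVENT_TYPES = ["phase_transition", "experience_retrieval", "distillation"];
const OUTCOMES = ["success", "failure", "learning", "deferred", null];

// Validate an event and return the target file plus the JSONL line to append.
// Note: the CLI receives "null" as a string; this sketch takes the JS value null.
function buildEvent({ type, command, reasoning, outcome }, now = new Date()) {
  if (!KNOWN_EVENT_TYPES.includes(type)) {
    throw new Error(`unknown event_type: ${type}`);
  }
  if (!OUTCOMES.includes(outcome)) {
    throw new Error(`unknown outcome: ${outcome}`);
  }
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC (assumption)
  const record = { ts: now.toISOString(), event_type: type, command, reasoning, outcome };
  return { file: `.gsd-t/events/${day}.jsonl`, line: JSON.stringify(record) };
}
```

The caller would append `line` plus a newline to `file`; keeping validation separate from file I/O is what makes a function like this easy to unit test.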
38
+
39
+ ### Changed
40
+ - Total command count: 46 → **47** (43 GSD-T workflow + 4 utility)
41
+
5
42
  ## [2.28.10] - 2026-02-18
6
43
 
7
44
  ### Added
package/README.md CHANGED
@@ -7,6 +7,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
7
7
  **Maintains test coverage** — automatically keeps tests aligned with code changes.
8
8
  **Catches downstream effects** — analyzes impact before changes break things.
9
9
  **Protects existing work** — destructive action guard prevents schema drops, architecture replacements, and data loss without explicit approval.
10
+ **Visualizes execution in real time** — live browser dashboard renders agent hierarchy, tool activity, and phase progression from the event stream.
10
11
 
11
12
  ---
12
13
 
@@ -18,7 +19,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
18
19
  npx @tekyzinc/gsd-t install
19
20
  ```
20
21
 
21
- This installs 42 GSD-T commands + 4 utility commands (46 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
22
+ This installs 44 GSD-T commands + 4 utility commands (48 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
22
23
 
23
24
  ### Start Using It
24
25
 
@@ -144,6 +145,8 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
144
145
  | `/user:gsd-t-status` | Cross-domain progress view | Manual |
145
146
  | `/user:gsd-t-resume` | Restore context, continue | Manual |
146
147
  | `/user:gsd-t-quick` | Fast task with GSD-T guarantees | Manual |
148
+ | `/user:gsd-t-reflect` | Generate retrospective from event stream, propose memory updates | Manual |
149
+ | `/user:gsd-t-visualize` | Launch browser dashboard — SSE server + React Flow agent visualization | Manual |
147
150
  | `/user:gsd-t-debug` | Systematic debugging with state | Manual |
148
151
  | `/user:gsd-t-health` | Validate .gsd-t/ structure, optionally repair | Manual |
149
152
  | `/user:gsd-t-pause` | Save exact position for reliable resume | Manual |
@@ -230,6 +233,8 @@ your-project/
230
233
  │ │ ├── scope.md
231
234
  │ │ ├── tasks.md
232
235
  │ │ └── constraints.md
236
+ │ ├── events/ # Execution event stream (JSONL, daily-rotated)
237
+ │ ├── retrospectives/ # Retrospective reports from gsd-t-reflect
233
238
  │ ├── milestones/ # Archived completed milestones
234
239
  │ │ └── {milestone-name}-{date}/
235
240
  │ └── scan/ # Codebase analysis outputs
@@ -247,6 +252,7 @@ your-project/
247
252
  5. **State survives sessions.** Everything is in `.gsd-t/`.
248
253
  6. **Plan is single-brain, execute is multi-brain.** Planning and integration always solo; execution and verification can parallelize.
249
254
  7. **Every decision is logged.** The Decision Log captures why, not just what.
255
+ 8. **Agents learn from experience.** Every command invocation, phase transition, and subagent spawn is captured as a structured event. Past failures surface before each task (Reflexion pattern). Distillation converts repeated patterns into lasting CLAUDE.md rules.
250
256
 
251
257
  ---
252
258
 
@@ -298,8 +304,8 @@ get-stuff-done-teams/
298
304
  ├── LICENSE
299
305
  ├── bin/
300
306
  │ └── gsd-t.js # CLI installer
301
- ├── commands/ # 46 slash commands
302
- │ ├── gsd-t-*.md # 42 GSD-T workflow commands
307
+ ├── commands/ # 48 slash commands
308
+ │ ├── gsd-t-*.md # 44 GSD-T workflow commands
303
309
  │ ├── gsd.md # GSD-T smart router
304
310
  │ ├── branch.md # Git branch helper
305
311
  │ ├── checkin.md # Auto-version + commit/push helper
@@ -314,6 +320,12 @@ get-stuff-done-teams/
314
320
  │ ├── progress.md
315
321
  │ ├── backlog.md
316
322
  │ └── backlog-settings.md
323
+ ├── scripts/ # Runtime utility scripts (installed to ~/.claude/scripts/)
324
+ │ ├── gsd-t-tools.js # State CLI (get/set/validate/list)
325
+ │ ├── gsd-t-statusline.js # Context usage bar
326
+ │ ├── gsd-t-event-writer.js # Structured JSONL event writer
327
+ │ ├── gsd-t-dashboard-server.js # Zero-dep SSE server for dashboard
328
+ │ └── gsd-t-dashboard.html # React Flow + Dagre real-time dashboard
317
329
  ├── examples/
318
330
  │ ├── settings.json
319
331
  │ └── .gsd-t/
package/bin/gsd-t.js CHANGED
@@ -518,7 +518,7 @@ function configureAutoRouteHook(scriptPath) {
518
518
 
519
519
  // ─── Utility Scripts ─────────────────────────────────────────────────────────
520
520
 
521
- const UTILITY_SCRIPTS = ["gsd-t-tools.js", "gsd-t-statusline.js"];
521
+ const UTILITY_SCRIPTS = ["gsd-t-tools.js", "gsd-t-statusline.js", "gsd-t-event-writer.js", "gsd-t-dashboard-server.js", "gsd-t-dashboard.html"];
522
522
 
523
523
  function installUtilityScripts() {
524
524
  ensureDir(SCRIPTS_DIR);
@@ -84,26 +84,42 @@ If mode is unclear, ask: "What kind of thinking would be most useful right now
84
84
  4. When energy shifts to a new idea, follow it
85
85
  5. Periodically collect the best ideas into a running list
86
86
 
87
- ### Team Mode (if enabled and user requests):
87
+ ### Deep Research Phase (MANDATORY, always runs before Step 5):
88
+
89
+ Before drawing any conclusions or presenting final insights, spawn a team of parallel research agents. **This is not optional.** No brainstorm session may land (Step 5) until this research phase is complete. The purpose is to ensure conclusions are grounded in evidence — not just intuition — so the brainstorm surfaces the genuinely best path forward and avoids going down the wrong path.
88
90
 
89
91
  **OBSERVABILITY LOGGING (MANDATORY):**
90
92
  Before spawning the team — run via Bash:
91
93
  `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
92
94
 
93
95
  ```
94
- Create an agent team for brainstorming:
95
-
96
- - Teammate "visionary": Push boundaries. What's the most ambitious
97
- version? What would make this remarkable? Think in terms of
98
- possibilities, not constraints.
99
-
100
- - Teammate "pragmatist": Keep it real. What can we actually build?
101
- What's the shortest path to value? Where are the hidden costs?
102
-
103
- - Teammate "devil's advocate": Challenge everything. Why might this
104
- fail? What are we not seeing? Which assumptions are weakest?
105
-
106
- Lead: Synthesize the best insights from all three perspectives.
96
+ Spawn a deep research team (run all three in parallel):
97
+
98
+ - Teammate "researcher-landscape": Search external sources, docs, and
99
+ prior art. What solutions already exist for this problem or idea?
100
+ What have others tried? What are the known pitfalls? What does the
101
+ current state of the art look like? Produce a research brief with
102
+ concrete findings and citations.
103
+
104
+ - Teammate "researcher-alternatives": Enumerate 3–5 fundamentally
105
+ different technical or architectural approaches to this problem.
106
+ For each: what are the trade-offs, risks, costs, and prerequisites?
107
+ Which is most promising and why? Consider approaches that might
108
+ require a completely different direction from the current thinking.
109
+
110
+ - Teammate "researcher-analogies": Look outside the immediate domain.
111
+ How have adjacent industries, other products, or different technical
112
+ domains solved similar problems? Find non-obvious analogies and
113
+ extract transferable insights that the team may not have considered.
114
+
115
+ Lead: Wait for all three researchers to report before proceeding.
116
+ Then synthesize:
117
+ 1. What did we learn that changes or validates the initial thinking?
118
+ 2. Which ideas from the brainstorm are supported by research findings?
119
+ 3. Which ideas should be reconsidered or ruled out based on evidence?
120
+ 4. What is the most promising path forward, and what is the evidence for it?
121
+
122
+ Do NOT proceed to Step 5 until this synthesis is complete.
107
123
  ```
108
124
 
109
125
  After team completes — run via Bash:
@@ -112,7 +128,7 @@ Compute tokens and compaction:
112
128
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
113
129
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
114
130
  Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
115
- `| {DT_START} | {DT_END} | gsd-t-brainstorm | Step 3 | sonnet | {DURATION}s | team brainstorm: {topic summary} | {TOKENS} | {COMPACTED} |`
131
+ `| {DT_START} | {DT_END} | gsd-t-brainstorm | Step 3 | sonnet | {DURATION}s | deep research: {topic summary} | {TOKENS} | {COMPACTED} |`
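
The token and compaction arithmetic above can be sketched as a small pure function. This is only an illustration of the two documented cases; `computeTokens` and its parameter names are hypothetical and do not appear in the package.

```javascript
// Hypothetical helper mirroring the documented token arithmetic.
// tokStart / tokEnd: context tokens used before and after the step.
// tokMax: context window size; dtEnd: end timestamp recorded when compaction occurred.
function computeTokens(tokStart, tokEnd, tokMax, dtEnd) {
  if (tokEnd >= tokStart) {
    // No compaction: the counter only grew during the step.
    return { tokens: tokEnd - tokStart, compacted: null };
  }
  // Compaction detected: the counter dropped, so count up to the max
  // and then add the reading taken after the reset.
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: dtEnd };
}

console.log(computeTokens(50000, 80000, 200000, "2026-03-04 10:15"));
console.log(computeTokens(180000, 30000, 200000, "2026-03-04 10:15"));
```

The second call models a step that started at 180,000 tokens and ended at 30,000 after a compaction, which the formula counts as 20,000 + 30,000 tokens.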
116
132
 
117
133
  ## Step 4: Capture the Sparks
118
134
 
@@ -17,6 +17,30 @@ If status is not VERIFIED:
17
17
 
18
18
  If `--force` flag provided, proceed with warning in archive.
19
19
 
20
+ ## Step 1.5: Smoke Test Artifact Gate (MANDATORY — Categories 2 and 7)
21
+
22
+ Before archiving, verify that high-risk features have testable artifacts. This gate catches what code review and unit tests cannot.
23
+
24
+ **Scan this milestone's domains for any of the following:**
25
+ - Audio capture/playback, speech recognition/synthesis
26
+ - GPU/WebGPU/WebGL compute or rendering
27
+ - ML inference, model loading, quantized model execution
28
+ - Background workers, service workers, IPC channels
29
+ - Native APIs (camera, bluetooth, filesystem, microphone)
30
+ - WebAssembly modules
31
+ - Any feature whose only prior "test" was manual user interaction
32
+
33
+ **For each high-risk feature found:**
34
+
35
+ 1. Check that a smoke test script exists (in `scripts/`, `tests/`, or `.gsd-t/smoke-tests/`)
36
+ 2. Check that the script was run and passed (evidence in token-log.md, CI output, or a `.gsd-t/smoke-tests/{feature}.md` file with run results)
37
+ 3. If manual steps remain unavoidable: `.gsd-t/smoke-tests/{feature}.md` must exist documenting exact steps and confirming they passed
38
+
39
+ **If any high-risk feature lacks a smoke test artifact → BLOCK completion.**
40
+ Do not proceed to archiving. Create the smoke test now, run it, confirm it passes, then continue.
41
+
42
+ > This gate exists because complete-milestone is the last opportunity to catch "shipped blind" features before they become user-facing bugs requiring 15 debug sessions to resolve.
43
+
20
44
  ## Step 2: Gap Analysis Gate
21
45
 
22
46
  After verification passes, run a gap analysis against `docs/requirements.md` scoped to this milestone's deliverables:
@@ -33,6 +57,27 @@ After verification passes, run a gap analysis against `docs/requirements.md` sco
33
57
 
34
58
  This is a **mandatory gate** — the milestone cannot be archived with known gaps against its requirements.
35
59
 
60
+ ## Step 2.5: Distillation — Extract Milestone Patterns
61
+
62
+ Before archiving, extract learning from the event stream to improve future runs.
63
+
64
+ 1. Check if `.gsd-t/events/` exists and has any `.jsonl` files for this milestone period
65
+ - If no events files found: skip distillation (log "No events recorded — distillation skipped"), continue to Step 3
66
+ - If event-writer not installed (`node ~/.claude/scripts/gsd-t-event-writer.js 2>/dev/null || true`): skip gracefully
67
+
68
+ 2. Parse events: scan `.gsd-t/events/*.jsonl` for events with `"outcome":"failure"` or `"outcome":"learning"`
69
+
70
+ 3. Group by `reasoning` field value — count occurrences of each distinct reasoning string
71
+
72
+ 4. For each group with ≥ 3 occurrences:
73
+ - Formulate a concrete rule (e.g., "Always read X before modifying Y — failed 4 times without this")
74
+ - Present to user: "Pattern found {N} times: {reasoning}. Proposed rule: '{rule}'. Add to CLAUDE.md? [y/n]"
75
+ - **Wait for user confirmation before writing** (Destructive Action Guard — CLAUDE.md changes require approval)
76
+ - If approved: append the rule to CLAUDE.md under the relevant section
77
+ - Write event: `node ~/.claude/scripts/gsd-t-event-writer.js --type distillation --command gsd-t-complete-milestone --reasoning "{rule}" --outcome success || true`
78
+
79
+ 5. If no patterns found (fewer than 3 occurrences): log "Distillation complete — no repeating patterns found", continue to Step 3
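
Steps 2 to 4 of the distillation above amount to a filter, a group-by, and a threshold. A minimal sketch follows, assuming each JSONL line is an object with `outcome` and `reasoning` fields as the text describes; the function name `findRepeatedPatterns` is hypothetical, and skipping malformed lines is a design choice of this sketch rather than documented behaviour.

```javascript
// Hypothetical sketch: find reasoning strings that recur among failure/learning events.
function findRepeatedPatterns(jsonlText, minCount = 3) {
  const counts = new Map();
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than abort the scan
    }
    if (event.outcome !== "failure" && event.outcome !== "learning") continue;
    if (typeof event.reasoning !== "string") continue;
    counts.set(event.reasoning, (counts.get(event.reasoning) || 0) + 1);
  }
  // Keep only groups that meet the threshold (3 or more by default).
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .map(([reasoning, count]) => ({ reasoning, count }));
}
```

Each returned entry would then be turned into a proposed rule and shown to the user for confirmation before anything is written to CLAUDE.md.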
80
+
36
81
  ## Step 3: Gather Milestone Artifacts
37
82
 
38
83
  Collect all files related to this milestone:
@@ -144,7 +189,8 @@ None — ready for next milestone
144
189
  | {previous} | {version} | {date} | v{version} |
145
190
 
146
191
  ## Decision Log
147
- {Keep the decision log: it's valuable context}
192
+ - {date}: [success] Milestone "{name}" completed: {summary of what was built}. v{version}
193
+ {Keep all prior decision log entries — they are valuable context}
148
194
  ```
149
195
 
150
196
  ## Step 8: Update README.md
@@ -41,6 +41,107 @@ Read:
41
41
  3. `.gsd-t/contracts/` — all contracts
42
42
  4. `.gsd-t/domains/*/scope.md` — domain boundaries
43
43
 
44
+ ## Step 1.5: Debug Loop Detection (MANDATORY)
45
+
46
+ Before attempting any fix, check whether this issue has been through multiple failed debug sessions. This prevents the 10–20 attempt death spiral that happens when the same approach is retried repeatedly.
47
+
48
+ **Detection:**
49
+ 1. Scan `.gsd-t/progress.md` Decision Log for `[debug]` entries related to this issue (match by keyword, error name, or component)
50
+ 2. Count distinct debug sessions that attempted to fix this issue
51
+ 3. Check `.gsd-t/deferred-items.md` for any entries matching this issue
52
+
53
+ **If 3 or more prior sessions found → Enter Deep Research Mode (below). Do NOT attempt another fix with the same approach.**
54
+
55
+ **If fewer than 3 sessions → Proceed to Step 2 normally.**
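
The detection rule above can be sketched as a line scan over the Decision Log. This is an illustrative sketch only: the function names are hypothetical, keyword matching is a plain case-insensitive substring test, and treating each `[debug]` entry as one session is an assumption.

```javascript
// Hypothetical sketch: count [debug] Decision Log entries that mention the issue.
// Assumption: each debug session writes exactly one [debug] entry.
function countPriorDebugSessions(progressText, keywords) {
  const needles = keywords.map((k) => k.toLowerCase());
  return progressText.split("\n").filter((line) => {
    const lower = line.toLowerCase();
    return lower.includes("[debug]") && needles.some((k) => lower.includes(k));
  }).length;
}

// Three or more prior sessions means another same-approach fix should not be attempted.
function shouldEnterDeepResearch(progressText, keywords, threshold = 3) {
  return countPriorDebugSessions(progressText, keywords) >= threshold;
}
```

A caller would pass the text of `.gsd-t/progress.md` and a few terms drawn from the issue description, such as an error name or component name.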
56
+
57
+ ---
58
+
59
+ ### Deep Research Mode (triggered when debug loop detected)
60
+
61
+ The current approach has failed 3+ times. This means the root cause is not yet understood. A different strategy — possibly a fundamentally different technical approach — is required.
62
+
63
+ **OBSERVABILITY LOGGING (MANDATORY):**
64
+ Before spawning — run via Bash:
65
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
66
+
67
+ ```
68
+ Spawn a deep research team (run all three in parallel):
69
+
70
+ - Teammate "researcher-root-cause": Take the broadest possible look at
71
+ the problem. Ignore prior fix attempts. Read the full component,
72
+ its dependencies, contracts, and all error traces from scratch.
73
+ What is the actual root cause — not the symptom? Consider that the
74
+ real issue may be architectural, not in the code being patched.
75
+
76
+ - Teammate "researcher-alternatives": Enumerate 3–5 fundamentally
77
+ different ways to solve this problem. Include approaches that would
78
+ require refactoring or changing the technical direction entirely.
79
+ For each: what are the trade-offs, effort, and risk?
80
+
81
+ - Teammate "researcher-prior-art": Search external sources, docs,
82
+ GitHub issues, and known patterns for this class of bug. Has this
83
+ problem been documented elsewhere? What did others find? Are there
84
+ framework-specific pitfalls or known workarounds?
85
+
86
+ Lead: Wait for all three researchers to complete. Then synthesize:
87
+ 1. What is the true root cause based on full investigation?
88
+ 2. What are the viable solution paths (ranked by confidence)?
89
+ 3. Does any path require a different technical approach than what has been tried?
90
+ 4. What is the recommended path and why?
91
+ ```
92
+
93
+ After team completes — run via Bash:
94
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
95
+ Compute tokens and compaction:
96
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
97
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
98
+ Append to `.gsd-t/token-log.md`:
99
+ `| {DT_START} | {DT_END} | gsd-t-debug | Step 1.5 | sonnet | {DURATION}s | deep research loop break: {issue summary} | {TOKENS} | {COMPACTED} |`
100
+
101
+ **STOP. Present findings to the user before making any changes:**
102
+
103
+ ```
104
+ ## Debug Loop Break — Research Findings
105
+
106
+ **Issue**: {issue summary}
107
+ **Prior sessions**: {count} failed attempts
108
+
109
+ **Root Cause (revised)**: {finding from researcher-root-cause}
110
+
111
+ **Solution Options**:
112
+ | # | Approach | Effort | Risk | Notes |
113
+ |---|----------|--------|------|-------|
114
+ | 1 | {option} | {effort} | {risk} | {notes} |
115
+ | 2 | {option} | {effort} | {risk} | {notes} |
116
+ | 3 | {option} | {effort} | {risk} | {notes} |
117
+
118
+ **Recommendation**: {recommended option and rationale}
119
+
120
+ **Does this require a different technical direction?** {Yes/No — explain}
121
+
122
+ Please select an option (or provide your own direction) before I proceed.
123
+ ```
124
+
125
+ **Wait for explicit user selection/approval.** Do NOT proceed with any fix until the user confirms the chosen approach. If the recommendation requires refactoring or changing technical direction, the Destructive Action Guard applies — present the full migration path and wait for approval.
126
+
127
+ ---
128
+
129
+ ## Step 1.7: Experience Retrieval
130
+
131
+ Before proceeding to classification and fix, retrieve relevant past failures from the Decision Log.
132
+
133
+ Run via Bash:
134
+ `grep -i "\[failure\]\|\[learning\]" .gsd-t/progress.md | tail -10`
135
+
136
+ If results found:
137
+ - Display a `## ⚠️ Relevant Past Failures` block showing matching entries (max 5 lines)
138
+ - Pass this block as context to any debug subagent spawned in Step 3
139
+ - Write event via Bash: `node ~/.claude/scripts/gsd-t-event-writer.js --type experience_retrieval --command gsd-t-debug --reasoning "{N entries found}" --outcome null || true`
140
+
141
+ If no results found: proceed normally to Step 2.
142
+
143
+ ---
144
+
44
145
  ## Step 2: Classify the Bug
45
146
 
46
147
  Based on the user's description ($ARGUMENTS), determine:
@@ -71,6 +172,27 @@ The contract didn't specify something it should have. Symptoms:
71
172
 
72
173
  → Update the contract, then fix implementations on both sides.
73
174
 
175
+ ## Step 2.5: Reproduce First (MANDATORY — Category 5)
176
+
177
+ **A fix attempt without a reproduction script is a guess, not a fix.**
178
+
179
+ Before touching any code:
180
+
181
+ 1. **Write a reproduction script** that demonstrates the bug. Automate as much as possible:
182
+ - Unit/integration bug → write a failing test that proves the bug exists
183
+ - UI/audio/GPU/worker bug (not fully automatable) → write the closest possible script: a headless probe, a log-based trigger, a mock that replicates the failure path. Document the manual remainder explicitly.
184
+ - If you cannot write any form of reproduction → you do not yet understand the bug. Keep investigating until you can.
185
+
186
+ 2. **Run the reproduction** and confirm it fails before attempting any fix.
187
+
188
+ 3. **Never close a debug session with "ready for testing."** A session closes only when the reproduction script passes. If manual steps remain, document them explicitly and confirm they passed.
189
+
190
+ 4. **Log the reproduction script path** in `.gsd-t/progress.md` Decision Log: what it tests, how to run it, what passing looks like.
191
+
192
+ > This rule exists because code review cannot detect silent runtime failures (GPU compute shaders, audio context state, worker message drops). Only execution proves correctness.
193
+
194
+ ---
195
+
74
196
  ## Step 3: Debug (Solo or Team)
75
197
 
76
198
  ### Deviation Rules
@@ -81,16 +203,17 @@ When you encounter unexpected situations during the fix:
81
203
  3. **Blocker (missing file, wrong API response)** → Fix blocker and continue. Log if non-trivial.
82
204
  4. **Architectural change required to fix correctly** → STOP. Explain what exists, what needs to change, what breaks, and a migration path. Wait for user approval. Never self-approve.
83
205
 
84
- **3-attempt limit**: If your fix doesn't work after 3 attempts, log to `.gsd-t/deferred-items.md` and stop trying.
206
+ **3-attempt limit**: If your fix doesn't work after 3 attempts within this session, treat it as a loop. Do NOT keep trying the same approach. Log the attempt to `.gsd-t/progress.md` Decision Log with a `[failure]` prefix, then return to Step 1.5 and run Deep Research Mode before any further attempts. Present findings and options to the user before proceeding.
85
207
 
86
208
  ### Solo Mode
87
- 1. Reproduce the issue
209
+ 1. Reproduce the issue — **reproduction script must exist before step 2** (see Step 2.5)
88
210
  2. Trace through the relevant domain(s)
89
211
  3. Check contract compliance at each boundary
90
212
  4. Identify root cause
91
213
  5. **Destructive Action Guard**: If the fix requires destructive or structural changes (dropping tables, removing columns, changing schema, replacing architecture patterns, removing working modules) → STOP and present the change to the user with what exists, what will change, what will break, and a safe migration path. Wait for explicit approval.
92
214
  6. Fix and test — **adapt the fix to existing structures**, not the other way around
93
215
  7. Update contracts if needed
216
+ 8. **Category 6 — Bug Isolation Check**: After applying the fix, run the FULL test suite and all smoke tests — not just the reproduction script. Do not assume the bug was isolated. A fix that resolves one failure frequently uncovers adjacent failures. Every test must pass before the session closes.
94
217
 
95
218
  ### Team Mode (for complex cross-domain bugs)
96
219
  ```
@@ -110,7 +233,11 @@ First to find root cause: message the lead with findings.
110
233
  After fixing, assess what documentation was affected by the change and update ALL relevant files:
111
234
 
112
235
  ### Always check:
113
- 1. **`.gsd-t/progress.md`** — Add to Decision Log: what broke, why, and the fix
236
+ 1. **`.gsd-t/progress.md`** — Add to Decision Log: what broke, why, and the fix. Prefix the entry with an outcome tag:
237
+ - Debug session start → prefix `[debug]`
238
+ - Fix succeeded → prefix `[success]`
239
+ - Fix failed → prefix `[failure]`
240
+ - Issue deferred → prefix `[deferred]`
114
241
  2. **`.gsd-t/contracts/`** — Update any contract if the fix changed an interface, schema, or API shape
115
242
  3. **Domain `constraints.md`** — Add a "must not" rule if the bug was caused by a pattern that should be avoided
116
243
 
@@ -70,6 +70,16 @@ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime
70
70
 
71
71
  **For each domain (in wave order), spawn:**
72
72
 
73
+ **Pre-task experience retrieval (before spawning each domain subagent):**
74
+ Run via Bash:
75
+ `grep -i "\[failure\]\|\[learning\]" .gsd-t/progress.md | grep -i "{domain-name}" | tail -5`
76
+
77
+ If results found:
78
+ - Prepend a `## ⚠️ Past Failures (retrieve before acting)` block to the subagent prompt (max 5 lines from results)
79
+ - Write event via Bash: `node ~/.claude/scripts/gsd-t-event-writer.js --type experience_retrieval --command gsd-t-execute --reasoning "{N past failures found for {domain-name}}" --outcome null || true`
80
+
81
+ If no results found: proceed normally (no warning block, no event write).
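
The `grep | grep | tail -5` pipeline above is equivalent to two filters and a slice. The sketch below shows that logic under stated assumptions: `retrievePastFailures` is a hypothetical name, and the domain is matched as a literal substring, whereas `grep` would interpret it as a pattern.

```javascript
// Hypothetical sketch of the retrieval filter: tagged lines that mention the domain,
// limited to the last few matches (the same effect as `tail -5`).
function retrievePastFailures(progressText, domainName, maxLines = 5) {
  const tagged = /\[(failure|learning)\]/i;
  const domain = domainName.toLowerCase();
  const matches = progressText
    .split("\n")
    .filter((line) => tagged.test(line) && line.toLowerCase().includes(domain));
  return matches.slice(-maxLines);
}
```

An empty result corresponds to the "no results found" branch: no warning block is prepended and no event is written.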
82
+
73
83
  ```
74
84
  Task subagent (general-purpose, model: sonnet, mode: bypassPermissions):
75
85
  "You are executing all tasks for the {domain-name} domain.
@@ -98,7 +108,11 @@ Execute each incomplete task in order:
98
108
  7. Run ALL tests — unit, integration, Playwright. Fix failures (up to 2 attempts)
99
109
  8. Run Pre-Commit Gate checklist from CLAUDE.md — update all affected docs BEFORE committing
100
110
  9. Commit immediately: feat({domain-name}/task-{N}): {description}
101
- 10. Update .gsd-t/progress.md — mark task complete
111
+ 10. Update .gsd-t/progress.md — mark task complete; prefix the Decision Log entry with an outcome tag based on how the task completed:
112
+ - Task completed successfully on first attempt → prefix `[success]`
113
+ - Task completed after a fix (required debugging or correction) → prefix `[learning]`
114
+ - Task deferred to .gsd-t/deferred-items.md → prefix `[deferred]`
115
+ - Task failed after 3 attempts → prefix `[failure]`
102
116
  11. Spawn QA subagent (model: haiku) after each task:
103
117
  'Run the full test suite. Read .gsd-t/contracts/ for definitions.
104
118
  Report: pass/fail counts and coverage gaps.'
@@ -51,6 +51,8 @@ UTILITIES Manual
51
51
  status Cross-domain progress view
52
52
  resume Restore context after break
53
53
  quick Fast task with GSD-T guarantees
54
+ reflect Generate retrospective from event stream, propose memory updates
55
+ visualize Launch browser dashboard (SSE server + React Flow)
54
56
  debug Systematic debugging with state
55
57
  health Validate .gsd-t/ structure, optionally repair missing files
56
58
  pause Save exact position for reliable resume later
@@ -299,6 +301,20 @@ Use these when user asks for help on a specific command:
299
301
  - **Creates**: Quick task record
300
302
  - **Use when**: Small tasks that don't need full planning
301
303
 
304
+ ### reflect
305
+ - **Summary**: Generate a structured retrospective from the event stream for the current milestone, then propose CLAUDE.md/constraints.md rule additions based on recurring patterns
306
+ - **Auto-invoked**: No
307
+ - **Reads**: `.gsd-t/events/*.jsonl`, `.gsd-t/progress.md`, `CLAUDE.md`
308
+ - **Creates**: `.gsd-t/retrospectives/YYYY-MM-DD-{milestone}.md`
309
+ - **Use when**: After completing a milestone or mid-milestone to surface what's working, what's failing, and what patterns should become permanent rules
310
+
311
+ ### visualize
312
+ - **Summary**: Launch the real-time agent dashboard — starts the SSE server (if not running) and opens the React Flow visualization in a browser
313
+ - **Auto-invoked**: No
314
+ - **Reads**: `.gsd-t/dashboard.pid`, `.gsd-t/events/*.jsonl` (via server)
315
+ - **Creates**: `.gsd-t/dashboard.pid` (when starting server)
316
+ - **Use when**: Monitoring live agent activity during execute/wave phases; run `gsd-t-visualize stop` to stop the server
317
+
302
318
  ### debug
303
319
  - **Summary**: Systematic debugging with persistent state
304
320
  - **Auto-invoked**: No
@@ -54,6 +54,7 @@ Skip the copy (step 2) silently if the target already exists.
54
54
  │ └── .gitkeep
55
55
  ├── domains/
56
56
  │ └── .gitkeep
57
+ ├── events/
57
58
  ├── backlog.md
58
59
  ├── backlog-settings.md
59
60
  ├── progress.md
@@ -61,6 +62,8 @@ Skip the copy (step 2) silently if the target already exists.
61
62
  └── qa-issues.md
62
63
  ```
63
64
 
65
+ Create `.gsd-t/events/` directory (empty — populated at runtime by heartbeat and event writer).
66
+
64
67
  Create `token-log.md` with header row:
65
68
  ```
66
69
  | Date | Command | Step | Model | Duration(s) | Notes |
@@ -17,6 +17,63 @@ If `.gsd-t/` doesn't exist, create the full directory structure:
17
17
  └── progress.md
18
18
  ```
19
19
 
20
+ ## Step 1.5: Assumption Audit (MANDATORY — complete before domain work begins)
21
+
22
+ Before partitioning, surface and lock down all assumptions baked into the requirements. Unexamined assumptions become architectural decisions no one approved.
23
+
24
+ Work through each category below. For every match found, write the explicit disposition into the affected domain's `constraints.md` and into the Decision Log in `.gsd-t/progress.md`.
25
+
26
+ ---
27
+
28
+ ### Category 1: External Reference Assumptions
29
+
30
+ Scan requirements for any external project, file, component, library, or URL mentioned by name or path. For each one found, explicitly confirm which disposition applies — and lock it in the contract before any domain touches it:
31
+
32
+ | Disposition | Meaning |
33
+ |-------------|---------|
34
+ | `USE` | Import and depend on it — treat as a dependency |
35
+ | `INSPECT` | Read source for patterns only — do not import or copy code |
36
+ | `BUILD` | Build equivalent functionality from scratch — do not read or use it |
37
+
38
+ **No external reference survives partition without a locked disposition.**
39
+
40
+ Trigger phrases to watch for: "reference X", "like X", "similar to Y", "see W for how it handles Z", any file path or project name, any URL.
41
+
42
+ > If Level 3 (Full Auto): state the inferred disposition and reason; lock it unless it's ambiguous.
43
+ > If ambiguous (e.g., "reference X" could mean USE or INSPECT): pause and ask the user before proceeding.
44
+
45
+ ---
46
+
47
+ ### Category 3: Black Box Assumptions
48
+
49
+ Any component, module, or library **not written in this milestone** that a domain will call, import, or depend on → the agent that executes that domain must read its source before treating it as correct. This includes internal project modules written in a previous milestone.
50
+
51
+ For each such component identified:
52
+ 1. Name it explicitly in the domain's `constraints.md` under a `## Must Read Before Using` section
53
+ 2. List the specific functions or behaviors the domain depends on
54
+ 3. The execute agent is prohibited from treating it as a black box — it must read the listed items before implementing
55
+
56
+ ---
57
+
58
+ ### Category 4: User Intent Assumptions
59
+
60
+ Scan requirements for ambiguous language. Flag every instance where intent could be interpreted more than one way. Common patterns:
61
+
62
+ - "like X" / "similar to Y" — does this mean the same UX, the same architecture, or just the same concept?
63
+ - "the way X handles it" — inspiration, direct port, or behavioral equivalent?
64
+ - "reference Z" — does this mean read it, use it, or replicate it?
65
+ - "build something that does W" — from scratch, or using an existing library?
66
+ - Any requirement where a reasonable developer could make two different implementation choices
67
+
68
+ For each ambiguous item:
69
+ 1. State the two (or more) possible interpretations explicitly
70
+ 2. State which interpretation you are locking in and why
71
+ 3. If genuinely unclear: pause and ask the user — do not infer and proceed
72
+
73
+ > **Rule**: Ambiguous intent that reaches execute unresolved becomes a wrong assumption. Resolve it here or pay for it in debug sessions.
74
+
75
+ ---
76
+
20
77
  ## Step 2: Identify Domains
21
78
 
22
79
  Decompose the milestone into 2-5 independent domains. Each domain should: