@tekyzinc/gsd-t 2.74.13 → 3.10.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. package/CHANGELOG.md +165 -0
  2. package/README.md +117 -1
  3. package/bin/advisor-integration.js +93 -0
  4. package/bin/check-headless-sessions.js +140 -0
  5. package/bin/context-meter-config.cjs +101 -0
  6. package/bin/context-meter-config.test.cjs +101 -0
  7. package/bin/gsd-t-unattended-platform.js +381 -0
  8. package/bin/gsd-t-unattended-safety.js +766 -0
  9. package/bin/gsd-t-unattended.js +1259 -0
  10. package/bin/gsd-t.js +723 -19
  11. package/bin/handoff-lock.js +249 -0
  12. package/bin/headless-auto-spawn.js +328 -0
  13. package/bin/model-selector.js +224 -0
  14. package/bin/runway-estimator.js +242 -0
  15. package/bin/token-budget.js +96 -89
  16. package/bin/token-optimizer.js +471 -0
  17. package/bin/token-telemetry.js +246 -0
  18. package/commands/gsd-t-audit.md +3 -3
  19. package/commands/gsd-t-backlog-list.md +38 -0
  20. package/commands/gsd-t-brainstorm.md +3 -3
  21. package/commands/gsd-t-complete-milestone.md +24 -0
  22. package/commands/gsd-t-debug.md +124 -7
  23. package/commands/gsd-t-discuss.md +10 -3
  24. package/commands/gsd-t-doc-ripple.md +32 -4
  25. package/commands/gsd-t-execute.md +107 -52
  26. package/commands/gsd-t-help.md +22 -0
  27. package/commands/gsd-t-integrate.md +67 -4
  28. package/commands/gsd-t-optimization-apply.md +91 -0
  29. package/commands/gsd-t-optimization-reject.md +94 -0
  30. package/commands/gsd-t-partition.md +7 -0
  31. package/commands/gsd-t-pause.md +3 -0
  32. package/commands/gsd-t-plan.md +10 -3
  33. package/commands/gsd-t-prd.md +3 -3
  34. package/commands/gsd-t-quick.md +71 -9
  35. package/commands/gsd-t-reflect.md +3 -7
  36. package/commands/gsd-t-resume.md +86 -1
  37. package/commands/gsd-t-status.md +31 -0
  38. package/commands/gsd-t-test-sync.md +7 -0
  39. package/commands/gsd-t-unattended-stop.md +83 -0
  40. package/commands/gsd-t-unattended-watch.md +290 -0
  41. package/commands/gsd-t-unattended.md +414 -0
  42. package/commands/gsd-t-verify.md +12 -5
  43. package/commands/gsd-t-visualize.md +3 -7
  44. package/commands/gsd-t-wave.md +82 -18
  45. package/docs/GSD-T-README.md +69 -0
  46. package/docs/architecture.md +176 -4
  47. package/docs/infrastructure.md +221 -0
  48. package/docs/methodology.md +44 -0
  49. package/docs/prd-harness-evolution.md +51 -37
  50. package/docs/requirements.md +95 -0
  51. package/docs/unattended-windows-caveats.md +245 -0
  52. package/package.json +2 -2
  53. package/scripts/context-meter/count-tokens-client.js +221 -0
  54. package/scripts/context-meter/count-tokens-client.test.js +308 -0
  55. package/scripts/context-meter/test-injector.js +55 -0
  56. package/scripts/context-meter/threshold.js +88 -0
  57. package/scripts/context-meter/threshold.test.js +255 -0
  58. package/scripts/context-meter/transcript-parser.js +252 -0
  59. package/scripts/context-meter/transcript-parser.test.js +320 -0
  60. package/scripts/gsd-t-context-meter.e2e.test.js +415 -0
  61. package/scripts/gsd-t-context-meter.js +350 -0
  62. package/scripts/gsd-t-context-meter.test.js +417 -0
  63. package/scripts/gsd-t-heartbeat.js +2 -2
  64. package/scripts/gsd-t-statusline.js +23 -8
  65. package/templates/CLAUDE-global.md +17 -1
  66. package/templates/CLAUDE-project.md +26 -6
  67. package/templates/context-meter-config.json +10 -0
  68. package/templates/prompts/README.md +1 -1
  69. package/bin/task-counter.cjs +0 -161
@@ -195,6 +195,227 @@ gsd-t-verify:
195
195
  ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
196
196
  ```
197
197
 
198
+ ## Context Meter Setup (M34, v2.75.10+)
199
+
200
+ The Context Meter is a PostToolUse hook that measures real context consumption via the Anthropic `count_tokens` API. It is **required** for the `gsd-t-execute`, `gsd-t-wave`, `gsd-t-quick`, `gsd-t-integrate`, and `gsd-t-debug` session-stop gates to work.
201
+
202
+ **API key — required**
203
+
204
+ ```bash
205
+ # Shell profile (~/.zshrc, ~/.bashrc, ~/.config/fish/config.fish)
206
+ export ANTHROPIC_API_KEY="sk-ant-..."
207
+
208
+ # Verify
209
+ echo $ANTHROPIC_API_KEY | head -c 10
210
+ ```
211
+
212
+ Get a key at [https://console.anthropic.com](https://console.anthropic.com). Free tier is sufficient — `count_tokens` is billed per call at negligible cost.
213
+
214
+ **CI/CD**: set `ANTHROPIC_API_KEY` as a secret (`secrets.ANTHROPIC_API_KEY` in GitHub Actions, masked variables in GitLab CI). The existing verify workflow example above already threads the secret through.
215
+
216
+ **Per-project `.env.local` (optional)**: drop `ANTHROPIC_API_KEY=sk-ant-...` into the project's ignored env file and source it in your shell profile if you need per-project keys.
217
+
218
+ **Verify with doctor**:
219
+
220
+ ```bash
221
+ npx @tekyzinc/gsd-t doctor
222
+ ```
223
+
224
+ Expected GREEN output:
225
+
226
+ ```
227
+ Context Meter
228
+ ✅ API key set (ANTHROPIC_API_KEY)
229
+ ✅ PostToolUse hook registered in ~/.claude/settings.json
230
+ ✅ scripts/gsd-t-context-meter.js exists
231
+ ✅ .gsd-t/context-meter-config.json loads cleanly
232
+ ✅ count_tokens dry-run: 7 tokens
233
+ ```
234
+
235
+ If any check is RED, doctor exits with code 1.
236
+
237
+ **Config file** — `.gsd-t/context-meter-config.json`:
238
+
239
+ ```json
240
+ {
241
+ "enabled": true,
242
+ "apiKeyEnvVar": "ANTHROPIC_API_KEY",
243
+ "modelWindowSize": 200000,
244
+ "thresholdPct": 85,
245
+ "checkFrequency": 1
246
+ }
247
+ ```
248
+
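The config above can be read as a set of defaults plus a simple percentage calculation. The following is a minimal sketch of that idea, not the package's actual loader: `resolveConfig` and `pctUsed` are hypothetical names, and the validation rules are assumptions.

```javascript
// Hypothetical helpers; the real logic lives in the package and may differ.
// Defaults mirror the example config shown above.
const DEFAULT_CONFIG = {
  enabled: true,
  apiKeyEnvVar: "ANTHROPIC_API_KEY",
  modelWindowSize: 200000,
  thresholdPct: 85,
  checkFrequency: 1,
};

// Merge user overrides over the defaults and reject out-of-range values.
function resolveConfig(overrides = {}) {
  const config = { ...DEFAULT_CONFIG, ...overrides };
  if (!(config.modelWindowSize > 0)) {
    throw new RangeError("modelWindowSize must be a positive number");
  }
  if (!(config.thresholdPct > 0 && config.thresholdPct <= 100)) {
    throw new RangeError("thresholdPct must be in (0, 100]");
  }
  return config;
}

// Convert a measured input_tokens count into a percentage of the window.
function pctUsed(inputTokens, config) {
  return (inputTokens * 100) / config.modelWindowSize;
}

const config = resolveConfig({ thresholdPct: 80 });
console.log(pctUsed(150000, config)); // 75
console.log(pctUsed(150000, config) >= config.thresholdPct); // false
```

With the default `modelWindowSize` of 200000, a `thresholdPct` of 85 corresponds to 170000 input tokens.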
249
+ **Threshold bands** (M35 v3.0.0 — three bands, lower-bound inclusive):
250
+
251
+ | Band | Range | Orchestrator action |
252
+ |--------|----------|----------------------------------------------------------------------------------|
253
+ | normal | 0–69% | Proceed |
254
+ | warn | 70–84% | Log warning; cue for explicit pause/resume at the next clean boundary |
255
+ | stop | ≥85% | Halt cleanly with resume instruction; command refuses to start if the projected runway crosses 85% |
256
+
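The band table can be sketched as a small classifier. The constant names `WARN_THRESHOLD_PCT` and `STOP_THRESHOLD_PCT` and the `{band, pct, message}` response shape come from the M35 notes in this diff; the function name `bandFor` and the message strings are assumptions made for illustration.

```javascript
const WARN_THRESHOLD_PCT = 70;
const STOP_THRESHOLD_PCT = 85;

// Classify a context percentage into one of the three bands.
// Lower bounds are inclusive: exactly 70 is "warn", exactly 85 is "stop".
function bandFor(pct) {
  if (pct >= STOP_THRESHOLD_PCT) {
    return { band: "stop", pct, message: "halt cleanly and resume in a fresh session" };
  }
  if (pct >= WARN_THRESHOLD_PCT) {
    return { band: "warn", pct, message: "pause at the next clean boundary" };
  }
  return { band: "normal", pct, message: "proceed" };
}

console.log(bandFor(69.9).band); // normal
console.log(bandFor(70).band);   // warn
console.log(bandFor(85).band);   // stop
```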
257
+ **Zero silent quality degradation.** There is no `downgrade` band and no `conserve` band. Models are **never** swapped at runtime under context pressure — model choice is a plan-time decision made by `bin/model-selector.js`, and quality-critical phases (Red Team, doc-ripple, Design Verify) always run at their designated tier. See `.gsd-t/contracts/token-budget-contract.md` v3.0.0.
258
+
259
+ **Structural guarantee**: because the runway estimator refuses runs that project past 85% and the stop band fires at 85%, the runtime's 95% native compact is structurally unreachable under healthy operation. `halt_type: native-compact` in `.gsd-t/token-metrics.jsonl` is a defect signal.
260
+
261
+ **Upgrading from pre-M34**: `gsd-t update-all` runs a one-time task-counter retirement migration in every registered project (deletes `bin/task-counter.cjs`, `.gsd-t/task-counter-config.json`, `.gsd-t/.task-counter-state.json`, and the `.gsd-t/.task-counter` file; writes `.gsd-t/.task-counter-retired-v1` marker). After upgrade you **must** set `ANTHROPIC_API_KEY` — doctor will fail otherwise.
262
+
263
+ ## Runway-Protected Execution (M35)
264
+
265
+ M35 adds four components on top of the Context Meter. Together they replace graduated degradation with a pre-flight gate + pause/resume model.
266
+
267
+ ### Per-phase model selection — `bin/model-selector.js`
268
+
269
+ Declarative rules table mapping each phase (`plan`, `execute`, `red-team`, `doc-ripple`, `design-verify`, `qa`, `integrate`, ...) to a default tier (`haiku`|`sonnet`|`opus`). Complexity signals promoted from the task plan (`cross_module_refactor`, `security_boundary`, `data_loss_risk`, `contract_design`) escalate sonnet→opus. Each command file documents its assignments in a `## Model Assignment` block.
270
+
271
+ Contract: `.gsd-t/contracts/model-selection-contract.md` v1.0.0
272
+
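A rules table with signal-based escalation might look like the sketch below. The phase-to-tier entries are an illustrative subset chosen for this example, and `selectModel` is a hypothetical name; the authoritative table is in `bin/model-selector.js` and the contract. Only the four signal names are taken from the text above.

```javascript
// Illustrative subset of a phase -> default tier table (assumed values).
const PHASE_DEFAULTS = new Map([
  ["execute", "sonnet"],
  ["test-sync", "sonnet"],
  ["red-team", "opus"],
  ["partition", "opus"],
]);

// Complexity signals that escalate a sonnet default to opus.
const ESCALATING_SIGNALS = new Set([
  "cross_module_refactor",
  "security_boundary",
  "data_loss_risk",
  "contract_design",
]);

// Resolve the tier for a phase at plan time. Only sonnet defaults are
// escalated; nothing is ever demoted here.
function selectModel(phase, complexitySignals = []) {
  const base = PHASE_DEFAULTS.get(phase);
  if (base === undefined) throw new Error(`unknown phase: ${phase}`);
  const escalate =
    base === "sonnet" && complexitySignals.some((s) => ESCALATING_SIGNALS.has(s));
  return escalate ? "opus" : base;
}

console.log(selectModel("execute"));                        // sonnet
console.log(selectModel("execute", ["security_boundary"])); // opus
```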
273
+ ### Pre-flight runway estimator — `bin/runway-estimator.js`
274
+
275
+ Reads historical per-spawn consumption from `.gsd-t/token-metrics.jsonl` via a three-tier query fallback (exact match on `{command, phase, domain}` → command+phase → command) and produces a confidence-weighted projection of end-of-run `pct`. If the projection would cross `STOP_THRESHOLD_PCT = 85`, the command refuses to start — the interactive session exits cleanly and an autonomous headless continuation is auto-spawned. The user never types `/clear`.
276
+
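The three-tier fallback and the refusal decision can be sketched as follows. This is a simplified illustration under stated assumptions: records are already parsed from `.gsd-t/token-metrics.jsonl`, the projection is a plain mean of `end_pct - start_pct` rather than the confidence-weighted estimate the real estimator uses, and all function names are hypothetical.

```javascript
const STOP_THRESHOLD_PCT = 85;

// Try the most specific match first, then widen the query.
function selectHistory(records, { command, phase, domain }) {
  const tiers = [
    (r) => r.command === command && r.phase === phase && r.domain === domain,
    (r) => r.command === command && r.phase === phase,
    (r) => r.command === command,
  ];
  for (let i = 0; i < tiers.length; i++) {
    const matches = records.filter(tiers[i]);
    if (matches.length > 0) return { tier: i + 1, matches };
  }
  return { tier: null, matches: [] };
}

// Simplified projection: current pct plus the mean historical consumption.
function projectEndPct(currentPct, matches) {
  if (matches.length === 0) return null;
  const total = matches.reduce((sum, r) => sum + (r.end_pct - r.start_pct), 0);
  return currentPct + total / matches.length;
}

// Refuse the run when the projection reaches the stop threshold.
function shouldRefuse(currentPct, records, query) {
  const { matches } = selectHistory(records, query);
  const projected = projectEndPct(currentPct, matches);
  return projected !== null && projected >= STOP_THRESHOLD_PCT;
}
```

With no matching history the sketch returns `false` from `shouldRefuse`; how the real estimator treats an empty history is not stated here.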
277
+ ### Headless auto-spawn — `bin/headless-auto-spawn.js`
278
+
279
+ Detached child-process spawn (`child_process.spawn` with `detached:true`, `stdio:['ignore', fd, fd]`, `child.unref()`). Writes `.gsd-t/headless-sessions/{session-id}.json` with session metadata, polls with `process.kill(pid, 0)` liveness probe (timer `.unref()`-ed), marks `status: completed`, and posts a macOS `osascript` notification when done (graceful no-op on non-darwin). `bin/check-headless-sessions.js` renders the read-back banner on the next `gsd-t-resume` / `gsd-t-status`.
280
+
281
+ Directory: `.gsd-t/headless-sessions/` — one JSON per session, plus optional `{id}-context.json` and log files.
282
+
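The liveness probe described above relies on signal `0`, which checks whether a process exists without delivering anything to it. A minimal sketch, with hypothetical function names `isAlive` and `watch`, and an assumed polling interval:

```javascript
// Returns true if a process with this PID exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is sent
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return err.code === "EPERM";
  }
}

// Poll until the process is gone, then call onExit once.
// The timer is unref()-ed so it never keeps the parent process alive.
function watch(pid, onExit, intervalMs = 5000) {
  const timer = setInterval(() => {
    if (!isAlive(pid)) {
      clearInterval(timer);
      onExit();
    }
  }, intervalMs);
  timer.unref();
  return timer;
}

console.log(isAlive(process.pid)); // true
```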
283
+ ### Per-spawn token telemetry — `.gsd-t/token-metrics.jsonl`
284
+
285
+ Frozen 18-field JSONL schema, one record per subagent spawn. Written by the orchestrator, consumed by the runway estimator and the token optimizer.
286
+
287
+ Contract: `.gsd-t/contracts/token-telemetry-contract.md` v1.0.0
288
+
289
+ Key fields: `timestamp`, `session_id`, `command`, `phase`, `domain`, `task_id`, `model`, `complexity_signals[]`, `input_tokens`, `output_tokens`, `duration_seconds`, `start_pct`, `end_pct`, `halt_type`, `halt_reason`, `exit_code`, `run_type` (`interactive`|`headless`), `projection_variance`.
290
+
291
+ `halt_type` values: `clean`, `stop-band`, `runway-refuse`, `native-compact` (defect), `crash`.
292
+
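Because the file is JSONL, a consumer can process it line by line. The sketch below tallies `halt_type` values and flags `native-compact` as a defect; `countHalts` is a hypothetical name, and the sample records carry only a few of the 18 fields for brevity.

```javascript
// Tally halt_type values from JSONL text and flag the defect signal.
function countHalts(jsonlText) {
  const counts = {};
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const record = JSON.parse(line);
    counts[record.halt_type] = (counts[record.halt_type] || 0) + 1;
  }
  return { counts, defect: (counts["native-compact"] || 0) > 0 };
}

const sample = [
  '{"command":"gsd-t-execute","halt_type":"clean"}',
  '{"command":"gsd-t-wave","halt_type":"stop-band"}',
  '{"command":"gsd-t-execute","halt_type":"clean"}',
].join("\n");

console.log(countHalts(sample));
```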
293
+ ### Optimization backlog — `.gsd-t/optimization-backlog.md`
294
+
295
+ Append-only markdown file of recalibration recommendations. `bin/token-optimizer.js` scans the last 3 milestones of telemetry at `complete-milestone` time and appends detected recommendations. Recommendations are **never** auto-applied — the user promotes via `/user:gsd-t-optimization-apply {ID}` or rejects via `/user:gsd-t-optimization-reject {ID} [--reason "..."]`. Rejected items are fingerprinted and cooled down for 5 milestones before re-surfacing.
296
+
297
+ Detection rules: `demote` (opus phase with ≥90% success, ≥3 volume), `escalate` (sonnet phase with ≥30% failure, ≥5 volume), `runway-tune` (projection vs. actual divergence >15%), `investigate` (per-phase p95 > 2× median, ≥10 volume).
298
+
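The `demote` and `escalate` thresholds translate directly into a small predicate. This sketch assumes per-phase stats have already been aggregated into `{tier, total, successes}`, which is a hypothetical shape; the `runway-tune` and `investigate` rules are omitted.

```javascript
// Apply the demote/escalate detection rules to aggregated phase stats.
function detect({ tier, total, successes }) {
  if (total === 0) return null;
  const successRate = successes / total;
  const failureRate = (total - successes) / total;
  // demote: opus phase with >= 90% success over at least 3 spawns
  if (tier === "opus" && total >= 3 && successRate >= 0.9) return "demote";
  // escalate: sonnet phase with >= 30% failure over at least 5 spawns
  if (tier === "sonnet" && total >= 5 && failureRate >= 0.3) return "escalate";
  return null;
}

console.log(detect({ tier: "opus", total: 10, successes: 9 }));   // demote
console.log(detect({ tier: "sonnet", total: 10, successes: 7 })); // escalate
```

The result is only a recommendation; per the text above it is appended to the backlog and never applied automatically.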
299
+ ## Metrics CLI — `gsd-t metrics`
300
+
301
+ Read-only surface onto the token telemetry stream. Backward-compatible with pre-M35 `metrics` output; new flags surface M35 data.
302
+
303
+ | Flag | Output |
304
+ |-----------------------|----------------------------------------------------------------------------------------|
305
+ | (no flag) | Task telemetry, process ELO, domain health (pre-M35 behavior) |
306
+ | `--tokens` | Per-command / per-phase token usage summary from `.gsd-t/token-metrics.jsonl` |
307
+ | `--halts` | Count + breakdown of `halt_type` values — flags any `native-compact` as a defect |
308
+ | `--context-window` | Trailing 20-run window of `end_pct` with runway headroom |
309
+ | `--cross-project` | Cross-project ranking (pre-M35, unchanged) |
310
+
311
+ ## `/advisor` escalation convention
312
+
313
+ When a sonnet-default phase hits a complexity signal that warrants opus (e.g., cross-module refactor detected mid-execution), the command may emit an `/advisor` hook line in its output — a structured suggestion for the orchestrator to escalate the **next** spawn of that phase to opus. This is a plan-time signal, not a runtime swap: the current spawn completes at its assigned tier, and the escalation applies to subsequent work. See `.gsd-t/contracts/model-selection-contract.md` for the hook schema.
314
+
315
+ ## Unattended Supervisor Setup (M36)
316
+
317
+ The unattended supervisor runs an active GSD-T milestone to completion in a detached OS process — no terminal needed, no human intervention required.
318
+
319
+ ### Quick Start
320
+
321
+ ```bash
322
+ # From within an interactive Claude session:
323
+ /user:gsd-t-unattended
324
+
325
+ # From the terminal (detached — returns immediately):
326
+ gsd-t unattended --hours=24 --milestone=M36
327
+
328
+ # Watch current run status (in-session, 270s tick):
329
+ /user:gsd-t-unattended-watch
330
+
331
+ # Request a graceful stop:
332
+ /user:gsd-t-unattended-stop
333
+ ```
334
+
335
+ ### CLI Flags
336
+
337
+ ```
338
+ gsd-t unattended [OPTIONS]
339
+ --hours=24 Wall-clock cap in hours (default: 24)
340
+ --max-iterations=200 Worker iteration cap (default: 200)
341
+ --project=. Project directory (default: cwd)
342
+ --branch=AUTO Branch to run on; AUTO = current non-protected branch
343
+ --on-done=print Terminal action: print | merge-commit (merge-commit is v2)
344
+ --dry-run Preflight only; no spawn
345
+ --verbose Extra log detail
346
+ --test-mode Uses stub worker; for CI and smoke tests
347
+ ```
348
+
349
+ ### Config File (optional)
350
+
351
+ `.gsd-t/unattended-config.json` — per-project overrides. Absence = hardcoded defaults.
352
+
353
+ ```json
354
+ {
355
+ "hours": 24,
356
+ "maxIterations": 200,
357
+ "gutterNoProgressIters": 5,
358
+ "workerTimeoutMs": 3600000,
359
+ "protectedBranches": ["main", "master", "develop", "trunk"],
360
+ "dirtyTreeWhitelist": [".gsd-t/.unattended/*", ".gsd-t/events/*.jsonl"]
361
+ }
362
+ ```
363
+
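One way to picture how the config file is used: overrides are layered over built-in defaults, and the branch guard consults `protectedBranches`. The sketch below treats the example values above as the defaults, which is an assumption, and both function names are hypothetical.

```javascript
// Assumed defaults, copied from the example config above.
const UNATTENDED_DEFAULTS = {
  hours: 24,
  maxIterations: 200,
  gutterNoProgressIters: 5,
  workerTimeoutMs: 3600000,
  protectedBranches: ["main", "master", "develop", "trunk"],
  dirtyTreeWhitelist: [".gsd-t/.unattended/*", ".gsd-t/events/*.jsonl"],
};

// Layer per-project overrides on top of the defaults.
function resolveUnattendedConfig(fileOverrides = {}) {
  return { ...UNATTENDED_DEFAULTS, ...fileOverrides };
}

// Refuse to run when the current branch is on the protected list.
function assertBranchAllowed(branch, config) {
  if (config.protectedBranches.includes(branch)) {
    throw new Error(`refusing to run on protected branch "${branch}"`);
  }
}

const cfg = resolveUnattendedConfig({ maxIterations: 50 });
assertBranchAllowed("feature/m36", cfg); // passes silently
console.log(cfg.maxIterations); // 50
```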
364
+ ### State Files
365
+
366
+ | File | Purpose |
367
+ |------|---------|
368
+ | `.gsd-t/.unattended/supervisor.pid` | Integer PID. Exists only while supervisor is alive. |
369
+ | `.gsd-t/.unattended/state.json` | Live state snapshot — status, iter, milestone, lastTick, etc. Full schema in `unattended-supervisor-contract.md`. |
370
+ | `.gsd-t/.unattended/run.log` | Append-only worker stdout+stderr. Never truncated during a run. |
371
+ | `.gsd-t/.unattended/stop` | Sentinel — touching this file requests a graceful stop. |
372
+ | `.gsd-t/.unattended/config.json` | Optional per-project config overrides (same keys as CLI flags). |
373
+
374
+ ### Required Platform Helpers
375
+
376
+ | Platform | Sleep Prevention | Notifications |
377
+ |----------|-----------------|---------------|
378
+ | macOS | `caffeinate` (built-in) | `osascript` (built-in) |
379
+ | Linux | `systemd-inhibit` or no-op | `notify-send` (install via `apt`/`dnf`) |
380
+ | Windows | NOT supported — see `docs/unattended-windows-caveats.md` | no-op |
381
+
382
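+ macOS works out of the box, and so does Linux on standard installs, although `notify-send` may need to be installed for notifications. Windows can run the supervisor, but sleep prevention is not supported there, so the machine may sleep mid-run.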
383
+
384
+ ### Contract
385
+
386
+ `.gsd-t/contracts/unattended-supervisor-contract.md` v1.0.0 — authoritative state schema, exit-code table, status enum, launch/resume handshakes.
387
+
388
+ ### Troubleshooting
389
+
390
+ **Supervisor won't start**
391
+ - Check `.gsd-t/.unattended/run.log` for error output
392
+ - Run `gsd-t unattended --dry-run` to run pre-flight checks without spawning
393
+ - Verify that the current branch is not a protected branch: `git branch --show-current`
394
+
395
+ **"Already running" error**
396
+ ```bash
397
+ # Verify supervisor is actually alive:
398
+ kill -0 $(cat .gsd-t/.unattended/supervisor.pid) && echo "alive" || echo "stale PID"
399
+
400
+ # If stale, remove PID file:
401
+ rm .gsd-t/.unattended/supervisor.pid
402
+
403
+ # Or request a graceful stop:
404
+ /user:gsd-t-unattended-stop
405
+ # (or) touch .gsd-t/.unattended/stop
406
+ ```
407
+
408
+ **Watch loop stopped firing**
409
+ - Re-invoke `/user:gsd-t-resume` from a fresh session
410
+ - Step 0 auto-reattach reads `supervisor.pid` — if the supervisor is still alive, it re-enters the watch loop automatically (no manual steps needed)
411
+
412
+ **Supervisor crashed mid-run**
413
+ - The watch loop detects crash via `kill -0` failure
414
+ - Check `.gsd-t/.unattended/run.log` and final `state.json` for diagnostics
415
+ - Resume normally with `/user:gsd-t-resume` — the milestone continues from its last checkpoint
416
+
417
+ ---
418
+
198
419
  ## Security Notes
199
420
 
200
421
  - Zero npm dependencies — no supply chain risk
@@ -83,3 +83,47 @@ Teams burn tokens fast. Use them strategically:
83
83
  - Planning (always — need full cross-domain context)
84
84
  - Integration (always — need to see all seams)
85
85
  - The task is straightforward
86
+
87
+ ## Context Awareness: From Proxy to Real Measurement (M34)
88
+
89
+ GSD-T has always needed a reliable signal for "how much of the context window is consumed right now" so the orchestrator can decide whether to continue, pause, or hand off to a headless continuation. The journey to real measurement is instructive:
90
+
91
+ 1. **v1.0 era — env var check.** Early GSD-T read `CLAUDE_CONTEXT_TOKENS_USED` / `CLAUDE_CONTEXT_TOKENS_MAX` environment variables, assuming Claude Code exported them. It does not. The check was always inert — `pct` was effectively zero forever, and the stop gate never fired. The first symptom was not a crash, it was silent context exhaustion leading to mid-session compaction.
92
+
93
+ 2. **v2.74.12 — task-counter proxy.** To patch the regression, `bin/task-counter.cjs` tracked the number of tasks completed since the last `/clear` and assumed a linear correspondence between task count and context percentage (e.g., 5 tasks ≈ 80%). This was better than nothing but fundamentally a proxy — it could not distinguish a task that read three files from a task that ran a full-project grep and a Playwright suite.
94
+
95
+ 3. **v2.75.10 (M34) — real measurement.** The Context Meter PostToolUse hook streams the current transcript to the Anthropic `count_tokens` API after every tool call and writes the exact `input_tokens` count to `.gsd-t/.context-meter-state.json`. `bin/token-budget.js` `getSessionStatus()` reads that state file as the authoritative signal. Proxies are retired.
96
+
97
+ **Why this matters**: Opus-primary sessions compound context risk (larger system prompts, deeper reasoning, longer tool outputs). A proxy with ±20% error is fine for an undercommitted Sonnet session but causes silent compaction on a busy Opus session. Real measurement is the only durable fix.
98
+
99
+ **Fail-open principle**: the meter hook never blocks tool calls or crashes Claude Code. Every failure mode (missing API key, network error, malformed transcript, rate limit) catches and writes a partial state file with `lastError.code` set. The orchestrator treats a missing or stale state file as "fall back to heuristic" rather than "stop immediately" — the user never loses work to a meter hiccup.
100
+
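The fail-open principle amounts to wrapping the measurement so that any error becomes data rather than an exception. A minimal sketch, assuming a synchronous `countTokens` callback for brevity (the real hook presumably awaits an HTTP call); `measureFailOpen` and the `timestamp` field are hypothetical, while `input_tokens` and `lastError.code` follow the text above.

```javascript
// Run a measurement and always return a state object, never throw.
// Only an error code is recorded, never the error message or payload.
function measureFailOpen(countTokens, now = () => new Date().toISOString()) {
  try {
    return { timestamp: now(), input_tokens: countTokens(), lastError: null };
  } catch (err) {
    return {
      timestamp: now(),
      input_tokens: null,
      lastError: { code: (err && err.code) || "unknown" },
    };
  }
}

const failing = () => {
  const err = new Error("socket hang up");
  err.code = "ECONNRESET";
  throw err;
};

console.log(measureFailOpen(failing).lastError.code); // ECONNRESET
```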
101
+ ## From Silent Degradation to Aggressive Pause-Resume (M35)
102
+
103
+ Between v2.74 and v2.75, GSD-T attempted to cope with context pressure through **graduated degradation** — downgrading subagent models (opus→sonnet, sonnet→haiku), checkpointing early, skipping "non-essential" phases (Red Team, doc-ripple, Design Verify). The idea was well-intentioned: burn less context, finish more work before hitting the runtime's native compact at 95%.
104
+
105
+ **It was the wrong framing.** Degradation is invisible to the user: a task that silently dropped from opus to haiku still reports "completed," and a skipped Red Team pass still looks like a green wave. Several regressions showed up where bugs made it through QA because Red Team had been silently skipped under context pressure, and where cross-module refactors made on haiku introduced subtle type errors that opus would have caught. The quality floor had become conditional on context pressure — a load-bearing invariant that the user could not see or control.
106
+
107
+ **M35 (v2.76.10) replaces graduated degradation with aggressive pause-resume.** The core principles:
108
+
109
+ 1. **Quality is non-negotiable.** No phase is "non-essential." No model is downgraded under pressure. Red Team, doc-ripple, and Design Verify always run at their designated tier. If a task can't fit, the task pauses — it does not degrade.
110
+
111
+ 2. **Explicit per-phase model selection.** `bin/model-selector.js` carries a declarative rules table with ≥13 phase mappings. Complexity signals (`cross_module_refactor`, `security_boundary`, `data_loss_risk`, `contract_design`) escalate sonnet→opus at plan time. Each command file carries a `## Model Assignment` block documenting its assignments. Model choice is now a plan-time decision, not a runtime pressure response.
112
+
113
+ 3. **User never types `/clear`.** When the runway estimator (`bin/runway-estimator.js`) projects a run would cross 85% context, the command refuses to start — then auto-spawns a detached headless continuation via `bin/headless-auto-spawn.js`. The interactive session sees a single ⛔ banner and exits cleanly. The user gets a macOS notification when the headless run finishes and a read-back banner on the next `gsd-t-resume` or `gsd-t-status` call.
114
+
115
+ 4. **Data before optimization.** Per-spawn token telemetry (`.gsd-t/token-metrics.jsonl`, 18-field frozen schema) is the raw material. The runway estimator reads historical consumption to project future runs. The token optimizer (`bin/token-optimizer.js`) runs at `complete-milestone` and appends retrospective recalibration recommendations to `.gsd-t/optimization-backlog.md`. Recommendations are **never auto-applied** — the user promotes or rejects deliberately. Tier calibration is a data-driven human decision, not a runtime heuristic.
116
+
117
+ 5. **Clean break, no compat shim.** The v2.0.0 `token-budget-contract.md` defined `downgrade` and `conserve` bands with `modelOverrides` and `skipPhases` fields. v3.0.0 drops all of it. The contract is a clean three-band model (`normal` < 70%, `warn` 70–85%, `stop` ≥ 85%) and the response object is just `{band, pct, message}`. No backwards-compat translation layer — the old API is gone.
118
+
119
+ **Structural guarantee.** Because `STOP_THRESHOLD_PCT = 85` and the runway estimator refuses runs that would project past 85%, the runtime's 95% native compact is now structurally unreachable under healthy operation. `halt_type: native-compact` in `.gsd-t/token-metrics.jsonl` is a defect signal — if it appears, the estimator needs re-tuning.
120
+
121
+ **Message content is never logged**: the meter writes only token counts, band names, and error category codes. Never transcript text, never API response bodies, never the API key itself. See `docs/architecture.md` for the full data-flow diagram and `.gsd-t/contracts/context-meter-contract.md` for the schema.
122
+
123
+ ## From Runway-Protected Execution to Cross-Session Relay (M36)
124
+
125
+ M34 gave GSD-T a real measurement of how much context window each session had consumed. M35 used that signal to refuse starting new work that would exceed the 85% threshold, auto-spawning a detached headless process instead so the user never had to manually run `/clear`. Both milestones still had a ceiling: the headless continuation was a single shot — it ran one Claude session, and if the milestone wasn't complete when that session exhausted its context, a human had to intervene again to trigger the next continuation. Long-running milestones (multi-day builds, large waves) still required periodic human attention to keep the relay going.
126
+
127
+ M36 (v2.77.10) makes the relay automatic and indefinite. The unattended supervisor (`bin/gsd-t-unattended.js`) is a long-lived OS process, fully detached from any Claude Code terminal session, that drives the relay itself: it spawns a fresh `claude -p "/gsd-t-resume"` worker, waits for it to exit, reads the outcome, and immediately spawns the next worker — repeating until the milestone reaches COMPLETED status or a wall-clock cap is hit. Each worker gets a pristine context window. The `/compact` that inevitably fires in a long session is irrelevant because the *next* session has already started fresh. The supervisor IS the orchestrator of runway handoffs. Safety rails (`bin/gsd-t-unattended-safety.js`) prevent infinite-loop scenarios: gutter detection catches stall patterns, blocker sentinels catch unrecoverable errors, and iteration/hour caps ensure the machine doesn't run forever on a broken state. A cross-platform abstraction layer handles macOS sleep-prevention (`caffeinate`), Linux equivalents, and Windows limitations. From the user's perspective: invoke `/user:gsd-t-unattended`, walk away, and receive a native OS notification when the milestone is done — hours or days later.
128
+
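The relay described above reduces to a loop with three exits besides completion: the wall-clock cap, the iteration cap, and gutter detection. The following sketch injects the worker as a function so the control flow is visible on its own; the status strings, the `madeProgress` flag, and the name `relay` are assumptions, and the real supervisor spawns a `claude -p` process instead.

```javascript
// Drive workers until the milestone completes or a safety rail trips.
function relay({ runWorker, maxIterations, deadlineMs, now = Date.now, gutterNoProgressIters = 5 }) {
  let stalled = 0;
  for (let iter = 1; iter <= maxIterations; iter++) {
    if (now() >= deadlineMs) return { status: "time-cap", iter: iter - 1 };
    // Each call stands in for one fresh worker session.
    const outcome = runWorker(iter);
    if (outcome.milestoneStatus === "COMPLETED") return { status: "done", iter };
    // Gutter detection: count consecutive iterations without progress.
    stalled = outcome.madeProgress ? 0 : stalled + 1;
    if (stalled >= gutterNoProgressIters) return { status: "gutter", iter };
  }
  return { status: "iteration-cap", iter: maxIterations };
}

const result = relay({
  runWorker: (iter) => ({
    milestoneStatus: iter === 3 ? "COMPLETED" : "IN_PROGRESS",
    madeProgress: true,
  }),
  maxIterations: 200,
  deadlineMs: Date.now() + 24 * 3600 * 1000,
});
console.log(result); // { status: 'done', iter: 3 }
```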
129
+ The in-session watch loop (270-second `ScheduleWakeup` ticks, chosen to stay inside the 5-minute prompt-cache TTL) closes the feedback loop for users who keep a Claude session open. And the `gsd-t-resume` Step 0 auto-reattach means that even a `/clear` or accidental session close is transparent: the next resume detects the live supervisor and silently re-enters the watch loop without any manual step. Taken together, M34 + M35 + M36 form a complete three-layer system: measure the context accurately, refuse to degrade when it runs low, and relay execution automatically across as many fresh sessions as the work requires.
@@ -299,49 +299,63 @@ Evolve GSD-T from a static methodology framework into a self-calibrating quality
299
299
 
300
300
  ---
301
301
 
302
- ### 3.7 Token-Aware Orchestration (MEDIUM-HIGH PRIORITY, Tier 1)
302
+ ### 3.7 Context Gate + Surgical Model Escalation (REWRITTEN IN M35 v2.76.10)
303
303
 
304
- **Problem**: GSD-T runs on Claude's $200 Max plan, where tokens are a hard daily/weekly ceiling, not a variable API expense. A typical milestone spawns 30-50+ subagents across all phases. With tiered models, this consumes roughly 50-80% of a daily budget. Without budget awareness, the orchestrator can exhaust tokens mid-milestone, leaving uncommitted work scattered across subagents and forcing a wait until limits reset.
304
+ **Status**: The original v2 design for this section described graduated degradation with `downgrade` and `conserve` bands that silently demoted models and skipped Red Team / doc-ripple phases under context pressure. **M35 removed that behavior entirely** (see `token-budget-contract.md` v3.0.0 and `.gsd-t/M35-definition.md`). The historical text is preserved in git history; this section documents the replacement.
305
305
 
306
- The article's harness doesn't address this because it operates on API billing where cost is variable. On a Max plan, token exhaustion is a binary failure mode: you either have capacity or you don't.
306
+ **Problem**: Silent quality degradation is the wrong answer to context pressure. If the orchestrator is allowed to swap opus for sonnet, sonnet for haiku, or skip Red Team / doc-ripple / Design Verification when context is tight, the user silently receives lower-quality work than they asked for. GSD-T's core principle is excellent, deeply-tested results — not "best effort under pressure."
307
307
 
308
- **Solution**: Make the wave and execute orchestrators aware of aggregate session-level token consumption, with graceful degradation as limits approach.
308
+ **Solution**: Four strict components that replace the old graduated-degradation model.
309
309
 
310
310
  **Mechanism**:
311
- 1. **Session budget tracking** — The orchestrator tracks cumulative tokens consumed across all subagent spawns within a session. Uses the existing observability logging data (token-log.md) plus `CLAUDE_CONTEXT_TOKENS_USED` environment variable.
312
- 2. **Budget estimation before spawn** — Before spawning a subagent, estimate the token cost based on: model tier (Opus ~5x Sonnet, Sonnet ~5x Haiku), task complexity (from plan-time scoring if available), and historical average from token-log.md for similar tasks.
313
- 3. **Graduated degradation thresholds**:
314
-
315
- | Session Budget Consumed | Action |
316
- |------------------------|--------|
317
- | < 60% | Normal operation — all models at assigned tiers |
318
- | 60-70% | **WARN**: Display budget alert to user. Reduce iteration budgets to minimum (2). |
319
- | 70-85% | **DOWNGRADE**: Non-critical Sonnet tasks demoted to Haiku. Skip exploratory testing (3.5). Disable shadow-mode audit (3.1). |
320
- | 85-95% | **CONSERVE**: Pause non-essential phases (doc-ripple, design brief generation). Checkpoint all progress to disk. |
321
- | > 95% | **STOP**: Hard stop. Save all progress. Display: "Token budget nearly exhausted. Progress saved. Resume with `/gsd-t-resume` after limit resets." |
322
-
323
- 4. **Model-tier-aware budgeting** — The budget tracker understands that one Opus call ≈ 5 Sonnet calls ≈ 25 Haiku calls in token terms. Degradation actions (downgrading Sonnet → Haiku) are chosen to maximize remaining capacity for high-value tasks.
324
- 5. **Milestone pre-flight check** — Before starting a wave or execute run, estimate total token cost for the remaining work. If estimated cost exceeds available budget, warn the user: "This milestone has ~{N} tasks remaining, estimated at ~{X}% of daily budget. Proceed or split across sessions?"
325
- 6. **Integration with iteration budget (3.6)** — When budget is constrained (>60%), iteration budgets are automatically reduced. At >70%, the system prefers model escalation (Haiku → Sonnet) over additional iterations at the same tier, since one Sonnet attempt is more likely to converge than three Haiku attempts.
326
311
 
327
- **Files affected**:
328
- - MODIFY: `commands/gsd-t-execute.md` — pre-spawn budget check, degradation logic
329
- - MODIFY: `commands/gsd-t-wave.md` — milestone pre-flight estimate, per-phase budget check
330
- - MODIFY: `commands/gsd-t-quick.md` — budget-aware model selection
331
- - MODIFY: `templates/CLAUDE-global.md` — document token-aware orchestration
332
- - MODIFY: `templates/CLAUDE-project.md` — optional `Daily Token Budget` field
333
- - NEW: `bin/token-budget.js` — budget estimation, tracking, and threshold logic (Node.js built-ins only)
312
+ 1. **Three-band context gate** (`bin/token-budget.js` v3.0.0, `token-budget-contract.md` v3.0.0):
313
+
314
+ | Session Context Consumed | Band | Action |
315
+ |--------------------------|------|--------|
316
+ | < 70% | `normal` | Proceed at full quality. |
317
+ | 70–85% | `warn` | Log to `.gsd-t/token-log.md` and proceed at **full quality**. **Never** downgrade models, skip Red Team, skip doc-ripple, or skip Design Verification. |
318
+ | ≥ 85% | `stop` | Halt cleanly. Checkpoint progress. Hand off to runway estimator / headless auto-spawn (see below). |
319
+
320
+ The bands `downgrade` and `conserve` that existed in v2.x are deleted. `applyModelOverride`, `skipPhases`, and all related machinery are deleted.
321
+
322
+ 2. **Surgical per-phase model selection** (`bin/model-selector.js`, `model-selection-contract.md` v1.0.0 — m35-model-selector-advisor):
323
+ - Declarative phase→tier mapping: `haiku` for strictly mechanical work (test runners, file-existence checks, JSON validation, branch guards), `sonnet` for routine code work (execute step 2, test-sync, doc-ripple wiring), `opus` for high-stakes reasoning (partition, discuss, Red Team, verify judgment, debug root-cause, architecture/contract design).
324
+ - Sonnet is the **routine default** — opus is applied surgically at declared escalation points, not as a fallback for "important-looking" work.
325
+ - Dual-layer: `ANTHROPIC_MODEL=opus` for the interactive session, `model:` directive overrides per-spawn.
326
+ - Optional `/advisor` escalation hook at declared points; convention-based fallback if the native tool is not programmable.
327
+
328
+ 3. **Runway estimator + headless auto-spawn** (`bin/runway-estimator.js`, `bin/headless-auto-spawn.js` — m35-runway-estimator, m35-headless-auto-spawn):
329
+ - Pre-flight: before spawning a phase or task subagent, estimate the token cost of the remaining work from `.gsd-t/token-metrics.jsonl` history. If the projected runway crosses the `stop` threshold, **refuse the run and auto-spawn a headless session** to continue the work.
330
+ - The user never types `/clear` under normal operation. The interactive session stays idle; the headless run consumes the fresh context window and reports results back via `.gsd-t/M35-headless-results-*.md`.
331
+ - The only time a user sees a `/clear` prompt is when headless auto-spawn itself fails — an explicit degradation path, not a silent one.
332
+
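The pre-flight check described above can be illustrated with a simple averaging estimator. This is only a sketch: the function names, the use of a per-phase mean of past context-window deltas, and the 15% fallback cost are all assumptions. The actual `estimateRunway()` in `bin/runway-estimator.js` may use a different projection. The record field names are taken from the telemetry schema listed in this document.

```javascript
// Parse JSON Lines text (one JSON object per non-empty line).
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical estimator: projects the context percentage after running `phase`
// by adding the mean historical cost of that phase to the current percentage.
function estimateRunwaySketch(records, phase, currentPct, opts = {}) {
  const stopPct = opts.stopPct === undefined ? 85 : opts.stopPct;
  // Assumed conservative default when there is no history for this phase.
  const defaultCostPct = opts.defaultCostPct === undefined ? 15 : opts.defaultCostPct;

  const deltas = records
    .filter((r) => r.phase === phase)
    .map((r) => r.context_window_pct_after - r.context_window_pct_before)
    .filter((d) => Number.isFinite(d) && d >= 0);

  const costPct = deltas.length > 0
    ? deltas.reduce((sum, d) => sum + d, 0) / deltas.length
    : defaultCostPct;

  const projectedPct = currentPct + costPct;
  // Refuse the run when the projection reaches the stop threshold.
  return { costPct, projectedPct, proceed: projectedPct < stopPct };
}
```

A caller would hand off to the headless path when `proceed` is `false` instead of starting the phase in the current session.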
333
+ 4. **Per-spawn telemetry** (`bin/token-telemetry.js`, `token-telemetry-contract.md` v1.0.0 — m35-token-telemetry):
334
+ - Every Task subagent spawn is wrapped in a token bracket that records `{timestamp, milestone, command, phase, step, domain, task, model, duration_s, input_tokens_before, input_tokens_after, tokens_consumed, context_window_pct_before, context_window_pct_after, outcome, halt_type, escalated_via_advisor}` to `.gsd-t/token-metrics.jsonl`.
335
+ - `gsd-t metrics --tokens [--by model,command,phase,milestone]`, `gsd-t metrics --halts`, `gsd-t metrics --tokens --context-window` surface the history.
336
+ - Telemetry feeds the runway estimator (historical cost-per-phase) and the optimization backlog (`bin/token-optimizer.js` → `.gsd-t/optimization-backlog.md`, detect-only — user selectively promotes via `/user:gsd-t-optimization-apply|reject`).
337
+
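A token bracket around a spawn could be recorded roughly as follows. This is a hedged sketch: `buildRecord` and `appendRecord` are hypothetical helpers, and the shape of the `before`/`after` snapshots is assumed. The output field names follow the schema listed above, and only Node.js built-ins are used.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Build one telemetry record from snapshots taken before and after a spawn.
// `meta` carries caller-supplied fields such as milestone, command, phase, model.
// Assumption: each snapshot is { tokens, pct }.
function buildRecord(meta, before, after, startedAtMs, endedAtMs) {
  return {
    timestamp: new Date(endedAtMs).toISOString(),
    ...meta,
    duration_s: (endedAtMs - startedAtMs) / 1000,
    input_tokens_before: before.tokens,
    input_tokens_after: after.tokens,
    tokens_consumed: after.tokens - before.tokens,
    context_window_pct_before: before.pct,
    context_window_pct_after: after.pct,
  };
}

// Append the record as a single JSON Lines row, creating the directory if needed.
function appendRecord(file, record) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(record) + '\n', 'utf8');
}
```

Because each row is an independent JSON object on its own line, later tools can aggregate the file without loading or rewriting earlier entries.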
338
+ **Files affected (M35 active set)**:
339
+ - MODIFY: `bin/token-budget.js` — three-band `getDegradationActions`, `WARN_THRESHOLD_PCT = 70`, `STOP_THRESHOLD_PCT = 85`
340
+ - MODIFY: `.gsd-t/contracts/token-budget-contract.md` — v3.0.0 rewrite (Option X clean break, no compat shim)
341
+ - NEW: `bin/model-selector.js`, `bin/advisor-integration.js`, `bin/runway-estimator.js`, `bin/headless-auto-spawn.js`, `bin/token-telemetry.js`, `bin/token-optimizer.js`
342
+ - NEW: `.gsd-t/contracts/model-selection-contract.md`, `runway-estimator-contract.md`, `token-telemetry-contract.md`, `headless-auto-spawn-contract.md` (all v1.0.0)
343
+ - MODIFY: `commands/gsd-t-execute.md`, `gsd-t-wave.md`, `gsd-t-quick.md`, `gsd-t-integrate.md`, `gsd-t-debug.md`, `gsd-t-doc-ripple.md` — three-band handlers, Model Assignment blocks, per-spawn token brackets, runway estimator wires
344
+ - MODIFY: `templates/CLAUDE-global.md`, `templates/CLAUDE-project.md` — rewrite Token-Aware Orchestration section
334
345
 
335
346
  **Success criteria**:
336
- - [ ] Orchestrator estimates token cost before each subagent spawn
337
- - [ ] Cumulative session usage is tracked and displayed at each phase boundary
338
- - [ ] Degradation actions trigger at 60%, 70%, 85%, and 95% thresholds
339
- - [ ] Non-critical Sonnet tasks are demoted to Haiku when budget is constrained
340
- - [ ] Milestone pre-flight check warns when estimated cost exceeds available budget
341
- - [ ] Progress is always saved before a hard stop, with no lost work
342
- - [ ] Default behavior (no budget concern) is unchanged — thresholds only fire when budget tracking detects pressure
347
+ - [ ] Zero `downgrade`, `conserve`, `modelOverride`, `skipPhases` references in live code under `bin/`, `scripts/`, `commands/`, `templates/` (prose discussing the removal is acceptable)
348
+ - [ ] `bin/model-selector.js` contains at least 8 declarative phase mappings
349
+ - [ ] Runway estimator wired into 5 command files
350
+ - [ ] Per-spawn token telemetry captures the full 18-field schema
351
+ - [ ] Headless auto-spawn tested end-to-end
352
+ - [ ] ~1030/1030 tests green (954 baseline at M35 Wave 1 exit)
353
+ - [ ] M35 dogfooded itself from Wave 3 onward
354
+ - [ ] `v2.76.10` tagged
355
+
356
+ **Non-goals**: cost modeling (M35 tracks tokens, not dollars), auto-optimization (backlog is detect-only), model marketplace (Claude models only), runtime Claude Code patches (convention-based `/advisor` fallback if no programmable API).
343
357
 
344
- **Acceptance test**: Start a wave with a 4-domain milestone. After 3 domains complete (simulating ~70% budget consumed), verify the orchestrator displays a budget warning, reduces iteration budgets, and demotes non-critical tasks to Haiku. Verify all progress is saved and the user sees a clear "resume" instruction.
358
+ **Acceptance test**: Start a multi-wave milestone. Drive session context above 70% mid-wave. Verify: (a) the orchestrator logs a `warn` entry and proceeds at full quality with no model swaps and no phase skips; (b) when the runway estimator projects a `stop` crossing before the next phase completes, it halts cleanly and auto-spawns a headless session to continue; (c) the user never sees a `/clear` prompt unless the headless handoff itself fails; (d) `.gsd-t/token-metrics.jsonl` contains one record per spawn with the full schema.
345
359
 
346
360
  ---
347
361
 
@@ -498,7 +512,7 @@ All new code (`bin/qa-calibrator.js`, component registry logic) uses Node.js bui
498
512
  | Playwright MCP not widely available | Medium | Low | Graceful degradation — feature skips if MCP absent. |
499
513
  | Higher iteration budgets waste context tokens | Low | Medium | Budget governance (M26) caps aggregate rework. Telemetry tracks. |
500
514
  | Command file sizes grow beyond readability | Medium | High | Each injection is max 5-10 lines. Total overhead auditable via 3.1. |
501
- | **Token exhaustion mid-milestone** | **High** | **High** | **Token-aware orchestration (3.7) with graduated degradation. Progress always checkpointed before hard stop.** |
515
+ | **Token exhaustion mid-milestone** | **High** | **High** | **Runway-protected execution (3.7) with pre-flight estimator + headless auto-spawn. No graduated degradation; quality never lowered under pressure. Clean stop at 85% with automatic headless continuation.** |
502
516
  | **QA promotion to Sonnet exceeds token budget** | Low | Medium | QA calls are small relative to execute. Net impact ~10-15% more tokens. Token orchestrator manages ceiling. |
503
517
  | **Budget estimation inaccuracy** | Medium | Medium | Estimates improve over time using historical data from token-log.md. Conservative defaults (overestimate). |
504
518
 
@@ -566,10 +580,10 @@ There are two distinct token budget concerns:
566
580
  - QA model promotion (Haiku → Sonnet): +10-15% tokens per milestone
567
581
  - Red Team model promotion (Sonnet → Opus): +3-5% tokens per milestone (only 1 call)
568
582
  - Exploratory testing: +5-10% tokens per milestone (when MCP available)
569
- - Higher iteration budgets: variable, capped by token-aware orchestrator (3.7)
583
+ - Higher iteration budgets: variable, bounded by the M35 runway estimator (3.7) which refuses runs projected to cross 85%
570
584
  - Harness audit: opt-in only, not counted in normal milestone budgets
571
585
 
572
- **Maximum additional session cost**: ~25-30% more tokens per milestone vs. current. The token-aware orchestrator (3.7) ensures this stays within daily limits through graduated degradation.
586
+ **Maximum additional session cost**: ~25-30% more tokens per milestone vs. current. The M35 runway estimator and headless auto-spawn (3.7) keep this within daily limits through pre-flight refusal and fresh-context headless continuation, **not** through graduated degradation. When a run would exceed the budget, the interactive session hands off cleanly to a headless continuation with a fresh context window; quality is never reduced.
573
587
 
574
588
  ### Command file discipline
575
589
 
@@ -210,6 +210,101 @@
210
210
  **Orphaned requirements**: None — all M32 REQs mapped to tasks.
211
211
  **Unanchored tasks**: None — all 3 domain tasks map directly to functional requirements.
212
212
 
213
+ ## Requirements Traceability (updated by plan phase — M34)
214
+
215
+ | REQ-ID | Requirement Summary | Domain | Task(s) | Status |
216
+ |---------|------------------------------------------------------------------------------------------|-----------------------------|----------|----------|
217
+ | REQ-063 | Context Meter PostToolUse hook — count_tokens API call, state file, fail-open | context-meter-hook | Tasks 1–5 | complete |
218
+ | REQ-064 | Context Meter config schema — apiKeyEnvVar, modelWindowSize, thresholdPct, checkFrequency | context-meter-config | Tasks 1–4 | complete |
219
+ | REQ-065 | Installer integration — install/init hook, doctor gate, status line, update-all migration | installer-integration | Tasks 1–6 | complete |
220
+ | REQ-066 | bin/token-budget.js v2.0.0 — real-source `getSessionStatus()` reading the meter state file | token-budget-replacement | Tasks 1–10 | complete |
221
+ | REQ-067 | Command file migration — execute/wave/quick/integrate/debug use CTX_PCT, no task-counter | token-budget-replacement | Tasks 6–10 | complete |
222
+ | REQ-068 | Docs + tests — README/GSD-T-README/templates/docs/CHANGELOG updated, integration tests added | m34-docs-and-tests | Tasks 1–9 | in_progress |
223
+
224
+ **M34 Functional Requirements:**
225
+ - **REQ-063**: The PostToolUse hook must measure the real transcript token count after every tool call (subject to `checkFrequency`), write the result atomically to `.gsd-t/.context-meter-state.json`, and never crash Claude Code on error.
226
+ - **REQ-064**: The config loader must validate `apiKeyEnvVar` is a string, `modelWindowSize` > 0, `thresholdPct` in (0, 100), `checkFrequency` ≥ 1. Missing config = use defaults.
227
+ - **REQ-065**: `gsd-t doctor` must hard-gate on: API key set, hook registered, script present, config valid, live `count_tokens` dry-run succeeds. Exit code 1 if any RED.
228
+ - **REQ-066**: `bin/token-budget.js` `getSessionStatus()` must read `.gsd-t/.context-meter-state.json` when fresh (within 5 minutes of timestamp) and fall back to a historical heuristic otherwise. Public API shape unchanged from v1.x — callers see no breakage.
229
+ - **REQ-067**: No command file may reference `task-counter.cjs` or `CLAUDE_CONTEXT_TOKENS_*` env vars. All session-stop gates must call `token-budget.getSessionStatus()`.
230
+ - **REQ-068**: All downstream docs (README.md, docs/GSD-T-README.md, templates/CLAUDE-*, docs/*.md, CHANGELOG.md, package.json) must describe M34 by the time the milestone is marked complete.
231
+
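The validation rules in REQ-064 can be sketched as a loader that merges defaults and checks each field. This is an illustration only: the function name `loadConfig` and every default value below are assumptions, since this document does not state the real defaults. The four field names and their constraints are taken from REQ-064.

```javascript
// Hypothetical defaults; the real values are not given in this document.
const DEFAULT_CONFIG = Object.freeze({
  apiKeyEnvVar: 'ANTHROPIC_API_KEY',
  modelWindowSize: 200000,
  thresholdPct: 75,
  checkFrequency: 5,
});

// Merge user config over defaults and enforce the REQ-064 constraints.
// A missing config (null or undefined) yields the defaults unchanged.
function loadConfig(raw) {
  const cfg = { ...DEFAULT_CONFIG, ...(raw || {}) };
  if (typeof cfg.apiKeyEnvVar !== 'string') {
    throw new TypeError('apiKeyEnvVar must be a string');
  }
  if (!(Number.isFinite(cfg.modelWindowSize) && cfg.modelWindowSize > 0)) {
    throw new RangeError('modelWindowSize must be > 0');
  }
  // Open interval (0, 100): both endpoints are rejected.
  if (!(Number.isFinite(cfg.thresholdPct) && cfg.thresholdPct > 0 && cfg.thresholdPct < 100)) {
    throw new RangeError('thresholdPct must be in (0, 100)');
  }
  if (!(Number.isFinite(cfg.checkFrequency) && cfg.checkFrequency >= 1)) {
    throw new RangeError('checkFrequency must be >= 1');
  }
  return cfg;
}
```

Whether an invalid config should throw or fall back to defaults is a design choice this sketch does not settle; throwing is shown here because it is easier to surface in a `doctor`-style check.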
232
+ **M34 Non-Functional Requirements:**
233
+ - Hook latency ≤ 200ms P99 (enforced by `req.setTimeout` + `req.destroy()` in the HTTPS client)
234
+ - Zero external npm dependencies (same as the rest of GSD-T)
235
+ - Zero message content in state files, log files, or diagnostics — only token counts, band names, error category codes
236
+ - Zero API-key material written to disk — env var read only, never persisted
237
+
238
+ **Orphaned requirements**: None — all M34 REQs mapped to tasks.
239
+ **Unanchored tasks**: None — all 34 M34 tasks map directly to functional or non-functional requirements.
240
+
241
+ ---
242
+
243
+ ## Requirements Traceability (updated by plan phase — M35)
244
+
245
+ | REQ-ID | Requirement Summary | Domain | Task(s) | Status |
246
+ |---------|-----------------------------------------------------------------------------------------|-----------------------------|-----------|----------|
247
+ | REQ-069 | Silent degradation bands removed — `getDegradationActions()` returns only `{band: 'normal'\|'warn'\|'stop'}` | degradation-rip-out | T1 | complete (Wave 1) |
248
+ | REQ-070 | Three-band model only — `WARN_THRESHOLD_PCT=70`, `STOP_THRESHOLD_PCT=85`, no model overrides or phase skips | degradation-rip-out | T1, T2 | complete (Waves 1–2) |
249
+ | REQ-071 | Surgical per-phase model selection via `bin/model-selector.js` — ≥8 phase mappings, declarative rules table | model-selector-advisor | T2 | complete (Wave 2) |
250
+ | REQ-072 | `/advisor` escalation with graceful fallback — convention-based if API not programmable | model-selector-advisor | T1, T3 | complete (Wave 2) |
251
+ | REQ-073 | Pre-flight runway estimator refuses runs projected to cross 85% stop threshold | runway-estimator | T1–T5 | complete (Wave 3) |
252
+ | REQ-074 | Per-spawn token telemetry to `.gsd-t/token-metrics.jsonl` with frozen 18-field schema | token-telemetry | T1–T3 | complete (Waves 1–2) |
253
+ | REQ-075 | `gsd-t metrics` CLI: `--tokens [--by ...]`, `--halts`, `--tokens --context-window` | token-telemetry | T4–T6 | complete (Wave 2) |
254
+ | REQ-076 | Optimization backlog — detect only, never auto-apply, user promotes or rejects | optimization-backlog | T1–T4 | complete (Wave 4) |
255
+ | REQ-077 | Headless auto-spawn on runway refusal — user never sees a `/clear` prompt | headless-auto-spawn | T1–T5 | complete (Waves 3–4) |
256
+ | REQ-078 | Structural elimination of native compact messages — `halt_type: native-compact` count is 0 during M35 execution | runway-estimator + headless-auto-spawn | T1–T5 (RE), T1–T5 (HAS) | complete (Wave 5) |
257
+ | REQ-079 | `gsd-t unattended` CLI subcommand runs an active milestone to completion unattended on macOS and Linux (24h+ multi-worker relay, detached OS process) | m36-supervisor-core | T1–T5 | complete (M36 Wave 1–2) |
258
+ | REQ-080 | `/user:gsd-t-unattended` slash command launches the supervisor from within a Claude session without blocking the terminal | m36-supervisor-core + m36-watch-loop | T1, T3 | complete (M36 Wave 1–3) |
259
+ | REQ-081 | In-session watch loop ticks every 270s via `ScheduleWakeup` (inside 5-min prompt-cache TTL) to report live supervisor state | m36-watch-loop | T1–T2 | complete (M36 Wave 3) |
260
+ | REQ-082 | `/clear` + `/user:gsd-t-resume` during a live unattended run transparently re-attaches to the watch loop (Step 0 auto-reattach, no user-visible disruption) | m36-watch-loop | T4 | complete (M36 Wave 3) |
261
+ | REQ-083 | Supervisor survives `/compact` and context resets — each worker is a fresh `claude -p` session; context exhaustion is structurally irrelevant | m36-supervisor-core | T1–T5 | complete (M36 Wave 1–2) |
262
+ | REQ-084 | Safety rails prevent infinite loops: gutter detection, blocker sentinels (`BLOCKED_NEEDS_HUMAN`, `DISPATCH_FAILED`), max-hours and max-iterations timeouts | m36-safety-rails | T1–T5 | complete (M36 Wave 2) |
263
+ | REQ-085 | Cross-platform support — macOS (caffeinate sleep-prevention) + Linux (systemd-inhibit or no-op) + Windows (claude.cmd via PATH; sleep-prevention not supported — see docs/unattended-windows-caveats.md) | m36-cross-platform | T1–T5 | complete (M36 Wave 2) |
264
+ | REQ-086 | Handoff-lock primitive (`bin/handoff-lock.js`) eliminates parent/child race in `headless-auto-spawn.js` runway handoffs (M35 gap fix) | m36-m35-gap-fixes | T1–T3 | complete (M36 Wave 2) |
265
+ | REQ-087 | 5 command files no longer emit "Run /clear" STOP — runway-exceeded handoff auto-invokes `autoSpawnHeadless()` seamlessly (M35 gap fix) | m36-m35-gap-fixes | T3 | complete (M36 Wave 3) |
266
+
267
+ **M35 Functional Requirements:**
268
+ - **REQ-069**: `bin/token-budget.js` `getDegradationActions()` must return `{band: 'normal'|'warn'|'stop', pct: number, message: string}` only. No `modelOverride`, no `skipPhases`, no `checkpoint` side-channel.
269
+ - **REQ-070**: `WARN_THRESHOLD_PCT = 70`, `STOP_THRESHOLD_PCT = 85`. `grep -r "downgrade\|conserve\|modelOverride\|skipPhases" bin/ commands/ docs/ templates/` returns zero hits in live code.
270
+ - **REQ-071**: `bin/model-selector.js` exists with a declarative rules table, at least 8 phase mappings across all three tiers (haiku/sonnet/opus), and unit tests for each mapping.
271
+ - **REQ-072**: `bin/advisor-integration.js` exists. If `/advisor` is programmable: calls it. If not: convention-based fallback block injection. Graceful degradation: missed escalations logged, caller never blocked.
272
+ - **REQ-073**: Every long-running command (execute, wave, integrate, debug, quick) invokes `estimateRunway()` at Step 0. On refusal: prints ⛔ block, calls `autoSpawnHeadless()`, exits cleanly. User never sees "run /clear."
273
+ - **REQ-074**: Every Task subagent spawn in execute/wave/quick/integrate/debug/doc-ripple is wrapped in a token bracket. Records appended to `.gsd-t/token-metrics.jsonl` with all 18 schema fields.
274
+ - **REQ-075**: `gsd-t metrics --tokens [--by <field>...]` prints an aggregated table. `gsd-t metrics --halts` shows halt_type breakdown with native-compact warning. `gsd-t metrics --tokens --context-window` buckets by pct_before.
275
+ - **REQ-076**: `bin/token-optimizer.js` runs at `complete-milestone`, appends to `.gsd-t/optimization-backlog.md`. Recommendations never auto-applied. User promotes via `gsd-t-optimization-apply {ID}` or rejects via `gsd-t-optimization-reject {ID}`.
276
+ - **REQ-077**: `bin/headless-auto-spawn.js` `autoSpawnHeadless()` spawns a detached child process, writes `.gsd-t/headless-sessions/{id}.json`, fires macOS notification on completion, and surfaces results banner on next interactive command.
277
+ - **REQ-078**: The combination of `STOP_THRESHOLD_PCT=85` and runway estimator refusing runs that project past 85% means the runtime's 95% native compact is structurally unreachable. `halt_type: native-compact` in `token-metrics.jsonl` is a defect signal.
278
+
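The aggregated table that REQ-075 describes for `--tokens --by <field>...` amounts to a group-by over the telemetry records. The following is a minimal sketch of that grouping, not the CLI's actual code: `aggregateTokens`, the `' | '` key separator, and the output row shape are assumptions.

```javascript
// Group telemetry records by the given fields and sum tokens_consumed per group.
// Rows are returned sorted by total tokens, largest first.
function aggregateTokens(records, byFields) {
  const groups = new Map();
  for (const record of records) {
    const key = byFields.map((field) => String(record[field])).join(' | ');
    const group = groups.get(key) || { key, spawns: 0, tokens: 0 };
    group.spawns += 1;
    // Treat missing or non-numeric token counts as zero rather than failing.
    group.tokens += Number(record.tokens_consumed) || 0;
    groups.set(key, group);
  }
  return [...groups.values()].sort((a, b) => b.tokens - a.tokens);
}
```

For example, grouping by `['model', 'phase']` shows which model and phase combinations account for most of the consumed tokens.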
279
+ **M35 Non-Functional Requirements:**
280
+ - Zero external npm dependencies (GSD-T mandate — token-telemetry.js, runway-estimator.js, headless-auto-spawn.js must use Node.js built-ins only)
281
+ - `autoSpawnHeadless()` must return control to the interactive session in < 500ms (detached spawn is immediate)
282
+ - `estimateRunway()` must complete in < 100ms (reads two local files, no network)
283
+ - Full test suite: target ~1030 tests total after M35; quality over count
284
+
285
+ **Orphaned requirements**: None — all M35 REQs mapped to tasks.
286
+ **Unanchored tasks**: None — all 38 M35 tasks trace to REQ-069–REQ-078 or REQ-063–068 (existing requirements that M35 code continues to satisfy).
287
+
288
+ **M36 Functional Requirements:**
289
+ - **REQ-079**: `bin/gsd-t-unattended.js` implements the supervisor relay loop: spawn worker → await exit → post-worker safety check → next iter. State written atomically to `.gsd-t/.unattended/state.json`. Contract: `unattended-supervisor-contract.md` v1.0.0.
290
+ - **REQ-080**: `commands/gsd-t-unattended.md` pre-flights branch + dirty tree, spawns supervisor detached, polls for PID readiness, displays initial watch block, and calls `ScheduleWakeup(270, '/user:gsd-t-unattended-watch')`.
291
+ - **REQ-081**: `commands/gsd-t-unattended-watch.md` implements the watch tick decision tree (§8 of contract): reads PID → liveness probe → reads state.json → reschedule or terminal report.
292
+ - **REQ-082**: `commands/gsd-t-resume.md` Step 0 checks `supervisor.pid` before any other resume logic. If live + non-terminal: skip normal resume, print watch block, call `ScheduleWakeup(270, '/user:gsd-t-unattended-watch')`.
293
+ - **REQ-083**: Supervisor relay architecture ensures each worker gets a fresh context window. No compaction state carries over between workers — only `.gsd-t/` milestone state files.
294
+ - **REQ-084**: `bin/gsd-t-unattended-safety.js` exports: `checkGitBranch`, `checkWorktreeCleanliness`, `validateState`, `checkIterationCap`, `checkWallClockCap`, `detectBlockerSentinel`, `detectGutter`. Called at all 4 supervisor hook points.
295
+ - **REQ-085**: `bin/gsd-t-unattended-platform.js` exports: `spawnSupervisor`, `preventSleep`, `releaseSleep`, `notify`, `resolveClaudePath`. Windows: `preventSleep` is a documented no-op. Windows caveats documented in `docs/unattended-windows-caveats.md`.
296
+ - **REQ-086**: `bin/handoff-lock.js` exports: `acquireLock(dir)`, `releaseLock(dir)`, `isLocked(dir)`. Used in `bin/headless-auto-spawn.js` `autoSpawnHeadless()` to guard the parent-exits-before-child-starts window.
297
+ - **REQ-087**: `commands/gsd-t-execute.md`, `gsd-t-wave.md`, `gsd-t-integrate.md`, `gsd-t-quick.md`, `gsd-t-debug.md` — runway-exceeded path calls `autoSpawnHeadless()` and exits cleanly. No "Run /clear" instruction emitted.
298
+
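The handoff-lock primitive in REQ-086 can be illustrated with exclusive file creation. The three function names come from REQ-086; everything else here is an assumption, including the lock file name, the use of the `'wx'` open flag, and the boolean return value. The real `bin/handoff-lock.js` may handle stale locks or timeouts that this sketch omits.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical lock file name inside the given directory.
const LOCK_NAME = 'handoff.lock';

function lockPath(dir) {
  return path.join(dir, LOCK_NAME);
}

// Try to take the lock. The 'wx' flag makes creation fail with EEXIST
// if the file already exists, so only one caller can succeed.
function acquireLock(dir) {
  try {
    const fd = fs.openSync(lockPath(dir), 'wx');
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// Release the lock; does nothing if the lock file is already gone.
function releaseLock(dir) {
  fs.rmSync(lockPath(dir), { force: true });
}

function isLocked(dir) {
  return fs.existsSync(lockPath(dir));
}
```

In a parent/child handoff, the parent would hold the lock until the child signals it has started, which closes the window where the parent exits before the child is running.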
299
+ **M36 Non-Functional Requirements:**
300
+ - Zero external npm dependencies (all M36 bin/ modules use Node.js built-ins only)
301
+ - Supervisor launch → PID-ready in < 5 seconds (poll timeout in launch command)
302
+ - Worker spawn overhead: < 500ms before `claude -p` subprocess starts
303
+ - Test count: 1146 → 1226 (+80 net new tests across 6 M36 domain test files)
304
+
305
+ **Orphaned requirements**: None — all M36 REQs mapped to tasks.
306
+ **Unanchored tasks**: None — all M36 tasks trace to REQ-079–REQ-087.
307
+
213
308
  ---
214
309
 
215
310
  ## M17: Scan Visual Output — Feature Specification