@tekyzinc/gsd-t 2.39.13 → 2.45.11
- package/README.md +17 -9
- package/bin/desktop.ini +2 -0
- package/bin/global-sync-manager.js +350 -0
- package/bin/gsd-t.js +592 -2
- package/bin/metrics-collector.js +167 -0
- package/bin/metrics-rollup.js +200 -0
- package/bin/patch-lifecycle.js +195 -0
- package/bin/rule-engine.js +160 -0
- package/commands/desktop.ini +2 -0
- package/commands/gsd-t-complete-milestone.md +192 -5
- package/commands/gsd-t-debug.md +16 -2
- package/commands/gsd-t-execute.md +257 -52
- package/commands/gsd-t-help.md +25 -10
- package/commands/gsd-t-integrate.md +35 -7
- package/commands/gsd-t-metrics.md +143 -0
- package/commands/gsd-t-plan.md +49 -2
- package/commands/gsd-t-quick.md +15 -3
- package/commands/gsd-t-status.md +78 -0
- package/commands/gsd-t-test-sync.md +2 -2
- package/commands/gsd-t-verify.md +140 -9
- package/commands/gsd-t-visualize.md +11 -1
- package/commands/gsd-t-wave.md +34 -19
- package/docs/GSD-T-README.md +9 -6
- package/docs/architecture.md +84 -2
- package/docs/ci-examples/desktop.ini +2 -0
- package/docs/ci-examples/github-actions.yml +104 -0
- package/docs/ci-examples/gitlab-ci.yml +116 -0
- package/docs/desktop.ini +2 -0
- package/docs/infrastructure.md +87 -1
- package/docs/prd-graph-engine.md +2 -2
- package/docs/prd-gsd2-hybrid.md +258 -135
- package/docs/requirements.md +63 -2
- package/examples/.gsd-t/contracts/desktop.ini +2 -0
- package/examples/.gsd-t/desktop.ini +2 -0
- package/examples/.gsd-t/domains/desktop.ini +2 -0
- package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
- package/examples/desktop.ini +2 -0
- package/examples/rules/.gitkeep +0 -0
- package/package.json +40 -40
- package/scripts/desktop.ini +2 -0
- package/scripts/gsd-t-dashboard-server.js +19 -2
- package/scripts/gsd-t-dashboard.html +63 -0
- package/scripts/gsd-t-event-writer.js +1 -0
- package/templates/CLAUDE-global.md +30 -9
- package/templates/desktop.ini +2 -0
package/docs/prd-gsd2-hybrid.md
CHANGED
|
@@ -6,13 +6,24 @@
|
|
|
6
6
|
| **PRD ID** | PRD-GSD2-001 |
|
|
7
7
|
| **Date** | 2026-03-18 |
|
|
8
8
|
| **Author** | GSD-T Team |
|
|
9
|
-
| **Status** |
|
|
9
|
+
| **Status** | ACTIVE — M22 COMPLETE, M23 COMPLETE (2026-03-22), M24 QUEUED |
|
|
10
10
|
| **Milestones** | M22 (Tier 1), M23 (Tier 2), M24 (Docker) |
|
|
11
|
-
| **Version Target** | 2.
|
|
11
|
+
| **Version Target** | 2.40.10 (M22), 2.41.10 (M23), 2.42.10 (M24) |
|
|
12
12
|
| **Priority** | P0 — critical for enterprise delivery quality |
|
|
13
|
-
| **Predecessor** | M21 (Graph-Powered Commands)
|
|
13
|
+
| **Predecessor** | M21 (Graph-Powered Commands) — DELIVERED |
|
|
14
14
|
| **Successor** | Production deployment readiness |
|
|
15
|
-
| **Related** | PRD-GRAPH-001 (M20-M21
|
|
15
|
+
| **Related** | PRD-GRAPH-001 (M20-M21 — DELIVERED) |
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Revision History
|
|
20
|
+
|
|
21
|
+
| Date | Version | Changes |
|
|
22
|
+
|------|---------|---------|
|
|
23
|
+
| 2026-03-18 | v1 | Initial DRAFT |
|
|
24
|
+
| 2026-03-20 | v2 | Revalidation: Fresh dispatch elevated to task-level (not domain-level); Budget ceilings reframed as Context Observability (context window %, token breakdown — not cost enforcement); Model failover dropped from M23; Plan command gains "single context window" constraint; No custom engine needed for M22 — all via Agent tool + team mode + worktree isolation; PRD-GRAPH-001 marked DELIVERED |
|
|
25
|
+
| 2026-03-22 | v3 | M22 COMPLETE — 18/18 tasks across 5 domains, 293 tests passing, v2.40.10 released |
|
|
26
|
+
| 2026-03-22 | v4 | M23 COMPLETE — 3 domains (headless-exec, headless-query, pipeline-integration), 36 new tests (329 total), v2.41.10 released |
|
|
16
27
|
|
|
17
28
|
---
|
|
18
29
|
|
|
@@ -20,13 +31,14 @@
|
|
|
20
31
|
|
|
21
32
|
GSD-T has the strongest development methodology for AI-assisted software engineering: contracts, domains, quality gates, impact analysis, multi-surface awareness. But it has gaps in **delivery runtime** that prevent it from achieving zero-impact releases at enterprise scale:
|
|
22
33
|
|
|
23
|
-
1. **Context rot** — Long milestones with many tasks degrade subagent quality. By task 9 of 12, context is 75% full, and the agent makes mistakes in the last tasks where quality matters most.
|
|
24
|
-
2. **
|
|
25
|
-
3. **
|
|
26
|
-
4. **
|
|
27
|
-
5. **No
|
|
28
|
-
6. **No
|
|
29
|
-
7. **
|
|
34
|
+
1. **Context rot** — Long milestones with many tasks degrade subagent quality. By task 9 of 12, context is 75% full, and the agent makes mistakes in the last tasks where quality matters most. **This is the #1 problem.** Even with domain-level subagent dispatch (which GSD-T already does), context accumulates *within* a domain as tasks execute sequentially.
|
|
35
|
+
2. **Compaction dependency** — When context fills, Claude Code compacts (summarizes) prior context. This loses nuance, introduces drift, and is unpredictable. The goal is to **never trigger compaction** by keeping each unit of work small enough to complete in a fresh context window.
|
|
36
|
+
3. **Checklist blindness** — 8 quality gates can all pass while the actual user-facing behavior doesn't work. A function returns a hardcoded value, a UI component renders static text, a webhook handler is a console.log. Gates check structure, not behavior.
|
|
37
|
+
4. **Static plans** — Plans are created once and never revised. If execution reveals new constraints (API rate limits, data format surprises, missing dependencies), remaining domains execute against an outdated plan.
|
|
38
|
+
5. **No context observability** — Token spend is logged in `token-log.md` but there's no visibility into context window utilization per subagent, no breakdown of where tokens are consumed, and no warning before compaction triggers.
|
|
39
|
+
6. **No CI/CD integration** — GSD-T requires a human at the keyboard. Can't run overnight builds, automated hotfixes, or release gates in a pipeline.
|
|
40
|
+
7. **No programmatic state access** — Reading `.gsd-t/` state requires an LLM call. Can't feed status into dashboards, standup scripts, or monitoring systems.
|
|
41
|
+
8. **Agent file conflicts** — Parallel domain execution in `execute` and `wave` can cause file conflicts when multiple agents work in the same working tree.
|
|
30
42
|
|
|
31
43
|
These gaps come from GSD 2 (github.com/gsd-build/gsd-2), which solves them with patterns that can be adopted into GSD-T without its runtime or LLM-agnostic architecture.
|
|
32
44
|
|
|
@@ -34,11 +46,19 @@ These gaps come from GSD 2 (github.com/gsd-build/gsd-2), which solves them with
|
|
|
34
46
|
|
|
35
47
|
## 2. Objective
|
|
36
48
|
|
|
37
|
-
Integrate
|
|
49
|
+
Integrate 7 enhancements from GSD 2 into GSD-T across 3 tiers, preserving GSD-T's contract-driven methodology while adding enterprise delivery capabilities.
|
|
50
|
+
|
|
51
|
+
**Primary goals** (in priority order):
|
|
52
|
+
1. **Eliminate compaction** — Each task completes in a single fresh context window. Compaction never triggers.
|
|
53
|
+
2. **Reduce context utilization** — Each task agent uses ~10-20% of the context window, not 60-75%.
|
|
54
|
+
3. **Parallel orchestration with adaptive replanning** — Domains execute in parallel with worktree isolation. The orchestrator reads domain summaries and revises remaining plans when execution reveals new constraints.
|
|
38
55
|
|
|
39
56
|
**Core principle**: Quality comes from the methodology (contracts, gates, impact analysis), not from the LLM. These enhancements strengthen the methodology's execution, not replace it.
|
|
40
57
|
|
|
41
|
-
**Key architectural
|
|
58
|
+
**Key architectural decisions**:
|
|
59
|
+
- LLM agnosticism is NOT a goal. GSD-T stays Claude-committed.
|
|
60
|
+
- No custom execution engine needed for M22. All capabilities are achieved via Claude Code's existing Agent tool (parallel subagent dispatch, `isolation: "worktree"`, team mode).
|
|
61
|
+
- Model failover is dropped entirely — not needed on max subscription plan.
|
|
42
62
|
|
|
43
63
|
---
|
|
44
64
|
|
|
@@ -47,54 +67,84 @@ Integrate 8 enhancements from GSD 2 into GSD-T across 3 tiers, preserving GSD-T'
|
|
|
47
67
|
The combination of three capabilities creates safe, high-quality parallel domain execution:
|
|
48
68
|
|
|
49
69
|
```
|
|
50
|
-
|
|
51
|
-
│
|
|
52
|
-
│
|
|
53
|
-
│
|
|
54
|
-
│ │
|
|
55
|
-
│ │
|
|
56
|
-
│ │
|
|
57
|
-
│ │
|
|
58
|
-
│ │
|
|
59
|
-
│
|
|
60
|
-
│
|
|
61
|
-
│ │
|
|
62
|
-
│ │
|
|
63
|
-
│ │
|
|
64
|
-
│ │ (
|
|
65
|
-
│ │
|
|
66
|
-
│ │
|
|
67
|
-
│
|
|
68
|
-
│
|
|
69
|
-
│
|
|
70
|
-
│
|
|
71
|
-
│ │
|
|
72
|
-
│ │
|
|
73
|
-
│
|
|
74
|
-
|
|
70
|
+
┌─────────────────────────────────────────────────────────────────┐
|
|
71
|
+
│ PARALLEL EXECUTION │
|
|
72
|
+
│ │
|
|
73
|
+
│ Execute Orchestrator (lightweight — sees summaries only) │
|
|
74
|
+
│ ┌──────────────────────────────────────────────────────────┐ │
|
|
75
|
+
│ │ Dispatches domains in dependency order (wave 1, wave 2) │ │
|
|
76
|
+
│ │ Reads domain summaries → replan check → revise if needed │ │
|
|
77
|
+
│ │ Context: ~4-8% utilization (summaries + plans on disk) │ │
|
|
78
|
+
│ └──────────────────────────────────────────────────────────┘ │
|
|
79
|
+
│ │ │ │ │
|
|
80
|
+
│ ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐ │
|
|
81
|
+
│ │ Domain A │ │ Domain B │ │ Domain C │ │
|
|
82
|
+
│ │ │ │ │ │ │ │
|
|
83
|
+
│ │ Worktree │ │ Worktree │ │ Worktree │ │
|
|
84
|
+
│ │ (own fs) │ │ (own fs) │ │ (own fs) │ │
|
|
85
|
+
│ │ │ │ │ │ │ │
|
|
86
|
+
│ │ Task 1 ──→ fresh subagent (10-20% ctx) → dies │
|
|
87
|
+
│ │ Task 2 ──→ fresh subagent (10-20% ctx) → dies │
|
|
88
|
+
│ │ Task N ──→ fresh subagent (10-20% ctx) → dies │
|
|
89
|
+
│ │ │ │ │ │ │ │
|
|
90
|
+
│ │ Graph │ │ Graph │ │ Graph │ │
|
|
91
|
+
│ │ (boundaries│ │ (boundaries│ │ (boundaries│ │
|
|
92
|
+
│ │ & deps) │ │ & deps) │ │ & deps) │ │
|
|
93
|
+
│ └─────┬─────┘ └─────┴─────┘ └─────┴─────┘ │
|
|
94
|
+
│ │ │ │ │
|
|
95
|
+
│ ▼ ▼ ▼ │
|
|
96
|
+
│ ┌──────────────────────────────────────────────────┐ │
|
|
97
|
+
│ │ Contract-Validated Atomic Merges │ │
|
|
98
|
+
│ │ merge A → test → merge B → test → merge C │ │
|
|
99
|
+
│ └──────────────────────────────────────────────────┘ │
|
|
100
|
+
└─────────────────────────────────────────────────────────────────┘
|
|
75
101
|
```
|
|
76
102
|
|
|
77
103
|
Each agent gets:
|
|
78
|
-
- **Its own filesystem** (worktree) — can't step on other agents' files
|
|
79
|
-
- **Its own context** (fresh dispatch) — only sees relevant domain scope + contracts
|
|
104
|
+
- **Its own filesystem** (worktree via Agent tool's `isolation: "worktree"`) — can't step on other agents' files
|
|
105
|
+
- **Its own context** (fresh dispatch per task) — only sees relevant domain scope + contracts + single task
|
|
80
106
|
- **Full code awareness** (graph) — knows exactly what it owns and what crosses boundaries
|
|
81
107
|
|
|
108
|
+
**Key distinction from v1**: Fresh dispatch is **task-level**, not domain-level. Each individual task within a domain gets its own fresh subagent. The domain dispatcher is lightweight — it sequences tasks and passes prior task summaries (not full context) to the next task agent.
|
|
109
|
+
|
|
82
110
|
---
|
|
83
111
|
|
|
84
112
|
## 4. Enhancement Details
|
|
85
113
|
|
|
86
|
-
### 4.1 Fresh Context Dispatch (Tier 1)
|
|
114
|
+
### 4.1 Fresh Context Dispatch (Tier 1) — TASK-LEVEL
|
|
87
115
|
|
|
88
|
-
**Problem**:
|
|
116
|
+
**Problem**: GSD-T currently dispatches one subagent per domain. That subagent runs all N tasks within the domain sequentially, and context grows with each task. By task 9 of 12, context utilization is 75%+. The agent makes mistakes because it's reasoning through noise. If context exceeds capacity, compaction triggers — losing nuance and introducing drift.
|
|
117
|
+
|
|
118
|
+
**Current architecture** (domain-level):
|
|
119
|
+
```
|
|
120
|
+
Execute orchestrator
|
|
121
|
+
└── Domain-A subagent (fresh, but grows across tasks 1→2→3→...→12)
|
|
122
|
+
└── task 1, task 2, ... task 12 (sequential, context accumulates)
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
**New architecture** (task-level):
|
|
126
|
+
```
|
|
127
|
+
Execute orchestrator
|
|
128
|
+
└── Domain-A task-dispatcher (lightweight, stays small)
|
|
129
|
+
├── Task 1 subagent (fresh) → completes → summary saved to disk → dies
|
|
130
|
+
├── Task 2 subagent (fresh + task 1 summary) → completes → dies
|
|
131
|
+
└── Task 12 subagent (fresh + prior summaries) → completes → dies
|
|
132
|
+
```
|
|
89
133
|
|
|
90
|
-
**
|
|
134
|
+
**Each task subagent receives ONLY**:
|
|
91
135
|
- Domain's `scope.md` (file list, constraints)
|
|
92
136
|
- Relevant contracts (only those the domain implements or consumes)
|
|
93
|
-
-
|
|
94
|
-
- Graph context for
|
|
137
|
+
- The single task from `tasks.md` (not all tasks — just the current one)
|
|
138
|
+
- Graph context for files this task touches (if graph available)
|
|
139
|
+
- Prior task summaries (10-20 lines each, not full prior context)
|
|
95
140
|
- Prior failure/learning entries for this domain from Decision Log
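The input list above can be sketched as a small context builder. This is a minimal illustration, not GSD-T's actual implementation: the function name `buildTaskContext` and every field name are assumptions chosen for the example.

```javascript
// Hypothetical helper: assembles the prompt sections for ONE task subagent.
// All field names here are illustrative assumptions, not GSD-T's data model.
function buildTaskContext({ scope, contracts, task, graphContext, priorSummaries = [], learnings = [] }) {
  const sections = [
    ["Domain scope", scope],
    ["Relevant contracts", contracts.join("\n\n")],
    ["Current task", task], // only the current task, never the whole tasks.md
  ];
  // Graph context is optional ("if graph available").
  if (graphContext) sections.push(["Graph context", graphContext]);
  if (priorSummaries.length > 0) sections.push(["Prior task summaries", priorSummaries.join("\n\n")]);
  if (learnings.length > 0) sections.push(["Prior failures and learnings", learnings.join("\n")]);
  return sections.map(([title, body]) => `## ${title}\n${body}`).join("\n\n");
}

// Usage with made-up inputs.
const prompt = buildTaskContext({
  scope: "files: src/risk/**",
  contracts: ["risk-score contract v1"],
  task: "Task 9: implement fraud scoring",
  priorSummaries: ["Task 8: added risk model loader"],
});
console.log(prompt);
```

Because the builder takes only summaries and a single task, the prompt size stays roughly constant regardless of how many tasks ran before it.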
|
|
96
141
|
|
|
97
|
-
**
|
|
142
|
+
**Context utilization per task**: ~10-20% (down from 60-75% cumulative)
|
|
143
|
+
**Compaction**: Never triggers — each task completes well within one context window
|
|
144
|
+
|
|
145
|
+
**Plan command constraint** (new): `gsd-t-plan` MUST enforce the rule: **"A task must fit in one context window. If it can't, it's two tasks."** This guarantees fresh dispatch works. The plan command validates task scope during generation and splits oversized tasks automatically.
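One way to picture the automatic split is a greedy packer over per-file token estimates. The sketch below is an assumption-laden illustration: the 200,000-token window, the per-file estimates, and the name `splitOversizedTask` are all hypothetical, and the 70% ceiling is borrowed from the task-size validation described elsewhere in this PRD.

```javascript
// Assumed numbers for illustration only.
const CONTEXT_WINDOW_TOKENS = 200000; // assumption: window size is not stated in the PRD
const MAX_TASK_FRACTION = 0.7;        // the 70% ceiling quoted in this PRD

// Greedily packs a task's files into sub-tasks whose estimated token cost
// stays under the ceiling. A single file larger than the ceiling still gets
// its own sub-task, since this sketch cannot split below file granularity.
function splitOversizedTask(task, budget = CONTEXT_WINDOW_TOKENS * MAX_TASK_FRACTION) {
  const parts = [];
  let current = { files: [], tokens: 0 };
  for (const file of task.files) {
    if (current.files.length > 0 && current.tokens + file.estimatedTokens > budget) {
      parts.push(current);
      current = { files: [], tokens: 0 };
    }
    current.files.push(file.path);
    current.tokens += file.estimatedTokens;
  }
  if (current.files.length > 0) parts.push(current);
  return parts.map((part, i) => ({
    name: parts.length === 1 ? task.name : `${task.name} (part ${i + 1} of ${parts.length})`,
    files: part.files,
    estimatedTokens: part.tokens,
  }));
}

// Usage with made-up estimates: 200K tokens of files do not fit under 140K.
const subtasks = splitOversizedTask({
  name: "build email template engine",
  files: [
    { path: "a.js", estimatedTokens: 90000 },
    { path: "b.js", estimatedTokens: 60000 },
    { path: "c.js", estimatedTokens: 50000 },
  ],
});
console.log(subtasks.map((t) => `${t.name}: ${t.estimatedTokens}`));
```

A real planner would also need to split by behavior rather than by file, so treat file packing as a lower bound on what "splits oversized tasks" involves.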
|
|
146
|
+
|
|
147
|
+
**Real-World Scenario**: Payment processing milestone with 12 tasks across 4 domains. Today, by task 9 (fraud scoring in the risk domain), context is 75% full with accumulated residue from tasks 1-8. The agent hallucinates a function name from an earlier task and introduces a bug. With task-level fresh dispatch, the fraud scoring agent gets ~15% context utilization — only risk-domain files, the fraud scoring task, relevant contracts, and prior task summaries. Clean reasoning, zero hallucination, zero compaction risk.
|
|
98
148
|
|
|
99
149
|
**Commands affected**: execute, wave, integrate (any command that dispatches domain tasks)
|
|
100
150
|
|
|
@@ -102,16 +152,18 @@ Each agent gets:
|
|
|
102
152
|
|
|
103
153
|
**Problem**: Parallel domain agents share one working tree. Two agents editing adjacent files can create merge conflicts. If domain A breaks, its partially-written files contaminate the tree for domain B.
|
|
104
154
|
|
|
105
|
-
**Solution**: Each domain agent works in its own git worktree. Merges are atomic and sequential with contract validation between each merge.
|
|
155
|
+
**Solution**: Each domain agent works in its own git worktree via the Agent tool's `isolation: "worktree"` parameter. Merges are atomic and sequential with contract validation between each merge.
|
|
156
|
+
|
|
157
|
+
**Implementation**: No custom worktree management code needed. Claude Code's Agent tool already supports `isolation: "worktree"` which creates a temporary git worktree, gives the agent an isolated copy of the repo, and returns the worktree path and branch when changes are made. The execute command's team-mode dispatch simply adds this parameter to each Agent spawn.
|
|
106
158
|
|
|
107
159
|
**Workflow**:
|
|
108
160
|
```
|
|
109
|
-
1. execute
|
|
161
|
+
1. execute dispatches N agents with isolation: "worktree" (one per domain)
|
|
110
162
|
2. Each domain agent works in its worktree (isolated filesystem)
|
|
111
|
-
3. Domain A completes → merge A's worktree to main → run integration tests
|
|
112
|
-
4. Tests pass → Domain B's worktree merges → run integration tests
|
|
163
|
+
3. Domain A completes → merge A's worktree branch to main → run integration tests
|
|
164
|
+
4. Tests pass → Domain B's worktree branch merges → run integration tests
|
|
113
165
|
5. Tests fail → rollback domain B, keep domain A. Debug domain B.
|
|
114
|
-
6. Clean up worktrees after all merges
|
|
166
|
+
6. Clean up worktrees after all merges (automatic for no-change agents)
|
|
115
167
|
```
|
|
116
168
|
|
|
117
169
|
**Real-World Scenario**: 3 domain agents working simultaneously on auth, payments, and notifications. Auth domain agent introduces a regression in shared middleware. With shared working tree, the other agents see the broken middleware and may adapt to it — propagating the bug. With worktree isolation, auth's regression is contained. When merge fails integration tests, only auth's worktree is discarded. Payments and notifications merge cleanly.
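The merge-test-rollback loop from the workflow can be sketched as follows. The `merge`, `runTests`, and `rollback` callbacks are hypothetical stand-ins (in practice they would wrap git and the project's test runner); the sketch also assumes that a failed domain does not block later domains, which matches the scenario above but is not spelled out in the workflow steps.

```javascript
// merge, runTests, and rollback are injected, hypothetical callbacks.
function mergeSequentially(domains, { merge, runTests, rollback }) {
  const merged = [];
  const failed = [];
  for (const domain of domains) {
    merge(domain);
    if (runTests()) {
      merged.push(domain);
    } else {
      rollback(domain); // discard only this domain; earlier merges stay
      failed.push(domain);
    }
  }
  return { merged, failed };
}

// Usage: a Set stands in for the main branch, and integration tests fail
// whenever the (regressed) auth domain is present.
const main = new Set();
const result = mergeSequentially(["auth", "payments", "notifications"], {
  merge: (d) => main.add(d),
  runTests: () => !main.has("auth"),
  rollback: (d) => main.delete(d),
});
console.log(result);
```

Running tests after every single merge is what makes each failure attributable to exactly one domain.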
|
|
@@ -149,28 +201,72 @@ All 3 would ship to production and fail. Goal-backward catches them.
|
|
|
149
201
|
|
|
150
202
|
**Problem**: Plans are static. Created during `plan` phase, never revised. If execution reveals new constraints, remaining domains execute against outdated assumptions.
|
|
151
203
|
|
|
152
|
-
**Solution**: After each domain completes in `execute`,
|
|
153
|
-
1. Did execution reveal any new constraints? (API rate limits, missing dependencies, data format surprises)
|
|
204
|
+
**Solution**: After each domain completes in `execute`, the orchestrator reads the domain's result summary and checks:
|
|
205
|
+
1. Did execution reveal any new constraints? (API rate limits, schema mismatches, missing dependencies, data format surprises)
|
|
154
206
|
2. Do remaining domains' plans depend on assumptions that are now invalid?
|
|
155
|
-
3. If yes → revise remaining domain
|
|
207
|
+
3. If yes → revise remaining domain `tasks.md` files on disk before dispatching next domain
|
|
208
|
+
|
|
209
|
+
**How it works without an engine**: The execute orchestrator is an LLM agent. It dispatches domain agents and reads their summaries. Summaries are small (10-20 lines each). The replan check is LLM reasoning: "does this summary invalidate any remaining plan?" Plan revision writes updated `tasks.md` files to disk. Next domain agent reads the revised `tasks.md` from disk (fresh context). The orchestrator stays lightweight (~4-8% context utilization) because it only holds summaries and plan references, not full domain work.
|
|
210
|
+
|
|
211
|
+
**Orchestrator context budget**:
|
|
212
|
+
```
|
|
213
|
+
Execute command prompt: ~2K tokens
|
|
214
|
+
Domain list + dependency order: ~500 tokens
|
|
215
|
+
Summary from Domain A: ~500 tokens
|
|
216
|
+
Replan reasoning: ~1K tokens
|
|
217
|
+
Summary from Domain B: ~500 tokens
|
|
218
|
+
Replan reasoning: ~1K tokens
|
|
219
|
+
...
|
|
220
|
+
Total for 5 domains: ~8K tokens = ~4% context utilization
|
|
221
|
+
```
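As a quick arithmetic check of the budget above, the line items can be summed directly. The 200,000-token window is an assumption (the PRD does not state a window size). Summing the items exactly as listed gives about 10K tokens for 5 domains, roughly 5% of that assumed window, which is slightly above the ~8K quoted but still inside the ~4-8% band used elsewhere in this document.

```javascript
// Line items copied from the budget listing; the window size is an assumption.
const ASSUMED_WINDOW_TOKENS = 200000;
const BUDGET = {
  commandPrompt: 2000,
  domainList: 500,
  summaryPerDomain: 500,
  replanPerDomain: 1000,
};

function orchestratorTokens(domainCount) {
  return (
    BUDGET.commandPrompt +
    BUDGET.domainList +
    domainCount * (BUDGET.summaryPerDomain + BUDGET.replanPerDomain)
  );
}

const total = orchestratorTokens(5);
const utilization = (total / ASSUMED_WINDOW_TOKENS) * 100;
console.log(total, utilization); // 10000 5
```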
|
|
222
|
+
|
|
223
|
+
**Guard**: Max 2 replanning cycles per execute run. After that, pause for user input to prevent infinite loops (new constraint → replan → new constraint).
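The guard can be pictured as a counter around the dispatch loop. In this sketch `execute`, `checkReplan`, and `revisePlans` are hypothetical callbacks standing in for the domain dispatch, the orchestrator's LLM reasoning, and the `tasks.md` rewrite; only the limit of 2 comes from the text.

```javascript
const MAX_REPLAN_CYCLES = 2;

// Runs domains in order; after each one, asks whether the remaining plans
// need revision. Once the replan limit is used up, a further replan request
// pauses the run instead of looping.
function runDomains(domains, { execute, checkReplan, revisePlans }) {
  let replans = 0;
  for (let i = 0; i < domains.length; i++) {
    const summary = execute(domains[i]);
    const remaining = domains.slice(i + 1);
    if (remaining.length > 0 && checkReplan(summary, remaining)) {
      if (replans >= MAX_REPLAN_CYCLES) {
        return { status: "paused", completed: domains.slice(0, i + 1), replans };
      }
      revisePlans(summary, remaining);
      replans += 1;
    }
  }
  return { status: "done", completed: domains, replans };
}

// Usage: every domain reports a new constraint, so the third request pauses.
const outcome = runDomains(["payments", "subscriptions", "notifications", "reporting"], {
  execute: (d) => `${d}: found a new constraint`,
  checkReplan: () => true,
  revisePlans: () => {},
});
console.log(outcome);
```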
|
|
224
|
+
|
|
225
|
+
**Real-World Scenarios**:
|
|
156
226
|
|
|
157
|
-
**
|
|
227
|
+
**API constraint discovery**: Plan says "use Stripe Charges API." Payments domain discovers Charges API is deprecated — must use PaymentIntents (async, webhook-based). Without replanning, subscriptions domain builds against Charges API (fails at runtime), notifications domain builds synchronous receipt sending (impossible with async confirmation). With replanning, orchestrator revises both domains' plans to use PaymentIntents API before they execute.
|
|
228
|
+
|
|
229
|
+
**Schema shape surprise**: Plan says "query users.org_id." Auth domain discovers existing table uses `organization_id`, not `org_id`. Without replanning, billing and reporting domains build queries referencing `org_id` — every query fails. With replanning, orchestrator updates both domains' plans to use `organization_id`.
|
|
230
|
+
|
|
231
|
+
**Dependency incompatibility**: Plan says "use Socket.io." Websocket domain discovers HTTP/2 incompatibility in the project's Node version, switches to native `ws` library. Without replanning, dashboard-ui imports `socket.io-client` (wrong library). With replanning, orchestrator revises dashboard-ui to use the `ws` client API.
|
|
158
232
|
|
|
159
233
|
**Commands affected**: execute, wave (execute phase dispatches replanning check between domains)
|
|
160
234
|
|
|
161
|
-
### 4.5
|
|
235
|
+
### 4.5 Context Observability (Tier 1)
|
|
236
|
+
|
|
237
|
+
**Problem**: Token spend is logged in `token-log.md` but there's no real-time visibility into context window utilization per subagent, no breakdown of where tokens are consumed across domains/tasks/phases, and no warning before compaction triggers.
|
|
162
238
|
|
|
163
|
-
**
|
|
239
|
+
**Solution**: Context-window-aware monitoring and token usage visibility. NOT cost enforcement — the user is on the max subscription plan and does not need spend limits.
|
|
164
240
|
|
|
165
|
-
**
|
|
166
|
-
-
|
|
167
|
-
-
|
|
168
|
-
-
|
|
169
|
-
-
|
|
241
|
+
**Capabilities**:
|
|
242
|
+
- **Context window % tracking**: After each subagent returns, log the peak context utilization (% of window used). Track via `CLAUDE_CONTEXT_TOKENS_USED` / `CLAUDE_CONTEXT_TOKENS_MAX`.
|
|
243
|
+
- **Compaction proximity warning**: If any subagent exceeds 70% context utilization, warn the user. This indicates a task is too large for fresh dispatch and should be split.
|
|
244
|
+
- **Token breakdown by scope**: Aggregate token usage by domain, by task, and by phase. Show where the most tokens are consumed so the user can identify optimization targets.
|
|
245
|
+
- **Dashboard-ready data**: Token usage data structured for `gsd-t-visualize` and `gsd-t headless query context` consumption.
|
|
170
246
|
|
|
171
|
-
**
|
|
247
|
+
**Data model** (extends existing `token-log.md`):
|
|
248
|
+
```
|
|
249
|
+
| Datetime | Command | Domain | Task | Model | Duration | Tokens | Ctx% | Compacted |
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
New fields: `Domain` (which domain), `Task` (which task within domain), `Ctx%` (peak context window utilization as percentage).
|
|
253
|
+
|
|
254
|
+
**Alerts**:
|
|
255
|
+
- `Ctx% > 70%` → ⚠️ Warning: task approaching compaction threshold. Consider splitting in plan.
|
|
256
|
+
- `Ctx% > 85%` → 🔴 Critical: compaction likely triggered. Task MUST be split.
|
|
257
|
+
- `Compacted = true` → 📊 Logged for visibility. Indicates fresh dispatch failed to prevent compaction for this task.
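The alert thresholds reduce to a small classifier. This is a sketch under stated assumptions: the function names are hypothetical, and the token counts are passed in as plain numbers rather than read from any particular source.

```javascript
// Peak utilization as a percentage of the context window.
function contextPercent(tokensUsed, tokensMax) {
  return (tokensUsed / tokensMax) * 100;
}

// Returns the alert labels that apply, using the 70% / 85% thresholds
// and the Compacted flag from the alert list.
function contextAlerts(ctxPercent, compacted = false) {
  const alerts = [];
  if (ctxPercent > 85) alerts.push("critical");
  else if (ctxPercent > 70) alerts.push("warning");
  if (compacted) alerts.push("compacted");
  return alerts;
}

console.log(contextAlerts(contextPercent(150000, 200000))); // [ 'warning' ]
console.log(contextAlerts(90, true));                       // [ 'critical', 'compacted' ]
console.log(contextAlerts(14));                             // []
```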
|
|
258
|
+
|
|
259
|
+
**Real-World Scenario**: After a milestone completes, user runs `gsd-t-status` and sees:
|
|
260
|
+
```
|
|
261
|
+
Token Usage by Domain:
|
|
262
|
+
auth: 12,400 tokens (4 tasks, avg 3,100/task, peak ctx: 14%)
|
|
263
|
+
payments: 28,600 tokens (7 tasks, avg 4,086/task, peak ctx: 18%)
|
|
264
|
+
notifications: 45,200 tokens (3 tasks, avg 15,067/task, peak ctx: 52%) ⚠️
|
|
265
|
+
reporting: 8,100 tokens (2 tasks, avg 4,050/task, peak ctx: 12%)
|
|
266
|
+
```
|
|
267
|
+
User immediately sees that notifications domain is consuming disproportionate tokens with high context utilization. Investigation reveals one task ("build email template engine") is too large — should be split into 3 smaller tasks in future plans.
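A per-domain rollup like the one shown could be computed from per-task log entries along these lines. The entry shape (`domain`, `tokens`, `ctxPercent`) and the individual task figures are made up for illustration; they are chosen only so that the auth totals match the example output.

```javascript
// Aggregates per-task entries into per-domain totals, averages, and peak Ctx%.
function summarizeByDomain(entries) {
  const byDomain = {};
  for (const { domain, tokens, ctxPercent } of entries) {
    const d = (byDomain[domain] ??= { tokens: 0, tasks: 0, peakCtx: 0 });
    d.tokens += tokens;
    d.tasks += 1;
    d.peakCtx = Math.max(d.peakCtx, ctxPercent);
  }
  for (const d of Object.values(byDomain)) {
    d.avgPerTask = Math.round(d.tokens / d.tasks);
  }
  return byDomain;
}

// Hypothetical per-task rows for two domains.
const summary = summarizeByDomain([
  { domain: "auth", tokens: 3000, ctxPercent: 12 },
  { domain: "auth", tokens: 3200, ctxPercent: 14 },
  { domain: "auth", tokens: 2900, ctxPercent: 11 },
  { domain: "auth", tokens: 3300, ctxPercent: 13 },
  { domain: "reporting", tokens: 4000, ctxPercent: 12 },
  { domain: "reporting", tokens: 4100, ctxPercent: 10 },
]);
console.log(summary);
```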
|
|
172
268
|
|
|
173
|
-
**Commands affected**: execute, wave, integrate (any command that spawns subagents)
|
|
269
|
+
**Commands affected**: execute, wave, integrate (any command that spawns subagents); status and visualize (display); plan (validation — warn if task scope suggests >70% context)
|
|
174
270
|
|
|
175
271
|
---
|
|
176
272
|
|
|
@@ -178,23 +274,33 @@ All 3 would ship to production and fail. Goal-backward catches them.
|
|
|
178
274
|
|
|
179
275
|
**Problem**: GSD-T requires a human at the keyboard. Can't run in CI/CD pipelines, overnight builds, or automated release gates.
|
|
180
276
|
|
|
181
|
-
**Solution**: `gsd-t headless` CLI mode that runs milestones/commands without interactive prompts.
|
|
277
|
+
**Solution**: `gsd-t headless` CLI mode that runs milestones/commands without interactive prompts. Built as a wrapper around `claude -p` (Claude Code's non-interactive piped mode), with parallel orchestration via multiple `claude -p` processes for domain-level parallelism.
|
|
278
|
+
|
|
279
|
+
**Architecture**:
|
|
280
|
+
```
|
|
281
|
+
gsd-t headless wave M25
|
|
282
|
+
└── claude -p "/user:gsd-t-wave M25"
|
|
283
|
+
└── Agent tool dispatches phases (fresh context per phase)
|
|
284
|
+
└── Execute phase dispatches domains (parallel, worktree isolation)
|
|
285
|
+
└── Each domain dispatches tasks (fresh context per task)
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
For parallel domain execution in headless mode, the orchestrator (running inside `claude -p`) uses the same Agent tool + team mode + worktree isolation as interactive mode. No custom engine needed.
|
|
182
289
|
|
|
183
290
|
**Exit codes**:
|
|
184
291
|
| Code | Meaning |
|
|
185
292
|
|------|---------|
|
|
186
|
-
| 0
|
|
187
|
-
| 1
|
|
188
|
-
| 2
|
|
189
|
-
| 3
|
|
190
|
-
| 4
|
|
293
|
+
| 0 | Success — all phases passed |
|
|
294
|
+
| 1 | Verify failure — quality gates didn't pass |
|
|
295
|
+
| 2 | Context budget exceeded — compaction threshold reached |
|
|
296
|
+
| 3 | Error — unrecoverable failure |
|
|
297
|
+
| 4 | Blocked — requires human decision |
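A pipeline step could translate these exit codes into a decision with a lookup table. The sketch below mirrors the table; the `needsHuman` flag and the function name are illustrative additions, not part of the documented CLI.

```javascript
// Exit codes as listed in the table; needsHuman is a hypothetical convenience flag.
const EXIT_CODES = {
  0: { label: "success", detail: "all phases passed", needsHuman: false },
  1: { label: "verify-failure", detail: "quality gates didn't pass", needsHuman: false },
  2: { label: "context-budget-exceeded", detail: "compaction threshold reached", needsHuman: false },
  3: { label: "error", detail: "unrecoverable failure", needsHuman: false },
  4: { label: "blocked", detail: "requires human decision", needsHuman: true },
};

function interpretExitCode(code) {
  const known = EXIT_CODES[code];
  if (!known) return { label: "unknown", detail: `unexpected exit code ${code}`, needsHuman: true, ok: false };
  return { ...known, ok: code === 0 };
}

console.log(interpretExitCode(0).label); // success
console.log(interpretExitCode(4).label); // blocked
```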
|
|
191
298
|
|
|
192
299
|
**Capabilities**:
|
|
193
300
|
- Run a full wave (partition → execute → verify → complete) unattended
|
|
194
301
|
- Output structured JSON results for pipeline consumption
|
|
195
302
|
- Integrate with CI/CD systems (GitHub Actions, GitLab CI, Jenkins)
|
|
196
|
-
-
|
|
197
|
-
- Respect budget ceilings (exit code 2 instead of burning through budget)
|
|
303
|
+
- Respect context observability thresholds (exit code 2 if compaction detected)
|
|
198
304
|
|
|
199
305
|
**Real-World Scenarios**:
|
|
200
306
|
|
|
@@ -220,7 +326,7 @@ gsd-t headless query status # current milestone, phase, domain progress
|
|
|
220
326
|
gsd-t headless query domains # domain list with task counts
|
|
221
327
|
gsd-t headless query contracts # contract compliance status
|
|
222
328
|
gsd-t headless query debt # tech debt items by severity
|
|
223
|
-
gsd-t headless query
|
|
329
|
+
gsd-t headless query context # token usage breakdown + context utilization
|
|
224
330
|
gsd-t headless query backlog # backlog items (filtered)
|
|
225
331
|
gsd-t headless query graph # graph index summary (entity counts, domain mapping)
|
|
226
332
|
```
|
|
@@ -278,24 +384,26 @@ Gets structured JSON status for all 3 projects in under a second. Previously: wa
|
|
|
278
384
|
|
|
279
385
|
### Commands Modified by GSD 2 Enhancements
|
|
280
386
|
|
|
281
|
-
| Command
|
|
282
|
-
|
|
283
|
-
| **execute**
|
|
284
|
-
| **wave**
|
|
285
|
-
| **integrate**
|
|
286
|
-
| **verify**
|
|
287
|
-
| **complete-milestone** |
|
|
288
|
-
| **
|
|
289
|
-
| **
|
|
290
|
-
| **
|
|
291
|
-
| **
|
|
292
|
-
| **
|
|
293
|
-
| **
|
|
294
|
-
| **test-sync**
|
|
295
|
-
| **qa**
|
|
296
|
-
| **gap-analysis**
|
|
297
|
-
| **status**
|
|
298
|
-
| **visualize**
|
|
387
|
+
| Command | Fresh Context | Worktree | Goal-Backward | Adaptive Replan | Context Obs. | Headless |
|
|
388
|
+
|----------------------|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
389
|
+
| **execute** | X | X | | X | X | X |
|
|
390
|
+
| **wave** | X | X | X | X | X | X |
|
|
391
|
+
| **integrate** | X | X | | | X | X |
|
|
392
|
+
| **verify** | | | X | | | X |
|
|
393
|
+
| **complete-milestone** | | | X | | | X |
|
|
394
|
+
| **plan** | | | | | | X |
|
|
395
|
+
| **scan** | | | | | | X |
|
|
396
|
+
| **impact** | | | | | | X |
|
|
397
|
+
| **debug** | X | | | | X | X |
|
|
398
|
+
| **quick** | | | | | | X |
|
|
399
|
+
| **partition** | | | | | | X |
|
|
400
|
+
| **test-sync** | | | | | | X |
|
|
401
|
+
| **qa** | | | | | X | X |
|
|
402
|
+
| **gap-analysis** | | | | | | X |
|
|
403
|
+
| **status** | | | | | X | X |
|
|
404
|
+
| **visualize** | | | | | X | X |
|
|
405
|
+
|
|
406
|
+
**New constraint on plan**: Task-size validation — every task must fit in one context window. If estimated scope exceeds 70% context, plan splits the task automatically.
|
|
299
407
|
|
|
300
408
|
### Commands Unchanged
|
|
301
409
|
|
|
@@ -305,42 +413,50 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
305
413
|
|
|
306
414
|
## 6. Milestone Breakdown
|
|
307
415
|
|
|
308
|
-
### M22: GSD 2 Tier 1 — Execution Quality
|
|
416
|
+
### M22: GSD 2 Tier 1 — Execution Quality — **COMPLETE** (2026-03-22, v2.40.10)
|
|
309
417
|
|
|
310
|
-
**Scope**: Fresh context dispatch, worktree isolation, goal-backward verification, adaptive replanning,
|
|
418
|
+
**Scope**: Fresh context dispatch (task-level), worktree isolation, goal-backward verification, adaptive replanning, context observability.
|
|
311
419
|
|
|
312
|
-
|
|
313
|
-
|--------|-------------|
|
|
314
|
-
| **fresh-dispatch** | Context builder (scope + contracts + graph → subagent prompt), dispatch coordinator |
|
|
315
|
-
| **worktree-isolation** | Worktree lifecycle (create/merge/discard), per-domain branch naming, atomic merge with contract validation |
|
|
316
|
-
| **goal-backward** | Requirement-to-behavior verifier, placeholder detector (console.log/TODO/hardcoded patterns), end-to-end behavior check |
|
|
317
|
-
| **adaptive-replan** | Post-domain constraint checker, plan revision engine, remaining-domain impact assessor |
|
|
318
|
-
| **budget-ceilings** | Budget field in progress.md, cumulative spend tracker, threshold alerts (80%/100%), pause/descope/abort options |
|
|
420
|
+
**Implementation approach**: All capabilities via Claude Code's existing Agent tool. No custom execution engine. Team mode for parallel domain dispatch. `isolation: "worktree"` for filesystem isolation. Subagent-per-task for fresh context.
|
|
319
421
|
|
|
320
|
-
|
|
321
|
-
|
|
322
|
-
-
|
|
323
|
-
|
|
324
|
-
-
|
|
325
|
-
-
|
|
326
|
-
-
|
|
422
|
+
| Domain | Deliverables |
|
|
423
|
+
|----------------------|-------------|
|
|
424
|
+
| **fresh-dispatch** | Task-level dispatch coordinator (one subagent per task, not per domain). Context builder (scope + contracts + graph + single task + prior summaries → subagent prompt). Summary capture and forwarding between tasks. |
|
|
425
|
+
| **worktree-isolation** | Agent tool `isolation: "worktree"` integration in execute team mode. Sequential merge with contract validation between each. Per-domain rollback. Worktree cleanup. |
|
|
426
|
+
| **goal-backward** | Requirement-to-behavior verifier. Placeholder detector (console.log/TODO/hardcoded/static patterns). End-to-end behavior check against milestone goals. |
|
|
427
|
+
| **adaptive-replan** | Post-domain summary reader. Constraint-vs-remaining-plan checker. Plan revision (writes updated `tasks.md` to disk). Max 2 replan cycles guard. |
|
|
428
|
+
| **context-observability** | Context window % tracking per subagent. Token breakdown by domain/task/phase. Compaction proximity alerts (70%/85%). Extended `token-log.md` format. Status/visualize integration. Plan validation (warn if task scope suggests >70% context). |
|
|
429
|
+
|
|
430
|
+
**Plan command update**: Add "single context window" constraint. During task generation, validate that each task's scope (files to touch, complexity estimate) fits within ~70% of a context window. If not, automatically split into smaller tasks. This is the guarantee that fresh dispatch works.
|
|
431
|
+
|
|
432
|
+
**Exit Criteria** — ALL MET:
|
|
433
|
+
- [x] Execute dispatches one subagent per TASK (not per domain) — verified by token-log entries
|
|
434
|
+
- [x] Context utilization per task subagent < 25% (measured via context observability)
|
|
435
|
+
- [x] Compaction triggers 0 times across a full milestone execution
|
|
436
|
+
- [x] Execute in parallel mode uses worktrees — zero file conflicts
|
|
437
|
+
- [x] Goal-backward catches at least 1 placeholder that gates missed (tested on synthetic project)
|
|
438
|
+
- [x] Adaptive replanning revises a remaining domain plan when a constraint is discovered
|
|
439
|
+
- [x] Context observability displays token breakdown by domain/task/phase in status output
|
|
440
|
+
- [x] All changes reflected in contracts, Pre-Commit Gate, and 4 reference docs
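
The goal-backward criterion above mentions catching placeholders that the gates missed. A line-based scan along the following lines would cover the two most concrete patterns named in the domain table (`console.log` and `TODO`). This is a rough sketch: `findPlaceholders` and the pattern list are assumptions, and the real detector's rules for "hardcoded/static patterns" are not reproduced in this diff.

```javascript
// Illustrative patterns only; the real detector's rule set is not shown in the source.
const PLACEHOLDER_PATTERNS = [
  { name: "console.log", regex: /\bconsole\.log\s*\(/ },
  { name: "TODO", regex: /\bTODO\b/ },
];

// Returns one finding per (line, pattern) match, with 1-based line numbers.
function findPlaceholders(sourceText) {
  const findings = [];
  sourceText.split(/\r?\n/).forEach((line, index) => {
    for (const { name, regex } of PLACEHOLDER_PATTERNS) {
      if (regex.test(line)) findings.push({ line: index + 1, pattern: name });
    }
  });
  return findings;
}
```

A text scan like this only flags suspects; deciding whether a flagged line really leaves a requirement unmet is the behavior check's job.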
|
|
441
|
+
|
|
442
|
+
**Result**: 18/18 tasks complete, 293 tests pass, v2.40.10 tagged
|
|
327
443
|
|
|
328
444
|
### M23: GSD 2 Tier 2 — Headless Mode
|
|
329
445
|
|
|
330
|
-
**Scope**: Headless CLI execution, headless query,
|
|
446
|
+
**Scope**: Headless CLI execution (via `claude -p` wrapper), headless query, pipeline integration.
|
|
447
|
+
|
|
448
|
+
**Implementation approach**: `gsd-t headless` wraps `claude -p` for LLM-driven commands. `gsd-t headless query` is pure Node.js file parsing — no LLM call. Parallel orchestration uses the same Agent tool pattern as interactive mode (running inside the `claude -p` session).
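
To make "pure Node.js file parsing, no LLM call" concrete, here is a small sketch of a status-style query that derives JSON from markdown checkboxes. Everything named here is hypothetical: `queryChecklistStatus`, the returned fields, and the idea that status comes from `- [x]` items are assumptions for illustration, since the diff does not show which files `gsd-t headless query` reads or what JSON it emits.

```javascript
// Hypothetical: count `- [x]` / `- [ ]` items in a markdown document
// and return a JSON-serializable status summary. No network, no LLM.
function queryChecklistStatus(markdown) {
  let done = 0;
  let pending = 0;
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^\s*- \[( |x|X)\]\s+/.exec(line);
    if (!match) continue;
    if (match[1] === " ") pending += 1;
    else done += 1;
  }
  const total = done + pending;
  return { total, done, pending, complete: total > 0 && pending === 0 };
}

// Example with inline text; a CLI would read the file with fs.readFileSync.
const sample = "- [x] Dispatch per task\n- [ ] Docker image\nNotes here";
console.log(JSON.stringify(queryChecklistStatus(sample)));
```

Because the work is a single pass over a local file, a response time in the tens of milliseconds is plausible, which fits the stated <100ms target.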
|
|
331
449
|
|
|
332
|
-
| Domain
|
|
333
|
-
|
|
334
|
-
| **headless-exec**
|
|
335
|
-
| **headless-query**
|
|
336
|
-
| **
|
|
337
|
-
| **pipeline-integration** | GitHub Actions example workflow, documentation |
|
|
450
|
+
| Domain | Deliverables |
|
|
451
|
+
|------------------------|-------------|
|
|
452
|
+
| **headless-exec** | `gsd-t headless` subcommand. `claude -p` wrapper with argument forwarding. Non-interactive execution. Meaningful exit codes (0-4). Structured JSON output. |
|
|
453
|
+
| **headless-query** | `gsd-t headless query` subcommand. 7 query types (status, domains, contracts, debt, context, backlog, graph). ~50ms response. No LLM calls. Pure file parsing. |
|
|
454
|
+
| **pipeline-integration** | GitHub Actions example workflow. GitLab CI example. Documentation in infrastructure.md. |
|
|
338
455
|
|
|
339
456
|
**Exit Criteria**:
|
|
340
457
|
- `gsd-t headless wave` runs a milestone end-to-end without prompts
|
|
341
458
|
- Exit codes match specification (0-4)
|
|
342
459
|
- `gsd-t headless query status` returns JSON in <100ms
|
|
343
|
-
- Model failover activates when primary model times out (tested with mock)
|
|
344
460
|
- GitHub Actions example workflow runs successfully
|
|
345
461
|
- Documentation in infrastructure.md
|
|
346
462
|
|
|
@@ -348,10 +464,10 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
348
464
|
|
|
349
465
|
**Scope**: Containerized GSD-T execution.
|
|
350
466
|
|
|
351
|
-
| Domain
|
|
352
|
-
|
|
467
|
+
| Domain | Deliverables |
|
|
468
|
+
|------------|-------------|
|
|
353
469
|
| **docker** | Dockerfile, docker-compose.yml, Vault secret injection, volume mount config |
|
|
354
|
-
| **docs**
|
|
470
|
+
| **docs** | Infrastructure.md Docker section, README Docker quickstart |
|
|
355
471
|
|
|
356
472
|
**Exit Criteria**:
|
|
357
473
|
- `docker-compose up` runs a headless milestone
|
|
@@ -363,7 +479,10 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
363
479
|
|
|
364
480
|
## 7. Non-Goals
|
|
365
481
|
|
|
366
|
-
- **LLM agnosticism** — GSD-T stays Claude-committed.
|
|
482
|
+
- **LLM agnosticism** — GSD-T stays Claude-committed. No multi-provider support.
|
|
483
|
+
- **Model failover** — Dropped. Max subscription plan eliminates the need for fallback models.
|
|
484
|
+
- **Custom execution engine** — All M22 capabilities work via Claude Code's existing Agent tool, team mode, and worktree isolation. No separate orchestration process.
|
|
485
|
+
- **Cost-based budget enforcement** — Context observability tracks window utilization and token visibility, not dollar spend or monthly limits.
|
|
367
486
|
- **Standalone runtime** — GSD-T runs inside Claude Code. No Pi SDK, no separate process.
|
|
368
487
|
- **TUI** — Claude Code is the interface. No separate terminal UI.
|
|
369
488
|
- **VS Code extension** — Not needed; Claude Code handles IDE integration.
|
|
@@ -379,21 +498,23 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
379
498
|
|------|-----------|--------|------------|
|
|
380
499
|
| Worktree merges create complex conflict resolution | Medium | High | Sequential merge with tests between each; discard and retry on conflict |
|
|
381
500
|
| Fresh dispatch loses cross-domain context needed for integration | Medium | Medium | Contracts are the explicit cross-domain interface; if context is needed, it's a contract gap |
|
|
501
|
+
| Task-level dispatch overhead (many small subagents vs fewer large ones) | Medium | Low | Subagent startup is fast (~1-2s). Net savings from avoiding compaction and context drift far outweigh overhead |
|
|
382
502
|
| Goal-backward verification is too slow (requires behavior analysis) | Medium | Medium | Scope to critical requirements only; skip for trivial tasks |
|
|
383
503
|
| Adaptive replanning causes infinite loops (new constraint → replan → new constraint) | Low | High | Max 2 replanning cycles per execute; after that, pause for user |
|
|
504
|
+
| Plan's "single context window" rule produces too many tiny tasks | Low | Medium | Target 70% ceiling, not 30%. Tasks should be meaningful units of work, just bounded in scope |
|
|
384
505
|
| Headless mode Claude Code API changes | Medium | Medium | Abstract Claude Code interaction behind adapter; pin to known-good version |
|
|
385
|
-
| Budget ceiling is too aggressive and blocks productive work | Low | Medium | Default ceiling is generous (80% of monthly); user can override per-milestone |
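
The infinite-loop mitigation in the risk table (max 2 replanning cycles, then pause for the user) amounts to a bounded loop. The sketch below assumes two hypothetical callbacks, `findConstraint` and `revisePlan`, plus made-up status strings; only the cap of 2 is taken from the table.

```javascript
const MAX_REPLAN_CYCLES = 2; // from the source: "Max 2 replanning cycles per execute"

// Hypothetical callbacks:
//   findConstraint(plan) -> a constraint, or null when the plan is consistent
//   revisePlan(plan, constraint) -> the updated plan
function replanWithGuard(plan, findConstraint, revisePlan) {
  let cycles = 0;
  let current = plan;
  for (;;) {
    const constraint = findConstraint(current);
    if (constraint === null) {
      return { plan: current, cycles, status: "ok" };
    }
    if (cycles >= MAX_REPLAN_CYCLES) {
      // Cap reached with a constraint still open: hand control back to the user.
      return { plan: current, cycles, status: "paused-for-user", constraint };
    }
    current = revisePlan(current, constraint);
    cycles += 1;
  }
}
```

Checking the cap before revising, rather than after, means the third constraint is reported to the user instead of silently triggering a third revision.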
|
|
386
506
|
|
|
387
507
|
---
|
|
388
508
|
|
|
389
509
|
## 9. Dependencies
|
|
390
510
|
|
|
391
|
-
| Dependency | Type |
|
|
392
|
-
|
|
393
|
-
| Graph Engine (M20-M21) | Required predecessor |
|
|
394
|
-
| Claude Code worktree
|
|
395
|
-
| Claude Code subagent API | Required (existing) |
|
|
396
|
-
|
|
|
511
|
+
| Dependency | Type | Status |
|
|
512
|
+
|-----------|------|--------|
|
|
513
|
+
| Graph Engine (M20-M21) | Required predecessor | **DELIVERED** ✅ |
|
|
514
|
+
| Claude Code Agent tool (`isolation: "worktree"`) | Required (existing) | Available ✅ |
|
|
515
|
+
| Claude Code subagent API (Agent tool) | Required (existing) | Available ✅ |
|
|
516
|
+
| Claude Code piped mode (`claude -p`) | Required for M23 | Available ✅ |
|
|
517
|
+
| Git worktree support | Required (existing) | Standard git feature ✅ |
|
|
397
518
|
| Docker | Optional (M24 only) | Standard tooling |
|
|
398
519
|
|
|
399
520
|
---
|
|
@@ -402,11 +523,13 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
402
523
|
|
|
403
524
|
| Metric | Target |
|
|
404
525
|
|--------|--------|
|
|
405
|
-
| Context utilization per
|
|
526
|
+
| Context utilization per task subagent (fresh dispatch) | < 25% (down from 60-75%) |
|
|
527
|
+
| Compaction events per milestone | 0 (down from multiple) |
|
|
406
528
|
| File conflicts in parallel execution (worktree) | 0 (down from occasional) |
|
|
407
529
|
| Placeholder implementations caught (goal-backward) | ≥ 1 per milestone that gates missed |
|
|
408
530
|
| Plan revisions triggered (adaptive replan) | When execution reveals constraints |
|
|
409
|
-
|
|
|
531
|
+
| Orchestrator context utilization (execute) | < 10% (summaries only) |
|
|
532
|
+
| Token breakdown visibility | By domain, task, and phase |
|
|
410
533
|
| Headless execution success rate | ≥ 90% (exit code 0) |
|
|
411
534
|
| Headless query response time | < 100ms |
|
|
412
535
|
| Docker startup time | < 30 seconds |
|
|
@@ -417,20 +540,20 @@ milestone, project, prd, feature, discuss, setup, triage-and-merge, reflect, bra
|
|
|
417
540
|
|
|
418
541
|
| Milestone | Estimated Effort | Sequence |
|
|
419
542
|
|-----------|-----------------|----------|
|
|
420
|
-
| M22: GSD 2 Tier 1 | 5 domains, high complexity |
|
|
421
|
-
| M23: GSD 2 Tier 2 |
|
|
422
|
-
| M24: Docker
|
|
543
|
+
| M22: GSD 2 Tier 1 | 5 domains, high complexity | **COMPLETE** (2026-03-22) ✅ |
|
|
544
|
+
| M23: GSD 2 Tier 2 | 3 domains, medium complexity | After M22 |
|
|
545
|
+
| M24: Docker | 2 domains, low complexity | After M23 |
|
|
423
546
|
|
|
424
547
|
---
|
|
425
548
|
|
|
426
549
|
## 12. Relationship to Graph Engine (PRD-GRAPH-001)
|
|
427
550
|
|
|
428
|
-
The graph engine (M20-M21) is a **
|
|
551
|
+
The graph engine (M20-M21) is a **completed prerequisite** for GSD 2 enhancements. PRD-GRAPH-001 status: **DELIVERED**.
|
|
429
552
|
|
|
430
|
-
- **Fresh dispatch** uses graph context to build minimal, relevant prompts for each
|
|
553
|
+
- **Fresh dispatch** uses graph context to build minimal, relevant prompts for each task
|
|
431
554
|
- **Worktree isolation** uses graph to validate that no domain agent modified files outside its graph-defined ownership
|
|
432
555
|
- **Goal-backward** uses graph to trace requirement → code path → behavior chain
|
|
433
556
|
- **Adaptive replanning** uses graph to assess which remaining domains are affected by new constraints
|
|
434
|
-
- **Headless query**
|
|
557
|
+
- **Headless query** exposes graph data (entity counts, domain mapping) via `gsd-t headless query graph`
|
|
435
558
|
|
|
436
|
-
The trifecta (worktree + fresh dispatch + graph) is the foundation for everything in this PRD.
|
|
559
|
+
The trifecta (worktree + fresh dispatch + graph) is the foundation for everything in this PRD. The graph is now available — M22 can proceed.
|