@tekyzinc/gsd-t 2.74.10 → 2.74.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,38 @@
 
  All notable changes to GSD-T are documented here. Updated with each release.
 
+ ## [2.74.12] - 2026-04-14
+
+ ### Fixed — Context-Burn Regression (P0, affects every GSD-T project)
+
+ **Root cause**: commit `0b91429` (2026-03-24) added an "orchestrator context self-check" that read `CLAUDE_CONTEXT_TOKENS_USED` / `CLAUDE_CONTEXT_TOKENS_MAX` — environment variables **Claude Code never exports**. The guard was always false, so the self-check was silently inert. Commits `da6d3ae` and `b68353e` then promoted Red Team and Design Verification from per-domain to per-task on the assumption that this guard would catch context drain. With the guard broken, per-task spawning of ~10K-token adversarial prompts drained sessions from 77% → 12% context in just 2 tasks (bee-poc reproducer).
+
+ **The fix ships as a three-part correction:**
+
+ #### Fix 1: Real task-count gate (replaces the vaporware env-var check)
+ - **NEW `bin/task-counter.cjs`** — deterministic on-disk task counter. State: `.gsd-t/.task-counter`. Config: `.gsd-t/task-counter-config.json` (default limit: 5). Env override: `GSD_T_TASK_LIMIT`. Commands: `increment <kind>`, `status`, `reset`, `should-stop` (exit code 10 at limit). This is the real signal the old self-check *pretended* to be.
+ - **`commands/gsd-t-execute.md`** — Step 0 resets the counter; Step 3.5 calls `node bin/task-counter.cjs should-stop` as a gate before every task spawn; Step 5 increments after each task. At the limit, the orchestrator checkpoints and STOPs — the user runs `/clear` then `/user:gsd-t-resume`.
+ - **`commands/gsd-t-wave.md`** — an analogous phase-count gate replaces the broken "Wave Orchestrator Context Self-Check."
+ - **`bin/token-budget.js`** — `getSessionStatus()` rewritten to read the task counter instead of env vars. API surface preserved (threshold/pct/consumed/estimated_remaining) so all dependent commands keep working. Graduated-degradation thresholds (warn/downgrade/conserve/stop) now fire on a real signal.
+
+ #### Fix 2: Revert per-task Red Team / Design Verify, extract prompts to templates
+ - **NEW `templates/prompts/`** directory with three self-contained prompt files — `qa-subagent.md`, `red-team-subagent.md`, `design-verify-subagent.md` — plus a `README.md` explaining the architecture. Command files reference prompts by **file path** rather than inlining the body. Subagents read the prompt file themselves, so the orchestrator never re-materializes ~3500-token prompt bodies in its own context per spawn.
+ - **`commands/gsd-t-execute.md`** — Red Team and Design Verification moved back to **per-domain** (where they were before `da6d3ae` / `b68353e`). QA stays per-task (it is smaller, and contracts can drift task-by-task). Result: the safe task count per session rises from ~5 to ~15+.
+ - **`commands/gsd-t-quick.md`, `gsd-t-integrate.md`, `gsd-t-debug.md`** — Red Team spawn blocks converted to templated-prompt references. ~270 lines of duplicated adversarial prompt boilerplate removed; run-specific categories (Cross-Domain Boundaries, Regression Around the Fix, Original Bug Variants) preserved as one-line context notes to the subagent.
+
+ #### Fix 3: Token-log schema & placeholder cleanup
+ - Removed the `Tokens | Compacted | Ctx%` columns from the token-log schema (they always wrote `0 | null | N/A` because the env vars were never set). Added `Tasks-Since-Reset` as the real burn signal.
+ - Neutralized **70+ references** to `CLAUDE_CONTEXT_TOKENS_USED` / `CLAUDE_CONTEXT_TOKENS_MAX` across 14 command files. The 3 remaining references (gsd-t-execute.md, gsd-t-wave.md, gsd-t-doc-ripple.md) are historical-note mentions only. `scripts/gsd-t-heartbeat.js` and `scripts/gsd-t-statusline.js` still read the env vars but treat them as optional fallbacks that gracefully degrade (unchanged behavior).
+ - Test suite (`test/token-budget.test.js`) rewritten around the new counter-based `getSessionStatus()`. 36/36 passing.
+
+ ### Propagation
+ After publishing, run `/user:gsd-t-version-update-all` to propagate the fix to every registered GSD-T project. Projects will receive the new `bin/task-counter.cjs` and updated command files in a single sweep.
+
+ ## [2.74.11] - 2026-04-13
+
+ ### Fixed
+ - **`bin/archive-progress.js` → `.cjs` rename** — the new bin tools used CommonJS `require()` but failed in projects with `"type": "module"` in `package.json` (caught on BDS-Analytics-UI during the first update-all). Renamed all three new bin tools to `.cjs` so they run as CommonJS regardless of the host project's module type. `version-update-all` now copies `.cjs` files and runs `archive-progress.cjs`.
+
  ## [2.74.10] - 2026-04-13
 
  ### Added
package/bin/gsd-t.js CHANGED
@@ -1557,8 +1557,9 @@ function updateSingleProject(projectDir, counts) {
  }
 
  // Bin tools that should ship with every registered project. Listed here so adding
- // a new tool only requires appending to this array.
- const PROJECT_BIN_TOOLS = ["archive-progress.js", "log-tail.js", "context-budget-audit.js"];
+ // a new tool only requires appending to this array. Use .cjs extension so they
+ // always run as CommonJS regardless of the project's package.json "type" field.
+ const PROJECT_BIN_TOOLS = ["archive-progress.cjs", "log-tail.cjs", "context-budget-audit.cjs"];
 
  function copyBinToolsToProject(projectDir, projectName) {
    const projectBinDir = path.join(projectDir, "bin");
@@ -1611,7 +1612,7 @@ function runProgressArchiveMigration(projectDir, projectName) {
  const markerPath = path.join(projectDir, ".gsd-t", ".archive-migration-v1");
  if (fs.existsSync(markerPath)) return false;
 
- const archiveScript = path.join(projectDir, "bin", "archive-progress.js");
+ const archiveScript = path.join(projectDir, "bin", "archive-progress.cjs");
  if (!fs.existsSync(archiveScript)) return false;
 
  try {
package/bin/task-counter.cjs ADDED
@@ -0,0 +1,161 @@
+ #!/usr/bin/env node
+
+ /**
+  * GSD-T Task Counter — real, deterministic context-burn gate.
+  *
+  * Replaces the broken CLAUDE_CONTEXT_TOKENS_USED self-check (which never
+  * worked because Claude Code does not export those env vars). Instead of
+  * trying to read the orchestrator's own token usage, we count completed
+  * task subagent spawns. After N tasks the orchestrator MUST checkpoint
+  * progress and STOP so the user can /clear and resume cleanly.
+  *
+  * State lives at .gsd-t/.task-counter (single JSON file). A counter
+  * persists across orchestrator runs until a /clear-then-resume cycle
+  * resets it via the `reset` command.
+  *
+  * Threshold defaults are conservative — five tasks per session before
+  * stop. Override via:
+  *   - .gsd-t/task-counter-config.json  { "limit": 8 }
+  *   - env GSD_T_TASK_LIMIT=8
+  *
+  * Zero external dependencies (Node.js built-ins only).
+  *
+  * CLI usage (called from command markdown via `node bin/task-counter.cjs`):
+  *   increment <kind>  → bump counter, print JSON status
+  *   status            → print JSON status without bumping
+  *   reset             → clear counter (called after /clear resume)
+  *   should-stop       → exit 0 if okay to spawn, exit 10 if must stop
+  *
+  * JSON status shape:
+  *   { count, limit, remaining, should_stop, started_at, last_kind }
+  */
+
+ const fs = require("fs");
+ const path = require("path");
+
+ const DEFAULT_LIMIT = 5;
+ const STATE_FILE = ".gsd-t/.task-counter";
+ const CONFIG_FILE = ".gsd-t/task-counter-config.json";
+
+ function projectDir() {
+   return process.cwd();
+ }
+
+ function statePath() {
+   return path.join(projectDir(), STATE_FILE);
+ }
+
+ function configPath() {
+   return path.join(projectDir(), CONFIG_FILE);
+ }
+
+ function readLimit() {
+   if (process.env.GSD_T_TASK_LIMIT) {
+     const n = parseInt(process.env.GSD_T_TASK_LIMIT, 10);
+     if (!isNaN(n) && n > 0) return n;
+   }
+   try {
+     const raw = fs.readFileSync(configPath(), "utf8");
+     const cfg = JSON.parse(raw);
+     if (cfg && typeof cfg.limit === "number" && cfg.limit > 0) return cfg.limit;
+   } catch (_) {}
+   return DEFAULT_LIMIT;
+ }
+
+ function readState() {
+   try {
+     const raw = fs.readFileSync(statePath(), "utf8");
+     const s = JSON.parse(raw);
+     return {
+       count: typeof s.count === "number" ? s.count : 0,
+       started_at: s.started_at || null,
+       last_kind: s.last_kind || null,
+       stopped: !!s.stopped,
+     };
+   } catch (_) {
+     return { count: 0, started_at: null, last_kind: null, stopped: false };
+   }
+ }
+
+ function writeState(s) {
+   const dir = path.dirname(statePath());
+   if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
+   fs.writeFileSync(statePath(), JSON.stringify(s, null, 2));
+ }
+
+ function buildStatus(state, limit) {
+   const remaining = Math.max(0, limit - state.count);
+   return {
+     count: state.count,
+     limit,
+     remaining,
+     should_stop: state.count >= limit || state.stopped,
+     started_at: state.started_at,
+     last_kind: state.last_kind,
+   };
+ }
+
+ function cmdIncrement(kind) {
+   const s = readState();
+   s.count += 1;
+   s.last_kind = kind || "task";
+   if (!s.started_at) s.started_at = new Date().toISOString();
+   const limit = readLimit();
+   if (s.count >= limit) s.stopped = true;
+   writeState(s);
+   return buildStatus(s, limit);
+ }
+
+ function cmdStatus() {
+   return buildStatus(readState(), readLimit());
+ }
+
+ function cmdReset() {
+   writeState({ count: 0, started_at: null, last_kind: null, stopped: false });
+   return buildStatus(readState(), readLimit());
+ }
+
+ function cmdShouldStop() {
+   return buildStatus(readState(), readLimit()).should_stop;
+ }
+
+ function main() {
+   const cmd = process.argv[2];
+   const arg = process.argv[3];
+   switch (cmd) {
+     case "increment": {
+       const status = cmdIncrement(arg);
+       process.stdout.write(JSON.stringify(status));
+       process.exit(status.should_stop ? 10 : 0);
+     }
+     case "status": {
+       process.stdout.write(JSON.stringify(cmdStatus()));
+       process.exit(0);
+     }
+     case "reset": {
+       process.stdout.write(JSON.stringify(cmdReset()));
+       process.exit(0);
+     }
+     case "should-stop": {
+       process.exit(cmdShouldStop() ? 10 : 0);
+     }
+     default: {
+       process.stderr.write(
+         "Usage: task-counter.cjs <increment|status|reset|should-stop> [kind]\n"
+       );
+       process.exit(2);
+     }
+   }
+ }
+
+ if (require.main === module) main();
+
+ module.exports = {
+   cmdIncrement,
+   cmdStatus,
+   cmdReset,
+   cmdShouldStop,
+   readLimit,
+   readState,
+   DEFAULT_LIMIT,
+ };
package/bin/token-budget.js CHANGED
@@ -77,13 +77,45 @@ function estimateCost(model, taskType, options) {
  /**
   * @param {string} [projectDir]
   * @returns {{ consumed: number, estimated_remaining: number, pct: number, threshold: string }}
+  *
+  * v2.74.12: previously this read process.env.CLAUDE_CONTEXT_TOKENS_USED /
+  * CLAUDE_CONTEXT_TOKENS_MAX, which Claude Code does not export — so consumed
+  * was always 0 and threshold was always 'normal'. The graduated-degradation
+  * machinery downstream was inert. Now we synthesise a percent from the real
+  * task counter at .gsd-t/.task-counter, mapping 0..limit linearly to
+  * 0..100%. This keeps the API stable so commands that ask for thresholds
+  * (downgrade/conserve/stop) get a real signal.
   */
  function getSessionStatus(projectDir) {
- const maxTokens = parseInt(process.env.CLAUDE_CONTEXT_TOKENS_MAX || "200000", 10);
- const consumed = readSessionConsumed(projectDir);
- const pct = maxTokens > 0 ? Math.round((consumed / maxTokens) * 100 * 10) / 10 : 0;
+ const dir = projectDir || process.cwd();
+ const counter = readTaskCounter(dir);
+ const limit = counter.limit > 0 ? counter.limit : 5;
+ const consumed = counter.count;
+ const pct = Math.min(100, Math.round((consumed / limit) * 100 * 10) / 10);
  const threshold = resolveThreshold(pct);
- return { consumed, estimated_remaining: maxTokens - consumed, pct, threshold };
+ const estimated_remaining = Math.max(0, limit - consumed);
+ return { consumed, estimated_remaining, pct, threshold };
+ }
+
+ function readTaskCounter(dir) {
+   try {
+     const fp = path.join(dir, ".gsd-t", ".task-counter");
+     const raw = fs.readFileSync(fp, "utf8");
+     const s = JSON.parse(raw);
+     let limit = 5;
+     try {
+       const cfgRaw = fs.readFileSync(path.join(dir, ".gsd-t", "task-counter-config.json"), "utf8");
+       const cfg = JSON.parse(cfgRaw);
+       if (cfg && typeof cfg.limit === "number" && cfg.limit > 0) limit = cfg.limit;
+     } catch (_) {}
+     if (process.env.GSD_T_TASK_LIMIT) {
+       const n = parseInt(process.env.GSD_T_TASK_LIMIT, 10);
+       if (!isNaN(n) && n > 0) limit = n;
+     }
+     return { count: typeof s.count === "number" ? s.count : 0, limit };
+   } catch (_) {
+     return { count: 0, limit: 5 };
+   }
  }
 
  // ── recordUsage ──────────────────────────────────────────────────────────────
@@ -123,13 +155,16 @@ function getDegradationActions(projectDir) {
   * @returns {{ estimatedTokens: number, estimatedPct: number, feasible: boolean }}
   */
  function estimateMilestoneCost(remainingTasks, projectDir) {
- const { estimated_remaining } = getSessionStatus(projectDir);
- const maxTokens = parseInt(process.env.CLAUDE_CONTEXT_TOKENS_MAX || "200000", 10);
+ const status = getSessionStatus(projectDir);
+ const limit = status.consumed + status.estimated_remaining || 5;
  const estimatedTokens = remainingTasks.reduce((sum, t) => {
    return sum + estimateCost(t.model, t.taskType, { complexity: t.complexity, projectDir });
  }, 0);
- const estimatedPct = maxTokens > 0 ? Math.round((estimatedTokens / maxTokens) * 100 * 10) / 10 : 0;
- const feasible = estimatedTokens <= estimated_remaining * 0.8;
+ // Approximate context burn in task-equivalents: one per remaining task,
+ // compared against the task-counter limit rather than a token budget.
+ const taskEquivalents = remainingTasks.length;
+ const estimatedPct = limit > 0 ? Math.min(100, Math.round((taskEquivalents / limit) * 100 * 10) / 10) : 0;
+ const feasible = taskEquivalents <= status.estimated_remaining;
  return { estimatedTokens, estimatedPct, feasible };
  }
 
package/commands/gsd-t-audit.md CHANGED
@@ -10,7 +10,7 @@ To keep the main conversation context lean, run audit via a Task subagent.
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  Spawn a fresh subagent using the Task tool:
  ```
@@ -22,12 +22,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-au
  ```
 
  After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
  Append to `.gsd-t/token-log.md` (create with header if missing):
- `| {DT_START} | {DT_END} | gsd-t-audit | Step 0 | sonnet | {DURATION}s | audit: {args summary} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+ `| {DT_START} | {DT_END} | gsd-t-audit | Step 0 | sonnet | {DURATION}s | audit: {args summary} | | | {COUNTER} |`
 
  Relay the subagent's summary to the user. **Do not execute Steps 1–5 yourself.**
package/commands/gsd-t-brainstorm.md CHANGED
@@ -90,7 +90,7 @@ Before drawing any conclusions or presenting final insights, spawn a team of par
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning the team — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  ```
  Spawn a deep research team (run all three in parallel):
@@ -123,12 +123,9 @@ Do NOT proceed to Step 5 until this synthesis is complete.
  ```
 
  After team completes — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-brainstorm | Step 3 | sonnet | {DURATION}s | deep research: {topic summary} | {TOKENS} | {COMPACTED} |`
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-brainstorm | Step 3 | sonnet | {DURATION}s | deep research: {topic summary} | {COUNTER} |`
 
  ## Step 4: Capture the Sparks
package/commands/gsd-t-debug.md CHANGED
@@ -72,7 +72,7 @@ If STACK_RULES is empty (no templates/stacks/ dir or no matches), skip silently.
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  Spawn a fresh subagent using the Task tool:
  ```
@@ -83,12 +83,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-de
  ```
 
  After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-debug | Step 0 | sonnet | {DURATION}s | debug: {issue summary} | {TOKENS} | {COMPACTED} |`
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-debug | Step 0 | sonnet | {DURATION}s | debug: {issue summary} | {COUNTER} |`
 
  Relay the subagent's summary to the user. **Do not execute Steps 1–5 yourself.**
 
@@ -124,7 +121,7 @@ The current approach has failed 3+ times. This means the root cause is not yet u
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  ```
  Spawn a deep research team (run all three in parallel):
@@ -153,12 +150,9 @@ Lead: Wait for all three researchers to complete. Then synthesize:
  ```
 
  After team completes — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 
  Append to `.gsd-t/token-log.md`:
- `| {DT_START} | {DT_END} | gsd-t-debug | Step 1.5 | sonnet | {DURATION}s | deep research loop break: {issue summary} | {TOKENS} | {COMPACTED} |`
+ `| {DT_START} | {DT_END} | gsd-t-debug | Step 1.5 | sonnet | {DURATION}s | deep research loop break: {issue summary} | {COUNTER} |`
 
  **STOP. Present findings to the user before making any changes:**
 
@@ -378,98 +372,30 @@ Commit: `[debug] Fix {description} — root cause: {explanation}`
 
  ## Step 5.3: Red Team — Adversarial QA (MANDATORY)
 
- After the fix passes all tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the fix and find regressions. Its success is measured by bugs found, not tests passed.
+ After the fix passes all tests, spawn an adversarial Red Team agent to BREAK the fix and find regressions.
 
- ⚙ [{model}] Red Team → adversarial validation of debug fix
-
- **OBSERVABILITY LOGGING (MANDATORY):**
- Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ ⚙ [opus] Red Team → adversarial validation of debug fix
 
+ Resolve the templated prompt path via Bash:
  ```
- Task subagent (general-purpose, model: opus):
- "You are a Red Team QA adversary. Your job is to BREAK the fix that was just applied.
-
-  Your value is measured by REAL bugs found. More bugs = more value.
-  If you find zero bugs, you must prove you were thorough — list every
-  attack vector you tried and why it didn't break. A short list means
-  you didn't try hard enough.
-
-  Rules:
-  - False positives DESTROY your credibility. If you report something
-    as a bug and it's actually correct behavior, that's worse than
-    missing a real bug. Never report something you haven't reproduced.
-  - Style opinions are not bugs. Theoretical concerns are not bugs.
-    A bug is: 'I did X, expected Y, got Z.' With proof.
-  - You are done ONLY when you have exhausted every category below
-    and either found a bug or documented exactly what you tried.
-
-  ## Attack Categories (exhaust ALL of these)
-
-  1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
-     match every contract? Test each endpoint/interface/schema shape.
-  2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
-     special characters, SQL injection attempts, XSS payloads, path traversal.
-  3. **State Transitions**: What happens when actions are performed out of
-     order? Double-submit? Concurrent access? Refresh mid-flow?
-  4. **Error Paths**: Remove env vars. Kill the database. Send malformed
-     requests. Does the code handle failures gracefully or crash?
-  5. **Regression Around the Fix**: The fix changed specific code. Test
-     every adjacent code path. Fixes frequently break neighboring functionality.
-  6. **Original Bug Variants**: The original bug was found. Are there SIMILAR
-     bugs in related code? Same pattern, different location?
-  7. **Full Suite**: Run the FULL test suite. Did the fix break anything else?
-  8. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
-     behavior (state changes, data loaded, navigation works) or just check
-     that elements exist? Flag and rewrite any shallow/layout tests.
-
-  ## Exploratory Testing (if Playwright MCP available)
-
-  After all scripted tests pass:
-  1. Check if Playwright MCP is registered in Claude Code settings (look for "playwright" in mcpServers)
-  2. If available: spend 5 minutes on adversarial interactive exploration using Playwright MCP
-     - Focus on the fixed area and adjacent code — regressions often lurk nearby
-     - Try the original bug reproduction path to confirm it is truly fixed
-     - Probe for variant bugs: same pattern in related code paths
-  3. Tag all findings [EXPLORATORY] in your report
-  4. If Playwright MCP is not available: skip this section silently
-  Note: Exploratory findings are additive — they do not replace scripted test results.
-
-  ## Report Format
-
-  For each bug found:
-  - **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
-  - **Reproduction**: {exact steps to reproduce}
-  - **Expected**: {what should happen}
-  - **Actual**: {what actually happens}
-  - **Proof**: {test file or command that demonstrates the bug}
-
-  Summary:
-  - BUGS FOUND: {count} (with severity breakdown)
-  - COVERAGE GAPS: {untested flows from requirements}
-  - SHALLOW TESTS REWRITTEN: {count}
-  - CONTRACTS VERIFIED: {N}/{total}
-  - ATTACK VECTORS TRIED: {list every category attempted and results}
-  - VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
-
-  Write all findings to .gsd-t/red-team-report.md.
-  If bugs found, also append to .gsd-t/qa-issues.md."
+ RT_PROMPT="$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"
+ [ -f "$RT_PROMPT" ] || RT_PROMPT="templates/prompts/red-team-subagent.md"
+ T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
  ```
 
+ Spawn Task subagent (general-purpose, model: opus):
+ > "Read `$RT_PROMPT` and follow it. Context: post-fix validation for a debug session. **Additional categories for this run:** (a) **Regression Around the Fix** — test every code path adjacent to the changed lines; fixes frequently break neighboring functionality. (b) **Original Bug Variants** — the original bug was {one-line description}; search for SIMILAR bugs in related code (same pattern, different location). Write findings to `.gsd-t/red-team-report.md`."
+
  After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+ ```
+ T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
+ COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")
+ ```
  Append to `.gsd-t/token-log.md`:
- `| {DT_START} | {DT_END} | gsd-t-debug | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
-
- **If Red Team VERDICT is FAIL:**
- 1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
- 2. Re-run Red Team after fixes
- 3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+ `| {DT_START} | {DT_END} | gsd-t-debug | Red Team | opus | {DURATION}s | {VERDICT} — {N} bugs found | | | {COUNTER} |`
 
- **If Red Team VERDICT is GRUDGING PASS:** Proceed to metrics and doc-ripple.
+ **If FAIL:** fix CRITICAL/HIGH bugs (up to 2 fix cycles), then re-run Red Team. Persistent bugs go to `.gsd-t/deferred-items.md`.
+ **If GRUDGING PASS:** proceed to metrics and doc-ripple.
 
  ## Step 5.5: Emit Task Metrics
package/commands/gsd-t-design-decompose.md CHANGED
@@ -360,7 +360,7 @@ After writing all contracts but BEFORE proceeding to partition or build, spawn a
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  ⚙ [opus] gsd-t-design-decompose → Chart Classification Verifier
 
@@ -444,7 +444,7 @@ If ALL ✅ MATCH:
  ```
 
  After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 
  Compute tokens/compaction per standard pattern. Append to `.gsd-t/token-log.md`.
package/commands/gsd-t-discuss.md CHANGED
@@ -53,7 +53,7 @@ If the user requests team exploration or there are 3+ complex open questions:
 
  **OBSERVABILITY LOGGING (MANDATORY):**
  Before spawning the team — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  ```
  Create an agent team:
@@ -72,12 +72,9 @@ Lead: Synthesize into decisions and update contracts.
  ```
 
  After team completes — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
- Compute tokens and compaction:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-discuss | Step 3 | sonnet | {DURATION}s | team discuss: {topic summary} | {TOKENS} | {COMPACTED} |`
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-discuss | Step 3 | sonnet | {DURATION}s | team discuss: {topic summary} | {COUNTER} |`
 
  Assign teammates based on the nature of the questions:
  - **Technical choice** (e.g., which database): one advocate per option + critic
package/commands/gsd-t-doc-ripple.md CHANGED
@@ -93,24 +93,16 @@ For each document or logical group:
 
  **OBSERVABILITY LOGGING (MANDATORY) — for each subagent spawn:**
 
  Before spawning — run via Bash:
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
  After subagent returns — run via Bash:
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 
- Compute tokens:
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+ Read the task counter (deterministic context-burn signal):
+ `COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")`
 
- Compute context utilization:
- `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-
- Alert thresholds:
- - CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
- - CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting."`
-
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
- `| {DT_START} | {DT_END} | gsd-t-doc-ripple | Step 5 | {model} | {DURATION}s | update:{document} | {TOKENS} | {COMPACTED} | doc-ripple | — | {CTX_PCT} |`
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+ `| {DT_START} | {DT_END} | gsd-t-doc-ripple | Step 5 | {model} | {DURATION}s | update:{document} | doc-ripple | | {COUNTER} |`
 
  **Each document-update subagent prompt:**
  ```