npm - @tekyzinc/gsd-t - Versions diffs - 2.74.10 → 2.74.12 - Mend

@tekyzinc/gsd-t 2.74.10 → 2.74.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/CHANGELOG.md +32 -0
package/bin/gsd-t.js +4 -3
package/bin/task-counter.cjs +161 -0
package/bin/token-budget.js +43 -8
package/commands/gsd-t-audit.md +3 -6
package/commands/gsd-t-brainstorm.md +4 -7
package/commands/gsd-t-debug.md +23 -97
package/commands/gsd-t-design-decompose.md +2 -2
package/commands/gsd-t-discuss.md +4 -7
package/commands/gsd-t-doc-ripple.md +6 -14
package/commands/gsd-t-execute.md +121 -411
package/commands/gsd-t-integrate.md +20 -97
package/commands/gsd-t-plan.md +4 -12
package/commands/gsd-t-prd.md +4 -7
package/commands/gsd-t-quick.md +22 -87
package/commands/gsd-t-reflect.md +4 -4
package/commands/gsd-t-verify.md +7 -13
package/commands/gsd-t-visualize.md +4 -4
package/commands/gsd-t-wave.md +36 -23
package/package.json +1 -1
package/templates/prompts/README.md +30 -0
package/templates/prompts/design-verify-subagent.md +99 -0
package/templates/prompts/qa-subagent.md +26 -0
package/templates/prompts/red-team-subagent.md +44 -0
/package/bin/{archive-progress.js → archive-progress.cjs} +0 -0
/package/bin/{context-budget-audit.js → context-budget-audit.cjs} +0 -0
/package/bin/{log-tail.js → log-tail.cjs} +0 -0

package/commands/gsd-t-integrate.md CHANGED Viewed

@@ -170,19 +170,11 @@ Note: Exploratory findings do NOT count against the scripted test pass/fail rati
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Compute context utilization — run via Bash:
-`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-Alert on context thresholds (display to user inline):
-- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
-- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | | | {COUNTER} |`
 If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {severity} | {finding} |`
@@ -220,99 +212,30 @@ After integration and doc ripple, verify everything works together:
 ## Step 7.5: Red Team — Adversarial QA (MANDATORY)
-After integration tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the integrated system. Its success is measured by bugs found, not tests passed.
+After integration tests pass, spawn an adversarial Red Team agent on the integrated system. Success is measured by bugs found, not tests passed.
-⚙ [{model}] Red Team → adversarial validation of integrated system
-**OBSERVABILITY LOGGING (MANDATORY):**
-Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+⚙ [opus] Red Team → adversarial validation of integrated system
+Resolve the templated prompt path via Bash:
 ```
-Task subagent (general-purpose, model: opus):
-"You are a Red Team QA adversary. Your job is to BREAK the integrated system.
-Your value is measured by REAL bugs found. More bugs = more value.
-If you find zero bugs, you must prove you were thorough — list every
-attack vector you tried and why it didn't break. A short list means
-you didn't try hard enough.
-Rules:
-- False positives DESTROY your credibility. If you report something
-  as a bug and it's actually correct behavior, that's worse than
-  missing a real bug. Never report something you haven't reproduced.
-- Style opinions are not bugs. Theoretical concerns are not bugs.
-  A bug is: 'I did X, expected Y, got Z.' With proof.
-- You are done ONLY when you have exhausted every category below
-  and either found a bug or documented exactly what you tried.
-## Attack Categories (exhaust ALL of these)
-1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
-   match every contract? Test each endpoint/interface/schema shape.
-2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
-   special characters, SQL injection attempts, XSS payloads, path traversal.
-3. **State Transitions**: What happens when actions are performed out of
-   order? Double-submit? Concurrent access? Refresh mid-flow?
-4. **Error Paths**: Remove env vars. Kill the database. Send malformed
-   requests. Does the code handle failures gracefully or crash?
-5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
-   exist in requirements but have NO test coverage? Write tests for them.
-6. **Regression**: Run the FULL test suite. Did any existing tests break?
-7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
-   behavior (state changes, data loaded, navigation works) or just check
-   that elements exist? Flag and rewrite any shallow/layout tests.
-8. **Cross-Domain Boundaries**: Test data flow across EVERY domain boundary.
-   Does data arriving from domain A get validated by domain B? What happens
-   when domain A sends malformed data that passed A's own validation?
-## Exploratory Testing (if Playwright MCP available)
-After all scripted tests pass:
-1. Check if Playwright MCP is registered in Claude Code settings (look for "playwright" in mcpServers)
-2. If available: spend 5 minutes on adversarial interactive exploration using Playwright MCP
-   - Attempt race conditions, double-submits, concurrent access patterns
-   - Try unexpected input sequences, boundary values, rapid state transitions
-   - Probe error recovery: does the app recover after failures or get stuck?
-3. Tag all findings [EXPLORATORY] in your report
-4. If Playwright MCP is not available: skip this section silently
-Note: Exploratory findings are additive — they do not replace scripted test results.
-## Report Format
-For each bug found:
-- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
-  - **Reproduction**: {exact steps to reproduce}
-  - **Expected**: {what should happen}
-  - **Actual**: {what actually happens}
-  - **Proof**: {test file or command that demonstrates the bug}
-Summary:
-- BUGS FOUND: {count} (with severity breakdown)
-- COVERAGE GAPS: {untested flows from requirements}
-- SHALLOW TESTS REWRITTEN: {count}
-- CONTRACTS VERIFIED: {N}/{total}
-- ATTACK VECTORS TRIED: {list every category attempted and results}
-- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
-Write all findings to .gsd-t/red-team-report.md.
-If bugs found, also append to .gsd-t/qa-issues.md."
+RT_PROMPT="$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"
+[ -f "$RT_PROMPT" ] || RT_PROMPT="templates/prompts/red-team-subagent.md"
+T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
 ```
+Spawn Task subagent (general-purpose, model: opus):
+> "Read `$RT_PROMPT` and follow it. Context: cross-domain integration run. **Additional category for this run: Cross-Domain Boundaries** — test data flow across every domain boundary; does data arriving from domain A get validated by domain B; what happens when A sends malformed data that passed A's own validation. Write findings to `.gsd-t/red-team-report.md`."
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+```
+T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
+COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")
+```
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
-**If Red Team VERDICT is FAIL:**
-1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
-2. Re-run Red Team after fixes
-3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | opus | {DURATION}s | {VERDICT} — {N} bugs found | | | {COUNTER} |`
-**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
+**If FAIL:** fix CRITICAL/HIGH bugs (≤2 cycles) → re-run. Persistent bugs → `.gsd-t/deferred-items.md`.
+**If GRUDGING PASS:** proceed to doc-ripple.
 ## Step 8: Doc-Ripple (Automated)

package/commands/gsd-t-plan.md CHANGED Viewed

@@ -374,19 +374,11 @@ Report: PASS (all checks pass) or FAIL with specific gaps listed."
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Compute context utilization — run via Bash:
-`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-Alert on context thresholds (display to user inline):
-- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
-- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | | | {COUNTER} |`
 If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`

package/commands/gsd-t-prd.md CHANGED Viewed

@@ -12,7 +12,7 @@ To give PRD generation a fresh context window:
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 Spawn a fresh subagent using the Task tool:
 ```
@@ -23,12 +23,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-pr
 ```
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-prd | Step 0 | sonnet | {DURATION}s | prd: {topic summary} | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-prd | Step 0 | sonnet | {DURATION}s | prd: {topic summary} | {COUNTER} |`
 Relay the subagent's summary to the user. **Do not execute Steps 1–6 yourself.**

package/commands/gsd-t-quick.md CHANGED Viewed

@@ -10,7 +10,7 @@ To give this task a fresh context window and prevent compaction during consecuti
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 **Token Budget Check (before spawning subagent):**
@@ -95,12 +95,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-qu
 ```
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-quick | Step 0 | sonnet | {DURATION}s | quick: {task summary} | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-quick | Step 0 | sonnet | {DURATION}s | quick: {task summary} | {COUNTER} |`
 Relay the subagent's summary to the user. **Do not execute Steps 1–5 yourself.**
@@ -262,7 +259,7 @@ If it DOES exist and this task involved UI changes — spawn the Design Verifica
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 ```
 Task subagent (general-purpose, model: opus):
@@ -327,96 +324,34 @@ After subagent returns — run observability Bash and append to token-log.md.
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
-After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
+After tests pass, spawn an adversarial Red Team agent. Its success is measured by bugs found, not tests passed.
-⚙ [{model}] Red Team → adversarial validation of quick task
-**OBSERVABILITY LOGGING (MANDATORY):**
-Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+⚙ [opus] Red Team → adversarial validation of quick task
+Resolve the templated prompt path via Bash (same pattern as execute.md):
 ```
-Task subagent (general-purpose, model: opus):
-"You are a Red Team QA adversary. Your job is to BREAK the code that was just changed.
-Your value is measured by REAL bugs found. More bugs = more value.
-If you find zero bugs, you must prove you were thorough — list every
-attack vector you tried and why it didn't break. A short list means
-you didn't try hard enough.
-Rules:
-- False positives DESTROY your credibility. If you report something
-  as a bug and it's actually correct behavior, that's worse than
-  missing a real bug. Never report something you haven't reproduced.
-- Style opinions are not bugs. Theoretical concerns are not bugs.
-  A bug is: 'I did X, expected Y, got Z.' With proof.
-- You are done ONLY when you have exhausted every category below
-  and either found a bug or documented exactly what you tried.
-## Attack Categories (exhaust ALL of these)
-1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
-   match every contract? Test each endpoint/interface/schema shape.
-2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
-   special characters, SQL injection attempts, XSS payloads, path traversal.
-3. **State Transitions**: What happens when actions are performed out of
-   order? Double-submit? Concurrent access? Refresh mid-flow?
-4. **Error Paths**: Remove env vars. Kill the database. Send malformed
-   requests. Does the code handle failures gracefully or crash?
-5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
-   exist in requirements but have NO test coverage? Write tests for them.
-6. **Regression**: Run the FULL test suite. Did any existing tests break?
-7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
-   behavior (state changes, data loaded, navigation works) or just check
-   that elements exist? Flag and rewrite any shallow/layout tests.
-## Exploratory Testing (if Playwright MCP available)
-After all scripted tests pass:
-1. Check if Playwright MCP is registered in Claude Code settings (look for "playwright" in mcpServers)
-2. If available: spend 5 minutes on adversarial interactive exploration using Playwright MCP
-   - Attempt race conditions, double-submits, concurrent access patterns
-   - Try unexpected input sequences, boundary values, rapid state transitions
-   - Probe error recovery: does the app recover after failures or get stuck?
-3. Tag all findings [EXPLORATORY] in your report
-4. If Playwright MCP is not available: skip this section silently
-Note: Exploratory findings are additive — they do not replace scripted test results.
-## Report Format
-For each bug found:
-- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
-  - **Reproduction**: {exact steps to reproduce}
-  - **Expected**: {what should happen}
-  - **Actual**: {what actually happens}
-  - **Proof**: {test file or command that demonstrates the bug}
-Summary:
-- BUGS FOUND: {count} (with severity breakdown)
-- COVERAGE GAPS: {untested flows from requirements}
-- SHALLOW TESTS REWRITTEN: {count}
-- CONTRACTS VERIFIED: {N}/{total}
-- ATTACK VECTORS TRIED: {list every category attempted and results}
-- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
-Write all findings to .gsd-t/red-team-report.md.
-If bugs found, also append to .gsd-t/qa-issues.md."
+RT_PROMPT="$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"
+[ -f "$RT_PROMPT" ] || RT_PROMPT="templates/prompts/red-team-subagent.md"
+T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
 ```
+Spawn Task subagent (general-purpose, model: opus):
+> "Read `$RT_PROMPT` and follow it. Context for this run: quick task — adversarial validation of the code just changed. Write findings to `.gsd-t/red-team-report.md`."
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+```
+T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
+COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")
+```
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-quick | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`| {DT_START} | {DT_END} | gsd-t-quick | Red Team | opus | {DURATION}s | {VERDICT} — {N} bugs found | | | {COUNTER} |`
 **If Red Team VERDICT is FAIL:**
-1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+1. Fix all CRITICAL and HIGH bugs (up to 2 fix cycles)
 2. Re-run Red Team after fixes
-3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+3. If bugs persist, log to `.gsd-t/deferred-items.md` and present to user
-**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
+**If GRUDGING PASS:** Proceed to doc-ripple.
 ## Step 6: Doc-Ripple (Automated)

package/commands/gsd-t-reflect.md CHANGED Viewed

@@ -9,7 +9,7 @@ When invoked directly by the user, spawn yourself as a Task subagent for a fresh
 **OBSERVABILITY LOGGING — before spawning:**
 Run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 ```
 Task subagent (general-purpose, model: sonnet):
@@ -21,14 +21,14 @@ Skip Step 0 — you are already the subagent."
 **OBSERVABILITY LOGGING — after subagent returns:**
 Run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 Compute tokens:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-reflect | Step 0 | sonnet | {DURATION}s | retrospective generated | {TOKENS} | {COMPACTED} |`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-reflect | Step 0 | sonnet | {DURATION}s | retrospective generated | {COUNTER} |`
 Return the subagent's output and stop. Only skip Step 0 if you are already running as a subagent.

package/commands/gsd-t-verify.md CHANGED Viewed

@@ -158,15 +158,12 @@ Teammate assignments:
 Lead: After receiving teammate reports:
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 Spawn a Task subagent to run the full test suite and contract audit.
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {COUNTER} |`
 Collect all reports, synthesize, create remediation plan.
 ```
@@ -348,7 +345,7 @@ If status is VERIFIED or VERIFIED-WITH-WARNINGS:
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
 ```
@@ -370,12 +367,9 @@ Report back: one-line status summary."
 ```
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | | | {COUNTER} |`
 3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.

package/commands/gsd-t-visualize.md CHANGED Viewed

@@ -9,7 +9,7 @@ When invoked directly by the user, spawn yourself as a Task subagent for a fresh
 **OBSERVABILITY LOGGING — before spawning:**
 Run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 ```
 Task subagent (general-purpose, model: sonnet):
@@ -21,14 +21,14 @@ Skip Step 0 — you are already the subagent."
 **OBSERVABILITY LOGGING — after subagent returns:**
 Run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 Compute tokens:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-visualize | Step 0 | sonnet | {DURATION}s | dashboard launched | {TOKENS} | {COMPACTED} |`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-visualize | Step 0 | sonnet | {DURATION}s | dashboard launched | {COUNTER} |`
 Return the subagent's output and stop. Only skip Step 0 if you are already running as a subagent.

package/commands/gsd-t-wave.md CHANGED Viewed

@@ -2,6 +2,16 @@
 You are the wave orchestrator. You do NOT execute phases yourself. Instead, you spawn an **independent agent for each phase**, giving each a fresh context window. This eliminates context accumulation across phases and prevents mid-wave compaction.
+## Step 0: Reset Phase-Count Gate (MANDATORY — first thing in a fresh session)
+Run via Bash:
+```bash
+node bin/task-counter.cjs reset
+```
+This clears `.gsd-t/.task-counter` so the new wave session starts at 0. The gate logic is in the Phase Agent Spawn Pattern below — it forces a /clear-and-resume after N phase spawns to prevent the wave orchestrator from itself running out of context. Default N=5, override per-project via `.gsd-t/task-counter-config.json` (`{"limit":8}`) or env `GSD_T_TASK_LIMIT=8`.
 ## Step 1: Load State (Lightweight)
 Read ONLY:
@@ -93,7 +103,7 @@ Run via Bash:
 **OBSERVABILITY LOGGING (MANDATORY) — repeat for every phase spawn:**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 ```
 Task agent (subagent_type: "general-purpose", mode: "bypassPermissions"):
@@ -114,28 +124,31 @@ Task agent (subagent_type: "general-purpose", mode: "bypassPermissions"):
 ```
 After phase agent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Compute context utilization — run via Bash:
-`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-Alert on context thresholds (display to user inline):
-- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
-- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
-**Orchestrator Context Self-Check (MANDATORY):**
-After EVERY phase agent returns, check the wave orchestrator's own context:
-- **If CTX_PCT >= 70:**
-  1. Save checkpoint to `.gsd-t/progress.md` — record which phases are complete, which remain
-  2. Output: `⚠️ Wave orchestrator context at {CTX_PCT}% — approaching limit. Progress saved. Run /clear then /user:gsd-t-wave to continue from the next phase.`
-  3. **STOP the wave loop.** Do NOT spawn the next phase agent. The next session resumes from saved state.
-- **If CTX_PCT < 70:** Continue to next phase.
-This prevents the wave orchestrator from running out of context mid-wave.
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+**Wave Orchestrator Phase-Count Gate (MANDATORY) — replaces the broken context-percent check from v2.74.x:**
+Run via Bash AFTER each phase agent returns:
+```bash
+node bin/task-counter.cjs increment phase
+```
+Read the JSON status the command prints. If `should_stop` is `true` (or the command's exit code is `10`):
+1. Save checkpoint to `.gsd-t/progress.md` — record which phases are complete, which remain.
+2. Output exactly: `⏸️ Wave orchestrator phase-count gate reached ({count}/{limit} phases in this session). Progress saved. Run /clear then /user:gsd-t-wave to continue from the next phase.`
+3. **STOP the wave loop.** Do NOT spawn the next phase agent. The next session resumes from saved state.
+The wave orchestrator shares the same `bin/task-counter.cjs` counter as the execute orchestrator. Each phase spawn (PARTITION, DISCUSS, PLAN, IMPACT, EXECUTE, TEST-SYNC, INTEGRATE, VERIFY+COMPLETE, DOC-RIPPLE) increments the counter by 1. With the default limit of 5, a wave will run at most 5 phase agents per session before forcing a /clear-and-resume — typically two sessions per full wave. Override via `.gsd-t/task-counter-config.json` (`{"limit":8}`) or `GSD_T_TASK_LIMIT=8`.
+The previous version of this gate relied on `CLAUDE_CONTEXT_TOKENS_USED`/`_MAX` env vars which Claude Code does not export — that check was inert and let the orchestrator drain context until forced compaction. The deterministic on-disk counter has the same intent (force a /clear before context runs out) but actually works.
+**On wave entry**, the wave orchestrator runs `node bin/task-counter.cjs reset` exactly once (see Step 0 — it is the very first thing the wave does in a fresh session). The reset is the SIGNAL that this is a clean post-/clear session.
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | | | {COUNTER} |`
+Where `{COUNTER}` is the `count` field from the JSON the increment command just printed.
 ### Phase Sequence

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@tekyzinc/gsd-t",
-  "version": "2.74.10",
+  "version": "2.74.12",
   "description": "GSD-T: Contract-Driven Development for Claude Code — 56 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
   "author": "Tekyz, Inc.",
   "license": "MIT",

package/templates/prompts/README.md ADDED Viewed

@@ -0,0 +1,30 @@
+# GSD-T Subagent Prompt Templates
+This directory holds long-form prompts that used to be inlined into command markdown files. Inlining them caused massive context burn — each Task subagent spawn re-materialized ~3000 tokens of prompt boilerplate, dozens of times per milestone.
+Now command files reference these by path. The orchestrator passes the file path to the subagent, the subagent reads it itself. The orchestrator never holds the full prompt in its own context.
+## Files
+| File | Purpose | Run frequency |
+|------|---------|---------------|
+| `qa-subagent.md` | Test generation, execution, gap reporting | Per task |
+| `red-team-subagent.md` | Adversarial bug hunting | Per domain (NOT per task) |
+| `design-verify-subagent.md` | Visual element-by-element design audit | Per domain (NOT per task) |
+## Why per-domain instead of per-task
+Red Team and Design Verification were originally per-domain. They were promoted to per-task by commits `da6d3ae` and `b68353e`, on the assumption that the orchestrator's `CLAUDE_CONTEXT_TOKENS_USED` self-check would catch context drain before it got bad. That env var is never set by Claude Code — the self-check was vaporware. With it inert, per-task spawning of ~10k-token Red Team subagents drained sessions in 5-10 tasks. Reverting them to per-domain raises the safe task count from ~5 to ~15+.
+QA stays per-task because (a) it's much smaller, (b) it grounds against contracts which can drift task by task.
+## Adding a new prompt
+Write the prompt as a self-contained markdown file in this directory. Reference it from the command file with:
+```
+Spawn Task subagent (general-purpose, model: <model>):
+"Read `templates/prompts/<your-prompt>.md` and follow it. Context for this run: <one-line context>."
+```
+Keep the inline context to one line. The prompt body must live in the file, not the command.