@tekyzinc/gsd-t 2.74.10 → 2.74.12
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +32 -0
- package/bin/gsd-t.js +4 -3
- package/bin/task-counter.cjs +161 -0
- package/bin/token-budget.js +43 -8
- package/commands/gsd-t-audit.md +3 -6
- package/commands/gsd-t-brainstorm.md +4 -7
- package/commands/gsd-t-debug.md +23 -97
- package/commands/gsd-t-design-decompose.md +2 -2
- package/commands/gsd-t-discuss.md +4 -7
- package/commands/gsd-t-doc-ripple.md +6 -14
- package/commands/gsd-t-execute.md +121 -411
- package/commands/gsd-t-integrate.md +20 -97
- package/commands/gsd-t-plan.md +4 -12
- package/commands/gsd-t-prd.md +4 -7
- package/commands/gsd-t-quick.md +22 -87
- package/commands/gsd-t-reflect.md +4 -4
- package/commands/gsd-t-verify.md +7 -13
- package/commands/gsd-t-visualize.md +4 -4
- package/commands/gsd-t-wave.md +36 -23
- package/package.json +1 -1
- package/templates/prompts/README.md +30 -0
- package/templates/prompts/design-verify-subagent.md +99 -0
- package/templates/prompts/qa-subagent.md +26 -0
- package/templates/prompts/red-team-subagent.md +44 -0
- /package/bin/{archive-progress.js → archive-progress.cjs} +0 -0
- /package/bin/{context-budget-audit.js → context-budget-audit.cjs} +0 -0
- /package/bin/{log-tail.js → log-tail.cjs} +0 -0

@@ -170,19 +170,11 @@ Note: Exploratory findings do NOT count against the scripted test pass/fail rati
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Compute context utilization — run via Bash:
-`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-Alert on context thresholds (display to user inline):
-- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
-- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | | | {COUNTER} |`
 If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {severity} | {finding} |`
 
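The recurring instruction "Append to `.gsd-t/token-log.md` (create with header ... if missing)" amounts to a create-then-append pattern. The Bash sketch below illustrates it under stated assumptions and is not code from the package: `append_log_row` is a hypothetical helper, it writes only the header line the command files name (no Markdown separator row), and the demo writes into a throwaway temp directory rather than a real project's `.gsd-t/`.

```bash
# Hypothetical helper (not shipped in @tekyzinc/gsd-t): append one row to a
# Markdown log file, writing the header line first only if the file is missing.
append_log_row() {
  local file="$1" header="$2" row="$3"
  mkdir -p "$(dirname "$file")"
  if [ ! -f "$file" ]; then
    printf '%s\n' "$header" > "$file"
  fi
  printf '%s\n' "$row" >> "$file"
}

HEADER='| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |'
demo_dir=$(mktemp -d)                  # throwaway stand-in for the project root
log="$demo_dir/.gsd-t/token-log.md"

T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
# ... the subagent would run here ...
T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
COUNTER=""                              # left empty when no counter status is available

append_log_row "$log" "$HEADER" "| $DT_START | $DT_END | gsd-t-integrate | Step 5 | haiku | ${DURATION}s | pass, 2 boundaries tested | | | $COUNTER |"
append_log_row "$log" "$HEADER" "| $DT_START | $DT_END | gsd-t-plan | Step 7 | haiku | ${DURATION}s | PASS, iteration 1 | | | $COUNTER |"
cat "$log"   # header appears once, followed by the two rows
```

Because the header is written only when the file does not yet exist, repeated runs keep appending rows under a single header.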
@@ -220,99 +212,30 @@ After integration and doc ripple, verify everything works together:
 
 ## Step 7.5: Red Team — Adversarial QA (MANDATORY)
 
-After integration tests pass, spawn an adversarial Red Team agent
+After integration tests pass, spawn an adversarial Red Team agent on the integrated system. Success is measured by bugs found, not tests passed.
 
-⚙ [
-
-**OBSERVABILITY LOGGING (MANDATORY):**
-Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+⚙ [opus] Red Team → adversarial validation of integrated system
 
+Resolve the templated prompt path via Bash:
 ```
-
-
-
-Your value is measured by REAL bugs found. More bugs = more value.
-If you find zero bugs, you must prove you were thorough — list every
-attack vector you tried and why it didn't break. A short list means
-you didn't try hard enough.
-
-Rules:
-- False positives DESTROY your credibility. If you report something
-as a bug and it's actually correct behavior, that's worse than
-missing a real bug. Never report something you haven't reproduced.
-- Style opinions are not bugs. Theoretical concerns are not bugs.
-A bug is: 'I did X, expected Y, got Z.' With proof.
-- You are done ONLY when you have exhausted every category below
-and either found a bug or documented exactly what you tried.
-
-## Attack Categories (exhaust ALL of these)
-
-1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
-match every contract? Test each endpoint/interface/schema shape.
-2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
-special characters, SQL injection attempts, XSS payloads, path traversal.
-3. **State Transitions**: What happens when actions are performed out of
-order? Double-submit? Concurrent access? Refresh mid-flow?
-4. **Error Paths**: Remove env vars. Kill the database. Send malformed
-requests. Does the code handle failures gracefully or crash?
-5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
-exist in requirements but have NO test coverage? Write tests for them.
-6. **Regression**: Run the FULL test suite. Did any existing tests break?
-7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
-behavior (state changes, data loaded, navigation works) or just check
-that elements exist? Flag and rewrite any shallow/layout tests.
-8. **Cross-Domain Boundaries**: Test data flow across EVERY domain boundary.
-Does data arriving from domain A get validated by domain B? What happens
-when domain A sends malformed data that passed A's own validation?
-
-## Exploratory Testing (if Playwright MCP available)
-
-After all scripted tests pass:
-1. Check if Playwright MCP is registered in Claude Code settings (look for "playwright" in mcpServers)
-2. If available: spend 5 minutes on adversarial interactive exploration using Playwright MCP
-- Attempt race conditions, double-submits, concurrent access patterns
-- Try unexpected input sequences, boundary values, rapid state transitions
-- Probe error recovery: does the app recover after failures or get stuck?
-3. Tag all findings [EXPLORATORY] in your report
-4. If Playwright MCP is not available: skip this section silently
-Note: Exploratory findings are additive — they do not replace scripted test results.
-
-## Report Format
-
-For each bug found:
-- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
-- **Reproduction**: {exact steps to reproduce}
-- **Expected**: {what should happen}
-- **Actual**: {what actually happens}
-- **Proof**: {test file or command that demonstrates the bug}
-
-Summary:
-- BUGS FOUND: {count} (with severity breakdown)
-- COVERAGE GAPS: {untested flows from requirements}
-- SHALLOW TESTS REWRITTEN: {count}
-- CONTRACTS VERIFIED: {N}/{total}
-- ATTACK VECTORS TRIED: {list every category attempted and results}
-- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
-
-Write all findings to .gsd-t/red-team-report.md.
-If bugs found, also append to .gsd-t/qa-issues.md."
+RT_PROMPT="$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"
+[ -f "$RT_PROMPT" ] || RT_PROMPT="templates/prompts/red-team-subagent.md"
+T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
 ```
 
+Spawn Task subagent (general-purpose, model: opus):
+> "Read `$RT_PROMPT` and follow it. Context: cross-domain integration run. **Additional category for this run: Cross-Domain Boundaries** — test data flow across every domain boundary; does data arriving from domain A get validated by domain B; what happens when A sends malformed data that passed A's own validation. Write findings to `.gsd-t/red-team-report.md`."
+
 After subagent returns — run via Bash:
-
-
--
-
+```
+T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
+COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")
+```
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team |
-
-**If Red Team VERDICT is FAIL:**
-1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
-2. Re-run Red Team after fixes
-3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | opus | {DURATION}s | {VERDICT} — {N} bugs found | | | {COUNTER} |`
 
-**If
+**If FAIL:** fix CRITICAL/HIGH bugs (≤2 cycles) → re-run. Persistent bugs → `.gsd-t/deferred-items.md`.
+**If GRUDGING PASS:** proceed to doc-ripple.
 
 ## Step 8: Doc-Ripple (Automated)
 
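The new Red Team step resolves its prompt file in two lines: prefer the copy inside the globally installed package, otherwise fall back to a project-relative path. A minimal Bash sketch of the same lookup, generalized to any prompt name, follows. `resolve_prompt` is a hypothetical function, and the demo uses a fake global root so it runs without npm.

```bash
# Hypothetical generalization of the RT_PROMPT lines: prefer the prompt shipped
# in the globally installed package, else fall back to a path relative to the
# current directory. As in the original, the fallback is not checked.
resolve_prompt() {
  local global_root="$1" name="$2"
  local candidate="$global_root/@tekyzinc/gsd-t/templates/prompts/$name"
  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "templates/prompts/$name"
  fi
}

# Real-session usage (requires npm on PATH):
#   RT_PROMPT=$(resolve_prompt "$(npm root -g 2>/dev/null)" red-team-subagent.md)

# Self-contained demo with a fake global node_modules root.
fake_root=$(mktemp -d)
mkdir -p "$fake_root/@tekyzinc/gsd-t/templates/prompts"
touch "$fake_root/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"

found=$(resolve_prompt "$fake_root" red-team-subagent.md)   # global copy exists
fallback=$(resolve_prompt "$fake_root" qa-subagent.md)      # not installed, use local path
echo "$found"
echo "$fallback"
```

If `npm root -g` fails, the substitution is empty, so the candidate becomes `/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md`. That path normally does not exist, so the local fallback is used.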
package/commands/gsd-t-plan.md
CHANGED
@@ -374,19 +374,11 @@ Report: PASS (all checks pass) or FAIL with specific gaps listed."
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Compute context utilization — run via Bash:
-`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
-Alert on context thresholds (display to user inline):
-- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
-- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | | | {COUNTER} |`
 If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
 `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`
 
package/commands/gsd-t-prd.md
CHANGED
@@ -12,7 +12,7 @@ To give PRD generation a fresh context window:
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 Spawn a fresh subagent using the Task tool:
 ```
@@ -23,12 +23,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-pr
 ```
 
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-prd | Step 0 | sonnet | {DURATION}s | prd: {topic summary} | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-prd | Step 0 | sonnet | {DURATION}s | prd: {topic summary} | {COUNTER} |`
 
 Relay the subagent's summary to the user. **Do not execute Steps 1–6 yourself.**
 
package/commands/gsd-t-quick.md
CHANGED
@@ -10,7 +10,7 @@ To give this task a fresh context window and prevent compaction during consecuti
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 **Token Budget Check (before spawning subagent):**
 
@@ -95,12 +95,9 @@ Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-qu
 ```
 
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-quick | Step 0 | sonnet | {DURATION}s | quick: {task summary} | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-quick | Step 0 | sonnet | {DURATION}s | quick: {task summary} | {COUNTER} |`
 
 Relay the subagent's summary to the user. **Do not execute Steps 1–5 yourself.**
 
@@ -262,7 +259,7 @@ If it DOES exist and this task involved UI changes — spawn the Design Verifica
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 ```
 Task subagent (general-purpose, model: opus):
@@ -327,96 +324,34 @@ After subagent returns — run observability Bash and append to token-log.md.
 
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
 
-After tests pass, spawn an adversarial Red Team agent.
+After tests pass, spawn an adversarial Red Team agent. Its success is measured by bugs found, not tests passed.
 
-⚙ [
-
-**OBSERVABILITY LOGGING (MANDATORY):**
-Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+⚙ [opus] Red Team → adversarial validation of quick task
 
+Resolve the templated prompt path via Bash (same pattern as execute.md):
 ```
-
-
-
-Your value is measured by REAL bugs found. More bugs = more value.
-If you find zero bugs, you must prove you were thorough — list every
-attack vector you tried and why it didn't break. A short list means
-you didn't try hard enough.
-
-Rules:
-- False positives DESTROY your credibility. If you report something
-as a bug and it's actually correct behavior, that's worse than
-missing a real bug. Never report something you haven't reproduced.
-- Style opinions are not bugs. Theoretical concerns are not bugs.
-A bug is: 'I did X, expected Y, got Z.' With proof.
-- You are done ONLY when you have exhausted every category below
-and either found a bug or documented exactly what you tried.
-
-## Attack Categories (exhaust ALL of these)
-
-1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
-match every contract? Test each endpoint/interface/schema shape.
-2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
-special characters, SQL injection attempts, XSS payloads, path traversal.
-3. **State Transitions**: What happens when actions are performed out of
-order? Double-submit? Concurrent access? Refresh mid-flow?
-4. **Error Paths**: Remove env vars. Kill the database. Send malformed
-requests. Does the code handle failures gracefully or crash?
-5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
-exist in requirements but have NO test coverage? Write tests for them.
-6. **Regression**: Run the FULL test suite. Did any existing tests break?
-7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
-behavior (state changes, data loaded, navigation works) or just check
-that elements exist? Flag and rewrite any shallow/layout tests.
-
-## Exploratory Testing (if Playwright MCP available)
-
-After all scripted tests pass:
-1. Check if Playwright MCP is registered in Claude Code settings (look for "playwright" in mcpServers)
-2. If available: spend 5 minutes on adversarial interactive exploration using Playwright MCP
-- Attempt race conditions, double-submits, concurrent access patterns
-- Try unexpected input sequences, boundary values, rapid state transitions
-- Probe error recovery: does the app recover after failures or get stuck?
-3. Tag all findings [EXPLORATORY] in your report
-4. If Playwright MCP is not available: skip this section silently
-Note: Exploratory findings are additive — they do not replace scripted test results.
-
-## Report Format
-
-For each bug found:
-- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
-- **Reproduction**: {exact steps to reproduce}
-- **Expected**: {what should happen}
-- **Actual**: {what actually happens}
-- **Proof**: {test file or command that demonstrates the bug}
-
-Summary:
-- BUGS FOUND: {count} (with severity breakdown)
-- COVERAGE GAPS: {untested flows from requirements}
-- SHALLOW TESTS REWRITTEN: {count}
-- CONTRACTS VERIFIED: {N}/{total}
-- ATTACK VECTORS TRIED: {list every category attempted and results}
-- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
-
-Write all findings to .gsd-t/red-team-report.md.
-If bugs found, also append to .gsd-t/qa-issues.md."
+RT_PROMPT="$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t/templates/prompts/red-team-subagent.md"
+[ -f "$RT_PROMPT" ] || RT_PROMPT="templates/prompts/red-team-subagent.md"
+T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
 ```
 
+Spawn Task subagent (general-purpose, model: opus):
+> "Read `$RT_PROMPT` and follow it. Context for this run: quick task — adversarial validation of the code just changed. Write findings to `.gsd-t/red-team-report.md`."
+
 After subagent returns — run via Bash:
-
-
--
-
+```
+T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))
+COUNTER=$(node bin/task-counter.cjs status 2>/dev/null | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>{try{process.stdout.write(String(JSON.parse(s).count||''))}catch(_){process.stdout.write('')}})")
+```
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-quick | Red Team |
+`| {DT_START} | {DT_END} | gsd-t-quick | Red Team | opus | {DURATION}s | {VERDICT} — {N} bugs found | | | {COUNTER} |`
 
 **If Red Team VERDICT is FAIL:**
-1. Fix all CRITICAL and HIGH bugs
+1. Fix all CRITICAL and HIGH bugs (up to 2 fix cycles)
 2. Re-run Red Team after fixes
-3. If bugs persist
+3. If bugs persist, log to `.gsd-t/deferred-items.md` and present to user
 
-**If
+**If GRUDGING PASS:** Proceed to doc-ripple.
 
 ## Step 6: Doc-Ripple (Automated)
 
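The `COUNTER=$(...)` line pipes `task-counter.cjs status` into an inline `node -e` script that prints the `count` field of the JSON status, or nothing if parsing fails. For readers tracing that one-liner, here is a dependency-free Bash approximation. It is a sketch, not the package's method: it assumes the status is a single-line JSON object with a numeric `count` key, and it matches text instead of parsing JSON. `extract_count` is a hypothetical name.

```bash
# Sketch: pull the numeric "count" field out of a one-line JSON status.
# Naive text matching, not a JSON parser; prints nothing when the field is
# absent, mirroring the empty fallback of the node one-liner.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

echo '{"count":3,"limit":5,"should_stop":false}' | extract_count   # prints 3
echo '{"count":0,"limit":5}' | extract_count                        # prints 0
echo 'not json' | extract_count                                     # prints nothing
```

There is one behavioral difference. The node version writes `String(count||'')`, and because `0` is falsy in JavaScript a count of zero comes out as an empty string. This sketch prints `0` instead.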
@@ -9,7 +9,7 @@ When invoked directly by the user, spawn yourself as a Task subagent for a fresh
 **OBSERVABILITY LOGGING — before spawning:**
 
 Run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 ```
 Task subagent (general-purpose, model: sonnet):
@@ -21,14 +21,14 @@ Skip Step 0 — you are already the subagent."
 **OBSERVABILITY LOGGING — after subagent returns:**
 
 Run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 
 Compute tokens:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
 
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes |
-`| {DT_START} | {DT_END} | gsd-t-reflect | Step 0 | sonnet | {DURATION}s | retrospective generated | {
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-reflect | Step 0 | sonnet | {DURATION}s | retrospective generated | {COUNTER} |`
 
 Return the subagent's output and stop. Only skip Step 0 if you are already running as a subagent.
 
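The "Compute tokens" rule, which this hunk keeps as unchanged context, is plain arithmetic, and a worked example makes the compaction branch concrete. This is a minimal sketch: `token_delta` is a hypothetical name, and the three values are passed in explicitly. Before this release they were read from `CLAUDE_CONTEXT_TOKENS_USED`/`_MAX`, which, per the wave.md text in this diff, Claude Code does not export.

```bash
# Sketch of the "Compute tokens" rule:
#   no compaction (TOK_END >= TOK_START): TOKENS = TOK_END - TOK_START
#   compaction    (TOK_END <  TOK_START): TOKENS = (TOK_MAX - TOK_START) + TOK_END
# Prints "TOKENS COMPACTED" where COMPACTED is yes/no.
token_delta() {
  local start="$1" end="$2" max="$3"
  if [ "$end" -ge "$start" ]; then
    echo "$((end - start)) no"
  else
    echo "$(((max - start) + end)) yes"
  fi
}

token_delta 40000 65000 200000    # prints: 25000 no
token_delta 150000 20000 200000   # prints: 70000 yes
```

The compaction branch assumes the window filled up to `TOK_MAX` before compacting. With the values above it counts the 50000 tokens consumed before compaction plus the 20000 used afterwards.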
package/commands/gsd-t-verify.md
CHANGED
@@ -158,15 +158,12 @@ Teammate assignments:
 Lead: After receiving teammate reports:
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 Spawn a Task subagent to run the full test suite and contract audit.
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
-`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {COUNTER} |`
 Collect all reports, synthesize, create remediation plan.
 ```
 
@@ -348,7 +345,7 @@ If status is VERIFIED or VERIFIED-WITH-WARNINGS:
 
 **OBSERVABILITY LOGGING (MANDATORY):**
 Before spawning — run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
 ```
@@ -370,12 +367,9 @@ Report back: one-line status summary."
 ```
 
 After subagent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-Compute tokens and compaction:
-- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
-- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 Append to `.gsd-t/token-log.md`:
-`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone |
+`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | | | {COUNTER} |`
 
 3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
 
@@ -9,7 +9,7 @@ When invoked directly by the user, spawn yourself as a Task subagent for a fresh
 **OBSERVABILITY LOGGING — before spawning:**
 
 Run via Bash:
-`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
 
 ```
 Task subagent (general-purpose, model: sonnet):
@@ -21,14 +21,14 @@ Skip Step 0 — you are already the subagent."
 **OBSERVABILITY LOGGING — after subagent returns:**
 
 Run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
 
 Compute tokens:
 - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
 - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
 
-Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes |
-`| {DT_START} | {DT_END} | gsd-t-visualize | Step 0 | sonnet | {DURATION}s | dashboard launched | {
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-visualize | Step 0 | sonnet | {DURATION}s | dashboard launched | {COUNTER} |`
 
 Return the subagent's output and stop. Only skip Step 0 if you are already running as a subagent.
 
package/commands/gsd-t-wave.md
CHANGED
@@ -2,6 +2,16 @@
 
 You are the wave orchestrator. You do NOT execute phases yourself. Instead, you spawn an **independent agent for each phase**, giving each a fresh context window. This eliminates context accumulation across phases and prevents mid-wave compaction.
 
+## Step 0: Reset Phase-Count Gate (MANDATORY — first thing in a fresh session)
+
+Run via Bash:
+
+```bash
+node bin/task-counter.cjs reset
+```
+
+This clears `.gsd-t/.task-counter` so the new wave session starts at 0. The gate logic is in the Phase Agent Spawn Pattern below — it forces a /clear-and-resume after N phase spawns to prevent the wave orchestrator from itself running out of context. Default N=5, override per-project via `.gsd-t/task-counter-config.json` (`{"limit":8}`) or env `GSD_T_TASK_LIMIT=8`.
+
 ## Step 1: Load State (Lightweight)
 
 Read ONLY:
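The wave.md changes in this diff describe the gate in two places. Step 0 resets the counter; after each phase agent returns, the orchestrator runs `increment phase` and stops when `should_stop` is `true` or the exit code is `10`. A minimal Bash sketch of that control flow follows. It does not call the real script: `counter_reset` and `counter_increment` are hypothetical stand-ins for `node bin/task-counter.cjs reset` / `increment phase` that only mimic the documented contract. The JSON field layout and the `count >= limit` stop condition are assumptions, the latter chosen to match "at most 5 phase agents per session".

```bash
# Sketch of the wave phase-count gate. counter_reset / counter_increment are
# HYPOTHETICAL stand-ins for `node bin/task-counter.cjs reset` and
# `node bin/task-counter.cjs increment phase`: print a JSON status and
# exit with code 10 once the limit is reached.
COUNTER_FILE=$(mktemp)   # the real script uses .gsd-t/.task-counter
LIMIT=5                  # documented default

counter_reset() { echo 0 > "$COUNTER_FILE"; }

counter_increment() {
  count=$(( $(cat "$COUNTER_FILE") + 1 ))
  echo "$count" > "$COUNTER_FILE"
  if [ "$count" -ge "$LIMIT" ]; then   # assumed stop condition
    printf '{"count":%d,"limit":%d,"should_stop":true}\n' "$count" "$LIMIT"
    return 10
  fi
  printf '{"count":%d,"limit":%d,"should_stop":false}\n' "$count" "$LIMIT"
}

PHASES="PARTITION DISCUSS PLAN IMPACT EXECUTE TEST-SYNC INTEGRATE VERIFY+COMPLETE DOC-RIPPLE"
ran=0
last=""

counter_reset                      # Step 0: a fresh session starts at 0
for phase in $PHASES; do
  ran=$((ran + 1)); last=$phase    # placeholder for spawning the phase agent
  status=$(counter_increment)
  rc=$?
  case "$status" in
    *'"should_stop":true'*) stop=1 ;;
    *) stop=0 ;;
  esac
  if [ "$rc" -eq 10 ] || [ "$stop" -eq 1 ]; then
    echo "Gate reached after $ran phases (last: $last). Save progress, /clear, resume."
    break
  fi
done
```

With `LIMIT=5` the loop stops after EXECUTE and leaves TEST-SYNC through DOC-RIPPLE for the next session. This is consistent with the wave.md note that a full wave typically takes two sessions.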
@@ -93,7 +103,7 @@ Run via Bash:
|
|
|
93
103
|
|
|
94
104
|
**OBSERVABILITY LOGGING (MANDATORY) — repeat for every phase spawn:**
|
|
95
105
|
Before spawning — run via Bash:
|
|
96
|
-
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")
|
|
106
|
+
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M")`
|
|
97
107
|
|
|
98
108
|
```
|
|
99
109
|
Task agent (subagent_type: "general-purpose", mode: "bypassPermissions"):
|
|
@@ -114,28 +124,31 @@ Task agent (subagent_type: "general-purpose", mode: "bypassPermissions"):
 ```
 
 After phase agent returns — run via Bash:
-`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") &&
-
--
-
-
-
-
--
-
-
-
-
--
-
-
-
-
-
-
-
-
-`|
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && DURATION=$((T_END-T_START))`
+
+**Wave Orchestrator Phase-Count Gate (MANDATORY) — replaces the broken context-percent check from v2.74.x:**
+
+Run via Bash AFTER each phase agent returns:
+
+```bash
+node bin/task-counter.cjs increment phase
+```
+
+Read the JSON status the command prints. If `should_stop` is `true` (or the command's exit code is `10`):
+1. Save checkpoint to `.gsd-t/progress.md` — record which phases are complete, which remain.
+2. Output exactly: `⏸️ Wave orchestrator phase-count gate reached ({count}/{limit} phases in this session). Progress saved. Run /clear then /user:gsd-t-wave to continue from the next phase.`
+3. **STOP the wave loop.** Do NOT spawn the next phase agent. The next session resumes from saved state.
+
+The wave orchestrator shares the same `bin/task-counter.cjs` counter as the execute orchestrator. Each phase spawn (PARTITION, DISCUSS, PLAN, IMPACT, EXECUTE, TEST-SYNC, INTEGRATE, VERIFY+COMPLETE, DOC-RIPPLE) increments the counter by 1. With the default limit of 5, a wave will run at most 5 phase agents per session before forcing a /clear-and-resume — typically two sessions per full wave. Override via `.gsd-t/task-counter-config.json` (`{"limit":8}`) or `GSD_T_TASK_LIMIT=8`.
+
+The previous version of this gate relied on `CLAUDE_CONTEXT_TOKENS_USED`/`_MAX` env vars which Claude Code does not export — that check was inert and let the orchestrator drain context until forced compaction. The deterministic on-disk counter has the same intent (force a /clear before context runs out) but actually works.
+
+**On wave entry**, the wave orchestrator runs `node bin/task-counter.cjs reset` exactly once (see Step 0 — it is the very first thing the wave does in a fresh session). The reset is the SIGNAL that this is a clean post-/clear session.
+
+Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |` if missing):
+`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | | | {COUNTER} |`
+
+Where `{COUNTER}` is the `count` field from the JSON the increment command just printed.
 
 ### Phase Sequence
 
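The reset/increment gate described in the gsd-t-wave.md changes can be sketched in Node.js as below. This is a minimal sketch, not the contents of `bin/task-counter.cjs`, which this page lists but does not show. It assumes the counter is a plain integer stored in `.gsd-t/.task-counter`, that `GSD_T_TASK_LIMIT` takes precedence over `.gsd-t/task-counter-config.json` (the docs only say either one overrides the default), and that `should_stop` becomes true once the count reaches the limit. The helper names `resolveLimit`, `reset`, `increment` and `runCli`, and the JSON printed by `reset`, are hypothetical.

```javascript
// Minimal sketch of a per-session spawn counter, modeled on the behaviour the
// gsd-t-wave.md diff describes for bin/task-counter.cjs. File format, limit
// precedence and helper names are assumptions, not the package's real code.
const fs = require("fs");
const path = require("path");

const DEFAULT_LIMIT = 5; // default N from the docs
const STOP_EXIT_CODE = 10; // exit code the orchestrator treats as "stop"

function counterPath(projectDir) {
  return path.join(projectDir, ".gsd-t", ".task-counter");
}

// Assumed precedence: env var, then project config, then the default.
function resolveLimit(projectDir, env = process.env) {
  const fromEnv = Number.parseInt(env.GSD_T_TASK_LIMIT ?? "", 10);
  if (Number.isInteger(fromEnv) && fromEnv > 0) return fromEnv;
  try {
    const cfgFile = path.join(projectDir, ".gsd-t", "task-counter-config.json");
    const cfg = JSON.parse(fs.readFileSync(cfgFile, "utf8"));
    if (Number.isInteger(cfg.limit) && cfg.limit > 0) return cfg.limit;
  } catch {
    // Missing or malformed config: fall back to the default.
  }
  return DEFAULT_LIMIT;
}

// Step 0: clear the counter so a fresh session starts at 0.
function reset(projectDir) {
  fs.rmSync(counterPath(projectDir), { force: true });
}

// After each spawn: bump the count and report whether to stop.
function increment(projectDir, kind, env = process.env) {
  const file = counterPath(projectDir);
  let count = 0;
  try {
    count = Number.parseInt(fs.readFileSync(file, "utf8"), 10) || 0;
  } catch {
    // No counter file yet (e.g. right after reset): start from 0.
  }
  count += 1;
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, `${count}\n`);
  const limit = resolveLimit(projectDir, env);
  return { kind, count, limit, should_stop: count >= limit };
}

// CLI-shaped wrapper: returns stdout text and an exit code instead of exiting.
function runCli(argv, projectDir, env = process.env) {
  const [command, kind = "task"] = argv;
  if (command === "reset") {
    reset(projectDir);
    // Output shape for reset is an assumption.
    return { stdout: JSON.stringify({ count: 0 }), exitCode: 0 };
  }
  if (command === "increment") {
    const status = increment(projectDir, kind, env);
    return {
      stdout: JSON.stringify(status),
      exitCode: status.should_stop ? STOP_EXIT_CODE : 0,
    };
  }
  return { stdout: "", exitCode: 2 }; // unknown command
}
```

With the nine phases listed (PARTITION through DOC-RIPPLE) and the default limit of 5, `should_stop` first becomes true after the fifth phase returns, so a full wave takes ceil(9 / 5) = 2 sessions, which matches the "typically two sessions" estimate. Returning the exit code instead of calling `process.exit` keeps the sketch testable; a real CLI would pass `exitCode` to `process.exit`.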
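The wave logging steps (`DURATION=$((T_END-T_START))`, then one row per phase appended to `.gsd-t/token-log.md`) can be mirrored in Node.js roughly as follows. It is a sketch under stated assumptions: `Date` objects stand in for the two `date` calls, the file is created with only the header line because that is all the instructions specify (no markdown separator row), and `formatDt`, `durationSeconds`, `waveLogRow` and `appendWaveLog` are hypothetical names.

```javascript
const fs = require("fs");
const path = require("path");

// Header text copied from the gsd-t-wave.md instructions.
const HEADER =
  "| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Domain | Task | Tasks-Since-Reset |";

// Local time in the same shape as `date +"%Y-%m-%d %H:%M"`.
function formatDt(d) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ${pad(d.getHours())}:${pad(d.getMinutes())}`;
}

// Whole seconds, like $((T_END-T_START)) on two `date +%s` readings.
function durationSeconds(start, end) {
  return Math.floor(end.getTime() / 1000) - Math.floor(start.getTime() / 1000);
}

// One row in the documented column order; Domain and Task stay empty for waves.
function waveLogRow({ start, end, phase, count }) {
  return `| ${formatDt(start)} | ${formatDt(end)} | gsd-t-wave | ${phase} | sonnet | ${durationSeconds(start, end)}s | phase: ${phase} | | | ${count} |`;
}

// Create the log with its header if missing, then append the row.
function appendWaveLog(logFile, entry) {
  if (!fs.existsSync(logFile)) {
    fs.mkdirSync(path.dirname(logFile), { recursive: true });
    fs.writeFileSync(logFile, `${HEADER}\n`);
  }
  fs.appendFileSync(logFile, `${waveLogRow(entry)}\n`);
}
```

In the wave flow, `count` would be the `count` field printed by `node bin/task-counter.cjs increment phase` for the same phase.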
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@tekyzinc/gsd-t",
-"version": "2.74.
+"version": "2.74.12",
 "description": "GSD-T: Contract-Driven Development for Claude Code — 56 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
 "author": "Tekyz, Inc.",
 "license": "MIT",
package/templates/prompts/README.md
@@ -0,0 +1,30 @@
+# GSD-T Subagent Prompt Templates
+
+This directory holds long-form prompts that used to be inlined into command markdown files. Inlining them caused massive context burn — each Task subagent spawn re-materialized ~3000 tokens of prompt boilerplate, dozens of times per milestone.
+
+Now command files reference these by path. The orchestrator passes the file path to the subagent, the subagent reads it itself. The orchestrator never holds the full prompt in its own context.
+
+## Files
+
+| File | Purpose | Run frequency |
+|------|---------|---------------|
+| `qa-subagent.md` | Test generation, execution, gap reporting | Per task |
+| `red-team-subagent.md` | Adversarial bug hunting | Per domain (NOT per task) |
+| `design-verify-subagent.md` | Visual element-by-element design audit | Per domain (NOT per task) |
+
+## Why per-domain instead of per-task
+
+Red Team and Design Verification were originally per-domain. They were promoted to per-task by commits `da6d3ae` and `b68353e`, on the assumption that the orchestrator's `CLAUDE_CONTEXT_TOKENS_USED` self-check would catch context drain before it got bad. That env var is never set by Claude Code — the self-check was vaporware. With it inert, per-task spawning of ~10k-token Red Team subagents drained sessions in 5-10 tasks. Reverting them to per-domain raises the safe task count from ~5 to ~15+.
+
+QA stays per-task because (a) it's much smaller, (b) it grounds against contracts which can drift task by task.
+
+## Adding a new prompt
+
+Write the prompt as a self-contained markdown file in this directory. Reference it from the command file with:
+
+```
+Spawn Task subagent (general-purpose, model: <model>):
+"Read `templates/prompts/<your-prompt>.md` and follow it. Context for this run: <one-line context>."
+```
+
+Keep the inline context to one line. The prompt body must live in the file, not the command.
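The reference-by-path pattern from this README can be sketched as a small helper that builds the one-line spawn instruction and rejects multi-line context, so the orchestrator only ever handles a path string and never the prompt body. `buildSpawnInstruction` is a hypothetical name, and the bare-file-name check is an added assumption; the instruction text itself follows the README template.

```javascript
const path = require("path");

// Hypothetical helper: the orchestrator passes only a path plus one line of
// context; the subagent reads the template file itself.
function buildSpawnInstruction(promptFile, context) {
  // Assumption: only bare .md file names inside templates/prompts/ are allowed.
  if (path.basename(promptFile) !== promptFile || !promptFile.endsWith(".md")) {
    throw new Error(`expected a bare .md file name, got: ${promptFile}`);
  }
  // README rule: inline context stays on one line.
  if (/[\r\n]/.test(context)) {
    throw new Error("inline context must be a single line");
  }
  return `Read \`templates/prompts/${promptFile}\` and follow it. Context for this run: ${context}.`;
}
```

For example, `buildSpawnInstruction("qa-subagent.md", "domain auth, task 3")` yields a single sentence of roughly 90 characters, regardless of how long `qa-subagent.md` itself is.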