@tekyzinc/gsd-t 2.50.11 → 2.51.10

package/CHANGELOG.md CHANGED
@@ -2,16 +2,34 @@
2
2
 
3
3
  All notable changes to GSD-T are documented here. Updated with each release.
4
4
 
5
- ## [2.50.10] - 2026-03-25
5
+ ## [2.51.10] - 2026-03-25
6
6
 
7
7
  ### Added
8
- - **18 new stack rule files**: python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright. Total: 22 stack rules (was 4).
8
+ - **Red Team Adversarial QA agent** added to `execute`, `quick`, `integrate`, and `debug` commands. Spawns after the builder's tests pass, with inverted incentives: success is measured by bugs found, not tests passed.
9
+ - **Exhaustive attack categories**: contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps, cross-domain boundaries (integrate only), fix regression variants (debug only).
10
+ - **False positive penalty**: reporting non-bugs destroys credibility, preventing phantom bug inflation.
11
+ - **VERDICT system**: `FAIL` (bugs found — blocks phase completion) or `GRUDGING PASS` (exhaustive search, nothing found — must prove thoroughness).
12
+ - **Red Team report**: findings written to `.gsd-t/red-team-report.md`; bugs appended to `.gsd-t/qa-issues.md`.
13
+ - Red Team documented in CLAUDE-global template, global CLAUDE.md, GSD-T-README wave diagram, README command table.
14
+
15
+ ## [2.50.12] - 2026-03-25
16
+
17
+ ### Added
18
+ - **23 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright, fastapi, llm (with RAG patterns section), prisma, queues, _auth (universal). Total: 27 stack rules (was 4).
19
+ - **`_auth.md`** (universal) — email-first registration, auth provider abstraction (Cognito/Firebase/Google), token management, password policy, session management, social auth/OAuth, email verification, MFA, authorization/RBAC, auth security, auth UI patterns.
20
+ - **`fastapi.md`** — dependency injection, Pydantic request/response models, lifespan events, BackgroundTasks, async patterns, auto-generated OpenAPI docs.
21
+ - **`llm.md`** — provider-agnostic LLM patterns: structured outputs, streaming, error/retry, token management, conversation state, tool/function calling, RAG patterns (chunking, embeddings, retrieval), prompt management, testing, cost/observability.
22
+ - **`prisma.md`** — schema modeling, migrations, typed client usage, relation queries, transactions, seeding, N+1 prevention.
23
+ - **`queues.md`** — BullMQ/Bull, SQS, RabbitMQ, Celery patterns: idempotent handlers, dead letter queues, retry/backoff, job deduplication, graceful shutdown.
9
24
  - **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
10
25
  - **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
26
+ - **Project-level stack overrides** — `.gsd-t/stacks/` directory for per-project customization of global stack rules. Local files replace global files of the same name.
11
27
 
12
28
  ### Changed
13
- - Stack detection in execute, quick, and debug commands updated to cover all 22 stack files with conditional detection per project dependencies.
29
+ - Stack detection in execute, quick, and debug commands updated to cover all 27 stack files with conditional detection per project dependencies.
30
+ - Detection refactored from one-liner to structured bash with `_sf()` (local override resolver) and `_add()` helper functions.
14
31
  - PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
32
+ - GSD-T-README.md stack detection table expanded to list all 27 files with their detection triggers.
15
33
 
16
34
  ## [2.46.11] - 2026-03-24
17
35
 
package/README.md CHANGED
@@ -156,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
156
156
  | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
157
157
  | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
158
158
  | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
159
+ | *Red Team* | Adversarial QA — finds bugs the builder missed (inverted incentives) | Auto-spawned |
159
160
  | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
160
161
  | `/user:gsd-t-integrate` | Wire domains together | In wave |
161
162
  | `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
@@ -37,10 +37,16 @@ if [ -d "$STACKS_DIR" ]; then
37
37
  grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
38
38
  grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
39
39
  grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
40
+ grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
41
+ grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
42
+ grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
40
43
  fi
41
44
  ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
42
45
  ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
43
46
  ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
47
+ ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
48
+ ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
49
+ ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
44
50
  [ -f "pubspec.yaml" ] && _add flutter.md
45
51
  [ -f "Dockerfile" ] && _add docker.md
46
52
  [ -d ".github/workflows" ] && _add github-actions.md
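The detection lines above call `_add` for every rule file that matches, and the changelog describes `_sf()` as a local override resolver. The helper bodies are not part of this diff, so the following is only a sketch of how they might look, assuming a local file in `.gsd-t/stacks/` replaces the global file of the same name. `LOCAL_DIR`, `STACK_FILES`, the placeholder default for `STACKS_DIR`, and both function bodies are hypothetical.

```shell
# Hypothetical reconstruction: only the names _sf, _add, STACKS_DIR and
# .gsd-t/stacks/ come from the diff; everything else is an assumption.
STACKS_DIR="${STACKS_DIR:-/path/to/global/stacks}"  # placeholder, not a real default
LOCAL_DIR=".gsd-t/stacks"                           # per-project override directory
STACK_FILES=""                                      # space-separated list of resolved paths

# _sf NAME: print the path to use for a rule file, preferring the local override.
_sf() {
  if [ -f "$LOCAL_DIR/$1" ]; then
    printf '%s\n' "$LOCAL_DIR/$1"
  else
    printf '%s\n' "$STACKS_DIR/$1"
  fi
}

# _add NAME: record a rule file once, skipping duplicates and missing files.
_add() {
  local path
  path="$(_sf "$1")"
  [ -f "$path" ] || return 0
  case " $STACK_FILES " in
    *" $path "*) ;;                           # already recorded
    *) STACK_FILES="$STACK_FILES $path" ;;
  esac
}
```

De-duplication matters in a design like this because several detection lines can add the same file (for example `postgresql.md` and `queues.md` are triggered from both `package.json` and the Python manifests). The space-separated list is a simplification that would break on paths containing spaces.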
@@ -356,6 +362,89 @@ Before committing, ensure the fix is solid:
356
362
 
357
363
  Commit: `[debug] Fix {description} — root cause: {explanation}`
358
364
 
365
+ ## Step 5.3: Red Team — Adversarial QA (MANDATORY)
366
+
367
+ After the fix passes all tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the fix and find regressions. Its success is measured by bugs found, not tests passed.
368
+
369
+ ⚙ [{model}] Red Team → adversarial validation of debug fix
370
+
371
+ **OBSERVABILITY LOGGING (MANDATORY):**
372
+ Before spawning — run via Bash:
373
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
374
+
375
+ ```
376
+ Task subagent (general-purpose, model: sonnet):
377
+ "You are a Red Team QA adversary. Your job is to BREAK the fix that was just applied.
378
+
379
+ Your value is measured by REAL bugs found. More bugs = more value.
380
+ If you find zero bugs, you must prove you were thorough — list every
381
+ attack vector you tried and why it didn't break. A short list means
382
+ you didn't try hard enough.
383
+
384
+ Rules:
385
+ - False positives DESTROY your credibility. If you report something
386
+ as a bug and it's actually correct behavior, that's worse than
387
+ missing a real bug. Never report something you haven't reproduced.
388
+ - Style opinions are not bugs. Theoretical concerns are not bugs.
389
+ A bug is: 'I did X, expected Y, got Z.' With proof.
390
+ - You are done ONLY when you have exhausted every category below
391
+ and either found a bug or documented exactly what you tried.
392
+
393
+ ## Attack Categories (exhaust ALL of these)
394
+
395
+ 1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
396
+ match every contract? Test each endpoint/interface/schema shape.
397
+ 2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
398
+ special characters, SQL injection attempts, XSS payloads, path traversal.
399
+ 3. **State Transitions**: What happens when actions are performed out of
400
+ order? Double-submit? Concurrent access? Refresh mid-flow?
401
+ 4. **Error Paths**: Remove env vars. Kill the database. Send malformed
402
+ requests. Does the code handle failures gracefully or crash?
403
+ 5. **Regression Around the Fix**: The fix changed specific code. Test
404
+ every adjacent code path. Fixes frequently break neighboring functionality.
405
+ 6. **Original Bug Variants**: The original bug was found. Are there SIMILAR
406
+ bugs in related code? Same pattern, different location?
407
+ 7. **Full Suite**: Run the FULL test suite. Did the fix break anything else?
408
+ 8. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
409
+ behavior (state changes, data loaded, navigation works) or just check
410
+ that elements exist? Flag and rewrite any shallow/layout tests.
411
+
412
+ ## Report Format
413
+
414
+ For each bug found:
415
+ - **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
416
+ - **Reproduction**: {exact steps to reproduce}
417
+ - **Expected**: {what should happen}
418
+ - **Actual**: {what actually happens}
419
+ - **Proof**: {test file or command that demonstrates the bug}
420
+
421
+ Summary:
422
+ - BUGS FOUND: {count} (with severity breakdown)
423
+ - COVERAGE GAPS: {untested flows from requirements}
424
+ - SHALLOW TESTS REWRITTEN: {count}
425
+ - CONTRACTS VERIFIED: {N}/{total}
426
+ - ATTACK VECTORS TRIED: {list every category attempted and results}
427
+ - VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
428
+
429
+ Write all findings to .gsd-t/red-team-report.md.
430
+ If bugs found, also append to .gsd-t/qa-issues.md."
431
+ ```
432
+
433
+ After subagent returns — run via Bash:
434
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
435
+ Compute tokens and compaction:
436
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
437
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
438
+ Append to `.gsd-t/token-log.md`:
439
+ `| {DT_START} | {DT_END} | gsd-t-debug | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
440
+
441
+ **If Red Team VERDICT is FAIL:**
442
+ 1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
443
+ 2. Re-run Red Team after fixes
444
+ 3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
445
+
446
+ **If Red Team VERDICT is GRUDGING PASS:** Proceed to metrics and doc-ripple.
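The token and compaction rule in the observability step can be written as one small function. This is a sketch that follows the two cases exactly as stated there; the function name `compute_tokens` and its argument order are assumptions made for illustration.

```shell
# compute_tokens TOK_START TOK_END TOK_MAX DT_END
# Sets TOKENS and COMPACTED according to the two cases in the step above.
compute_tokens() {
  local tok_start="$1" tok_end="$2" tok_max="$3" dt_end="$4"
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: usage is the plain difference.
    TOKENS=$((tok_end - tok_start))
    COMPACTED=null
  else
    # Compaction detected: tokens up to TOK_MAX plus tokens counted afterwards.
    TOKENS=$(((tok_max - tok_start) + tok_end))
    COMPACTED=$dt_end
  fi
}

compute_tokens 150000 30000 200000 "2026-03-25 14:05"
echo "$TOKENS $COMPACTED"   # prints: 80000 2026-03-25 14:05
```

In the compaction case the result is an estimate: it assumes the context filled to `TOK_MAX` before the counter dropped.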
447
+
359
448
  ## Step 5.5: Emit Task Metrics
360
449
 
361
450
  After committing, emit a task-metrics record for this debug session — run via Bash:
@@ -156,12 +156,18 @@ if [ -d "$STACKS_DIR" ]; then
156
156
  grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
157
157
  grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
158
158
  grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
159
+ grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
160
+ grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
161
+ grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
159
162
  fi
160
163
 
161
164
  # File-based detection (no package.json needed)
162
165
  ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
163
166
  ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
164
167
  ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
168
+ ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
169
+ ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
170
+ ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
165
171
  [ -f "pubspec.yaml" ] && _add flutter.md
166
172
  [ -f "Dockerfile" ] && _add docker.md
167
173
  [ -d ".github/workflows" ] && _add github-actions.md
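The Python manifest checks above use unanchored patterns, so an alternative such as `rq` also matches any line that merely contains those letters. A stricter variant is sketched below. It is not the package's behavior, only an illustration, and it assumes a `requirements.txt` layout where each line starts with the package name (it would not suit `pyproject.toml`, where names appear inside quoted strings). The helper name `has_py_dep` is hypothetical.

```shell
# has_py_dep PATTERN FILE: succeed if FILE has a requirement line whose package
# name is exactly one of the alternatives in PATTERN (hypothetical helper).
has_py_dep() {
  # The name must start the line and be followed by a character that cannot
  # be part of a package name (version specifier, extras bracket, space) or by end of line.
  grep -qiE "^[[:space:]]*($1)([^A-Za-z0-9_.-]|\$)" "$2" 2>/dev/null
}

# Usage, mirroring the queues.md detection line:
# has_py_dep 'celery|dramatiq|rq|arq' requirements.txt && _add queues.md
```

The trade-off is that wrapper packages whose names only contain a queue library's name (a `django-` prefixed integration, say) would no longer trigger the rule, so whether the stricter match is preferable depends on how broad the detection is meant to be.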
@@ -563,6 +569,87 @@ A teammate finishes independent tasks and is waiting on a checkpoint:
563
569
  2. If not, have the teammate work on documentation, tests, or code cleanup within their domain
564
570
  3. Or shut down the teammate and respawn when unblocked
565
571
 
572
+ ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
573
+
574
+ After all domain tasks pass their tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just built. It operates with inverted incentives — its success is measured by bugs found, not tests passed.
575
+
576
+ ⚙ [{model}] Red Team → adversarial validation of executed domains
577
+
578
+ **OBSERVABILITY LOGGING (MANDATORY):**
579
+ Before spawning — run via Bash:
580
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
581
+
582
+ ```
583
+ Task subagent (general-purpose, model: sonnet):
584
+ "You are a Red Team QA adversary. Your job is to BREAK the code that was just written.
585
+
586
+ Your value is measured by REAL bugs found. More bugs = more value.
587
+ If you find zero bugs, you must prove you were thorough — list every
588
+ attack vector you tried and why it didn't break. A short list means
589
+ you didn't try hard enough.
590
+
591
+ Rules:
592
+ - False positives DESTROY your credibility. If you report something
593
+ as a bug and it's actually correct behavior, that's worse than
594
+ missing a real bug. Never report something you haven't reproduced.
595
+ - Style opinions are not bugs. Theoretical concerns are not bugs.
596
+ A bug is: 'I did X, expected Y, got Z.' With proof.
597
+ - You are done ONLY when you have exhausted every category below
598
+ and either found a bug or documented exactly what you tried.
599
+
600
+ ## Attack Categories (exhaust ALL of these)
601
+
602
+ 1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
603
+ match every contract? Test each endpoint/interface/schema shape.
604
+ 2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
605
+ special characters, SQL injection attempts, XSS payloads, path traversal.
606
+ 3. **State Transitions**: What happens when actions are performed out of
607
+ order? Double-submit? Concurrent access? Refresh mid-flow?
608
+ 4. **Error Paths**: Remove env vars. Kill the database. Send malformed
609
+ requests. Does the code handle failures gracefully or crash?
610
+ 5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
611
+ exist in requirements but have NO test coverage? Write tests for them.
612
+ 6. **Regression**: Run the FULL test suite. Did any existing tests break?
613
+ 7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
614
+ behavior (state changes, data loaded, navigation works) or just check
615
+ that elements exist? Flag and rewrite any shallow/layout tests.
616
+
617
+ ## Report Format
618
+
619
+ For each bug found:
620
+ - **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
621
+ - **Reproduction**: {exact steps to reproduce}
622
+ - **Expected**: {what should happen}
623
+ - **Actual**: {what actually happens}
624
+ - **Proof**: {test file or command that demonstrates the bug}
625
+
626
+ Summary:
627
+ - BUGS FOUND: {count} (with severity breakdown)
628
+ - COVERAGE GAPS: {untested flows from requirements}
629
+ - SHALLOW TESTS REWRITTEN: {count}
630
+ - CONTRACTS VERIFIED: {N}/{total}
631
+ - ATTACK VECTORS TRIED: {list every category attempted and results}
632
+ - VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
633
+
634
+ Write all findings to .gsd-t/red-team-report.md.
635
+ If bugs found, also append to .gsd-t/qa-issues.md."
636
+ ```
637
+
638
+ After subagent returns — run via Bash:
639
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
640
+ Compute tokens and compaction:
641
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
642
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
643
+ Append to `.gsd-t/token-log.md`:
644
+ `| {DT_START} | {DT_END} | gsd-t-execute | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
645
+
646
+ **If Red Team VERDICT is FAIL:**
647
+ 1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
648
+ 2. Re-run Red Team after fixes
649
+ 3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
650
+
651
+ **If Red Team VERDICT is GRUDGING PASS:** Proceed to completion.
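Branching on the verdict requires reading it back from the report. The sketch below assumes the subagent writes a plain `VERDICT: FAIL ...` or `VERDICT: GRUDGING PASS ...` line to `.gsd-t/red-team-report.md`, as the report format in the prompt suggests; the function name `red_team_verdict` and the `UNKNOWN` fallback are hypothetical.

```shell
# red_team_verdict [REPORT]: print FAIL, GRUDGING PASS, or UNKNOWN.
red_team_verdict() {
  local report="${1:-.gsd-t/red-team-report.md}" line
  # Use the last line that mentions VERDICT, in case the report holds several runs.
  line="$(grep -E 'VERDICT:' "$report" 2>/dev/null | tail -n 1)"
  case "$line" in
    *"VERDICT: FAIL"*)          echo "FAIL" ;;
    *"VERDICT: GRUDGING PASS"*) echo "GRUDGING PASS" ;;
    *)                          echo "UNKNOWN" ;;   # missing file or unexpected wording
  esac
}

# Usage:
# [ "$(red_team_verdict)" = "GRUDGING PASS" ] && echo "proceed to completion"
```

A report that formats the label differently (for instance with bold markers inside the word) would fall through to `UNKNOWN`; treating that outcome like `FAIL` would be the cautious choice.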
652
+
566
653
  ## Step 6: Completion
567
654
 
568
655
  When all tasks in all domains are complete:
@@ -258,7 +258,7 @@ Use these when user asks for help on a specific command:
258
258
  - **Use when**: Ready to implement
259
259
  - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
260
260
  - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
261
- - **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
261
+ - **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`, `_auth.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
262
262
 
263
263
  ### test-sync
264
264
  - **Summary**: Keep tests aligned with code changes
@@ -198,7 +198,91 @@ After integration and doc ripple, verify everything works together:
198
198
  4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
199
199
  5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
200
200
 
201
- ## Step 7.5: Doc-Ripple (Automated)
201
+ ## Step 7.5: Red Team — Adversarial QA (MANDATORY)
202
+
203
+ After integration tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the integrated system. Its success is measured by bugs found, not tests passed.
204
+
205
+ ⚙ [{model}] Red Team → adversarial validation of integrated system
206
+
207
+ **OBSERVABILITY LOGGING (MANDATORY):**
208
+ Before spawning — run via Bash:
209
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
210
+
211
+ ```
212
+ Task subagent (general-purpose, model: sonnet):
213
+ "You are a Red Team QA adversary. Your job is to BREAK the integrated system.
214
+
215
+ Your value is measured by REAL bugs found. More bugs = more value.
216
+ If you find zero bugs, you must prove you were thorough — list every
217
+ attack vector you tried and why it didn't break. A short list means
218
+ you didn't try hard enough.
219
+
220
+ Rules:
221
+ - False positives DESTROY your credibility. If you report something
222
+ as a bug and it's actually correct behavior, that's worse than
223
+ missing a real bug. Never report something you haven't reproduced.
224
+ - Style opinions are not bugs. Theoretical concerns are not bugs.
225
+ A bug is: 'I did X, expected Y, got Z.' With proof.
226
+ - You are done ONLY when you have exhausted every category below
227
+ and either found a bug or documented exactly what you tried.
228
+
229
+ ## Attack Categories (exhaust ALL of these)
230
+
231
+ 1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
232
+ match every contract? Test each endpoint/interface/schema shape.
233
+ 2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
234
+ special characters, SQL injection attempts, XSS payloads, path traversal.
235
+ 3. **State Transitions**: What happens when actions are performed out of
236
+ order? Double-submit? Concurrent access? Refresh mid-flow?
237
+ 4. **Error Paths**: Remove env vars. Kill the database. Send malformed
238
+ requests. Does the code handle failures gracefully or crash?
239
+ 5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
240
+ exist in requirements but have NO test coverage? Write tests for them.
241
+ 6. **Regression**: Run the FULL test suite. Did any existing tests break?
242
+ 7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
243
+ behavior (state changes, data loaded, navigation works) or just check
244
+ that elements exist? Flag and rewrite any shallow/layout tests.
245
+ 8. **Cross-Domain Boundaries**: Test data flow across EVERY domain boundary.
246
+ Does data arriving from domain A get validated by domain B? What happens
247
+ when domain A sends malformed data that passed A's own validation?
248
+
249
+ ## Report Format
250
+
251
+ For each bug found:
252
+ - **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
253
+ - **Reproduction**: {exact steps to reproduce}
254
+ - **Expected**: {what should happen}
255
+ - **Actual**: {what actually happens}
256
+ - **Proof**: {test file or command that demonstrates the bug}
257
+
258
+ Summary:
259
+ - BUGS FOUND: {count} (with severity breakdown)
260
+ - COVERAGE GAPS: {untested flows from requirements}
261
+ - SHALLOW TESTS REWRITTEN: {count}
262
+ - CONTRACTS VERIFIED: {N}/{total}
263
+ - ATTACK VECTORS TRIED: {list every category attempted and results}
264
+ - VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
265
+
266
+ Write all findings to .gsd-t/red-team-report.md.
267
+ If bugs found, also append to .gsd-t/qa-issues.md."
268
+ ```
269
+
270
+ After subagent returns — run via Bash:
271
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
272
+ Compute tokens and compaction:
273
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
274
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
275
+ Append to `.gsd-t/token-log.md`:
276
+ `| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
277
+
278
+ **If Red Team VERDICT is FAIL:**
279
+ 1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
280
+ 2. Re-run Red Team after fixes
281
+ 3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
282
+
283
+ **If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
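The FAIL handling above amounts to a bounded retry loop: fix, re-run the Red Team, and defer after two fix cycles. A minimal sketch of that control flow follows. `run_red_team` and `fix_bugs` are hypothetical hooks that the caller would have to define, and the wording of the deferred entry is an assumption.

```shell
# Hypothetical hooks expected from the caller:
#   run_red_team  -> exit status 0 on GRUDGING PASS, non-zero on FAIL
#   fix_bugs      -> attempts fixes for the reported CRITICAL and HIGH bugs
MAX_CYCLES=2
DEFERRED_FILE=".gsd-t/deferred-items.md"

# red_team_gate: return 0 once the Red Team passes, 1 if bugs were deferred.
red_team_gate() {
  local cycle=0
  while ! run_red_team; do
    if [ "$cycle" -ge "$MAX_CYCLES" ]; then
      # Bugs persist after the allowed fix cycles: log and hand back to the user.
      mkdir -p "$(dirname "$DEFERRED_FILE")"
      printf '%s\n' "- Red Team bugs persist after $MAX_CYCLES fix cycles" >> "$DEFERRED_FILE"
      return 1
    fi
    cycle=$((cycle + 1))
    fix_bugs
  done
  return 0
}
```

With a Red Team that always fails, this runs the Red Team three times and `fix_bugs` twice before deferring, which is one reading of "if bugs persist after 2 fix cycles".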
284
+
285
+ ## Step 8: Doc-Ripple (Automated)
202
286
 
203
287
  After all integration work is committed but before reporting completion:
204
288
 
@@ -218,7 +302,7 @@ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
218
302
 
219
303
  4. After doc-ripple returns, verify manifest exists and report summary inline
220
304
 
221
- ## Step 8: Handle Integration Issues
305
+ ## Step 9: Handle Integration Issues
222
306
 
223
307
  For each issue found:
224
308
  1. Determine if it's a contract gap (missing specification) or implementation bug
@@ -226,7 +310,7 @@ For each issue found:
226
310
  3. **Implementation bug**: Fix it directly, document the fix
227
311
  4. Log everything in progress.md
228
312
 
229
- ## Step 9: Update State
313
+ ## Step 10: Update State
230
314
 
231
315
  Update `.gsd-t/progress.md`:
232
316
  - Set status to `INTEGRATED`
@@ -41,10 +41,16 @@ if [ -d "$STACKS_DIR" ]; then
41
41
  grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
42
42
  grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
43
43
  grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
44
+ grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
45
+ grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
46
+ grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
44
47
  fi
45
48
  ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
46
49
  ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
47
50
  ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
51
+ ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
52
+ ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
53
+ ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
48
54
  [ -f "pubspec.yaml" ] && _add flutter.md
49
55
  [ -f "Dockerfile" ] && _add docker.md
50
56
  [ -d ".github/workflows" ] && _add github-actions.md
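The `package.json` checks above wrap each alternation in literal double quotes, so only a complete JSON string equal to one of the names matches. The sketch below wraps that pattern in a hypothetical helper, `has_js_dep`, and runs it against an illustrative manifest; the dependency versions are made up for the example.

```shell
# has_js_dep PATTERN [FILE]: the same quoted-name match as the detection lines.
has_js_dep() {
  grep -qE "\"($1)\"" "${2:-package.json}" 2>/dev/null
}

# Illustrative manifest (contents are assumptions for the demo only).
tmp="$(mktemp -d)"
cat > "$tmp/package.json" <<'EOF'
{
  "dependencies": {
    "@prisma/client": "^5.0.0",
    "bullmq-pro": "^7.0.0"
  }
}
EOF

has_js_dep 'prisma|@prisma/client' "$tmp/package.json" && echo "prisma.md"
has_js_dep 'bullmq|bull' "$tmp/package.json" || echo "queues.md not triggered"
```

The second check does not fire because `"bullmq-pro"` is not an exact match for `"bullmq"` or `"bull"`. Since `grep` scans the whole file, a matching string anywhere in it (a script name, for instance) would also count as a hit.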
@@ -201,6 +207,87 @@ Quick does not mean skip testing. Before committing:
201
207
  - If a contract exists for the interface touched, does the code still match?
202
208
  4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
203
209
 
210
+ ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
211
+
212
+ After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
213
+
214
+ ⚙ [{model}] Red Team → adversarial validation of quick task
215
+
216
+ **OBSERVABILITY LOGGING (MANDATORY):**
217
+ Before spawning — run via Bash:
218
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
219
+
220
+ ```
221
+ Task subagent (general-purpose, model: sonnet):
222
+ "You are a Red Team QA adversary. Your job is to BREAK the code that was just changed.
223
+
224
+ Your value is measured by REAL bugs found. More bugs = more value.
225
+ If you find zero bugs, you must prove you were thorough — list every
226
+ attack vector you tried and why it didn't break. A short list means
227
+ you didn't try hard enough.
228
+
229
+ Rules:
230
+ - False positives DESTROY your credibility. If you report something
231
+ as a bug and it's actually correct behavior, that's worse than
232
+ missing a real bug. Never report something you haven't reproduced.
233
+ - Style opinions are not bugs. Theoretical concerns are not bugs.
234
+ A bug is: 'I did X, expected Y, got Z.' With proof.
235
+ - You are done ONLY when you have exhausted every category below
236
+ and either found a bug or documented exactly what you tried.
237
+
238
+ ## Attack Categories (exhaust ALL of these)
239
+
240
+ 1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
241
+ match every contract? Test each endpoint/interface/schema shape.
242
+ 2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
243
+ special characters, SQL injection attempts, XSS payloads, path traversal.
244
+ 3. **State Transitions**: What happens when actions are performed out of
245
+ order? Double-submit? Concurrent access? Refresh mid-flow?
246
+ 4. **Error Paths**: Remove env vars. Kill the database. Send malformed
247
+ requests. Does the code handle failures gracefully or crash?
248
+ 5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
249
+ exist in requirements but have NO test coverage? Write tests for them.
250
+ 6. **Regression**: Run the FULL test suite. Did any existing tests break?
251
+ 7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
252
+ behavior (state changes, data loaded, navigation works) or just check
253
+ that elements exist? Flag and rewrite any shallow/layout tests.
254
+
255
+ ## Report Format
256
+
257
+ For each bug found:
258
+ - **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
259
+ - **Reproduction**: {exact steps to reproduce}
260
+ - **Expected**: {what should happen}
261
+ - **Actual**: {what actually happens}
262
+ - **Proof**: {test file or command that demonstrates the bug}
263
+
264
+ Summary:
265
+ - BUGS FOUND: {count} (with severity breakdown)
266
+ - COVERAGE GAPS: {untested flows from requirements}
267
+ - SHALLOW TESTS REWRITTEN: {count}
268
+ - CONTRACTS VERIFIED: {N}/{total}
269
+ - ATTACK VECTORS TRIED: {list every category attempted and results}
270
+ - VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
271
+
272
+ Write all findings to .gsd-t/red-team-report.md.
273
+ If bugs found, also append to .gsd-t/qa-issues.md."
274
+ ```
275
+
276
+ After subagent returns — run via Bash:
277
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
278
+ Compute tokens and compaction:
279
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
280
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
281
+ Append to `.gsd-t/token-log.md`:
282
+ `| {DT_START} | {DT_END} | gsd-t-quick | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
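The two token-accounting cases above can be sketched in Python as follows. The function name `token_usage` is hypothetical, and `tok_max` stands in for `TOK_MAX`, which is set elsewhere in the command file; treat this as an illustration of the arithmetic rather than the tool's implementation.

```python
from typing import Optional, Tuple


def token_usage(
    tok_start: int, tok_end: int, tok_max: int, dt_end: str
) -> Tuple[int, Optional[str]]:
    """Return (tokens, compacted) for one subagent run.

    compacted is None when no compaction happened, otherwise the
    end timestamp, mirroring COMPACTED=null / COMPACTED=$DT_END.
    """
    if tok_end >= tok_start:
        # No compaction: the counter only grew during the run.
        return tok_end - tok_start, None
    # Compaction detected: the counter dropped, so add the tokens used
    # up to tok_max to the tokens used after the reset.
    return (tok_max - tok_start) + tok_end, dt_end
```

For example, a run that starts at 150000 tokens and ends at 20000 with `tok_max` of 200000 is recorded as 70000 tokens with the compaction timestamp set.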
283
+
284
+ **If Red Team VERDICT is FAIL:**
285
+ 1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
286
+ 2. Re-run Red Team after fixes
287
+ 3. If bugs persist after 2 fix cycles, log them to `.gsd-t/deferred-items.md` and present them to the user
288
+
289
+ **If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
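The verdict handling above amounts to a bounded fix-and-retest loop. The Python sketch below is one possible reading: `run_red_team` and `fix_bugs` are hypothetical callables supplied by the caller, bugs are plain dicts with a `severity` key, and deferring leftover MEDIUM/LOW findings is an assumption, since the steps above only say what to do with CRITICAL and HIGH bugs.

```python
from typing import Callable, Dict, List, Tuple

Bug = Dict[str, str]
BLOCKING = ("CRITICAL", "HIGH")


def red_team_gate(
    run_red_team: Callable[[], List[Bug]],
    fix_bugs: Callable[[List[Bug]], None],
    max_cycles: int = 2,
) -> Tuple[str, List[Bug]]:
    """Return ("proceed", []) on GRUDGING PASS, else ("defer", remaining)."""
    bugs = run_red_team()
    cycles = 0
    while bugs and cycles < max_cycles:
        blocking = [b for b in bugs if b["severity"] in BLOCKING]
        if not blocking:
            # Assumption: nothing left that must be fixed immediately.
            break
        fix_bugs(blocking)
        cycles += 1
        bugs = run_red_team()  # re-run Red Team after fixes
    if not bugs:
        return "proceed", []  # continue to doc-ripple
    return "defer", bugs  # log to deferred items, present to user
```

With the default of two cycles, Red Team runs at most three times: once initially and once after each fix cycle.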
290
+
204
291
  ## Step 6: Doc-Ripple (Automated)
205
292
 
206
293
  After all work is committed but before reporting completion:
@@ -103,6 +103,7 @@ GSD-T reads all state files and tells you exactly where you left off.
103
103
  | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning, stack rules injection | In wave |
104
104
  | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
105
105
  | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
106
+ | *Red Team* | Adversarial QA — spawns after QA passes to find bugs the builder missed | Auto-spawned |
106
107
  | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
107
108
  | `/user:gsd-t-integrate` | Wire domains together | In wave |
108
109
  | `/user:gsd-t-verify` | Run quality gates + goal-backward verification → auto-invokes complete-milestone | In wave |
@@ -154,8 +155,9 @@ GSD-T reads all state files and tells you exactly where you left off.
154
155
  │ │ └──────┐ │
155
156
  │ │ ▼ │
156
157
  │ │ ┌───────────────────┐ │
157
- │ │ │ (runs after each │ │
158
- │ │ │ task + at verify)│ │
158
+ │ │ │ QA + Red Team │ │
159
+ │ │ │ (after each phase │ │
160
+ │ │ │ that writes code)│ │
159
161
  │ │ └───────────────────┘ │
160
162
  │ ▼ │
161
163
  │ verify+complete ◄──────────── integrate ◄──────────────────────┘ │
@@ -172,9 +174,9 @@ GSD-T reads all state files and tells you exactly where you left off.
172
174
  | **Discuss** | Explore design decisions | Both |
173
175
  | **Plan** | Create atomic task lists | Solo (always) |
174
176
  | **Impact** | Downstream effect analysis | Solo |
175
- | **Execute** | Build it | Both |
177
+ | **Execute** | Build it (+ Red Team adversarial QA) | Both |
176
178
  | **Test-Sync** | Maintain test coverage | Solo |
177
- | **Integrate** | Wire domains together | Solo (always) |
179
+ | **Integrate** | Wire domains together (+ Red Team adversarial QA) | Solo (always) |
178
180
  | **Verify** | Quality gates | Both |
179
181
  | **Complete** | Archive + tag | Solo |
180
182
 
@@ -231,20 +233,42 @@ GSD-T auto-detects your project's tech stack and injects mandatory best-practice
231
233
  ### How It Works
232
234
 
233
235
  1. At subagent spawn time, GSD-T reads project manifest files to detect the active stack(s).
234
- 2. Universal rules (`templates/stacks/_security.md`) are **always** injected.
236
+ 2. Universal rules (`templates/stacks/_security.md`, `_auth.md`) are **always** injected.
235
237
  3. Stack-specific rules are injected when the corresponding stack is detected.
236
- 4. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
238
+ 4. Project-level overrides in `.gsd-t/stacks/` replace global files of the same name.
239
+ 5. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
237
240
 
238
- ### Stack Detection
241
+ ### Stack Detection (27 files)
239
242
 
240
- | Project File | Detected Stack |
243
+ | Project File | Detected Stack(s) |
241
244
  |---|---|
242
- | `package.json` with `"react"` | React |
243
- | `package.json` with `"typescript"` or `tsconfig.json` | TypeScript |
244
- | `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | Node API |
245
- | `requirements.txt` or `pyproject.toml` | Python |
246
- | `go.mod` | Go |
247
- | `Cargo.toml` | Rust |
245
+ | *(always)* | `_security.md`, `_auth.md` |
246
+ | `package.json` with `"react"` | `react.md` |
247
+ | `package.json` with `"react-native"` | `react-native.md` |
248
+ | `package.json` with `"next"` | `nextjs.md` |
249
+ | `package.json` with `"vue"` | `vue.md` |
250
+ | `package.json` with `"typescript"` or `tsconfig.json` | `typescript.md` |
251
+ | `package.json` with `"tailwindcss"` | `tailwind.md` |
252
+ | `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | `node-api.md`, `rest-api.md` |
253
+ | `package.json` with `"vite"` | `vite.md` |
254
+ | `package.json` with `"@supabase/supabase-js"` | `supabase.md` |
255
+ | `package.json` with `"firebase"` | `firebase.md` |
256
+ | `package.json` with `"graphql"` or `"@apollo/server"` | `graphql.md` |
257
+ | `package.json` with `"zustand"` | `zustand.md` |
258
+ | `package.json` with `"@reduxjs/toolkit"` | `redux.md` |
259
+ | `package.json` with `"prisma"` or `"@prisma/client"` | `prisma.md` |
260
+ | `package.json` with `"pg"`, `"knex"`, or `"drizzle-orm"` | `postgresql.md` |
261
+ | `package.json` with `"neo4j-driver"` | `neo4j.md` |
262
+ | `package.json` with `"bullmq"`, `"bull"`, `"amqplib"`, or `"@aws-sdk/client-sqs"` | `queues.md` |
263
+ | `package.json` with `"openai"`, `"anthropic"`, or `"langchain"` | `llm.md` |
264
+ | `Dockerfile` or `compose.yaml` | `docker.md` |
265
+ | `.github/workflows/*.yml` | `github-actions.md` |
266
+ | `playwright.config.*` | `playwright.md` |
267
+ | `requirements.txt` or `pyproject.toml` | `python.md` |
268
+ | `requirements.txt` with `fastapi` | `fastapi.md` |
269
+ | `requirements.txt` with `celery`, `dramatiq`, `rq`, or `arq` | `queues.md` |
270
+ | `requirements.txt` with `openai`, `anthropic`, or `langchain` | `llm.md` |
271
+ | `pubspec.yaml` | `flutter.md` |
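As an illustration of how the `package.json` rows of this table could be applied, here is a small Python sketch. The table does not say how the match is performed, so looking up names in `dependencies` and `devDependencies` is an assumption, as are the names `NPM_RULES` and `detect_stack_rules`; only a subset of rows is included.

```python
import json
from typing import Dict, List

# Universal rule files that are always injected.
ALWAYS: List[str] = ["_security.md", "_auth.md"]

# Subset of the table: npm dependency name -> rule files.
NPM_RULES: Dict[str, List[str]] = {
    "react": ["react.md"],
    "next": ["nextjs.md"],
    "express": ["node-api.md", "rest-api.md"],
    "fastify": ["node-api.md", "rest-api.md"],
    "@prisma/client": ["prisma.md"],
    "prisma": ["prisma.md"],
    "bullmq": ["queues.md"],
}


def detect_stack_rules(package_json_text: str) -> List[str]:
    """Return rule files for a package.json, universal rules first."""
    manifest = json.loads(package_json_text)
    deps: Dict[str, str] = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(manifest.get(section, {}))

    rules = list(ALWAYS)
    for dep, files in NPM_RULES.items():
        if dep in deps:
            for rule_file in files:
                if rule_file not in rules:  # avoid duplicates
                    rules.append(rule_file)
    return rules
```

A project that depends on both `express` and `fastify` still receives `node-api.md` and `rest-api.md` only once, because duplicates are skipped.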
248
272
 
249
273
  ### Commands That Inject Stack Rules
250
274
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tekyzinc/gsd-t",
3
- "version": "2.50.11",
3
+ "version": "2.51.10",
4
4
  "description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
5
5
  "author": "Tekyz, Inc.",
6
6
  "license": "MIT",
@@ -306,6 +306,26 @@ Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract:
306
306
 
307
307
  **QA failure OR shallow tests found blocks phase completion.** Lead cannot proceed until QA reports PASS with zero shallow tests, or user explicitly overrides.
308
308
 
309
+ ## Red Team — Adversarial QA (Mandatory)
310
+
311
+ After QA passes, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
312
+
313
+ **Red Team method by command:**
314
+ - `execute` → spawns Red Team after all domain tasks pass (Step 5.5)
315
+ - `integrate` → spawns Red Team after integration tests pass (Step 7.5)
316
+ - `quick` → spawns Red Team after Test & Verify passes (Step 5.5)
317
+ - `debug` → spawns Red Team after fix verification passes (Step 5.3)
318
+ - `wave` → each phase agent handles Red Team per the rules above
319
+
320
+ **Key Red Team rules:**
321
+ - **Inverted incentive**: More bugs found = more value. Zero bugs requires exhaustive proof of thoroughness.
322
+ - **False positive penalty**: Reporting non-bugs destroys credibility. Every bug must be reproduced with proof.
323
+ - **Exhaustive categories**: Contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps — all must be attempted.
324
+ - **VERDICT**: `FAIL` (bugs found — blocks completion) or `GRUDGING PASS` (exhaustive search, nothing found).
325
+ - **Report**: Written to `.gsd-t/red-team-report.md`; bugs also appended to `.gsd-t/qa-issues.md`.
326
+
327
+ **Red Team FAIL blocks phase completion.** CRITICAL/HIGH bugs must be fixed (up to 2 fix cycles). If bugs persist, they are logged to `.gsd-t/deferred-items.md` and presented to the user.
328
+
309
329
  ## Model Display (MANDATORY)
310
330
 
311
331
  **Before every subagent spawn, display the model being used to the user:**