npm - @tekyzinc/gsd-t - Versions diffs - 2.50.11 → 2.51.10 - Mend

@tekyzinc/gsd-t 2.50.11 → 2.51.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/CHANGELOG.md +21 -3
package/README.md +1 -0
package/commands/gsd-t-debug.md +89 -0
package/commands/gsd-t-execute.md +87 -0
package/commands/gsd-t-help.md +1 -1
package/commands/gsd-t-integrate.md +87 -3
package/commands/gsd-t-quick.md +87 -0
package/docs/GSD-T-README.md +38 -14
package/package.json +1 -1
package/templates/CLAUDE-global.md +20 -0
package/templates/stacks/_auth.md +324 -0
package/templates/stacks/fastapi.md +377 -0
package/templates/stacks/llm.md +541 -0
package/templates/stacks/prisma.md +400 -0
package/templates/stacks/queues.md +437 -0

package/CHANGELOG.md CHANGED

@@ -2,16 +2,34 @@
 All notable changes to GSD-T are documented here. Updated with each release.
-## [2.50.10] - 2026-03-25
+## [2.51.10] - 2026-03-25
 ### Added
-- **18 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright. Total: 22 stack rules (was 4).
+- **Red Team — Adversarial QA agent** added to `execute`, `quick`, `integrate`, and `debug` commands. Spawns after the builder's tests pass with inverted incentives — success is measured by bugs found, not tests passed.
+- **Exhaustive attack categories**: contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps, cross-domain boundaries (integrate only), fix regression variants (debug only).
+- **False positive penalty**: reporting non-bugs destroys credibility, preventing phantom bug inflation.
+- **VERDICT system**: `FAIL` (bugs found — blocks phase completion) or `GRUDGING PASS` (exhaustive search, nothing found — must prove thoroughness).
+- **Red Team report**: findings written to `.gsd-t/red-team-report.md`; bugs appended to `.gsd-t/qa-issues.md`.
+- Red Team documented in CLAUDE-global template, global CLAUDE.md, GSD-T-README wave diagram, README command table.
+## [2.50.12] - 2026-03-25
+### Added
+- **23 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright, fastapi, llm (with RAG patterns section), prisma, queues, _auth (universal). Total: 27 stack rules (was 4).
+- **`_auth.md`** (universal) — email-first registration, auth provider abstraction (Cognito/Firebase/Google), token management, password policy, session management, social auth/OAuth, email verification, MFA, authorization/RBAC, auth security, auth UI patterns.
+- **`fastapi.md`** — dependency injection, Pydantic request/response models, lifespan events, BackgroundTasks, async patterns, auto-generated OpenAPI docs.
+- **`llm.md`** — provider-agnostic LLM patterns: structured outputs, streaming, error/retry, token management, conversation state, tool/function calling, RAG patterns (chunking, embeddings, retrieval), prompt management, testing, cost/observability.
+- **`prisma.md`** — schema modeling, migrations, typed client usage, relation queries, transactions, seeding, N+1 prevention.
+- **`queues.md`** — BullMQ/Bull, SQS, RabbitMQ, Celery patterns: idempotent handlers, dead letter queues, retry/backoff, job deduplication, graceful shutdown.
 - **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
 - **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
+- **Project-level stack overrides** — `.gsd-t/stacks/` directory for per-project customization of global stack rules. Local files replace global files of the same name.
 ### Changed
-- Stack detection in execute, quick, and debug commands updated to cover all 22 stack files with conditional detection per project dependencies.
+- Stack detection in execute, quick, and debug commands updated to cover all 27 stack files with conditional detection per project dependencies.
+- Detection refactored from one-liner to structured bash with `_sf()` (local override resolver) and `_add()` helper functions.
 - PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
+- GSD-T-README.md stack detection table expanded to list all 27 files with their detection triggers.
 ## [2.46.11] - 2026-03-24

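The changelog mentions `_sf()` (local override resolver) and `_add()` helpers, but their bodies are not part of this diff. The following is only a minimal sketch of how such helpers could behave, assuming a global `STACKS_DIR` (the variable name appears in the detection hunks; its default value here is a guess) and the documented `.gsd-t/stacks/` override directory. The `LOCAL_STACKS` and `STACK_FILES` names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the package's actual helpers.
STACKS_DIR="${STACKS_DIR:-$HOME/.claude/templates/stacks}"  # assumed default location
LOCAL_STACKS="${LOCAL_STACKS:-.gsd-t/stacks}"               # per-project overrides
STACK_FILES=""                                              # space-separated result list

# _sf NAME: print the path to use for a stack file.
# A local file replaces the global file of the same name.
_sf() {
  if [ -f "$LOCAL_STACKS/$1" ]; then
    printf '%s\n' "$LOCAL_STACKS/$1"
  elif [ -f "$STACKS_DIR/$1" ]; then
    printf '%s\n' "$STACKS_DIR/$1"
  else
    return 1
  fi
}

# _add NAME: append the resolved path once; unknown names are skipped.
_add() {
  local path
  path="$(_sf "$1")" || return 0
  case " $STACK_FILES " in
    *" $path "*) ;;                          # already listed, do nothing
    *) STACK_FILES="$STACK_FILES $path" ;;
  esac
}
```

Because the list is space-separated, this sketch would not cope with paths that contain spaces; an array would be the safer choice in bash-only code.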
package/README.md CHANGED

@@ -156,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
 | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
 | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
 | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
+| *Red Team* | Adversarial QA — finds bugs the builder missed (inverted incentives) | Auto-spawned |
 | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
 | `/user:gsd-t-integrate` | Wire domains together | In wave |
 | `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |

package/commands/gsd-t-debug.md CHANGED

@@ -37,10 +37,16 @@ if [ -d "$STACKS_DIR" ]; then
     grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
     grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
     grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+    grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+    grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+    grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
   fi
   ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
   ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
   ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+  ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+  ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+  ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
   [ -f "pubspec.yaml" ] && _add flutter.md
   [ -f "Dockerfile" ] && _add docker.md
   [ -d ".github/workflows" ] && _add github-actions.md
@@ -356,6 +362,89 @@ Before committing, ensure the fix is solid:
 Commit: `[debug] Fix {description} — root cause: {explanation}`
+## Step 5.3: Red Team — Adversarial QA (MANDATORY)
+After the fix passes all tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the fix and find regressions. Its success is measured by bugs found, not tests passed.
+⚙ [{model}] Red Team → adversarial validation of debug fix
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the fix that was just applied.
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+Rules:
+- False positives DESTROY your credibility. If you report something
+  as a bug and it's actually correct behavior, that's worse than
+  missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+  A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+  and either found a bug or documented exactly what you tried.
+## Attack Categories (exhaust ALL of these)
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+   match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+   special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+   order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+   requests. Does the code handle failures gracefully or crash?
+5. **Regression Around the Fix**: The fix changed specific code. Test
+   every adjacent code path. Fixes frequently break neighboring functionality.
+6. **Original Bug Variants**: The original bug was found. Are there SIMILAR
+   bugs in related code? Same pattern, different location?
+7. **Full Suite**: Run the FULL test suite. Did the fix break anything else?
+8. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+   behavior (state changes, data loaded, navigation works) or just check
+   that elements exist? Flag and rewrite any shallow/layout tests.
+## Report Format
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+  - **Reproduction**: {exact steps to reproduce}
+  - **Expected**: {what should happen}
+  - **Actual**: {what actually happens}
+  - **Proof**: {test file or command that demonstrates the bug}
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-debug | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to metrics and doc-ripple.
 ## Step 5.5: Emit Task Metrics
 After committing, emit a task-metrics record for this debug session — run via Bash:

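A note on the Python detection lines in the first hunk of this file: they chain `[ -f A ] && grep … A || [ -f B ] && grep … B` inside one subshell. In POSIX shells `&&` and `||` have equal precedence and associate to the left, so that chain is evaluated as `((A && B) || C) && D`. When `requirements.txt` matches but `pyproject.toml` is missing or does not match, the final `grep` still runs and the whole group fails, so the stack file is not added. The sketch below shows one way to avoid this; `has_dep` is a hypothetical helper name, not something from the package.

```shell
#!/usr/bin/env bash
# has_dep PATTERN FILE...: succeed if any existing FILE contains PATTERN.
# Hypothetical helper; each file is tested on its own, so a miss in one
# file cannot cancel a hit in another.
has_dep() {
  local pattern f
  pattern="$1"
  shift
  for f in "$@"; do
    [ -f "$f" ] && grep -qE "$pattern" "$f" 2>/dev/null && return 0
  done
  return 1
}

# Possible usage in place of the chained form:
#   has_dep 'fastapi' requirements.txt pyproject.toml && _add fastapi.md
```

The patterns in the hunk are also unanchored substring matches, so `(celery|dramatiq|rq|arq)` succeeds on any line that merely contains the letters `rq`. Tightening the pattern is a separate change from the grouping shown here.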
package/commands/gsd-t-execute.md CHANGED

@@ -156,12 +156,18 @@ if [ -d "$STACKS_DIR" ]; then
     grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
     grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
     grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+    grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+    grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+    grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
   fi
   # File-based detection (no package.json needed)
   ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
   ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
   ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+  ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+  ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+  ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
   [ -f "pubspec.yaml" ] && _add flutter.md
   [ -f "Dockerfile" ] && _add docker.md
   [ -d ".github/workflows" ] && _add github-actions.md
@@ -563,6 +569,87 @@ A teammate finishes independent tasks and is waiting on a checkpoint:
 2. If not, have the teammate work on documentation, tests, or code cleanup within their domain
 3. Or shut down the teammate and respawn when unblocked
+## Step 5.5: Red Team — Adversarial QA (MANDATORY)
+After all domain tasks pass their tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just built. It operates with inverted incentives — its success is measured by bugs found, not tests passed.
+⚙ [{model}] Red Team → adversarial validation of executed domains
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the code that was just written.
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+Rules:
+- False positives DESTROY your credibility. If you report something
+  as a bug and it's actually correct behavior, that's worse than
+  missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+  A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+  and either found a bug or documented exactly what you tried.
+## Attack Categories (exhaust ALL of these)
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+   match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+   special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+   order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+   requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+   exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+   behavior (state changes, data loaded, navigation works) or just check
+   that elements exist? Flag and rewrite any shallow/layout tests.
+## Report Format
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+  - **Reproduction**: {exact steps to reproduce}
+  - **Expected**: {what should happen}
+  - **Actual**: {what actually happens}
+  - **Proof**: {test file or command that demonstrates the bug}
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-execute | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to completion.
 ## Step 6: Completion
 When all tasks in all domains are complete:

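The Red Team step describes its token and compaction arithmetic as two prose cases. As a rough illustration only, those two cases can be written as one function. The name `compute_tokens` and its output format are assumptions made for this sketch; the `{CTX_PCT}` column of the log row is not defined in the lines shown, so it is left out.

```shell
#!/usr/bin/env bash
# compute_tokens TOK_START TOK_END TOK_MAX DT_END
# Prints "TOKENS COMPACTED" following the two cases in the step:
#   no compaction (end >= start): TOKENS = end - start, COMPACTED = null
#   compaction    (end <  start): TOKENS = (max - start) + end, COMPACTED = DT_END
compute_tokens() {
  local tok_start tok_end tok_max dt_end
  tok_start="$1"; tok_end="$2"; tok_max="$3"; dt_end="$4"
  if [ "$tok_end" -ge "$tok_start" ]; then
    echo "$((tok_end - tok_start)) null"
  else
    # The counter fell below its starting value, which the step treats as compaction.
    echo "$(( (tok_max - tok_start) + tok_end )) $dt_end"
  fi
}

compute_tokens 1000 4500 200000 "2026-03-25 10:00"     # prints: 3500 null
compute_tokens 180000 20000 200000 "2026-03-25 10:00"  # prints: 40000 2026-03-25 10:00
```

The compaction branch assumes at most one compaction happened between the two readings; more than one would make the figure an underestimate.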
package/commands/gsd-t-help.md CHANGED

@@ -258,7 +258,7 @@ Use these when user asks for help on a specific command:
 - **Use when**: Ready to implement
 - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
 - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
-- **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
+- **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`, `_auth.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
 ### test-sync
 - **Summary**: Keep tests aligned with code changes

package/commands/gsd-t-integrate.md CHANGED

@@ -198,7 +198,91 @@ After integration and doc ripple, verify everything works together:
 4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
 5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
-## Step 7.5: Doc-Ripple (Automated)
+## Step 7.5: Red Team — Adversarial QA (MANDATORY)
+After integration tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the integrated system. Its success is measured by bugs found, not tests passed.
+⚙ [{model}] Red Team → adversarial validation of integrated system
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the integrated system.
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+Rules:
+- False positives DESTROY your credibility. If you report something
+  as a bug and it's actually correct behavior, that's worse than
+  missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+  A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+  and either found a bug or documented exactly what you tried.
+## Attack Categories (exhaust ALL of these)
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+   match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+   special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+   order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+   requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+   exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+   behavior (state changes, data loaded, navigation works) or just check
+   that elements exist? Flag and rewrite any shallow/layout tests.
+8. **Cross-Domain Boundaries**: Test data flow across EVERY domain boundary.
+   Does data arriving from domain A get validated by domain B? What happens
+   when domain A sends malformed data that passed A's own validation?
+## Report Format
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+  - **Reproduction**: {exact steps to reproduce}
+  - **Expected**: {what should happen}
+  - **Actual**: {what actually happens}
+  - **Proof**: {test file or command that demonstrates the bug}
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
+## Step 8: Doc-Ripple (Automated)
 After all integration work is committed but before reporting completion:
@@ -218,7 +302,7 @@ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
 4. After doc-ripple returns, verify manifest exists and report summary inline
-## Step 8: Handle Integration Issues
+## Step 9: Handle Integration Issues
 For each issue found:
 1. Determine if it's a contract gap (missing specification) or implementation bug
@@ -226,7 +310,7 @@ For each issue found:
 3. **Implementation bug**: Fix it directly, document the fix
 4. Log everything in progress.md
-## Step 9: Update State
+## Step 10: Update State
 Update `.gsd-t/progress.md`:
 - Set status to `INTEGRATED`

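The FAIL and GRUDGING PASS handling in the Red Team step is written as instructions to the agent rather than as code. A hedged sketch of the same gate as a shell loop follows. It assumes the report contains a plain `VERDICT: FAIL` or `VERDICT: GRUDGING PASS` line as in the report format above; `read_verdict`, `red_team_gate`, and the `fix_and_rerun` hook are hypothetical names.

```shell
#!/usr/bin/env bash
MAX_FIX_CYCLES=2   # "2 fix cycles" from the step

# read_verdict FILE: print FAIL, GRUDGING PASS, or UNKNOWN.
read_verdict() {
  local line
  line="$(grep -m1 -oE 'VERDICT: (FAIL|GRUDGING PASS)' "$1" 2>/dev/null)" || {
    echo "UNKNOWN"
    return 0
  }
  echo "${line#VERDICT: }"
}

# Hypothetical hook: fix CRITICAL/HIGH bugs, then re-run the Red Team agent.
fix_and_rerun() { :; }

# red_team_gate FILE: return 0 on GRUDGING PASS, 1 if the verdict is still
# not a pass after MAX_FIX_CYCLES (the caller would then log to
# .gsd-t/deferred-items.md and present the result to the user).
red_team_gate() {
  local report cycle verdict
  report="$1"
  cycle=0
  while :; do
    verdict="$(read_verdict "$report")"
    [ "$verdict" = "GRUDGING PASS" ] && return 0
    [ "$cycle" -ge "$MAX_FIX_CYCLES" ] && return 1
    cycle=$((cycle + 1))
    fix_and_rerun "$cycle"
  done
}
```

A report that formats the verdict differently (for example with bold markers inside the label) would read as `UNKNOWN` here and be treated like a failure, which is the conservative choice for a blocking gate.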
package/commands/gsd-t-quick.md CHANGED

@@ -41,10 +41,16 @@ if [ -d "$STACKS_DIR" ]; then
     grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
     grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
     grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+    grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+    grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+    grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
   fi
   ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
   ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
   ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+  ([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+  ([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+  ([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
   [ -f "pubspec.yaml" ] && _add flutter.md
   [ -f "Dockerfile" ] && _add docker.md
   [ -d ".github/workflows" ] && _add github-actions.md
@@ -201,6 +207,87 @@ Quick does not mean skip testing. Before committing:
    - If a contract exists for the interface touched, does the code still match?
 4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
+## Step 5.5: Red Team — Adversarial QA (MANDATORY)
+After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
+⚙ [{model}] Red Team → adversarial validation of quick task
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the code that was just changed.
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+Rules:
+- False positives DESTROY your credibility. If you report something
+  as a bug and it's actually correct behavior, that's worse than
+  missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+  A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+  and either found a bug or documented exactly what you tried.
+## Attack Categories (exhaust ALL of these)
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+   match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+   special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+   order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+   requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+   exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+   behavior (state changes, data loaded, navigation works) or just check
+   that elements exist? Flag and rewrite any shallow/layout tests.
+## Report Format
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+  - **Reproduction**: {exact steps to reproduce}
+  - **Expected**: {what should happen}
+  - **Actual**: {what actually happens}
+  - **Proof**: {test file or command that demonstrates the bug}
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-quick | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
 ## Step 6: Doc-Ripple (Automated)
 After all work is committed but before reporting completion:

package/docs/GSD-T-README.md CHANGED

@@ -103,6 +103,7 @@ GSD-T reads all state files and tells you exactly where you left off.
 | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning, stack rules injection | In wave |
 | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
 | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
+| *Red Team* | Adversarial QA — spawns after QA passes to find bugs the builder missed | Auto-spawned |
 | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
 | `/user:gsd-t-integrate` | Wire domains together | In wave |
 | `/user:gsd-t-verify` | Run quality gates + goal-backward verification → auto-invokes complete-milestone | In wave |
@@ -154,8 +155,9 @@ GSD-T reads all state files and tells you exactly where you left off.
 │                                              │         └──────┐             │
 │                                              │                ▼             │
 │                                              │    ┌───────────────────┐     │
-│                                              │    │ (runs after each  │     │
-│                                              │    │  task + at verify)│     │
+│                                              │    │  QA + Red Team    │     │
+│                                              │    │ (after each phase │     │
+│                                              │    │  that writes code)│     │
 │                                              │    └───────────────────┘     │
 │                                              ▼                              │
 │  verify+complete ◄──────────── integrate ◄──────────────────────┘          │
@@ -172,9 +174,9 @@ GSD-T reads all state files and tells you exactly where you left off.
 | **Discuss** | Explore design decisions | Both |
 | **Plan** | Create atomic task lists | Solo (always) |
 | **Impact** | Downstream effect analysis | Solo |
-| **Execute** | Build it | Both |
+| **Execute** | Build it (+ Red Team adversarial QA) | Both |
 | **Test-Sync** | Maintain test coverage | Solo |
-| **Integrate** | Wire domains together | Solo (always) |
+| **Integrate** | Wire domains together (+ Red Team adversarial QA) | Solo (always) |
 | **Verify** | Quality gates | Both |
 | **Complete** | Archive + tag | Solo |
@@ -231,20 +233,42 @@ GSD-T auto-detects your project's tech stack and injects mandatory best-practice
 ### How It Works
 1. At subagent spawn time, GSD-T reads project manifest files to detect the active stack(s).
-2. Universal rules (`templates/stacks/_security.md`) are **always** injected.
+2. Universal rules (`templates/stacks/_security.md`, `_auth.md`) are **always** injected.
 3. Stack-specific rules are injected when the corresponding stack is detected.
-4. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
+4. Project-level overrides in `.gsd-t/stacks/` replace global files of the same name.
+5. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
-### Stack Detection
+### Stack Detection (27 files)
-| Project File | Detected Stack |
+| Project File | Detected Stack(s) |
 |---|---|
-| `package.json` with `"react"` | React |
-| `package.json` with `"typescript"` or `tsconfig.json` | TypeScript |
-| `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | Node API |
-| `requirements.txt` or `pyproject.toml` | Python |
-| `go.mod` | Go |
-| `Cargo.toml` | Rust |
+| *(always)* | `_security.md`, `_auth.md` |
+| `package.json` with `"react"` | `react.md` |
+| `package.json` with `"react-native"` | `react-native.md` |
+| `package.json` with `"next"` | `nextjs.md` |
+| `package.json` with `"vue"` | `vue.md` |
+| `package.json` with `"typescript"` or `tsconfig.json` | `typescript.md` |
+| `package.json` with `"tailwindcss"` | `tailwind.md` |
+| `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | `node-api.md`, `rest-api.md` |
+| `package.json` with `"vite"` | `vite.md` |
+| `package.json` with `"@supabase/supabase-js"` | `supabase.md` |
+| `package.json` with `"firebase"` | `firebase.md` |
+| `package.json` with `"graphql"` or `"@apollo/server"` | `graphql.md` |
+| `package.json` with `"zustand"` | `zustand.md` |
+| `package.json` with `"@reduxjs/toolkit"` | `redux.md` |
+| `package.json` with `"prisma"` or `"@prisma/client"` | `prisma.md` |
+| `package.json` with `"pg"`, `"knex"`, or `"drizzle-orm"` | `postgresql.md` |
+| `package.json` with `"neo4j-driver"` | `neo4j.md` |
+| `package.json` with `"bullmq"`, `"bull"`, `"amqplib"`, or `"@aws-sdk/client-sqs"` | `queues.md` |
+| `package.json` with `"openai"`, `"anthropic"`, `"langchain"` | `llm.md` |
+| `Dockerfile` or `compose.yaml` | `docker.md` |
+| `.github/workflows/*.yml` | `github-actions.md` |
+| `playwright.config.*` | `playwright.md` |
+| `requirements.txt` or `pyproject.toml` | `python.md` |
+| `requirements.txt` with `fastapi` | `fastapi.md` |
+| `requirements.txt` with `celery`, `dramatiq`, `rq`, or `arq` | `queues.md` |
+| `requirements.txt` with `openai`, `anthropic`, `langchain` | `llm.md` |
+| `pubspec.yaml` | `flutter.md` |
 ### Commands That Inject Stack Rules

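The detection and override steps described in the README hunk above could look roughly like the sketch below. It covers only a subset of the table, and several details are assumptions rather than anything the diff confirms: the function names are hypothetical, and a package is treated as "present" when it appears as a key in `dependencies` or `devDependencies`.

```python
import json
from pathlib import Path
from typing import Dict, List

# Universal rules, always injected according to the table above.
ALWAYS = ["_security.md", "_auth.md"]

# Subset of the detection table: package.json dependency -> rule files.
NPM_RULES: Dict[str, List[str]] = {
    "react": ["react.md"],
    "next": ["nextjs.md"],
    "typescript": ["typescript.md"],
    "express": ["node-api.md", "rest-api.md"],
    "prisma": ["prisma.md"],
    "@prisma/client": ["prisma.md"],
    "bullmq": ["queues.md"],
}


def detect_rule_files(project: Path) -> List[str]:
    """Return rule file names for a project, universal rules first."""
    found = list(ALWAYS)
    manifest = project / "package.json"
    if manifest.is_file():
        data = json.loads(manifest.read_text(encoding="utf-8"))
        # Assumption: match on dependency keys, not on raw substrings.
        deps = {**data.get("dependencies", {}),
                **data.get("devDependencies", {})}
        for package, rules in NPM_RULES.items():
            if package in deps:
                for rule in rules:
                    if rule not in found:
                        found.append(rule)
    if (project / "tsconfig.json").is_file() and "typescript.md" not in found:
        found.append("typescript.md")
    if (project / "pubspec.yaml").is_file():
        found.append("flutter.md")
    return found


def resolve_rule_path(name: str, project: Path, global_dir: Path) -> Path:
    """Prefer a project-level override in .gsd-t/stacks/ over the global file."""
    override = project / ".gsd-t" / "stacks" / name
    return override if override.is_file() else global_dir / name
```

Keeping detection (which rule names apply) separate from resolution (which copy of each file wins) means a project-level override never changes which stacks are detected, only the text that gets injected.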
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@tekyzinc/gsd-t",
-  "version": "2.50.11",
+  "version": "2.51.10",
   "description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
   "author": "Tekyz, Inc.",
   "license": "MIT",

package/templates/CLAUDE-global.md CHANGED

@@ -306,6 +306,26 @@ Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract:
 **QA failure OR shallow tests found blocks phase completion.** Lead cannot proceed until QA reports PASS with zero shallow tests, or user explicitly overrides.
+## Red Team — Adversarial QA (Mandatory)
+After QA passes, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
+**Red Team method by command:**
+- `execute` → spawns Red Team after all domain tasks pass (Step 5.5)
+- `integrate` → spawns Red Team after integration tests pass (Step 7.5)
+- `quick` → spawns Red Team after Test & Verify passes (Step 5.5)
+- `debug` → spawns Red Team after fix verification passes (Step 5.3)
+- `wave` → each phase agent handles Red Team per the rules above
+**Key Red Team rules:**
+- **Inverted incentive**: More bugs found = more value. Zero bugs requires exhaustive proof of thoroughness.
+- **False positive penalty**: Reporting non-bugs destroys credibility. Every bug must be reproduced with proof.
+- **Exhaustive categories**: Contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps — all must be attempted.
+- **VERDICT**: `FAIL` (bugs found — blocks completion) or `GRUDGING PASS` (exhaustive search, nothing found).
+- **Report**: Written to `.gsd-t/red-team-report.md`; bugs also appended to `.gsd-t/qa-issues.md`.
+**Red Team FAIL blocks phase completion.** CRITICAL/HIGH bugs must be fixed (up to 2 fix cycles). If bugs persist, they are logged to `.gsd-t/deferred-items.md` and presented to the user.
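The FAIL handling in the rules above amounts to a bounded fix-and-retest loop. The sketch below is one possible reading of it, not the package's own code: `run_red_team` and `fix_bugs` are hypothetical callables, and an empty bug list is assumed to correspond to `GRUDGING PASS`.

```python
from typing import Callable, List

MAX_FIX_CYCLES = 2  # "up to 2 fix cycles" per the rules above


def run_red_team_gate(run_red_team: Callable[[], List[str]],
                      fix_bugs: Callable[[List[str]], None]) -> dict:
    """Alternate Red Team runs and fixes until a pass or the budget runs out.

    Both callables are hypothetical stand-ins: run_red_team returns the
    CRITICAL/HIGH bugs it found (empty list = GRUDGING PASS), and fix_bugs
    attempts to fix the bugs it is given.
    """
    bugs = run_red_team()
    cycles = 0
    while bugs and cycles < MAX_FIX_CYCLES:
        fix_bugs(bugs)
        cycles += 1
        bugs = run_red_team()  # re-run Red Team after every fix attempt
    if bugs:
        # Persisting bugs would be logged to .gsd-t/deferred-items.md and
        # presented to the user; this sketch only returns them.
        return {"verdict": "FAIL", "deferred": bugs, "fix_cycles": cycles}
    return {"verdict": "GRUDGING PASS", "deferred": [], "fix_cycles": cycles}
```

The loop re-runs the Red Team after each fix instead of trusting the fix, which matches the "Re-run Red Team after fixes" step. Only bugs that survive both cycles are deferred.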
 ## Model Display (MANDATORY)
 **Before every subagent spawn, display the model being used to the user:**