@tekyzinc/gsd-t 2.50.11 → 2.51.10
- package/CHANGELOG.md +21 -3
- package/README.md +1 -0
- package/commands/gsd-t-debug.md +89 -0
- package/commands/gsd-t-execute.md +87 -0
- package/commands/gsd-t-help.md +1 -1
- package/commands/gsd-t-integrate.md +87 -3
- package/commands/gsd-t-quick.md +87 -0
- package/docs/GSD-T-README.md +38 -14
- package/package.json +1 -1
- package/templates/CLAUDE-global.md +20 -0
- package/templates/stacks/_auth.md +324 -0
- package/templates/stacks/fastapi.md +377 -0
- package/templates/stacks/llm.md +541 -0
- package/templates/stacks/prisma.md +400 -0
- package/templates/stacks/queues.md +437 -0
package/CHANGELOG.md
CHANGED
@@ -2,16 +2,34 @@
 
 All notable changes to GSD-T are documented here. Updated with each release.
 
-## [2.
+## [2.51.10] - 2026-03-25
 
 ### Added
-- **
+- **Red Team — Adversarial QA agent** added to `execute`, `quick`, `integrate`, and `debug` commands. Spawns after the builder's tests pass with inverted incentives — success is measured by bugs found, not tests passed.
+- **Exhaustive attack categories**: contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps, cross-domain boundaries (integrate only), fix regression variants (debug only).
+- **False positive penalty**: reporting non-bugs destroys credibility, preventing phantom bug inflation.
+- **VERDICT system**: `FAIL` (bugs found — blocks phase completion) or `GRUDGING PASS` (exhaustive search, nothing found — must prove thoroughness).
+- **Red Team report**: findings written to `.gsd-t/red-team-report.md`; bugs appended to `.gsd-t/qa-issues.md`.
+- Red Team documented in CLAUDE-global template, global CLAUDE.md, GSD-T-README wave diagram, README command table.
+
+## [2.50.12] - 2026-03-25
+
+### Added
+- **23 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright, fastapi, llm (with RAG patterns section), prisma, queues, _auth (universal). Total: 27 stack rules (was 4).
+- **`_auth.md`** (universal) — email-first registration, auth provider abstraction (Cognito/Firebase/Google), token management, password policy, session management, social auth/OAuth, email verification, MFA, authorization/RBAC, auth security, auth UI patterns.
+- **`fastapi.md`** — dependency injection, Pydantic request/response models, lifespan events, BackgroundTasks, async patterns, auto-generated OpenAPI docs.
+- **`llm.md`** — provider-agnostic LLM patterns: structured outputs, streaming, error/retry, token management, conversation state, tool/function calling, RAG patterns (chunking, embeddings, retrieval), prompt management, testing, cost/observability.
+- **`prisma.md`** — schema modeling, migrations, typed client usage, relation queries, transactions, seeding, N+1 prevention.
+- **`queues.md`** — BullMQ/Bull, SQS, RabbitMQ, Celery patterns: idempotent handlers, dead letter queues, retry/backoff, job deduplication, graceful shutdown.
 - **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
 - **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
+- **Project-level stack overrides** — `.gsd-t/stacks/` directory for per-project customization of global stack rules. Local files replace global files of the same name.
 
 ### Changed
-- Stack detection in execute, quick, and debug commands updated to cover all
+- Stack detection in execute, quick, and debug commands updated to cover all 27 stack files with conditional detection per project dependencies.
+- Detection refactored from one-liner to structured bash with `_sf()` (local override resolver) and `_add()` helper functions.
 - PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
+- GSD-T-README.md stack detection table expanded to list all 27 files with their detection triggers.
 
 ## [2.46.11] - 2026-03-24
 
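The changelog entries above name `_sf()` (local override resolver) and `_add()`, but their bodies are not part of this diff. The following is only a sketch of how the override rule ("local files replace global files of the same name") could be expressed. `STACKS_DIR` appears in the hunk headers of this diff, while `LOCAL_STACKS_DIR` and `STACK_FILES` are hypothetical names introduced for illustration; the real helpers may differ.

```shell
# Hypothetical sketch, not the package's actual implementation.
# STACKS_DIR       : global stack rules directory (set by the caller)
# LOCAL_STACKS_DIR : per-project override directory (illustrative name)
# STACK_FILES      : space-separated list of resolved rule files (illustrative)
LOCAL_STACKS_DIR=".gsd-t/stacks"
STACK_FILES=""

# _sf NAME: print the path to use for NAME, preferring the local override.
_sf() {
  if [ -f "$LOCAL_STACKS_DIR/$1" ]; then
    printf '%s\n' "$LOCAL_STACKS_DIR/$1"
  elif [ -f "${STACKS_DIR:-}/$1" ]; then
    printf '%s\n' "$STACKS_DIR/$1"
  else
    return 1
  fi
}

# _add NAME: resolve NAME and append it to STACK_FILES once.
# Returns 0 even when the file is missing, so `test && _add x.md` chains
# do not abort a script running under `set -e`.
_add() {
  local path
  path="$(_sf "$1")" || return 0
  case " $STACK_FILES " in
    *" $path "*) ;;  # already queued, skip the duplicate
    *) STACK_FILES="${STACK_FILES:+$STACK_FILES }$path" ;;
  esac
}
```

A space-separated list keeps the sketch short but breaks on paths that contain spaces; an array would be the safer choice in bash.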
package/README.md
CHANGED
@@ -156,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
 | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
 | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
 | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
+| *Red Team* | Adversarial QA — finds bugs the builder missed (inverted incentives) | Auto-spawned |
 | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
 | `/user:gsd-t-integrate` | Wire domains together | In wave |
 | `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
package/commands/gsd-t-debug.md
CHANGED
@@ -37,10 +37,16 @@ if [ -d "$STACKS_DIR" ]; then
 grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
 grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
 grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
 fi
 ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
 ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
 ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
 [ -f "pubspec.yaml" ] && _add flutter.md
 [ -f "Dockerfile" ] && _add docker.md
 [ -d ".github/workflows" ] && _add github-actions.md
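Two properties of the Python detection lines in the hunk above are worth noting. First, `&&` and `||` have equal precedence in shell lists and associate left to right, so in `A && B || C && D` the final `D` still runs after `A && B` succeeds; with a matching `requirements.txt` and no `pyproject.toml`, the group as written ends up failing. Second, the unanchored pattern `(celery|dramatiq|rq|arq)` also matches substrings, for example `fastparquet`. The sketch below shows one way to avoid both. `has_requirement` and `uses_python_queue` are hypothetical helpers, not part of the package, and the anchored match only understands `requirements.txt`-style lines.

```shell
# has_requirement FILE NAME...: succeed when FILE exists and one of its lines
# starts with NAME as a whole package name (hypothetical helper).
has_requirement() {
  local file="$1" names
  shift
  [ -f "$file" ] || return 1
  # Join the remaining arguments with "|" to build a regex alternation.
  names="$(IFS='|'; printf '%s' "$*")"
  # The name must be followed by a non-name character (==, >=, [, space) or EOL.
  grep -qiE "^[[:space:]]*(${names})([^A-Za-z0-9_.-]|\$)" "$file"
}

# uses_python_queue: check requirements.txt first, then fall back to the same
# loose pyproject.toml match as the diff. Each file's test is kept separate so
# a missing pyproject.toml cannot undo a requirements.txt hit.
uses_python_queue() {
  has_requirement requirements.txt celery dramatiq rq arq && return 0
  { [ -f pyproject.toml ] && grep -qE '(celery|dramatiq|rq|arq)' pyproject.toml; }
}
```

With these helpers the detection line would read `uses_python_queue && _add queues.md`.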
@@ -356,6 +362,89 @@ Before committing, ensure the fix is solid:
 
 Commit: `[debug] Fix {description} — root cause: {explanation}`
 
+## Step 5.3: Red Team — Adversarial QA (MANDATORY)
+
+After the fix passes all tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the fix and find regressions. Its success is measured by bugs found, not tests passed.
+
+⚙ [{model}] Red Team → adversarial validation of debug fix
+
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the fix that was just applied.
+
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+
+Rules:
+- False positives DESTROY your credibility. If you report something
+as a bug and it's actually correct behavior, that's worse than
+missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+and either found a bug or documented exactly what you tried.
+
+## Attack Categories (exhaust ALL of these)
+
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+requests. Does the code handle failures gracefully or crash?
+5. **Regression Around the Fix**: The fix changed specific code. Test
+every adjacent code path. Fixes frequently break neighboring functionality.
+6. **Original Bug Variants**: The original bug was found. Are there SIMILAR
+bugs in related code? Same pattern, different location?
+7. **Full Suite**: Run the FULL test suite. Did the fix break anything else?
+8. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+behavior (state changes, data loaded, navigation works) or just check
+that elements exist? Flag and rewrite any shallow/layout tests.
+
+## Report Format
+
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+- **Reproduction**: {exact steps to reproduce}
+- **Expected**: {what should happen}
+- **Actual**: {what actually happens}
+- **Proof**: {test file or command that demonstrates the bug}
+
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-debug | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to metrics and doc-ripple.
+
 ## Step 5.5: Emit Task Metrics
 
 After committing, emit a task-metrics record for this debug session — run via Bash:
package/commands/gsd-t-execute.md
CHANGED
@@ -156,12 +156,18 @@ if [ -d "$STACKS_DIR" ]; then
 grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
 grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
 grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
 fi
 
 # File-based detection (no package.json needed)
 ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
 ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
 ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
 [ -f "pubspec.yaml" ] && _add flutter.md
 [ -f "Dockerfile" ] && _add docker.md
 [ -d ".github/workflows" ] && _add github-actions.md
@@ -563,6 +569,87 @@ A teammate finishes independent tasks and is waiting on a checkpoint:
 2. If not, have the teammate work on documentation, tests, or code cleanup within their domain
 3. Or shut down the teammate and respawn when unblocked
 
+## Step 5.5: Red Team — Adversarial QA (MANDATORY)
+
+After all domain tasks pass their tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just built. It operates with inverted incentives — its success is measured by bugs found, not tests passed.
+
+⚙ [{model}] Red Team → adversarial validation of executed domains
+
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the code that was just written.
+
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+
+Rules:
+- False positives DESTROY your credibility. If you report something
+as a bug and it's actually correct behavior, that's worse than
+missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+and either found a bug or documented exactly what you tried.
+
+## Attack Categories (exhaust ALL of these)
+
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+behavior (state changes, data loaded, navigation works) or just check
+that elements exist? Flag and rewrite any shallow/layout tests.
+
+## Report Format
+
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+- **Reproduction**: {exact steps to reproduce}
+- **Expected**: {what should happen}
+- **Actual**: {what actually happens}
+- **Proof**: {test file or command that demonstrates the bug}
+
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-execute | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to completion.
+
 ## Step 6: Completion
 
 When all tasks in all domains are complete:
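The "Compute tokens and compaction" rule in the Red Team step above is stated as two inline expressions. As a minimal sketch, the same arithmetic can be wrapped in a function; `compute_tokens` is a hypothetical name, and the sketch assumes at most one compaction occurred between the two readings, as the original formula does.

```shell
# compute_tokens START END MAX: print the tokens consumed between two readings.
# If END is below START, the context is assumed to have been compacted once:
# the tokens up to MAX are counted, plus whatever accumulated afterwards.
compute_tokens() {
  local start="$1" end="$2" max="$3"
  if [ "$end" -ge "$start" ]; then
    echo $(( end - start ))
  else
    echo $(( (max - start) + end ))
  fi
}
```

For example, `TOKENS=$(compute_tokens "$TOK_START" "$TOK_END" "$TOK_MAX")` reproduces both branches of the rule; `COMPACTED` would still be set separately from the same comparison.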
package/commands/gsd-t-help.md
CHANGED
@@ -258,7 +258,7 @@ Use these when user asks for help on a specific command:
 - **Use when**: Ready to implement
 - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
 - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
-- **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
+- **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`, `_auth.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
 
 ### test-sync
 - **Summary**: Keep tests aligned with code changes
package/commands/gsd-t-integrate.md
CHANGED
@@ -198,7 +198,91 @@ After integration and doc ripple, verify everything works together:
 4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
 5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
 
-## Step 7.5:
+## Step 7.5: Red Team — Adversarial QA (MANDATORY)
+
+After integration tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the integrated system. Its success is measured by bugs found, not tests passed.
+
+⚙ [{model}] Red Team → adversarial validation of integrated system
+
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the integrated system.
+
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+
+Rules:
+- False positives DESTROY your credibility. If you report something
+as a bug and it's actually correct behavior, that's worse than
+missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+and either found a bug or documented exactly what you tried.
+
+## Attack Categories (exhaust ALL of these)
+
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+behavior (state changes, data loaded, navigation works) or just check
+that elements exist? Flag and rewrite any shallow/layout tests.
+8. **Cross-Domain Boundaries**: Test data flow across EVERY domain boundary.
+Does data arriving from domain A get validated by domain B? What happens
+when domain A sends malformed data that passed A's own validation?
+
+## Report Format
+
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+- **Reproduction**: {exact steps to reproduce}
+- **Expected**: {what should happen}
+- **Actual**: {what actually happens}
+- **Proof**: {test file or command that demonstrates the bug}
+
+Summary:
+- BUGS FOUND: {count} (with severity breakdown)
+- COVERAGE GAPS: {untested flows from requirements}
+- SHALLOW TESTS REWRITTEN: {count}
+- CONTRACTS VERIFIED: {N}/{total}
+- ATTACK VECTORS TRIED: {list every category attempted and results}
+- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
+
+Write all findings to .gsd-t/red-team-report.md.
+If bugs found, also append to .gsd-t/qa-issues.md."
+```
+
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-integrate | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+
+**If Red Team VERDICT is FAIL:**
+1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
+2. Re-run Red Team after fixes
+3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
+
+**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
+
+## Step 8: Doc-Ripple (Automated)
 
 After all integration work is committed but before reporting completion:
 
@@ -218,7 +302,7 @@ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
 
 4. After doc-ripple returns, verify manifest exists and report summary inline
 
-## Step
+## Step 9: Handle Integration Issues
 
 For each issue found:
 1. Determine if it's a contract gap (missing specification) or implementation bug
@@ -226,7 +310,7 @@ For each issue found:
 3. **Implementation bug**: Fix it directly, document the fix
 4. Log everything in progress.md
 
-## Step
+## Step 10: Update State
 
 Update `.gsd-t/progress.md`:
 - Set status to `INTEGRATED`
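The Red Team step added to this command branches on the report's VERDICT (`FAIL` or `GRUDGING PASS`), but the diff does not show how the verdict is read back from `.gsd-t/red-team-report.md`. One possible approach, sketched under the assumption that the report contains a single `VERDICT:` line in the summary format given in the prompt, is shown here. `red_team_verdict` is a hypothetical helper, not part of the package.

```shell
# red_team_verdict REPORT: print FAIL, GRUDGING PASS, or UNKNOWN.
# UNKNOWN covers a missing file, a missing VERDICT line, and a line that
# still contains both options (the unfilled template copied verbatim).
red_team_verdict() {
  local line
  line="$(grep -m1 'VERDICT:' "$1" 2>/dev/null)" || { echo "UNKNOWN"; return 0; }
  case "$line" in
    *FAIL*"GRUDGING PASS"*|*"GRUDGING PASS"*FAIL*) echo "UNKNOWN" ;;
    *"GRUDGING PASS"*) echo "GRUDGING PASS" ;;
    *FAIL*) echo "FAIL" ;;
    *) echo "UNKNOWN" ;;
  esac
}
```

Treating anything other than an unambiguous `GRUDGING PASS` as blocking keeps the gate conservative, which matches the changelog's description of `FAIL` as blocking phase completion.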
package/commands/gsd-t-quick.md
CHANGED
@@ -41,10 +41,16 @@ if [ -d "$STACKS_DIR" ]; then
 grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && _add redux.md
 grep -q '"neo4j-driver"' package.json 2>/dev/null && _add neo4j.md
 grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && _add postgresql.md
+grep -qE '"(prisma|@prisma/client)"' package.json 2>/dev/null && _add prisma.md
+grep -qE '"(bullmq|bull|amqplib|@aws-sdk/client-sqs|bee-queue|agenda)"' package.json 2>/dev/null && _add queues.md
+grep -qE '"(openai|anthropic|@anthropic-ai/sdk|langchain|llama-index|@google/generative-ai)"' package.json 2>/dev/null && _add llm.md
 fi
 ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && _add python.md
 ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && _add postgresql.md
 ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && _add neo4j.md
+([ -f "requirements.txt" ] && grep -q "fastapi" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "fastapi" pyproject.toml 2>/dev/null) && _add fastapi.md
+([ -f "requirements.txt" ] && grep -qE "(celery|dramatiq|rq|arq)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(celery|dramatiq|rq|arq)" pyproject.toml 2>/dev/null) && _add queues.md
+([ -f "requirements.txt" ] && grep -qE "(openai|anthropic|langchain|llama.index)" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -qE "(openai|anthropic|langchain|llama.index)" pyproject.toml 2>/dev/null) && _add llm.md
 [ -f "pubspec.yaml" ] && _add flutter.md
 [ -f "Dockerfile" ] && _add docker.md
 [ -d ".github/workflows" ] && _add github-actions.md
@@ -201,6 +207,87 @@ Quick does not mean skip testing. Before committing:
 - If a contract exists for the interface touched, does the code still match?
 4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
 
+## Step 5.5: Red Team — Adversarial QA (MANDATORY)
+
+After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
+
+⚙ [{model}] Red Team → adversarial validation of quick task
+
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+```
+Task subagent (general-purpose, model: sonnet):
+"You are a Red Team QA adversary. Your job is to BREAK the code that was just changed.
+
+Your value is measured by REAL bugs found. More bugs = more value.
+If you find zero bugs, you must prove you were thorough — list every
+attack vector you tried and why it didn't break. A short list means
+you didn't try hard enough.
+
+Rules:
+- False positives DESTROY your credibility. If you report something
+as a bug and it's actually correct behavior, that's worse than
+missing a real bug. Never report something you haven't reproduced.
+- Style opinions are not bugs. Theoretical concerns are not bugs.
+A bug is: 'I did X, expected Y, got Z.' With proof.
+- You are done ONLY when you have exhausted every category below
+and either found a bug or documented exactly what you tried.
+
+## Attack Categories (exhaust ALL of these)
+
+1. **Contract Violations**: Read .gsd-t/contracts/. Does the code EXACTLY
+match every contract? Test each endpoint/interface/schema shape.
+2. **Boundary Inputs**: Empty strings, null, undefined, huge payloads,
+special characters, SQL injection attempts, XSS payloads, path traversal.
+3. **State Transitions**: What happens when actions are performed out of
+order? Double-submit? Concurrent access? Refresh mid-flow?
+4. **Error Paths**: Remove env vars. Kill the database. Send malformed
+requests. Does the code handle failures gracefully or crash?
+5. **Missing Flows**: Read docs/requirements.md. Are there user flows that
+exist in requirements but have NO test coverage? Write tests for them.
+6. **Regression**: Run the FULL test suite. Did any existing tests break?
+7. **E2E Functional Gaps**: Review ALL Playwright specs. Do they test actual
+behavior (state changes, data loaded, navigation works) or just check
+that elements exist? Flag and rewrite any shallow/layout tests.
+
+## Report Format
+
+For each bug found:
+- **BUG-{N}**: {severity: CRITICAL/HIGH/MEDIUM/LOW}
+- **Reproduction**: {exact steps to reproduce}
|
|
260
|
+
- **Expected**: {what should happen}
|
|
261
|
+
- **Actual**: {what actually happens}
|
|
262
|
+
- **Proof**: {test file or command that demonstrates the bug}
|
|
263
|
+
|
|
264
|
+
Summary:
|
|
265
|
+
- BUGS FOUND: {count} (with severity breakdown)
|
|
266
|
+
- COVERAGE GAPS: {untested flows from requirements}
|
|
267
|
+
- SHALLOW TESTS REWRITTEN: {count}
|
|
268
|
+
- CONTRACTS VERIFIED: {N}/{total}
|
|
269
|
+
- ATTACK VECTORS TRIED: {list every category attempted and results}
|
|
270
|
+
- VERDICT: FAIL ({N} bugs found) | GRUDGING PASS (exhaustive search, nothing found)
|
|
271
|
+
|
|
272
|
+
Write all findings to .gsd-t/red-team-report.md.
|
|
273
|
+
If bugs found, also append to .gsd-t/qa-issues.md."
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
After subagent returns — run via Bash:
|
|
277
|
+
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
278
|
+
Compute tokens and compaction:
|
|
279
|
+
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
280
|
+
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
281
|
+
Append to `.gsd-t/token-log.md`:
|
|
282
|
+
`| {DT_START} | {DT_END} | gsd-t-quick | Red Team | sonnet | {DURATION}s | {VERDICT} — {N} bugs found | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
283
|
+
|
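The two token-accounting cases above can be folded into one function. This is a minimal sketch, assuming the counters are plain non-negative integers; `compute_tokens` is a hypothetical helper name, and the `{CTX_PCT}` column is left out because its formula does not appear in this section.

```shell
# compute_tokens TOK_START TOK_END TOK_MAX
# Prints "<tokens> <compacted>", where <compacted> is "yes" or "no".
compute_tokens() {
  tok_start=$1
  tok_end=$2
  tok_max=$3
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: the counter only grew during the subagent run.
    echo "$((tok_end - tok_start)) no"
  else
    # Compaction detected: the counter wrapped, so count the remainder
    # up to TOK_MAX plus whatever accumulated after the reset.
    echo "$(( (tok_max - tok_start) + tok_end )) yes"
  fi
}
```

For example, `compute_tokens 150000 20000 200000` treats the drop from 150000 to 20000 as a compaction and reports the tokens on both sides of the reset.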
|
284
|
+
**If Red Team VERDICT is FAIL:**
|
|
285
|
+
1. Fix all CRITICAL and HIGH bugs immediately (up to 2 fix attempts per bug)
|
|
286
|
+
2. Re-run Red Team after fixes
|
|
287
|
+
3. If bugs persist after 2 fix cycles, log to `.gsd-t/deferred-items.md` and present to user
|
|
288
|
+
|
|
289
|
+
**If Red Team VERDICT is GRUDGING PASS:** Proceed to doc-ripple.
|
|
290
|
+
|
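The FAIL / GRUDGING PASS branching above could be automated with a small gate that reads the report. This is only a sketch: it assumes the subagent writes a literal `VERDICT: FAIL` or `VERDICT: GRUDGING PASS` line to the report, which the prompt requests but cannot guarantee, and `red_team_gate` is a hypothetical name.

```shell
# red_team_gate [REPORT]
# Exit status: 0 = GRUDGING PASS, 1 = FAIL, 2 = no recognizable verdict.
red_team_gate() {
  report=${1:-.gsd-t/red-team-report.md}
  # Use the last VERDICT line so a re-run appended to the report wins.
  verdict=$(grep -E 'VERDICT:' "$report" 2>/dev/null | tail -n 1 || true)
  case "$verdict" in
    *"VERDICT: FAIL"*) return 1 ;;
    *"VERDICT: GRUDGING PASS"*) return 0 ;;
    *) return 2 ;;
  esac
}
```

A caller would proceed to doc-ripple only on status 0 and treat status 2 the same as a failure, since a missing verdict proves nothing.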
|
204
291
|
## Step 6: Doc-Ripple (Automated)
|
|
205
292
|
|
|
206
293
|
After all work is committed but before reporting completion:
|
package/docs/GSD-T-README.md
CHANGED
|
@@ -103,6 +103,7 @@ GSD-T reads all state files and tells you exactly where you left off.
|
|
|
103
103
|
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning, stack rules injection | In wave |
|
|
104
104
|
| `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
|
|
105
105
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
|
|
106
|
+
| *Red Team* | Adversarial QA — spawns after QA passes to find bugs the builder missed | Auto-spawned |
|
|
106
107
|
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
|
|
107
108
|
| `/user:gsd-t-integrate` | Wire domains together | In wave |
|
|
108
109
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward verification → auto-invokes complete-milestone | In wave |
|
|
@@ -154,8 +155,9 @@ GSD-T reads all state files and tells you exactly where you left off.
|
|
|
154
155
|
│ │ └──────┐ │
|
|
155
156
|
│ │ ▼ │
|
|
156
157
|
│ │ ┌───────────────────┐ │
|
|
157
|
-
│ │ │
|
|
158
|
-
│ │ │
|
|
158
|
+
│ │ │ QA + Red Team │ │
|
|
159
|
+
│ │ │ (after each phase │ │
|
|
160
|
+
│ │ │ that writes code)│ │
|
|
159
161
|
│ │ └───────────────────┘ │
|
|
160
162
|
│ ▼ │
|
|
161
163
|
│ verify+complete ◄──────────── integrate ◄──────────────────────┘ │
|
|
@@ -172,9 +174,9 @@ GSD-T reads all state files and tells you exactly where you left off.
|
|
|
172
174
|
| **Discuss** | Explore design decisions | Both |
|
|
173
175
|
| **Plan** | Create atomic task lists | Solo (always) |
|
|
174
176
|
| **Impact** | Downstream effect analysis | Solo |
|
|
175
|
-
| **Execute** | Build it | Both |
|
|
177
|
+
| **Execute** | Build it (+ Red Team adversarial QA) | Both |
|
|
176
178
|
| **Test-Sync** | Maintain test coverage | Solo |
|
|
177
|
-
| **Integrate** | Wire domains together | Solo (always) |
|
|
179
|
+
| **Integrate** | Wire domains together (+ Red Team adversarial QA) | Solo (always) |
|
|
178
180
|
| **Verify** | Quality gates | Both |
|
|
179
181
|
| **Complete** | Archive + tag | Solo |
|
|
180
182
|
|
|
@@ -231,20 +233,42 @@ GSD-T auto-detects your project's tech stack and injects mandatory best-practice
|
|
|
231
233
|
### How It Works
|
|
232
234
|
|
|
233
235
|
1. At subagent spawn time, GSD-T reads project manifest files to detect the active stack(s).
|
|
234
|
-
2. Universal rules (`templates/stacks/_security.md`) are **always** injected.
|
|
236
|
+
2. Universal rules (`templates/stacks/_security.md`, `_auth.md`) are **always** injected.
|
|
235
237
|
3. Stack-specific rules are injected when the corresponding stack is detected.
|
|
236
|
-
4.
|
|
238
|
+
4. Project-level overrides in `.gsd-t/stacks/` replace global files of the same name.
|
|
239
|
+
5. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
|
|
237
240
|
|
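The override rule in step 4 amounts to a lookup that prefers the project copy. Here is a minimal sketch, assuming the global templates live in a directory named by a hypothetical `GLOBAL_STACKS_DIR` variable (defaulting to `templates/stacks`); the real install location and the `resolve_stack_file` name are assumptions, not taken from the package.

```shell
# resolve_stack_file NAME
# Prints the path of the rules file to inject, or returns 1 if none exists.
resolve_stack_file() {
  name=$1
  global_dir=${GLOBAL_STACKS_DIR:-templates/stacks}
  if [ -f ".gsd-t/stacks/$name" ]; then
    # A project-level override replaces the global file of the same name.
    printf '%s\n' ".gsd-t/stacks/$name"
  elif [ -f "$global_dir/$name" ]; then
    printf '%s\n' "$global_dir/$name"
  else
    return 1
  fi
}
```

Because the match is by file name only, dropping a `react.md` into `.gsd-t/stacks/` would shadow the packaged `react.md` without touching any other stack file.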
|
238
|
-
### Stack Detection
|
|
241
|
+
### Stack Detection (27 files)
|
|
239
242
|
|
|
240
|
-
| Project File | Detected Stack |
|
|
243
|
+
| Project File | Detected Stack(s) |
|
|
241
244
|
|---|---|
|
|
242
|
-
| `
|
|
243
|
-
| `package.json` with `"
|
|
244
|
-
| `package.json` with `"
|
|
245
|
-
| `
|
|
246
|
-
| `
|
|
247
|
-
| `
|
|
245
|
+
| *(always)* | `_security.md`, `_auth.md` |
|
|
246
|
+
| `package.json` with `"react"` | `react.md` |
|
|
247
|
+
| `package.json` with `"react-native"` | `react-native.md` |
|
|
248
|
+
| `package.json` with `"next"` | `nextjs.md` |
|
|
249
|
+
| `package.json` with `"vue"` | `vue.md` |
|
|
250
|
+
| `package.json` with `"typescript"` or `tsconfig.json` | `typescript.md` |
|
|
251
|
+
| `package.json` with `"tailwindcss"` | `tailwind.md` |
|
|
252
|
+
| `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | `node-api.md`, `rest-api.md` |
|
|
253
|
+
| `package.json` with `"vite"` | `vite.md` |
|
|
254
|
+
| `package.json` with `"@supabase/supabase-js"` | `supabase.md` |
|
|
255
|
+
| `package.json` with `"firebase"` | `firebase.md` |
|
|
256
|
+
| `package.json` with `"graphql"` or `"@apollo/server"` | `graphql.md` |
|
|
257
|
+
| `package.json` with `"zustand"` | `zustand.md` |
|
|
258
|
+
| `package.json` with `"@reduxjs/toolkit"` | `redux.md` |
|
|
259
|
+
| `package.json` with `"prisma"` or `"@prisma/client"` | `prisma.md` |
|
|
260
|
+
| `package.json` with `"pg"`, `"knex"`, or `"drizzle-orm"` | `postgresql.md` |
|
|
261
|
+
| `package.json` with `"neo4j-driver"` | `neo4j.md` |
|
|
262
|
+
| `package.json` with `"bullmq"`, `"bull"`, `"amqplib"`, or `"@aws-sdk/client-sqs"` | `queues.md` |
|
|
263
|
+
| `package.json` with `"openai"`, `"anthropic"`, `"langchain"` | `llm.md` |
|
|
264
|
+
| `Dockerfile` or `compose.yaml` | `docker.md` |
|
|
265
|
+
| `.github/workflows/*.yml` | `github-actions.md` |
|
|
266
|
+
| `playwright.config.*` | `playwright.md` |
|
|
267
|
+
| `requirements.txt` or `pyproject.toml` | `python.md` |
|
|
268
|
+
| `requirements.txt` with `fastapi` | `fastapi.md` |
|
|
269
|
+
| `requirements.txt` with `celery`, `dramatiq`, `rq`, or `arq` | `queues.md` |
|
|
270
|
+
| `requirements.txt` with `openai`, `anthropic`, `langchain` | `llm.md` |
|
|
271
|
+
| `pubspec.yaml` | `flutter.md` |
|
|
248
272
|
|
|
249
273
|
### Commands That Inject Stack Rules
|
|
250
274
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@tekyzinc/gsd-t",
|
|
3
|
-
"version": "2.
|
|
3
|
+
"version": "2.51.10",
|
|
4
4
|
"description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
|
|
5
5
|
"author": "Tekyz, Inc.",
|
|
6
6
|
"license": "MIT",
|
|
@@ -306,6 +306,26 @@ Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract:
|
|
|
306
306
|
|
|
307
307
|
**QA failure OR shallow tests found blocks phase completion.** Lead cannot proceed until QA reports PASS with zero shallow tests, or user explicitly overrides.
|
|
308
308
|
|
|
309
|
+
## Red Team — Adversarial QA (Mandatory)
|
|
310
|
+
|
|
311
|
+
After QA passes, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
|
|
312
|
+
|
|
313
|
+
**Red Team method by command:**
|
|
314
|
+
- `execute` → spawns Red Team after all domain tasks pass (Step 5.5)
|
|
315
|
+
- `integrate` → spawns Red Team after integration tests pass (Step 7.5)
|
|
316
|
+
- `quick` → spawns Red Team after Test & Verify passes (Step 5.5)
|
|
317
|
+
- `debug` → spawns Red Team after fix verification passes (Step 5.3)
|
|
318
|
+
- `wave` → each phase agent handles Red Team per the rules above
|
|
319
|
+
|
|
320
|
+
**Key Red Team rules:**
|
|
321
|
+
- **Inverted incentive**: More bugs found = more value. Zero bugs requires exhaustive proof of thoroughness.
|
|
322
|
+
- **False positive penalty**: Reporting non-bugs destroys credibility. Every bug must be reproduced with proof.
|
|
323
|
+
- **Exhaustive categories**: Contract violations, boundary inputs, state transitions, error paths, missing flows, regression, E2E functional gaps — all must be attempted.
|
|
324
|
+
- **VERDICT**: `FAIL` (bugs found — blocks completion) or `GRUDGING PASS` (exhaustive search, nothing found).
|
|
325
|
+
- **Report**: Written to `.gsd-t/red-team-report.md`; bugs also appended to `.gsd-t/qa-issues.md`.
|
|
326
|
+
|
|
327
|
+
**Red Team FAIL blocks phase completion.** CRITICAL/HIGH bugs must be fixed (up to 2 fix cycles). If bugs persist, they are logged to `.gsd-t/deferred-items.md` and presented to the user.
|
|
328
|
+
|
|
309
329
|
## Model Display (MANDATORY)
|
|
310
330
|
|
|
311
331
|
**Before every subagent spawn, display the model being used to the user:**
|