@tekyzinc/gsd-t 2.45.11 → 2.50.10
- package/CHANGELOG.md +23 -0
- package/README.md +26 -5
- package/bin/debug-ledger.js +193 -0
- package/bin/gsd-t.js +259 -1
- package/commands/gsd-t-complete-milestone.md +2 -1
- package/commands/gsd-t-debug.md +48 -2
- package/commands/gsd-t-doc-ripple.md +148 -0
- package/commands/gsd-t-execute.md +102 -5
- package/commands/gsd-t-help.md +25 -2
- package/commands/gsd-t-integrate.md +41 -1
- package/commands/gsd-t-qa.md +26 -5
- package/commands/gsd-t-quick.md +39 -1
- package/commands/gsd-t-test-sync.md +26 -1
- package/commands/gsd-t-verify.md +8 -2
- package/commands/gsd-t-wave.md +57 -0
- package/docs/GSD-T-README.md +84 -1
- package/docs/architecture.md +9 -1
- package/docs/framework-comparison-scorecard.md +160 -0
- package/docs/requirements.md +33 -0
- package/examples/rules/desktop.ini +2 -0
- package/package.json +2 -2
- package/templates/CLAUDE-global.md +82 -4
- package/templates/stacks/_security.md +243 -0
- package/templates/stacks/desktop.ini +2 -0
- package/templates/stacks/docker.md +202 -0
- package/templates/stacks/firebase.md +166 -0
- package/templates/stacks/flutter.md +205 -0
- package/templates/stacks/github-actions.md +201 -0
- package/templates/stacks/graphql.md +216 -0
- package/templates/stacks/neo4j.md +218 -0
- package/templates/stacks/nextjs.md +184 -0
- package/templates/stacks/node-api.md +196 -0
- package/templates/stacks/playwright.md +528 -0
- package/templates/stacks/postgresql.md +225 -0
- package/templates/stacks/python.md +243 -0
- package/templates/stacks/react-native.md +216 -0
- package/templates/stacks/react.md +293 -0
- package/templates/stacks/redux.md +193 -0
- package/templates/stacks/rest-api.md +202 -0
- package/templates/stacks/supabase.md +188 -0
- package/templates/stacks/tailwind.md +169 -0
- package/templates/stacks/typescript.md +176 -0
- package/templates/stacks/vite.md +176 -0
- package/templates/stacks/vue.md +189 -0
- package/templates/stacks/zustand.md +203 -0
package/commands/gsd-t-quick.md
CHANGED
|
@@ -12,12 +12,29 @@ To give this task a fresh context window and prevent compaction during consecuti
|
|
|
12
12
|
Before spawning — run via Bash:
|
|
13
13
|
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
14
14
|
|
|
15
|
+
**Stack Rules Detection (before spawning subagent):**
|
|
16
|
+
Run via Bash to detect project stack and collect matching rules:
|
|
17
|
+
`GSD_T_DIR=$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t; STACKS_DIR="$GSD_T_DIR/templates/stacks"; STACK_RULES=""; if [ -d "$STACKS_DIR" ]; then for f in "$STACKS_DIR"/_*.md; do [ -f "$f" ] && STACK_RULES="${STACK_RULES}$(cat "$f")"$'\n\n'; done; if [ -f "package.json" ]; then grep -q '"react-native"' package.json 2>/dev/null && [ -f "$STACKS_DIR/react-native.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/react-native.md")"$'\n\n'; grep -q '"react"' package.json 2>/dev/null && ! grep -q '"react-native"' package.json 2>/dev/null && [ -f "$STACKS_DIR/react.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/react.md")"$'\n\n'; grep -q '"next"' package.json 2>/dev/null && [ -f "$STACKS_DIR/nextjs.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/nextjs.md")"$'\n\n'; grep -q '"vue"' package.json 2>/dev/null && [ -f "$STACKS_DIR/vue.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/vue.md")"$'\n\n'; (grep -q '"typescript"' package.json 2>/dev/null || [ -f "tsconfig.json" ]) && [ -f "$STACKS_DIR/typescript.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/typescript.md")"$'\n\n'; grep -qE '"(express|fastify|hono|koa)"' package.json 2>/dev/null && [ -f "$STACKS_DIR/node-api.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/node-api.md")"$'\n\n'; grep -q '"tailwindcss"' package.json 2>/dev/null && [ -f "$STACKS_DIR/tailwind.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/tailwind.md")"$'\n\n'; grep -q '"vite"' package.json 2>/dev/null && [ -f "$STACKS_DIR/vite.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/vite.md")"$'\n\n'; grep -q '"@supabase/supabase-js"' package.json 2>/dev/null && [ -f "$STACKS_DIR/supabase.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/supabase.md")"$'\n\n'; grep -q '"firebase"' package.json 2>/dev/null && [ -f "$STACKS_DIR/firebase.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/firebase.md")"$'\n\n'; grep -qE '"(graphql|@apollo/client|urql)"' package.json 2>/dev/null && [ -f "$STACKS_DIR/graphql.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/graphql.md")"$'\n\n'; grep -q '"zustand"' package.json 2>/dev/null && [ -f "$STACKS_DIR/zustand.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/zustand.md")"$'\n\n'; grep -q '"@reduxjs/toolkit"' package.json 2>/dev/null && [ -f "$STACKS_DIR/redux.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/redux.md")"$'\n\n'; grep -q '"neo4j-driver"' package.json 2>/dev/null && [ -f "$STACKS_DIR/neo4j.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/neo4j.md")"$'\n\n'; grep -qE '"(pg|prisma|drizzle-orm|knex)"' package.json 2>/dev/null && [ -f "$STACKS_DIR/postgresql.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/postgresql.md")"$'\n\n'; grep -qE '"(express|fastify|hono|koa)"' package.json 2>/dev/null && [ -f "$STACKS_DIR/rest-api.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/rest-api.md")"$'\n\n'; fi; ([ -f "requirements.txt" ] || [ -f "pyproject.toml" ] || [ -f "Pipfile" ]) && [ -f "$STACKS_DIR/python.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/python.md")"$'\n\n'; ([ -f "requirements.txt" ] && grep -q "psycopg" requirements.txt 2>/dev/null || [ -f "pyproject.toml" ] && grep -q "psycopg" pyproject.toml 2>/dev/null) && [ -f "$STACKS_DIR/postgresql.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/postgresql.md")"$'\n\n'; ([ -f "requirements.txt" ] && grep -q "neo4j" requirements.txt 2>/dev/null) && [ -f "$STACKS_DIR/neo4j.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/neo4j.md")"$'\n\n'; [ -f "pubspec.yaml" ] && [ -f "$STACKS_DIR/flutter.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/flutter.md")"$'\n\n'; [ -f "Dockerfile" ] && [ -f "$STACKS_DIR/docker.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/docker.md")"$'\n\n'; [ -d ".github/workflows" ] && [ -f "$STACKS_DIR/github-actions.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/github-actions.md")"$'\n\n'; ([ -f "playwright.config.ts" ] || [ -f "playwright.config.js" ]) && [ -f "$STACKS_DIR/playwright.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/playwright.md")"$'\n\n'; [ -f "go.mod" ] && [ -f "$STACKS_DIR/go.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/go.md")"$'\n\n'; [ -f "Cargo.toml" ] && [ -f "$STACKS_DIR/rust.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/rust.md")"$'\n\n'; fi`
|
|
18
|
+
|
|
19
|
+
If STACK_RULES is non-empty, append to the subagent prompt:
|
|
20
|
+
```
|
|
21
|
+
## Stack Rules (MANDATORY — violations fail this task)
|
|
22
|
+
|
|
23
|
+
{STACK_RULES}
|
|
24
|
+
|
|
25
|
+
These standards have the same enforcement weight as contract compliance.
|
|
26
|
+
Violations are task failures, not warnings.
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
If STACK_RULES is empty (no templates/stacks/ dir or no matches), skip silently.
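Reviewer note (illustrative, not part of the package): the detection one-liner above is dense, so here is a minimal JavaScript sketch of the same idea for a handful of its `package.json` rules. The names `RULES` and `detectStackFiles` are hypothetical, only a subset of the stacks is covered, and the substring checks mirror the `grep -q` calls rather than parsing the JSON.

```javascript
// Illustrative subset of the package.json rules from the Bash one-liner.
// Each rule names a template file and a predicate over the raw package.json
// text plus the set of files found in the project root.
const RULES = [
  { file: "react-native.md", when: (pkg) => pkg.includes('"react-native"') },
  {
    file: "react.md",
    // React rules apply only when React Native is absent, as in the one-liner.
    when: (pkg) => pkg.includes('"react"') && !pkg.includes('"react-native"'),
  },
  { file: "nextjs.md", when: (pkg) => pkg.includes('"next"') },
  {
    file: "typescript.md",
    when: (pkg, files) => pkg.includes('"typescript"') || files.has("tsconfig.json"),
  },
  { file: "node-api.md", when: (pkg) => /"(express|fastify|hono|koa)"/.test(pkg) },
];

// Hypothetical helper: returns the template file names whose rules match.
function detectStackFiles(packageJsonText, rootFiles) {
  const files = new Set(rootFiles);
  // The one-liner consults these rules only when package.json exists.
  if (!files.has("package.json")) return [];
  return RULES.filter((rule) => rule.when(packageJsonText, files)).map((rule) => rule.file);
}

console.log(
  detectStackFiles('{"dependencies":{"react":"^18","express":"^4"}}', [
    "package.json",
    "tsconfig.json",
  ])
);
```

Because the checks are plain substring matches, any quoted occurrence of a name (for example in `scripts`) would also trigger a rule; the shell version shares that property.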
|
|
30
|
+
|
|
15
31
|
Spawn a fresh subagent using the Task tool:
|
|
16
32
|
```
|
|
17
33
|
subagent_type: general-purpose
|
|
18
34
|
prompt: "You are running gsd-t-quick for this request: {$ARGUMENTS}
|
|
19
35
|
Working directory: {current project root}
|
|
20
|
-
Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-quick starting at Step 1.
|
|
36
|
+
Read CLAUDE.md and .gsd-t/progress.md for project context, then execute gsd-t-quick starting at Step 1.
|
|
37
|
+
{STACK_RULES block — if non-empty, append the ## Stack Rules section defined above; omit if empty}"
|
|
21
38
|
```
|
|
22
39
|
|
|
23
40
|
After subagent returns — run via Bash:
|
|
@@ -133,6 +150,7 @@ Quick does not mean skip testing. Before committing:
|
|
|
133
150
|
- Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
|
|
134
151
|
- Cover all modes/flags affected by this change
|
|
135
152
|
- "No feature code without test code" applies to quick tasks too
|
|
153
|
+
- **Functional tests only** — every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated). Tests that only check element existence (`isVisible`, `toBeEnabled`) are shallow/layout tests and are not acceptable. If a test would pass on an empty HTML page with the right IDs, rewrite it.
|
|
136
154
|
2. **Run ALL configured test suites** — not just affected tests, not just one suite:
|
|
137
155
|
a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
|
|
138
156
|
b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
|
|
@@ -145,6 +163,26 @@ Quick does not mean skip testing. Before committing:
|
|
|
145
163
|
- If a contract exists for the interface touched, does the code still match?
|
|
146
164
|
4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
|
|
147
165
|
|
|
166
|
+
## Step 6: Doc-Ripple (Automated)
|
|
167
|
+
|
|
168
|
+
After all work is committed but before reporting completion:
|
|
169
|
+
|
|
170
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
171
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
172
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
173
|
+
|
|
174
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
175
|
+
|
|
176
|
+
Task subagent (general-purpose, model: sonnet):
|
|
177
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
178
|
+
Git diff context: {files changed list}
|
|
179
|
+
Command that triggered: quick
|
|
180
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
181
|
+
Update all affected documents.
|
|
182
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
183
|
+
|
|
184
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
185
|
+
|
|
148
186
|
$ARGUMENTS
|
|
149
187
|
|
|
150
188
|
## Auto-Clear
|
package/commands/gsd-t-test-sync.md
CHANGED
|
@@ -151,6 +151,27 @@ If Playwright is configured (`playwright.config.*` or Playwright in dependencies
|
|
|
151
151
|
|
|
152
152
|
**This is NOT optional.** Every new code path that a user can reach must have a Playwright spec. "We'll add tests later" is never acceptable.
|
|
153
153
|
|
|
154
|
+
**FUNCTIONAL TESTS — NOT LAYOUT TESTS (MANDATORY):**
|
|
155
|
+
E2E specs that only check element existence (`isVisible`, `toBeAttached`, `toBeEnabled`) are
|
|
156
|
+
layout tests. Layout tests pass even when every feature is broken — they are worthless for QA.
|
|
157
|
+
|
|
158
|
+
Every Playwright assertion MUST verify **functional behavior** — that an action produced the
|
|
159
|
+
correct outcome:
|
|
160
|
+
- **Tab/navigation**: Click → assert the NEW content loaded (unique text, data, or elements
|
|
161
|
+
that only appear on the destination view). Never just assert the tab element exists.
|
|
162
|
+
- **Forms**: Fill → submit → assert success feedback AND data persisted (API call observed
|
|
163
|
+
via `page.waitForResponse`, or list/table updated with new entry).
|
|
164
|
+
- **Interactive widgets** (terminals, editors, code panels): Open → interact → assert the
|
|
165
|
+
widget responded (keystroke produced output, content was saved, command executed).
|
|
166
|
+
- **Connections** (WebSocket, SSE, polling): Assert status transitions ("Connecting" →
|
|
167
|
+
"Connected") and verify data flows through the connection.
|
|
168
|
+
- **State toggles** (dark mode, expand/collapse, enable/disable): Assert the EFFECT of the
|
|
169
|
+
toggle, not just that the toggle control exists.
|
|
170
|
+
- **Error handling**: Trigger error → assert error content → assert recovery path works.
|
|
171
|
+
|
|
172
|
+
**Rule: If a test would pass on an empty HTML page with the correct element IDs and no
|
|
173
|
+
JavaScript, it is not a functional test. Rewrite it.**
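Reviewer note (illustrative, not part of the package): one rough way to spot the layout-only specs described above is to look at which assertion names a test body uses. The sketch below is a text heuristic only; `assertionNames` and `isLayoutOnlyTest` are hypothetical helpers, and the existence-only list is copied from the three names the rule mentions.

```javascript
// Checks the rule above calls out as existence-only.
const EXISTENCE_ONLY = new Set(["isVisible", "toBeAttached", "toBeEnabled"]);

// Hypothetical helper: collect names that look like assertions in a test body,
// i.e. `toXxx(` matchers or `isVisible(` calls.
function assertionNames(testBody) {
  const matches = testBody.match(/\b(to[A-Z]\w*|isVisible)(?=\s*\()/g);
  return matches || [];
}

// A body counts as layout-only when every assertion found is existence-only.
// Treating a body with no assertions as layout-only is an assumption of this sketch.
function isLayoutOnlyTest(testBody) {
  return assertionNames(testBody).every((name) => EXISTENCE_ONLY.has(name));
}

const shallow = "await expect(page.locator('#save')).toBeEnabled();";
const functional =
  "await page.click('#save'); await expect(page.locator('#list li')).toHaveCount(3);";

console.log(isLayoutOnlyTest(shallow), isLayoutOnlyTest(functional));
```

A regex pass like this cannot tell whether an assertion actually follows an action, so it is best read as a first filter before the manual audit, not a replacement for it.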
|
|
174
|
+
|
|
154
175
|
### D) Capture Results
|
|
155
176
|
For all test types:
|
|
156
177
|
- PASS: Test still valid
|
|
@@ -275,7 +296,11 @@ After each task completes:
|
|
|
275
296
|
2. **If new code paths have zero test coverage: write tests NOW** — do not defer
|
|
276
297
|
3. Run ALL affected unit/integration tests
|
|
277
298
|
4. Run ALL affected Playwright E2E tests
|
|
278
|
-
5. If failures: fix immediately (up to 2 attempts) before continuing
|
|
299
|
+
5. If failures: fix immediately (up to 2 attempts) before continuing. If both attempts fail:
|
|
300
|
+
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'test-sync-failure',error:'2 in-context fix attempts exhausted',hypothesis:'see test-coverage.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
301
|
+
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
302
|
+
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
303
|
+
4. Exit code 0 → tests pass, continue; 1/4 → log to `.gsd-t/deferred-items.md`, report failure; 3 → report error
|
|
279
304
|
6. If E2E specs are missing for new features/modes/flows: **create them NOW**, not later
|
|
280
305
|
7. If E2E specs need updating for changed behavior: update them before continuing
|
|
281
306
|
8. **No task is complete until its tests exist and pass** — do not move to the next task with test gaps
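Reviewer note (illustrative, not part of the package): the escalation added to step 5 above combines a ledger entry with an exit-code branch. The sketch below restates both in JavaScript. The field names and values are copied from the inline `node -e` call; `buildExhaustedEntry` and `nextActionForExit` are hypothetical names, and the returned action strings are placeholders rather than anything the CLI emits.

```javascript
// Build a ledger entry with the same 11 fields as the inline `node -e` call.
function buildExhaustedEntry(testLabel, reportFile, now = new Date()) {
  return {
    iteration: 1,
    timestamp: now.toISOString(),
    test: testLabel,
    error: "2 in-context fix attempts exhausted",
    hypothesis: `see ${reportFile}`,
    fix: "n/a",
    fixFiles: [],
    result: "STILL_FAILS",
    learning: "delegating to headless debug-loop",
    model: "sonnet",
    duration: 0,
  };
}

// Map a debug-loop exit code to the follow-up named in step 5.4.
function nextActionForExit(code) {
  if (code === 0) return "continue"; // tests pass
  if (code === 1 || code === 4) return "defer-and-report"; // log to .gsd-t/deferred-items.md
  if (code === 3) return "report-error";
  return "unhandled"; // codes the step text does not cover (assumption)
}

const entry = buildExhaustedEntry("test-sync-failure", "test-coverage.md");
console.log(Object.keys(entry).length, nextActionForExit(4));
```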
|
package/commands/gsd-t-verify.md
CHANGED
|
@@ -104,7 +104,8 @@ Work through each dimension sequentially. For each:
|
|
|
104
104
|
- Confirm specs cover: happy path, error states, edge cases, all modes/flags
|
|
105
105
|
- If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
|
|
106
106
|
- **Missing E2E coverage on new functionality = verification FAIL**
|
|
107
|
-
5.
|
|
107
|
+
5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
|
|
108
|
+
6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
|
|
108
109
|
|
|
109
110
|
### Team Mode (when agent teams are enabled)
|
|
110
111
|
```
|
|
@@ -308,7 +309,12 @@ Update `.gsd-t/progress.md`:
|
|
|
308
309
|
|
|
309
310
|
**All Levels**:
|
|
310
311
|
- VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
|
|
311
|
-
- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts
|
|
312
|
+
- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts:
|
|
313
|
+
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
314
|
+
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
315
|
+
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
316
|
+
4. Exit code 0 → re-run verification; 1/4 → log to `.gsd-t/deferred-items.md`, STOP and report to user; 3 → report error
|
|
317
|
+
**Level 1–2**: Return to execute phase for remediation tasks.
|
|
312
318
|
|
|
313
319
|
## Document Ripple
|
|
314
320
|
|
package/commands/gsd-t-wave.md
CHANGED
|
@@ -52,6 +52,22 @@ If `.gsd-t/graph/meta.json` exists, the code graph is available for all phases.
|
|
|
52
52
|
|
|
53
53
|
For each phase, spawn the agent like this:
|
|
54
54
|
|
|
55
|
+
**Stack Rules Detection (before spawning subagent):**
|
|
56
|
+
Run via Bash to detect project stack and collect matching rules:
|
|
57
|
+
`GSD_T_DIR=$(npm root -g 2>/dev/null)/@tekyzinc/gsd-t; STACKS_DIR="$GSD_T_DIR/templates/stacks"; STACK_RULES=""; if [ -d "$STACKS_DIR" ]; then for f in "$STACKS_DIR"/_*.md; do [ -f "$f" ] && STACK_RULES="${STACK_RULES}$(cat "$f")"$'\n\n'; done; if [ -f "package.json" ]; then grep -q '"react"' package.json 2>/dev/null && [ -f "$STACKS_DIR/react.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/react.md")"$'\n\n'; (grep -q '"typescript"' package.json 2>/dev/null || [ -f "tsconfig.json" ]) && [ -f "$STACKS_DIR/typescript.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/typescript.md")"$'\n\n'; grep -qE '"(express|fastify|hono|koa)"' package.json 2>/dev/null && [ -f "$STACKS_DIR/node-api.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/node-api.md")"$'\n\n'; fi; [ -f "requirements.txt" ] || [ -f "pyproject.toml" ] && [ -f "$STACKS_DIR/python.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/python.md")"$'\n\n'; [ -f "go.mod" ] && [ -f "$STACKS_DIR/go.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/go.md")"$'\n\n'; [ -f "Cargo.toml" ] && [ -f "$STACKS_DIR/rust.md" ] && STACK_RULES="${STACK_RULES}$(cat "$STACKS_DIR/rust.md")"$'\n\n'; fi`
|
|
58
|
+
|
|
59
|
+
If STACK_RULES is non-empty, append to the subagent prompt:
|
|
60
|
+
```
|
|
61
|
+
## Stack Rules (MANDATORY — violations fail this task)
|
|
62
|
+
|
|
63
|
+
{STACK_RULES}
|
|
64
|
+
|
|
65
|
+
These standards have the same enforcement weight as contract compliance.
|
|
66
|
+
Violations are task failures, not warnings.
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
If STACK_RULES is empty (no templates/stacks/ dir or no matches), skip silently.
|
|
70
|
+
|
|
55
71
|
**OBSERVABILITY LOGGING (MANDATORY) — repeat for every phase spawn:**
|
|
56
72
|
Before spawning — run via Bash:
|
|
57
73
|
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
@@ -84,6 +100,17 @@ Compute context utilization — run via Bash:
|
|
|
84
100
|
Alert on context thresholds (display to user inline):
|
|
85
101
|
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
86
102
|
- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
103
|
+
|
|
104
|
+
**Orchestrator Context Self-Check (MANDATORY):**
|
|
105
|
+
After EVERY phase agent returns, check the wave orchestrator's own context:
|
|
106
|
+
- **If CTX_PCT >= 70:**
|
|
107
|
+
1. Save checkpoint to `.gsd-t/progress.md` — record which phases are complete, which remain
|
|
108
|
+
2. Output: `⚠️ Wave orchestrator context at {CTX_PCT}% — approaching limit. Progress saved. Run /clear then /user:gsd-t-wave to continue from the next phase.`
|
|
109
|
+
3. **STOP the wave loop.** Do NOT spawn the next phase agent. The next session resumes from saved state.
|
|
110
|
+
- **If CTX_PCT < 70:** Continue to next phase.
|
|
111
|
+
|
|
112
|
+
This prevents the wave orchestrator from running out of context mid-wave.
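Reviewer note (illustrative, not part of the package): the thresholds in this hunk (70 for the warning and the wave stop, 85 for the critical alert) can be summarized in a few lines. This is a sketch under stated assumptions: the diff does not show how `CTX_PCT` is computed, so the rounding below is a guess, and `contextPercent` and `orchestratorDecision` are hypothetical names.

```javascript
// Default window size mirrors TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}.
function contextPercent(tokensUsed, tokensMax = 200000) {
  // Rounding down is an assumption; the computation itself is outside this hunk.
  return Math.floor((tokensUsed * 100) / tokensMax);
}

// Combine the alert thresholds with the orchestrator self-check.
function orchestratorDecision(ctxPct) {
  let alert = null;
  if (ctxPct >= 85) alert = "critical";
  else if (ctxPct >= 70) alert = "warning";
  // At 70% or above the wave loop saves progress and stops before the next phase.
  return { alert, stopWave: ctxPct >= 70 };
}

console.log(orchestratorDecision(contextPercent(140000)));
```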
|
|
113
|
+
|
|
87
114
|
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
88
115
|
`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
89
116
|
|
|
@@ -149,6 +176,26 @@ Spawn agent → `commands/gsd-t-verify.md`
|
|
|
149
176
|
📋 Phase 8 (VERIFY+COMPLETE): {N} gates passed | Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings
|
|
150
177
|
```
|
|
151
178
|
|
|
179
|
+
#### 9. DOC-RIPPLE (Automated — after verify+complete)
|
|
180
|
+
|
|
181
|
+
After the final phase completes but before wave reports done:
|
|
182
|
+
|
|
183
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
184
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed
|
|
185
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
186
|
+
|
|
187
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
188
|
+
|
|
189
|
+
Task subagent (general-purpose, model: sonnet):
|
|
190
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
191
|
+
Git diff context: {files changed list}
|
|
192
|
+
Command that triggered: wave
|
|
193
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
194
|
+
Update all affected documents.
|
|
195
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
196
|
+
|
|
197
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
198
|
+
|
|
152
199
|
### Between Each Phase
|
|
153
200
|
|
|
154
201
|
After each agent completes, run this spot-check before proceeding:
|
|
@@ -234,6 +281,11 @@ If the user interrupts or a phase agent fails:
|
|
|
234
281
|
- Report blocking issues to user
|
|
235
282
|
|
|
236
283
|
**Level 3**: Spawn a remediation agent to fix blocking issues, then re-spawn impact agent. Max 2 attempts.
|
|
284
|
+
If both attempts fail:
|
|
285
|
+
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'impact-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see impact-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
286
|
+
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
287
|
+
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
288
|
+
4. Exit code 0 → continue; 1/4 → log to `.gsd-t/deferred-items.md`, report to user; 3 → report error
|
|
237
289
|
**Level 1–2**: Ask user for direction.
|
|
238
290
|
|
|
239
291
|
### If tests fail during execute:
|
|
@@ -245,6 +297,11 @@ If the user interrupts or a phase agent fails:
|
|
|
245
297
|
- Read verify report for failure details
|
|
246
298
|
|
|
247
299
|
**Level 3**: Spawn remediation agent, then re-spawn verify agent. Max 2 attempts.
|
|
300
|
+
If both attempts fail:
|
|
301
|
+
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
302
|
+
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
303
|
+
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
304
|
+
4. Exit code 0 → re-spawn verify agent; 1/4 → log to `.gsd-t/deferred-items.md`, report to user; 3 → report error
|
|
248
305
|
**Level 1–2**: Ask user for direction.
|
|
249
306
|
|
|
250
307
|
## Why Agent-Per-Phase
|
package/docs/GSD-T-README.md
CHANGED
|
@@ -100,9 +100,10 @@ GSD-T reads all state files and tells you exactly where you left off.
|
|
|
100
100
|
| `/user:gsd-t-discuss` | Multi-perspective design exploration | In wave |
|
|
101
101
|
| `/user:gsd-t-plan` | Create atomic task lists per domain (tasks auto-split to fit one context window) | In wave |
|
|
102
102
|
| `/user:gsd-t-impact` | Analyze downstream effects | In wave |
|
|
103
|
-
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
|
|
103
|
+
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning, stack rules injection | In wave |
|
|
104
104
|
| `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
|
|
105
105
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
|
|
106
|
+
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
|
|
106
107
|
| `/user:gsd-t-integrate` | Wire domains together | In wave |
|
|
107
108
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward verification → auto-invokes complete-milestone | In wave |
|
|
108
109
|
| `/user:gsd-t-complete-milestone` | Archive + git tag (auto-invoked by verify, also standalone) | In wave |
|
|
@@ -223,6 +224,88 @@ your-project/
|
|
|
223
224
|
|
|
224
225
|
---
|
|
225
226
|
|
|
227
|
+
## Stack Rules Engine
|
|
228
|
+
|
|
229
|
+
GSD-T auto-detects your project's tech stack and injects mandatory best-practice rules into subagent prompts at execute-time. This ensures stack conventions are enforced at the same weight as contract compliance — violations are task failures, not warnings.
|
|
230
|
+
|
|
231
|
+
### How It Works
|
|
232
|
+
|
|
233
|
+
1. At subagent spawn time, GSD-T reads project manifest files to detect the active stack(s).
|
|
234
|
+
2. Universal rules (`templates/stacks/_security.md`) are **always** injected.
|
|
235
|
+
3. Stack-specific rules are injected when the corresponding stack is detected.
|
|
236
|
+
4. Rules are appended to the subagent prompt as a `## Stack Rules (MANDATORY)` section.
|
|
237
|
+
|
|
238
|
+
### Stack Detection
|
|
239
|
+
|
|
240
|
+
| Project File | Detected Stack |
|
|
241
|
+
|---|---|
|
|
242
|
+
| `package.json` with `"react"` | React |
|
|
243
|
+
| `package.json` with `"typescript"` or `tsconfig.json` | TypeScript |
|
|
244
|
+
| `package.json` with `"express"`, `"fastify"`, `"hono"`, or `"koa"` | Node API |
|
|
245
|
+
| `requirements.txt` or `pyproject.toml` | Python |
|
|
246
|
+
| `go.mod` | Go |
|
|
247
|
+
| `Cargo.toml` | Rust |
|
|
248
|
+
|
|
249
|
+
### Commands That Inject Stack Rules
|
|
250
|
+
|
|
251
|
+
`gsd-t-execute`, `gsd-t-quick`, `gsd-t-integrate`, `gsd-t-wave`, `gsd-t-debug`
|
|
252
|
+
|
|
253
|
+
### Extending
|
|
254
|
+
|
|
255
|
+
Drop a `.md` file into `templates/stacks/` to add a new stack. Files prefixed with `_` are universal (always injected). Files without a prefix are stack-specific (injected only when detected). If the `stacks/` directory is missing, detection skips silently — no error.
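Reviewer note (illustrative, not part of the package): the prefix convention described above can be sketched as a filter over file names. `selectStackTemplates` is a hypothetical helper, and it assumes the list of detected stacks is already known; in the shipped commands, detection is driven by the Bash one-liner, so a new stack-specific file would also appear to need a matching detection rule there.

```javascript
// Hypothetical helper: choose which template files to inject.
// - files starting with "_" are universal and always selected
// - other .md files are selected only when their stack was detected
function selectStackTemplates(templateFiles, detectedStacks) {
  const detected = new Set(detectedStacks);
  return templateFiles.filter((name) => {
    if (!name.endsWith(".md")) return false; // ignore non-template files
    if (name.startsWith("_")) return true;
    return detected.has(name.slice(0, -".md".length));
  });
}

console.log(
  selectStackTemplates(["_security.md", "react.md", "vue.md", "desktop.ini"], ["react"])
);
```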
|
|
256
|
+
|
|
257
|
+
---
|
|
258
|
+
|
|
259
|
+
## Headless Mode
|
|
260
|
+
|
|
261
|
+
Run GSD-T non-interactively in CI/CD pipelines or automated workflows.
|
|
262
|
+
|
|
263
|
+
### headless exec
|
|
264
|
+
|
|
265
|
+
```bash
|
|
266
|
+
gsd-t headless verify --json --timeout=1200 # Run verify non-interactively
|
|
267
|
+
gsd-t headless execute --json # Execute tasks without interactive prompts
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### headless query
|
|
271
|
+
|
|
272
|
+
```bash
|
|
273
|
+
gsd-t headless query status # Project state — no LLM, <100ms
|
|
274
|
+
gsd-t headless query domains # Domain list with status
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
### headless debug-loop
|
|
278
|
+
|
|
279
|
+
Compaction-proof automated test-fix-retest cycles. Each iteration runs as a separate `claude -p` session with fresh context. A cumulative debug ledger (`.gsd-t/debug-state.jsonl`) preserves all hypothesis/fix/learning history across sessions. An anti-repetition preamble is injected into each session to prevent retrying failed approaches.
|
|
280
|
+
|
|
281
|
+
```bash
|
|
282
|
+
gsd-t headless --debug-loop [--max-iterations=N] [--test-cmd=CMD] [--fix-scope=PATTERN] [--json] [--log]
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
**Flags:**
|
|
286
|
+
|
|
287
|
+
| Flag | Default | Description |
|
|
288
|
+
|--------------------|---------|-------------|
|
|
289
|
+
| `--max-iterations` | 20 | Hard ceiling on iterations |
|
|
290
|
+
| `--test-cmd` | (auto) | Override test command (auto-detected from project) |
|
|
291
|
+
| `--fix-scope` | (all) | Limit fix scope to specific files or patterns |
|
|
292
|
+
| `--json` | false | Structured JSON output after each iteration |
|
|
293
|
+
| `--log` | false | Write per-iteration logs to `.gsd-t/` |
|
|
294
|
+
|
|
295
|
+
**Escalation tiers:**
|
|
296
|
+
|
|
297
|
+
| Iterations | Model | Behavior |
|
|
298
|
+
|------------|--------|----------|
|
|
299
|
+
| 1–5 | sonnet | Standard debug — one fix per session |
|
|
300
|
+
| 6–15 | opus | Deeper reasoning — reads full ledger, may attempt multi-file fixes |
|
|
301
|
+
| 16–20 | STOP | Write full diagnostic summary, present to user, exit code 4 |
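Reviewer note (illustrative, not part of the package): the tier table above maps an iteration number to a model. The sketch below is written from the table alone, not from the source of `bin/gsd-t.js`; `escalationModelFor` is a hypothetical name, and returning `null` for out-of-range input is an assumption.

```javascript
// Map a 1-based iteration number to the model tier from the table.
function escalationModelFor(iteration) {
  if (iteration >= 1 && iteration <= 5) return "sonnet";
  if (iteration >= 6 && iteration <= 15) return "opus";
  // Iterations 16 to 20 are the STOP tier. Values outside 1 to 20 are
  // treated the same way here, which is an assumption of this sketch.
  return null;
}

for (const i of [1, 5, 6, 15, 16]) {
  console.log(i, escalationModelFor(i));
}
```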
|
|
302
|
+
|
|
303
|
+
**Exit codes:** `0` all tests pass · `1` max iterations reached · `2` compaction error · `3` process error · `4` needs human decision
|
|
304
|
+
|
|
305
|
+
**Auto-escalation from commands:** `gsd-t-execute`, `gsd-t-test-sync`, `gsd-t-verify`, `gsd-t-debug`, and `gsd-t-wave` delegate to `--debug-loop` automatically after 2 failed in-context fix attempts.
|
|
306
|
+
|
|
307
|
+
---
|
|
308
|
+
|
|
226
309
|
## Key Principles
|
|
227
310
|
|
|
228
311
|
1. **Contracts are the source of truth.** Code implements contracts, not the other way around. If code and contract disagree, fix one or the other — never leave them inconsistent.
|
package/docs/architecture.md
CHANGED
|
@@ -16,7 +16,7 @@ The framework has no runtime — it is consumed entirely by Claude Code's slash
|
|
|
16
16
|
- **Purpose**: Install, update, diagnose, and manage GSD-T across projects
|
|
17
17
|
- **Location**: `bin/gsd-t.js` (1,798 lines, 90+ functions, all ≤ 30 lines)
|
|
18
18
|
- **Dependencies**: Node.js built-ins only (fs, path, os, child_process, https, crypto)
|
|
19
|
-
- **Subcommands**: install, update, status, doctor, init, uninstall, update-all, register, changelog, graph (index/status/query), headless (exec/query)
|
|
19
|
+
- **Subcommands**: install, update, status, doctor, init, uninstall, update-all, register, changelog, graph (index/status/query), headless (exec/query/--debug-loop)
|
|
20
20
|
- **Organization**: Configuration → Guard section → Helpers → Heartbeat → Commands → Install/Update → Init → Status → Uninstall → Update-All → Doctor → Register → Update Check → Help → Main dispatch
|
|
21
21
|
- **All functions ≤ 30 lines** (M6 refactoring). Largest: `doRegister()` at 30 lines, `summarize()` at 30 lines.
|
|
22
22
|
|
|
@@ -78,6 +78,14 @@ The framework has no runtime — it is consumed entirely by Claude Code's slash
|
|
|
78
78
|
- **Exit codes**: 0=success, 1=verify-fail, 2=context-budget-exceeded, 3=error, 4=blocked-needs-human
|
|
79
79
|
- **CI/CD examples**: `docs/ci-examples/github-actions.yml` (GitHub Actions), `docs/ci-examples/gitlab-ci.yml` (GitLab CI)
|
|
80
80
|
|
|
81
|
+
### Compaction-Proof Debug Loop (M29 — complete)
|
|
82
|
+
- **bin/debug-ledger.js** (193 lines): JSONL-based debug persistence layer. 6 exported functions: `readLedger`, `appendEntry`, `compactLedger`, `generateAntiRepetitionPreamble`, `getLedgerStats`, `clearLedger`. Ledger file: `.gsd-t/debug-state.jsonl` (11-field schema per entry). Compaction triggers at 50KB — haiku session condenses history, last 5 raw entries preserved. Anti-repetition preamble lists all STILL_FAILS hypotheses, current narrowing direction, and tests still failing. Zero external deps.
|
|
83
|
+
- **doHeadlessDebugLoop(flags)**: External iteration manager in `bin/gsd-t.js`. Runs test-fix-retest as separate `claude -p` sessions — each session starts with zero accumulated context. Escalation tiers: sonnet (iterations 1-5), opus (6-15), STOP with full diagnostic output (16-20). `--max-iterations N` flag (default 20) enforced by external process.
|
|
84
|
+
- **parseDebugLoopFlags(args)**: Extracts `--max-iterations`, `--test-cmd`, `--fix-scope`, `--json`, `--log` from args. Defaults: maxIterations=20.
|
|
85
|
+
- **getEscalationModel(iteration)**: Returns "sonnet" for 1-5, "opus" for 6-15, null for 16-20 (STOP tier).
|
|
86
|
+
- **Command integration**: execute, wave, test-sync, verify, debug all delegate fix-retest loops to `gsd-t headless --debug-loop` after 2 in-context fix attempts.
|
|
87
|
+
- **Exit codes (debug-loop specific)**: 0=all tests pass (ledger cleared), 1=max iterations reached, 3=process error, 4=escalation stop (needs human)
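The escalation tiers and flag defaults listed above can be illustrated with a minimal sketch. This is not the source of `bin/gsd-t.js`; it only mirrors the documented behaviour. Returning `null` for out-of-range iterations, treating `--json` and `--log` as booleans, and reading the other flags as `--flag value` pairs are all assumptions.

```javascript
// Sketch only: mirrors the documented behaviour, not the real bin/gsd-t.js code.

// Documented tiers: sonnet for 1-5, opus for 6-15, null (STOP) for 16-20.
// Returning null for anything outside 1-20 is an assumption.
function getEscalationModel(iteration) {
  if (!Number.isInteger(iteration) || iteration < 1) return null;
  if (iteration <= 5) return "sonnet";
  if (iteration <= 15) return "opus";
  return null; // STOP tier: no model, a human is needed
}

// Flag names come from the docs. Treating --json/--log as booleans and the
// rest as "--flag value" pairs is an assumption made for this sketch.
function parseDebugLoopFlags(args) {
  const flags = { maxIterations: 20, testCmd: null, fixScope: null, json: false, log: false };
  for (let i = 0; i < args.length; i++) {
    switch (args[i]) {
      case "--json": flags.json = true; break;
      case "--log": flags.log = true; break;
      case "--test-cmd": flags.testCmd = args[++i] ?? null; break;
      case "--fix-scope": flags.fixScope = args[++i] ?? null; break;
      case "--max-iterations": {
        const n = Number.parseInt(args[++i], 10);
        // Keep the documented default of 20 when the value is missing or invalid.
        if (Number.isInteger(n) && n > 0) flags.maxIterations = n;
        break;
      }
    }
  }
  return flags;
}
```

For example, `parseDebugLoopFlags(["--max-iterations", "8", "--json"])` yields `maxIterations: 8` and `json: true`, with the remaining fields at their defaults.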
|
|
88
|
+
|
|
81
89
|
### Graph Engine (M20 — complete)
|
|
82
90
|
- **`bin/graph-store.js`** (147 lines): File-based graph storage in `.gsd-t/graph/`. 8 JSON files (index, calls, imports, contracts, requirements, tests, surfaces, meta). Read/write operations, MD5 file hashing for incremental indexing, staleness detection. Zero external deps. Note: no symlink protection (TD-099).
|
|
83
91
|
- **`bin/graph-parsers.js`** (327 lines): Language-specific entity parsers. JS/TS: function declarations, arrow functions, classes, methods, imports (ES/CJS), exports. Python: def/class/import. Regex-based (no Tree-sitter). Returns `{ entities, imports, calls }`.
|
|
@@ -0,0 +1,160 @@
|
|
|
1
|
+
# Framework Comparison Scorecard
|
|
2
|
+
|
|
3
|
+
**Purpose**: Unbiased comparison of development frameworks.
|
|
4
|
+
**Instructions**: Score each framework 1-5 per dimension. Use the rubric at the bottom. Equal weights — no dimension gaming.
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Frameworks Being Compared
|
|
9
|
+
|
|
10
|
+
| Slot | Framework | Version/Variant | Evaluator | Date |
|
|
11
|
+
|------|-----------|-----------------|-----------|------|
|
|
12
|
+
| F1 | | | | |
|
|
13
|
+
| F2 | | | | |
|
|
14
|
+
| F3 | | | | |
|
|
15
|
+
| F4 | | | | |
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## A. Onboarding & Adoption (Dimensions 1-3)
|
|
20
|
+
|
|
21
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
22
|
+
|---|-----------|------------------|----|----|----|----|
|
|
23
|
+
| 1 | Time to first productive output | How quickly can someone go from choosing the framework to shipping real work? | | | | |
|
|
24
|
+
| 2 | Team adoption friction | How willing is a typical team to adopt it after initial exposure? | | | | |
|
|
25
|
+
| 3 | Works without specific tooling | Can it be used with any IDE, editor, or AI assistant? | | | | |
|
|
26
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
27
|
+
|
|
28
|
+
## B. Execution & Delivery (Dimensions 4-7)
|
|
29
|
+
|
|
30
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
31
|
+
|---|-----------|------------------|----|----|----|----|
|
|
32
|
+
| 4 | Defect prevention | How effectively does the framework prevent bugs from reaching production? | | | | |
|
|
33
|
+
| 5 | Throughput | How many features can be shipped per unit time? | | | | |
|
|
34
|
+
| 6 | Rework prevention | How well does the framework prevent completed work from needing redo? | | | | |
|
|
35
|
+
| 7 | Idea-to-deploy cycle time | How quickly can a concept move from idea to production? | | | | |
|
|
36
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
37
|
+
|
|
38
|
+
## C. Sustainability & Maintenance (Dimensions 8-11)
|
|
39
|
+
|
|
40
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
41
|
+
|---|-----------|------------------|----|----|----|----|
|
|
42
|
+
| 8 | New member ramp-up | How quickly can a new team member contribute independently? | | | | |
|
|
43
|
+
| 9 | Context recovery | How easily can work resume after an interruption of days or weeks? | | | | |
|
|
44
|
+
| 10 | Tech debt management | How well does the framework track and control technical debt? | | | | |
|
|
45
|
+
| 11 | Documentation freshness | How well does the framework keep documentation accurate and current? | | | | |
|
|
46
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
47
|
+
|
|
48
|
+
## D. Flexibility & Universality (Dimensions 12-15)
|
|
49
|
+
|
|
50
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
51
|
+
|---|-----------|------------------|----|----|----|----|
|
|
52
|
+
| 12 | Minimum viable process | Can you use a small portion of it and still get value? | | | | |
|
|
53
|
+
| 13 | Project type coverage | Does it work across web, mobile, data, infra, and non-code projects? | | | | |
|
|
54
|
+
| 14 | Team size range | Is it effective from 1-person teams to 50-person teams? | | | | |
|
|
55
|
+
| 15 | Overhead proportionality | Does ceremony scale with project size rather than being fixed? | | | | |
|
|
56
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
57
|
+
|
|
58
|
+
## E. Automation & AI-Agent Capabilities (Dimensions 16-19)
|
|
59
|
+
|
|
60
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
61
|
+
|---|-----------|------------------|----|----|----|----|
|
|
62
|
+
| 16 | Agentic workflow support | Does the framework enable AI agents to execute work autonomously (task dispatch, parallel execution, adaptive replanning)? | | | | |
|
|
63
|
+
| 17 | QA automation | Does the framework automate test generation, execution, and gap detection — not just run existing tests? | | | | |
|
|
64
|
+
| 18 | QA coverage enforcement | Does the framework enforce minimum coverage and block progress when tests are missing or failing? | | | | |
|
|
65
|
+
| 19 | Contract enforcement | Does the framework define and validate interfaces between components automatically (API shapes, schemas, props)? | | | | |
|
|
66
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
67
|
+
|
|
68
|
+
## F. Observability & Decision Quality (Dimensions 20-22)
|
|
69
|
+
|
|
70
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
71
|
+
|---|-----------|------------------|----|----|----|----|
|
|
72
|
+
| 20 | Decision traceability | Can you find why a choice was made 6 months later? | | | | |
|
|
73
|
+
| 21 | Progress accuracy | Does reported progress match actual state? | | | | |
|
|
74
|
+
| 22 | Risk visibility | Do problems surface early or only at integration? | | | | |
|
|
75
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## Summary
|
|
80
|
+
|
|
81
|
+
| Category | F1 | F2 | F3 | F4 |
|
|
82
|
+
|---------------------------------------------|----|----|----|----|
|
|
83
|
+
| A. Onboarding & Adoption (1-3) | | | | |
|
|
84
|
+
| B. Execution & Delivery (4-7) | | | | |
|
|
85
|
+
| C. Sustainability & Maintenance (8-11) | | | | |
|
|
86
|
+
| D. Flexibility & Universality (12-15) | | | | |
|
|
87
|
+
| E. Automation & AI-Agent Capabilities (16-19) | | | | |
|
|
88
|
+
| F. Observability & Decision Quality (20-22) | | | | |
|
|
89
|
+
| **Overall Average (1-5)** | **—** | **—** | **—** | **—** |
|
|
90
|
+
| **Normalized Score (/100)** | **—** | **—** | **—** | **—** |
|
|
91
|
+
|
|
92
|
+
### Calculation
|
|
93
|
+
|
|
94
|
+
```
|
|
95
|
+
Category Average = sum of dimension scores in category / number of dimensions in category
|
|
96
|
+
Overall Average = sum of all 22 dimension scores / 22
|
|
97
|
+
Normalized /100 = Overall Average × 20
|
|
98
|
+
```
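As a worked illustration of the three formulas, here is a small sketch that computes them from a flat array of 22 scores. The function name `scoreFramework` and the input shape are assumptions made for this example; the dimension ranges come from the category headings.

```javascript
// Dimension ranges per category, taken from the scorecard headings.
const CATEGORIES = { A: [1, 3], B: [4, 7], C: [8, 11], D: [12, 15], E: [16, 19], F: [20, 22] };

// scores: array of 22 integers (1-5); index 0 holds dimension 1.
// The name and input shape are hypothetical, chosen for this sketch.
function scoreFramework(scores) {
  const valid = scores.length === 22 &&
    scores.every((s) => Number.isInteger(s) && s >= 1 && s <= 5);
  if (!valid) throw new RangeError("expected 22 integer scores between 1 and 5");

  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

  const categoryAverages = {};
  for (const [name, [first, last]] of Object.entries(CATEGORIES)) {
    // Dimensions are 1-based, so dimension `first` lives at index first - 1.
    categoryAverages[name] = mean(scores.slice(first - 1, last));
  }

  // Overall average is taken over all 22 dimensions, not over the six
  // category averages (categories A and F have 3 dimensions, the rest have 4).
  const overall = mean(scores);
  return { categoryAverages, overall, normalized: overall * 20 };
}
```

A framework scoring 3 on every dimension gets an overall average of 3 and a normalized score of 60. Since the minimum score is 1, the normalized scale effectively runs from 20 to 100.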
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## Radar Chart Data
|
|
103
|
+
|
|
104
|
+
For visual comparison, plot each framework on a 6-axis radar chart using the category averages:
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
Axis 1: Onboarding & Adoption
|
|
108
|
+
Axis 2: Execution & Delivery
|
|
109
|
+
Axis 3: Sustainability & Maintenance
|
|
110
|
+
Axis 4: Flexibility & Universality
|
|
111
|
+
Axis 5: Automation & AI-Agent Capabilities
|
|
112
|
+
Axis 6: Observability & Decision Quality
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Scoring Rubric
|
|
118
|
+
|
|
119
|
+
Use this rubric consistently across all frameworks and dimensions:
|
|
120
|
+
|
|
121
|
+
| Score | Label | Definition |
|
|
122
|
+
|-------|---------------|------------------------------------------------------------------------------|
|
|
123
|
+
| 1 | Absent | Not addressed by the framework. User must solve this entirely on their own. |
|
|
124
|
+
| 2 | Minimal | Acknowledged but not enforced. Ad-hoc or optional guidance only. |
|
|
125
|
+
| 3 | Supported | Present with some structure, but inconsistently applied or easy to skip. |
|
|
126
|
+
| 4 | Systematic | Well-integrated, mostly enforced, clear process with known exceptions. |
|
|
127
|
+
| 5 | Core strength | Foundational to the framework. Systematically enforced, hard to bypass. |
|
|
128
|
+
|
|
129
|
+
### Scoring guidelines
|
|
130
|
+
|
|
131
|
+
- **Score what the framework provides**, not what a disciplined team could achieve without it
|
|
132
|
+
- **Score the default experience**, not the best-case customized setup
|
|
133
|
+
- **Score independently** — don't let a high score in one dimension inflate adjacent ones
|
|
134
|
+
- **Use 3 as the anchor** — most frameworks land at 3 for most dimensions. Reserve 1 and 5 for clear extremes
|
|
135
|
+
- **When uncertain**, score conservatively (lower)
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## Bias Checks
|
|
140
|
+
|
|
141
|
+
Before finalizing scores, verify:
|
|
142
|
+
|
|
143
|
+
- [ ] No single framework scores 5 on more than 9 of 22 dimensions
|
|
144
|
+
- [ ] No single framework scores below 2 on more than 9 of 22 dimensions
|
|
145
|
+
- [ ] Every framework has at least one category where it leads
|
|
146
|
+
- [ ] The evaluator did not design or build any of the frameworks being compared (if they did, note the conflict and consider a second evaluator)
|
|
147
|
+
- [ ] Dimensions were not added or removed after seeing preliminary scores
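The first three checks are mechanical and could be automated. The sketch below is one possible way to do that, assuming scores are held as a map from framework slot to an array of 22 integers. Counting a tie for the highest category average as "leading" is an assumption. The last two checks concern the evaluator and the process, so they stay manual.

```javascript
// Dimension ranges per category, taken from the scorecard headings.
const CATEGORY_RANGES = { A: [1, 3], B: [4, 7], C: [8, 11], D: [12, 15], E: [16, 19], F: [20, 22] };

function categoryAverage(scores, [first, last]) {
  const slice = scores.slice(first - 1, last);
  return slice.reduce((a, b) => a + b, 0) / slice.length;
}

// scoresByFramework: { F1: [22 scores], F2: [22 scores], ... } (hypothetical shape).
// Returns a list of human-readable warnings; an empty list means no check fired.
function biasWarnings(scoresByFramework) {
  const warnings = [];
  const names = Object.keys(scoresByFramework);
  for (const fw of names) {
    const scores = scoresByFramework[fw];
    const fives = scores.filter((s) => s === 5).length;
    const lows = scores.filter((s) => s < 2).length;
    if (fives > 9) warnings.push(`${fw}: 5 on ${fives} of 22 dimensions`);
    if (lows > 9) warnings.push(`${fw}: below 2 on ${lows} of 22 dimensions`);

    // A framework "leads" a category if no other framework has a higher
    // average there. Ties count as leading (assumption).
    const leadsSomewhere = Object.values(CATEGORY_RANGES).some((range) => {
      const own = categoryAverage(scores, range);
      return names.every((other) => own >= categoryAverage(scoresByFramework[other], range));
    });
    if (!leadsSomewhere) warnings.push(`${fw}: leads in no category`);
  }
  return warnings;
}
```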
|
|
148
|
+
|
|
149
|
+
---
|
|
150
|
+
|
|
151
|
+
## Notes & Justifications
|
|
152
|
+
|
|
153
|
+
Use this section to record reasoning for any score that might be controversial:
|
|
154
|
+
|
|
155
|
+
| Dimension | Framework | Score | Justification |
|
|
156
|
+
|-----------|-----------|-------|---------------|
|
|
157
|
+
| | | | |
|
|
158
|
+
| | | | |
|
|
159
|
+
| | | | |
|
|
160
|
+
| | | | |
|
package/docs/requirements.md
CHANGED
|
@@ -58,6 +58,16 @@
|
|
|
58
58
|
| REQ-047 | Global ELO & Rankings — gsd-t-status displays global ELO score and cross-project rank when global metrics exist | P2 | planned | validated by use |
|
|
59
59
|
| REQ-048 | Global Rule Promotion on Milestone Completion — gsd-t-complete-milestone copies promoted rules to global-rules.jsonl and updates global rollup after local promotion | P1 | planned | validated by use |
|
|
60
60
|
| REQ-049 | E2E Enforcement Rule — when playwright.config.* or cypress.config.* exists, ALL test-running commands (execute, quick, debug, test-sync, integrate, verify, complete-milestone) MUST run the full E2E suite. Unit-only results are NEVER sufficient. QA subagent prompts explicitly mandate E2E detection and execution. | P1 | complete | enforced in 7 command files + CLAUDE.md + pre-commit-gate contract |
|
|
61
|
+
| REQ-050 | Functional E2E Test Quality Standard — Playwright specs MUST verify functional behavior (state changes, data flow, content updates after actions), NOT just element existence (isVisible, toBeEnabled). Shallow layout tests that would pass on an empty HTML page are flagged and block verification. QA subagent audits for shallow tests. | P1 | complete | enforced in execute, qa, test-sync, verify, quick, debug, integrate, complete-milestone + global CLAUDE.md + CLAUDE-global template |
|
|
62
|
+
| REQ-051 | Document Ripple Completion Gate — when a change affects multiple files, identify the full blast radius BEFORE starting, complete ALL updates in one pass, and only report completion after every downstream document is updated. Partial delivery is never acceptable. The user should never need to ask "did you update everything?" | P1 | complete | enforced in global CLAUDE.md + CLAUDE-global template + project CLAUDE.md |
|
|
63
|
+
| REQ-052 | Doc-Ripple Subagent — dedicated agent auto-spawned after code-modifying commands (execute, integrate, quick, debug, wave) that analyzes git diff, identifies full blast radius of affected documents, and spawns parallel subagents to update them. Produces manifest audit trail. Threshold logic skips trivial changes. | P1 | complete | M28: contract ACTIVE, command file, 43 tests, wired into execute/integrate/quick/debug/wave |
|
|
64
|
+
| REQ-053 | Debug Ledger Protocol — structured JSONL ledger (.gsd-t/debug-state.jsonl) persists hypothesis/fix/learning entries across debug sessions. Supports read, append, compact (at 50KB), anti-repetition preamble generation, and clear. | P1 | complete | M29: bin/debug-ledger.js, test/debug-ledger.test.js (46 tests) |
|
|
65
|
+
| REQ-054 | Headless Debug-Loop — `gsd-t headless --debug-loop` runs test-fix-retest cycles as separate `claude -p` sessions with fresh context each. External loop controller (pure Node.js, zero AI context). Escalation tiers: sonnet 1-5, opus 6-15, STOP 16-20. --max-iterations enforced externally. | P1 | complete | M29: bin/gsd-t.js headless extension, test/headless-debug-loop.test.js (37 tests) |
|
|
66
|
+
| REQ-055 | Anti-Repetition Preamble — each debug-loop iteration injects a preamble listing all failed hypotheses, current narrowing direction, and tests still failing. Prevents repetition of eliminated approaches. | P1 | complete | M29: bin/debug-ledger.js generateAntiRepetitionPreamble, test/debug-ledger.test.js |
|
|
67
|
+
| REQ-056 | Debug-Loop Command Integration — execute, wave, test-sync, verify, and debug commands delegate to headless debug-loop after 2 in-context fix attempts fail. Preserves existing try-twice behavior for quick fixes. | P1 | complete | M29: 5 command files (execute, debug, wave, test-sync, verify) |
|
|
68
|
+
| REQ-057 | Stack Rule Templates — best practice rule files in `templates/stacks/` for React, TypeScript, and Node.js API. Each file follows a standard structure (mandatory framing, numbered sections, GOOD/BAD examples, verification checklist) and stays under 200 lines. Universal templates (`_` prefix) always injected; stack-specific templates injected when detected. | P1 | complete | M30: templates/stacks/ (4 files: _security.md, react.md, typescript.md, node-api.md) |
|
|
69
|
+
| REQ-058 | Stack Detection Engine — auto-detect project tech stack from manifest files (package.json, requirements.txt, go.mod, Cargo.toml) at subagent spawn time. Match detected stacks against available templates. Inject matched rules into subagent prompts with mandatory enforcement framing. Resilient: skip silently if no templates exist or no matches found. | P1 | complete | M30: 5 command files (execute, quick, integrate, wave, debug) |
|
|
70
|
+
| REQ-059 | Stack Rule QA Enforcement — QA subagent prompts include stack rule compliance validation. Stack rule violations have the same severity as contract violations — they fail the task, not warn. Report format includes "Stack rules: compliant/N violations". | P1 | complete | M30: execute QA prompt + all 5 commands |
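To make the detection rule in REQ-058 concrete, here is a minimal sketch of how manifest files might be matched against templates. The template file names appear in this package's `templates/stacks/` directory, but the mapping tables, the function name `selectStackTemplates`, and the input shape are assumptions, not the logic shipped in the command files. `go.mod` and `Cargo.toml` are named in the requirement but have no matching template in this release, so the sketch leaves them unmapped.

```javascript
// Hypothetical mappings for illustration only; the real matching rules are not shown in this diff.
const DEPENDENCY_TEMPLATES = {
  react: "react.md",
  next: "nextjs.md",
  redux: "redux.md",
  "@playwright/test": "playwright.md",
};
const MANIFEST_TEMPLATES = { "requirements.txt": "python.md" };

// manifests: { "package.json": "<file text>", "requirements.txt": "<file text>", ... }
// availableTemplates: file names found in templates/stacks/
function selectStackTemplates(manifests, availableTemplates) {
  const available = new Set(availableTemplates);
  // Universal templates (`_` prefix) are always injected.
  const selected = new Set(availableTemplates.filter((t) => t.startsWith("_")));

  for (const [file, template] of Object.entries(MANIFEST_TEMPLATES)) {
    if (file in manifests && available.has(template)) selected.add(template);
  }

  if ("package.json" in manifests) {
    let pkg = {};
    try {
      pkg = JSON.parse(manifests["package.json"]) ?? {};
    } catch {
      // Unreadable manifest: skip silently, as the requirement asks for resilience.
    }
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    for (const [dep, template] of Object.entries(DEPENDENCY_TEMPLATES)) {
      if (dep in deps && available.has(template)) selected.add(template);
    }
  }
  return [...selected].sort();
}
```

With no templates available the function returns an empty list, which matches the "skip silently" behaviour the requirement describes.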
|
|
61
71
|
|
|
62
72
|
## Technical Requirements
|
|
63
73
|
|
|
@@ -163,6 +173,29 @@
|
|
|
163
173
|
**Orphaned requirements**: None — all M27 REQs mapped to tasks.
|
|
164
174
|
**Unanchored tasks**: global-metrics Task 4 (tests) and cross-project-sync Task 3 (tests) are QA infrastructure supporting REQ-043 through REQ-045. command-extensions Task 4 (reference docs) supports Pre-Commit Gate compliance.
|
|
165
175
|
|
|
176
|
+
## Requirements Traceability (updated by plan phase — M29)
|
|
177
|
+
|
|
178
|
+
| REQ-ID | Requirement Summary | Domain | Task(s) | Status |
|
|
179
|
+
|---------|--------------------------------------------------------------|----------------------|----------------|---------|
|
|
180
|
+
| REQ-053 | Debug Ledger Protocol — JSONL ledger with read/write/compact | debug-state-protocol | Task 1, 2, 3 | complete |
|
|
181
|
+
| REQ-054 | Headless Debug-Loop — external loop controller | headless-loop | Task 1, 2, 3 | complete |
|
|
182
|
+
| REQ-055 | Anti-Repetition Preamble — failed hypothesis injection | debug-state-protocol, headless-loop | dsp Task 2, hl Task 2 | complete |
|
|
183
|
+
| REQ-056 | Debug-Loop Command Integration — delegate after 2 failures | command-integration | Task 1, 2 | complete |
|
|
184
|
+
|
|
185
|
+
**Orphaned requirements**: None — all M29 REQs mapped to tasks.
|
|
186
|
+
**Unanchored tasks**: debug-state-protocol Task 3 (tests) and headless-loop Task 3 (tests) are QA infrastructure supporting REQ-053 through REQ-055. command-integration Task 3 (reference docs) supports Pre-Commit Gate compliance.
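The ledger behaviour traced in REQ-053 and REQ-055 can be sketched as pure functions over JSONL text. This is an illustration under stated assumptions, not the code in `bin/debug-ledger.js`. The field names `result` and `hypothesis` are hypothetical (the 11-field schema is not shown here), and reading "50KB" as 50 × 1024 bytes is an assumption.

```javascript
// "50KB" in the docs; the exact byte count is an assumption.
const COMPACTION_THRESHOLD_BYTES = 50 * 1024;
// The docs state that the last 5 raw entries survive compaction.
const RAW_ENTRIES_KEPT = 5;

// Parse JSONL text (one JSON object per line), skipping blank lines.
function parseLedger(jsonlText) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// True once the ledger text has reached the compaction threshold.
function needsCompaction(jsonlText) {
  return Buffer.byteLength(jsonlText, "utf8") >= COMPACTION_THRESHOLD_BYTES;
}

// Split entries into the older part to be condensed and the raw tail to keep.
function splitForCompaction(entries) {
  const cut = Math.max(0, entries.length - RAW_ENTRIES_KEPT);
  return { toCondense: entries.slice(0, cut), keptRaw: entries.slice(cut) };
}

// Build a preamble listing hypotheses that were tried and still fail.
// `result` and `hypothesis` are hypothetical field names.
function buildPreamble(entries) {
  const failed = entries
    .filter((e) => e.result === "STILL_FAILS")
    .map((e) => e.hypothesis);
  if (failed.length === 0) return "";
  return ["Do not retry these eliminated hypotheses:", ...failed.map((h) => `- ${h}`)].join("\n");
}
```

Keeping these steps free of file I/O makes them easy to test. A caller would read `.gsd-t/debug-state.jsonl`, pass the text through these helpers, and write the result back.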
|
|
187
|
+
|
|
188
|
+
## Requirements Traceability (updated by plan phase — M30)
|
|
189
|
+
|
|
190
|
+
| REQ-ID | Requirement Summary | Domain | Task(s) | Status |
|
|
191
|
+
|---------|--------------------------------------------------------------|----------------------|----------------|---------|
|
|
192
|
+
| REQ-057 | Stack Rule Templates — react.md, typescript.md, node-api.md | stack-templates | Task 1, 2, 3 | complete |
|
|
193
|
+
| REQ-058 | Stack Detection Engine — auto-detect + prompt injection | command-integration | Task 1, 2 | complete |
|
|
194
|
+
| REQ-059 | Stack Rule QA Enforcement — QA validates compliance | command-integration | Task 1, 2 | complete |
|
|
195
|
+
|
|
196
|
+
**Orphaned requirements**: None — all M30 REQs mapped to tasks.
|
|
197
|
+
**Unanchored tasks**: command-integration Task 3 (tests) is QA infrastructure supporting REQ-057 through REQ-059. command-integration Task 4 (reference docs) supports Pre-Commit Gate compliance.
|
|
198
|
+
|
|
166
199
|
---
|
|
167
200
|
|
|
168
201
|
## M17: Scan Visual Output — Feature Specification
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@tekyzinc/gsd-t",
|
|
3
|
-
"version": "2.
|
|
4
|
-
"description": "GSD-T: Contract-Driven Development for Claude Code —
|
|
3
|
+
"version": "2.50.10",
|
|
4
|
+
"description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
|
|
5
5
|
"author": "Tekyz, Inc.",
|
|
6
6
|
"license": "MIT",
|
|
7
7
|
"repository": {
|