@tekyzinc/gsd-t 2.45.11 → 2.50.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/CHANGELOG.md +23 -0
  2. package/README.md +26 -5
  3. package/bin/debug-ledger.js +193 -0
  4. package/bin/gsd-t.js +259 -1
  5. package/commands/gsd-t-complete-milestone.md +2 -1
  6. package/commands/gsd-t-debug.md +48 -2
  7. package/commands/gsd-t-doc-ripple.md +148 -0
  8. package/commands/gsd-t-execute.md +102 -5
  9. package/commands/gsd-t-help.md +25 -2
  10. package/commands/gsd-t-integrate.md +41 -1
  11. package/commands/gsd-t-qa.md +26 -5
  12. package/commands/gsd-t-quick.md +39 -1
  13. package/commands/gsd-t-test-sync.md +26 -1
  14. package/commands/gsd-t-verify.md +8 -2
  15. package/commands/gsd-t-wave.md +57 -0
  16. package/docs/GSD-T-README.md +84 -1
  17. package/docs/architecture.md +9 -1
  18. package/docs/framework-comparison-scorecard.md +160 -0
  19. package/docs/requirements.md +33 -0
  20. package/examples/rules/desktop.ini +2 -0
  21. package/package.json +2 -2
  22. package/templates/CLAUDE-global.md +82 -4
  23. package/templates/stacks/_security.md +243 -0
  24. package/templates/stacks/desktop.ini +2 -0
  25. package/templates/stacks/docker.md +202 -0
  26. package/templates/stacks/firebase.md +166 -0
  27. package/templates/stacks/flutter.md +205 -0
  28. package/templates/stacks/github-actions.md +201 -0
  29. package/templates/stacks/graphql.md +216 -0
  30. package/templates/stacks/neo4j.md +218 -0
  31. package/templates/stacks/nextjs.md +184 -0
  32. package/templates/stacks/node-api.md +196 -0
  33. package/templates/stacks/playwright.md +528 -0
  34. package/templates/stacks/postgresql.md +225 -0
  35. package/templates/stacks/python.md +243 -0
  36. package/templates/stacks/react-native.md +216 -0
  37. package/templates/stacks/react.md +293 -0
  38. package/templates/stacks/redux.md +193 -0
  39. package/templates/stacks/rest-api.md +202 -0
  40. package/templates/stacks/supabase.md +188 -0
  41. package/templates/stacks/tailwind.md +169 -0
  42. package/templates/stacks/typescript.md +176 -0
  43. package/templates/stacks/vite.md +176 -0
  44. package/templates/stacks/vue.md +189 -0
  45. package/templates/stacks/zustand.md +203 -0
package/CHANGELOG.md CHANGED
@@ -2,6 +2,29 @@
 
  All notable changes to GSD-T are documented here. Updated with each release.
 
+ ## [2.50.10] - 2026-03-25
+
+ ### Added
+ - **18 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright. Total: 22 stack rules (was 4).
+ - **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
+ - **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
+
+ ### Changed
+ - Stack detection in execute, quick, and debug commands updated to cover all 22 stack files with conditional detection per project dependencies.
+ - PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
+
+ ## [2.46.11] - 2026-03-24
+
+ ### Added
+ - **M28: Doc-Ripple Subagent** — automated document ripple enforcement agent. Threshold check (7 FIRE/3 SKIP conditions), blast radius analysis, manifest generation, parallel document updates. New command: `gsd-t-doc-ripple`. 43 new tests. Wired into execute, integrate, quick, debug, wave.
+ - **Orchestrator context self-check** — execute and wave orchestrators now check their own context utilization after every domain/phase. If >= 70%, saves progress and stops to prevent session breaks.
+ - **Functional E2E test quality standard (REQ-050)** — Playwright specs must verify functional behavior, not just element existence. Shallow test audit added to qa, test-sync, verify, complete-milestone commands.
+ - **Document Ripple Completion Gate (REQ-051)** — structural rule preventing "done" reports until all downstream documents are updated.
+
+ ### Changed
+ - Command count: 50 → 51 (added `gsd-t-doc-ripple`)
+ - Package description updated to include doc-ripple enforcement
+
  ## [2.39.12] - 2026-03-19
 
  ### Added
package/README.md CHANGED
@@ -3,6 +3,7 @@
  A methodology for reliable, parallelizable development using Claude Code with optional Agent Teams support.
 
  **Eliminates context rot** — task-level fresh dispatch (one subagent per task, ~10-20% context each) means compaction never triggers.
+ **Compaction-proof debug loops** — `gsd-t headless --debug-loop` runs test-fix-retest cycles as separate `claude -p` sessions. A JSONL debug ledger persists all hypothesis/fix/learning history across fresh sessions. Anti-repetition preamble injection prevents retrying failed hypotheses. Escalation tiers (sonnet → opus → human) and a hard iteration ceiling enforced externally.
  **Safe parallel execution** — worktree isolation gives each domain agent its own filesystem; sequential atomic merges prevent conflicts.
  **Maintains test coverage** — automatically keeps tests aligned with code changes.
  **Catches downstream effects** — analyzes impact before changes break things.
@@ -11,6 +12,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
  **Generates visual scan reports** — every `/gsd-t-scan` produces a self-contained HTML report with 6 live architectural diagrams, a tech debt register, and domain health scores; optional DOCX/PDF export via `--export docx|pdf`.
  **Self-learning rule engine** — declarative rules in rules.jsonl detect failure patterns from task metrics. Candidate patches progress through a 5-stage lifecycle (candidate, applied, measured, promoted, graduated) with >55% improvement gates before becoming permanent methodology artifacts.
  **Cross-project learning** — proven rules propagate to `~/.claude/metrics/` and sync across all registered projects via `update-all`. Rules validated in 3+ projects become universal; 5+ projects qualify for npm distribution. Cross-project signal comparison and global ELO rankings available via `gsd-t-metrics --cross-project` and `gsd-t-status`.
+ **Stack Rules Engine** — auto-detects project tech stack (React, TypeScript, Node API, Python, Go, Rust) from manifest files and injects mandatory best-practice rules into subagent prompts at execute-time. Universal security rules always apply; stack-specific rules layer on top. Extensible: drop a `.md` file in `templates/stacks/` to add a new stack.
 
  ---
 
@@ -22,7 +24,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
  npx @tekyzinc/gsd-t install
  ```
 
- This installs 45 GSD-T commands + 5 utility commands (50 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
+ This installs 46 GSD-T commands + 5 utility commands (51 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
 
  ### Start Using It
 
@@ -83,8 +85,21 @@ npx @tekyzinc/gsd-t uninstall # Remove commands (keeps project files)
  gsd-t headless verify --json --timeout=1200 # Run verify non-interactively
  gsd-t headless query status # Get project state (no LLM, <100ms)
  gsd-t headless query domains # List domains (no LLM)
+
+ # Headless debug-loop (compaction-proof automated test-fix-retest)
+ gsd-t headless --debug-loop # Auto-detect test cmd, up to 20 iterations
+ gsd-t headless --debug-loop --max-iterations=10 # Cap at 10 iterations
+ gsd-t headless --debug-loop --test-cmd="npm test" # Override test command
+ gsd-t headless --debug-loop --fix-scope="src/auth/**" # Limit fix scope
+ gsd-t headless --debug-loop --json --log # Structured output + per-iteration logs
  ```
 
+ Each iteration runs as a fresh `claude -p` session. A cumulative debug ledger (`.gsd-t/debug-state.jsonl`) preserves hypothesis/fix/learning history across sessions. An anti-repetition preamble prevents retrying failed approaches.
+
+ **Escalation tiers**: sonnet (iterations 1–5) → opus (6–15) → STOP with diagnostic summary (16–20)
+
+ **Exit codes**: `0` all tests pass · `1` max iterations reached · `2` compaction error · `3` process error · `4` needs human decision
+
  ### Updating
 
  When a new version is published:
@@ -141,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
  | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
  | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
  | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
+ | `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
  | `/user:gsd-t-integrate` | Wire domains together | In wave |
  | `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
  | `/user:gsd-t-complete-milestone` | Archive + git tag (goal-backward gate required) | In wave |
@@ -314,13 +330,13 @@ get-stuff-done-teams/
  ├── LICENSE
  ├── bin/
  │ └── gsd-t.js # CLI installer
- ├── commands/ # 50 slash commands
- │ ├── gsd-t-*.md # 44 GSD-T workflow commands
+ ├── commands/ # 51 slash commands
+ │ ├── gsd-t-*.md # 45 GSD-T workflow commands
  │ ├── gsd.md # GSD-T smart router
  │ ├── branch.md # Git branch helper
  │ ├── checkin.md # Auto-version + commit/push helper
  │ └── Claude-md.md # Reload CLAUDE.md directives
- ├── templates/ # Document templates
+ ├── templates/ # Document templates (9 base + stacks/)
  │ ├── CLAUDE-global.md
  │ ├── CLAUDE-project.md
  │ ├── requirements.md
@@ -329,7 +345,12 @@ get-stuff-done-teams/
  │ ├── infrastructure.md
  │ ├── progress.md
  │ ├── backlog.md
- └── backlog-settings.md
+ ├── backlog-settings.md
+ │ └── stacks/ # Stack Rules Engine templates
+ │ ├── _security.md # Universal — always injected
+ │ ├── react.md
+ │ ├── typescript.md
+ │ └── node-api.md
  ├── scripts/ # Runtime utility scripts (installed to ~/.claude/scripts/)
  │ ├── gsd-t-tools.js # State CLI (get/set/validate/list)
  │ ├── gsd-t-statusline.js # Context usage bar
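The escalation tiers and exit codes added to the README above interact with `--max-iterations`. Going by the tier boundaries quoted there, a run that never passes with the default ceiling of 20 would reach the STOP tier at iteration 16 and end with exit code 4, while a ceiling of 15 or lower would end with exit code 1. The sketch below works through that arithmetic; `tierFor` and `exitCodeIfNeverPassing` are hypothetical names used only for illustration and are not part of the package.

```javascript
// Tier boundaries as quoted in the README: sonnet 1-5, opus 6-15, STOP from 16.
function tierFor(iteration) {
  if (iteration >= 1 && iteration <= 5) return "sonnet";
  if (iteration >= 6 && iteration <= 15) return "opus";
  return "STOP";
}

// Hypothetical helper: which exit code a run ends with if no iteration ever passes.
function exitCodeIfNeverPassing(maxIterations) {
  for (let i = 1; i <= maxIterations; i++) {
    if (tierFor(i) === "STOP") return 4; // needs human decision
  }
  return 1; // max iterations reached
}

for (const max of [10, 15, 16, 20]) {
  console.log(`max=${max} -> exit ${exitCodeIfNeverPassing(max)}`);
}
```

Under these assumptions, exit code 1 is only reachable when the ceiling is set to 15 or below.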
package/bin/debug-ledger.js ADDED
@@ -0,0 +1,193 @@
+ #!/usr/bin/env node
+
+ /**
+  * GSD-T Debug Ledger — Persistent debug iteration store
+  *
+  * Reads and writes debug iteration records to .gsd-t/debug-state.jsonl.
+  * Supports compaction detection and ledger lifecycle management.
+  *
+  * Zero external dependencies (Node.js built-ins only).
+  */
+
+ const fs = require("fs");
+ const path = require("path");
+
+ // ── Constants ─────────────────────────────────────────────────────────────────
+
+ const COMPACTION_THRESHOLD = 51200; // 50KB
+
+ const REQUIRED_FIELDS = [
+   "iteration", "timestamp", "test", "error",
+   "hypothesis", "fix", "fixFiles", "result",
+   "learning", "model", "duration",
+ ];
+
+ const VALID_RESULTS = new Set(["PASS", "STILL_FAILS"]);
+
+ // ── Exports ───────────────────────────────────────────────────────────────────
+
+ module.exports = {
+   readLedger, appendEntry, getLedgerStats, clearLedger,
+   compactLedger, generateAntiRepetitionPreamble,
+ };
+
+ // ── readLedger ────────────────────────────────────────────────────────────────
+
+ /**
+  * Read all entries from the debug ledger.
+  * @param {string} projectDir - Root directory of the project
+  * @returns {object[]} Array of parsed ledger entry objects
+  */
+ function readLedger(projectDir) {
+   const fp = ledgerPath(projectDir);
+   if (!fs.existsSync(fp)) return [];
+   const content = fs.readFileSync(fp, "utf8").trim();
+   if (!content) return [];
+   return content.split("\n").map(safeParse).filter(Boolean);
+ }
+
+ // ── appendEntry ───────────────────────────────────────────────────────────────
+
+ /**
+  * Validate and append one debug iteration entry to the ledger.
+  * Creates the file and parent directories if they do not exist.
+  * @param {string} projectDir - Root directory of the project
+  * @param {object} entry - Debug iteration record (see Required Fields)
+  * @throws {Error} If required fields are missing or invalid
+  */
+ function appendEntry(projectDir, entry) {
+   const err = validateEntry(entry);
+   if (err) throw new Error(err);
+   const fp = ledgerPath(projectDir);
+   ensureDir(path.dirname(fp));
+   fs.appendFileSync(fp, JSON.stringify(entry) + "\n");
+ }
+
+ // ── getLedgerStats ────────────────────────────────────────────────────────────
+
+ /**
+  * Return summary statistics for the current ledger.
+  * @param {string} projectDir - Root directory of the project
+  * @returns {{ entryCount: number, sizeBytes: number, needsCompaction: boolean, failedHypotheses: string[], passCount: number, failCount: number }}
+  */
+ function getLedgerStats(projectDir) {
+   const fp = ledgerPath(projectDir);
+   const entries = readLedger(projectDir);
+   const sizeBytes = fs.existsSync(fp) ? fs.statSync(fp).size : 0;
+   const failedHypotheses = entries
+     .filter((e) => e.result === "STILL_FAILS" && e.hypothesis)
+     .map((e) => e.hypothesis);
+   const passCount = entries.filter((e) => e.result === "PASS").length;
+   const failCount = entries.filter((e) => e.result === "STILL_FAILS").length;
+   return {
+     entryCount: entries.length,
+     sizeBytes,
+     needsCompaction: sizeBytes > COMPACTION_THRESHOLD,
+     failedHypotheses,
+     passCount,
+     failCount,
+   };
+ }
+
+ // ── clearLedger ───────────────────────────────────────────────────────────────
+
+ /**
+  * Delete the debug ledger file. Called when all tests pass.
+  * No-op if the file does not exist.
+  * @param {string} projectDir - Root directory of the project
+  */
+ function clearLedger(projectDir) {
+   const fp = ledgerPath(projectDir);
+   if (fs.existsSync(fp)) fs.unlinkSync(fp);
+ }
+
+ // ── compactLedger ─────────────────────────────────────────────────────────────
+
+ /**
+  * Compact the ledger by replacing all but the last 5 entries with a summary.
+  * @param {string} projectDir - Root directory of the project
+  * @param {string} summary - Summarization of compacted entries
+  */
+ function compactLedger(projectDir, summary) {
+   const entries = readLedger(projectDir);
+   const tail = entries.slice(-5);
+   const compactedEntry = {
+     compacted: true,
+     learning: summary,
+     iteration: 0,
+     timestamp: new Date().toISOString(),
+     test: "compacted",
+     error: "see summary",
+     hypothesis: "compacted",
+     fix: "compacted",
+     fixFiles: [],
+     result: "compacted",
+     model: "haiku",
+     duration: 0,
+   };
+   const fp = ledgerPath(projectDir);
+   ensureDir(path.dirname(fp));
+   const lines = [compactedEntry, ...tail].map((e) => JSON.stringify(e)).join("\n") + "\n";
+   fs.writeFileSync(fp, lines);
+ }
+
+ // ── generateAntiRepetitionPreamble ────────────────────────────────────────────
+
+ /**
+  * Build a preamble string listing failed hypotheses and the current narrowing
+  * direction. Injected into each claude -p session to prevent repeated attempts.
+  * @param {string} projectDir - Root directory of the project
+  * @returns {string} Formatted preamble, or empty string if ledger is empty
+  */
+ function generateAntiRepetitionPreamble(projectDir) {
+   const entries = readLedger(projectDir);
+   if (!entries.length) return "";
+   const failed = entries.filter((e) => e.result === "STILL_FAILS");
+   const learnings = entries.filter((e) => e.learning && !e.compacted);
+   const lastLearning = learnings.length ? learnings[learnings.length - 1].learning : null;
+   const failLines = failed
+     .map((e, i) => `${i + 1}. [iteration ${e.iteration}] "${e.hypothesis}" — FAILED: ${e.error}`)
+     .join("\n");
+   const stillFailing = failed.map((e) => `- ${e.test}: ${e.error}`).join("\n");
+   const direction = lastLearning
+     ? `Based on ${entries.length} iterations, the evidence points to: ${lastLearning}`
+     : "No narrowing direction established yet.";
+   return [
+     "## Debug Ledger Context (DO NOT retry failed approaches)",
+     "",
+     "### Failed Hypotheses (DO NOT retry these):",
+     failLines || "(none yet)",
+     "",
+     "### Current Narrowing Direction:",
+     direction,
+     "",
+     "### Tests Still Failing:",
+     stillFailing || "(none recorded)",
+   ].join("\n");
+ }
+
+ // ── Internal helpers ──────────────────────────────────────────────────────────
+
+ function ledgerPath(projectDir) {
+   return path.join(projectDir || process.cwd(), ".gsd-t", "debug-state.jsonl");
+ }
+
+ function ensureDir(dir) {
+   if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
+ }
+
+ function safeParse(line) {
+   try { return JSON.parse(line); } catch { return null; }
+ }
+
+ function validateEntry(entry) {
+   if (!entry || typeof entry !== "object") return "Entry must be an object";
+   for (const f of REQUIRED_FIELDS) {
+     if (entry[f] === undefined || entry[f] === null) return `Missing required field: ${f}`;
+   }
+   if (typeof entry.iteration !== "number") return "iteration must be a number";
+   if (typeof entry.duration !== "number") return "duration must be a number";
+   if (!Array.isArray(entry.fixFiles)) return "fixFiles must be an array";
+   if (!VALID_RESULTS.has(entry.result)) return `result must be "PASS" or "STILL_FAILS"`;
+   return null;
+ }
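The ledger in this file is an append-only JSONL store: `appendEntry` writes one JSON object per line, and `readLedger` drops any line that fails to parse, so a truncated write does not make the rest of the history unreadable. The sketch below shows that pattern on its own, writing to a temporary directory rather than a real project. `appendJsonl` and `readJsonl` are hypothetical stand-ins rather than the module's exports, and the sample records are made up and omit most of the required fields.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for appendEntry: one JSON object per line, no validation.
function appendJsonl(file, record) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(record) + "\n");
}

// Hypothetical stand-in for readLedger: unparseable lines are skipped, not fatal.
function readJsonl(file) {
  if (!fs.existsSync(file)) return [];
  return fs.readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => { try { return JSON.parse(line); } catch { return null; } })
    .filter(Boolean);
}

// Demo in a throwaway directory; the records are illustrative only.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-demo-"));
const file = path.join(dir, ".gsd-t", "debug-state.jsonl");
appendJsonl(file, { iteration: 1, result: "STILL_FAILS", hypothesis: "stale cache" });
fs.appendFileSync(file, "{ truncated line\n"); // simulate a corrupt write
appendJsonl(file, { iteration: 2, result: "PASS", hypothesis: "missing await" });

const entries = readJsonl(file);
console.log(entries.map((e) => `${e.iteration}:${e.result}`).join(" "));
```

The corrupt middle line is silently skipped, which is the same trade-off the module makes: history survives partial writes, at the cost of not reporting that an entry was lost.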
package/bin/gsd-t.js CHANGED
@@ -19,6 +19,7 @@ const fs = require("fs");
  const path = require("path");
  const os = require("os");
  const { execFileSync, spawn: cpSpawn } = require("child_process");
+ const debugLedger = require(path.join(__dirname, "debug-ledger.js"));
 
  // ─── Configuration ───────────────────────────────────────────────────────────
 
@@ -2174,6 +2175,236 @@ function doHeadlessQuery(type) {
    process.stdout.write(JSON.stringify(result) + "\n");
  }
 
+ /**
+  * Parse debug-loop flags from args array.
+  * Extracts --max-iterations, --test-cmd, --fix-scope, --json, --log from args.
+  */
+ function parseDebugLoopFlags(args) {
+   const flags = { maxIterations: 20, testCmd: null, fixScope: null, json: false, log: false };
+   const positional = [];
+   for (const arg of args) {
+     if (arg.startsWith("--max-iterations=")) {
+       const n = parseInt(arg.slice("--max-iterations=".length), 10);
+       if (!isNaN(n) && n > 0) flags.maxIterations = n;
+     } else if (arg.startsWith("--test-cmd=")) {
+       flags.testCmd = arg.slice("--test-cmd=".length);
+     } else if (arg.startsWith("--fix-scope=")) {
+       flags.fixScope = arg.slice("--fix-scope=".length);
+     } else if (arg === "--json") {
+       flags.json = true;
+     } else if (arg === "--log") {
+       flags.log = true;
+     } else {
+       positional.push(arg);
+     }
+   }
+   return { flags, positional };
+ }
+
+ /**
+  * Return the escalation model for a given iteration number.
+  * Tiers: 1-5 → sonnet, 6-15 → opus, 16+ → null (stop)
+  */
+ function getEscalationModel(iteration) {
+   if (iteration >= 1 && iteration <= 5) return "sonnet";
+   if (iteration >= 6 && iteration <= 15) return "opus";
+   return null;
+ }
+
+ /**
+  * Spawn a single `claude -p` session and return stdout as a string.
+  * Returns null if the process fails.
+  */
+ function spawnClaudeSession(prompt, model) {
+   try {
+     return execFileSync("claude", ["-p", prompt, "--model", model], {
+       encoding: "utf8", timeout: 300000,
+       stdio: ["pipe", "pipe", "pipe"],
+     });
+   } catch (e) {
+     return (e.stdout || "") + (e.stderr || "") || null;
+   }
+ }
+
+ /**
+  * Parse test pass/fail from claude output.
+  * Returns { passed: bool, summary: string }.
+  */
+ function parseTestResult(output) {
+   const out = (output || "").toLowerCase();
+   const passed =
+     /\ball tests? pass(ed|ing)?\b/.test(out) ||
+     /\ball \d+ tests? pass/.test(out) ||
+     /\bno (test )?failures?\b/.test(out) ||
+     /\btests? (all )?pass(ed)?\b/.test(out);
+   const failed =
+     /\bfail(ed|ing|ure)?\b/.test(out) ||
+     /\berror\b/.test(out) ||
+     /\bnot ok\b/.test(out);
+   const summary = (output || "").slice(0, 500).replace(/\n/g, " ").trim();
+   return { passed: passed && !failed, summary };
+ }
+
+ /**
+  * Run ledger compaction: spawn haiku to summarize, then compact.
+  */
+ function runLedgerCompaction(projectDir, jsonMode) {
+   const entries = debugLedger.readLedger(projectDir);
+   const compactPrompt =
+     "Read this debug ledger. Produce a condensed summary of what has been tried, " +
+     "what failed, and what the evidence suggests. Be concise.\n\n" +
+     JSON.stringify(entries, null, 2);
+   let summary = "Compacted — see previous entries.";
+   try {
+     const out = execFileSync("claude", ["-p", compactPrompt, "--model", "haiku"], {
+       encoding: "utf8", timeout: 120000, stdio: ["pipe", "pipe", "pipe"],
+     });
+     summary = (out || "").trim() || summary;
+   } catch (e) {
+     if (!jsonMode) warn("Compaction haiku session failed — using default summary");
+   }
+   debugLedger.compactLedger(projectDir, summary);
+ }
+
+ /**
+  * Write a per-iteration log file under .gsd-t/.
+  */
+ function writeIterationLog(projectDir, ts, iteration, entry, rawOutput) {
+   const logDir = path.join(projectDir, ".gsd-t");
+   if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
+   const fname = `headless-debug-${ts}-iter-${iteration}.log`;
+   const content = [
+     `Iteration: ${iteration}`,
+     `Timestamp: ${entry.timestamp}`,
+     `Model: ${entry.model}`,
+     `Result: ${entry.result}`,
+     `Fix: ${entry.fix}`,
+     `Learning: ${entry.learning}`,
+     `---`,
+     rawOutput || "",
+   ].join("\n");
+   fs.writeFileSync(path.join(logDir, fname), content);
+ }
+
+ /**
+  * Full debug-loop: validate flags, check claude CLI, run iteration cycle.
+  */
+ function doHeadlessDebugLoop(flags) {
+   const opts = flags || {};
+   const jsonMode = opts.json || false;
+   const projectDir = process.cwd();
+
+   if (opts.maxIterations < 1) {
+     const msg = "--max-iterations must be >= 1";
+     if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
+     else error(msg);
+     process.exit(3);
+   }
+
+   try {
+     execFileSync("claude", ["--version"], { encoding: "utf8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] });
+   } catch {
+     const msg = "claude CLI not found. Install with: npm install -g @anthropic-ai/claude-code";
+     if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
+     else error(msg);
+     process.exit(3);
+   }
+
+   if (!jsonMode) {
+     heading("GSD-T Headless — Debug Loop");
+     info(`Max iterations: ${opts.maxIterations}`);
+     if (opts.testCmd) info(`Test command: ${opts.testCmd}`);
+     if (opts.fixScope) info(`Fix scope: ${opts.fixScope}`);
+     if (opts.log) info(`Logging: enabled`);
+     log("");
+   }
+
+   const ts = Date.now();
+
+   for (let iteration = 1; iteration <= opts.maxIterations; iteration++) {
+     const model = getEscalationModel(iteration);
+
+     // STOP tier: escalation stop
+     if (model === null) {
+       const entries = debugLedger.readLedger(projectDir);
+       const stats = debugLedger.getLedgerStats(projectDir);
+       const diagMsg = `ESCALATION STOP at iteration ${iteration}. ` +
+         `Entries: ${stats.entryCount}, Failures: ${stats.failCount}. ` +
+         `Failed hypotheses:\n${stats.failedHypotheses.map((h, i) => ` ${i + 1}. ${h}`).join("\n")}`;
+       if (jsonMode) {
+         process.stdout.write(JSON.stringify({ success: false, exitCode: 4, iteration, diagnostic: diagMsg, entries }) + "\n");
+       } else {
+         log("");
+         warn(diagMsg);
+       }
+       process.exit(4);
+     }
+
+     // Check compaction
+     const stats = debugLedger.getLedgerStats(projectDir);
+     if (stats.needsCompaction) {
+       if (!jsonMode) info("Ledger compaction triggered...");
+       try { runLedgerCompaction(projectDir, jsonMode); }
+       catch { process.exit(2); }
+     }
+
+     // Generate preamble and build prompt
+     const preamble = debugLedger.generateAntiRepetitionPreamble(projectDir);
+     const scopeHint = opts.fixScope ? `\nFix scope: ${opts.fixScope}` : "";
+     const testHint = opts.testCmd ? `\nRun tests with: ${opts.testCmd}` : "";
+     const prompt = [preamble, `Fix the failing test(s). Write your fix, then run the test suite. Report results.${scopeHint}${testHint}`]
+       .filter(Boolean).join("\n\n");
+
+     if (!jsonMode) info(`Iteration ${iteration}/${opts.maxIterations} [${model}]...`);
+
+     const iterStart = Date.now();
+     let rawOutput = null;
+     try { rawOutput = spawnClaudeSession(prompt, model); }
+     catch (e) {
+       if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, iteration, error: String(e) }) + "\n");
+       else error(`Process error at iteration ${iteration}: ${e.message}`);
+       process.exit(3);
+     }
+     const duration = Math.round((Date.now() - iterStart) / 1000);
+
+     const { passed, summary } = parseTestResult(rawOutput);
+     const result = passed ? "PASS" : "STILL_FAILS";
+
+     // Extract fix description from output (first 200 chars of output)
+     const fixDesc = (rawOutput || "").split("\n").find((l) => l.trim().length > 20) || "see output";
+     const entry = {
+       iteration, timestamp: new Date().toISOString(),
+       test: opts.testCmd || "unspecified", error: passed ? "" : summary,
+       hypothesis: `iteration-${iteration}`, fix: fixDesc.trim().slice(0, 200),
+       fixFiles: [], result, learning: summary.slice(0, 300),
+       model, duration,
+     };
+
+     try { debugLedger.appendEntry(projectDir, entry); }
+     catch (e) {
+       if (!jsonMode) warn(`Failed to append ledger entry: ${e.message}`);
+     }
+
+     if (opts.log) writeIterationLog(projectDir, ts, iteration, entry, rawOutput);
+
+     if (jsonMode) {
+       process.stdout.write(JSON.stringify({ success: passed, exitCode: passed ? 0 : 1, iteration, result, model, duration, summary }) + "\n");
+     } else {
+       info(` Result: ${result}`);
+     }
+
+     if (passed) {
+       debugLedger.clearLedger(projectDir);
+       if (!jsonMode) log(`\n${GREEN}All tests pass — debug loop complete.${RESET}`);
+       process.exit(0);
+     }
+   }
+
+   // Max iterations reached
+   if (!jsonMode) warn(`Max iterations (${opts.maxIterations}) reached without all tests passing.`);
+   process.exit(1);
+ }
+
  function doHeadless(args) {
    const sub = args[0];
    if (!sub || sub === "--help" || sub === "-h") {
@@ -2181,6 +2412,12 @@ function doHeadless(args) {
      return;
    }
 
+   if (sub === "--debug-loop") {
+     const { flags } = parseDebugLoopFlags(args.slice(1));
+     doHeadlessDebugLoop(flags);
+     return;
+   }
+
    if (sub === "query") {
      const type = args[1];
      doHeadlessQuery(type);
@@ -2196,7 +2433,24 @@ function showHeadlessHelp() {
    log(`\n${BOLD}GSD-T Headless Mode${RESET}\n`);
    log(`${BOLD}Usage:${RESET}`);
    log(` ${CYAN}gsd-t headless${RESET} <command> [args] [--json] [--timeout=N] [--log]`);
-   log(` ${CYAN}gsd-t headless query${RESET} <type>\n`);
+   log(` ${CYAN}gsd-t headless query${RESET} <type>`);
+   log(` ${CYAN}gsd-t headless --debug-loop${RESET} [--max-iterations=N] [--test-cmd=CMD] [--fix-scope=SCOPE] [--json] [--log]\n`);
+   log(`${BOLD}Debug-loop flags:${RESET}`);
+   log(` ${CYAN}--max-iterations=N${RESET} Hard ceiling on iterations (default: 20)`);
+   log(` ${CYAN}--test-cmd=CMD${RESET} Override test command`);
+   log(` ${CYAN}--fix-scope=SCOPE${RESET} Limit fix scope to specific files or test patterns`);
+   log(` ${CYAN}--json${RESET} Structured JSON output per iteration`);
+   log(` ${CYAN}--log${RESET} Write per-iteration logs to .gsd-t/\n`);
+   log(`${BOLD}Debug-loop escalation tiers:${RESET}`);
+   log(` Iterations 1-5: sonnet (standard debug)`);
+   log(` Iterations 6-15: opus (deeper reasoning)`);
+   log(` Iterations 16-20: STOP (exit code 4 — needs human)\n`);
+   log(`${BOLD}Debug-loop exit codes:${RESET}`);
+   log(` 0 all tests pass`);
+   log(` 1 max iterations reached`);
+   log(` 2 ledger compaction error`);
+   log(` 3 process error`);
+   log(` 4 escalation stop — needs human\n`);
    log(`${BOLD}Exec flags:${RESET}`);
    log(` ${CYAN}--json${RESET} Structured JSON output`);
    log(` ${CYAN}--timeout=N${RESET} Kill after N seconds (default: 300)`);
@@ -2304,6 +2558,10 @@ module.exports = {
    doHeadlessExec,
    doHeadlessQuery,
    doHeadless,
+   // Headless debug-loop
+   parseDebugLoopFlags,
+   getEscalationModel,
+   doHeadlessDebugLoop,
    queryStatus,
    queryDomains,
    queryContracts,
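The `parseTestResult` heuristic added to `gsd-t.js` reports a pass only when a pass pattern matches and no failure pattern (`fail…`, `error`, `not ok`) appears anywhere in the output. One consequence is that a summary line mentioning "0 failed" is classified as not passed. The sketch below reproduces the function from the diff so it can run on its own; the sample strings are made up for illustration and are not captured from a real `claude -p` session.

```javascript
// parseTestResult as it appears in the gsd-t.js diff, copied so the sketch is self-contained.
function parseTestResult(output) {
  const out = (output || "").toLowerCase();
  const passed =
    /\ball tests? pass(ed|ing)?\b/.test(out) ||
    /\ball \d+ tests? pass/.test(out) ||
    /\bno (test )?failures?\b/.test(out) ||
    /\btests? (all )?pass(ed)?\b/.test(out);
  const failed =
    /\bfail(ed|ing|ure)?\b/.test(out) ||
    /\berror\b/.test(out) ||
    /\bnot ok\b/.test(out);
  const summary = (output || "").slice(0, 500).replace(/\n/g, " ").trim();
  return { passed: passed && !failed, summary };
}

// Made-up sample outputs (assumptions, not real session transcripts).
const samples = [
  "All 12 tests pass",
  "All tests passed, 0 failed",
  "3 tests failed",
];
for (const s of samples) {
  console.log(JSON.stringify(s), "->", parseTestResult(s).passed);
}
```

With these inputs, only the first sample is classified as a pass. The second is treated as still failing because the word "failed" appears, so a loop driven by this heuristic could continue iterating even after the suite is green, depending on how the session phrases its report.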
package/commands/gsd-t-complete-milestone.md CHANGED
@@ -445,8 +445,9 @@ Verify the milestone is truly complete:
  c. If specs are missing or stale, invoke `gsd-t-test-sync` first.
  d. Report: "Unit: X/Y pass | E2E: X/Y pass"
  2. **Verify all pass**: Every test must pass. If any fail, fix before tagging (up to 2 attempts)
+ 3. **Functional test quality gate**: Read every Playwright spec. Verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded to input) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). Shallow tests that would pass on an empty HTML page with the right element IDs are a milestone completion FAIL. Flag and rewrite before proceeding.
  4. **Compare to baseline**: If a test baseline was recorded at milestone start, verify coverage has improved or at minimum not regressed
- 5. **Log test results**: Include test pass/fail counts in the milestone summary (Step 4)
+ 5. **Log test results**: Include test pass/fail counts and shallow test audit results in the milestone summary (Step 4)
 
  ## Step 11: Create Git Tag