npm - @tekyzinc/gsd-t - Versions diffs - 2.45.11 → 2.50.10 - Mend

@tekyzinc/gsd-t 2.45.11 → 2.50.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +23 -0
package/README.md +26 -5
package/bin/debug-ledger.js +193 -0
package/bin/gsd-t.js +259 -1
package/commands/gsd-t-complete-milestone.md +2 -1
package/commands/gsd-t-debug.md +48 -2
package/commands/gsd-t-doc-ripple.md +148 -0
package/commands/gsd-t-execute.md +102 -5
package/commands/gsd-t-help.md +25 -2
package/commands/gsd-t-integrate.md +41 -1
package/commands/gsd-t-qa.md +26 -5
package/commands/gsd-t-quick.md +39 -1
package/commands/gsd-t-test-sync.md +26 -1
package/commands/gsd-t-verify.md +8 -2
package/commands/gsd-t-wave.md +57 -0
package/docs/GSD-T-README.md +84 -1
package/docs/architecture.md +9 -1
package/docs/framework-comparison-scorecard.md +160 -0
package/docs/requirements.md +33 -0
package/examples/rules/desktop.ini +2 -0
package/package.json +2 -2
package/templates/CLAUDE-global.md +82 -4
package/templates/stacks/_security.md +243 -0
package/templates/stacks/desktop.ini +2 -0
package/templates/stacks/docker.md +202 -0
package/templates/stacks/firebase.md +166 -0
package/templates/stacks/flutter.md +205 -0
package/templates/stacks/github-actions.md +201 -0
package/templates/stacks/graphql.md +216 -0
package/templates/stacks/neo4j.md +218 -0
package/templates/stacks/nextjs.md +184 -0
package/templates/stacks/node-api.md +196 -0
package/templates/stacks/playwright.md +528 -0
package/templates/stacks/postgresql.md +225 -0
package/templates/stacks/python.md +243 -0
package/templates/stacks/react-native.md +216 -0
package/templates/stacks/react.md +293 -0
package/templates/stacks/redux.md +193 -0
package/templates/stacks/rest-api.md +202 -0
package/templates/stacks/supabase.md +188 -0
package/templates/stacks/tailwind.md +169 -0
package/templates/stacks/typescript.md +176 -0
package/templates/stacks/vite.md +176 -0
package/templates/stacks/vue.md +189 -0
package/templates/stacks/zustand.md +203 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,29 @@
 All notable changes to GSD-T are documented here. Updated with each release.
+## [2.50.10] - 2026-03-25
+### Added
+- **18 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright. Total: 22 stack rules (was 4).
+- **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
+- **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
+### Changed
+- Stack detection in execute, quick, and debug commands updated to cover all 22 stack files with conditional detection per project dependencies.
+- PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
+## [2.46.11] - 2026-03-24
+### Added
+- **M28: Doc-Ripple Subagent** — automated document ripple enforcement agent. Threshold check (7 FIRE/3 SKIP conditions), blast radius analysis, manifest generation, parallel document updates. New command: `gsd-t-doc-ripple`. 43 new tests. Wired into execute, integrate, quick, debug, wave.
+- **Orchestrator context self-check** — execute and wave orchestrators now check their own context utilization after every domain/phase. If >= 70%, saves progress and stops to prevent session breaks.
+- **Functional E2E test quality standard (REQ-050)** — Playwright specs must verify functional behavior, not just element existence. Shallow test audit added to qa, test-sync, verify, complete-milestone commands.
+- **Document Ripple Completion Gate (REQ-051)** — structural rule preventing "done" reports until all downstream documents are updated.
+### Changed
+- Command count: 50 → 51 (added `gsd-t-doc-ripple`)
+- Package description updated to include doc-ripple enforcement
 ## [2.39.12] - 2026-03-19
 ### Added

package/README.md CHANGED

@@ -3,6 +3,7 @@
 A methodology for reliable, parallelizable development using Claude Code with optional Agent Teams support.
 **Eliminates context rot** — task-level fresh dispatch (one subagent per task, ~10-20% context each) means compaction never triggers.
+**Compaction-proof debug loops** — `gsd-t headless --debug-loop` runs test-fix-retest cycles as separate `claude -p` sessions. A JSONL debug ledger persists all hypothesis/fix/learning history across fresh sessions. Anti-repetition preamble injection prevents retrying failed hypotheses. Escalation tiers (sonnet → opus → human) and a hard iteration ceiling enforced externally.
 **Safe parallel execution** — worktree isolation gives each domain agent its own filesystem; sequential atomic merges prevent conflicts.
 **Maintains test coverage** — automatically keeps tests aligned with code changes.
 **Catches downstream effects** — analyzes impact before changes break things.
@@ -11,6 +12,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
 **Generates visual scan reports** — every `/gsd-t-scan` produces a self-contained HTML report with 6 live architectural diagrams, a tech debt register, and domain health scores; optional DOCX/PDF export via `--export docx|pdf`.
 **Self-learning rule engine** — declarative rules in rules.jsonl detect failure patterns from task metrics. Candidate patches progress through a 5-stage lifecycle (candidate, applied, measured, promoted, graduated) with >55% improvement gates before becoming permanent methodology artifacts.
 **Cross-project learning** — proven rules propagate to `~/.claude/metrics/` and sync across all registered projects via `update-all`. Rules validated in 3+ projects become universal; 5+ projects qualify for npm distribution. Cross-project signal comparison and global ELO rankings available via `gsd-t-metrics --cross-project` and `gsd-t-status`.
+**Stack Rules Engine** — auto-detects project tech stack (React, TypeScript, Node API, Python, Go, Rust) from manifest files and injects mandatory best-practice rules into subagent prompts at execute-time. Universal security rules always apply; stack-specific rules layer on top. Extensible: drop a `.md` file in `templates/stacks/` to add a new stack.
 ---
@@ -22,7 +24,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
 npx @tekyzinc/gsd-t install
 ```
-This installs 45 GSD-T commands + 5 utility commands (50 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
+This installs 46 GSD-T commands + 5 utility commands (51 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
 ### Start Using It
@@ -83,8 +85,21 @@ npx @tekyzinc/gsd-t uninstall      # Remove commands (keeps project files)
 gsd-t headless verify --json --timeout=1200  # Run verify non-interactively
 gsd-t headless query status                  # Get project state (no LLM, <100ms)
 gsd-t headless query domains                 # List domains (no LLM)
+# Headless debug-loop (compaction-proof automated test-fix-retest)
+gsd-t headless --debug-loop                             # Auto-detect test cmd, up to 20 iterations
+gsd-t headless --debug-loop --max-iterations=10         # Cap at 10 iterations
+gsd-t headless --debug-loop --test-cmd="npm test"       # Override test command
+gsd-t headless --debug-loop --fix-scope="src/auth/**"   # Limit fix scope
+gsd-t headless --debug-loop --json --log                # Structured output + per-iteration logs
 ```
+Each iteration runs as a fresh `claude -p` session. A cumulative debug ledger (`.gsd-t/debug-state.jsonl`) preserves hypothesis/fix/learning history across sessions. An anti-repetition preamble prevents retrying failed approaches.
+**Escalation tiers**: sonnet (iterations 1–5) → opus (6–15) → STOP with diagnostic summary (16–20)
+**Exit codes**: `0` all tests pass · `1` max iterations reached · `2` compaction error · `3` process error · `4` needs human decision
 ### Updating
 When a new version is published:
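
Note on the exit codes listed in the hunk above: a CI wrapper could branch on them roughly as follows. This is a minimal sketch, not code from the package. `classifyDebugLoopExit` and the shape of its return value are hypothetical names introduced here for illustration; only the five code meanings come from the README text.

```javascript
// Exit-code meanings as listed in the README hunk above.
const DEBUG_LOOP_EXIT_CODES = new Map([
  [0, "all tests pass"],
  [1, "max iterations reached"],
  [2, "compaction error"],
  [3, "process error"],
  [4, "needs human decision"],
]);

// Hypothetical helper (not part of @tekyzinc/gsd-t): classify an exit status
// so a calling script can decide whether to continue, fail, or ask a person.
function classifyDebugLoopExit(code) {
  const meaning = DEBUG_LOOP_EXIT_CODES.get(code) || "unknown exit code";
  return { code, meaning, ok: code === 0, needsHuman: code === 4 };
}

console.log(classifyDebugLoopExit(4));
```

In practice the argument would likely be the `status` field returned by `child_process.spawnSync("gsd-t", ["headless", "--debug-loop", "--json"])`. That field is `null` when the process is killed by a signal, which this sketch reports as an unknown exit code.
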
@@ -141,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
 | `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
 | `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
 | `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
+| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
 | `/user:gsd-t-integrate` | Wire domains together | In wave |
 | `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
 | `/user:gsd-t-complete-milestone` | Archive + git tag (goal-backward gate required) | In wave |
@@ -314,13 +330,13 @@ get-stuff-done-teams/
 ├── LICENSE
 ├── bin/
 │   └── gsd-t.js                       # CLI installer
-├── commands/                          # 50 slash commands
-│   ├── gsd-t-*.md                     # 44 GSD-T workflow commands
+├── commands/                          # 51 slash commands
+│   ├── gsd-t-*.md                     # 45 GSD-T workflow commands
 │   ├── gsd.md                         # GSD-T smart router
 │   ├── branch.md                      # Git branch helper
 │   ├── checkin.md                     # Auto-version + commit/push helper
 │   └── Claude-md.md                   # Reload CLAUDE.md directives
-├── templates/                         # Document templates
+├── templates/                         # Document templates (9 base + stacks/)
 │   ├── CLAUDE-global.md
 │   ├── CLAUDE-project.md
 │   ├── requirements.md
@@ -329,7 +345,12 @@ get-stuff-done-teams/
 │   ├── infrastructure.md
 │   ├── progress.md
 │   ├── backlog.md
-│   └── backlog-settings.md
+│   ├── backlog-settings.md
+│   └── stacks/                        # Stack Rules Engine templates
+│       ├── _security.md               # Universal — always injected
+│       ├── react.md
+│       ├── typescript.md
+│       └── node-api.md
 ├── scripts/                           # Runtime utility scripts (installed to ~/.claude/scripts/)
 │   ├── gsd-t-tools.js                 # State CLI (get/set/validate/list)
 │   ├── gsd-t-statusline.js            # Context usage bar

package/bin/debug-ledger.js ADDED

@@ -0,0 +1,193 @@
+#!/usr/bin/env node
+/**
+ * GSD-T Debug Ledger — Persistent debug iteration store
+ *
+ * Reads and writes debug iteration records to .gsd-t/debug-state.jsonl.
+ * Supports compaction detection and ledger lifecycle management.
+ *
+ * Zero external dependencies (Node.js built-ins only).
+ */
+const fs = require("fs");
+const path = require("path");
+// ── Constants ─────────────────────────────────────────────────────────────────
+const COMPACTION_THRESHOLD = 51200; // 50KB
+const REQUIRED_FIELDS = [
+  "iteration", "timestamp", "test", "error",
+  "hypothesis", "fix", "fixFiles", "result",
+  "learning", "model", "duration",
+];
+const VALID_RESULTS = new Set(["PASS", "STILL_FAILS"]);
+// ── Exports ───────────────────────────────────────────────────────────────────
+module.exports = {
+  readLedger, appendEntry, getLedgerStats, clearLedger,
+  compactLedger, generateAntiRepetitionPreamble,
+};
+// ── readLedger ────────────────────────────────────────────────────────────────
+/**
+ * Read all entries from the debug ledger.
+ * @param {string} projectDir - Root directory of the project
+ * @returns {object[]} Array of parsed ledger entry objects
+ */
+function readLedger(projectDir) {
+  const fp = ledgerPath(projectDir);
+  if (!fs.existsSync(fp)) return [];
+  const content = fs.readFileSync(fp, "utf8").trim();
+  if (!content) return [];
+  return content.split("\n").map(safeParse).filter(Boolean);
+}
+// ── appendEntry ───────────────────────────────────────────────────────────────
+/**
+ * Validate and append one debug iteration entry to the ledger.
+ * Creates the file and parent directories if they do not exist.
+ * @param {string} projectDir - Root directory of the project
+ * @param {object} entry - Debug iteration record (see Required Fields)
+ * @throws {Error} If required fields are missing or invalid
+ */
+function appendEntry(projectDir, entry) {
+  const err = validateEntry(entry);
+  if (err) throw new Error(err);
+  const fp = ledgerPath(projectDir);
+  ensureDir(path.dirname(fp));
+  fs.appendFileSync(fp, JSON.stringify(entry) + "\n");
+}
+// ── getLedgerStats ────────────────────────────────────────────────────────────
+/**
+ * Return summary statistics for the current ledger.
+ * @param {string} projectDir - Root directory of the project
+ * @returns {{ entryCount: number, sizeBytes: number, needsCompaction: boolean, failedHypotheses: string[], passCount: number, failCount: number }}
+ */
+function getLedgerStats(projectDir) {
+  const fp = ledgerPath(projectDir);
+  const entries = readLedger(projectDir);
+  const sizeBytes = fs.existsSync(fp) ? fs.statSync(fp).size : 0;
+  const failedHypotheses = entries
+    .filter((e) => e.result === "STILL_FAILS" && e.hypothesis)
+    .map((e) => e.hypothesis);
+  const passCount = entries.filter((e) => e.result === "PASS").length;
+  const failCount = entries.filter((e) => e.result === "STILL_FAILS").length;
+  return {
+    entryCount: entries.length,
+    sizeBytes,
+    needsCompaction: sizeBytes > COMPACTION_THRESHOLD,
+    failedHypotheses,
+    passCount,
+    failCount,
+  };
+}
+// ── clearLedger ───────────────────────────────────────────────────────────────
+/**
+ * Delete the debug ledger file. Called when all tests pass.
+ * No-op if the file does not exist.
+ * @param {string} projectDir - Root directory of the project
+ */
+function clearLedger(projectDir) {
+  const fp = ledgerPath(projectDir);
+  if (fs.existsSync(fp)) fs.unlinkSync(fp);
+}
+// ── compactLedger ─────────────────────────────────────────────────────────────
+/**
+ * Compact the ledger by replacing all but the last 5 entries with a summary.
+ * @param {string} projectDir - Root directory of the project
+ * @param {string} summary - Summarization of compacted entries
+ */
+function compactLedger(projectDir, summary) {
+  const entries = readLedger(projectDir);
+  const tail = entries.slice(-5);
+  const compactedEntry = {
+    compacted: true,
+    learning: summary,
+    iteration: 0,
+    timestamp: new Date().toISOString(),
+    test: "compacted",
+    error: "see summary",
+    hypothesis: "compacted",
+    fix: "compacted",
+    fixFiles: [],
+    result: "compacted",
+    model: "haiku",
+    duration: 0,
+  };
+  const fp = ledgerPath(projectDir);
+  ensureDir(path.dirname(fp));
+  const lines = [compactedEntry, ...tail].map((e) => JSON.stringify(e)).join("\n") + "\n";
+  fs.writeFileSync(fp, lines);
+}
+// ── generateAntiRepetitionPreamble ────────────────────────────────────────────
+/**
+ * Build a preamble string listing failed hypotheses and the current narrowing
+ * direction. Injected into each claude -p session to prevent repeated attempts.
+ * @param {string} projectDir - Root directory of the project
+ * @returns {string} Formatted preamble, or empty string if ledger is empty
+ */
+function generateAntiRepetitionPreamble(projectDir) {
+  const entries = readLedger(projectDir);
+  if (!entries.length) return "";
+  const failed = entries.filter((e) => e.result === "STILL_FAILS");
+  const learnings = entries.filter((e) => e.learning && !e.compacted);
+  const lastLearning = learnings.length ? learnings[learnings.length - 1].learning : null;
+  const failLines = failed
+    .map((e, i) => `${i + 1}. [iteration ${e.iteration}] "${e.hypothesis}" — FAILED: ${e.error}`)
+    .join("\n");
+  const stillFailing = failed.map((e) => `- ${e.test}: ${e.error}`).join("\n");
+  const direction = lastLearning
+    ? `Based on ${entries.length} iterations, the evidence points to: ${lastLearning}`
+    : "No narrowing direction established yet.";
+  return [
+    "## Debug Ledger Context (DO NOT retry failed approaches)",
+    "",
+    "### Failed Hypotheses (DO NOT retry these):",
+    failLines || "(none yet)",
+    "",
+    "### Current Narrowing Direction:",
+    direction,
+    "",
+    "### Tests Still Failing:",
+    stillFailing || "(none recorded)",
+  ].join("\n");
+}
+// ── Internal helpers ──────────────────────────────────────────────────────────
+function ledgerPath(projectDir) {
+  return path.join(projectDir || process.cwd(), ".gsd-t", "debug-state.jsonl");
+}
+function ensureDir(dir) {
+  if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
+}
+function safeParse(line) {
+  try { return JSON.parse(line); } catch { return null; }
+}
+function validateEntry(entry) {
+  if (!entry || typeof entry !== "object") return "Entry must be an object";
+  for (const f of REQUIRED_FIELDS) {
+    if (entry[f] === undefined || entry[f] === null) return `Missing required field: ${f}`;
+  }
+  if (typeof entry.iteration !== "number") return "iteration must be a number";
+  if (typeof entry.duration !== "number") return "duration must be a number";
+  if (!Array.isArray(entry.fixFiles)) return "fixFiles must be an array";
+  if (!VALID_RESULTS.has(entry.result)) return `result must be "PASS" or "STILL_FAILS"`;
+  return null;
+}
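
Note on the ledger format added above: the file is plain JSONL, one entry per line, and unparseable lines are dropped on read. The sketch below re-implements that append/read round trip so it runs without the package. The temporary directory, the `readJsonl` helper name, and every value in `sampleEntry` are invented for illustration; only the eleven field names and the `.gsd-t/debug-state.jsonl` path come from the diff.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Illustrative only: a throwaway directory stands in for a real project root.
const projectDir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-demo-"));
const ledgerFile = path.join(projectDir, ".gsd-t", "debug-state.jsonl");
fs.mkdirSync(path.dirname(ledgerFile), { recursive: true });

// Sample entry carrying the eleven required fields; all values are made up.
const sampleEntry = {
  iteration: 1,
  timestamp: new Date().toISOString(),
  test: "npm test",
  error: "expected 200, got 401",
  hypothesis: "token expiry is compared in seconds instead of milliseconds",
  fix: "convert exp to milliseconds before comparing",
  fixFiles: ["src/auth/token.js"],
  result: "STILL_FAILS",
  learning: "expiry math is fine; the 401 comes from a missing header",
  model: "sonnet",
  duration: 42,
};

// One JSON object per line, appended to the end of the file.
fs.appendFileSync(ledgerFile, JSON.stringify(sampleEntry) + "\n");
// Simulate a corrupted line to show that the reader skips it.
fs.appendFileSync(ledgerFile, "{not valid json\n");

// Read the file back, silently dropping lines that fail to parse.
function readJsonl(file) {
  const content = fs.readFileSync(file, "utf8").trim();
  if (!content) return [];
  return content
    .split("\n")
    .map((line) => {
      try { return JSON.parse(line); } catch { return null; }
    })
    .filter(Boolean);
}

const entries = readJsonl(ledgerFile);
const failedHypotheses = entries
  .filter((e) => e.result === "STILL_FAILS")
  .map((e) => e.hypothesis);
console.log(entries.length, failedHypotheses);
```

One behaviour worth noting when reading the diff: `compactLedger` writes its summary entry directly with `result: "compacted"`, a value that `validateEntry` would reject if the same object went through `appendEntry`.
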

package/bin/gsd-t.js CHANGED

@@ -19,6 +19,7 @@ const fs = require("fs");
 const path = require("path");
 const os = require("os");
 const { execFileSync, spawn: cpSpawn } = require("child_process");
+const debugLedger = require(path.join(__dirname, "debug-ledger.js"));
 // ─── Configuration ───────────────────────────────────────────────────────────
@@ -2174,6 +2175,236 @@ function doHeadlessQuery(type) {
   process.stdout.write(JSON.stringify(result) + "\n");
 }
+/**
+ * Parse debug-loop flags from args array.
+ * Extracts --max-iterations, --test-cmd, --fix-scope, --json, --log from args.
+ */
+function parseDebugLoopFlags(args) {
+  const flags = { maxIterations: 20, testCmd: null, fixScope: null, json: false, log: false };
+  const positional = [];
+  for (const arg of args) {
+    if (arg.startsWith("--max-iterations=")) {
+      const n = parseInt(arg.slice("--max-iterations=".length), 10);
+      if (!isNaN(n) && n > 0) flags.maxIterations = n;
+    } else if (arg.startsWith("--test-cmd=")) {
+      flags.testCmd = arg.slice("--test-cmd=".length);
+    } else if (arg.startsWith("--fix-scope=")) {
+      flags.fixScope = arg.slice("--fix-scope=".length);
+    } else if (arg === "--json") {
+      flags.json = true;
+    } else if (arg === "--log") {
+      flags.log = true;
+    } else {
+      positional.push(arg);
+    }
+  }
+  return { flags, positional };
+}
+/**
+ * Return the escalation model for a given iteration number.
+ * Tiers: 1-5 → sonnet, 6-15 → opus, 16+ → null (stop)
+ */
+function getEscalationModel(iteration) {
+  if (iteration >= 1 && iteration <= 5) return "sonnet";
+  if (iteration >= 6 && iteration <= 15) return "opus";
+  return null;
+}
+/**
+ * Spawn a single `claude -p` session and return stdout as a string.
+ * Returns null if the process fails.
+ */
+function spawnClaudeSession(prompt, model) {
+  try {
+    return execFileSync("claude", ["-p", prompt, "--model", model], {
+      encoding: "utf8", timeout: 300000,
+      stdio: ["pipe", "pipe", "pipe"],
+    });
+  } catch (e) {
+    return (e.stdout || "") + (e.stderr || "") || null;
+  }
+}
+/**
+ * Parse test pass/fail from claude output.
+ * Returns { passed: bool, summary: string }.
+ */
+function parseTestResult(output) {
+  const out = (output || "").toLowerCase();
+  const passed =
+    /\ball tests? pass(ed|ing)?\b/.test(out) ||
+    /\ball \d+ tests? pass/.test(out) ||
+    /\bno (test )?failures?\b/.test(out) ||
+    /\btests? (all )?pass(ed)?\b/.test(out);
+  const failed =
+    /\bfail(ed|ing|ure)?\b/.test(out) ||
+    /\berror\b/.test(out) ||
+    /\bnot ok\b/.test(out);
+  const summary = (output || "").slice(0, 500).replace(/\n/g, " ").trim();
+  return { passed: passed && !failed, summary };
+}
+/**
+ * Run ledger compaction: spawn haiku to summarize, then compact.
+ */
+function runLedgerCompaction(projectDir, jsonMode) {
+  const entries = debugLedger.readLedger(projectDir);
+  const compactPrompt =
+    "Read this debug ledger. Produce a condensed summary of what has been tried, " +
+    "what failed, and what the evidence suggests. Be concise.\n\n" +
+    JSON.stringify(entries, null, 2);
+  let summary = "Compacted — see previous entries.";
+  try {
+    const out = execFileSync("claude", ["-p", compactPrompt, "--model", "haiku"], {
+      encoding: "utf8", timeout: 120000, stdio: ["pipe", "pipe", "pipe"],
+    });
+    summary = (out || "").trim() || summary;
+  } catch (e) {
+    if (!jsonMode) warn("Compaction haiku session failed — using default summary");
+  }
+  debugLedger.compactLedger(projectDir, summary);
+}
+/**
+ * Write a per-iteration log file under .gsd-t/.
+ */
+function writeIterationLog(projectDir, ts, iteration, entry, rawOutput) {
+  const logDir = path.join(projectDir, ".gsd-t");
+  if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
+  const fname = `headless-debug-${ts}-iter-${iteration}.log`;
+  const content = [
+    `Iteration: ${iteration}`,
+    `Timestamp: ${entry.timestamp}`,
+    `Model: ${entry.model}`,
+    `Result: ${entry.result}`,
+    `Fix: ${entry.fix}`,
+    `Learning: ${entry.learning}`,
+    `---`,
+    rawOutput || "",
+  ].join("\n");
+  fs.writeFileSync(path.join(logDir, fname), content);
+}
+/**
+ * Full debug-loop: validate flags, check claude CLI, run iteration cycle.
+ */
+function doHeadlessDebugLoop(flags) {
+  const opts = flags || {};
+  const jsonMode = opts.json || false;
+  const projectDir = process.cwd();
+  if (opts.maxIterations < 1) {
+    const msg = "--max-iterations must be >= 1";
+    if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
+    else error(msg);
+    process.exit(3);
+  }
+  try {
+    execFileSync("claude", ["--version"], { encoding: "utf8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] });
+  } catch {
+    const msg = "claude CLI not found. Install with: npm install -g @anthropic-ai/claude-code";
+    if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
+    else error(msg);
+    process.exit(3);
+  }
+  if (!jsonMode) {
+    heading("GSD-T Headless — Debug Loop");
+    info(`Max iterations: ${opts.maxIterations}`);
+    if (opts.testCmd) info(`Test command: ${opts.testCmd}`);
+    if (opts.fixScope) info(`Fix scope: ${opts.fixScope}`);
+    if (opts.log) info(`Logging: enabled`);
+    log("");
+  }
+  const ts = Date.now();
+  for (let iteration = 1; iteration <= opts.maxIterations; iteration++) {
+    const model = getEscalationModel(iteration);
+    // STOP tier: escalation stop
+    if (model === null) {
+      const entries = debugLedger.readLedger(projectDir);
+      const stats = debugLedger.getLedgerStats(projectDir);
+      const diagMsg = `ESCALATION STOP at iteration ${iteration}. ` +
+        `Entries: ${stats.entryCount}, Failures: ${stats.failCount}. ` +
+        `Failed hypotheses:\n${stats.failedHypotheses.map((h, i) => `  ${i + 1}. ${h}`).join("\n")}`;
+      if (jsonMode) {
+        process.stdout.write(JSON.stringify({ success: false, exitCode: 4, iteration, diagnostic: diagMsg, entries }) + "\n");
+      } else {
+        log("");
+        warn(diagMsg);
+      }
+      process.exit(4);
+    }
+    // Check compaction
+    const stats = debugLedger.getLedgerStats(projectDir);
+    if (stats.needsCompaction) {
+      if (!jsonMode) info("Ledger compaction triggered...");
+      try { runLedgerCompaction(projectDir, jsonMode); }
+      catch { process.exit(2); }
+    }
+    // Generate preamble and build prompt
+    const preamble = debugLedger.generateAntiRepetitionPreamble(projectDir);
+    const scopeHint = opts.fixScope ? `\nFix scope: ${opts.fixScope}` : "";
+    const testHint = opts.testCmd ? `\nRun tests with: ${opts.testCmd}` : "";
+    const prompt = [preamble, `Fix the failing test(s). Write your fix, then run the test suite. Report results.${scopeHint}${testHint}`]
+      .filter(Boolean).join("\n\n");
+    if (!jsonMode) info(`Iteration ${iteration}/${opts.maxIterations} [${model}]...`);
+    const iterStart = Date.now();
+    let rawOutput = null;
+    try { rawOutput = spawnClaudeSession(prompt, model); }
+    catch (e) {
+      if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, iteration, error: String(e) }) + "\n");
+      else error(`Process error at iteration ${iteration}: ${e.message}`);
+      process.exit(3);
+    }
+    const duration = Math.round((Date.now() - iterStart) / 1000);
+    const { passed, summary } = parseTestResult(rawOutput);
+    const result = passed ? "PASS" : "STILL_FAILS";
+    // Extract fix description from output (first 200 chars of output)
+    const fixDesc = (rawOutput || "").split("\n").find((l) => l.trim().length > 20) || "see output";
+    const entry = {
+      iteration, timestamp: new Date().toISOString(),
+      test: opts.testCmd || "unspecified", error: passed ? "" : summary,
+      hypothesis: `iteration-${iteration}`, fix: fixDesc.trim().slice(0, 200),
+      fixFiles: [], result, learning: summary.slice(0, 300),
+      model, duration,
+    };
+    try { debugLedger.appendEntry(projectDir, entry); }
+    catch (e) {
+      if (!jsonMode) warn(`Failed to append ledger entry: ${e.message}`);
+    }
+    if (opts.log) writeIterationLog(projectDir, ts, iteration, entry, rawOutput);
+    if (jsonMode) {
+      process.stdout.write(JSON.stringify({ success: passed, exitCode: passed ? 0 : 1, iteration, result, model, duration, summary }) + "\n");
+    } else {
+      info(`  Result: ${result}`);
+    }
+    if (passed) {
+      debugLedger.clearLedger(projectDir);
+      if (!jsonMode) log(`\n${GREEN}All tests pass — debug loop complete.${RESET}`);
+      process.exit(0);
+    }
+  }
+  // Max iterations reached
+  if (!jsonMode) warn(`Max iterations (${opts.maxIterations}) reached without all tests passing.`);
+  process.exit(1);
+}
 function doHeadless(args) {
   const sub = args[0];
   if (!sub || sub === "--help" || sub === "-h") {
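
Note on how the escalation tiers interact with `--max-iterations` in the hunk above: the loop checks the tier before it checks the ceiling, so a run that never passes appears to end with exit code 4 once it reaches iteration 16, and with exit code 1 only when the ceiling is 15 or lower. The sketch below re-declares `getEscalationModel` with the same boundaries so it runs on its own. `predictUnresolvedExit` is a hypothetical helper written for this note and models only the control flow visible in this diff.

```javascript
// Same tier boundaries as getEscalationModel in the hunk above, re-declared
// here so the sketch is self-contained.
function getEscalationModel(iteration) {
  if (iteration >= 1 && iteration <= 5) return "sonnet";
  if (iteration >= 6 && iteration <= 15) return "opus";
  return null; // STOP tier
}

// Hypothetical helper: predict how a run that never passes would end
// for a given --max-iterations value.
function predictUnresolvedExit(maxIterations) {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    if (getEscalationModel(iteration) === null) {
      // The STOP tier exits before spawning a session for this iteration.
      return { exitCode: 4, sessionsRun: iteration - 1 };
    }
  }
  return { exitCode: 1, sessionsRun: maxIterations };
}

console.log(predictUnresolvedExit(10)); // { exitCode: 1, sessionsRun: 10 }
console.log(predictUnresolvedExit(20)); // { exitCode: 4, sessionsRun: 15 }
```
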
@@ -2181,6 +2412,12 @@ function doHeadless(args) {
     return;
   }
+  if (sub === "--debug-loop") {
+    const { flags } = parseDebugLoopFlags(args.slice(1));
+    doHeadlessDebugLoop(flags);
+    return;
+  }
   if (sub === "query") {
     const type = args[1];
     doHeadlessQuery(type);
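
Note on the pass/fail heuristics in `parseTestResult` (added earlier in this file's diff): the result is `passed && !failed`, so any output containing the standalone words "fail", "failed", "failing", "failure", "error", or "not ok" is treated as a failing run, even when it also reports success. The sketch below copies the regular expressions from the diff and returns only the boolean. The sample strings are invented to show where the heuristic is strict.

```javascript
// Regular expressions copied from parseTestResult in the diff; this reduced
// version returns only the boolean so the behaviour is easy to inspect.
function looksLikeAllTestsPassed(output) {
  const out = (output || "").toLowerCase();
  const passed =
    /\ball tests? pass(ed|ing)?\b/.test(out) ||
    /\ball \d+ tests? pass/.test(out) ||
    /\bno (test )?failures?\b/.test(out) ||
    /\btests? (all )?pass(ed)?\b/.test(out);
  const failed =
    /\bfail(ed|ing|ure)?\b/.test(out) ||
    /\berror\b/.test(out) ||
    /\bnot ok\b/.test(out);
  return passed && !failed;
}

// Invented sample outputs.
console.log(looksLikeAllTestsPassed("All 12 tests passed"));        // true
console.log(looksLikeAllTestsPassed("All tests passed, 0 failed")); // false
console.log(
  looksLikeAllTestsPassed("All tests pass after fixing the error in auth.js")
); // false
```

The second and third samples describe successful runs but would be recorded as `STILL_FAILS`, because the summary mentions "failed" or "error". A stricter design might key off the test command's exit status instead. That is a suggestion, not something the diff shows.
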
@@ -2196,7 +2433,24 @@ function showHeadlessHelp() {
   log(`\n${BOLD}GSD-T Headless Mode${RESET}\n`);
   log(`${BOLD}Usage:${RESET}`);
   log(`  ${CYAN}gsd-t headless${RESET} <command> [args] [--json] [--timeout=N] [--log]`);
-  log(`  ${CYAN}gsd-t headless query${RESET} <type>\n`);
+  log(`  ${CYAN}gsd-t headless query${RESET} <type>`);
+  log(`  ${CYAN}gsd-t headless --debug-loop${RESET} [--max-iterations=N] [--test-cmd=CMD] [--fix-scope=SCOPE] [--json] [--log]\n`);
+  log(`${BOLD}Debug-loop flags:${RESET}`);
+  log(`  ${CYAN}--max-iterations=N${RESET}  Hard ceiling on iterations (default: 20)`);
+  log(`  ${CYAN}--test-cmd=CMD${RESET}      Override test command`);
+  log(`  ${CYAN}--fix-scope=SCOPE${RESET}   Limit fix scope to specific files or test patterns`);
+  log(`  ${CYAN}--json${RESET}              Structured JSON output per iteration`);
+  log(`  ${CYAN}--log${RESET}               Write per-iteration logs to .gsd-t/\n`);
+  log(`${BOLD}Debug-loop escalation tiers:${RESET}`);
+  log(`  Iterations 1-5:   sonnet  (standard debug)`);
+  log(`  Iterations 6-15:  opus    (deeper reasoning)`);
+  log(`  Iterations 16-20: STOP    (exit code 4 — needs human)\n`);
+  log(`${BOLD}Debug-loop exit codes:${RESET}`);
+  log(`  0  all tests pass`);
+  log(`  1  max iterations reached`);
+  log(`  2  ledger compaction error`);
+  log(`  3  process error`);
+  log(`  4  escalation stop — needs human\n`);
   log(`${BOLD}Exec flags:${RESET}`);
   log(`  ${CYAN}--json${RESET}        Structured JSON output`);
   log(`  ${CYAN}--timeout=N${RESET}   Kill after N seconds (default: 300)`);
@@ -2304,6 +2558,10 @@ module.exports = {
   doHeadlessExec,
   doHeadlessQuery,
   doHeadless,
+  // Headless debug-loop
+  parseDebugLoopFlags,
+  getEscalationModel,
+  doHeadlessDebugLoop,
   queryStatus,
   queryDomains,
   queryContracts,

package/commands/gsd-t-complete-milestone.md CHANGED

@@ -445,8 +445,9 @@ Verify the milestone is truly complete:
    c. If specs are missing or stale, invoke `gsd-t-test-sync` first.
    d. Report: "Unit: X/Y pass | E2E: X/Y pass"
 2. **Verify all pass**: Every test must pass. If any fail, fix before tagging (up to 2 attempts)
+3. **Functional test quality gate**: Read every Playwright spec. Verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded to input) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). Shallow tests that would pass on an empty HTML page with the right element IDs are a milestone completion FAIL. Flag and rewrite before proceeding.
 4. **Compare to baseline**: If a test baseline was recorded at milestone start, verify coverage has improved or at minimum not regressed
-5. **Log test results**: Include test pass/fail counts in the milestone summary (Step 4)
+5. **Log test results**: Include test pass/fail counts and shallow test audit results in the milestone summary (Step 4)
 ## Step 11: Create Git Tag
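
Note on the functional test quality gate added above: the command file asks the agent to read every Playwright spec, so the package itself does not appear to ship a scanner. As a rough illustration of what "shallow" means, the sketch below flags a spec whose only assertions are existence-style matchers. `auditSpecSource`, the `EXISTENCE_ONLY` set, and both sample specs are hypothetical. Treating `toBeVisible` as the assertion counterpart of the `isVisible` check named in the gate is an assumption.

```javascript
// Matchers treated as existence-only checks. toBeAttached and toBeEnabled are
// named in the gate text; toBeVisible stands in for isVisible (assumption).
const EXISTENCE_ONLY = new Set(["toBeVisible", "toBeAttached", "toBeEnabled"]);

// Hypothetical heuristic: collect assertion matcher names by text matching
// and call the spec shallow when none of them checks behaviour or content.
function auditSpecSource(source) {
  const pattern = /\.(to(?:Be|Have|Contain|Equal|Match)\w*)\s*\(/g;
  const matchers = [...source.matchAll(pattern)].map((m) => m[1]);
  const functional = matchers.filter((name) => !EXISTENCE_ONLY.has(name));
  return { matchers, shallow: matchers.length > 0 && functional.length === 0 };
}

// Invented sample specs, held as strings so no Playwright install is needed.
const shallowSpec = `
test("dashboard renders", async ({ page }) => {
  await page.goto("/dashboard");
  await expect(page.locator("#chart")).toBeVisible();
  await expect(page.locator("#save")).toBeEnabled();
});`;

const functionalSpec = `
test("saving updates the title", async ({ page }) => {
  await page.goto("/dashboard");
  await page.fill("#title", "Q3 report");
  await page.click("#save");
  await expect(page.locator("h1")).toHaveText("Q3 report");
});`;

console.log(auditSpecSource(shallowSpec).shallow);    // true
console.log(auditSpecSource(functionalSpec).shallow); // false
```

Text matching of this kind can only pre-filter candidates. It cannot tell whether a `toHaveText` assertion actually follows a user action, which is the judgment the gate asks for.
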