@tekyzinc/gsd-t 2.45.11 → 2.50.10
- package/CHANGELOG.md +23 -0
- package/README.md +26 -5
- package/bin/debug-ledger.js +193 -0
- package/bin/gsd-t.js +259 -1
- package/commands/gsd-t-complete-milestone.md +2 -1
- package/commands/gsd-t-debug.md +48 -2
- package/commands/gsd-t-doc-ripple.md +148 -0
- package/commands/gsd-t-execute.md +102 -5
- package/commands/gsd-t-help.md +25 -2
- package/commands/gsd-t-integrate.md +41 -1
- package/commands/gsd-t-qa.md +26 -5
- package/commands/gsd-t-quick.md +39 -1
- package/commands/gsd-t-test-sync.md +26 -1
- package/commands/gsd-t-verify.md +8 -2
- package/commands/gsd-t-wave.md +57 -0
- package/docs/GSD-T-README.md +84 -1
- package/docs/architecture.md +9 -1
- package/docs/framework-comparison-scorecard.md +160 -0
- package/docs/requirements.md +33 -0
- package/examples/rules/desktop.ini +2 -0
- package/package.json +2 -2
- package/templates/CLAUDE-global.md +82 -4
- package/templates/stacks/_security.md +243 -0
- package/templates/stacks/desktop.ini +2 -0
- package/templates/stacks/docker.md +202 -0
- package/templates/stacks/firebase.md +166 -0
- package/templates/stacks/flutter.md +205 -0
- package/templates/stacks/github-actions.md +201 -0
- package/templates/stacks/graphql.md +216 -0
- package/templates/stacks/neo4j.md +218 -0
- package/templates/stacks/nextjs.md +184 -0
- package/templates/stacks/node-api.md +196 -0
- package/templates/stacks/playwright.md +528 -0
- package/templates/stacks/postgresql.md +225 -0
- package/templates/stacks/python.md +243 -0
- package/templates/stacks/react-native.md +216 -0
- package/templates/stacks/react.md +293 -0
- package/templates/stacks/redux.md +193 -0
- package/templates/stacks/rest-api.md +202 -0
- package/templates/stacks/supabase.md +188 -0
- package/templates/stacks/tailwind.md +169 -0
- package/templates/stacks/typescript.md +176 -0
- package/templates/stacks/vite.md +176 -0
- package/templates/stacks/vue.md +189 -0
- package/templates/stacks/zustand.md +203 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,29 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to GSD-T are documented here. Updated with each release.
|
|
4
4
|
|
|
5
|
+
## [2.50.10] - 2026-03-25
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
- **18 new stack rule files** — python, flutter, tailwind, react-native, vite, nextjs, vue, docker, postgresql (with graph-in-SQL section), github-actions, rest-api, supabase, firebase, graphql, zustand, redux, neo4j, playwright. Total: 22 stack rules (was 4).
|
|
9
|
+
- **Playwright best practices** — coverage matrix per feature, pairwise combinatorial testing, state transition testing, multi-step workflow testing, Page Object Model, API mocking patterns. Enforces rigorous test depth across permutations.
|
|
10
|
+
- **react.md expanded** — added state management decision table, form management (react-hook-form + zod), React naming conventions (3 new sections from external best practices review).
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
- Stack detection in execute, quick, and debug commands updated to cover all 22 stack files with conditional detection per project dependencies.
|
|
14
|
+
- PostgreSQL graph-in-SQL patterns (adjacency lists, junction tables, recursive CTEs) added to postgresql.md based on real project analysis.
|
|
15
|
+
|
|
16
|
+
## [2.46.11] - 2026-03-24
|
|
17
|
+
|
|
18
|
+
### Added
|
|
19
|
+
- **M28: Doc-Ripple Subagent** — automated document ripple enforcement agent. Threshold check (7 FIRE/3 SKIP conditions), blast radius analysis, manifest generation, parallel document updates. New command: `gsd-t-doc-ripple`. 43 new tests. Wired into execute, integrate, quick, debug, wave.
|
|
20
|
+
- **Orchestrator context self-check** — execute and wave orchestrators now check their own context utilization after every domain/phase. If >= 70%, saves progress and stops to prevent session breaks.
|
|
21
|
+
- **Functional E2E test quality standard (REQ-050)** — Playwright specs must verify functional behavior, not just element existence. Shallow test audit added to qa, test-sync, verify, complete-milestone commands.
|
|
22
|
+
- **Document Ripple Completion Gate (REQ-051)** — structural rule preventing "done" reports until all downstream documents are updated.
|
|
23
|
+
|
|
24
|
+
### Changed
|
|
25
|
+
- Command count: 50 → 51 (added `gsd-t-doc-ripple`)
|
|
26
|
+
- Package description updated to include doc-ripple enforcement
|
|
27
|
+
|
|
5
28
|
## [2.39.12] - 2026-03-19
|
|
6
29
|
|
|
7
30
|
### Added
|
package/README.md
CHANGED
|
@@ -3,6 +3,7 @@
|
|
|
3
3
|
A methodology for reliable, parallelizable development using Claude Code with optional Agent Teams support.
|
|
4
4
|
|
|
5
5
|
**Eliminates context rot** — task-level fresh dispatch (one subagent per task, ~10-20% context each) means compaction never triggers.
|
|
6
|
+
**Compaction-proof debug loops** — `gsd-t headless --debug-loop` runs test-fix-retest cycles as separate `claude -p` sessions. A JSONL debug ledger persists all hypothesis/fix/learning history across fresh sessions. Anti-repetition preamble injection prevents retrying failed hypotheses. Escalation tiers (sonnet → opus → human) and a hard iteration ceiling enforced externally.
|
|
6
7
|
**Safe parallel execution** — worktree isolation gives each domain agent its own filesystem; sequential atomic merges prevent conflicts.
|
|
7
8
|
**Maintains test coverage** — automatically keeps tests aligned with code changes.
|
|
8
9
|
**Catches downstream effects** — analyzes impact before changes break things.
|
|
@@ -11,6 +12,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
|
|
|
11
12
|
**Generates visual scan reports** — every `/gsd-t-scan` produces a self-contained HTML report with 6 live architectural diagrams, a tech debt register, and domain health scores; optional DOCX/PDF export via `--export docx|pdf`.
|
|
12
13
|
**Self-learning rule engine** — declarative rules in rules.jsonl detect failure patterns from task metrics. Candidate patches progress through a 5-stage lifecycle (candidate, applied, measured, promoted, graduated) with >55% improvement gates before becoming permanent methodology artifacts.
|
|
13
14
|
**Cross-project learning** — proven rules propagate to `~/.claude/metrics/` and sync across all registered projects via `update-all`. Rules validated in 3+ projects become universal; 5+ projects qualify for npm distribution. Cross-project signal comparison and global ELO rankings available via `gsd-t-metrics --cross-project` and `gsd-t-status`.
|
|
15
|
+
**Stack Rules Engine** — auto-detects project tech stack (React, TypeScript, Node API, Python, Go, Rust) from manifest files and injects mandatory best-practice rules into subagent prompts at execute-time. Universal security rules always apply; stack-specific rules layer on top. Extensible: drop a `.md` file in `templates/stacks/` to add a new stack.
|
|
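The Stack Rules Engine paragraph above describes detection from manifest files with universal security rules always applied. A minimal sketch of that layering follows, assuming a `package.json`-style manifest. The dependency-to-file table and the `detectStackRules` name are hypothetical and are not taken from the package.

```javascript
// Hypothetical sketch: map manifest dependencies to stack rule files.
// The table below is an assumption for illustration, not the shipped detection logic.
const STACK_BY_DEPENDENCY = {
  react: "react.md",
  typescript: "typescript.md",
  express: "node-api.md",
};

function detectStackRules(manifest) {
  const deps = { ...(manifest.dependencies || {}), ...(manifest.devDependencies || {}) };
  // Universal security rules always apply; stack-specific rules layer on top.
  const rules = ["_security.md"];
  for (const [dep, file] of Object.entries(STACK_BY_DEPENDENCY)) {
    const present = Object.prototype.hasOwnProperty.call(deps, dep);
    if (present && !rules.includes(file)) rules.push(file);
  }
  return rules;
}

const example = detectStackRules({
  dependencies: { react: "^18.0.0" },
  devDependencies: { typescript: "^5.0.0" },
});
console.log(example); // [ '_security.md', 'react.md', 'typescript.md' ]
```

Keeping `_security.md` first in the list is one way to make the "always injected" rule visible to whatever assembles the subagent prompt.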
14
16
|
|
|
15
17
|
---
|
|
16
18
|
|
|
@@ -22,7 +24,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
|
|
|
22
24
|
npx @tekyzinc/gsd-t install
|
|
23
25
|
```
|
|
24
26
|
|
|
25
|
-
This installs
|
|
27
|
+
This installs 46 GSD-T commands + 5 utility commands (51 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
|
|
26
28
|
|
|
27
29
|
### Start Using It
|
|
28
30
|
|
|
@@ -83,8 +85,21 @@ npx @tekyzinc/gsd-t uninstall # Remove commands (keeps project files)
|
|
|
83
85
|
gsd-t headless verify --json --timeout=1200 # Run verify non-interactively
|
|
84
86
|
gsd-t headless query status # Get project state (no LLM, <100ms)
|
|
85
87
|
gsd-t headless query domains # List domains (no LLM)
|
|
88
|
+
|
|
89
|
+
# Headless debug-loop (compaction-proof automated test-fix-retest)
|
|
90
|
+
gsd-t headless --debug-loop # Auto-detect test cmd, up to 20 iterations
|
|
91
|
+
gsd-t headless --debug-loop --max-iterations=10 # Cap at 10 iterations
|
|
92
|
+
gsd-t headless --debug-loop --test-cmd="npm test" # Override test command
|
|
93
|
+
gsd-t headless --debug-loop --fix-scope="src/auth/**" # Limit fix scope
|
|
94
|
+
gsd-t headless --debug-loop --json --log # Structured output + per-iteration logs
|
|
86
95
|
```
|
|
87
96
|
|
|
97
|
+
Each iteration runs as a fresh `claude -p` session. A cumulative debug ledger (`.gsd-t/debug-state.jsonl`) preserves hypothesis/fix/learning history across sessions. An anti-repetition preamble prevents retrying failed approaches.
|
|
98
|
+
|
|
99
|
+
**Escalation tiers**: sonnet (iterations 1–5) → opus (6–15) → STOP with diagnostic summary (16–20)
|
|
100
|
+
|
|
101
|
+
**Exit codes**: `0` all tests pass · `1` max iterations reached · `2` compaction error · `3` process error · `4` needs human decision
|
|
102
|
+
|
|
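The exit codes listed above lend themselves to a small lookup in a CI wrapper. The sketch below only interprets a numeric status; the table values come from the README text, while the `describeDebugLoopExit` helper is a hypothetical name introduced for illustration.

```javascript
// Exit-code meanings as documented in the README; the helper itself is hypothetical.
const DEBUG_LOOP_EXIT_CODES = {
  0: "all tests pass",
  1: "max iterations reached",
  2: "compaction error",
  3: "process error",
  4: "needs human decision",
};

function describeDebugLoopExit(code) {
  const meaning = DEBUG_LOOP_EXIT_CODES[code] || "unknown exit code";
  // Only 0 is success; 4 is singled out because it calls for a person, not a retry.
  return { code, meaning, ok: code === 0, needsHuman: code === 4 };
}

console.log(describeDebugLoopExit(4).meaning); // needs human decision
```

In a wrapper script, the `status` field returned by `child_process.spawnSync` could be passed straight to this helper.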
88
103
|
### Updating
|
|
89
104
|
|
|
90
105
|
When a new version is published:
|
|
@@ -141,6 +156,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
|
|
|
141
156
|
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
|
|
142
157
|
| `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
|
|
143
158
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
|
|
159
|
+
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
|
|
144
160
|
| `/user:gsd-t-integrate` | Wire domains together | In wave |
|
|
145
161
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
|
|
146
162
|
| `/user:gsd-t-complete-milestone` | Archive + git tag (goal-backward gate required) | In wave |
|
|
@@ -314,13 +330,13 @@ get-stuff-done-teams/
|
|
|
314
330
|
├── LICENSE
|
|
315
331
|
├── bin/
|
|
316
332
|
│ └── gsd-t.js # CLI installer
|
|
317
|
-
├── commands/ #
|
|
318
|
-
│ ├── gsd-t-*.md #
|
|
333
|
+
├── commands/ # 51 slash commands
|
|
334
|
+
│ ├── gsd-t-*.md # 45 GSD-T workflow commands
|
|
319
335
|
│ ├── gsd.md # GSD-T smart router
|
|
320
336
|
│ ├── branch.md # Git branch helper
|
|
321
337
|
│ ├── checkin.md # Auto-version + commit/push helper
|
|
322
338
|
│ └── Claude-md.md # Reload CLAUDE.md directives
|
|
323
|
-
├── templates/ # Document templates
|
|
339
|
+
├── templates/ # Document templates (9 base + stacks/)
|
|
324
340
|
│ ├── CLAUDE-global.md
|
|
325
341
|
│ ├── CLAUDE-project.md
|
|
326
342
|
│ ├── requirements.md
|
|
@@ -329,7 +345,12 @@ get-stuff-done-teams/
|
|
|
329
345
|
│ ├── infrastructure.md
|
|
330
346
|
│ ├── progress.md
|
|
331
347
|
│ ├── backlog.md
|
|
332
|
-
│
|
|
348
|
+
│ ├── backlog-settings.md
|
|
349
|
+
│ └── stacks/ # Stack Rules Engine templates
|
|
350
|
+
│ ├── _security.md # Universal — always injected
|
|
351
|
+
│ ├── react.md
|
|
352
|
+
│ ├── typescript.md
|
|
353
|
+
│ └── node-api.md
|
|
333
354
|
├── scripts/ # Runtime utility scripts (installed to ~/.claude/scripts/)
|
|
334
355
|
│ ├── gsd-t-tools.js # State CLI (get/set/validate/list)
|
|
335
356
|
│ ├── gsd-t-statusline.js # Context usage bar
|
|
package/bin/debug-ledger.js
ADDED
|
@@ -0,0 +1,193 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
|
|
3
|
+
/**
|
|
4
|
+
* GSD-T Debug Ledger — Persistent debug iteration store
|
|
5
|
+
*
|
|
6
|
+
* Reads and writes debug iteration records to .gsd-t/debug-state.jsonl.
|
|
7
|
+
* Supports compaction detection and ledger lifecycle management.
|
|
8
|
+
*
|
|
9
|
+
* Zero external dependencies (Node.js built-ins only).
|
|
10
|
+
*/
|
|
11
|
+
|
|
12
|
+
const fs = require("fs");
|
|
13
|
+
const path = require("path");
|
|
14
|
+
|
|
15
|
+
// ── Constants ─────────────────────────────────────────────────────────────────
|
|
16
|
+
|
|
17
|
+
const COMPACTION_THRESHOLD = 51200; // 50KB
|
|
18
|
+
|
|
19
|
+
const REQUIRED_FIELDS = [
|
|
20
|
+
"iteration", "timestamp", "test", "error",
|
|
21
|
+
"hypothesis", "fix", "fixFiles", "result",
|
|
22
|
+
"learning", "model", "duration",
|
|
23
|
+
];
|
|
24
|
+
|
|
25
|
+
const VALID_RESULTS = new Set(["PASS", "STILL_FAILS"]);
|
|
26
|
+
|
|
27
|
+
// ── Exports ───────────────────────────────────────────────────────────────────
|
|
28
|
+
|
|
29
|
+
module.exports = {
|
|
30
|
+
readLedger, appendEntry, getLedgerStats, clearLedger,
|
|
31
|
+
compactLedger, generateAntiRepetitionPreamble,
|
|
32
|
+
};
|
|
33
|
+
|
|
34
|
+
// ── readLedger ────────────────────────────────────────────────────────────────
|
|
35
|
+
|
|
36
|
+
/**
|
|
37
|
+
* Read all entries from the debug ledger.
|
|
38
|
+
* @param {string} projectDir - Root directory of the project
|
|
39
|
+
* @returns {object[]} Array of parsed ledger entry objects
|
|
40
|
+
*/
|
|
41
|
+
function readLedger(projectDir) {
|
|
42
|
+
const fp = ledgerPath(projectDir);
|
|
43
|
+
if (!fs.existsSync(fp)) return [];
|
|
44
|
+
const content = fs.readFileSync(fp, "utf8").trim();
|
|
45
|
+
if (!content) return [];
|
|
46
|
+
return content.split("\n").map(safeParse).filter(Boolean);
|
|
47
|
+
}
|
|
48
|
+
|
|
49
|
+
// ── appendEntry ───────────────────────────────────────────────────────────────
|
|
50
|
+
|
|
51
|
+
/**
|
|
52
|
+
* Validate and append one debug iteration entry to the ledger.
|
|
53
|
+
* Creates the file and parent directories if they do not exist.
|
|
54
|
+
* @param {string} projectDir - Root directory of the project
|
|
55
|
+
* @param {object} entry - Debug iteration record (see Required Fields)
|
|
56
|
+
* @throws {Error} If required fields are missing or invalid
|
|
57
|
+
*/
|
|
58
|
+
function appendEntry(projectDir, entry) {
|
|
59
|
+
const err = validateEntry(entry);
|
|
60
|
+
if (err) throw new Error(err);
|
|
61
|
+
const fp = ledgerPath(projectDir);
|
|
62
|
+
ensureDir(path.dirname(fp));
|
|
63
|
+
fs.appendFileSync(fp, JSON.stringify(entry) + "\n");
|
|
64
|
+
}
|
|
65
|
+
|
|
66
|
+
// ── getLedgerStats ────────────────────────────────────────────────────────────
|
|
67
|
+
|
|
68
|
+
/**
|
|
69
|
+
* Return summary statistics for the current ledger.
|
|
70
|
+
* @param {string} projectDir - Root directory of the project
|
|
71
|
+
* @returns {{ entryCount: number, sizeBytes: number, needsCompaction: boolean, failedHypotheses: string[], passCount: number, failCount: number }}
|
|
72
|
+
*/
|
|
73
|
+
function getLedgerStats(projectDir) {
|
|
74
|
+
const fp = ledgerPath(projectDir);
|
|
75
|
+
const entries = readLedger(projectDir);
|
|
76
|
+
const sizeBytes = fs.existsSync(fp) ? fs.statSync(fp).size : 0;
|
|
77
|
+
const failedHypotheses = entries
|
|
78
|
+
.filter((e) => e.result === "STILL_FAILS" && e.hypothesis)
|
|
79
|
+
.map((e) => e.hypothesis);
|
|
80
|
+
const passCount = entries.filter((e) => e.result === "PASS").length;
|
|
81
|
+
const failCount = entries.filter((e) => e.result === "STILL_FAILS").length;
|
|
82
|
+
return {
|
|
83
|
+
entryCount: entries.length,
|
|
84
|
+
sizeBytes,
|
|
85
|
+
needsCompaction: sizeBytes > COMPACTION_THRESHOLD,
|
|
86
|
+
failedHypotheses,
|
|
87
|
+
passCount,
|
|
88
|
+
failCount,
|
|
89
|
+
};
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
// ── clearLedger ───────────────────────────────────────────────────────────────
|
|
93
|
+
|
|
94
|
+
/**
|
|
95
|
+
* Delete the debug ledger file. Called when all tests pass.
|
|
96
|
+
* No-op if the file does not exist.
|
|
97
|
+
* @param {string} projectDir - Root directory of the project
|
|
98
|
+
*/
|
|
99
|
+
function clearLedger(projectDir) {
|
|
100
|
+
const fp = ledgerPath(projectDir);
|
|
101
|
+
if (fs.existsSync(fp)) fs.unlinkSync(fp);
|
|
102
|
+
}
|
|
103
|
+
|
|
104
|
+
// ── compactLedger ─────────────────────────────────────────────────────────────
|
|
105
|
+
|
|
106
|
+
/**
|
|
107
|
+
* Compact the ledger by replacing all but the last 5 entries with a summary.
|
|
108
|
+
* @param {string} projectDir - Root directory of the project
|
|
109
|
+
* @param {string} summary - Summarization of compacted entries
|
|
110
|
+
*/
|
|
111
|
+
function compactLedger(projectDir, summary) {
|
|
112
|
+
const entries = readLedger(projectDir);
|
|
113
|
+
const tail = entries.slice(-5);
|
|
114
|
+
const compactedEntry = {
|
|
115
|
+
compacted: true,
|
|
116
|
+
learning: summary,
|
|
117
|
+
iteration: 0,
|
|
118
|
+
timestamp: new Date().toISOString(),
|
|
119
|
+
test: "compacted",
|
|
120
|
+
error: "see summary",
|
|
121
|
+
hypothesis: "compacted",
|
|
122
|
+
fix: "compacted",
|
|
123
|
+
fixFiles: [],
|
|
124
|
+
result: "compacted",
|
|
125
|
+
model: "haiku",
|
|
126
|
+
duration: 0,
|
|
127
|
+
};
|
|
128
|
+
const fp = ledgerPath(projectDir);
|
|
129
|
+
ensureDir(path.dirname(fp));
|
|
130
|
+
const lines = [compactedEntry, ...tail].map((e) => JSON.stringify(e)).join("\n") + "\n";
|
|
131
|
+
fs.writeFileSync(fp, lines);
|
|
132
|
+
}
|
|
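The compaction step above keeps a summary record plus the last five entries. The in-memory sketch below mirrors that layout to show which iterations remain visible afterwards. `compactEntries` and the sample history are hypothetical and exist only for this illustration.

```javascript
// In-memory sketch of the compaction layout: summary record + last `keep` entries.
// `compactEntries` is a hypothetical helper, not an export of the package.
function compactEntries(entries, summary, keep = 5) {
  const summaryRecord = { compacted: true, iteration: 0, result: "compacted", learning: summary };
  return [summaryRecord, ...entries.slice(-keep)];
}

// Sample history: eight failed iterations, numbered 1..8.
const history = Array.from({ length: 8 }, (_, i) => ({
  iteration: i + 1,
  result: "STILL_FAILS",
  hypothesis: `iteration-${i + 1}`,
  learning: `note ${i + 1}`,
}));

const compacted = compactEntries(history, "iterations 1-3 ruled out the cache layer");
const stillListed = compacted
  .filter((e) => e.result === "STILL_FAILS")
  .map((e) => e.iteration);

console.log(compacted.length); // 6
console.log(stillListed); // [ 4, 5, 6, 7, 8 ]
```

Because the summary record carries `result: "compacted"`, any filter that selects `STILL_FAILS` entries skips it. A consumer that wants the condensed history has to read the `learning` field of the record flagged `compacted`.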
133
|
+
|
|
134
|
+
// ── generateAntiRepetitionPreamble ────────────────────────────────────────────
|
|
135
|
+
|
|
136
|
+
/**
|
|
137
|
+
* Build a preamble string listing failed hypotheses and the current narrowing
|
|
138
|
+
* direction. Injected into each claude -p session to prevent repeated attempts.
|
|
139
|
+
* @param {string} projectDir - Root directory of the project
|
|
140
|
+
* @returns {string} Formatted preamble, or empty string if ledger is empty
|
|
141
|
+
*/
|
|
142
|
+
function generateAntiRepetitionPreamble(projectDir) {
|
|
143
|
+
const entries = readLedger(projectDir);
|
|
144
|
+
if (!entries.length) return "";
|
|
145
|
+
const failed = entries.filter((e) => e.result === "STILL_FAILS");
|
|
146
|
+
const learnings = entries.filter((e) => e.learning && !e.compacted);
|
|
147
|
+
const lastLearning = learnings.length ? learnings[learnings.length - 1].learning : null;
|
|
148
|
+
const failLines = failed
|
|
149
|
+
.map((e, i) => `${i + 1}. [iteration ${e.iteration}] "${e.hypothesis}" — FAILED: ${e.error}`)
|
|
150
|
+
.join("\n");
|
|
151
|
+
const stillFailing = failed.map((e) => `- ${e.test}: ${e.error}`).join("\n");
|
|
152
|
+
const direction = lastLearning
|
|
153
|
+
? `Based on ${entries.length} iterations, the evidence points to: ${lastLearning}`
|
|
154
|
+
: "No narrowing direction established yet.";
|
|
155
|
+
return [
|
|
156
|
+
"## Debug Ledger Context (DO NOT retry failed approaches)",
|
|
157
|
+
"",
|
|
158
|
+
"### Failed Hypotheses (DO NOT retry these):",
|
|
159
|
+
failLines || "(none yet)",
|
|
160
|
+
"",
|
|
161
|
+
"### Current Narrowing Direction:",
|
|
162
|
+
direction,
|
|
163
|
+
"",
|
|
164
|
+
"### Tests Still Failing:",
|
|
165
|
+
stillFailing || "(none recorded)",
|
|
166
|
+
].join("\n");
|
|
167
|
+
}
|
|
168
|
+
|
|
169
|
+
// ── Internal helpers ──────────────────────────────────────────────────────────
|
|
170
|
+
|
|
171
|
+
function ledgerPath(projectDir) {
|
|
172
|
+
return path.join(projectDir || process.cwd(), ".gsd-t", "debug-state.jsonl");
|
|
173
|
+
}
|
|
174
|
+
|
|
175
|
+
function ensureDir(dir) {
|
|
176
|
+
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
|
|
177
|
+
}
|
|
178
|
+
|
|
179
|
+
function safeParse(line) {
|
|
180
|
+
try { return JSON.parse(line); } catch { return null; }
|
|
181
|
+
}
|
|
182
|
+
|
|
183
|
+
function validateEntry(entry) {
|
|
184
|
+
if (!entry || typeof entry !== "object") return "Entry must be an object";
|
|
185
|
+
for (const f of REQUIRED_FIELDS) {
|
|
186
|
+
if (entry[f] === undefined || entry[f] === null) return `Missing required field: ${f}`;
|
|
187
|
+
}
|
|
188
|
+
if (typeof entry.iteration !== "number") return "iteration must be a number";
|
|
189
|
+
if (typeof entry.duration !== "number") return "duration must be a number";
|
|
190
|
+
if (!Array.isArray(entry.fixFiles)) return "fixFiles must be an array";
|
|
191
|
+
if (!VALID_RESULTS.has(entry.result)) return `result must be "PASS" or "STILL_FAILS"`;
|
|
192
|
+
return null;
|
|
193
|
+
}
|
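The ledger module above stores one JSON object per line and skips lines that fail to parse. A stand-alone sketch of that JSONL pattern follows, written against a temporary directory so it can be run without the package. The `appendJsonl` and `readJsonl` names are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-alone demo of the JSONL pattern: one JSON object per line,
// with unparseable lines skipped on read.
function appendJsonl(file, record) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(record) + "\n");
}

function readJsonl(file) {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .flatMap((line) => {
      try {
        return [JSON.parse(line)];
      } catch {
        return []; // tolerate a truncated or corrupt line
      }
    });
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-demo-"));
const file = path.join(dir, ".gsd-t", "debug-state.jsonl");
appendJsonl(file, { iteration: 1, result: "STILL_FAILS" });
fs.appendFileSync(file, "{not json\n"); // simulate a partial write
appendJsonl(file, { iteration: 2, result: "PASS" });

const records = readJsonl(file);
console.log(records.map((r) => r.result)); // [ 'STILL_FAILS', 'PASS' ]
fs.rmSync(dir, { recursive: true, force: true });
```

Append-only lines suit this use case because each fresh session only needs to add one record and never rewrites earlier history, except during compaction.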
package/bin/gsd-t.js
CHANGED
|
@@ -19,6 +19,7 @@ const fs = require("fs");
|
|
|
19
19
|
const path = require("path");
|
|
20
20
|
const os = require("os");
|
|
21
21
|
const { execFileSync, spawn: cpSpawn } = require("child_process");
|
|
22
|
+
const debugLedger = require(path.join(__dirname, "debug-ledger.js"));
|
|
22
23
|
|
|
23
24
|
// ─── Configuration ───────────────────────────────────────────────────────────
|
|
24
25
|
|
|
@@ -2174,6 +2175,236 @@ function doHeadlessQuery(type) {
|
|
|
2174
2175
|
process.stdout.write(JSON.stringify(result) + "\n");
|
|
2175
2176
|
}
|
|
2176
2177
|
|
|
2178
|
+
/**
|
|
2179
|
+
* Parse debug-loop flags from args array.
|
|
2180
|
+
* Extracts --max-iterations, --test-cmd, --fix-scope, --json, --log from args.
|
|
2181
|
+
*/
|
|
2182
|
+
function parseDebugLoopFlags(args) {
|
|
2183
|
+
const flags = { maxIterations: 20, testCmd: null, fixScope: null, json: false, log: false };
|
|
2184
|
+
const positional = [];
|
|
2185
|
+
for (const arg of args) {
|
|
2186
|
+
if (arg.startsWith("--max-iterations=")) {
|
|
2187
|
+
const n = parseInt(arg.slice("--max-iterations=".length), 10);
|
|
2188
|
+
if (!isNaN(n) && n > 0) flags.maxIterations = n;
|
|
2189
|
+
} else if (arg.startsWith("--test-cmd=")) {
|
|
2190
|
+
flags.testCmd = arg.slice("--test-cmd=".length);
|
|
2191
|
+
} else if (arg.startsWith("--fix-scope=")) {
|
|
2192
|
+
flags.fixScope = arg.slice("--fix-scope=".length);
|
|
2193
|
+
} else if (arg === "--json") {
|
|
2194
|
+
flags.json = true;
|
|
2195
|
+
} else if (arg === "--log") {
|
|
2196
|
+
flags.log = true;
|
|
2197
|
+
} else {
|
|
2198
|
+
positional.push(arg);
|
|
2199
|
+
}
|
|
2200
|
+
}
|
|
2201
|
+
return { flags, positional };
|
|
2202
|
+
}
|
|
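The flag parser above silently keeps the default when `--max-iterations` is not a positive integer. The reduced copy below isolates that one flag to make the fallback behavior easy to see. `parseMaxIterations` is a hypothetical name used only for this sketch.

```javascript
// Reduced copy of the --max-iterations handling, for illustration only.
function parseMaxIterations(args, fallback = 20) {
  let max = fallback;
  for (const arg of args) {
    if (arg.startsWith("--max-iterations=")) {
      const n = parseInt(arg.slice("--max-iterations=".length), 10);
      if (!isNaN(n) && n > 0) max = n; // invalid or non-positive values are ignored
    }
  }
  return max;
}

console.log(parseMaxIterations(["--max-iterations=10"])); // 10
console.log(parseMaxIterations(["--max-iterations=0"])); // 20
console.log(parseMaxIterations(["--max-iterations=abc"])); // 20
console.log(parseMaxIterations(["--max-iterations=7x"])); // 7
```

The last case follows from `parseInt` stopping at the first non-digit character, so a trailing typo is accepted rather than rejected.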
2203
|
+
|
|
2204
|
+
/**
|
|
2205
|
+
* Return the escalation model for a given iteration number.
|
|
2206
|
+
* Tiers: 1-5 → sonnet, 6-15 → opus, 16+ → null (stop)
|
|
2207
|
+
*/
|
|
2208
|
+
function getEscalationModel(iteration) {
|
|
2209
|
+
if (iteration >= 1 && iteration <= 5) return "sonnet";
|
|
2210
|
+
if (iteration >= 6 && iteration <= 15) return "opus";
|
|
2211
|
+
return null;
|
|
2212
|
+
}
|
|
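To see how the tier rule above interacts with the iteration ceiling, the sketch below counts the sessions each tier would get. The tier function is copied for self-containment; `planTiers` is a hypothetical helper.

```javascript
// Copy of the tier rule, repeated here so the sketch runs on its own.
function getEscalationModel(iteration) {
  if (iteration >= 1 && iteration <= 5) return "sonnet";
  if (iteration >= 6 && iteration <= 15) return "opus";
  return null; // 16 and above: stop
}

// Hypothetical helper: count how many sessions each tier gets for a given ceiling.
function planTiers(maxIterations) {
  const plan = { sonnet: 0, opus: 0, stopAt: null };
  for (let i = 1; i <= maxIterations; i++) {
    const model = getEscalationModel(i);
    if (model === null) {
      plan.stopAt = i; // the loop would stop here without spawning a session
      break;
    }
    plan[model] += 1;
  }
  return plan;
}

console.log(planTiers(20)); // { sonnet: 5, opus: 10, stopAt: 16 }
console.log(planTiers(10)); // { sonnet: 5, opus: 5, stopAt: null }
```

With the default ceiling of 20, the stop tier is reached at iteration 16, so at most 15 sessions run. A ceiling of 15 or lower never reaches the stop tier and ends through the max-iterations path instead.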
2213
|
+
|
|
2214
|
+
/**
|
|
2215
|
+
* Spawn a single `claude -p` session and return stdout as a string.
|
|
2216
|
+
* Returns null if the process fails.
|
|
2217
|
+
*/
|
|
2218
|
+
function spawnClaudeSession(prompt, model) {
|
|
2219
|
+
try {
|
|
2220
|
+
return execFileSync("claude", ["-p", prompt, "--model", model], {
|
|
2221
|
+
encoding: "utf8", timeout: 300000,
|
|
2222
|
+
stdio: ["pipe", "pipe", "pipe"],
|
|
2223
|
+
});
|
|
2224
|
+
} catch (e) {
|
|
2225
|
+
return (e.stdout || "") + (e.stderr || "") || null;
|
|
2226
|
+
}
|
|
2227
|
+
}
|
|
2228
|
+
|
|
2229
|
+
/**
|
|
2230
|
+
* Parse test pass/fail from claude output.
|
|
2231
|
+
* Returns { passed: bool, summary: string }.
|
|
2232
|
+
*/
|
|
2233
|
+
function parseTestResult(output) {
|
|
2234
|
+
const out = (output || "").toLowerCase();
|
|
2235
|
+
const passed =
|
|
2236
|
+
/\ball tests? pass(ed|ing)?\b/.test(out) ||
|
|
2237
|
+
/\ball \d+ tests? pass/.test(out) ||
|
|
2238
|
+
/\bno (test )?failures?\b/.test(out) ||
|
|
2239
|
+
/\btests? (all )?pass(ed)?\b/.test(out);
|
|
2240
|
+
const failed =
|
|
2241
|
+
/\bfail(ed|ing|ure)?\b/.test(out) ||
|
|
2242
|
+
/\berror\b/.test(out) ||
|
|
2243
|
+
/\bnot ok\b/.test(out);
|
|
2244
|
+
const summary = (output || "").slice(0, 500).replace(/\n/g, " ").trim();
|
|
2245
|
+
return { passed: passed && !failed, summary };
|
|
2246
|
+
}
|
|
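The pass/fail heuristic above works on free-form session text, so its behavior on a few sample phrases is worth spelling out. The sketch copies the regular expressions and reduces the function to the boolean it computes. `looksLikePass` is a hypothetical name and the sample strings are invented inputs.

```javascript
// Copy of the pass/fail heuristic, reduced to the boolean it computes.
function looksLikePass(output) {
  const out = (output || "").toLowerCase();
  const passed =
    /\ball tests? pass(ed|ing)?\b/.test(out) ||
    /\ball \d+ tests? pass/.test(out) ||
    /\bno (test )?failures?\b/.test(out) ||
    /\btests? (all )?pass(ed)?\b/.test(out);
  const failed =
    /\bfail(ed|ing|ure)?\b/.test(out) ||
    /\berror\b/.test(out) ||
    /\bnot ok\b/.test(out);
  return passed && !failed;
}

console.log(looksLikePass("All 42 tests passed")); // true
console.log(looksLikePass("No failures")); // true
console.log(looksLikePass("All tests pass. Fixed the error in auth.js")); // false
console.log(looksLikePass("3 passing, 1 failing")); // false
```

The third case shows that a passing run whose report mentions the word "error" is still classified as failing. Checking the exit status of the test command would avoid that, though it is a different technique from the text matching used here.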
2247
|
+
|
|
2248
|
+
/**
|
|
2249
|
+
* Run ledger compaction: spawn haiku to summarize, then compact.
|
|
2250
|
+
*/
|
|
2251
|
+
function runLedgerCompaction(projectDir, jsonMode) {
|
|
2252
|
+
const entries = debugLedger.readLedger(projectDir);
|
|
2253
|
+
const compactPrompt =
|
|
2254
|
+
"Read this debug ledger. Produce a condensed summary of what has been tried, " +
|
|
2255
|
+
"what failed, and what the evidence suggests. Be concise.\n\n" +
|
|
2256
|
+
JSON.stringify(entries, null, 2);
|
|
2257
|
+
let summary = "Compacted — see previous entries.";
|
|
2258
|
+
try {
|
|
2259
|
+
const out = execFileSync("claude", ["-p", compactPrompt, "--model", "haiku"], {
|
|
2260
|
+
encoding: "utf8", timeout: 120000, stdio: ["pipe", "pipe", "pipe"],
|
|
2261
|
+
});
|
|
2262
|
+
summary = (out || "").trim() || summary;
|
|
2263
|
+
} catch (e) {
|
|
2264
|
+
if (!jsonMode) warn("Compaction haiku session failed — using default summary");
|
|
2265
|
+
}
|
|
2266
|
+
debugLedger.compactLedger(projectDir, summary);
|
|
2267
|
+
}
|
|
2268
|
+
|
|
2269
|
+
/**
|
|
2270
|
+
* Write a per-iteration log file under .gsd-t/.
|
|
2271
|
+
*/
|
|
2272
|
+
function writeIterationLog(projectDir, ts, iteration, entry, rawOutput) {
|
|
2273
|
+
const logDir = path.join(projectDir, ".gsd-t");
|
|
2274
|
+
if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
|
|
2275
|
+
const fname = `headless-debug-${ts}-iter-${iteration}.log`;
|
|
2276
|
+
const content = [
|
|
2277
|
+
`Iteration: ${iteration}`,
|
|
2278
|
+
`Timestamp: ${entry.timestamp}`,
|
|
2279
|
+
`Model: ${entry.model}`,
|
|
2280
|
+
`Result: ${entry.result}`,
|
|
2281
|
+
`Fix: ${entry.fix}`,
|
|
2282
|
+
`Learning: ${entry.learning}`,
|
|
2283
|
+
`---`,
|
|
2284
|
+
rawOutput || "",
|
|
2285
|
+
].join("\n");
|
|
2286
|
+
fs.writeFileSync(path.join(logDir, fname), content);
|
|
2287
|
+
}
|
|
2288
|
+
|
|
2289
|
+
/**
|
|
2290
|
+
* Full debug-loop: validate flags, check claude CLI, run iteration cycle.
|
|
2291
|
+
*/
|
|
2292
|
+
function doHeadlessDebugLoop(flags) {
|
|
2293
|
+
const opts = flags || {};
|
|
2294
|
+
const jsonMode = opts.json || false;
|
|
2295
|
+
const projectDir = process.cwd();
|
|
2296
|
+
|
|
2297
|
+
if (opts.maxIterations < 1) {
|
|
2298
|
+
const msg = "--max-iterations must be >= 1";
|
|
2299
|
+
if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
|
|
2300
|
+
else error(msg);
|
|
2301
|
+
process.exit(3);
|
|
2302
|
+
}
|
|
2303
|
+
|
|
2304
|
+
try {
|
|
2305
|
+
execFileSync("claude", ["--version"], { encoding: "utf8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] });
|
|
2306
|
+
} catch {
|
|
2307
|
+
const msg = "claude CLI not found. Install with: npm install -g @anthropic-ai/claude-code";
|
|
2308
|
+
if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, error: msg }) + "\n");
|
|
2309
|
+
else error(msg);
|
|
2310
|
+
process.exit(3);
|
|
2311
|
+
}
|
|
2312
|
+
|
|
2313
|
+
if (!jsonMode) {
|
|
2314
|
+
heading("GSD-T Headless — Debug Loop");
|
|
2315
|
+
info(`Max iterations: ${opts.maxIterations}`);
|
|
2316
|
+
if (opts.testCmd) info(`Test command: ${opts.testCmd}`);
|
|
2317
|
+
if (opts.fixScope) info(`Fix scope: ${opts.fixScope}`);
|
|
2318
|
+
if (opts.log) info(`Logging: enabled`);
|
|
2319
|
+
log("");
|
|
2320
|
+
}
|
|
2321
|
+
|
|
2322
|
+
const ts = Date.now();
|
|
2323
|
+
|
|
2324
|
+
for (let iteration = 1; iteration <= opts.maxIterations; iteration++) {
|
|
2325
|
+
const model = getEscalationModel(iteration);
|
|
2326
|
+
|
|
2327
|
+
// STOP tier: escalation stop
|
|
2328
|
+
if (model === null) {
|
|
2329
|
+
const entries = debugLedger.readLedger(projectDir);
|
|
2330
|
+
const stats = debugLedger.getLedgerStats(projectDir);
|
|
2331
|
+
const diagMsg = `ESCALATION STOP at iteration ${iteration}. ` +
|
|
2332
|
+
`Entries: ${stats.entryCount}, Failures: ${stats.failCount}. ` +
|
|
2333
|
+
`Failed hypotheses:\n${stats.failedHypotheses.map((h, i) => ` ${i + 1}. ${h}`).join("\n")}`;
|
|
2334
|
+
if (jsonMode) {
|
|
2335
|
+
process.stdout.write(JSON.stringify({ success: false, exitCode: 4, iteration, diagnostic: diagMsg, entries }) + "\n");
|
|
2336
|
+
} else {
|
|
2337
|
+
log("");
|
|
2338
|
+
warn(diagMsg);
|
|
2339
|
+
}
|
|
2340
|
+
process.exit(4);
|
|
2341
|
+
}
|
|
2342
|
+
|
|
2343
|
+
// Check compaction
|
|
2344
|
+
const stats = debugLedger.getLedgerStats(projectDir);
|
|
2345
|
+
if (stats.needsCompaction) {
|
|
2346
|
+
if (!jsonMode) info("Ledger compaction triggered...");
|
|
2347
|
+
try { runLedgerCompaction(projectDir, jsonMode); }
|
|
2348
|
+
catch { process.exit(2); }
|
|
2349
|
+
}
|
|
2350
|
+
|
|
2351
|
+
// Generate preamble and build prompt
|
|
2352
|
+
const preamble = debugLedger.generateAntiRepetitionPreamble(projectDir);
|
|
2353
|
+
const scopeHint = opts.fixScope ? `\nFix scope: ${opts.fixScope}` : "";
|
|
2354
|
+
const testHint = opts.testCmd ? `\nRun tests with: ${opts.testCmd}` : "";
|
|
2355
|
+
const prompt = [preamble, `Fix the failing test(s). Write your fix, then run the test suite. Report results.${scopeHint}${testHint}`]
|
|
2356
|
+
.filter(Boolean).join("\n\n");
|
|
2357
|
+
|
|
2358
|
+
if (!jsonMode) info(`Iteration ${iteration}/${opts.maxIterations} [${model}]...`);
|
|
2359
|
+
|
|
2360
|
+
const iterStart = Date.now();
|
|
2361
|
+
let rawOutput = null;
|
|
2362
|
+
try { rawOutput = spawnClaudeSession(prompt, model); }
|
|
2363
|
+
catch (e) {
|
|
2364
|
+
if (jsonMode) process.stdout.write(JSON.stringify({ success: false, exitCode: 3, iteration, error: String(e) }) + "\n");
|
|
2365
|
+
else error(`Process error at iteration ${iteration}: ${e.message}`);
|
|
2366
|
+
process.exit(3);
|
|
2367
|
+
}
|
|
2368
|
+
const duration = Math.round((Date.now() - iterStart) / 1000);
|
|
2369
|
+
|
|
2370
|
+
const { passed, summary } = parseTestResult(rawOutput);
|
|
2371
|
+
const result = passed ? "PASS" : "STILL_FAILS";
|
|
2372
|
+
|
|
2373
|
+
// Extract fix description from output (first 200 chars of output)
|
|
2374
|
+
const fixDesc = (rawOutput || "").split("\n").find((l) => l.trim().length > 20) || "see output";
|
|
2375
|
+
const entry = {
|
|
2376
|
+
iteration, timestamp: new Date().toISOString(),
|
|
2377
|
+
test: opts.testCmd || "unspecified", error: passed ? "" : summary,
|
|
2378
|
+
hypothesis: `iteration-${iteration}`, fix: fixDesc.trim().slice(0, 200),
|
|
2379
|
+
fixFiles: [], result, learning: summary.slice(0, 300),
|
|
2380
|
+
model, duration,
|
|
2381
|
+
};
|
|
2382
|
+
|
|
2383
|
+
try { debugLedger.appendEntry(projectDir, entry); }
|
|
2384
|
+
catch (e) {
|
|
2385
|
+
if (!jsonMode) warn(`Failed to append ledger entry: ${e.message}`);
|
|
2386
|
+
}
|
|
2387
|
+
|
|
2388
|
+
if (opts.log) writeIterationLog(projectDir, ts, iteration, entry, rawOutput);
|
|
2389
|
+
|
|
2390
|
+
if (jsonMode) {
|
|
2391
|
+
process.stdout.write(JSON.stringify({ success: passed, exitCode: passed ? 0 : 1, iteration, result, model, duration, summary }) + "\n");
|
|
2392
|
+
} else {
|
|
2393
|
+
info(` Result: ${result}`);
|
|
2394
|
+
}
|
|
2395
|
+
|
|
2396
|
+
if (passed) {
|
|
2397
|
+
debugLedger.clearLedger(projectDir);
|
|
2398
|
+
if (!jsonMode) log(`\n${GREEN}All tests pass — debug loop complete.${RESET}`);
|
|
2399
|
+
process.exit(0);
|
|
2400
|
+
}
|
|
2401
|
+
}
|
|
2402
|
+
|
|
2403
|
+
// Max iterations reached
|
|
2404
|
+
if (!jsonMode) warn(`Max iterations (${opts.maxIterations}) reached without all tests passing.`);
|
|
2405
|
+
process.exit(1);
|
|
2406
|
+
}
|
|
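In JSON mode the loop above writes one JSON object per line for each iteration. A consumer could fold those lines into a short summary, as sketched below. `summarizeDebugLoopOutput` is a hypothetical helper, and the sample lines are invented records shaped like the per-iteration output (`success`, `exitCode`, `iteration`, `result`, `model`, `duration`, `summary`).

```javascript
// Hypothetical consumer for `--json` output: one JSON object per line.
function summarizeDebugLoopOutput(stdout) {
  const events = stdout
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  const last = events[events.length - 1] || null;
  return {
    // Only per-iteration records carry a `result` string.
    iterations: events.filter((e) => typeof e.result === "string").length,
    modelsUsed: [...new Set(events.map((e) => e.model).filter(Boolean))],
    passed: Boolean(last && last.success),
    totalSeconds: events.reduce((sum, e) => sum + (e.duration || 0), 0),
  };
}

// Sample lines shaped like the per-iteration records written in JSON mode.
const sample = [
  '{"success":false,"exitCode":1,"iteration":1,"result":"STILL_FAILS","model":"sonnet","duration":41,"summary":"..."}',
  '{"success":true,"exitCode":0,"iteration":2,"result":"PASS","model":"sonnet","duration":37,"summary":"..."}',
].join("\n");

const report = summarizeDebugLoopOutput(sample);
console.log(report.iterations, report.modelsUsed, report.passed, report.totalSeconds);
```

Note that when the iteration ceiling is reached in JSON mode, the code above exits with status 1 without writing a final record, so a consumer should read the process exit status as well as the last line.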
2407
|
+
|
|
2177
2408
|
function doHeadless(args) {
|
|
2178
2409
|
const sub = args[0];
|
|
2179
2410
|
if (!sub || sub === "--help" || sub === "-h") {
|
|
@@ -2181,6 +2412,12 @@ function doHeadless(args) {
|
|
|
2181
2412
|
return;
|
|
2182
2413
|
}
|
|
2183
2414
|
|
|
2415
|
+
if (sub === "--debug-loop") {
|
|
2416
|
+
const { flags } = parseDebugLoopFlags(args.slice(1));
|
|
2417
|
+
doHeadlessDebugLoop(flags);
|
|
2418
|
+
return;
|
|
2419
|
+
}
|
|
2420
|
+
|
|
2184
2421
|
if (sub === "query") {
|
|
2185
2422
|
const type = args[1];
|
|
2186
2423
|
doHeadlessQuery(type);
|
|
@@ -2196,7 +2433,24 @@ function showHeadlessHelp() {
|
|
|
2196
2433
|
log(`\n${BOLD}GSD-T Headless Mode${RESET}\n`);
|
|
2197
2434
|
log(`${BOLD}Usage:${RESET}`);
|
|
2198
2435
|
log(` ${CYAN}gsd-t headless${RESET} <command> [args] [--json] [--timeout=N] [--log]`);
|
|
2199
|
-
log(` ${CYAN}gsd-t headless query${RESET} <type
|
|
2436
|
+
log(` ${CYAN}gsd-t headless query${RESET} <type>`);
|
|
2437
|
+
log(` ${CYAN}gsd-t headless --debug-loop${RESET} [--max-iterations=N] [--test-cmd=CMD] [--fix-scope=SCOPE] [--json] [--log]\n`);
|
|
2438
|
+
log(`${BOLD}Debug-loop flags:${RESET}`);
|
|
2439
|
+
log(` ${CYAN}--max-iterations=N${RESET} Hard ceiling on iterations (default: 20)`);
|
|
2440
|
+
log(` ${CYAN}--test-cmd=CMD${RESET} Override test command`);
|
|
2441
|
+
log(` ${CYAN}--fix-scope=SCOPE${RESET} Limit fix scope to specific files or test patterns`);
|
|
2442
|
+
log(` ${CYAN}--json${RESET} Structured JSON output per iteration`);
|
|
2443
|
+
log(` ${CYAN}--log${RESET} Write per-iteration logs to .gsd-t/\n`);
|
|
2444
|
+
log(`${BOLD}Debug-loop escalation tiers:${RESET}`);
|
|
2445
|
+
log(` Iterations 1-5: sonnet (standard debug)`);
|
|
2446
|
+
log(` Iterations 6-15: opus (deeper reasoning)`);
|
|
2447
|
+
log(` Iterations 16-20: STOP (exit code 4 — needs human)\n`);
|
|
2448
|
+
log(`${BOLD}Debug-loop exit codes:${RESET}`);
|
|
2449
|
+
log(` 0 all tests pass`);
|
|
2450
|
+
log(` 1 max iterations reached`);
|
|
2451
|
+
log(` 2 ledger compaction error`);
|
|
2452
|
+
log(` 3 process error`);
|
|
2453
|
+
log(` 4 escalation stop — needs human\n`);
|
|
2200
2454
|
log(`${BOLD}Exec flags:${RESET}`);
|
|
2201
2455
|
log(` ${CYAN}--json${RESET} Structured JSON output`);
|
|
2202
2456
|
log(` ${CYAN}--timeout=N${RESET} Kill after N seconds (default: 300)`);
|
|
@@ -2304,6 +2558,10 @@ module.exports = {
|
|
|
2304
2558
|
doHeadlessExec,
|
|
2305
2559
|
doHeadlessQuery,
|
|
2306
2560
|
doHeadless,
|
|
2561
|
+
// Headless debug-loop
|
|
2562
|
+
parseDebugLoopFlags,
|
|
2563
|
+
getEscalationModel,
|
|
2564
|
+
doHeadlessDebugLoop,
|
|
2307
2565
|
queryStatus,
|
|
2308
2566
|
queryDomains,
|
|
2309
2567
|
queryContracts,
|
|
package/commands/gsd-t-complete-milestone.md
CHANGED
|
@@ -445,8 +445,9 @@ Verify the milestone is truly complete:
|
|
|
445
445
|
c. If specs are missing or stale, invoke `gsd-t-test-sync` first.
|
|
446
446
|
d. Report: "Unit: X/Y pass | E2E: X/Y pass"
|
|
447
447
|
2. **Verify all pass**: Every test must pass. If any fail, fix before tagging (up to 2 attempts)
|
|
448
|
+
3. **Functional test quality gate**: Read every Playwright spec. Verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded to input) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). Shallow tests that would pass on an empty HTML page with the right element IDs are a milestone completion FAIL. Flag and rewrite before proceeding.
|
|
448
449
|
4. **Compare to baseline**: If a test baseline was recorded at milestone start, verify coverage has improved or at minimum not regressed
|
|
449
|
-
5. **Log test results**: Include test pass/fail counts in the milestone summary (Step 4)
|
|
450
|
+
5. **Log test results**: Include test pass/fail counts and shallow test audit results in the milestone summary (Step 4)
|
|
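The functional test quality gate in the steps above could be approximated by a text scan before a human or agent reads each spec. The sketch below is a rough heuristic under stated assumptions: it treats a spec as shallow when every matcher it uses is one of the existence-style checks named in the gate. `auditSpec` and the sample specs are hypothetical, and the pattern would also count unrelated calls such as `.toString(`.

```javascript
// Existence-style checks named in the quality gate; extend as needed.
const SHALLOW_CHECKS = ["isVisible", "toBeAttached", "toBeEnabled"];

// Hypothetical audit heuristic over the source text of one spec file.
function auditSpec(source) {
  // Collect every `.toXxx(` matcher and every `.isVisible(` call.
  const matchers = [...source.matchAll(/\.(to[A-Z]\w*|isVisible)\s*\(/g)].map((m) => m[1]);
  const shallow = matchers.filter((name) => SHALLOW_CHECKS.includes(name));
  return {
    total: matchers.length,
    shallow: shallow.length,
    // Shallow: the spec asserts something, but nothing beyond element existence.
    isShallow: matchers.length > 0 && shallow.length === matchers.length,
  };
}

const shallowSpec = `
  await expect(page.locator("#save")).toBeAttached();
  await expect(page.locator("#save")).toBeEnabled();
`;
const functionalSpec = `
  await page.locator("#save").click();
  await expect(page.locator("#status")).toHaveText("Saved");
  await expect(page.locator("#save")).toBeEnabled();
`;

console.log(auditSpec(shallowSpec).isShallow); // true
console.log(auditSpec(functionalSpec).isShallow); // false
```

A scan like this can only flag candidates. Whether an assertion verifies that state changed after an action still requires reading the spec, as the gate describes.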
450
451
|
|
|
451
452
|
## Step 11: Create Git Tag
|
|
452
453
|
|