spec-and-loop 3.3.2 → 3.3.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -2,6 +2,8 @@
2
2
 
3
3
  You are writing `tasks.md` for an OpenSpec change that will be executed by `ralph-run` in a fresh-session loop. Every iteration re-reads this file plus proposal.md, design.md, and specs. The loop implements one task per iteration, runs verification, and marks progress only on success.
4
4
 
5
+ When an iteration emits `BLOCKED_HANDOFF`, the runner may now invoke the supervisor loop before surfacing the handoff to a human. That makes the implementer's structured blocker note load-bearing: it must describe the real scope conflict or missing precondition precisely enough for either a human or the supervisor to patch `tasks.md` without guessing.
6
+
5
7
  ## Task template
6
8
 
7
9
  Every `- [ ]` checkbox must follow this shape:
@@ -117,6 +119,24 @@ Baseline artifact compatibility repair template:
117
119
  - Repo-wide or slow validators for a narrow task when a focused verifier exists (`npm test`, `make all`, full browser/e2e suites)
118
120
  - Ambiguous package-manager forwarding such as `npm test -- event-schema` unless confirmed to execute only the intended test scope
119
121
 
122
+ ## Pre-loop scope-handoff pre-scan
123
+
124
+ Before handing `tasks.md` to `ralph-run`, audit every pending `- [ ]` checkbox for the seven failure modes that most commonly cause `BLOCKED_HANDOFF` mid-loop. Each one is cheap to spot statically and expensive to discover after the loop has burned an iteration plus an auto-resolve attempt on it.
125
+
126
+ For every pending task, verify in this order:
127
+
128
+ 1. **Referenced files exist.** Every path in `Scope:`, `Done when:`, and `Stop and hand off if:` resolves with `ls` or `git ls-files`. Dangling references (`SPEC.md` when only `SPEC-IA.md` exists, line ranges that drifted after a prior task, deleted fixtures) cause the agent to either hand off or hallucinate.
129
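+ 2. **Referenced sections exist.** A reference such as `### Acceptance Criteria items 1–9` must point to a real numbered list in a real document. Heading references must match the actual heading text exactly; confirm with a case-sensitive search such as `rg "^## <heading>$" <file>`, adjusting the number of `#` characters to the heading level.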
130
+ 3. **Verifier scope matches scope statement.** If `Scope:` names one file but `Done when:` runs a command that touches more (`pnpm test:update-snapshots` regenerates snapshots for *all* test files, not just one), either broaden `Scope:` to match the verifier's reach or replace the verifier with a narrower one (`vitest --run <single-test-file>`).
131
+ 4. **Pre-existing failures are classified.** If the verifier is a multi-file gate and the repo has known unrelated failures, they must be enumerated in a "Pre-existing unrelated failures" sub-section with file:line references and an explicit "do not stop on these" clause. See [Quality gates](#quality-gates).
132
+ 5. **Stop-conditions are objective.** Phrases like "diffs that cannot be explained" or "behavior looks wrong" are subjective, so the agent will either over-trigger or under-trigger on them. Replace them with grep-able evidence: "snapshot diff contains `atm-*` class on a docs MDX content element," "`hbr-tab-panel` appears in the rendered DOM."
133
+ 6. **Manual-only tasks are flagged.** Any task whose `Done when:` requires a human in a browser, eyes on a deployed URL, or visual judgement must be tagged with a `[manual]` marker in its title and have an explicit `Stop and hand off if:` line that says "manual verification required — emit BLOCKED_HANDOFF with verification template." This makes the handoff intentional, not a mid-loop surprise.
134
+ 7. **Cross-task scope conflicts are absent.** If task N rewrites a file that task N-1 already completed, or task N+1's `Stop and hand off if:` would trigger on the normal completion of task N, reorder or merge them. Read the tasks in execution order and confirm that no two tasks claim ownership of the same file, route, or symbol.
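Checks 1, 6, and part of 7 are mechanical enough to script. The following Node.js sketch is a hypothetical helper, not part of `spec-and-loop`: it assumes each pending task begins with `- [ ]`, that its `Scope:`, `Done when:`, and `Stop and hand off if:` lines follow it, and that file paths appear in single backticks without spaces. It flags missing files, `[manual]` tasks without a `BLOCKED_HANDOFF` stop condition, and `Scope:` paths claimed by more than one pending task.

```javascript
'use strict';
// Hypothetical pre-scan for checks 1, 6, and part of 7. ASSUMPTIONS: a pending
// task starts with "- [ ]", its Scope:/Done when:/Stop and hand off if: lines
// follow it, and file paths are wrapped in single backticks with no spaces.
const fs = require('fs');
const path = require('path');

const FIELD_RE = /^\s*(?:-\s*)?(Scope|Done when|Stop and hand off if):(.*)$/;
// Backticked tokens that look like file paths (no spaces, have an extension).
const PATHLIKE_RE = /`([^`\s]+\.[A-Za-z0-9]+)`/g;

function parsePendingTasks(markdown) {
  const tasks = [];
  let current = null;
  for (const line of markdown.split('\n')) {
    const checkbox = line.match(/^\s*- \[([ xX])\] (.*)$/);
    if (checkbox) {
      // Only pending tasks are audited; a completed task resets `current`.
      current = checkbox[1] === ' ' ? { title: checkbox[2].trim(), fields: {} } : null;
      if (current) tasks.push(current);
      continue;
    }
    const field = current && line.match(FIELD_RE);
    if (field) current.fields[field[1]] = field[2].trim();
  }
  return tasks;
}

function preScan(markdown, repoRoot) {
  const problems = [];
  const owners = new Map(); // Scope path -> Set of pending task titles claiming it
  for (const task of parsePendingTasks(markdown)) {
    for (const [name, value] of Object.entries(task.fields)) {
      for (const [, ref] of value.matchAll(PATHLIKE_RE)) {
        // Check 1: every referenced file exists relative to the repo root.
        if (!fs.existsSync(path.join(repoRoot, ref))) {
          problems.push(`${task.title}: ${name} references missing file ${ref}`);
        }
        if (name === 'Scope') {
          owners.set(ref, (owners.get(ref) || new Set()).add(task.title));
        }
      }
    }
    // Check 6: a [manual] task must hand off deliberately.
    const stop = task.fields['Stop and hand off if'] || '';
    if (task.title.includes('[manual]') && !stop.includes('BLOCKED_HANDOFF')) {
      problems.push(`${task.title}: [manual] task has no BLOCKED_HANDOFF stop condition`);
    }
  }
  // Check 7 (partial): the same Scope path claimed by two or more pending tasks.
  for (const [ref, titles] of owners) {
    if (titles.size > 1) problems.push(`${ref} is claimed by: ${[...titles].join('; ')}`);
  }
  return problems;
}
```

Run it from the repository root with something like `preScan(fs.readFileSync('tasks.md', 'utf8'), process.cwd())`. An empty array means only these three checks passed; checks 2 through 5 still need a careful read.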
135
+
136
+ If any check fails, edit `tasks.md` before starting the loop. The cost of a static edit is a few seconds; the cost of discovering the same issue at iteration 21 is a `BLOCKED_HANDOFF`, a dirty worktree, and a context-poisoned restart.
137
+
138
+ This pre-scan is mandatory before `ralph-run` on a freshly authored or freshly edited `tasks.md`. After remediation, re-run `openspec validate <change>` to confirm the change still validates.
139
+
120
140
  ## Examples
121
141
 
122
142
  **Bad** — vague, no verifier:
package/QUICKSTART.md CHANGED
@@ -75,6 +75,8 @@ ralph-run --change add-hello-world
75
75
  - Create a runner-managed task commit when auto-commit is enabled and task-scoped staging succeeds
76
76
  - Track progress in tasks.md
77
77
 
78
+ For handoff-recovery details and the self-heal flags, see [`README.md#supervisor-loop`](./README.md#supervisor-loop).
79
+
78
80
  ## What Just Happened?
79
81
 
80
82
  1. **Created a spec** with OpenSpec
package/README.md CHANGED
@@ -232,6 +232,24 @@ git log --oneline
232
232
  ralph-run --add-context "Prefer async/await over callbacks"
233
233
  ```
234
234
 
235
+ ## Supervisor Loop
236
+
237
+ When an iteration emits `BLOCKED_HANDOFF` for a structural task-shaping problem that the existing fast-path classifier cannot safely resolve, `ralph-run` can invoke a bounded supervisor pass (capped by `--self-heal-max-tries`) instead of stopping immediately for manual intervention. The supervisor patches `tasks.md`, validates the change with `npx openspec validate <change> --strict`, and then hands the next iteration a cleaner task. Self-heal is enabled by default; set `RALPH_SELF_HEAL=0` or pass `--no-self-heal` to restore the fully manual handoff flow.
238
+
239
+ | Flag | Env var | Default | Effect |
240
+ |------|---------|---------|--------|
241
+ | `--no-self-heal` | `RALPH_SELF_HEAL=0` | self-heal enabled | Disable supervisor self-heal and exit on unresolved `BLOCKED_HANDOFF` as before. |
242
+ | `--self-heal-max-tries <n>` | `RALPH_SELF_HEAL_MAX_TRIES` | `3` | Cap supervisor tries per blocker event. |
243
+ | `--no-self-heal-downstream` | `RALPH_SELF_HEAL_DOWNSTREAM=0` | downstream patching enabled | Prevent supervisor edits to downstream pending tasks. |
244
+ | `--no-self-heal-hints` | `RALPH_SELF_HEAL_HINTS=0` | hints enabled | Disable `## Supervisor Investigation Hints` injection into the next implementer prompt. |
245
+ | `--no-self-heal-log-access` | `RALPH_SELF_HEAL_LOG_ACCESS=0` | log-path injection enabled | Keep supervisor prompts from receiving run-log paths. |
246
+ | `--self-heal-verbose` | `RALPH_SELF_HEAL_VERBOSE=1` | off unless `--verbose` is set | Emit supervisor debug logging. |
247
+ | `--no-self-heal-verbose` | `RALPH_SELF_HEAL_VERBOSE=0` | inherits `--verbose` when present | Force supervisor debug logging off, even if `--verbose` is set for the main runner. |
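The table does not say how a flag and its environment variable interact when both are set. The sketch below assumes CLI flags take precedence over env vars, which take precedence over the defaults; `resolveSelfHeal` is a hypothetical name, and ralph-run's real parsing may differ.

```javascript
'use strict';
// Hypothetical resolver for the self-heal settings in the table above.
// ASSUMPTION: an explicit CLI flag beats the env var, and the env var beats
// the default. ralph-run's actual precedence and parsing may differ.
function resolveSelfHeal(argv, env = {}) {
  const has = (flag) => argv.includes(flag);
  const envOff = (name) => env[name] === '0';

  // --self-heal-max-tries <n> wins over RALPH_SELF_HEAL_MAX_TRIES; fall back to 3.
  const maxIndex = argv.indexOf('--self-heal-max-tries');
  const rawMax = maxIndex !== -1 ? argv[maxIndex + 1] : env.RALPH_SELF_HEAL_MAX_TRIES;
  const maxTries = Number.parseInt(rawMax, 10);

  // Supervisor verbosity inherits the main runner's --verbose unless overridden.
  let verbose = has('--verbose');
  if (env.RALPH_SELF_HEAL_VERBOSE === '1') verbose = true;
  if (env.RALPH_SELF_HEAL_VERBOSE === '0') verbose = false;
  if (has('--self-heal-verbose')) verbose = true;
  if (has('--no-self-heal-verbose')) verbose = false;

  return {
    enabled: !(has('--no-self-heal') || envOff('RALPH_SELF_HEAL')),
    maxTries: Number.isInteger(maxTries) && maxTries > 0 ? maxTries : 3,
    downstream: !(has('--no-self-heal-downstream') || envOff('RALPH_SELF_HEAL_DOWNSTREAM')),
    hints: !(has('--no-self-heal-hints') || envOff('RALPH_SELF_HEAL_HINTS')),
    logAccess: !(has('--no-self-heal-log-access') || envOff('RALPH_SELF_HEAL_LOG_ACCESS')),
    verbose,
  };
}
```

In a real entry point this would be called as `resolveSelfHeal(process.argv.slice(2), process.env)`. Every "disable" flag and its `=0` env var simply OR together, so precedence only matters for the max-tries count and the verbosity toggle.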
248
+
249
+ ### Token economy
250
+
251
+ Supervisor prompts keep the expensive rule context enabled by default but compact it aggressively in three tiers: Tier 1 trims `downstream_tasks`, `change_design`, `change_proposal`, and retry-only context; Tier 2 distills `OPENSPEC-RALPH-BP.md`; Tier 3 consolidates per-patch rationale into summary fields. Each compaction has its own escape hatch: `RALPH_SELF_HEAL_FULL_DOWNSTREAM=1`, `RALPH_SELF_HEAL_FULL_DESIGN=1`, `RALPH_SELF_HEAL_FULL_PROPOSAL=1`, and `RALPH_SELF_HEAL_FULL_BP_CONTEXT=1` restore verbatim inputs; `RALPH_SELF_HEAL_KEEP_DOWNSTREAM_ON_RETRY=1` and `RALPH_SELF_HEAL_KEEP_HANDOFF_HISTORY_ON_RETRY=1` keep sections that are otherwise suppressed on retries; and `RALPH_SELF_HEAL_PER_PATCH_RATIONALES=1` restores per-patch rationale output when you need the uncompressed response shape.
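To check which escape hatches a given environment turns on, a hypothetical helper can map each variable to the tier it relaxes. The tier assignment of the two `KEEP_*_ON_RETRY` variables is inferred from the paragraph above (Tier 1 trims retry-only context), and treating only the literal value `1` as enabled is an assumption about how ralph-run reads them.

```javascript
'use strict';
// Hypothetical map from escape-hatch variable to the compaction it relaxes.
// Variable names come from the Token economy paragraph; the tier labels for
// the KEEP_*_ON_RETRY variables are an inference, not documented behavior.
const ESCAPE_HATCHES = {
  RALPH_SELF_HEAL_FULL_DOWNSTREAM: 'Tier 1: verbatim downstream_tasks',
  RALPH_SELF_HEAL_FULL_DESIGN: 'Tier 1: verbatim change_design',
  RALPH_SELF_HEAL_FULL_PROPOSAL: 'Tier 1: verbatim change_proposal',
  RALPH_SELF_HEAL_KEEP_DOWNSTREAM_ON_RETRY: 'Tier 1: keep downstream context on retry',
  RALPH_SELF_HEAL_KEEP_HANDOFF_HISTORY_ON_RETRY: 'Tier 1: keep handoff history on retry',
  RALPH_SELF_HEAL_FULL_BP_CONTEXT: 'Tier 2: undistilled OPENSPEC-RALPH-BP.md',
  RALPH_SELF_HEAL_PER_PATCH_RATIONALES: 'Tier 3: per-patch rationales',
};

function activeEscapeHatches(env = process.env) {
  // ASSUMPTION: only the exact string '1' enables an escape hatch.
  return Object.entries(ESCAPE_HATCHES)
    .filter(([name]) => env[name] === '1')
    .map(([name, effect]) => `${name} (${effect})`);
}
```

For example, launching with `RALPH_SELF_HEAL_FULL_DESIGN=1 ralph-run --change my-feature` would, under these assumptions, report a single relaxed Tier 1 input while the other compactions stay on.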
252
+
235
253
  ## Example Workflow
236
254
 
237
255
  ```bash
@@ -472,7 +490,7 @@ For common issues and solutions, see [QUICKSTART.md#troubleshooting](./QUICKSTAR
472
490
  | Variable | Default | Description |
473
491
  |----------|---------|-------------|
474
492
  | `RALPH_BASE_PROMPT_WARN_BYTES` | `4096` | Byte threshold above which `render()` emits a one-line warning to stderr when `{{base_prompt}}` resolves to a large file. Set to `0` to silence warnings entirely. Invalid values fall back to `4096` with a one-time notice per process. |
475
- | `RALPH_ITERATION_IDLE_TIMEOUT_MS` | `300000` | Milliseconds of silence on stdout+stderr before the per-iteration idle watchdog fires. Set to `0` to disable the watchdog entirely and restore pre-change behavior (no timeout). |
493
+ | `RALPH_ITERATION_IDLE_TIMEOUT_MS` | `900000` | Milliseconds of silence on stdout+stderr before the per-iteration idle watchdog fires (default 15 minutes). Set to `0` to disable the watchdog entirely and restore pre-change behavior (no timeout). |
476
494
  | `RALPH_ITERATION_KILL_GRACE_MS` | `10000` | Milliseconds the runner waits after sending `SIGTERM` to a timed-out iteration child before escalating to `SIGKILL`. |
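The two watchdog knobs above can be illustrated with a minimal sketch built on Node's `child_process.spawn`. `runWithIdleWatchdog` is a hypothetical name and its result shape only mirrors the `failureReason` field the invoker passes through; it is not the runner's actual `_spawnOpenCode`.

```javascript
'use strict';
// Minimal sketch of a per-iteration stream-idle watchdog: any stdout/stderr
// output re-arms the idle timer; when it expires the child receives SIGTERM,
// then SIGKILL if it is still alive after the grace period.
const { spawn } = require('child_process');

function runWithIdleWatchdog(cmd, args, { idleTimeoutMs = 900000, killGraceMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let timedOut = false;
    let idleTimer = null;
    let graceTimer = null;

    const armIdleTimer = () => {
      if (idleTimeoutMs === 0 || timedOut) return; // 0 disables the watchdog
      clearTimeout(idleTimer);
      idleTimer = setTimeout(() => {
        timedOut = true;
        child.kill('SIGTERM');
        graceTimer = setTimeout(() => child.kill('SIGKILL'), killGraceMs);
      }, idleTimeoutMs);
    };

    // Output on either stream counts as activity.
    child.stdout.on('data', armIdleTimer);
    child.stderr.on('data', armIdleTimer);
    child.on('error', (err) => {
      clearTimeout(idleTimer);
      reject(err);
    });
    armIdleTimer();

    child.on('close', (exitCode, signal) => {
      clearTimeout(idleTimer);
      clearTimeout(graceTimer);
      resolve({ exitCode, signal, ...(timedOut && { failureReason: 'iteration_timeout_idle' }) });
    });
  });
}
```

The timer is re-armed on every chunk rather than polled, so a chatty but slow task never trips it; only a stretch of silence longer than `idleTimeoutMs` does.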
477
495
 
478
496
  ### Auto-commit ignore-filter surfacing and iteration watchdog
@@ -516,7 +534,7 @@ The `iteration_timeout_idle` reason also appears in the `## Recent Loop Signals`
516
534
  Raise `RALPH_ITERATION_IDLE_TIMEOUT_MS`, or set it to `0` to disable the watchdog, if your agent workflow runs tools that stay silent for legitimately long periods (e.g., large integration test suites). Example:
517
535
 
518
536
  ```bash
519
- RALPH_ITERATION_IDLE_TIMEOUT_MS=900000 ralph-run --change my-feature # 15-minute idle threshold
537
+ RALPH_ITERATION_IDLE_TIMEOUT_MS=1800000 ralph-run --change my-feature # 30-minute idle threshold
520
538
  RALPH_ITERATION_IDLE_TIMEOUT_MS=0 ralph-run --change my-feature # watchdog disabled
521
539
  ```
522
540
 
@@ -12,6 +12,27 @@ const fs = require('fs');
12
12
  const path = require('path');
13
13
 
14
14
  const HISTORY_FILE = 'ralph-history.json';
15
+ const SUPERVISOR_ITERATION_FIELDS = [
16
+ 'supervisorInvoked',
17
+ 'supervisorTryIndex',
18
+ 'supervisorOutcome',
19
+ 'supervisorPatchedTasks',
20
+ 'supervisorBlockerHash',
21
+ 'supervisorSoftWarnings',
22
+ 'supervisorHints',
23
+ 'supervisorHintsDropped',
24
+ 'supervisorReadLogs',
25
+ 'supervisorReadLogsBytes',
26
+ ];
27
+ const SUPERVISOR_EDIT_REQUIRED_FIELDS = [
28
+ 'iteration',
29
+ 'blockerHash',
30
+ 'tryIndex',
31
+ 'taskNumber',
32
+ 'rationaleSummary',
33
+ 'validatorOk',
34
+ 'softWarnings',
35
+ ];
15
36
 
16
37
  /**
17
38
  * Return the absolute path to the history file.
@@ -34,7 +55,7 @@ function read(ralphDir) {
34
55
  if (!fs.existsSync(file)) return [];
35
56
  try {
36
57
  const parsed = JSON.parse(fs.readFileSync(file, 'utf8'));
37
- return Array.isArray(parsed) ? parsed : [];
58
+ return Array.isArray(parsed) ? parsed.map(_normalizeHistoryEntry) : [];
38
59
  } catch {
39
60
  return [];
40
61
  }
@@ -97,7 +118,7 @@ function read(ralphDir) {
97
118
  function append(ralphDir, entry) {
98
119
  _ensureDir(ralphDir);
99
120
  const entries = read(ralphDir);
100
- entries.push(Object.assign({ timestamp: new Date().toISOString() }, entry));
121
+ entries.push(_normalizeHistoryEntry(Object.assign({ timestamp: new Date().toISOString() }, entry)));
101
122
  _write(ralphDir, entries);
102
123
  }
103
124
 
@@ -137,4 +158,58 @@ function _write(ralphDir, data) {
137
158
  fs.writeFileSync(historyPath(ralphDir), JSON.stringify(data, null, 2), 'utf8');
138
159
  }
139
160
 
161
+ function _normalizeHistoryEntry(entry) {
162
+ if (!entry || typeof entry !== 'object' || Array.isArray(entry)) {
163
+ return entry;
164
+ }
165
+
166
+ if (entry.type === 'supervisorEdit') {
167
+ return _normalizeSupervisorEditEntry(entry);
168
+ }
169
+
170
+ return _normalizeIterationEntry(entry);
171
+ }
172
+
173
+ function _normalizeIterationEntry(entry) {
174
+ const normalized = Object.assign({}, entry);
175
+ if (!_hasAnyOwnField(normalized, SUPERVISOR_ITERATION_FIELDS)) {
176
+ return normalized;
177
+ }
178
+
179
+ if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorPatchedTasks') &&
180
+ !Array.isArray(normalized.supervisorPatchedTasks)) {
181
+ normalized.supervisorPatchedTasks = [];
182
+ }
183
+ if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorSoftWarnings') &&
184
+ !Array.isArray(normalized.supervisorSoftWarnings)) {
185
+ normalized.supervisorSoftWarnings = [];
186
+ }
187
+ if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorHints') &&
188
+ !Array.isArray(normalized.supervisorHints)) {
189
+ normalized.supervisorHints = [];
190
+ }
191
+ if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorHintsDropped') &&
192
+ !Array.isArray(normalized.supervisorHintsDropped)) {
193
+ normalized.supervisorHintsDropped = [];
194
+ }
195
+ return normalized;
196
+ }
197
+
198
+ function _normalizeSupervisorEditEntry(entry) {
199
+ const normalized = Object.assign({}, entry);
200
+ for (const field of SUPERVISOR_EDIT_REQUIRED_FIELDS) {
201
+ if (!Object.prototype.hasOwnProperty.call(normalized, field)) {
202
+ normalized[field] = field === 'softWarnings' ? [] : null;
203
+ }
204
+ }
205
+ if (!Array.isArray(normalized.softWarnings)) {
206
+ normalized.softWarnings = [];
207
+ }
208
+ return normalized;
209
+ }
210
+
211
+ function _hasAnyOwnField(object, fields) {
212
+ return fields.some((field) => Object.prototype.hasOwnProperty.call(object, field));
213
+ }
214
+
140
215
  module.exports = { read, append, recent, clear, historyPath };
@@ -22,6 +22,13 @@ const errors = require('./errors');
22
22
  const tasks = require('./tasks');
23
23
  const status = require('./status');
24
24
  const prompt = require('./prompt');
25
+ const supervisor = require('./supervisor');
26
+
27
+ // Re-export the whole supervisor module rather than a curated subset, so tests
28
+ // and downstream code can reach every documented helper without a hand-kept
29
+ // list drifting from the actual `module.exports` in `supervisor.js` (task 5.5
30
+ // originally listed such a curated subset).
31
+ const _supervisor = supervisor;
25
32
 
26
33
  /**
27
34
  * Run the mini Ralph loop with the provided options.
@@ -91,4 +98,5 @@ module.exports = {
91
98
  _prompt: prompt,
92
99
  _runner: runner,
93
100
  _status: status,
101
+ _supervisor,
94
102
  };
@@ -73,7 +73,7 @@ async function invoke(opts) {
73
73
  stderr: result.stderr,
74
74
  exitCode: result.exitCode,
75
75
  signal: result.signal,
76
- toolUsage: _extractToolUsage(result.stdout),
76
+ toolUsage: _extractDetailedToolUsage(result.stdout),
77
77
  filesChanged,
78
78
  // Pass through watchdog fields when present (task 2.1)
79
79
  ...(result.failureReason !== undefined && {
@@ -88,7 +88,7 @@ async function invoke(opts) {
88
88
  /**
89
89
  * Spawn the opencode process and stream output to terminal while capturing.
90
90
  * Wraps the subprocess with a per-iteration stream-idle watchdog controlled
91
- * by RALPH_ITERATION_IDLE_TIMEOUT_MS (default 300000 ms; 0 = disabled) and
91
+ * by RALPH_ITERATION_IDLE_TIMEOUT_MS (default 900000 ms; 0 = disabled) and
92
92
  * RALPH_ITERATION_KILL_GRACE_MS (default 10000 ms).
93
93
  *
94
94
  * When the watchdog fires the returned result gains:
@@ -108,7 +108,7 @@ function _spawnOpenCode(args, verbose) {
108
108
  // Parse watchdog knobs from environment. task 2.1 (surface-autocommit-ignore-warning-and-watchdog)
109
109
  const idleTimeoutRaw = process.env.RALPH_ITERATION_IDLE_TIMEOUT_MS;
110
110
  const killGraceRaw = process.env.RALPH_ITERATION_KILL_GRACE_MS;
111
- const idleTimeoutMs = idleTimeoutRaw !== undefined ? Number(idleTimeoutRaw) : 300000;
111
+ const idleTimeoutMs = idleTimeoutRaw !== undefined ? Number(idleTimeoutRaw) : 900000;
112
112
  const killGraceMs = killGraceRaw !== undefined ? Number(killGraceRaw) : 10000;
113
113
  const watchdogEnabled = idleTimeoutMs !== 0;
114
114
 
@@ -291,6 +291,30 @@ function _extractToolUsage(text) {
291
291
  return usage;
292
292
  }
293
293
 
294
+ function _extractDetailedToolUsage(text) {
295
+ const detailed = _extractToolUsageDetails(text);
296
+ if (Array.isArray(detailed) && detailed.length > 0) {
297
+ return detailed;
298
+ }
299
+ return _extractToolUsage(text);
300
+ }
301
+
302
+ function _extractToolUsageDetails(text) {
303
+ if (!text) return [];
304
+
305
+ const match = String(text).match(/```tool-usage\s*([\s\S]*?)```/i);
306
+ if (!match) {
307
+ return [];
308
+ }
309
+
310
+ try {
311
+ const parsed = JSON.parse(match[1].trim());
312
+ return Array.isArray(parsed) ? parsed : [];
313
+ } catch {
314
+ return [];
315
+ }
316
+ }
317
+
294
318
  /**
295
319
  * Take a snapshot of dirty/untracked paths via git status.
296
320
  * Returns a Map of repo-relative file paths to existence/content fingerprints.
@@ -444,6 +468,8 @@ module.exports = {
444
468
  _spawnOpenCode,
445
469
  _looksLikeCliHelp,
446
470
  _extractToolUsage,
471
+ _extractDetailedToolUsage,
472
+ _extractToolUsageDetails,
447
473
  _gitSnapshot,
448
474
  _diffSnapshots,
449
475
  };
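`_extractToolUsageDetails` expects the agent's stdout to contain a fenced block tagged `tool-usage` whose body is a JSON array. The standalone sketch below reproduces that regex-and-parse step against a sample transcript; the `{ tool, count }` entry shape is hypothetical, since the parser accepts any array.

```javascript
'use strict';
// Standalone copy of the fenced-block parsing in _extractToolUsageDetails.
// An empty result is what makes _extractDetailedToolUsage fall back to
// _extractToolUsage. The sample transcript and entry shape are hypothetical.
const FENCE = '`'.repeat(3); // built at runtime so this snippet can sit inside a Markdown fence
const TOOL_USAGE_RE = new RegExp(`${FENCE}tool-usage\\s*([\\s\\S]*?)${FENCE}`, 'i');

function parseToolUsageBlock(stdout) {
  const match = String(stdout || '').match(TOOL_USAGE_RE);
  if (!match) return []; // no tool-usage block in the output
  try {
    const parsed = JSON.parse(match[1].trim());
    return Array.isArray(parsed) ? parsed : []; // only arrays are accepted
  } catch {
    return []; // malformed JSON is treated the same as a missing block
  }
}

const sample = [
  'Implemented task 2.3 and ran the focused verifier.',
  `${FENCE}tool-usage`,
  '[{"tool": "edit", "count": 2}, {"tool": "bash", "count": 1}]',
  FENCE,
].join('\n');
```

Calling `parseToolUsageBlock(sample)` yields the two entries; output with no block, a non-array body, or invalid JSON all collapse to `[]`.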
@@ -118,7 +118,7 @@ function render(options, iteration) {
118
118
  if (!options.promptTemplate) {
119
119
  // No template — base prompt is the whole output
120
120
  const base = loadBase(options);
121
- return base;
121
+ return _appendSupervisorInvestigationHints(base, options);
122
122
  }
123
123
 
124
124
  const templatePath = options.promptTemplate;
@@ -173,7 +173,36 @@ function render(options, iteration) {
173
173
  : '- Do not create git commits yourself. The Ralph runner manages automatic task commits when auto-commit is enabled.',
174
174
  };
175
175
 
176
- return _renderTemplate(template, vars);
176
+ let rendered = _renderTemplate(template, vars);
177
+ rendered = _appendSupervisorInvestigationHints(rendered, options);
178
+ return rendered;
179
+ }
180
+
181
+ function _appendSupervisorInvestigationHints(rendered, options = {}) {
182
+ if (options.selfHealHints === false || process.env.RALPH_SELF_HEAL_HINTS === '0') {
183
+ return rendered;
184
+ }
185
+
186
+ const hints = Array.isArray(options.supervisorHints) ? options.supervisorHints.filter(_isUsableSupervisorHint) : [];
187
+ if (hints.length === 0) {
188
+ return rendered;
189
+ }
190
+
191
+ const section = [
192
+ '## Supervisor Investigation Hints',
193
+ '',
194
+ ...hints.map((hint) => `- \`${hint.path}\`: ${hint.rationale}`),
195
+ ].join('\n');
196
+ return `${String(rendered || '').trimEnd()}\n\n${section}`;
197
+ }
198
+
199
+ function _isUsableSupervisorHint(hint) {
200
+ return Boolean(
201
+ hint &&
202
+ typeof hint === 'object' &&
203
+ typeof hint.path === 'string' && hint.path.trim() &&
204
+ typeof hint.rationale === 'string' && hint.rationale.trim()
205
+ );
177
206
  }
178
207
 
179
208
  /**
@@ -189,4 +218,12 @@ function _renderTemplate(template, vars) {
189
218
  });
190
219
  }
191
220
 
192
- module.exports = { loadBase, render, _renderTemplate, _warnThreshold, _resetWarnNotice };
221
+ module.exports = {
222
+ loadBase,
223
+ render,
224
+ _appendSupervisorInvestigationHints,
225
+ _isUsableSupervisorHint,
226
+ _renderTemplate,
227
+ _warnThreshold,
228
+ _resetWarnNotice,
229
+ };