npm - spec-and-loop - Versions diffs - 3.3.2 → 3.3.4 - Mend

spec-and-loop 3.3.2 → 3.3.4

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (21)

package/OPENSPEC-RALPH-BP.md +20 -0
package/QUICKSTART.md +2 -0
package/README.md +20 -2
package/lib/mini-ralph/history.js +77 -2
package/lib/mini-ralph/index.js +8 -0
package/lib/mini-ralph/invoker.js +29 -3
package/lib/mini-ralph/prompt.js +40 -3
package/lib/mini-ralph/runner-autocommit.js +440 -0
package/lib/mini-ralph/runner-baseline-gate.js +431 -0
package/lib/mini-ralph/runner-handoff.js +338 -0
package/lib/mini-ralph/runner-pending-dirty.js +168 -0
package/lib/mini-ralph/runner.js +518 -1202
package/lib/mini-ralph/state.js +35 -3
package/lib/mini-ralph/status.js +37 -1
package/lib/mini-ralph/supervisor-rules.js +379 -0
package/lib/mini-ralph/supervisor-state.js +218 -0
package/lib/mini-ralph/supervisor.js +1319 -0
package/package.json +1 -1
package/scripts/mini-ralph-cli.js +75 -2
package/scripts/ralph-run.sh +121 -2
package/scripts/supervisor-prompt.md +134 -0

package/OPENSPEC-RALPH-BP.md CHANGED

@@ -2,6 +2,8 @@
 You are writing `tasks.md` for an OpenSpec change that will be executed by `ralph-run` in a fresh-session loop. Every iteration re-reads this file plus proposal.md, design.md, and specs. The loop implements one task per iteration, runs verification, and marks progress only on success.
+When an iteration emits `BLOCKED_HANDOFF`, the runner may now invoke the supervisor loop before surfacing the handoff to a human. That makes the implementer's structured blocker note load-bearing: it must describe the real scope conflict or missing precondition precisely enough for either a human or the supervisor to patch `tasks.md` without guessing.
 ## Task template
 Every `- [ ]` checkbox must follow this shape:
@@ -117,6 +119,24 @@ Baseline artifact compatibility repair template:
 - Repo-wide or slow validators for a narrow task when a focused verifier exists (`npm test`, `make all`, full browser/e2e suites)
 - Ambiguous package-manager forwarding such as `npm test -- event-schema` unless confirmed to execute only the intended test scope
+## Pre-loop scope-handoff pre-scan
+Before handing `tasks.md` to `ralph-run`, audit every pending `- [ ]` checkbox for the seven failure modes that most commonly cause `BLOCKED_HANDOFF` mid-loop. Each one is cheap to spot statically and expensive to discover after the loop has burned an iteration plus an auto-resolve attempt on it.
+For every pending task, verify in this order:
+1. **Referenced files exist.** Every path in `Scope:`, `Done when:`, and `Stop and hand off if:` resolves with `ls` or `git ls-files`. Dangling references (`SPEC.md` when only `SPEC-IA.md` exists, line ranges that drifted after a prior task, deleted fixtures) cause the agent to either hand off or hallucinate.
+2. **Referenced sections exist.** `### Acceptance Criteria items 1–9` must point to a real numbered list in a real document. Heading references must match the actual heading text (case-sensitive `rg "^## <heading>$" <file>`).
+3. **Verifier scope matches scope statement.** If `Scope:` names one file but `Done when:` runs a command that touches more (`pnpm test:update-snapshots` regenerates snapshots for *all* test files, not just one), either broaden `Scope:` to match the verifier's reach or replace the verifier with a narrower one (`vitest --run <single-test-file>`).
+4. **Pre-existing failures are classified.** If the verifier is a multi-file gate and the repo has known unrelated failures, they must be enumerated in a "Pre-existing unrelated failures" sub-section with file:line references and an explicit "do not stop on these" clause. See [Quality gates](#quality-gates).
+5. **Stop-conditions are objective.** Phrases like "diffs that cannot be explained" or "behavior looks wrong" are subjective and the agent will either over-trigger or under-trigger. Replace with grep-able evidence: "snapshot diff contains `atm-*` class on a docs MDX content element," "`hbr-tab-panel` appears in the rendered DOM."
+6. **Manual-only tasks are flagged.** Any task whose `Done when:` requires a human in a browser, eyes on a deployed URL, or visual judgement must be tagged with a `[manual]` marker in its title and have an explicit `Stop and hand off if:` line that says "manual verification required — emit BLOCKED_HANDOFF with verification template." This makes the handoff intentional, not a mid-loop surprise.
+7. **Cross-task scope conflicts are absent.** If task N writes a file that task N-1 already finished, or task N's `Stop and hand off if:` would trigger on the normal completion of task N+1, reorder or merge them. Read tasks in execution order and confirm no two tasks claim ownership of the same file/route/symbol.
+If any check fails, edit `tasks.md` before starting the loop. The cost of a static edit is a few seconds; the cost of discovering the same issue at iteration 21 is a `BLOCKED_HANDOFF`, a dirty worktree, and a context-poisoned restart.
+This pre-scan is mandatory before `ralph-run` on a freshly authored or freshly edited `tasks.md`. After remediation, re-run `openspec validate <change>` to confirm the change still validates.
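The first pre-scan check above (referenced files exist) could be automated along these lines. This is a minimal sketch, not part of spec-and-loop: `findDanglingPaths`, `looksLikePath`, the assumption that task fields appear as `- Scope:` style bullets, and the backtick-based path heuristic are all hypothetical.

```javascript
// Sketch only: these helpers are hypothetical and not exported by spec-and-loop.
const fs = require('node:fs');
const path = require('node:path');

// Task fields that the pre-scan says must only reference real files.
// Assumption: each field sits on its own line, optionally as a "- " bullet.
const FIELD_PATTERN = /^\s*(?:-\s*)?(Scope|Done when|Stop and hand off if):/;

// Heuristic: a backticked token is a path if it has no whitespace and
// contains a slash or ends in a file extension. Commands such as
// `npm test` are skipped because they contain a space.
function looksLikePath(token) {
  return !/\s/.test(token) && (token.includes('/') || /\.[A-Za-z0-9]+$/.test(token));
}

// Return every path-like token in a task field that does not exist on disk.
function findDanglingPaths(tasksMarkdown, repoRoot) {
  const dangling = [];
  for (const line of tasksMarkdown.split('\n')) {
    if (!FIELD_PATTERN.test(line)) continue;
    for (const match of line.matchAll(/`([^`]+)`/g)) {
      const token = match[1];
      if (looksLikePath(token) && !fs.existsSync(path.join(repoRoot, token))) {
        dangling.push(token);
      }
    }
  }
  return dangling;
}
```

A non-empty result would mean `tasks.md` needs an edit before the loop starts. The heuristic does not cover paths passed as arguments inside a backticked command, so it complements a manual read rather than replacing it.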
 ## Examples
 **Bad** — vague, no verifier:

package/QUICKSTART.md CHANGED

@@ -75,6 +75,8 @@ ralph-run --change add-hello-world
 - Create a runner-managed task commit when auto-commit is enabled and task-scoped staging succeeds
 - Track progress in tasks.md
+If you want the handoff-recovery details and self-heal flags, see [`README.md#supervisor-loop`](./README.md#supervisor-loop).
 ## What Just Happened?
 1. **Created a spec** with OpenSpec

package/README.md CHANGED

@@ -232,6 +232,24 @@ git log --oneline
 ralph-run --add-context "Prefer async/await over callbacks"
 ```
+## Supervisor Loop
+When an iteration emits `BLOCKED_HANDOFF` for a structural task-shaping problem that the existing fast-path classifier cannot safely resolve, `ralph-run` can invoke a bounded supervisor pass that patches `tasks.md`, validates the change with `npx openspec validate <change> --strict`, and then hands the next iteration a cleaner task instead of stopping immediately for manual intervention. Set `RALPH_SELF_HEAL=0` or pass `--no-self-heal` to restore the fully manual handoff flow.
+| Flag | Env var | Default | Effect |
+|------|---------|---------|--------|
+| `--no-self-heal` | `RALPH_SELF_HEAL=0` | self-heal enabled | Disable supervisor self-heal and exit on unresolved `BLOCKED_HANDOFF` as before. |
+| `--self-heal-max-tries <n>` | `RALPH_SELF_HEAL_MAX_TRIES` | `3` | Cap supervisor tries per blocker event. |
+| `--no-self-heal-downstream` | `RALPH_SELF_HEAL_DOWNSTREAM=0` | downstream patching enabled | Prevent supervisor edits to downstream pending tasks. |
+| `--no-self-heal-hints` | `RALPH_SELF_HEAL_HINTS=0` | hints enabled | Disable `## Supervisor Investigation Hints` injection into the next implementer prompt. |
+| `--no-self-heal-log-access` | `RALPH_SELF_HEAL_LOG_ACCESS=0` | log-path injection enabled | Keep supervisor prompts from receiving run-log paths. |
+| `--self-heal-verbose` | `RALPH_SELF_HEAL_VERBOSE=1` | verbose off | Emit supervisor debug logging. |
+| `--no-self-heal-verbose` | `RALPH_SELF_HEAL_VERBOSE=0` | inherits `--verbose` when present | Force supervisor debug logging off, even if `--verbose` is set for the main runner. |
+### Token economy
+Supervisor prompts keep the expensive rule context on by default but aggressively compact it: Tier 1 trims `downstream_tasks`, `change_design`, `change_proposal`, and retry-only context; Tier 2 distills `OPENSPEC-RALPH-BP.md`; Tier 3 consolidates per-patch rationale into summary fields. Escape hatches are available per variable: `RALPH_SELF_HEAL_FULL_DOWNSTREAM=1`, `RALPH_SELF_HEAL_FULL_DESIGN=1`, `RALPH_SELF_HEAL_FULL_PROPOSAL=1`, and `RALPH_SELF_HEAL_FULL_BP_CONTEXT=1` restore verbatim inputs; `RALPH_SELF_HEAL_KEEP_DOWNSTREAM_ON_RETRY=1` and `RALPH_SELF_HEAL_KEEP_HANDOFF_HISTORY_ON_RETRY=1` keep retry-suppressed sections; `RALPH_SELF_HEAL_PER_PATCH_RATIONALES=1` restores per-patch rationale output when you need the uncompressed response shape.
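The flag and environment-variable table in the Supervisor Loop section above could be resolved into one options object roughly as follows. This is a sketch under stated assumptions: `resolveSelfHealOptions` and the shape of `flags` are hypothetical, the README does not say how a flag and its env var interact (here either one can switch a feature off), and the fallback to `3` for an invalid try count is also an assumption.

```javascript
// Sketch only: `resolveSelfHealOptions` and the `flags` shape are hypothetical.
function resolveSelfHealOptions(flags = {}, env = process.env) {
  // A feature stays on unless its --no-* flag or its `=0` env var turns it off.
  const onUnlessOff = (flagValue, envName) =>
    flagValue !== false && env[envName] !== '0';

  // Assumption: a missing or invalid value falls back to the documented default of 3.
  const parsedTries = Number.parseInt(
    flags.selfHealMaxTries ?? env.RALPH_SELF_HEAL_MAX_TRIES ?? '',
    10
  );

  // Verbose inherits the runner's --verbose unless a self-heal setting overrides it;
  // an explicit "off" is applied last so it wins, as the table describes.
  let verbose = Boolean(flags.verbose);
  if (flags.selfHealVerbose === true || env.RALPH_SELF_HEAL_VERBOSE === '1') verbose = true;
  if (flags.selfHealVerbose === false || env.RALPH_SELF_HEAL_VERBOSE === '0') verbose = false;

  return {
    selfHeal: onUnlessOff(flags.selfHeal, 'RALPH_SELF_HEAL'),
    maxTries: Number.isInteger(parsedTries) && parsedTries > 0 ? parsedTries : 3,
    downstream: onUnlessOff(flags.selfHealDownstream, 'RALPH_SELF_HEAL_DOWNSTREAM'),
    hints: onUnlessOff(flags.selfHealHints, 'RALPH_SELF_HEAL_HINTS'),
    logAccess: onUnlessOff(flags.selfHealLogAccess, 'RALPH_SELF_HEAL_LOG_ACCESS'),
    verbose,
  };
}
```

Passing `env` as a parameter keeps the function pure, which makes the defaults in the table easy to check in isolation.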
 ## Example Workflow
 ```bash
@@ -472,7 +490,7 @@ For common issues and solutions, see [QUICKSTART.md#troubleshooting](./QUICKSTAR
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `RALPH_BASE_PROMPT_WARN_BYTES` | `4096` | Byte threshold above which `render()` emits a one-line warning to stderr when `{{base_prompt}}` resolves to a large file. Set to `0` to silence warnings entirely. Invalid values fall back to `4096` with a one-time notice per process. |
-| `RALPH_ITERATION_IDLE_TIMEOUT_MS` | `300000` | Milliseconds of silence on stdout+stderr before the per-iteration idle watchdog fires. Set to `0` to disable the watchdog entirely and restore pre-change behavior (no timeout). |
+| `RALPH_ITERATION_IDLE_TIMEOUT_MS` | `900000` | Milliseconds of silence on stdout+stderr before the per-iteration idle watchdog fires (default 15 minutes). Set to `0` to disable the watchdog entirely and restore pre-change behavior (no timeout). |
 | `RALPH_ITERATION_KILL_GRACE_MS` | `10000` | Milliseconds the runner waits after sending `SIGTERM` to a timed-out iteration child before escalating to `SIGKILL`. |
 ### Auto-commit ignore-filter surfacing and iteration watchdog
@@ -516,7 +534,7 @@ The `iteration_timeout_idle` reason also appears in the `## Recent Loop Signals`
 Set `RALPH_ITERATION_IDLE_TIMEOUT_MS=0` to disable the watchdog if your agent workflow runs legitimately long silent tools (e.g., large integration test suites). Example:
 ```bash
-RALPH_ITERATION_IDLE_TIMEOUT_MS=900000 ralph-run --change my-feature   # 15-minute idle threshold
+RALPH_ITERATION_IDLE_TIMEOUT_MS=1800000 ralph-run --change my-feature  # 30-minute idle threshold
 RALPH_ITERATION_IDLE_TIMEOUT_MS=0 ralph-run --change my-feature        # watchdog disabled
 ```
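The watchdog settings described above (a 900000 ms default, with `0` disabling the watchdog) can be sketched as a small resolver. `resolveIdleWatchdog` is a hypothetical helper, not a package export, and the fallback for non-numeric or negative values is an assumption of this sketch rather than documented behavior.

```javascript
// Sketch only: `resolveIdleWatchdog` is a hypothetical helper.
const DEFAULT_IDLE_TIMEOUT_MS = 900000; // 15 minutes, the default as of 3.3.4

function resolveIdleWatchdog(env = process.env) {
  const raw = env.RALPH_ITERATION_IDLE_TIMEOUT_MS;
  let timeoutMs = raw === undefined ? DEFAULT_IDLE_TIMEOUT_MS : Number(raw);

  // Assumption: anything that is not a finite, non-negative number
  // falls back to the default instead of reaching the timer.
  if (!Number.isFinite(timeoutMs) || timeoutMs < 0) {
    timeoutMs = DEFAULT_IDLE_TIMEOUT_MS;
  }

  // A value of 0 disables the watchdog entirely.
  return { enabled: timeoutMs !== 0, timeoutMs };
}
```

Because `Number('')` evaluates to `0` in JavaScript, an empty variable disables the watchdog in this sketch, the same as an explicit `0`.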

package/lib/mini-ralph/history.js CHANGED

@@ -12,6 +12,27 @@ const fs = require('fs');
 const path = require('path');
 const HISTORY_FILE = 'ralph-history.json';
+const SUPERVISOR_ITERATION_FIELDS = [
+  'supervisorInvoked',
+  'supervisorTryIndex',
+  'supervisorOutcome',
+  'supervisorPatchedTasks',
+  'supervisorBlockerHash',
+  'supervisorSoftWarnings',
+  'supervisorHints',
+  'supervisorHintsDropped',
+  'supervisorReadLogs',
+  'supervisorReadLogsBytes',
+];
+const SUPERVISOR_EDIT_REQUIRED_FIELDS = [
+  'iteration',
+  'blockerHash',
+  'tryIndex',
+  'taskNumber',
+  'rationaleSummary',
+  'validatorOk',
+  'softWarnings',
+];
 /**
  * Return the absolute path to the history file.
@@ -34,7 +55,7 @@ function read(ralphDir) {
   if (!fs.existsSync(file)) return [];
   try {
     const parsed = JSON.parse(fs.readFileSync(file, 'utf8'));
-    return Array.isArray(parsed) ? parsed : [];
+    return Array.isArray(parsed) ? parsed.map(_normalizeHistoryEntry) : [];
   } catch {
     return [];
   }
@@ -97,7 +118,7 @@ function read(ralphDir) {
 function append(ralphDir, entry) {
   _ensureDir(ralphDir);
   const entries = read(ralphDir);
-  entries.push(Object.assign({ timestamp: new Date().toISOString() }, entry));
+  entries.push(_normalizeHistoryEntry(Object.assign({ timestamp: new Date().toISOString() }, entry)));
   _write(ralphDir, entries);
 }
@@ -137,4 +158,58 @@ function _write(ralphDir, data) {
   fs.writeFileSync(historyPath(ralphDir), JSON.stringify(data, null, 2), 'utf8');
 }
+function _normalizeHistoryEntry(entry) {
+  if (!entry || typeof entry !== 'object' || Array.isArray(entry)) {
+    return entry;
+  }
+  if (entry.type === 'supervisorEdit') {
+    return _normalizeSupervisorEditEntry(entry);
+  }
+  return _normalizeIterationEntry(entry);
+}
+function _normalizeIterationEntry(entry) {
+  const normalized = Object.assign({}, entry);
+  if (!_hasAnyOwnField(normalized, SUPERVISOR_ITERATION_FIELDS)) {
+    return normalized;
+  }
+  if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorPatchedTasks') &&
+    !Array.isArray(normalized.supervisorPatchedTasks)) {
+    normalized.supervisorPatchedTasks = [];
+  }
+  if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorSoftWarnings') &&
+    !Array.isArray(normalized.supervisorSoftWarnings)) {
+    normalized.supervisorSoftWarnings = [];
+  }
+  if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorHints') &&
+    !Array.isArray(normalized.supervisorHints)) {
+    normalized.supervisorHints = [];
+  }
+  if (Object.prototype.hasOwnProperty.call(normalized, 'supervisorHintsDropped') &&
+    !Array.isArray(normalized.supervisorHintsDropped)) {
+    normalized.supervisorHintsDropped = [];
+  }
+  return normalized;
+}
+function _normalizeSupervisorEditEntry(entry) {
+  const normalized = Object.assign({}, entry);
+  for (const field of SUPERVISOR_EDIT_REQUIRED_FIELDS) {
+    if (!Object.prototype.hasOwnProperty.call(normalized, field)) {
+      normalized[field] = field === 'softWarnings' ? [] : null;
+    }
+  }
+  if (!Array.isArray(normalized.softWarnings)) {
+    normalized.softWarnings = [];
+  }
+  return normalized;
+}
+function _hasAnyOwnField(object, fields) {
+  return fields.some((field) => Object.prototype.hasOwnProperty.call(object, field));
+}
 module.exports = { read, append, recent, clear, historyPath };
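To illustrate the effect of the `supervisorEdit` normalization added in this file, here is a condensed, self-contained restatement of that logic applied to a made-up history entry. The name `normalizeSupervisorEdit` and the sample entry are hypothetical; the field list is copied from `SUPERVISOR_EDIT_REQUIRED_FIELDS` in the diff.

```javascript
// Sketch only: a condensed restatement of the normalization shown in the diff.
const REQUIRED_FIELDS = [
  'iteration',
  'blockerHash',
  'tryIndex',
  'taskNumber',
  'rationaleSummary',
  'validatorOk',
  'softWarnings',
];

function normalizeSupervisorEdit(entry) {
  const normalized = { ...entry }; // never mutate the caller's object
  for (const field of REQUIRED_FIELDS) {
    if (!Object.prototype.hasOwnProperty.call(normalized, field)) {
      // Missing fields become null, except softWarnings, which becomes [].
      normalized[field] = field === 'softWarnings' ? [] : null;
    }
  }
  // A present but non-array softWarnings value is also reset to [].
  if (!Array.isArray(normalized.softWarnings)) {
    normalized.softWarnings = [];
  }
  return normalized;
}

// Hypothetical legacy entry: two required fields present, one malformed.
const legacyEntry = { type: 'supervisorEdit', iteration: 21, softWarnings: 'none' };
const normalizedEntry = normalizeSupervisorEdit(legacyEntry);
```

Since `read()` now maps every entry through the normalizer, consumers of the history file can rely on these keys being present without their own existence checks.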

package/lib/mini-ralph/index.js CHANGED

@@ -22,6 +22,13 @@ const errors = require('./errors');
 const tasks = require('./tasks');
 const status = require('./status');
 const prompt = require('./prompt');
+const supervisor = require('./supervisor');
+// Full passthrough of the supervisor module so tests and downstream code can
+// reach every documented helper without drift between this curated list and
+// the actual `module.exports` in `supervisor.js`. Task 5.5 originally listed a
+// curated subset; using the module directly removes that drift risk.
+const _supervisor = supervisor;
 /**
  * Run the mini Ralph loop with the provided options.
@@ -91,4 +98,5 @@ module.exports = {
   _prompt: prompt,
   _runner: runner,
   _status: status,
+  _supervisor,
 };

package/lib/mini-ralph/invoker.js CHANGED

@@ -73,7 +73,7 @@ async function invoke(opts) {
     stderr: result.stderr,
     exitCode: result.exitCode,
     signal: result.signal,
-    toolUsage: _extractToolUsage(result.stdout),
+    toolUsage: _extractDetailedToolUsage(result.stdout),
     filesChanged,
     // Pass through watchdog fields when present (task 2.1)
     ...(result.failureReason !== undefined && {
@@ -88,7 +88,7 @@ async function invoke(opts) {
 /**
  * Spawn the opencode process and stream output to terminal while capturing.
  * Wraps the subprocess with a per-iteration stream-idle watchdog controlled
- * by RALPH_ITERATION_IDLE_TIMEOUT_MS (default 300000 ms; 0 = disabled) and
+ * by RALPH_ITERATION_IDLE_TIMEOUT_MS (default 900000 ms; 0 = disabled) and
  * RALPH_ITERATION_KILL_GRACE_MS (default 10000 ms).
  *
  * When the watchdog fires the returned result gains:
@@ -108,7 +108,7 @@ function _spawnOpenCode(args, verbose) {
   // Parse watchdog knobs from environment. task 2.1 (surface-autocommit-ignore-warning-and-watchdog)
   const idleTimeoutRaw = process.env.RALPH_ITERATION_IDLE_TIMEOUT_MS;
   const killGraceRaw   = process.env.RALPH_ITERATION_KILL_GRACE_MS;
-  const idleTimeoutMs  = idleTimeoutRaw !== undefined ? Number(idleTimeoutRaw) : 300000;
+  const idleTimeoutMs  = idleTimeoutRaw !== undefined ? Number(idleTimeoutRaw) : 900000;
   const killGraceMs    = killGraceRaw   !== undefined ? Number(killGraceRaw)   : 10000;
   const watchdogEnabled = idleTimeoutMs !== 0;
@@ -291,6 +291,30 @@ function _extractToolUsage(text) {
   return usage;
 }
+function _extractDetailedToolUsage(text) {
+  const detailed = _extractToolUsageDetails(text);
+  if (Array.isArray(detailed) && detailed.length > 0) {
+    return detailed;
+  }
+  return _extractToolUsage(text);
+}
+function _extractToolUsageDetails(text) {
+  if (!text) return [];
+  const match = String(text).match(/```tool-usage\s*([\s\S]*?)```/i);
+  if (!match) {
+    return [];
+  }
+  try {
+    const parsed = JSON.parse(match[1].trim());
+    return Array.isArray(parsed) ? parsed : [];
+  } catch {
+    return [];
+  }
+}
 /**
  * Take a snapshot of dirty/untracked paths via git status.
  * Returns a Map of repo-relative file paths to existence/content fingerprints.
@@ -444,6 +468,8 @@ module.exports = {
   _spawnOpenCode,
   _looksLikeCliHelp,
   _extractToolUsage,
+  _extractDetailedToolUsage,
+  _extractToolUsageDetails,
   _gitSnapshot,
   _diffSnapshots,
 };
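The fenced `tool-usage` extraction added in this file can be exercised in isolation. The sketch below restates the regex and parsing from the diff as a standalone function; the function name, the sample agent output, and the `{ tool, count }` entry shape are hypothetical, since the diff does not show what the agent actually emits.

```javascript
// Sketch only: standalone restatement of the fenced-block extraction in the diff.
function extractToolUsageDetails(text) {
  if (!text) return [];
  // Capture the body of the first fenced block tagged `tool-usage`.
  const match = String(text).match(/```tool-usage\s*([\s\S]*?)```/i);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[1].trim());
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // malformed JSON is treated the same as "no block"
  }
}

// Build the fence at runtime so this example does not nest a literal fence.
const FENCE = '`'.repeat(3);

// Hypothetical agent output; the entry shape is an assumption.
const sampleStdout = [
  'Finished task 2.1.',
  `${FENCE}tool-usage`,
  '[{"tool":"read","count":3}]',
  FENCE,
].join('\n');

const sampleUsage = extractToolUsageDetails(sampleStdout);
```

An empty array here is what lets `_extractDetailedToolUsage` in the diff fall back to the older `_extractToolUsage` parser.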

package/lib/mini-ralph/prompt.js CHANGED

@@ -118,7 +118,7 @@ function render(options, iteration) {
   if (!options.promptTemplate) {
     // No template — base prompt is the whole output
     const base = loadBase(options);
-    return base;
+    return _appendSupervisorInvestigationHints(base, options);
   }
   const templatePath = options.promptTemplate;
@@ -173,7 +173,36 @@ function render(options, iteration) {
       : '- Do not create git commits yourself. The Ralph runner manages automatic task commits when auto-commit is enabled.',
   };
-  return _renderTemplate(template, vars);
+  let rendered = _renderTemplate(template, vars);
+  rendered = _appendSupervisorInvestigationHints(rendered, options);
+  return rendered;
+}
+function _appendSupervisorInvestigationHints(rendered, options = {}) {
+  if (options.selfHealHints === false || process.env.RALPH_SELF_HEAL_HINTS === '0') {
+    return rendered;
+  }
+  const hints = Array.isArray(options.supervisorHints) ? options.supervisorHints.filter(_isUsableSupervisorHint) : [];
+  if (hints.length === 0) {
+    return rendered;
+  }
+  const section = [
+    '## Supervisor Investigation Hints',
+    '',
+    ...hints.map((hint) => `- \`${hint.path}\`: ${hint.rationale}`),
+  ].join('\n');
+  return `${String(rendered || '').trimEnd()}\n\n${section}`;
+}
+function _isUsableSupervisorHint(hint) {
+  return Boolean(
+    hint &&
+    typeof hint === 'object' &&
+    typeof hint.path === 'string' && hint.path.trim() &&
+    typeof hint.rationale === 'string' && hint.rationale.trim()
+  );
 }
 /**
@@ -189,4 +218,12 @@ function _renderTemplate(template, vars) {
   });
 }
-module.exports = { loadBase, render, _renderTemplate, _warnThreshold, _resetWarnNotice };
+module.exports = {
+  loadBase,
+  render,
+  _appendSupervisorInvestigationHints,
+  _isUsableSupervisorHint,
+  _renderTemplate,
+  _warnThreshold,
+  _resetWarnNotice,
+};
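The hint-injection change above appends a `## Supervisor Investigation Hints` section to the rendered prompt. The sketch below restates that formatting as a standalone function so the resulting text is easy to see; `appendHints`, `isUsableHint`, and the sample hints are hypothetical, and the option and environment-variable checks from the diff are omitted for brevity.

```javascript
// Sketch only: standalone restatement of the hint formatting in the diff.
function isUsableHint(hint) {
  return Boolean(
    hint &&
    typeof hint === 'object' &&
    typeof hint.path === 'string' && hint.path.trim() &&
    typeof hint.rationale === 'string' && hint.rationale.trim()
  );
}

function appendHints(rendered, hints) {
  const usable = (Array.isArray(hints) ? hints : []).filter(isUsableHint);
  if (usable.length === 0) return rendered; // nothing to add, prompt unchanged

  const section = [
    '## Supervisor Investigation Hints',
    '',
    ...usable.map((hint) => `- \`${hint.path}\`: ${hint.rationale}`),
  ].join('\n');

  // Trailing whitespace is trimmed so exactly one blank line precedes the section.
  return `${String(rendered || '').trimEnd()}\n\n${section}`;
}

// Hypothetical hints: the second has an empty path and is filtered out.
const promptWithHints = appendHints('Implement task 3.2.\n', [
  { path: 'src/router.js', rationale: 'route table is defined here' },
  { path: '', rationale: 'dropped because the path is empty' },
]);
```

Filtering before formatting means a malformed hint from the supervisor is dropped rather than rendered as a broken bullet in the implementer prompt.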