spec-and-loop 3.1.0 → 3.3.2

@@ -21,6 +21,7 @@ Enforced rules:
  - Title is one outcome, not a list. If you need "and" twice, split.
  - Scope names files so the loop does not hunt.
  - `Done when` bullets are observable or runnable. No soft verbs (`ensure`, `support`, `validate`, `keep`) without attached evidence.
+ - Verifier commands use the narrowest runnable command that proves the scoped change. Prefer a named test file, spec pattern, package script, or static check over a full-suite command.
  - `Stop and hand off if` gives the loop written permission to halt.
 
  ## Ordering
@@ -52,22 +53,55 @@ Rules:
 
  Split test: if the loop stopped halfway, would the repo be clean and reviewable? If yes and there's a verifier for each half, split. If no half is meaningful alone, don't split.
 
+ ## Surgical validation
+
+ Task validators must be surgical and efficient so the loop spends tokens on implementation signal, not unrelated test noise.
+
+ - Start every task with the cheapest verifier that proves the task's stated scope: direct unit test file, targeted node/browser spec, exact lint/typecheck command for touched files if available, schema validator, or focused `rg` assertion.
+ - Verify command routing before writing it into `tasks.md`. If `npm test -- <pattern>` or similar still runs unrelated suites in that repo, write the direct runner command instead (for example, `pnpm exec vitest --config <config> --run <test-file>`).
+ - Use broad gates (`npm test`, `pnpm typecheck`, `make all`, browser/e2e suites) only when the task owns repo-wide integration behavior, when they are recorded as pre-flight baselines, or in a final integrated quality-gate task.
+ - If a broad gate is still required for a narrow task, pair it with explicit baseline classification: `` `<gate command>` exits 0, or failures match the pre-flight baseline with no new failures in this task's scope ``.
+ - Prefer one focused verifier per task. Add a second verifier only when it proves a different artifact class, such as a schema validator plus one targeted unit test.
+
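
The rules above could be checked mechanically before a command is written into `tasks.md`. The following is a minimal sketch, not part of spec-and-loop: `classifyVerifier`, the `BROAD_GATES` list, and the three labels are hypothetical names, and the heuristics only cover the commands the section itself mentions.

```javascript
// Hypothetical helper for illustration; spec-and-loop does not ship this.
// Broad gates named in the section above (exact-match only).
const BROAD_GATES = ['npm test', 'pnpm typecheck', 'make all'];

function classifyVerifier(command) {
  // Collapse repeated whitespace so "make   all" still matches.
  const normalized = command.trim().replace(/\s+/g, ' ');

  if (BROAD_GATES.includes(normalized)) {
    return 'broad';
  }
  // `npm test -- <pattern>` style forwarding may still run unrelated suites,
  // so it is flagged for manual confirmation rather than accepted.
  if (/^(npm|pnpm|yarn) test -- \S+/.test(normalized)) {
    return 'ambiguous';
  }
  // 'ok' only means no heuristic fired; it does not prove the command is focused.
  return 'ok';
}

module.exports = { classifyVerifier };
```

A task author might run this over every `Done when` command and review anything not labelled `ok` by hand.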
  ## Quality gates
 
  - A failing `Done when` check means the task is NOT done. No rationalization.
  - "Pre-existing" requires a before-baseline. Without one, any failure could be a regression.
  - First task in a chain that needs clean gates must be a pre-flight baseline that records gate output.
  - Explicitly distinguish known-broken validators (document and continue) from required-clean validators (hard stop). If only one is named, the loop generalizes permissively.
+ - If a pre-flight baseline records a failing gate, later tasks MUST NOT require only a strict clean result for that same gate unless the task is intentionally responsible for fixing that baseline failure. Use one of these explicit forms:
+   - Baseline classification: `` `<gate command>` exits 0, or failures match the pre-flight baseline with no new failures in this task's scope ``
+   - Authorized cleanup: `` `<gate command>` exits 0 after fixing the named baseline failures in `<path/one.ts>` and `<path/two.ts>` ``
+   - Hard blocker: `` `<gate command>` exits 0; baseline failures are not allowed for this task ``
+ - When strict clean-gate text conflicts with a failing pre-flight baseline and no classification/cleanup rule is written, `ralph-run` will warn the agent to stop with `BLOCKED_HANDOFF` instead of spending iterations on unauthorized cleanup.
+ - When a task refers to a pre-flight baseline, or follows a completed pre-flight baseline task, but the matching `.ralph/baselines/<change>-<gate>.txt` artifact is missing, `ralph-run` will warn the agent to stop with `BLOCKED_HANDOFF` instead of treating undocumented failures as known.
+ - A pre-flight baseline task must produce runner-recognizable artifacts, not just human-readable logs: baseline files must live under the change-local `.ralph/baselines/` directory that `ralph-run` reads, their filenames must identify the gate (`typecheck`, `lint`, `test`, etc.), and every captured gate file must end with a literal `EXIT=<integer>` line.
+ - If a later task is allowed to repair baseline artifact compatibility, say so explicitly. Its `Scope:` must name the change-local `.ralph/baselines/` directory and its `Done when:` bullets must require the missing or malformed baseline files to be restored with parseable `EXIT=<integer>` footers. Without that authorization, baseline artifact repair remains an operator handoff, not product implementation work.
+ - Authorized cleanup is intentionally narrow: the named files must be backticked, the cleanup is limited to compiler/lint-only fixes, and `ralph-run` gives the agent one repair attempt for those files on that task. If the gate still fails after that attempt, the next prompt tells the agent to hand off instead of retrying.
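
To make the `EXIT=<integer>` footer requirement concrete, here is a minimal sketch of how a baseline file's footer could be read. How `ralph-run` actually parses these files is not shown in this diff, so `parseBaselineExit` is a hypothetical name and the exact matching rules (final line only, optional trailing newline) are assumptions.

```javascript
// Hypothetical parser for illustration; not taken from ralph-run.
// Returns the recorded exit code, or null when the footer is missing or malformed.
function parseBaselineExit(text) {
  // Tolerate a single trailing newline after the footer.
  const lines = text.replace(/\r?\n$/, '').split(/\r?\n/);
  const last = lines[lines.length - 1];
  // The footer must be the whole final line: EXIT= followed by an integer.
  const match = /^EXIT=(-?\d+)$/.exec(last);
  return match ? Number(match[1]) : null;
}

module.exports = { parseBaselineExit };
```

Under this sketch, a `null` result corresponds to the "missing or malformed" case that the rules above treat as a handoff condition unless a repair task is explicitly authorized.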
 
  Pre-flight template:
  ```markdown
  - [ ] **Pre-flight: record quality gate baselines**
-   - Scope: no code edits
+   - Scope: no code edits; writes only under `.ralph/baselines/`
    - Change: Capture current state of all gates later tasks require.
    - Done when:
-     - `.ralph/baselines/<change>-<gate>.txt` exists for each gate with full output
-     - `.ralph/baselines/<change>-readme.md` lists passing/failing gates and exact failing identifiers
-   - Stop and hand off if: any gate is nondeterministic across two runs.
+     - `.ralph/baselines/<gate>.txt` or `.ralph/baselines/<change>-<gate>.txt` exists for each gate with full output
+     - every captured gate file ends with a literal `EXIT=<integer>` line
+     - `.ralph/baselines/<change>-readme.md` lists passing/failing gates, exit codes, and exact failing identifiers
+   - Stop and hand off if: any gate is nondeterministic across two runs, or any captured baseline file is missing the `EXIT=<integer>` final line after retrying the capture command.
+ ```
+
+ Baseline artifact compatibility repair template:
+ ```markdown
+ - [ ] **Repair pre-flight baseline artifact compatibility**
+   - Scope: `.ralph/baselines/`, `tasks.md`
+   - Change: Restore or regenerate baseline artifacts so `ralph-run` can classify later quality-gate failures.
+   - Done when:
+     - change-local `.ralph/baselines/<gate>.txt` files exist for every gate referenced by later baseline-classified tasks
+     - every restored gate file ends with a literal `EXIT=<integer>` line
+     - the baseline readme records the source of any restored artifact and the exit code for each gate
+   - Stop and hand off if:
+     - the original gate output is missing, the original exit code cannot be recovered, or restoring the artifact would require rerunning a nondeterministic gate.
  ```
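
A pre-flight task following the template above needs some way to capture a gate's output and append the footer. The sketch below shows one possible approach using Node's `child_process.spawnSync`; `formatBaseline` and `captureGate` are hypothetical names, the `<gate>.txt` filename follows the template, and treating a signal-killed process as exit code 1 is an assumption rather than documented `ralph-run` behavior.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: joins gate output with the literal EXIT=<integer> footer.
function formatBaseline(output, exitCode) {
  // Make sure the footer starts on its own line.
  const body = output === '' || output.endsWith('\n') ? output : `${output}\n`;
  return `${body}EXIT=${exitCode}\n`;
}

// Hypothetical helper: runs one gate and writes <baselineDir>/<gate>.txt.
function captureGate(gate, command, args, baselineDir) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  // stdout and stderr are concatenated, not interleaved in emission order.
  const output = `${result.stdout || ''}${result.stderr || ''}`;
  // status is null when the process was killed by a signal or failed to spawn;
  // recording 1 in that case is an assumption of this sketch.
  const exitCode = result.status === null ? 1 : result.status;

  fs.mkdirSync(baselineDir, { recursive: true });
  const file = path.join(baselineDir, `${gate}.txt`);
  fs.writeFileSync(file, formatBaseline(output, exitCode));
  return { file, exitCode };
}

module.exports = { formatBaseline, captureGate };
```

For example, `captureGate('typecheck', 'pnpm', ['typecheck'], '.ralph/baselines')` would write `.ralph/baselines/typecheck.txt`. Running the capture twice and comparing the two files is one way to check the template's nondeterminism stop condition.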
 
  ## Anti-patterns (do not do these)
@@ -80,6 +114,8 @@ Pre-flight template:
  - `Done when` that only checks unit tests when real behavior is end-to-end
  - Visual verification without splitting from code changes (context overflow risk)
  - "Maybe this, maybe that" wording in tasks or specs once loop starts
+ - Repo-wide or slow validators for a narrow task when a focused verifier exists (`npm test`, `make all`, full browser/e2e suites)
+ - Ambiguous package-manager forwarding such as `npm test -- event-schema` unless confirmed to execute only the intended test scope
 
  ## Examples
 
@@ -119,7 +155,7 @@ Pre-flight template:
    - Change: Harbor components registered once at boot, typed for TSX.
    - Done when:
      - `rg "registerHarbor" src` returns exactly one call site
-     - `npm test -- harbor-bootstrap` passes
+     - `npm exec vitest --run src/components/harbor-bootstrap.test.tsx` exits 0
    - Stop and hand off if: more than one registration site is required.
  ```
 
@@ -136,7 +172,7 @@ Pre-flight template:
    - Change: ReleaseCard renders timestamps through the shared helper.
    - Done when:
      - `rg "toLocaleDateString" src/components/ReleaseCard.tsx` returns no matches
-     - `npm test -- ReleaseCard` passes
+     - `npm exec vitest --run src/components/ReleaseCard.test.tsx` exits 0
    - Stop and hand off if: `formatDate` does not cover a required locale.
  ```
 
@@ -52,6 +52,10 @@ function read(ralphDir) {
  * @param {Array} entry.toolUsage - Tool usage summary array
  * @param {Array} entry.filesChanged - Files changed in this iteration
  * @param {number} entry.exitCode - OpenCode exit code
+ * @param {boolean} [entry.blockedHandoffDetected] - Whether the iteration emitted
+ * the configured blocked-handoff promise and stopped for operator action.
+ * @param {string} [entry.blockedHandoffNote] - Compact, single-line preview of
+ * the extracted blocker note. The full note is persisted in HANDOFF.md.
  * @param {number} [entry.promptBytes] - UTF-8 byte length of the assembled prompt
  * @param {number} [entry.promptChars] - Character length of the assembled prompt
  * @param {number} [entry.promptTokens] - Estimated token count for the prompt (chars/4, rounded)
@@ -35,6 +35,7 @@ const prompt = require('./prompt');
  * @param {number} [options.maxIterations] - Maximum iterations (default: 50)
  * @param {string} [options.completionPromise] - Promise string signaling loop completion (default: "COMPLETE")
  * @param {string} [options.taskPromise] - Promise string signaling task completion (default: "READY_FOR_NEXT_TASK")
+ * @param {string} [options.blockedHandoffPromise] - Promise string signaling the agent is blocked and requesting human handoff (default: "BLOCKED_HANDOFF")
  * @param {boolean} [options.tasksMode] - Enable tasks mode (default: false)
  * @param {string} [options.tasksFile] - Path to tasks file when tasksMode is true
  * @param {boolean} [options.noCommit] - Suppress auto-commit (default: false)