ralphctl 0.5.0 → 0.6.0

Files changed (58)
  1. package/README.md +29 -16
  2. package/dist/absolute-path-WUTZQ37D.mjs +8 -0
  3. package/dist/chunk-6RDMCLWU.mjs +108 -0
  4. package/dist/chunk-HIU74KTO.mjs +1046 -0
  5. package/dist/chunk-S3PTDH57.mjs +78 -0
  6. package/dist/chunk-WV4D2CPG.mjs +26 -0
  7. package/dist/cli.mjs +22413 -717
  8. package/dist/manifest.json +24 -0
  9. package/dist/prompt-adapter-JQICGVX7.mjs +7 -0
  10. package/dist/prompts/ideate.md +3 -1
  11. package/dist/prompts/plan-auto.md +23 -8
  12. package/dist/prompts/plan-common-examples.md +3 -3
  13. package/dist/prompts/plan-common.md +6 -5
  14. package/dist/prompts/plan-interactive.md +30 -7
  15. package/dist/prompts/repo-onboard.md +154 -64
  16. package/dist/prompts/signals-task.md +3 -0
  17. package/dist/prompts/sprint-feedback.md +3 -0
  18. package/dist/prompts/task-evaluation.md +74 -53
  19. package/dist/prompts/task-execution.md +65 -21
  20. package/dist/prompts/ticket-refine.md +11 -8
  21. package/dist/prompts/validation-checklist.md +3 -2
  22. package/dist/skills/default/abstraction-first/SKILL.md +45 -0
  23. package/dist/skills/default/alignment/SKILL.md +46 -0
  24. package/dist/skills/default/iterative-review/SKILL.md +48 -0
  25. package/dist/skills/exec/.gitkeep +0 -0
  26. package/dist/skills/plan/.gitkeep +0 -0
  27. package/dist/skills/refine/.gitkeep +0 -0
  28. package/dist/storage-paths-IPNZZM5D.mjs +15 -0
  29. package/dist/validation-error-QT6Q7FYU.mjs +7 -0
  30. package/package.json +9 -4
  31. package/dist/add-67UFUI54.mjs +0 -17
  32. package/dist/add-DVPVHENV.mjs +0 -18
  33. package/dist/bootstrap-FMHG6DRY.mjs +0 -11
  34. package/dist/chunk-62HYDA7L.mjs +0 -1128
  35. package/dist/chunk-747KW2RW.mjs +0 -24
  36. package/dist/chunk-BSB4EDGR.mjs +0 -260
  37. package/dist/chunk-BT5FKIZX.mjs +0 -787
  38. package/dist/chunk-CBMFRQ4Y.mjs +0 -441
  39. package/dist/chunk-CFUVE2BP.mjs +0 -16
  40. package/dist/chunk-D6QZNEYN.mjs +0 -5520
  41. package/dist/chunk-FNAAA32W.mjs +0 -103
  42. package/dist/chunk-GQ2WFKBN.mjs +0 -269
  43. package/dist/chunk-IWXBJD2D.mjs +0 -27
  44. package/dist/chunk-OGEXYSFS.mjs +0 -228
  45. package/dist/chunk-VAZ3LJBI.mjs +0 -179
  46. package/dist/chunk-WDMLPXOD.mjs +0 -363
  47. package/dist/chunk-XN2UIHBY.mjs +0 -589
  48. package/dist/chunk-ZE2BRQA2.mjs +0 -5542
  49. package/dist/create-Z635FQKO.mjs +0 -15
  50. package/dist/handle-23EFF3BE.mjs +0 -22
  51. package/dist/mount-NCYR22SN.mjs +0 -7434
  52. package/dist/project-DQHF4ISP.mjs +0 -34
  53. package/dist/prompts/check-script-discover.md +0 -69
  54. package/dist/prompts/ideate-auto.md +0 -195
  55. package/dist/prompts/task-evaluation-resume.md +0 -41
  56. package/dist/resolver-OVPYVW6Q.mjs +0 -163
  57. package/dist/sprint-4E26AB5F.mjs +0 -38
  58. package/dist/start-T34NI3LF.mjs +0 -19
@@ -19,6 +19,10 @@ These verification criteria are the pre-agreed definition of "done" — your pri
19
19
 
20
20
  </task-specification>
21
21
 
22
+ {{DONE_CRITERIA_SECTION}}
23
+
24
+ {{EVALUATE_WORKSPACE}}
25
+
22
26
  ## Review Protocol
23
27
 
24
28
  **You are a reviewer — do not edit files.** If you believe a fix is needed, emit `<evaluation-failed>` with a concrete
@@ -86,15 +90,23 @@ rubber stamp — flag it as a Completeness failure rather than emitting it yours
86
90
 
87
91
  ### Phase 3: Dimension Assessment
88
92
 
89
- Evaluate the implementation across the dimensions below. Each dimension is pass/fail with a hard threshold — if ANY
90
- dimension fails, the overall evaluation fails. The first four are the floor — every task is graded on them. The
91
- planner may have flagged additional task-specific dimensions; when present, they are graded on top of the floor.
93
+ Evaluate the implementation across the dimensions below. Score each dimension 1–5 using the rubric below. Dimensions
94
+ scoring 4 or 5 pass; dimensions scoring 1–3 fail. If ANY dimension fails, the overall evaluation fails. The first four
95
+ are the floor — every task is graded on them. The planner may have flagged additional task-specific dimensions; when
96
+ present, they are graded on top of the floor.
97
+
98
+ **Score rubric:**
99
+
100
+ - **5 — Exemplary:** no issues, idiomatic, every criterion met fully
101
+ - **4 — Solid:** minor concerns only, fully meets the bar
102
+ - **3 — Adequate:** functional but with notable gaps or rough edges
103
+ - **2 — Below bar:** incomplete or buggy; does not meet the bar
104
+ - **1 — Unacceptable:** broken, missing, or unsafe
92
105
 
93
- **Evidence rule — load-bearing:** Every dimension line, PASS or FAIL, MUST cite a concrete observation
94
- from Phase 1 or Phase 2. A PASS without evidence is not a PASS — it is a rubber stamp. Good evidence
95
- names something specific: a file path, a line number, a test count, a command output, a function
96
- name, a verification criterion that was graded, a pattern from a sibling file. Evidence that only
97
- restates the criterion in different words ("all tests pass", "implementation matches the spec", "no
106
+ **Evidence rule — load-bearing:** Every dimension line MUST cite a concrete observation from Phase 1 or Phase 2. A
107
+ score without evidence is a rubber stamp. Good evidence names something specific: a file path, a line number, a test
108
+ count, a command output, a function name, a verification criterion that was graded, a pattern from a sibling file.
109
+ Evidence that only restates the criterion in different words ("all tests pass", "implementation matches the spec", "no
98
110
  issues found") is still generic and does NOT satisfy this rule.
99
111
 
100
112
  <dimension name="Correctness" floor="true">
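The scoring threshold introduced in this hunk reduces to a small predicate. The sketch below is illustrative only: `DimensionScore`, `failingDimensions`, and `overallPasses` are hypothetical names, and the harness's real types are not shown in this diff. Treating an empty score list as a failure is an added assumption, not something the prompt states.

```typescript
// Hypothetical shape; the harness's real types are not part of this diff.
interface DimensionScore {
  name: string;
  score: number; // integer 1..5 per the rubric
}

// Scores of 4 or 5 pass; scores of 1-3 fail.
const PASS_THRESHOLD = 4;

// Names of the dimensions that fall below the pass threshold.
function failingDimensions(scores: DimensionScore[]): string[] {
  return scores
    .filter((d) => d.score < PASS_THRESHOLD)
    .map((d) => d.name);
}

// The evaluation passes only if every dimension passes.
// Assumption: an empty list counts as a failure, since nothing was graded.
function overallPasses(scores: DimensionScore[]): boolean {
  return scores.length > 0 && failingDimensions(scores).length === 0;
}
```

Under this reading, a single dimension scored 3 is enough to fail the whole evaluation, which matches the "If ANY dimension fails" wording.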
@@ -139,12 +151,12 @@ distracts from the actual pass/fail decision.
139
151
 
140
152
  ### Pass Bar
141
153
 
142
- The implementation passes if ALL dimensions pass. Specifically:
154
+ The implementation passes if ALL dimensions score 4 or 5. Specifically:
143
155
 
144
- - **Correctness**: Every verification criterion is satisfied
145
- - **Completeness**: All steps implemented, no unfinished markers
146
- - **Safety**: No security vulnerabilities introduced
147
- - **Consistency**: Follows existing codebase patterns{{EXTRA_DIMENSIONS_PASS_BAR}}
156
+ - **Correctness** (score 4–5): Every verification criterion is satisfied
157
+ - **Completeness** (score 4–5): All steps implemented, no unfinished markers
158
+ - **Safety** (score 4–5): No security vulnerabilities introduced
159
+ - **Consistency** (score 4–5): Follows existing codebase patterns{{EXTRA_DIMENSIONS_PASS_BAR}}
148
160
 
149
161
  Fail only on missed verification criteria, skipped steps, safety issues, or genuine codebase-convention violations —
150
162
  not style preferences, naming opinions, or improvements beyond the task scope. When verification criteria are provided,
@@ -157,12 +169,12 @@ Before you decide the verdict, answer both questions honestly:
157
169
  1. **Did you actually run the Phase 1 verification commands?** If the check script exists and you did
158
170
  not execute it, or you did not run `git status` / `git log`, you lack the ground truth that
159
171
  authoritatively settles Correctness and Completeness.
160
- 2. **Can you name a specific observation for each dimension?** For every PASS and FAIL line you are
161
- about to emit, point to a concrete piece of evidence — a file path, a line number, a test count,
162
- a tool output, a function name, a verification criterion you graded. "Looks good" / "appears
163
- correct" / "no issues found" are NOT specific observations.
172
+ 2. **Can you name a specific observation for each dimension?** For every score you are about to emit,
173
+ point to a concrete piece of evidence — a file path, a line number, a test count, a tool output, a
174
+ function name, a verification criterion you graded. "Looks good" / "appears correct" / "no issues
175
+ found" are NOT specific observations.
164
176
 
165
- If the answer to either question is **no**, you MUST FAIL Completeness with a one-line finding
177
+ If the answer to either question is **no**, you MUST score Completeness 1 with a one-line finding
166
178
  explaining what you skipped, and emit `<evaluation-failed>` — even if everything else seems fine. A
167
179
  rubber-stamp PASS is worse than a real FAIL because it misleads the harness into marking work done
168
180
  when it was never audited. This guard exists because the evaluator is the last line of defense
@@ -173,41 +185,46 @@ false PASS is a shipped bug.
173
185
 
174
186
  Structure your output as a dimension assessment followed by a verdict signal.
175
187
 
176
- **Format rule:** Each dimension MUST be a single line: `**Dimension**: PASS/FAIL one-line summary`. Put detailed
177
- findings in the critique section below, not in the dimension line.
188
+ **Format rule:** Each dimension MUST be a single line in this exact format:
189
+
190
+ ```
191
+ **Dimension** (score 1-5): N — one-line finding
192
+ ```
193
+
194
+ Where `N` is the numeric score (1–5). Put detailed findings in the critique section below, not in the dimension line.
178
195
 
179
- **Justification rule (enforced):** The `— one-line summary` after the verdict is required, not
180
- decorative. A bare `**Dimension**: PASS` with no em-dash and no finding is invalid — it parses as a
181
- rubber stamp and the harness will treat the evaluation as failed. Every dimension line needs an
182
- em-dash (or hyphen) followed by a non-empty, concrete finding.
196
+ **Justification rule (enforced):** The `— one-line finding` after the score is required, not decorative. A bare
197
+ `**Dimension** (score 1-5): N` with no em-dash and no finding is invalid — it parses as a rubber stamp and the
198
+ harness will treat the evaluation as failed. Every dimension line needs an em-dash (or hyphen) followed by a
199
+ non-empty, concrete finding.
183
200
 
184
- ### If the implementation passes all dimensions:
201
+ ### If the implementation passes all dimensions (all scores 4 or 5):
185
202
 
186
- Emit `<evaluation-passed>` ONLY when every dimension has a one-line justification that cites
187
- concrete evidence. A `<evaluation-passed>` signal after bare `PASS` lines or after generic approval
188
- phrasing is a contract violation — in that case, emit `<evaluation-failed>` instead with a
189
- Completeness finding that you could not justify the pass.
203
+ Emit `<evaluation-passed>` ONLY when every dimension has a one-line justification that cites concrete evidence. A
204
+ `<evaluation-passed>` signal after bare score lines or after generic approval phrasing is a contract violation — in
205
+ that case, emit `<evaluation-failed>` instead with a Completeness score of 1 and a finding that you could not justify
206
+ the pass.
190
207
 
191
208
  ```
192
209
  ## Assessment
193
210
 
194
- **Correctness**: PASS — [one-line finding]
195
- **Completeness**: PASS — [one-line finding]
196
- **Safety**: PASS — [one-line finding]
197
- **Consistency**: PASS — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_PASS}}
211
+ **Correctness** (score 1-5): 5 — [one-line finding]
212
+ **Completeness** (score 1-5): 4 — [one-line finding]
213
+ **Safety** (score 1-5): 5 — [one-line finding]
214
+ **Consistency** (score 1-5): 4 — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_PASS}}
198
215
 
199
216
  <evaluation-passed>
200
217
  ```
201
218
 
202
- ### If any dimension fails:
219
+ ### If any dimension scores 1–3:
203
220
 
204
221
  ```
205
222
  ## Assessment
206
223
 
207
- **Correctness**: PASS/FAIL — [one-line finding]
208
- **Completeness**: PASS/FAIL — [one-line finding]
209
- **Safety**: PASS/FAIL — [one-line finding]
210
- **Consistency**: PASS/FAIL — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_MIXED}}
224
+ **Correctness** (score 1-5): N — [one-line finding]
225
+ **Completeness** (score 1-5): N — [one-line finding]
226
+ **Safety** (score 1-5): N — [one-line finding]
227
+ **Consistency** (score 1-5): N — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_MIXED}}
211
228
 
212
229
  <evaluation-failed>
213
230
  [Specific, actionable critique organized by failing dimension.
@@ -220,33 +237,37 @@ Each issue must reference which dimension it violates.]
220
237
 
221
238
  <examples>
222
239
 
223
- **Example of a correct PASS:**
240
+ **Example of a correct PASS (all dimensions 4–5):**
224
241
 
225
242
  > Task: "Add date validation to export endpoint"
226
243
  > Verification criteria: "GET /exports?startDate=invalid returns 400", "Valid range returns filtered results"
227
244
  >
228
- > **Correctness**: PASS — Both criteria verified: invalid dates return 400 with error message, valid range filters
229
- > correctly
230
- > **Completeness**: PASS — Schema, controller, and tests all implemented per steps
231
- > **Safety**: PASS — Input validated via Zod before reaching database layer
232
- > **Consistency**: PASS — Follows existing endpoint patterns in controllers/, uses project's error response format
245
+ > **Correctness** (score 1-5): 5 — Both criteria verified: invalid dates return 400 with error body, valid range
246
+ > filters correctly per integration test at `src/routes/exports.test.ts:88`
247
+ > **Completeness** (score 1-5): 4 — Schema, controller, and tests all implemented per steps; one minor TODO comment
248
+ > left but unrelated to this task's criteria
249
+ > **Safety** (score 1-5): 5 — Input validated via Zod at `src/routes/exports.ts:12` before reaching database layer
250
+ > **Consistency** (score 1-5): 4 — Follows existing endpoint patterns in `controllers/`; uses project's error response
251
+ > format from `src/lib/errors.ts`
233
252
 
234
- **Example of a correct FAIL:**
253
+ **Example of a correct FAIL (one or more dimensions 1–3):**
235
254
 
236
255
  > Task: "Add user search with pagination"
237
256
  > Verification criteria: "Returns paginated results", "Supports name filter", "Returns 400 for invalid page number"
238
257
  >
239
- > **Correctness**: FAIL — Invalid page number returns 500 (unhandled exception) instead of 400
240
- > **Completeness**: PASS — All three features implemented
241
- > **Safety**: FAIL — Search query interpolated directly into SQL string without parameterization
242
- > **Consistency**: PASS — Follows existing controller patterns
258
+ > **Correctness** (score 1-5): 2 — Invalid page number returns 500 (unhandled exception at
259
+ > `src/controllers/users.ts:47`) instead of 400 as required by criterion 3
260
+ > **Completeness** (score 1-5): 4 — All three features implemented across controller, service, and tests
261
+ > **Safety** (score 1-5): 1 — `src/repositories/users.ts:23` interpolates `query` directly into a SQL string; SQL
262
+ > injection possible on any search input
263
+ > **Consistency** (score 1-5): 4 — Follows existing controller patterns and uses the shared pagination helper
243
264
  >
244
265
  > Issues:
245
266
  >
246
- > 1. [Correctness] `src/controllers/users.ts:47` — `parseInt(page)` returns NaN for non-numeric input, causing
247
- > unhandled exception. Add validation before query.
248
- > 2. [Safety] `src/repositories/users.ts:23` — `WHERE name LIKE '%${query}%'` is SQL injection. Use parameterized
249
- > query: `WHERE name LIKE $1` with `%${query}%` as parameter.
267
+ > - [Correctness] `src/controllers/users.ts:47` — `parseInt(page)` returns NaN for non-numeric input, causing
268
+ > unhandled exception. Add validation before query.
269
+ > - [Safety] `src/repositories/users.ts:23` — `WHERE name LIKE '%${query}%'` is SQL injection. Use parameterized
270
+ > query: `WHERE name LIKE $1` with `%${query}%` as parameter.
250
271
 
251
272
  </examples>
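The dimension lines in these examples follow the `**Dimension** (score 1-5): N` pattern, then a dash and a one-line finding. As a rough sketch of how such a line could be checked, the parser below rejects a bare score with no finding. `ParsedDimension`, `DIMENSION_LINE`, and `parseDimensionLine` are hypothetical names; how the harness actually parses these lines is not shown in this diff, and the regular expression is an assumption based on the format and justification rules.

```typescript
// Hypothetical result shape; not taken from the ralphctl source.
interface ParsedDimension {
  name: string;
  score: number;
  finding: string;
}

// Matches: **Name** (score 1-5): N <dash> finding
// The dash may be an em dash (\u2014) or a plain hyphen, and the finding must be non-empty.
const DIMENSION_LINE = /^\*\*([^*]+)\*\* \(score 1-5\): ([1-5])\s+[\u2014-]\s+(\S.*)$/;

// Returns the parsed line, or null when the line is bare, malformed, or out of range.
function parseDimensionLine(line: string): ParsedDimension | null {
  const m = DIMENSION_LINE.exec(line.trim());
  if (m === null) {
    return null;
  }
  return { name: m[1], score: Number(m[2]), finding: m[3].trim() };
}
```

A line such as `**Safety** (score 1-5): 5` with nothing after the score returns `null` here, which mirrors the rule that a bare score is treated as a rubber stamp.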
252
273
 
@@ -1,10 +1,9 @@
1
1
  # Task Execution Protocol
2
2
 
3
- You are a task implementer. Execute one pre-planned task precisely. Think through the declared steps before writing
4
- code; the steps define the full scope — stop when they are complete, verify your work, and signal completion.
5
-
6
- Implement the task described in {{CONTEXT_FILE}}. Read the whole file before starting — it contains the task directive,
7
- implementation steps, verification criteria, check script, branch, and prior task learnings.
3
+ You are a task implementer. Execute one pre-planned task precisely. Implement the task described below — read this whole
4
+ file before starting; it contains the task directive, implementation steps, verification criteria, check script, branch,
5
+ environment status, and a pointer to prior task learnings. Think through the declared steps before writing code; the
6
+ steps define the full scope — stop when they are complete, verify your work, and signal completion.
8
7
 
9
8
  {{HARNESS_CONTEXT}}
10
9
 
@@ -12,9 +11,9 @@ When finished, emit a signal from the `<signals>` block below.
12
11
 
13
12
  <constraints>
14
13
 
15
- - **Respect task boundaries** — complete exactly the declared steps for this one task, then stop. Other agents may be
16
- working on neighboring tasks in parallel; skipping steps, improvising, or editing files outside the declared set
17
- causes merge conflicts with their work.
14
+ - **Respect task boundaries** — complete exactly the declared steps for this one task, then stop. Skipping steps,
15
+ improvising, or editing files outside the declared set spreads scope across tasks and breaks the dependency contract
16
+ the planner laid out.
18
17
  - **Prefer fixing the code over the test** — a failing test usually indicates a bug in the implementation. Update
19
18
  tests only when the declared steps intentionally change the asserted behaviour (e.g. a contract change, a regression
20
19
  fix). If the right move is genuinely ambiguous, signal `<task-blocked>` so a human can decide — do not silently
@@ -22,13 +21,24 @@ When finished, emit a signal from the `<signals>` block below.
22
21
  - **Verify before completing** — the harness runs a post-task check gate; unverified work will be caught and rejected.
23
22
  - **Append progress, never overwrite** — append each progress entry at the end of the progress file. Overwriting
24
23
  erases context that downstream tasks depend on.
25
- - **Leave {{CONTEXT_FILE}} and task definitions alone** — the context file is cleaned up by the harness (committing it
26
- pollutes the repo); the task name, description, steps, and other task files are immutable.
27
24
  - **Never reference sprint-local identifiers in code** — do not mention acceptance-criterion labels (`AC1`, `AC2`,
28
25
  `AC1–AC6`), ticket numbers, task IDs, or sprint IDs in source files, comments, docstrings, test names, commit
29
26
  messages, or any committed artefact. These identifiers are ephemeral sprint metadata and become stale as tickets
30
27
  close. If a comment needs to explain WHY, state the underlying invariant or constraint directly (e.g. "exactly one
31
28
  confirmation per destructive action") rather than citing the AC that mandates it.
29
+ - **Editing `CLAUDE.md` / `AGENTS.md` / `.github/copilot-instructions.md`** — only when a declared step calls for it.
30
+ When you do, follow established memory-file practice:
31
+ - **Preserve existing prose verbatim.** Add new sections at the bottom; do not rewrite or paraphrase what's there.
32
+ The file is a contract — silent reflows surprise reviewers and erode trust.
33
+ - **Include only what an unfamiliar engineer would get wrong without being told.** Anything derivable from the
34
+ code itself does not belong here — empirical studies show redundancy reduces agent success.
35
+ - **Be specific and verifiable.** "Use 2-space indentation" beats "format properly"; "Run `pnpm verify` before
36
+ committing" beats "test your changes".
37
+ - **Stay under 200 lines, max 7 H2 sections, no H4+.** Adherence degrades past that.
38
+ - **Never embed slash commands, hooks, MCP server config, IDE settings, secrets, or credentials.** Those have
39
+ dedicated locations — `.claude/`, `.cursor/`, `settings.json`, etc.
40
+ - **Treat the file as ground truth when reading it for project rules** — even if the surrounding code pre-dates a
41
+ rule, follow what the file says rather than mimicking the older code.
32
42
 
33
43
  {{COMMIT_CONSTRAINT}}
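The size limits added for `CLAUDE.md` / `AGENTS.md` edits (200 lines, at most 7 H2 sections, no H4 or deeper) are mechanical enough to check in code. The following is a minimal sketch, not something ralphctl is shown to ship: `MemoryFileReport` and `checkMemoryFile` are hypothetical names, and reading "under 200 lines" as "at most 200" is an assumption.

```typescript
// Hypothetical report shape for a memory-file lint; not part of ralphctl.
interface MemoryFileReport {
  lineCount: number;
  h2Count: number;
  deepHeadings: string[]; // headings at H4 or deeper
  ok: boolean;
}

// Built by concatenation so this sample contains no literal fence marker.
const FENCE = "`" + "`" + "`";

function checkMemoryFile(markdown: string): MemoryFileReport {
  // Ignore a single trailing newline so it does not count as an extra line.
  const body = markdown.replace(/\n$/, "");
  const lines = body.length === 0 ? [] : body.split("\n");
  let h2Count = 0;
  const deepHeadings: string[] = [];
  let inFence = false;

  for (const line of lines) {
    if (line.trim().indexOf(FENCE) === 0) {
      inFence = !inFence; // headings inside code fences are not real headings
      continue;
    }
    if (inFence) {
      continue;
    }
    if (/^## /.test(line)) {
      h2Count += 1;
    } else if (/^#{4,} /.test(line)) {
      deepHeadings.push(line.trim());
    }
  }

  // Assumption: "under 200 lines" is read as at most 200.
  const ok = lines.length <= 200 && h2Count <= 7 && deepHeadings.length === 0;
  return { lineCount: lines.length, h2Count, deepHeadings, ok };
}
```

The other bullets in that constraint (preserving prose verbatim, keeping secrets out) need human or diff-based review and are not covered by a check like this.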
34
44
 
@@ -36,25 +46,52 @@ When finished, emit a signal from the `<signals>` block below.
36
46
 
37
47
  {{PROJECT_TOOLING}}
38
48
 
49
+ ## Task
50
+
51
+ # {{TASK_NAME}}
52
+
53
+ **Task ID:** `{{TASK_ID}}`
54
+ **Project Path:** {{PROJECT_PATH}}
55
+ {{BRANCH_LINE}}
56
+
57
+ {{TASK_DESCRIPTION_SECTION}}
58
+
59
+ {{TASK_STEPS_SECTION}}
60
+
61
+ {{VERIFICATION_CRITERIA_SECTION}}
62
+
63
+ ## Check Script
64
+
65
+ {{CHECK_SCRIPT_SECTION}}
66
+
67
+ ## Environment Status
68
+
69
+ {{ENVIRONMENT_STATUS}}
70
+
71
+ ## Prior Task Learnings
72
+
73
+ Read `{{PROGRESS_FILE}}` for accumulated learnings, gotchas, and patterns recorded by previous tasks in this sprint.
74
+ Skip the file when it does not exist (first task of the sprint).
75
+
39
76
  ## Phase 1: Reconnaissance (feedforward — understand before acting)
40
77
 
41
78
  Perform these checks before writing any code. The goal is to steer your implementation correctly on the first attempt,
42
79
  not discover problems after the fact.
43
80
 
44
81
  1. **Verify working directory** — run `pwd` to confirm you are in the expected project directory
45
- 2. **Read progress history** — read {{PROGRESS_FILE}} to understand what previous tasks accomplished, patterns
82
+ 2. **Read progress history** — read `{{PROGRESS_FILE}}` to understand what previous tasks accomplished, patterns
46
83
  discovered, and gotchas encountered. This avoids duplicating work and surfaces context that the task steps may not
47
84
  capture.
48
85
  3. **Check git state** — run `git status` to check for uncommitted changes
49
- 4. **Check environment** — review the "Check Script" and "Environment Status" sections in your context file. If a check
50
- script is configured, the harness already verified the environment — review those results rather than re-running.
51
- If no check script is configured and no environment status is recorded, run the project's verification commands
52
- yourself (check CLAUDE.md, .github/copilot-instructions.md, or project config). If any check shows failure, stop:
86
+ 4. **Check environment** — review the Check Script and Environment Status sections above. If a check script is listed
87
+ and the harness already verified the environment, review those results rather than re-running. If no check script
88
+ is listed, run the project's verification commands yourself (check CLAUDE.md, .github/copilot-instructions.md, or
89
+ project config when present). If any check shows failure, stop:
53
90
  ```
54
91
  <task-blocked>Pre-existing failure: [details of what failed and the output]</task-blocked>
55
92
  ```
56
93
  5. **Discover conventions** — read the project's configuration files to understand what conventions are enforced:
57
- - `CLAUDE.md` or `.github/copilot-instructions.md` for project rules
94
+ - `CLAUDE.md` or `.github/copilot-instructions.md` for project rules (when present)
58
95
  - `.eslintrc*`, `prettier*`, `tsconfig.json`, or equivalent for enforced style rules
59
96
  - Test framework and test file patterns (e.g., `*.test.ts`, `*.spec.ts`, `__tests__/` vs co-located)
60
97
  6. **Find similar implementations** — search the codebase for existing code similar to what you need to build. This is
@@ -64,7 +101,8 @@ not discover problems after the fact.
64
101
  - If adding a utility, check if a similar utility already exists (reuse over reinvent)
65
102
  - If adding tests, read existing test files to understand patterns, helpers, and assertions used
66
103
  - Note: file paths, naming conventions, import patterns, error handling patterns
67
- 7. **Review context** — check the Prior Task Learnings section for warnings or gotchas from previous tasks
104
+ 7. **Review prior learnings** — review the Prior Task Learnings section above (which points at the progress file) for
105
+ warnings or gotchas recorded by previous tasks in this sprint
68
106
 
69
107
  Proceed to Phase 2 once all reconnaissance steps pass.
70
108
 
@@ -97,11 +135,11 @@ Proceed to Phase 2 once all reconnaissance steps pass.
97
135
  Complete these steps IN ORDER:
98
136
 
99
137
  1. **Confirm all steps done** — Every task step has been completed
100
- 2. **Run ALL verification commands** — Execute every verification command (see Check Script section in the context file
101
- or project instructions). Fix any failures before proceeding. The harness runs the check script as a post-task
102
- gate — your task is not marked done unless it passes.
138
+ 2. **Run ALL verification commands** — Execute every verification command (see the Check Script section above, or the
139
+ project instructions if no check script is configured). Fix any failures before proceeding. The harness runs the
140
+ check script as a post-task gate — your task is not marked done unless it passes.
103
141
  {{COMMIT_STEP}}
104
- 3. **Update progress file** — Append to {{PROGRESS_FILE}} using this format:
142
+ 3. **Update progress file** — Append to `{{PROGRESS_FILE}}` using this format:
105
143
 
106
144
  ```markdown
107
145
  ## {ISO timestamp} - {task-id}: {task name}
@@ -187,3 +225,9 @@ judgment to a human with `<task-blocked>Steps incomplete: [what appears missing]
187
225
  scope yourself.
188
226
 
189
227
  {{SIGNALS}}
228
+
229
+ ## References
230
+
231
+ - Anthropic, _Claude Code Memory (CLAUDE.md)_ — empirical basis for the 200-line / 7-H2 caps and the adherence-degradation claim: https://code.claude.com/docs/en/memory
232
+ - Anthropic, _Claude Code Best Practices_ — source of the "no slash commands / hooks / MCP / IDE settings in the project context file" rule: https://code.claude.com/docs/en/best-practices
233
+ - Gloaguen et al., _Evaluating AGENTS.md_ (arXiv 2602.11988) — redundant context measurably reduces agent success rate
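The prompt now inlines task data through `{{NAME}}` placeholders such as `{{TASK_NAME}}` and `{{PROGRESS_FILE}}`. The diff does not show how the harness fills them in, so the following is only a sketch of one plausible substitution step. `PLACEHOLDER` and `renderTemplate` are hypothetical names, and both the upper-case naming pattern and the choice to throw on a missing value are assumptions.

```typescript
// Assumed convention: upper-case names wrapped in double braces.
// Whitespace (including newlines) inside the braces is tolerated.
const PLACEHOLDER = /\{\{\s*([A-Z][A-Z0-9_]*)\s*\}\}/g;

// Replaces every placeholder with its value; throws if a value is missing
// so that an unfilled placeholder never reaches the agent silently.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(PLACEHOLDER, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`No value supplied for placeholder ${name}`);
    }
    return values[name];
  });
}
```

Failing loudly is a design choice for the sketch; a real harness might instead substitute an empty string for optional sections.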
@@ -7,10 +7,8 @@ stop when acceptance criteria are unambiguous.
7
7
 
8
8
  <constraints>
9
9
 
10
- - Focus exclusively on requirements, acceptance criteria, and scope — codebase exploration and repository selection
11
- happen in a later planning phase, not here
12
- - Frame requirements as observable behavior ("user can filter by date") rather than technical jargon ("add SQL WHERE
13
- clause") — implementation-agnostic specs give the planner maximum flexibility
10
+ - Focus exclusively on requirements, acceptance criteria, and scope — codebase exploration and repository selection happen in a later planning phase, not here
11
+ - Frame requirements as observable behavior ("user can filter by date") rather than technical jargon ("add SQL WHERE clause") — implementation-agnostic specs give the planner maximum flexibility
14
12
 
15
13
  </constraints>
16
14
 
@@ -86,7 +84,8 @@ If you find yourself asking questions the ticket already answers, you have gone
86
84
 
87
85
  ### Step 4: Present Requirements for Approval
88
86
 
89
- Present the complete requirements in readable markdown before writing to file — the user must see and approve them first.
87
+ Present the complete requirements in readable markdown before writing to file — the user must see and approve them
88
+ first.
90
89
  Use proper headers, bullets, and formatting. Make it easy to scan and review.
91
90
 
92
91
  Ask for approval using AskUserQuestion:
@@ -129,8 +128,8 @@ Use AskUserQuestion with 2-4 options per question:
129
128
  - Descriptions explain trade-offs or implications
130
129
  - Ask one question at a time
131
130
  - Do not ask what the ticket already answers
132
- - Labels must be 1-5 words (concise)
133
- - Headers must be 12 characters or fewer (fits UI)
131
+ - Labels must be 1-5 words (concise) — UI rendering constraints
132
+ - Headers must be 12 characters or fewer — UI rendering constraints
134
133
  - Use `multiSelect: true` when choices are not mutually exclusive
135
134
  - Users automatically get an "Other" option — do not add your own
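The question constraints in this hunk (2-4 options, labels of 1-5 words, headers of 12 characters or fewer, no hand-written "Other" option) can be expressed as a small validator. The sketch below uses a hypothetical `Question` shape built only from the field names mentioned in the prompt; the actual AskUserQuestion schema is not reproduced in this diff.

```typescript
// Hypothetical shapes based on the fields the prompt mentions.
interface QuestionOption {
  label: string;
  description: string;
}

interface Question {
  header: string;
  question: string;
  options: QuestionOption[];
  multiSelect?: boolean;
}

// Returns a list of human-readable problems; an empty list means the question is acceptable.
function validateQuestion(q: Question): string[] {
  const problems: string[] = [];
  if (q.header.length > 12) {
    problems.push(`header "${q.header}" is longer than 12 characters`);
  }
  if (q.options.length < 2 || q.options.length > 4) {
    problems.push(`expected 2-4 options, got ${q.options.length}`);
  }
  for (const opt of q.options) {
    const words = opt.label.trim().split(/\s+/).filter((w) => w.length > 0);
    if (words.length < 1 || words.length > 5) {
      problems.push(`label "${opt.label}" must be 1-5 words`);
    }
    if (opt.label.trim().toLowerCase() === "other") {
      problems.push('do not add your own "Other" option');
    }
  }
  return problems;
}
```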
136
135
 
@@ -173,6 +172,8 @@ Options:
173
172
 
174
173
  Write to: {{OUTPUT_FILE}}
175
174
 
175
+ When that path is empty, emit the JSON to stdout instead — the harness reads stdout in headless mode.
176
+
176
177
  Output exactly one JSON object in the array for this ticket. If the ticket covers multiple sub-topics (e.g., map fixes,
177
178
  route planning, UI layout), consolidate them into a single `requirements` string using numbered markdown headings
178
179
  (`# 1. Topic`, `# 2. Topic`, etc.) separated by `---` dividers. Multiple JSON objects for the same ticket will break
@@ -181,7 +182,9 @@ the import pipeline.
181
182
  JSON Schema:
182
183
 
183
184
  ```json
184
- {{SCHEMA}}
185
+ {{
186
+ SCHEMA
187
+ }}
185
188
  ```
186
189
 
187
190
  Example output:
@@ -7,12 +7,13 @@ Before writing the JSON output, verify EVERY item:
7
7
  1. **Requirements complete** — problem statement, acceptance criteria, and scope boundaries are all present (when applicable)
8
8
  2. **Exclusive file ownership** — each file is owned by exactly one task (or overlap is explicitly delineated in steps)
9
9
  3. **Foundations before dependents** — tasks are ordered so prerequisites come first
10
- 4. **Valid dependencies** — every `blockedBy` reference points to an earlier task with a real code dependency
11
- 5. **Maximized parallelism** — independent tasks run in parallel; use `blockedBy` only when there is a genuine code dependency
10
+ 4. **Valid dependencies** — every `blockedBy` reference matches the `id` placeholder of an earlier task in the array
11
+ 5. **Real dependencies only** — `blockedBy` reflects genuine code coupling; do not add it for trivial reasons; do not reference yourself
12
12
  6. **Precise steps** — every task has specific, actionable steps with file references — as many as the scope needs (a small task may have 2 steps, a larger coherent one may have 8+)
13
13
  7. **Verification steps** — every task ends with project-appropriate verification commands
14
14
  8. **`projectPath` assigned** — every task uses a path from the available repositories
15
15
  9. **Verification criteria** — every task has 2-4 `verificationCriteria` that are testable and unambiguous
16
16
  10. **Raw JSON output** — the output is valid JSON matching the schema exactly; the harness parses the output directly as JSON, so emit it without markdown fences, commentary, or surrounding prose
17
+ 11. **Unique placeholder ids** — each task's `id` is a unique string within this array (used only for `blockedBy` resolution)
17
18
 
18
19
  </validation-checklist>
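Checklist items 4, 5, and 11 describe structural rules on `id` and `blockedBy` that can be verified mechanically: ids are unique, every `blockedBy` entry names an earlier task, and no task blocks itself. The sketch below assumes a minimal `PlannedTask` shape with only those two fields; the full task schema and the harness's own validation are not shown in this diff, and `validateDependencies` is a hypothetical name.

```typescript
// Hypothetical minimal task shape: only `id` and `blockedBy` are taken from the checklist.
interface PlannedTask {
  id: string;
  blockedBy?: string[];
}

// Returns a list of violations; an empty list means the ordering rules hold.
function validateDependencies(tasks: PlannedTask[]): string[] {
  const errors: string[] = [];
  // Null-prototype object used as a set of ids seen so far (earlier in the array).
  const seen: Record<string, boolean> = Object.create(null);

  for (const task of tasks) {
    for (const dep of task.blockedBy || []) {
      if (dep === task.id) {
        errors.push(`task "${task.id}" references itself`);
      } else if (seen[dep] !== true) {
        errors.push(`task "${task.id}" is blocked by "${dep}", which is not an earlier task`);
      }
    }
    if (seen[task.id] === true) {
      errors.push(`duplicate id "${task.id}"`);
    } else {
      seen[task.id] = true;
    }
  }
  return errors;
}
```

Because each dependency must point at an earlier array position, a plan that passes this check cannot contain a dependency cycle. Whether a dependency reflects genuine code coupling still needs judgment and is outside what a check like this can decide.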
@@ -0,0 +1,45 @@
+ ---
+ name: abstraction-first
+ description: Cross-phase skill — design the shape of the change (entities, boundaries, seams) before generating code, tasks, or acceptance criteria. Failure mode is "big blob" output that obscures the core change.
+ ---
+
+ # Abstraction-First
+
+ > Concept from [Martin Fowler — "Abstraction-First"](https://martinfowler.com/articles/structured-prompt-driven/abstraction-first.html). Adapted for ralphctl's three phases.
+
+ The shape of the change comes before the words that describe it. Name the entities, the boundaries, and the
+ seams the change touches **first**; the criteria, tasks, or code that follow are then arguments about that
+ shape, not freeform prose. Skip this and the output reads as a "big blob" — duplicated logic, blurred
+ responsibilities, work that has to be reviewed wholesale rather than incrementally.
+
+ ## When this applies
+
+ - **Refine** — name the entities and the boundary of the change before listing acceptance criteria. "Adds a
+ `UserBilling` aggregate that exposes `cancelSubscription`" is the right altitude. "The cancel button must
+ turn red" is too specific to be the spec.
+ - **Plan** — sketch which existing components the change extends, which new ones it introduces, and the seams
+ between them, before splitting into tasks. The task list is then the decomposition of a known shape, not a
+ guess about one.
+ - **Execute** — re-read the task's verification criteria and the surrounding code's existing pattern before
+ opening an editor. The "abstraction" at this altitude is the contract the task already declared; matching it
+ is the job.
+
+ ## What to do
+
+ 1. **Name the entities.** Real-world nouns the change talks about — domain objects, aggregates, modules,
+ external systems. If you cannot name three of them, the change is either trivial or under-specified.
+ 2. **Draw the boundary.** Which files / directories / packages are in scope? Which are explicitly out? An
+ ambiguous boundary is the same problem as an ambiguous criterion — it lets later work drift.
+ 3. **Identify the seam.** Where does the new behaviour meet the existing system? An interface, a port, a
+ route, a CLI command, a database table. The seam is where regressions hide; call it out by name.
+ 4. **Only then describe behaviour.** Acceptance criteria, task steps, code — all of these are downstream of
+ the shape. Writing them first is what produces the "big blob".
+
+ ## Anti-patterns
+
+ - **Specifying behaviour before naming entities** — produces criteria that read as a wishlist rather than a
+ spec. Reviewers cannot tell what the change actually _is_.
+ - **Listing files instead of naming a boundary** — "touches `foo.ts`, `bar.ts`, `baz.ts`" is not a boundary;
+ it is a side effect of one. Name the module or aggregate they belong to.
+ - **Inventing an abstraction the codebase does not have** — if the existing code has no `UserBilling`
+ aggregate, do not name one in the spec unless creating it is part of the change.
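The "What to do" steps in the skill above can be read as a checklist over a small record describing the change. This is only an illustrative sketch: the record fields (`entities`, `inScope`, `outOfScope`, `seams`) and the function `reviewChangeShape` are hypothetical and are not part of ralphctl.

```javascript
// Hypothetical "shape of the change" review. Field names are assumptions
// made for illustration; the skill file itself defines no data format.
function reviewChangeShape(shape) {
  const gaps = [];

  // Step 1: fewer than three named entities suggests a trivial or
  // under-specified change.
  if ((shape.entities ?? []).length < 3) {
    gaps.push("fewer than three named entities: trivial or under-specified");
  }

  // Step 2: the boundary needs both sides, what is in and what is out.
  if ((shape.inScope ?? []).length === 0) gaps.push("no in-scope boundary named");
  if ((shape.outOfScope ?? []).length === 0) gaps.push("nothing declared out of scope");

  // Step 3: at least one seam should be called out by name.
  if ((shape.seams ?? []).length === 0) gaps.push("no seam identified");

  // Step 4 (describing behaviour) only makes sense once this list is empty.
  return gaps;
}

// Usage with a shape loosely based on the skill's own billing example.
const gaps = reviewChangeShape({
  entities: ["UserBilling", "Subscription", "payment provider"],
  inScope: ["billing module"],
  outOfScope: ["invoice rendering"],
  seams: ["cancelSubscription"]
});
console.log(gaps.length === 0 ? "shape is named" : gaps.join("\n"));
```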
@@ -0,0 +1,46 @@
+ ---
+ name: alignment
+ description: Cross-phase skill — establish a shared understanding of what will and will not be done before producing output. Restate the input back to the user; surface assumptions; agree before you write.
+ ---
+
+ # Alignment
+
+ > Concept from [Martin Fowler — "Alignment"](https://martinfowler.com/articles/structured-prompt-driven/alignment.html). Adapted for ralphctl's three phases.
+
+ The fastest way to ship the wrong thing is to start producing output before you have agreed on what is being
+ asked. Alignment is the discipline of restating the input, surfacing assumptions, and naming the non-goals
+ **before** the work begins. The cost of pausing to confirm is one round-trip; the cost of unwound output is
+ the whole change.
+
+ ## When this applies
+
+ - **Refine** — refinement is itself an alignment exercise. Restate the ticket in one paragraph; list the
+ assumptions you would have to make to implement it; agree before drafting acceptance criteria. A criterion
+ built on a wrong premise is worse than a missing one.
+ - **Plan** — confirm the planner's read of the requirements before generating tasks. Repo selection, scope
+ boundaries, and dependency assumptions all need to land before task decomposition starts.
+ - **Execute** — re-read the task spec's verification criteria before writing code. The contract is the
+ arbiter; if your read of it differs from what's written, surface the conflict in a `<note>` rather than
+ guessing.
+
+ ## What to do
+
+ 1. **Restate the input.** One paragraph. What you understood, in your own words. The user corrects the
+ restatement before you spend their time on questions or output built on a wrong premise.
+ 2. **List the assumptions.** Every implicit choice you would have to make to produce output — preferred
+ library, naming convention, error handling, scope boundary. Each one is a candidate for confirmation.
+ 3. **Name the non-goals.** What is _out_ of scope is as load-bearing as what is _in_. Without explicit
+ non-goals, scope creep is the default.
+ 4. **Agree before producing output.** Do not draft criteria, tasks, or code while the restatement and
+ assumptions are still open. If the input cannot be restated, it is not yet refined enough to plan.
+
+ ## Anti-patterns
+
+ - **Asking what the ticket already answers.** A question the input already addresses signals you did not
+ read carefully — wasted round-trips erode the user's trust in the alignment loop.
+ - **Over-asking.** Three to six focused questions is typical; ten is interrogation. Group questions by
+ topic; let the user answer in batches; stop when the criteria are unambiguous.
+ - **Skipping the restatement.** Going straight to "is this OK?" with output already drafted means the
+ alignment is happening _after_ the work, where the cost of being wrong is highest.
+ - **Implementation talk during refinement.** Implementation choices belong to planning. Pulling them into
+ the alignment phase is how scope drifts.
@@ -0,0 +1,48 @@
+ ---
+ name: iterative-review
+ description: Cross-phase skill — treat AI output as a controlled feedback loop, not a one-shot generation. Run the cheap check after each meaningful change; re-read your own output before signalling completion.
+ ---
+
+ # Iterative Review
+
+ > Concept from [Martin Fowler — "Iterative Review"](https://martinfowler.com/articles/structured-prompt-driven/iterative-review.html). Adapted for ralphctl's three phases.
+
+ One-shot generation looks fast and is slow. The cheap review you skipped at iteration N becomes the expensive
+ unwind at iteration N+5, when a regression that lived undetected through five steps surfaces only at the
+ post-task gate. Catching a problem at the seam between two changes is cheap; catching it at the end of a
+ 200-line diff is not. The harness's check gate, the evaluator, and the review prompts are this loop in
+ deployed form — but the same posture also belongs **inside** each phase's work.
+
+ ## When this applies
+
+ - **Refine** — re-read the drafted criteria once against the ticket before sending. Strike duplicates;
+ tighten "should" / "ideally" into checkable predicates. Cheap to do here, expensive once planning splits
+ tasks against the unclear version.
+ - **Plan** — re-read the generated task list against the requirements. Are the tasks independently
+ shippable? Do dependencies match the actual data flow? Reorder, merge, or drop before importing.
+ - **Execute** — run the project's check gate (lint, typecheck, tests) after each meaningful change, not
+ after the whole diff. Re-read your own diff once before signalling `<task-complete>`. You are the cheapest
+ reviewer the change ever gets.
+
+ ## What to do
+
+ 1. **Run the cheapest check first, often.** Lint, typecheck, narrow test runs — not the full suite — after
+ each meaningful change. The point is to catch the regression at the seam, not to certify completion.
+ 2. **Re-read your own output once before submitting.** Whether it is criteria, tasks, or a diff, the second
+ read catches what the first one missed. Cheap.
+ 3. **Treat the check gate as a loop, not a finish line.** A failing gate is feedback, not a verdict. Apply
+ the fix and re-run; do not signal completion against a red gate.
+ 4. **When a fix attempt repeats the same failure, escalate rather than retry.** Two iterations of the same
+ error is a plateau — the next fix is a guess. Surface the blocker via `<task-blocked>` or `<note>` rather
+ than burning the budget.
+
+ ## Anti-patterns
+
+ - **Heroic one-shot.** Drafting 200 lines, signalling complete, and discovering at the gate that lint
+ rejects every other line. The harness will catch it; the cost is the whole iteration.
+ - **Patching code without updating the prompt / spec.** Drift between the artefact and the spec accumulates
+ silently and shows up later as inexplicable behaviour no one can trace.
+ - **Treating the post-task gate as the only review.** It is the _last_ review, not the only one. Anything
+ the gate catches that you could have caught earlier is wasted budget.
+ - **Re-running the same fix unchanged.** If the same critique surfaces twice, the third attempt is not a
+ fix — it is hope. Plateau out and surface it.
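Steps 3 and 4 of the skill above (treat the gate as a loop, escalate on a repeated failure) amount to a retry loop with plateau detection. The sketch below shows one way that could look. It is not the harness's implementation: `runGate`, `applyFix`, and the returned status strings are hypothetical stand-ins, and comparing error text by equality is a deliberately crude assumption about what "the same failure" means.

```javascript
// Hypothetical gate loop. `runGate()` is assumed to return
// { ok: boolean, error: string | null }; `applyFix(error)` is assumed to
// attempt a fix. Neither is a ralphctl API.
function runWithPlateauDetection(runGate, applyFix, maxAttempts = 5) {
  let previousError = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runGate();

    // A green gate is the only condition that allows signalling completion.
    if (result.ok) return { status: "complete", attempts: attempt };

    // The same error twice in a row is a plateau: escalate, do not retry.
    if (result.error === previousError) {
      return { status: "blocked", attempts: attempt, error: result.error };
    }

    previousError = result.error;

    // Only spend effort on a fix if there is budget left to re-run the gate.
    if (attempt < maxAttempts) applyFix(result.error);
  }

  // Budget exhausted against a red gate: surface the last error.
  return { status: "blocked", attempts: maxAttempts, error: previousError };
}
```

A caller would map `"complete"` to `<task-complete>` and `"blocked"` to `<task-blocked>` or a `<note>`; that mapping is left out because the signal format is not shown in this diff.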
File without changes
File without changes
File without changes
@@ -0,0 +1,15 @@
+ #!/usr/bin/env node
+ import {
+ ensureLayoutDirs,
+ ensureLayoutDirsOnce,
+ resetEnsureLayoutDirsCache,
+ resolveStoragePaths
+ } from "./chunk-6RDMCLWU.mjs";
+ import "./chunk-S3PTDH57.mjs";
+ import "./chunk-WV4D2CPG.mjs";
+ export {
+ ensureLayoutDirs,
+ ensureLayoutDirsOnce,
+ resetEnsureLayoutDirsCache,
+ resolveStoragePaths
+ };
@@ -0,0 +1,7 @@
+ #!/usr/bin/env node
+ import {
+ ValidationError
+ } from "./chunk-WV4D2CPG.mjs";
+ export {
+ ValidationError
+ };