ralphctl 0.5.0 → 0.6.0
- package/README.md +29 -16
- package/dist/absolute-path-WUTZQ37D.mjs +8 -0
- package/dist/chunk-6RDMCLWU.mjs +108 -0
- package/dist/chunk-HIU74KTO.mjs +1046 -0
- package/dist/chunk-S3PTDH57.mjs +78 -0
- package/dist/chunk-WV4D2CPG.mjs +26 -0
- package/dist/cli.mjs +22413 -717
- package/dist/manifest.json +24 -0
- package/dist/prompt-adapter-JQICGVX7.mjs +7 -0
- package/dist/prompts/ideate.md +3 -1
- package/dist/prompts/plan-auto.md +23 -8
- package/dist/prompts/plan-common-examples.md +3 -3
- package/dist/prompts/plan-common.md +6 -5
- package/dist/prompts/plan-interactive.md +30 -7
- package/dist/prompts/repo-onboard.md +154 -64
- package/dist/prompts/signals-task.md +3 -0
- package/dist/prompts/sprint-feedback.md +3 -0
- package/dist/prompts/task-evaluation.md +74 -53
- package/dist/prompts/task-execution.md +65 -21
- package/dist/prompts/ticket-refine.md +11 -8
- package/dist/prompts/validation-checklist.md +3 -2
- package/dist/skills/default/abstraction-first/SKILL.md +45 -0
- package/dist/skills/default/alignment/SKILL.md +46 -0
- package/dist/skills/default/iterative-review/SKILL.md +48 -0
- package/dist/skills/exec/.gitkeep +0 -0
- package/dist/skills/plan/.gitkeep +0 -0
- package/dist/skills/refine/.gitkeep +0 -0
- package/dist/storage-paths-IPNZZM5D.mjs +15 -0
- package/dist/validation-error-QT6Q7FYU.mjs +7 -0
- package/package.json +9 -4
- package/dist/add-67UFUI54.mjs +0 -17
- package/dist/add-DVPVHENV.mjs +0 -18
- package/dist/bootstrap-FMHG6DRY.mjs +0 -11
- package/dist/chunk-62HYDA7L.mjs +0 -1128
- package/dist/chunk-747KW2RW.mjs +0 -24
- package/dist/chunk-BSB4EDGR.mjs +0 -260
- package/dist/chunk-BT5FKIZX.mjs +0 -787
- package/dist/chunk-CBMFRQ4Y.mjs +0 -441
- package/dist/chunk-CFUVE2BP.mjs +0 -16
- package/dist/chunk-D6QZNEYN.mjs +0 -5520
- package/dist/chunk-FNAAA32W.mjs +0 -103
- package/dist/chunk-GQ2WFKBN.mjs +0 -269
- package/dist/chunk-IWXBJD2D.mjs +0 -27
- package/dist/chunk-OGEXYSFS.mjs +0 -228
- package/dist/chunk-VAZ3LJBI.mjs +0 -179
- package/dist/chunk-WDMLPXOD.mjs +0 -363
- package/dist/chunk-XN2UIHBY.mjs +0 -589
- package/dist/chunk-ZE2BRQA2.mjs +0 -5542
- package/dist/create-Z635FQKO.mjs +0 -15
- package/dist/handle-23EFF3BE.mjs +0 -22
- package/dist/mount-NCYR22SN.mjs +0 -7434
- package/dist/project-DQHF4ISP.mjs +0 -34
- package/dist/prompts/check-script-discover.md +0 -69
- package/dist/prompts/ideate-auto.md +0 -195
- package/dist/prompts/task-evaluation-resume.md +0 -41
- package/dist/resolver-OVPYVW6Q.mjs +0 -163
- package/dist/sprint-4E26AB5F.mjs +0 -38
- package/dist/start-T34NI3LF.mjs +0 -19
|
@@ -19,6 +19,10 @@ These verification criteria are the pre-agreed definition of "done" — your pri
|
|
|
19
19
|
|
|
20
20
|
</task-specification>
|
|
21
21
|
|
|
22
|
+
{{DONE_CRITERIA_SECTION}}
|
|
23
|
+
|
|
24
|
+
{{EVALUATE_WORKSPACE}}
|
|
25
|
+
|
|
22
26
|
## Review Protocol
|
|
23
27
|
|
|
24
28
|
**You are a reviewer — do not edit files.** If you believe a fix is needed, emit `<evaluation-failed>` with a concrete
|
|
@@ -86,15 +90,23 @@ rubber stamp — flag it as a Completeness failure rather than emitting it yours
|
|
|
86
90
|
|
|
87
91
|
### Phase 3: Dimension Assessment
|
|
88
92
|
|
|
89
|
-
Evaluate the implementation across the dimensions below.
|
|
90
|
-
dimension fails, the overall evaluation fails. The first four
|
|
91
|
-
planner may have flagged additional task-specific dimensions; when
|
|
93
|
+
Evaluate the implementation across the dimensions below. Score each dimension 1–5 using the rubric below. Dimensions
|
|
94
|
+
scoring 4 or 5 pass; dimensions scoring 1–3 fail. If ANY dimension fails, the overall evaluation fails. The first four
|
|
95
|
+
are the floor — every task is graded on them. The planner may have flagged additional task-specific dimensions; when
|
|
96
|
+
present, they are graded on top of the floor.
|
|
97
|
+
|
|
98
|
+
**Score rubric:**
|
|
99
|
+
|
|
100
|
+
- **5 — Exemplary:** no issues, idiomatic, every criterion met fully
|
|
101
|
+
- **4 — Solid:** minor concerns only, fully meets the bar
|
|
102
|
+
- **3 — Adequate:** functional but with notable gaps or rough edges
|
|
103
|
+
- **2 — Below bar:** incomplete or buggy; does not meet the bar
|
|
104
|
+
- **1 — Unacceptable:** broken, missing, or unsafe
|
|
92
105
|
|
|
93
|
-
**Evidence rule — load-bearing:** Every dimension line
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
restates the criterion in different words ("all tests pass", "implementation matches the spec", "no
|
|
106
|
+
**Evidence rule — load-bearing:** Every dimension line MUST cite a concrete observation from Phase 1 or Phase 2. A
|
|
107
|
+
score without evidence is a rubber stamp. Good evidence names something specific: a file path, a line number, a test
|
|
108
|
+
count, a command output, a function name, a verification criterion that was graded, a pattern from a sibling file.
|
|
109
|
+
Evidence that only restates the criterion in different words ("all tests pass", "implementation matches the spec", "no
|
|
98
110
|
issues found") is still generic and does NOT satisfy this rule.
|
|
99
111
|
|
|
100
112
|
<dimension name="Correctness" floor="true">
|
|
@@ -139,12 +151,12 @@ distracts from the actual pass/fail decision.
|
|
|
139
151
|
|
|
140
152
|
### Pass Bar
|
|
141
153
|
|
|
142
|
-
The implementation passes if ALL dimensions
|
|
154
|
+
The implementation passes if ALL dimensions score 4 or 5. Specifically:
|
|
143
155
|
|
|
144
|
-
- **Correctness
|
|
145
|
-
- **Completeness
|
|
146
|
-
- **Safety
|
|
147
|
-
- **Consistency
|
|
156
|
+
- **Correctness** (score 4–5): Every verification criterion is satisfied
|
|
157
|
+
- **Completeness** (score 4–5): All steps implemented, no unfinished markers
|
|
158
|
+
- **Safety** (score 4–5): No security vulnerabilities introduced
|
|
159
|
+
- **Consistency** (score 4–5): Follows existing codebase patterns{{EXTRA_DIMENSIONS_PASS_BAR}}
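The pass bar above reduces to a threshold check over the per-dimension scores. A minimal TypeScript sketch, assuming the scores have already been extracted into a plain map; the `overallVerdict` name, the `Verdict` type, and the choice to fail an empty assessment are illustrative assumptions, not ralphctl's actual implementation:

```typescript
// Hypothetical helper for illustration; not taken from the ralphctl source.
type Verdict = "passed" | "failed";

const PASS_THRESHOLD = 4; // scores 4 and 5 pass; scores 1 to 3 fail

function overallVerdict(scores: Record<string, number>): Verdict {
  const names = Object.keys(scores);
  // Assumption: an assessment with no dimensions carries no evidence, so it fails.
  if (names.length === 0) return "failed";
  for (const name of names) {
    const score = scores[name];
    // Anything outside the integer 1-5 rubric is treated as a failure.
    if (score === undefined || Math.floor(score) !== score || score < 1 || score > 5) {
      return "failed";
    }
    // A single dimension below the threshold fails the whole evaluation.
    if (score < PASS_THRESHOLD) return "failed";
  }
  return "passed";
}
```

Applied to the four floor dimensions, scores of 5, 4, 5, 4 pass, while any single score of 3 or lower fails the evaluation regardless of the others.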
|
|
148
160
|
|
|
149
161
|
Fail only on missed verification criteria, skipped steps, safety issues, or genuine codebase-convention violations —
|
|
150
162
|
not style preferences, naming opinions, or improvements beyond the task scope. When verification criteria are provided,
|
|
@@ -157,12 +169,12 @@ Before you decide the verdict, answer both questions honestly:
|
|
|
157
169
|
1. **Did you actually run the Phase 1 verification commands?** If the check script exists and you did
|
|
158
170
|
not execute it, or you did not run `git status` / `git log`, you lack the ground truth that
|
|
159
171
|
authoritatively settles Correctness and Completeness.
|
|
160
|
-
2. **Can you name a specific observation for each dimension?** For every
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
172
|
+
2. **Can you name a specific observation for each dimension?** For every score you are about to emit,
|
|
173
|
+
point to a concrete piece of evidence — a file path, a line number, a test count, a tool output, a
|
|
174
|
+
function name, a verification criterion you graded. "Looks good" / "appears correct" / "no issues
|
|
175
|
+
found" are NOT specific observations.
|
|
164
176
|
|
|
165
|
-
If the answer to either question is **no**, you MUST
|
|
177
|
+
If the answer to either question is **no**, you MUST score Completeness 1 with a one-line finding
|
|
166
178
|
explaining what you skipped, and emit `<evaluation-failed>` — even if everything else seems fine. A
|
|
167
179
|
rubber-stamp PASS is worse than a real FAIL because it misleads the harness into marking work done
|
|
168
180
|
when it was never audited. This guard exists because the evaluator is the last line of defense
|
|
@@ -173,41 +185,46 @@ false PASS is a shipped bug.
|
|
|
173
185
|
|
|
174
186
|
Structure your output as a dimension assessment followed by a verdict signal.
|
|
175
187
|
|
|
176
|
-
**Format rule:** Each dimension MUST be a single line
|
|
177
|
-
|
|
188
|
+
**Format rule:** Each dimension MUST be a single line in this exact format:
|
|
189
|
+
|
|
190
|
+
```
|
|
191
|
+
**Dimension** (score 1-5): N — one-line finding
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
Where `N` is the numeric score (1–5). Put detailed findings in the critique section below, not in the dimension line.
|
|
178
195
|
|
|
179
|
-
**Justification rule (enforced):** The `— one-line
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
196
|
+
**Justification rule (enforced):** The `— one-line finding` after the score is required, not decorative. A bare
|
|
197
|
+
`**Dimension** (score 1-5): N` with no em-dash and no finding is invalid — it parses as a rubber stamp and the
|
|
198
|
+
harness will treat the evaluation as failed. Every dimension line needs an em-dash (or hyphen) followed by a
|
|
199
|
+
non-empty, concrete finding.
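The format and justification rules above can be checked mechanically. The following is a sketch of one way to do it, assuming one dimension per line; `parseDimensionLine` and `DimensionLine` are hypothetical names, and the real harness may parse these lines differently:

```typescript
// Hypothetical parser for illustration; the harness's own parsing is not shown in this diff.
interface DimensionLine {
  name: string;
  score: number;
  finding: string;
}

// **Name** (score 1-5): N, then an em-dash (U+2014) or a hyphen, then a non-empty finding.
const DIMENSION_LINE = /^\*\*(.+?)\*\* \(score 1-5\): ([1-5])\s*(?:\u2014|-)\s*(\S.*)$/;

function parseDimensionLine(line: string): DimensionLine | null {
  const match = DIMENSION_LINE.exec(line.trim());
  // A bare score, an out-of-range score, or a missing finding does not match.
  if (match === null) return null;
  const name = match[1];
  const score = match[2];
  const finding = match[3];
  if (name === undefined || score === undefined || finding === undefined) return null;
  return { name: name, score: Number(score), finding: finding.trim() };
}
```

A `null` result corresponds to the "rubber stamp" case described above: the line has no dash or no concrete finding after the score.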
|
|
183
200
|
|
|
184
|
-
### If the implementation passes all dimensions:
|
|
201
|
+
### If the implementation passes all dimensions (all scores 4 or 5):
|
|
185
202
|
|
|
186
|
-
Emit `<evaluation-passed>` ONLY when every dimension has a one-line justification that cites
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
203
|
+
Emit `<evaluation-passed>` ONLY when every dimension has a one-line justification that cites concrete evidence. A
|
|
204
|
+
`<evaluation-passed>` signal after bare score lines or after generic approval phrasing is a contract violation — in
|
|
205
|
+
that case, emit `<evaluation-failed>` instead with a Completeness score of 1 and a finding that you could not justify
|
|
206
|
+
the pass.
|
|
190
207
|
|
|
191
208
|
```
|
|
192
209
|
## Assessment
|
|
193
210
|
|
|
194
|
-
**Correctness
|
|
195
|
-
**Completeness
|
|
196
|
-
**Safety
|
|
197
|
-
**Consistency
|
|
211
|
+
**Correctness** (score 1-5): 5 — [one-line finding]
|
|
212
|
+
**Completeness** (score 1-5): 4 — [one-line finding]
|
|
213
|
+
**Safety** (score 1-5): 5 — [one-line finding]
|
|
214
|
+
**Consistency** (score 1-5): 4 — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_PASS}}
|
|
198
215
|
|
|
199
216
|
<evaluation-passed>
|
|
200
217
|
```
|
|
201
218
|
|
|
202
|
-
### If any dimension
|
|
219
|
+
### If any dimension scores 1–3:
|
|
203
220
|
|
|
204
221
|
```
|
|
205
222
|
## Assessment
|
|
206
223
|
|
|
207
|
-
**Correctness
|
|
208
|
-
**Completeness
|
|
209
|
-
**Safety
|
|
210
|
-
**Consistency
|
|
224
|
+
**Correctness** (score 1-5): N — [one-line finding]
|
|
225
|
+
**Completeness** (score 1-5): N — [one-line finding]
|
|
226
|
+
**Safety** (score 1-5): N — [one-line finding]
|
|
227
|
+
**Consistency** (score 1-5): N — [one-line finding]{{EXTRA_DIMENSIONS_ASSESSMENT_MIXED}}
|
|
211
228
|
|
|
212
229
|
<evaluation-failed>
|
|
213
230
|
[Specific, actionable critique organized by failing dimension.
|
|
@@ -220,33 +237,37 @@ Each issue must reference which dimension it violates.]
|
|
|
220
237
|
|
|
221
238
|
<examples>
|
|
222
239
|
|
|
223
|
-
**Example of a correct PASS:**
|
|
240
|
+
**Example of a correct PASS (all dimensions 4–5):**
|
|
224
241
|
|
|
225
242
|
> Task: "Add date validation to export endpoint"
|
|
226
243
|
> Verification criteria: "GET /exports?startDate=invalid returns 400", "Valid range returns filtered results"
|
|
227
244
|
>
|
|
228
|
-
> **Correctness
|
|
229
|
-
> correctly
|
|
230
|
-
> **Completeness
|
|
231
|
-
>
|
|
232
|
-
> **
|
|
245
|
+
> **Correctness** (score 1-5): 5 — Both criteria verified: invalid dates return 400 with error body, valid range
|
|
246
|
+
> filters correctly per integration test at `src/routes/exports.test.ts:88`
|
|
247
|
+
> **Completeness** (score 1-5): 4 — Schema, controller, and tests all implemented per steps; one minor TODO comment
|
|
248
|
+
> left but unrelated to this task's criteria
|
|
249
|
+
> **Safety** (score 1-5): 5 — Input validated via Zod at `src/routes/exports.ts:12` before reaching database layer
|
|
250
|
+
> **Consistency** (score 1-5): 4 — Follows existing endpoint patterns in `controllers/`; uses project's error response
|
|
251
|
+
> format from `src/lib/errors.ts`
|
|
233
252
|
|
|
234
|
-
**Example of a correct FAIL:**
|
|
253
|
+
**Example of a correct FAIL (one or more dimensions 1–3):**
|
|
235
254
|
|
|
236
255
|
> Task: "Add user search with pagination"
|
|
237
256
|
> Verification criteria: "Returns paginated results", "Supports name filter", "Returns 400 for invalid page number"
|
|
238
257
|
>
|
|
239
|
-
> **Correctness
|
|
240
|
-
>
|
|
241
|
-
> **
|
|
242
|
-
> **
|
|
258
|
+
> **Correctness** (score 1-5): 2 — Invalid page number returns 500 (unhandled exception at
|
|
259
|
+
> `src/controllers/users.ts:47`) instead of 400 as required by criterion 3
|
|
260
|
+
> **Completeness** (score 1-5): 4 — All three features implemented across controller, service, and tests
|
|
261
|
+
> **Safety** (score 1-5): 1 — `src/repositories/users.ts:23` interpolates `query` directly into a SQL string; SQL
|
|
262
|
+
> injection possible on any search input
|
|
263
|
+
> **Consistency** (score 1-5): 4 — Follows existing controller patterns and uses the shared pagination helper
|
|
243
264
|
>
|
|
244
265
|
> Issues:
|
|
245
266
|
>
|
|
246
|
-
>
|
|
247
|
-
>
|
|
248
|
-
>
|
|
249
|
-
>
|
|
267
|
+
> - [Correctness] `src/controllers/users.ts:47` — `parseInt(page)` returns NaN for non-numeric input, causing
|
|
268
|
+
> unhandled exception. Add validation before query.
|
|
269
|
+
> - [Safety] `src/repositories/users.ts:23` — `WHERE name LIKE '%${query}%'` is SQL injection. Use parameterized
|
|
270
|
+
> query: `WHERE name LIKE $1` with `%${query}%` as parameter.
|
|
250
271
|
|
|
251
272
|
</examples>
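The two fixes named in the FAIL example can be sketched in TypeScript. This is an illustration under stated assumptions: `parsePage`, `buildNameSearch`, the `ParameterizedQuery` shape, and the `users` table name are hypothetical, and no database driver is called. Only the `$1` placeholder style comes from the example itself.

```typescript
// Illustrative only: function names, the query shape, and the table name are assumptions.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Returns a positive integer page, or null so the caller can respond with 400
// instead of letting NaN reach the query layer.
function parsePage(raw: string): number | null {
  if (!/^\d+$/.test(raw)) return null; // rejects "abc", "1.5", "-2", and ""
  const page = parseInt(raw, 10);
  return page >= 1 ? page : null;
}

// The search term travels as a bound parameter and never appears in the SQL text.
function buildNameSearch(query: string): ParameterizedQuery {
  return {
    text: "SELECT * FROM users WHERE name LIKE $1",
    values: ["%" + query + "%"],
  };
}
```

Note that `%` and `_` inside the user's input still act as `LIKE` wildcards in this sketch; escaping them is a separate concern from injection.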
|
|
252
273
|
|
|
@@ -1,10 +1,9 @@
|
|
|
1
1
|
# Task Execution Protocol
|
|
2
2
|
|
|
3
|
-
You are a task implementer. Execute one pre-planned task precisely.
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
implementation steps, verification criteria, check script, branch, and prior task learnings.
|
|
3
|
+
You are a task implementer. Execute one pre-planned task precisely. Implement the task described below — read this whole
|
|
4
|
+
file before starting; it contains the task directive, implementation steps, verification criteria, check script, branch,
|
|
5
|
+
environment status, and a pointer to prior task learnings. Think through the declared steps before writing code; the
|
|
6
|
+
steps define the full scope — stop when they are complete, verify your work, and signal completion.
|
|
8
7
|
|
|
9
8
|
{{HARNESS_CONTEXT}}
|
|
10
9
|
|
|
@@ -12,9 +11,9 @@ When finished, emit a signal from the `<signals>` block below.
|
|
|
12
11
|
|
|
13
12
|
<constraints>
|
|
14
13
|
|
|
15
|
-
- **Respect task boundaries** — complete exactly the declared steps for this one task, then stop.
|
|
16
|
-
|
|
17
|
-
|
|
14
|
+
- **Respect task boundaries** — complete exactly the declared steps for this one task, then stop. Skipping steps,
|
|
15
|
+
improvising, or editing files outside the declared set spreads scope across tasks and breaks the dependency contract
|
|
16
|
+
the planner laid out.
|
|
18
17
|
- **Prefer fixing the code over the test** — a failing test usually indicates a bug in the implementation. Update
|
|
19
18
|
tests only when the declared steps intentionally change the asserted behaviour (e.g. a contract change, a regression
|
|
20
19
|
fix). If the right move is genuinely ambiguous, signal `<task-blocked>` so a human can decide — do not silently
|
|
@@ -22,13 +21,24 @@ When finished, emit a signal from the `<signals>` block below.
|
|
|
22
21
|
- **Verify before completing** — the harness runs a post-task check gate; unverified work will be caught and rejected.
|
|
23
22
|
- **Append progress, never overwrite** — append each progress entry at the end of the progress file. Overwriting
|
|
24
23
|
erases context that downstream tasks depend on.
|
|
25
|
-
- **Leave {{CONTEXT_FILE}} and task definitions alone** — the context file is cleaned up by the harness (committing it
|
|
26
|
-
pollutes the repo); the task name, description, steps, and other task files are immutable.
|
|
27
24
|
- **Never reference sprint-local identifiers in code** — do not mention acceptance-criterion labels (`AC1`, `AC2`,
|
|
28
25
|
`AC1–AC6`), ticket numbers, task IDs, or sprint IDs in source files, comments, docstrings, test names, commit
|
|
29
26
|
messages, or any committed artefact. These identifiers are ephemeral sprint metadata and become stale as tickets
|
|
30
27
|
close. If a comment needs to explain WHY, state the underlying invariant or constraint directly (e.g. "exactly one
|
|
31
28
|
confirmation per destructive action") rather than citing the AC that mandates it.
|
|
29
|
+
- **Editing `CLAUDE.md` / `AGENTS.md` / `.github/copilot-instructions.md`** — only when a declared step calls for it.
|
|
30
|
+
When you do, follow established memory-file practice:
|
|
31
|
+
- **Preserve existing prose verbatim.** Add new sections at the bottom; do not rewrite or paraphrase what's there.
|
|
32
|
+
The file is a contract — silent reflows surprise reviewers and erode trust.
|
|
33
|
+
- **Include only what an unfamiliar engineer would get wrong without being told.** Anything derivable from the
|
|
34
|
+
code itself does not belong here — empirical studies show redundancy reduces agent success.
|
|
35
|
+
- **Be specific and verifiable.** "Use 2-space indentation" beats "format properly"; "Run `pnpm verify` before
|
|
36
|
+
committing" beats "test your changes".
|
|
37
|
+
- **Stay under 200 lines, max 7 H2 sections, no H4+.** Adherence degrades past that.
|
|
38
|
+
- **Never embed slash commands, hooks, MCP server config, IDE settings, secrets, or credentials.** Those have
|
|
39
|
+
dedicated locations — `.claude/`, `.cursor/`, `settings.json`, etc.
|
|
40
|
+
- **Treat the file as ground truth when reading it for project rules** — even if the surrounding code pre-dates a
|
|
41
|
+
rule, follow what the file says rather than mimicking the older code.
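The size limits in the list above (under 200 lines, at most 7 H2 sections, no H4 or deeper) are mechanical enough to check automatically. A rough sketch, assuming ATX-style headings and backtick fences; `lintMemoryFile` is a hypothetical helper, not something ralphctl is known to ship:

```typescript
// Hypothetical lint for illustration; the limits come from the constraint above.
const FENCE = "\x60\x60\x60"; // three backticks, written as escapes

function lintMemoryFile(markdown: string): string[] {
  const problems: string[] = [];
  // Drop one trailing newline so it is not counted as an extra line.
  const lines = markdown.replace(/\n$/, "").split("\n");
  if (lines.length >= 200) {
    problems.push("file has " + lines.length + " lines; stay under 200");
  }
  let inFence = false;
  let h2Count = 0;
  let deepHeadings = 0;
  for (const line of lines) {
    if (line.indexOf(FENCE) === 0) {
      inFence = !inFence; // headings inside fenced code do not count
      continue;
    }
    if (inFence) continue;
    if (/^## /.test(line)) h2Count += 1;
    if (/^#{4,} /.test(line)) deepHeadings += 1;
  }
  if (h2Count > 7) problems.push(h2Count + " H2 sections; maximum is 7");
  if (deepHeadings > 0) problems.push(deepHeadings + " heading(s) at H4 or deeper");
  return problems;
}
```

An empty result means the file is within the stated limits; it says nothing about the other rules in the list, which need human judgment.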
|
|
32
42
|
|
|
33
43
|
{{COMMIT_CONSTRAINT}}
|
|
34
44
|
|
|
@@ -36,25 +46,52 @@ When finished, emit a signal from the `<signals>` block below.
|
|
|
36
46
|
|
|
37
47
|
{{PROJECT_TOOLING}}
|
|
38
48
|
|
|
49
|
+
## Task
|
|
50
|
+
|
|
51
|
+
# {{TASK_NAME}}
|
|
52
|
+
|
|
53
|
+
**Task ID:** `{{TASK_ID}}`
|
|
54
|
+
**Project Path:** {{PROJECT_PATH}}
|
|
55
|
+
{{BRANCH_LINE}}
|
|
56
|
+
|
|
57
|
+
{{TASK_DESCRIPTION_SECTION}}
|
|
58
|
+
|
|
59
|
+
{{TASK_STEPS_SECTION}}
|
|
60
|
+
|
|
61
|
+
{{VERIFICATION_CRITERIA_SECTION}}
|
|
62
|
+
|
|
63
|
+
## Check Script
|
|
64
|
+
|
|
65
|
+
{{CHECK_SCRIPT_SECTION}}
|
|
66
|
+
|
|
67
|
+
## Environment Status
|
|
68
|
+
|
|
69
|
+
{{ENVIRONMENT_STATUS}}
|
|
70
|
+
|
|
71
|
+
## Prior Task Learnings
|
|
72
|
+
|
|
73
|
+
Read `{{PROGRESS_FILE}}` for accumulated learnings, gotchas, and patterns recorded by previous tasks in this sprint.
|
|
74
|
+
Skip the file when it does not exist (first task of the sprint).
|
|
75
|
+
|
|
39
76
|
## Phase 1: Reconnaissance (feedforward — understand before acting)
|
|
40
77
|
|
|
41
78
|
Perform these checks before writing any code. The goal is to steer your implementation correctly on the first attempt,
|
|
42
79
|
not discover problems after the fact.
|
|
43
80
|
|
|
44
81
|
1. **Verify working directory** — run `pwd` to confirm you are in the expected project directory
|
|
45
|
-
2. **Read progress history** — read {{PROGRESS_FILE}} to understand what previous tasks accomplished, patterns
|
|
82
|
+
2. **Read progress history** — read `{{PROGRESS_FILE}}` to understand what previous tasks accomplished, patterns
|
|
46
83
|
discovered, and gotchas encountered. This avoids duplicating work and surfaces context that the task steps may not
|
|
47
84
|
capture.
|
|
48
85
|
3. **Check git state** — run `git status` to check for uncommitted changes
|
|
49
|
-
4. **Check environment** — review the
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
86
|
+
4. **Check environment** — review the Check Script and Environment Status sections above. If a check script is listed
|
|
87
|
+
and the harness already verified the environment, review those results rather than re-running. If no check script
|
|
88
|
+
is listed, run the project's verification commands yourself (check CLAUDE.md, .github/copilot-instructions.md, or
|
|
89
|
+
project config when present). If any check shows failure, stop:
|
|
53
90
|
```
|
|
54
91
|
<task-blocked>Pre-existing failure: [details of what failed and the output]</task-blocked>
|
|
55
92
|
```
|
|
56
93
|
5. **Discover conventions** — read the project's configuration files to understand what conventions are enforced:
|
|
57
|
-
- `CLAUDE.md` or `.github/copilot-instructions.md` for project rules
|
|
94
|
+
- `CLAUDE.md` or `.github/copilot-instructions.md` for project rules (when present)
|
|
58
95
|
- `.eslintrc*`, `prettier*`, `tsconfig.json`, or equivalent for enforced style rules
|
|
59
96
|
- Test framework and test file patterns (e.g., `*.test.ts`, `*.spec.ts`, `__tests__/` vs co-located)
|
|
60
97
|
6. **Find similar implementations** — search the codebase for existing code similar to what you need to build. This is
|
|
@@ -64,7 +101,8 @@ not discover problems after the fact.
|
|
|
64
101
|
- If adding a utility, check if a similar utility already exists (reuse over reinvent)
|
|
65
102
|
- If adding tests, read existing test files to understand patterns, helpers, and assertions used
|
|
66
103
|
- Note: file paths, naming conventions, import patterns, error handling patterns
|
|
67
|
-
7. **Review
|
|
104
|
+
7. **Review prior learnings** — review the Prior Task Learnings section above (which points at the progress file) for
|
|
105
|
+
warnings or gotchas recorded by previous tasks in this sprint
|
|
68
106
|
|
|
69
107
|
Proceed to Phase 2 once all reconnaissance steps pass.
|
|
70
108
|
|
|
@@ -97,11 +135,11 @@ Proceed to Phase 2 once all reconnaissance steps pass.
|
|
|
97
135
|
Complete these steps IN ORDER:
|
|
98
136
|
|
|
99
137
|
1. **Confirm all steps done** — Every task step has been completed
|
|
100
|
-
2. **Run ALL verification commands** — Execute every verification command (see Check Script section
|
|
101
|
-
|
|
102
|
-
gate — your task is not marked done unless it passes.
|
|
138
|
+
2. **Run ALL verification commands** — Execute every verification command (see the Check Script section above, or the
|
|
139
|
+
project instructions if no check script is configured). Fix any failures before proceeding. The harness runs the
|
|
140
|
+
check script as a post-task gate — your task is not marked done unless it passes.
|
|
103
141
|
{{COMMIT_STEP}}
|
|
104
|
-
3. **Update progress file** — Append to {{PROGRESS_FILE}} using this format:
|
|
142
|
+
3. **Update progress file** — Append to `{{PROGRESS_FILE}}` using this format:
|
|
105
143
|
|
|
106
144
|
```markdown
|
|
107
145
|
## {ISO timestamp} - {task-id}: {task name}
|
|
@@ -187,3 +225,9 @@ judgment to a human with `<task-blocked>Steps incomplete: [what appears missing]
|
|
|
187
225
|
scope yourself.
|
|
188
226
|
|
|
189
227
|
{{SIGNALS}}
|
|
228
|
+
|
|
229
|
+
## References
|
|
230
|
+
|
|
231
|
+
- Anthropic, _Claude Code Memory (CLAUDE.md)_ — empirical basis for the 200-line / 7-H2 caps and the adherence-degradation claim: https://code.claude.com/docs/en/memory
|
|
232
|
+
- Anthropic, _Claude Code Best Practices_ — source of the "no slash commands / hooks / MCP / IDE settings in the project context file" rule: https://code.claude.com/docs/en/best-practices
|
|
233
|
+
- Gloaguen et al., _Evaluating AGENTS.md_ (arXiv 2602.11988) — redundant context measurably reduces agent success rate
|
|
@@ -7,10 +7,8 @@ stop when acceptance criteria are unambiguous.
|
|
|
7
7
|
|
|
8
8
|
<constraints>
|
|
9
9
|
|
|
10
|
-
- Focus exclusively on requirements, acceptance criteria, and scope — codebase exploration and repository selection
|
|
11
|
-
|
|
12
|
-
- Frame requirements as observable behavior ("user can filter by date") rather than technical jargon ("add SQL WHERE
|
|
13
|
-
clause") — implementation-agnostic specs give the planner maximum flexibility
|
|
10
|
+
- Focus exclusively on requirements, acceptance criteria, and scope — codebase exploration and repository selection happen in a later planning phase, not here
|
|
11
|
+
- Frame requirements as observable behavior ("user can filter by date") rather than technical jargon ("add SQL WHERE clause") — implementation-agnostic specs give the planner maximum flexibility
|
|
14
12
|
|
|
15
13
|
</constraints>
|
|
16
14
|
|
|
@@ -86,7 +84,8 @@ If you find yourself asking questions the ticket already answers, you have gone
|
|
|
86
84
|
|
|
87
85
|
### Step 4: Present Requirements for Approval
|
|
88
86
|
|
|
89
|
-
Present the complete requirements in readable markdown before writing to file — the user must see and approve them
|
|
87
|
+
Present the complete requirements in readable markdown before writing to file — the user must see and approve them
|
|
88
|
+
first.
|
|
90
89
|
Use proper headers, bullets, and formatting. Make it easy to scan and review.
|
|
91
90
|
|
|
92
91
|
Ask for approval using AskUserQuestion:
|
|
@@ -129,8 +128,8 @@ Use AskUserQuestion with 2-4 options per question:
|
|
|
129
128
|
- Descriptions explain trade-offs or implications
|
|
130
129
|
- Ask one question at a time
|
|
131
130
|
- Do not ask what the ticket already answers
|
|
132
|
-
- Labels must be 1-5 words (concise)
|
|
133
|
-
- Headers must be 12 characters or fewer
|
|
131
|
+
- Labels must be 1-5 words (concise) — UI rendering constraints
|
|
132
|
+
- Headers must be 12 characters or fewer — UI rendering constraints
|
|
134
133
|
- Use `multiSelect: true` when choices are not mutually exclusive
|
|
135
134
|
- Users automatically get an "Other" option — do not add your own
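The question constraints in the list above can be expressed as a small validator. The `Question` and `QuestionOption` shapes below are assumptions made for illustration; the actual AskUserQuestion payload is not shown in this diff and may differ:

```typescript
// Hypothetical shapes for illustration; not the real AskUserQuestion schema.
interface QuestionOption {
  label: string;
  description: string;
}

interface Question {
  header: string;
  options: QuestionOption[];
  multiSelect?: boolean;
}

function validateQuestion(question: Question): string[] {
  const errors: string[] = [];
  if (question.header.length > 12) {
    errors.push('header "' + question.header + '" exceeds 12 characters');
  }
  if (question.options.length < 2 || question.options.length > 4) {
    errors.push("expected 2-4 options, got " + question.options.length);
  }
  for (const option of question.options) {
    const words = option.label.trim().split(/\s+/).filter((word) => word.length > 0);
    if (words.length < 1 || words.length > 5) {
      errors.push('label "' + option.label + '" must be 1-5 words');
    }
    // An "Other" choice is supplied automatically, so a manual one is redundant.
    if (option.label.trim().toLowerCase() === "other") {
      errors.push('do not add your own "Other" option');
    }
  }
  return errors;
}
```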
|
|
136
135
|
|
|
@@ -173,6 +172,8 @@ Options:
|
|
|
173
172
|
|
|
174
173
|
Write to: {{OUTPUT_FILE}}
|
|
175
174
|
|
|
175
|
+
When that path is empty, emit the JSON to stdout instead — the harness reads stdout in headless mode.
|
|
176
|
+
|
|
176
177
|
Output exactly one JSON object in the array for this ticket. If the ticket covers multiple sub-topics (e.g., map fixes,
|
|
177
178
|
route planning, UI layout), consolidate them into a single `requirements` string using numbered markdown headings
|
|
178
179
|
(`# 1. Topic`, `# 2. Topic`, etc.) separated by `---` dividers. Multiple JSON objects for the same ticket will break
|
|
@@ -181,7 +182,9 @@ the import pipeline.
|
|
|
181
182
|
JSON Schema:
|
|
182
183
|
|
|
183
184
|
```json
|
|
184
|
-
{{
|
|
185
|
+
{{
|
|
186
|
+
SCHEMA
|
|
187
|
+
}}
|
|
185
188
|
```
|
|
186
189
|
|
|
187
190
|
Example output:
|
|
@@ -7,12 +7,13 @@ Before writing the JSON output, verify EVERY item:
|
|
|
7
7
|
1. **Requirements complete** — problem statement, acceptance criteria, and scope boundaries are all present (when applicable)
|
|
8
8
|
2. **Exclusive file ownership** — each file is owned by exactly one task (or overlap is explicitly delineated in steps)
|
|
9
9
|
3. **Foundations before dependents** — tasks are ordered so prerequisites come first
|
|
10
|
-
4. **Valid dependencies** — every `blockedBy` reference
|
|
11
|
-
5. **
|
|
10
|
+
4. **Valid dependencies** — every `blockedBy` reference matches the `id` placeholder of an earlier task in the array
|
|
11
|
+
5. **Real dependencies only** — `blockedBy` reflects genuine code coupling; do not add it for trivial reasons; do not reference yourself
|
|
12
12
|
6. **Precise steps** — every task has specific, actionable steps with file references — as many as the scope needs (a small task may have 2 steps, a larger coherent one may have 8+)
|
|
13
13
|
7. **Verification steps** — every task ends with project-appropriate verification commands
|
|
14
14
|
8. **`projectPath` assigned** — every task uses a path from the available repositories
|
|
15
15
|
9. **Verification criteria** — every task has 2-4 `verificationCriteria` that are testable and unambiguous
|
|
16
16
|
10. **Raw JSON output** — the output is valid JSON matching the schema exactly; the harness parses the output directly as JSON, so emit it without markdown fences, commentary, or surrounding prose
|
|
17
|
+
11. **Unique placeholder ids** — each task's `id` is a unique string within this array (used only for `blockedBy` resolution)
|
|
17
18
|
|
|
18
19
|
</validation-checklist>
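Checklist items 4, 5, and 11 (valid `blockedBy` references, no self-references, unique ids) lend themselves to an automated pre-check. A minimal sketch, assuming a task shape reduced to the two fields involved; `PlannedTask` and `validateDependencies` are hypothetical names, and whether `blockedBy` is optional in the real schema is an assumption:

```typescript
// Hypothetical minimal task shape: only the fields this check needs.
interface PlannedTask {
  id: string;
  blockedBy?: string[];
}

function validateDependencies(tasks: PlannedTask[]): string[] {
  const errors: string[] = [];
  const seen: string[] = []; // ids of tasks that appear earlier in the array
  for (const task of tasks) {
    if (seen.indexOf(task.id) !== -1) {
      errors.push('duplicate id "' + task.id + '"');
    }
    for (const dep of task.blockedBy ?? []) {
      if (dep === task.id) {
        errors.push('task "' + task.id + '" references itself');
      } else if (seen.indexOf(dep) === -1) {
        // Covers both unknown ids and ids that only appear later in the array.
        errors.push('task "' + task.id + '" is blocked by "' + dep + '", which is not an earlier task');
      }
    }
    seen.push(task.id);
  }
  return errors;
}
```

Because a dependency must already be in `seen`, the check also enforces "foundations before dependents" for every declared `blockedBy` edge. It cannot judge item 5's other half, whether a dependency reflects genuine code coupling.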
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: abstraction-first
|
|
3
|
+
description: Cross-phase skill — design the shape of the change (entities, boundaries, seams) before generating code, tasks, or acceptance criteria. Failure mode is "big blob" output that obscures the core change.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Abstraction-First
|
|
7
|
+
|
|
8
|
+
> Concept from [Martin Fowler — "Abstraction-First"](https://martinfowler.com/articles/structured-prompt-driven/abstraction-first.html). Adapted for ralphctl's three phases.
|
|
9
|
+
|
|
10
|
+
The shape of the change comes before the words that describe it. Name the entities, the boundaries, and the
|
|
11
|
+
seams the change touches **first**; the criteria, tasks, or code that follow are then arguments about that
|
|
12
|
+
shape, not freeform prose. Skip this and the output reads as a "big blob" — duplicated logic, blurred
|
|
13
|
+
responsibilities, work that has to be reviewed wholesale rather than incrementally.
|
|
14
|
+
|
|
15
|
+
## When this applies
|
|
16
|
+
|
|
17
|
+
- **Refine** — name the entities and the boundary of the change before listing acceptance criteria. "Adds a
|
|
18
|
+
`UserBilling` aggregate that exposes `cancelSubscription`" is the right altitude. "The cancel button must
|
|
19
|
+
turn red" is too specific to be the spec.
|
|
20
|
+
- **Plan** — sketch which existing components the change extends, which new ones it introduces, and the seams
|
|
21
|
+
between them, before splitting into tasks. The task list is then the decomposition of a known shape, not a
|
|
22
|
+
guess about one.
|
|
23
|
+
- **Execute** — re-read the task's verification criteria and the surrounding code's existing pattern before
|
|
24
|
+
opening an editor. The "abstraction" at this altitude is the contract the task already declared; matching it
|
|
25
|
+
is the job.
|
|
26
|
+
|
|
27
|
+
## What to do
|
|
28
|
+
|
|
29
|
+
1. **Name the entities.** Real-world nouns the change talks about — domain objects, aggregates, modules,
|
|
30
|
+
external systems. If you cannot name three of them, the change is either trivial or under-specified.
|
|
31
|
+
2. **Draw the boundary.** Which files / directories / packages are in scope? Which are explicitly out? An
|
|
32
|
+
ambiguous boundary is the same problem as an ambiguous criterion — it lets later work drift.
|
|
33
|
+
3. **Identify the seam.** Where does the new behaviour meet the existing system? An interface, a port, a
|
|
34
|
+
route, a CLI command, a database table. The seam is where regressions hide; call it out by name.
|
|
35
|
+
4. **Only then describe behaviour.** Acceptance criteria, task steps, code — all of these are downstream of
|
|
36
|
+
the shape. Writing them first is what produces the "big blob".
|
|
37
|
+
|
|
38
|
+
## Anti-patterns
|
|
39
|
+
|
|
40
|
+
- **Specifying behaviour before naming entities** — produces criteria that read as a wishlist rather than a
|
|
41
|
+
spec. Reviewers cannot tell what the change actually _is_.
|
|
42
|
+
- **Listing files instead of naming a boundary** — "touches `foo.ts`, `bar.ts`, `baz.ts`" is not a boundary;
|
|
43
|
+
it is a side effect of one. Name the module or aggregate they belong to.
|
|
44
|
+
- **Inventing an abstraction the codebase does not have** — if the existing code has no `UserBilling`
|
|
45
|
+
aggregate, do not name one in the spec unless creating it is part of the change.
|
|
@@ -0,0 +1,46 @@
+---
+name: alignment
+description: Cross-phase skill — establish a shared understanding of what will and will not be done before producing output. Restate the input back to the user; surface assumptions; agree before you write.
+---
+
+# Alignment
+
+> Concept from [Martin Fowler — "Alignment"](https://martinfowler.com/articles/structured-prompt-driven/alignment.html). Adapted for ralphctl's three phases.
+
+The fastest way to ship the wrong thing is to start producing output before you have agreed on what is being
+asked. Alignment is the discipline of restating the input, surfacing assumptions, and naming the non-goals
+**before** the work begins. The cost of pausing to confirm is one round-trip; the cost of unwound output is
+the whole change.
+
+## When this applies
+
+- **Refine** — refinement is itself an alignment exercise. Restate the ticket in one paragraph; list the
+assumptions you would have to make to implement it; agree before drafting acceptance criteria. A criterion
+built on a wrong premise is worse than a missing one.
+- **Plan** — confirm the planner's read of the requirements before generating tasks. Repo selection, scope
+boundaries, and dependency assumptions all need to land before task decomposition starts.
+- **Execute** — re-read the task spec's verification criteria before writing code. The contract is the
+arbiter; if your read of it differs from what's written, surface the conflict in a `<note>` rather than
+guessing.
+
+## What to do
+
+1. **Restate the input.** One paragraph. What you understood, in your own words. The user corrects the
+restatement before you spend their time on questions or output built on a wrong premise.
+2. **List the assumptions.** Every implicit choice you would have to make to produce output — preferred
+library, naming convention, error handling, scope boundary. Each one is a candidate for confirmation.
+3. **Name the non-goals.** What is _out_ of scope is as load-bearing as what is _in_. Without explicit
+non-goals, scope creep is the default.
+4. **Agree before producing output.** Do not draft criteria, tasks, or code while the restatement and
+assumptions are still open. If the input cannot be restated, it is not yet refined enough to plan.
+
+## Anti-patterns
+
+- **Asking what the ticket already answers.** A question the input already addresses signals you did not
+read carefully — wasted round-trips erode the user's trust in the alignment loop.
+- **Over-asking.** Three to six focused questions is typical; ten is interrogation. Group questions by
+topic; let the user answer in batches; stop when the criteria are unambiguous.
+- **Skipping the restatement.** Going straight to "is this OK?" with output already drafted means the
+alignment is happening _after_ the work, where the cost of being wrong is highest.
+- **Implementation talk during refinement.** Implementation choices belong to planning. Pulling them into
+the alignment phase is how scope drifts.
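Reviewer note: step 4 of the alignment skill ("Agree before producing output") amounts to a gate over the first three steps. A minimal sketch of such a gate follows, assuming a hypothetical `alignmentGate` function and a hypothetical `{ text, confirmed }` shape for assumptions; neither comes from the package.

```javascript
// Hypothetical sketch: the function and field names are illustrative only.
// Output may be drafted only when the restatement exists, every listed
// assumption has been confirmed, and at least one non-goal is named.
function alignmentGate({ restatement = "", assumptions = [], nonGoals = [] } = {}) {
  const open = [];
  if (restatement.trim() === "") open.push("restatement missing");
  for (const assumption of assumptions) {
    if (!assumption.confirmed) open.push(`assumption unconfirmed: ${assumption.text}`);
  }
  if (nonGoals.length === 0) open.push("no non-goals named");
  return { agreed: open.length === 0, open };
}
```

The `open` list doubles as the next batch of questions for the user, which keeps the round-trip focused on what is still unresolved.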
@@ -0,0 +1,48 @@
+---
+name: iterative-review
+description: Cross-phase skill — treat AI output as a controlled feedback loop, not a one-shot generation. Run the cheap check after each meaningful change; re-read your own output before signalling completion.
+---
+
+# Iterative Review
+
+> Concept from [Martin Fowler — "Iterative Review"](https://martinfowler.com/articles/structured-prompt-driven/iterative-review.html). Adapted for ralphctl's three phases.
+
+One-shot generation looks fast and is slow. The cheap review you skipped at iteration N becomes the expensive
+unwind at iteration N+5, when a regression that lived undetected through five steps surfaces only at the
+post-task gate. Catching a problem at the seam between two changes is cheap; catching it at the end of a
+200-line diff is not. The harness's check gate, the evaluator, and the review prompts are this loop in
+deployed form — but the same posture also belongs **inside** each phase's work.
+
+## When this applies
+
+- **Refine** — re-read the drafted criteria once against the ticket before sending. Strike duplicates;
+tighten "should" / "ideally" into checkable predicates. Cheap to do here, expensive once planning splits
+tasks against the unclear version.
+- **Plan** — re-read the generated task list against the requirements. Are the tasks independently
+shippable? Do dependencies match the actual data flow? Reorder, merge, or drop before importing.
+- **Execute** — run the project's check gate (lint, typecheck, tests) after each meaningful change, not
+after the whole diff. Re-read your own diff once before signalling `<task-complete>`. You are the cheapest
+reviewer the change ever gets.
+
+## What to do
+
+1. **Run the cheapest check first, often.** Lint, typecheck, narrow test runs — not the full suite — after
+each meaningful change. The point is to catch the regression at the seam, not to certify completion.
+2. **Re-read your own output once before submitting.** Whether it is criteria, tasks, or a diff, the second
+read catches what the first one missed. Cheap.
+3. **Treat the check gate as a loop, not a finish line.** A failing gate is feedback, not a verdict. Apply
+the fix and re-run; do not signal completion against a red gate.
+4. **When a fix attempt repeats the same failure, escalate rather than retry.** Two iterations of the same
+error is a plateau — the next fix is a guess. Surface the blocker via `<task-blocked>` or `<note>` rather
+than burning the budget.
+
+## Anti-patterns
+
+- **Heroic one-shot.** Drafting 200 lines, signalling complete, and discovering at the gate that lint
+rejects every other line. The harness will catch it; the cost is the whole iteration.
+- **Patching code without updating the prompt / spec.** Drift between the artefact and the spec accumulates
+silently and shows up later as inexplicable behaviour no one can trace.
+- **Treating the post-task gate as the only review.** It is the _last_ review, not the only one. Anything
+the gate catches that you could have caught earlier is wasted budget.
+- **Re-running the same fix unchanged.** If the same critique surfaces twice, the third attempt is not a
+fix — it is hope. Plateau out and surface it.
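Reviewer note: steps 3 and 4 of the iterative-review skill describe a loop with a plateau rule, where the same failure seen twice in a row means stop and escalate. The sketch below shows one way that rule could look. It is an assumption about the intent of the prose, not the harness's actual implementation; `runCheckLoop`, `runCheck`, `applyFix`, and the result fields are hypothetical names.

```javascript
// Hypothetical sketch of a check loop with plateau detection.
// runCheck() is assumed to return { ok: boolean, failure?: string };
// applyFix(failure) is assumed to attempt a fix for that failure.
function runCheckLoop(runCheck, applyFix, maxIterations = 5) {
  let previousFailure = null;
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const result = runCheck();
    // Green gate: only now is it legitimate to signal completion.
    if (result.ok) return { status: "complete", iterations: iteration };
    // Same failure twice in a row is a plateau: escalate instead of guessing.
    if (result.failure === previousFailure) {
      return { status: "blocked", iterations: iteration, failure: result.failure };
    }
    previousFailure = result.failure;
    applyFix(result.failure);
  }
  // Budget exhausted while the gate is still red.
  return { status: "blocked", iterations: maxIterations, failure: previousFailure };
}
```

In this reading, a `"blocked"` result corresponds to surfacing `<task-blocked>` or a `<note>`, and `"complete"` is returned only against a passing check.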
File without changes
File without changes
File without changes
@@ -0,0 +1,15 @@
+#!/usr/bin/env node
+import {
+  ensureLayoutDirs,
+  ensureLayoutDirsOnce,
+  resetEnsureLayoutDirsCache,
+  resolveStoragePaths
+} from "./chunk-6RDMCLWU.mjs";
+import "./chunk-S3PTDH57.mjs";
+import "./chunk-WV4D2CPG.mjs";
+export {
+  ensureLayoutDirs,
+  ensureLayoutDirsOnce,
+  resetEnsureLayoutDirsCache,
+  resolveStoragePaths
+};
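Reviewer note: this hunk only re-exports four names from `./chunk-6RDMCLWU.mjs`; their implementations and signatures are not visible here. The pairing of `ensureLayoutDirsOnce` with `resetEnsureLayoutDirsCache` suggests a "run once per key, with a reset hook" pattern, but that is an inference from the names alone. The generic sketch below illustrates that pattern under this assumption; `makeOnce` and everything inside it are hypothetical and are not ralphctl's code.

```javascript
// Generic illustration of a memoised "once" helper with a cache reset.
// Hypothetical names; not taken from the package.
function makeOnce(fn) {
  const cache = new Map(); // key -> result of the first call for that key
  const once = (key) => {
    if (!cache.has(key)) cache.set(key, fn(key));
    return cache.get(key);
  };
  // Clearing the cache makes the next call for any key run fn again,
  // which is mainly useful between test cases.
  const reset = () => cache.clear();
  return { once, reset };
}
```

A helper of this kind keeps repeated calls cheap in a long-running CLI process while still letting tests start from a clean state.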