theslopmachine 0.9.1 → 0.9.9
- package/MANUAL.md +15 -7
- package/README.md +2 -0
- package/assets/agents/developer.md +3 -2
- package/assets/agents/slopmachine-claude.md +45 -32
- package/assets/agents/slopmachine.md +48 -38
- package/assets/claude/agents/developer.md +2 -2
- package/assets/skills/clarification-gate/SKILL.md +3 -0
- package/assets/skills/claude-worker-management/SKILL.md +17 -10
- package/assets/skills/developer-session-lifecycle/SKILL.md +11 -10
- package/assets/skills/development-guidance/SKILL.md +8 -4
- package/assets/skills/evaluation-triage/SKILL.md +40 -31
- package/assets/skills/final-evaluation-orchestration/SKILL.md +124 -66
- package/assets/skills/integrated-verification/SKILL.md +24 -16
- package/assets/skills/planning-gate/SKILL.md +31 -6
- package/assets/skills/planning-guidance/SKILL.md +13 -2
- package/assets/skills/scaffold-guidance/SKILL.md +17 -12
- package/assets/skills/submission-packaging/SKILL.md +22 -9
- package/assets/skills/verification-gates/SKILL.md +48 -45
- package/assets/slopmachine/clarifier-agent-prompt.md +7 -0
- package/assets/slopmachine/owner-verification-checklist.md +23 -8
- package/assets/slopmachine/phase-1-design-prompt.md +22 -6
- package/assets/slopmachine/phase-1-design-template.md +41 -3
- package/assets/slopmachine/phase-2-execution-planning-prompt.md +47 -21
- package/assets/slopmachine/phase-2-plan-template.md +45 -30
- package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +13 -21
- package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +16 -69
- package/assets/slopmachine/scaffold-playbooks/android-native-java.md +12 -12
- package/assets/slopmachine/scaffold-playbooks/angular-default.md +8 -60
- package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +4 -20
- package/assets/slopmachine/scaffold-playbooks/backend-family-matrix.md +12 -12
- package/assets/slopmachine/scaffold-playbooks/django-default.md +4 -61
- package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +15 -58
- package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +5 -5
- package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +4 -4
- package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +4 -41
- package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +8 -30
- package/assets/slopmachine/scaffold-playbooks/frontend-family-matrix.md +11 -11
- package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +8 -8
- package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +4 -61
- package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +4 -4
- package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +1 -1
- package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +15 -15
- package/assets/slopmachine/scaffold-playbooks/laravel-default.md +8 -81
- package/assets/slopmachine/scaffold-playbooks/livewire-default.md +8 -101
- package/assets/slopmachine/scaffold-playbooks/platform-family-matrix.md +8 -8
- package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +8 -8
- package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +7 -89
- package/assets/slopmachine/scaffold-playbooks/tauri-default.md +14 -26
- package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +8 -30
- package/assets/slopmachine/scaffold-playbooks/web-default.md +3 -3
- package/assets/slopmachine/templates/AGENTS.md +5 -4
- package/assets/slopmachine/templates/CLAUDE.md +5 -4
- package/assets/slopmachine/templates/plan.md +45 -30
- package/assets/slopmachine/utils/claude_live_common.mjs +2 -0
- package/assets/slopmachine/utils/claude_worker_common.mjs +2 -0
- package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +1 -0
- package/package.json +1 -1
package/MANUAL.md
CHANGED
@@ -65,13 +65,21 @@ slopmachine init -o
 1. Intake and setup
 2. Clarification
 3. Planning
-4.
-5.
-6.
-7.
-8.
-9.
-
+4. Development, starting with the scaffold step inside `plan.md`
+5. Rough integrated verification and hardening: repo coherence and small owner-side fixes only, with no Docker execution
+6. Evaluation and fix verification, including the final coverage and README audit inside `P7`
+7. Final readiness decision
+8. Submission packaging, including the owner-only Docker and `./run_tests.sh` check
+9. Retrospective
+
+The intended fast path is:
+
+- plan well
+- land the minimal scaffold baseline
+- execute the plan end to end
+- make the repo coherent
+- proceed through evaluation without Docker execution
+- after evaluation is complete, have the owner run and fix `docker compose up --build` and `./run_tests.sh` before submission closes
 
 ## Important notes
 
package/README.md
CHANGED
@@ -154,6 +154,7 @@ What it creates:
 - `docs/questions.md`
 - `docs/design.md`
 - `docs/api-spec.md`
+- `docs/plan.md`
 - `docs/test-coverage.md`
 
 Important details:
@@ -166,6 +167,7 @@ Important details:
 - `project_type` should use only `fullstack`, `backend`, `android`, `ios`, `desktop`, or `web` when known
 - Beads lives in the workspace root, not inside `repo/`
 - `repo/.claude/settings.json` seeds Claude Code to use the custom `developer` agent by default for that repo
+- final packaging moves `repo/plan.md` to parent-root `docs/plan.md` and removes repo-local `AGENTS.md`, `CLAUDE.md`, and `plan.md` from the delivered `repo/`
 - after non-`-o` bootstrap, the command prints the exact `cd repo` next step so you can continue immediately
 - `--adopt` moves the current project files into `repo/`, preserves root workflow state in the parent workspace, and skips the automatic bootstrap commit
 - `--continue-from <PX>` is a smoother alias for existing-project bootstrap; it implies adoption mode and seeds the requested start phase in one step
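The packaging detail added to the README in this hunk can be pictured with a short sketch. This is only an illustration under stated assumptions: a workspace with `repo/` and `docs/` side by side, and a hypothetical helper name `finalize_plan_artifacts` that does not appear in the package. The packaged cleanup script may work differently.

```python
import shutil
from pathlib import Path

# Repo-local files that the README says are removed from the delivered repo/.
REPO_LOCAL_FILES = ("AGENTS.md", "CLAUDE.md", "plan.md")


def finalize_plan_artifacts(workspace: Path) -> list[str]:
    """Hypothetical helper: move repo/plan.md to docs/plan.md, then strip repo-local files."""
    repo = workspace / "repo"
    docs = workspace / "docs"
    docs.mkdir(parents=True, exist_ok=True)

    actions = []
    plan = repo / "plan.md"
    if plan.exists():
        # Move rather than copy, so the plan survives only in the parent-root docs/.
        shutil.move(str(plan), str(docs / "plan.md"))
        actions.append("moved plan.md")

    for name in REPO_LOCAL_FILES:
        target = repo / name
        if target.exists():
            target.unlink()
            actions.append(f"removed {name}")
    return actions
```

Moving the plan before the removal loop means `plan.md` is never deleted without first being preserved under `docs/`.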
package/assets/agents/developer.md
CHANGED
@@ -154,11 +154,12 @@ Broad commands you are not allowed to run during ordinary work:
 
 - never run `./run_tests.sh`
 - never run `docker compose up --build`
+- never run any other Docker runtime, Compose, or containerized broad-verification command that stands in for those documented final commands
 - never run browser E2E or Playwright during ordinary implementation work
 - never run full test suites during ordinary implementation work unless explicitly instructed to run that exact command
-- do not use those commands even if they are documented in the repo or look convenient for debugging
+- do not use those commands even if they are documented in the repo, requested by the owner, suggested by a playbook, implied by `plan.md`, or look convenient for debugging
 - if your work would normally call for one of those commands, stop at targeted local verification and report that the change is ready for broader verification
--
+- do not run Docker-based runtime/test commands under any circumstances before `P9`, including when explicitly asked during planning, development, `P5`, or `P7`; the owner handles final Docker and `./run_tests.sh` verification after evaluation is complete
 
 Your job is to make the broader verification likely to pass without running it yourself.
 
package/assets/agents/slopmachine-claude.md
CHANGED
@@ -51,7 +51,8 @@ Planned human-stop moments do not exist.
 
 Claude-capacity rule:
 
-- if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over implementation work yourself
+- if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over core product implementation work yourself
+- small owner-side non-core fixes are still allowed while waiting, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn
 - preserve the current developer session record, mark it blocked by rate limit, and automatically wait until the reset time specified by Claude using the packaged wait helper before resuming the same session
 - only surface this as a user-visible blocker if the reset time cannot be determined or the wait or resume path itself fails
 
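The wait-or-surface decision in the Claude-capacity rule can be sketched as below. The interface of the packaged wait helper is not shown in this diff, so `seconds_until_reset` is a hypothetical stand-in that only illustrates the branching: wait when the reset time is known, report a blocker when it is not.

```python
from datetime import datetime
from typing import Optional


def seconds_until_reset(reset_at: Optional[datetime], now: datetime) -> Optional[float]:
    """Hypothetical sketch: return the wait in seconds, or None when the reset time is unknown."""
    if reset_at is None:
        # Unknown reset time: per the rule, this is the case to surface as a blocker.
        return None
    # A reset time already in the past means the session can resume immediately.
    return max(0.0, (reset_at - now).total_seconds())
```

A caller would sleep for the returned number of seconds and then resume the same tracked session, treating `None` as the user-visible blocker case.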
@@ -65,7 +66,9 @@ Claude-capacity rule:
 
 ## Prime Directive
 
-Manage the work. Do not become the developer.
+Manage the work. Do not become the developer for core product implementation.
+
+You may still directly patch small non-core owner-side issues when that is the fastest correct way to keep the workflow moving, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn.
 
 You own:
 
@@ -137,7 +140,7 @@ Do not create another competing workflow-state system.
 Use git to preserve meaningful workflow checkpoints.
 
 - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
-- meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted
+- meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted `P5` stabilization work when major fixes are truly needed, accepted evaluation-fix rounds, and other clearly reviewable milestones
 - keep the git flow simple and checkpoint-oriented
 - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
 - keep commit messages descriptive and easy to reason about later
@@ -187,7 +190,7 @@ Phase rules:
 - exactly one root phase should normally be active at a time
 - enter the phase before real work for that phase begins
 - do not close multiple root phases in one transition block
-- `P5 Integrated Verification and Hardening`
+- `P5 Integrated Verification and Hardening` should normally be one fast stabilization pass; only major brokenness should trigger a bounded Claude developer reroute before returning to evaluation readiness
 - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 
 ## Developer Session Model
@@ -202,14 +205,19 @@ Maintain exactly one active developer session at a time.
 - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
 - if adopted or resumed work needs Claude developer execution but no recoverable tracked Claude session exists yet, determine the correct lane for the current boundary, launch and orient that lane through `claude-worker-management`, persist the returned session id, and only then continue the substantive work
 - when `P7` begins, do not automatically switch away from `develop-N`
--
-
-
-
--
--
+- `P7` uses exactly 2 audit sessions
+- each audit session starts from one fresh evaluator session and stays in that same evaluator session through fail regenerations and later fix checks
+- the final coverage/README audit then uses one additional fresh evaluator session and stays in that same session through its reruns, so the whole `P7` flow uses exactly 3 evaluator sessions total
+- after any kept audit report is saved, reread it and reject it if it hints at prior runs or if it has degraded materially from the original evaluation prompt's required depth, structure, sections, tables, verdict blocks, or evidence style
+- each audit result decides the remediation lane:
+- `fail` -> route the exact issue list back to the most recent recoverable Claude developer lane, discard the fail working report, fix the issues there, and then regenerate inside the same evaluator session
+- `partial pass` -> keep `audit_report-<N>.md`, start `bugfix-N`, and keep its fix loop scoped to that audit report's issue list
+- `pass` -> keep `audit_report-<N>.md`, start `bugfix-N` only for that report's recommended improvements, and if there are no actionable recommendations mark the audit session complete without inventing new issues
+- require both audit sessions to complete before the final post-audit coverage/README audit can run
+- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, reread each generated report and reject prior-run wording such as `previously` or `remaining` when it refers to report history, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
 - track the active evaluator session separately in metadata during `P7`
 - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
+- once `P7` starts, keep looping inside `P7` until its exit criteria are actually satisfied; do not stop between audits, remediation turns, fix-check passes, or coverage/README reruns
 
 ## Parallelism Policy
 
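The three-way audit routing introduced in this hunk (`fail`, `partial pass`, `pass`) reads like a small decision table. The sketch below is one way to picture it. The function name, the dictionary keys, and the `"developer"` lane label are illustrative assumptions, not identifiers from the package.

```python
def route_audit_result(verdict: str, audit_index: int, recommendations: int = 0) -> dict:
    """Hypothetical sketch of the P7 remediation routing described in the hunk above."""
    report = f"audit_report-{audit_index}.md"
    bugfix_lane = f"bugfix-{audit_index}"

    if verdict == "fail":
        # The fail working report is discarded; fixes go to the developer lane,
        # then the audit is regenerated inside the same evaluator session.
        return {"keep_report": None, "lane": "developer", "next": "regenerate"}
    if verdict == "partial pass":
        return {"keep_report": report, "lane": bugfix_lane, "next": "fix report issues"}
    if verdict == "pass":
        if recommendations > 0:
            return {"keep_report": report, "lane": bugfix_lane, "next": "apply recommendations"}
        # No actionable recommendations: close the audit session without inventing issues.
        return {"keep_report": report, "lane": None, "next": "audit session complete"}
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Only the `fail` branch drops the report, which matches the text's distinction between a fail working report and a kept `audit_report-<N>.md`.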
@@ -248,35 +256,32 @@ When the first develop developer session begins in `P2`, start it in this exact
 1. launch the live `develop-1` Claude `developer` lane
 2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for design direction
 3. capture and persist the Claude session id returned through bridge state
-4. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require
-5. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and
-6. send the accepted design plus a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require
+4. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require `../docs/design.md` and, when backend/fullstack APIs exist, `../docs/api-spec.md`, and say explicitly not to start execution planning yet
+5. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until the design is accepted
+6. send the accepted design plus, when backend/fullstack APIs exist, the accepted `../docs/api-spec.md`, with a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require `plan.md` plus an updated parent-root `../docs/test-coverage.md`, and say explicitly not to start implementation yet
 7. in that Phase 2 request, require the lane map to be derived from the directory tree and owned-file boundaries, require as many bounded branches or worktrees or helper-agent lanes as safely possible, target at least 5 lanes when the codebase clearly supports it, require preplanned shared-file overlap and merge checkpoints, require exact serial-only justifications, require a dedicated git worktree plus explicit branch name for every planned parallel lane, and identify which named safe lanes must actually launch during implementation unless a blocker forces a reviewed revision
-8. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and
+8. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until `plan.md` is accepted
 9. only after both planning phases are accepted may the broad `plan.md` development run begin
 
 Do not reorder that sequence.
 Do not ask for Phase 1 and Phase 2 in the same turn.
 Do not create fresh Claude lanes or fresh Claude sessions for ordinary follow-up turns inside the same developer session.
-After planning is accepted, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. After that scaffold step is stable, it should establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the same lane to create the planned git worktrees and spawn all planned internal branches or helper agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
+After planning is accepted, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. Require the developer session to set up those files honestly but not run Docker or `./run_tests.sh`. After that scaffold step is stable, it should establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the same lane to create the planned git worktrees and spawn all planned internal branches or helper agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
 During `P1`, choose `CLAUDE.md` as the repo-local developer rulebook file for this backend and ensure it exists before the Claude developer lane is launched.
 If `repo/CLAUDE.md` is missing, restore it directly from `~/slopmachine/templates/CLAUDE.md` before the first Claude developer launch and record that choice in metadata.
 
 ## Verification Budget
 
-
+Docker and `./run_tests.sh` are deferred until after `P7`.
 
 Target budget for the whole workflow:
 
--
+- one owner-side Docker submission-readiness check after `P7`, with immediate reruns there only if Docker config or wrapper fixes are needed
 
 Selected-stack rule:
 
 - follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
--
-- for Electron or other Linux-targetable desktop projects, the broad path includes required `docker compose up --build` plus a Dockerized desktop build/test flow and headless UI/runtime verification
-- for Android projects, the broad path includes required `docker compose up --build` plus a Dockerized Android build/test flow without an emulator
-- for iOS-targeted projects on Linux, the broad path includes required `docker compose up --build` plus `./run_tests.sh` and static/code review evidence; do not assume native iOS runtime proof exists without a real macOS/Xcode checkpoint
+- do not run Docker-based broad verification before `P9`; use static review, local non-Docker evidence, and evaluator loops instead
 
 Every project must end up with:
 
@@ -295,17 +300,18 @@ Broad test command rule:
 - do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
 - `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
 - if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
+- design the deferred runtime and broad-test paths for first-real-run reliability: no manual exports, no hidden prep steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
 
 Default moments:
 
-1. development complete -> direct fused `P5` entry
-2.
+1. development complete -> direct fused `P5` entry for repo coherence only
+2. after `P7` completes -> owner-side Docker submission-readiness check in `P9`
 
-For
+For all project types, enforce this cadence:
 
-- do not run Docker during
--
-- in
+- do not run Docker during planning, development, `P5`, or `P7`
+- do not ask the developer session to run Docker or `./run_tests.sh` under any circumstances before `P9`
+- after `P7` completes, the owner may run the documented Docker/runtime path and `./run_tests.sh` in `P9`, fix Docker config directly if needed, and rerun there before packaging closes
 
 Docker timeout rule:
 
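The cadence added in this hunk amounts to a phase gate on Docker execution. The sketch below encodes one reading of it: only the owner may run Docker or `./run_tests.sh`, and only once the workflow has reached `P9`. The function name `may_run_docker` and the `"owner"`/`"developer"` actor labels are assumptions made for illustration.

```python
import re


def may_run_docker(phase: str, actor: str) -> bool:
    """Hypothetical gate: Docker-based verification is owner-only and deferred to P9."""
    match = re.fullmatch(r"P(\d+)", phase)
    if match is None:
        raise ValueError(f"unrecognized phase label: {phase!r}")
    # Planning, development, P5, and P7 all come before P9, so they are rejected here.
    return actor == "owner" and int(match.group(1)) >= 9
```

For example, `may_run_docker("P7", "owner")` is false, while `may_run_docker("P9", "owner")` is true.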
@@ -319,7 +325,7 @@ Between those moments, rely on:
 - targeted unit tests
 - targeted integration tests
 - targeted module or route-family reruns
-- targeted local non-E2E UI-adjacent checks when UI is material
+- targeted local non-E2E UI-adjacent checks when UI is material
 
 If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.
 
@@ -367,10 +373,12 @@ When talking to the Claude developer worker:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before final release-readiness review closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
--
+- do not tell the Claude developer worker to run Docker-based runtime/test commands; the owner handles that only after `P7`
 - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - use the canonical prompt-shape discipline from `claude-worker-management`, but keep the actual message natural and low-noise: do not send labeled sections like `Context snapshot` or `This turn only`, and do not mention turns, workflow state, or prompt-contract jargon in the message itself
-- for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, exact stop boundary if that scaffold step is isolated, and exact evidence required
+- for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, explicit no-Docker execution before `P9`, exact stop boundary if that scaffold step is isolated, and exact evidence required
+- for development-completion review and the opening pass of fused `P5`, collect findings across the whole review sweep and send one consolidated fix request unless a hard blocker stops further checking
+- treat fused `P5` as a fast handoff phase: if rough repo-coherence review passes, proceed to evaluation instead of asking for more `P5` cleanup
 - default to one bounded engineering objective per Claude turn, except for the intentional broad `plan.md` execution run after planning acceptance where the worker is expected to complete the whole implementation checklist end to end
 - reject broad development responses that silently collapse named parallel helper lanes into serial work without an exact blocker and revised lane map
 - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
@@ -411,11 +419,14 @@ To the developer, this should feel like a normal engineering conversation with a
 
 - review before acceptance
 - prefer one strong correction request over many tiny nudges
+- when several issues are found in one review sweep, batch them into one correction request grouped by failure class or surface instead of drip-feeding one issue at a time
+- for small non-core fixes such as README cleanup, docs sync, test config, Docker config, wrapper/config glue, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
+- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md`, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
 - keep work moving without low-information continuation chatter
 - read only what is needed to answer the current decision
 - keep routine review inside the main owner session; do not use `Explore` or `General` subagents to verify Claude developer work
 - clarification and evaluation may still use their dedicated subagent flows, but owner verification of Claude developer work stays in the main session
-- at planning, scaffold-step review inside development, the opening review inside fused `P5`,
+- at planning, scaffold-step review inside development, the opening review inside fused `P5`, any rare major `P5` reroute, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
 - keep comments and metadata auditable and specific
 - keep external docs owner-maintained and repo-local README developer-maintained
 
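The batching rule added in this hunk (one correction request, grouped by failure class or surface) can be illustrated with a small grouping routine. This is a sketch only: the `(failure_class, detail)` tuple shape and the name `build_correction_request` are assumptions, and the real owner prompt is written as natural prose rather than generated text.

```python
from collections import defaultdict


def build_correction_request(findings: list[tuple[str, str]]) -> str:
    """Hypothetical sketch: fold one review sweep's findings into a single grouped request."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for failure_class, detail in findings:
        grouped[failure_class].append(detail)

    lines = []
    # Dicts preserve insertion order, so classes appear in the order first encountered.
    for failure_class, details in grouped.items():
        lines.append(f"{failure_class}:")
        lines.extend(f"- {detail}" for detail in details)
    return "\n".join(lines)
```

Grouping before sending keeps related issues together, which is the point of preferring one strong correction request over many small nudges.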
@@ -433,6 +444,8 @@ To the developer, this should feel like a normal engineering conversation with a
 - treat bridge-managed Claude lanes as owner-controlled and do not manually type into them during ordinary workflow operation
 - at every gate exit, require the result to be checked against the relevant accepted plan sections and an explicit current-boundary checklist before accepting it
 - be especially strict before leaving planning and before leaving development: require explicit section coverage, concrete evidence, and no known prompt-critical gap hidden behind future work
+- in `P5`, prefer fast rough release-alignment over perfectionism; reserve evaluation for the stricter final check
+- prefer moving into evaluation from `P5` once the repo is coherent enough by static review and reported evidence; Docker execution is deferred until `P9`
 - before every substantive Claude turn, review the last normalized result, decide whether the next turn is a correction, continuation, resume, or new bounded objective, and compose the prompt accordingly rather than sending vague nudges
 
 ## Claude Live Bridge Discipline
@@ -442,7 +455,7 @@ All Claude developer lane launch and turn actions should go through the packaged
 Evaluation-prompt rule:
 
 - backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
-- the test-coverage prompt must be sent verbatim with no additions or
+- the test-coverage prompt must be read from the file and sent verbatim with no additions, reductions, trimming, paraphrasing, or partial pasting
 
 Operation map:
 
@@ -59,7 +59,9 @@ Planned human-stop moments do not exist.
|
|
|
59
59
|
|
|
60
60
|
## Prime Directive
|
|
61
61
|
|
|
62
|
-
Manage the work. Do not become the developer.
|
|
62
|
+
Manage the work. Do not become the developer for core product implementation.
|
|
63
|
+
|
|
64
|
+
You may still directly patch small non-core owner-side issues when that is the fastest correct way to keep the workflow moving, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn.
|
|
63
65
|
|
|
64
66
|
You own:
|
|
65
67
|
|
|
@@ -133,7 +135,7 @@ Do not create another competing workflow-state system.
|
|
|
133
135
|
Use git to preserve meaningful workflow checkpoints.
|
|
134
136
|
|
|
135
137
|
- after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
|
|
136
|
-
- meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted
|
|
138
|
+
- meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted `P5` stabilization work when major fixes are truly needed, accepted evaluation-fix rounds, and other clearly reviewable milestones
|
|
137
139
|
- keep the git flow simple and checkpoint-oriented
|
|
138
140
|
- commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
|
|
139
141
|
- keep commit messages descriptive and easy to reason about later
|
|
@@ -183,7 +185,7 @@ Phase rules:
|
|
|
183
185
|
- exactly one root phase should normally be active at a time
|
|
184
186
|
- enter the phase before real work for that phase begins
|
|
185
187
|
- do not close multiple root phases in one transition block
|
|
186
|
-
- `P5 Integrated Verification and Hardening`
|
|
188
|
+
- `P5 Integrated Verification and Hardening` should normally be one fast stabilization pass; only major brokenness should trigger a bounded developer reroute before returning to evaluation readiness
|
|
187
189
|
- `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
|
|
188
190
|
- post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Readiness Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
|
|
189
191
|
|
|
@@ -195,13 +197,18 @@ Maintain exactly one active developer session at a time.
|
|
|
195
197
|
- from `P2` through `P5`, default to one long-lived `develop-1` developer lane
|
|
196
198
|
- do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
|
|
197
199
|
- when `P7` begins, do not automatically switch away from `develop-N`
|
|
198
|
-
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
-
|
|
203
|
-
-
|
|
200
|
+
- `P7` uses exactly 2 audit sessions
|
|
201
|
+
- each audit session starts from one fresh evaluator session and stays in that same evaluator session through fail regenerations and later fix checks
|
|
202
|
+
- the final coverage/README audit then uses one additional fresh evaluator session and stays in that same session through its reruns, so the whole `P7` flow uses exactly 3 evaluator sessions total
|
|
203
|
+
- after any kept audit report is saved, reread it and reject it if it hints at prior runs or if it has degraded materially from the original evaluation prompt's required depth, structure, sections, tables, verdict blocks, or evidence style
|
|
204
|
+
- each audit result decides the remediation lane:
|
|
205
|
+
- `fail` -> route the exact issue list back to the most recent recoverable developer lane, discard the fail working report, fix the issues there, and then regenerate inside the same evaluator session
|
|
206
|
+
- `partial pass` -> keep `audit_report-<N>.md`, start `bugfix-N`, and keep its fix loop scoped to that audit report's issue list
|
|
207
|
+
- `pass` -> keep `audit_report-<N>.md`, start `bugfix-N` only for that report's recommended improvements, and if there are no actionable recommendations mark the audit session complete without inventing new issues
|
|
208
|
+
- require both audit sessions to complete before the final post-audit coverage/README audit can run
|
|
209
|
+
- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, reread each generated report and reject prior-run wording such as `previously` or `remaining` when it refers to report history, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
|
|
204
210
|
- track the active evaluator session separately in metadata during `P7`
|
|
211
|
+
- once `P7` starts, keep looping inside `P7` until its exit criteria are actually satisfied; do not stop between audits, remediation turns, fix-check passes, or coverage/README reruns
|
|
205
212
|
|
|
206
213
|
## Parallelism Policy
|
|
207
214
|
|
|
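The `P7` remediation routing added in the hunk above (`fail`, `partial pass`, `pass`) can be sketched as a small decision function. This is only an illustrative sketch, not code from the package: the `Remediation` type, the `route_audit_result` function, and the `"developer"` lane label are hypothetical names, while the `audit_report-<N>.md` and `bugfix-N` names come from the rules themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Remediation:
    """Hypothetical record of what the owner does after one audit result."""
    keep_report: bool            # whether audit_report-<N>.md is kept
    report_name: Optional[str]   # kept report file name, if any
    lane: Optional[str]          # where fixes are routed, None if nothing to fix
    scope: str                   # what the fix loop is allowed to touch


def route_audit_result(verdict: str, audit_number: int,
                       recommendations: List[str]) -> Remediation:
    """Map an audit verdict to the remediation lane described in the rules."""
    report = f"audit_report-{audit_number}.md"
    if verdict == "fail":
        # Fail working report is discarded; fixes go to the most recent
        # recoverable developer lane, then the same evaluator session regenerates.
        return Remediation(False, None, "developer", "exact issue list")
    if verdict == "partial pass":
        return Remediation(True, report, f"bugfix-{audit_number}",
                           "that audit report's issue list")
    if verdict == "pass":
        if recommendations:
            return Remediation(True, report, f"bugfix-{audit_number}",
                               "recommended improvements only")
        # No actionable recommendations: the audit session is complete.
        return Remediation(True, report, None, "none")
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Keeping the `pass` branch explicit about the empty-recommendation case mirrors the rule that the owner should not invent new issues when the report has none.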
@@ -235,11 +242,11 @@ When the first develop developer session begins in `P2`, use this planning seque
|
|
|
235
242
|
|
|
236
243
|
1. send the original prompt and tell the developer to read it carefully, not plan yet, and wait for design direction
|
|
237
244
|
2. wait for the developer's first reply
|
|
238
|
-
3. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require
|
|
239
|
-
4. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and
|
|
240
|
-
5. send the accepted design plus a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require
|
|
245
|
+
3. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require `../docs/design.md` and, when backend/fullstack APIs exist, `../docs/api-spec.md`, and say explicitly not to start execution planning yet
|
|
246
|
+
4. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until the design is accepted
|
|
247
|
+
5. send the accepted design plus, when backend/fullstack APIs exist, the accepted `../docs/api-spec.md`, with a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require `plan.md` plus an updated parent-root `../docs/test-coverage.md`, and say explicitly not to start implementation yet
|
|
241
248
|
6. in that Phase 2 request, require the lane map to be derived from the directory tree and owned-file boundaries, require as many bounded branches or worktrees or agent lanes as safely possible, target at least 5 lanes when the codebase clearly supports it, require preplanned shared-file overlap and merge checkpoints, require exact serial-only justifications, require a dedicated git worktree plus explicit branch name for every planned parallel lane, and identify which named safe lanes must actually launch during implementation unless a blocker forces a reviewed revision
|
|
242
|
-
7. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and
|
|
249
|
+
7. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until `plan.md` is accepted
|
|
243
250
|
8. only after both planning phases are accepted may the broad `plan.md` development run begin
|
|
244
251
|
|
|
245
252
|
Do not ask for Phase 1 and Phase 2 in the same turn.
|
|
@@ -248,32 +255,28 @@ Do not ask for a plan in the first message.
|
|
|
248
255
|
After planning is accepted:
|
|
249
256
|
|
|
250
257
|
- the default development request should be the broad `plan.md` execution run rather than many narrow feature follow-up prompts
|
|
251
|
-
- tell the developer to work through `plan.md` end to end, keep `plan.md` updated from the main lane as items complete, verify honestly, and return only when the whole implementation plan is done or a real blocker prevents continuation
|
|
252
|
-
- in that default request, first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. After that scaffold step is stable, establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the developer to create the planned git worktrees and spawn all planned parallel agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in plus integrated verification in the main developer session
|
|
258
|
+
- tell the developer to work through `plan.md` end to end, keep `plan.md` updated from the main lane as items complete, verify honestly through non-Docker means, and return only when the whole implementation plan is done or a real blocker prevents continuation
|
|
259
|
+
- in that default request, first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. Require the developer to set up those files honestly but not run Docker or `./run_tests.sh`. After that scaffold step is stable, establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the developer to create the planned git worktrees and spawn all planned parallel agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in plus integrated verification in the main developer session
|
|
253
260
|
- if development is interrupted before completion, resume by directing the developer to continue from the current state of `plan.md`
|
|
254
261
|
|
|
255
262
|
## Verification Budget
|
|
256
263
|
|
|
257
|
-
|
|
264
|
+
Docker and `./run_tests.sh` are deferred until after `P7`.
|
|
258
265
|
|
|
259
266
|
Owner-side discipline:
|
|
260
267
|
|
|
261
|
-
-
|
|
268
|
+
- one owner-side Docker submission-readiness check after `P7`, with immediate reruns there only if Docker config or wrapper fixes are needed
|
|
262
269
|
|
|
263
|
-
- do not run `./run_tests.sh`
|
|
264
|
-
- do not run `docker compose up --build` casually
|
|
270
|
+
- do not run `./run_tests.sh` or `docker compose up --build` anywhere from planning through the end of `P7`
|
|
265
271
|
- do not rerun expensive local test or E2E commands just because the developer already ran them
|
|
266
272
|
- when the developer reports the exact verification command and its result clearly, use that evidence unless there is a concrete reason to challenge it
|
|
267
|
-
- rerun expensive verification only when the developer evidence is weak, contradictory, flaky, high-risk, needed
|
|
273
|
+
- rerun expensive non-Docker verification only when the developer evidence is weak, contradictory, flaky, high-risk, needed to answer a new question, or needed for a static owner decision
|
|
268
274
|
- use the required lifecycle/activity skills and `verification-gates` for stack-specific runtime and broad-gate cadence details
|
|
269
275
|
|
|
270
276
|
Selected-stack rule:
|
|
271
277
|
|
|
272
278
|
- follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
|
|
273
|
-
-
|
|
274
|
-
- for Electron or other Linux-targetable desktop projects, the broad path includes required `docker compose up --build` plus a Dockerized desktop build/test flow and headless UI/runtime verification
|
|
275
|
-
- for Android projects, the broad path includes required `docker compose up --build` plus a Dockerized Android build/test flow without an emulator
|
|
276
|
-
- for iOS-targeted projects on Linux, the broad path includes required `docker compose up --build` plus `./run_tests.sh` and static/code review evidence; do not assume native iOS runtime proof exists without a real macOS/Xcode checkpoint
|
|
279
|
+
- do not run Docker-based broad verification before `P9`; use static review, local non-Docker evidence, and evaluator loops instead
|
|
277
280
|
|
|
278
281
|
Every project must end up with:
|
|
279
282
|
|
|
@@ -292,31 +295,32 @@ Broad test command rule:
|
|
|
292
295
|
- do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
|
|
293
296
|
- `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
|
|
294
297
|
- if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
|
|
298
|
+
- design the deferred runtime and broad-test paths for first-real-run reliability: no manual exports, no hidden prep steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
|
|
295
299
|
|
|
296
300
|
Default moments:
|
|
297
301
|
|
|
298
|
-
1. development complete -> direct fused `P5` entry
|
|
299
|
-
2.
|
|
302
|
+
1. development complete -> direct fused `P5` entry for repo coherence only
|
|
303
|
+
2. after `P7` completes -> owner-side Docker submission-readiness check in `P9`
|
|
300
304
|
|
|
301
|
-
For
|
|
305
|
+
For all project types, enforce this cadence:
|
|
302
306
|
|
|
303
|
-
- do not run Docker during
|
|
304
|
-
- the
|
|
305
|
-
- in
|
|
307
|
+
- do not run Docker during planning, development, `P5`, or `P7`
|
|
308
|
+
- do not ask the developer to run Docker or `./run_tests.sh` under any circumstances before `P9`
|
|
309
|
+
- after `P7` completes, the owner may run the documented Docker/runtime path and `./run_tests.sh` in `P9`, fix Docker config directly if needed, and rerun there before packaging closes
|
|
306
310
|
|
|
307
311
|
Docker timeout rule:
|
|
308
312
|
|
|
309
|
-
- whenever the owner runs a Docker-based runtime or broad-test command
|
|
313
|
+
- whenever the owner finally runs a Docker-based runtime or broad-test command after `P7`, or a repo-root `./run_tests.sh` that shells out to Docker, invoke it through `node ~/slopmachine/utils/run_with_timeout.mjs --label docker-gate -- <command ...>` instead of running the command directly
|
|
310
314
|
- the helper default is one 30 minute attempt, then one 45 minute retry after 30 seconds of backoff; do not let any single Docker attempt exceed 60 minutes
|
|
311
315
|
- when invoking that helper through the OpenCode Bash tool, set the outer Bash timeout high enough to cover the helper retry budget plus cleanup buffer instead of using a short default
|
|
312
316
|
|
|
313
|
-
|
|
317
|
+
Before that `P9` submission-readiness check, rely on:
|
|
314
318
|
|
|
315
319
|
- local runtime checks
|
|
316
320
|
- targeted unit tests
|
|
317
321
|
- targeted integration tests
|
|
318
322
|
- targeted module or route-family reruns
|
|
319
|
-
- targeted local non-E2E UI-adjacent checks when UI is material
|
|
323
|
+
- targeted local non-E2E UI-adjacent checks when UI is material
|
|
320
324
|
|
|
321
325
|
The `P7` audit-and-bugfix model is separate from the ordinary owner-run broad-verification budget above.
|
|
322
326
|
Do not count the required fresh evaluator sessions or scoped bugfix-fix-check loops inside `P7` as ordinary broad owner-run verification moments.
|
|
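The Docker timeout rule in the hunk above describes a retry schedule: one 30 minute attempt, then one 45 minute retry after 30 seconds of backoff, with no single attempt above 60 minutes. The sketch below illustrates that schedule in Python. It is not the packaged `run_with_timeout.mjs` helper; the function names, the 300 second cleanup buffer, the retry-only-on-timeout behaviour, and the `124` return code are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Sequence

ATTEMPT_TIMEOUTS_S = (30 * 60, 45 * 60)  # first attempt, then one retry
BACKOFF_S = 30                           # pause between attempts
MAX_SINGLE_ATTEMPT_S = 60 * 60           # hard cap for any one attempt


def outer_timeout_budget(cleanup_buffer_s: int = 300) -> int:
    """Seconds an outer shell timeout needs to cover the full retry budget.

    The cleanup buffer default is an assumed value, not one from the source.
    """
    backoffs = BACKOFF_S * (len(ATTEMPT_TIMEOUTS_S) - 1)
    return sum(ATTEMPT_TIMEOUTS_S) + backoffs + cleanup_buffer_s


def run_with_retry(command: Sequence[str],
                   timeouts: Sequence[int] = ATTEMPT_TIMEOUTS_S,
                   backoff_s: int = BACKOFF_S) -> int:
    """Run a command under the schedule; retry only when an attempt times out."""
    if any(t > MAX_SINGLE_ATTEMPT_S for t in timeouts):
        raise ValueError("a single attempt may not exceed 60 minutes")
    for index, timeout_s in enumerate(timeouts):
        if index > 0:
            time.sleep(backoff_s)
        try:
            completed = subprocess.run(list(command), timeout=timeout_s)
            return completed.returncode  # finished in time: report its exit code
        except subprocess.TimeoutExpired:
            continue                     # timed out: fall through to the retry
    return 124  # assumed convention for "every attempt timed out"
```

With the defaults, `outer_timeout_budget()` covers both attempts, the backoff, and the buffer, which is the quantity the rule asks the outer Bash timeout to exceed.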
@@ -364,10 +368,10 @@ When talking to the developer:
|
|
|
364
368
|
- when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
|
|
365
369
|
- when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
|
|
366
370
|
- during ordinary development you may allow fast local iteration, but before final release-readiness review closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
|
|
367
|
-
-
|
|
371
|
+
- do not tell the developer to run Docker-based runtime/test commands; the owner handles that only after `P7`
|
|
368
372
|
- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
|
|
369
373
|
- do not re-dump the entire design, but do point the developer back to `plan.md` as the working checklist and add only the narrow delta, guardrail, or review concern that matters now
|
|
370
|
-
- for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, exact stop boundary if that scaffold step is isolated, and exact evidence required
|
|
374
|
+
- for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, explicit no-Docker execution before `P9`, exact stop boundary if that scaffold step is isolated, and exact evidence required
|
|
371
375
|
- after planning is accepted, the default development ask is the broad `plan.md` execution run rather than many narrow follow-up prompts
|
|
372
376
|
- for planning turns, explicitly say that the developer must plan for parallelization up front, derive the lane map from the directory tree and owned-file boundaries, maximize the safe lane count, target at least 5 lanes when the codebase supports it, and justify any serial-only major section concretely
|
|
373
377
|
- in that first broad `plan.md` execution turn, explicitly tell the developer to spawn the planned parallel agents or branches or worktrees for the named `plan.md` sections, with named branch contracts and main-session fan-in requirements
|
|
@@ -377,6 +381,8 @@ When talking to the developer:
|
|
|
377
381
|
- do not describe the interaction as a workflow handoff, session restart, or workflow-state transition
|
|
378
382
|
- express boundaries as plain engineering instructions such as `plan this but do not start implementation yet` rather than workflow labels like `planning only` or `stop after the baseline`
|
|
379
383
|
- for `P5` opening review, release-readiness follow-up fixes, or other bounded correction requests, require compact replies by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues
|
|
384
|
+
- for development-completion review and the opening pass of `P5`, collect findings across the whole review sweep and send one consolidated fix request unless a hard blocker stops further checking
|
|
385
|
+
- treat `P5` as a fast handoff phase: if rough repo-coherence review passes, proceed to evaluation instead of asking for more `P5` cleanup
|
|
380
386
|
- for each in-development correction or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
|
|
381
387
|
- require the developer to point to the exact changed files and the narrow supporting files worth review
|
|
382
388
|
- require the developer to self-check prompt-fit, consistency, and likely review defects before claiming readiness
|
|
@@ -398,10 +404,13 @@ Do not speak as a relay for a third party.
|
|
|
398
404
|
|
|
399
405
|
- review before acceptance
|
|
400
406
|
- prefer one strong correction request over many tiny nudges
|
|
407
|
+
- when several issues are found in one review sweep, batch them into one correction request grouped by failure class or surface instead of drip-feeding one issue at a time
|
|
408
|
+
- for small non-core fixes such as README cleanup, docs sync, test config, Docker config, wrapper/config glue, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the developer
|
|
409
|
+
- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md`, fix them directly in the owner session instead of bouncing them back to the developer
|
|
401
410
|
- keep work moving without low-information continuation chatter
|
|
402
411
|
- read only what is needed to answer the current decision
|
|
403
412
|
- after planning is accepted, prefer plan-section references plus explicit gate checklists over repeated prompt dumps
|
|
404
|
-
- at planning, scaffold-step review inside development, the opening review inside `P5`,
|
|
413
|
+
- at planning, scaffold-step review inside development, the opening review inside `P5`, any rare major `P5` reroute, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
|
|
405
414
|
- reject development responses that silently collapse named parallel lanes into serial work without an exact blocker and revised lane map
|
|
406
415
|
- keep comments and metadata auditable and specific
|
|
407
416
|
- keep external docs owner-maintained under parent-root `../docs/` as reference copies, keep `README.md` as the primary repo-local documentation file, and allow `plan.md` as the explicit execution-plan exception
|
|
@@ -423,7 +432,8 @@ Be a strict reviewer.
|
|
|
423
432
|
- developer claims are never enough by themselves
|
|
424
433
|
- do not progress because the developer sounds confident
|
|
425
434
|
- reject weak evidence, decorative verification, and half-finished surfaces quickly
|
|
426
|
-
- require
|
|
435
|
+
- require enough runtime, test, and UI confidence for the current gate, but do not turn `P5` into a perfection loop over small documentation or configuration defects
|
|
436
|
+
- prefer moving into evaluation from `P5` once the repo is coherent enough by static review and reported evidence; Docker execution is deferred until `P9`
|
|
427
437
|
- be especially strict before leaving planning and before leaving development: those exits require explicit checklist coverage against the accepted plan plus concrete supporting evidence
|
|
428
438
|
- keep review messages direct, technical, and specific
|
|
429
439
|
|
|
@@ -447,7 +457,7 @@ Treat packaging as a first-class delivery contract from the start, not as late c
|
|
|
447
457
|
- the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
|
|
448
458
|
- the packaged source copies of those prompts live under `assets/slopmachine/`, and the installed runtime copies live under `~/slopmachine/`; ordinary evaluation runs should use the installed runtime copies
|
|
449
459
|
- backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
|
|
450
|
-
- the test-coverage prompt must be sent verbatim with no additions or
|
|
460
|
+
- the test-coverage prompt must be read from the file and sent verbatim with no additions, reductions, trimming, paraphrasing, or partial pasting
|
|
451
461
|
- load `submission-packaging` before any packaging action
|
|
452
462
|
- follow its exact artifact, export, cleanup, and output contract
|
|
453
463
|
- do not invent extra artifact structures during ordinary packaging
|
|
@@ -133,9 +133,9 @@ During ordinary work, prefer:
|
|
|
133
133
|
|
|
134
134
|
- fast local tooling setup is allowed during ordinary iteration, but it must not become a dependency of the final delivered runtime or broad test contract
|
|
135
135
|
|
|
136
|
-
Do not run broad Docker, `./run_tests.sh`, browser E2E, Playwright, or full-suite commands during
|
|
136
|
+
Do not run broad Docker, `./run_tests.sh`, browser E2E, Playwright, or full-suite commands during work from planning through the end of `P7`.
|
|
137
137
|
|
|
138
|
-
|
|
138
|
+
Do not run `docker compose up --build`, `./run_tests.sh`, or any other Docker-based runtime/test command under any circumstances before `P9`, even if the repo documents it, the plan implies it, or the owner explicitly asks; the owner handles final Docker and `./run_tests.sh` verification after evaluation is complete.
|
|
139
139
|
|
|
140
140
|
Selected-stack defaults:
|
|
141
141
|
|
|
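The rule added in the hunk above defers all Docker-based runtime and test commands until `P9`, where only the owner runs them. A minimal guard for that rule might look like the sketch below. The function names and the `"owner"` / `"developer"` actor labels are hypothetical, and the sketch assumes phases are named `P1` through `P10` and ordered by their number.

```python
def phase_number(phase: str) -> int:
    """Parse a phase label such as 'P7' into its number (assumed naming scheme)."""
    if not phase.startswith("P") or not phase[1:].isdigit():
        raise ValueError(f"unrecognized phase label: {phase!r}")
    return int(phase[1:])


def docker_allowed(phase: str, actor: str) -> bool:
    """Return True only for the owner at or after the P9 submission-readiness check."""
    return actor == "owner" and phase_number(phase) >= 9


# Usage: a developer session never qualifies, and the owner must wait for P9.
for phase, actor in [("P5", "owner"), ("P7", "developer"), ("P9", "owner")]:
    print(phase, actor, docker_allowed(phase, actor))
```

Treating phases after `P9` as allowed for the owner is an assumption; the source only states that nothing Docker-based runs before `P9`.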
@@ -50,6 +50,7 @@ It must not:
|
|
|
50
50
|
- It should preserve prompt faithfulness.
|
|
51
51
|
- Each entry should end with a decisive solution.
|
|
52
52
|
- It should not contain planning, implementation structure, or convenience narrowing.
|
|
53
|
+
- It should survive an explicit anti-degradation read against the original prompt: no dropped implied requirements, no softened enforcement, no collapsed workflows, and no operator/admin narrowing for convenience.
|
|
53
54
|
|
|
54
55
|
4. Correct weak clarification before planning.
|
|
55
56
|
- If `questions.md` misses major ambiguity, contains filler, or drifts from the prompt, correct it before moving on.
|
|
@@ -65,6 +66,7 @@ It must not:
|
|
|
65
66
|
Reject the clarification result if any of the following is true:
|
|
66
67
|
- material ambiguity is still unresolved
|
|
67
68
|
- the clarifier narrowed scope for convenience
|
|
69
|
+
- the clarification record degrades the original prompt by reducing implied scope, enforcement, actor responsibilities, workflow closure, or reporting/operational expectations
|
|
68
70
|
- the questions are mostly about stack, tooling, Docker, or testing process
|
|
69
71
|
- entries end in vague or deferred language instead of decisive solutions
|
|
70
72
|
- planning or implementation structure leaked into `questions.md`
|
|
@@ -74,5 +76,6 @@ Reject the clarification result if any of the following is true:
|
|
|
74
76
|
`P1 Clarification` is complete only when:
|
|
75
77
|
- `../docs/questions.md` exists
|
|
76
78
|
- it resolves the material prompt ambiguities with prompt-faithful defaults
|
|
79
|
+
- it has been read once more against the original prompt and does not materially degrade it
|
|
77
80
|
- the owner has extracted the approved clarification package from it
|
|
78
81
|
- planning can begin without inventing missing product meaning
|