theslopmachine 0.9.1 → 0.9.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/MANUAL.md +17 -7
  2. package/README.md +4 -0
  3. package/assets/agents/developer.md +3 -2
  4. package/assets/agents/slopmachine-claude.md +59 -41
  5. package/assets/agents/slopmachine.md +61 -46
  6. package/assets/claude/agents/developer.md +3 -2
  7. package/assets/skills/clarification-gate/SKILL.md +30 -34
  8. package/assets/skills/claude-worker-management/SKILL.md +20 -10
  9. package/assets/skills/developer-session-lifecycle/SKILL.md +13 -12
  10. package/assets/skills/development-guidance/SKILL.md +12 -13
  11. package/assets/skills/evaluation-triage/SKILL.md +45 -44
  12. package/assets/skills/final-evaluation-orchestration/SKILL.md +160 -74
  13. package/assets/skills/integrated-verification/SKILL.md +26 -16
  14. package/assets/skills/planning-gate/SKILL.md +86 -6
  15. package/assets/skills/planning-guidance/SKILL.md +18 -7
  16. package/assets/skills/scaffold-guidance/SKILL.md +17 -12
  17. package/assets/skills/submission-packaging/SKILL.md +33 -14
  18. package/assets/skills/verification-gates/SKILL.md +50 -46
  19. package/assets/slopmachine/clarifier-agent-prompt.md +33 -83
  20. package/assets/slopmachine/owner-verification-checklist.md +58 -27
  21. package/assets/slopmachine/phase-1-design-prompt.md +56 -88
  22. package/assets/slopmachine/phase-1-design-template.md +66 -113
  23. package/assets/slopmachine/phase-2-execution-planning-prompt.md +50 -21
  24. package/assets/slopmachine/phase-2-plan-template.md +47 -30
  25. package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +13 -21
  26. package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +16 -69
  27. package/assets/slopmachine/scaffold-playbooks/android-native-java.md +12 -12
  28. package/assets/slopmachine/scaffold-playbooks/angular-default.md +8 -60
  29. package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +4 -20
  30. package/assets/slopmachine/scaffold-playbooks/backend-family-matrix.md +12 -12
  31. package/assets/slopmachine/scaffold-playbooks/django-default.md +4 -61
  32. package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +15 -58
  33. package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +5 -5
  34. package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +4 -4
  35. package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +4 -41
  36. package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +8 -30
  37. package/assets/slopmachine/scaffold-playbooks/frontend-family-matrix.md +11 -11
  38. package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +8 -8
  39. package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +4 -61
  40. package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +4 -4
  41. package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +1 -1
  42. package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +15 -15
  43. package/assets/slopmachine/scaffold-playbooks/laravel-default.md +8 -81
  44. package/assets/slopmachine/scaffold-playbooks/livewire-default.md +8 -101
  45. package/assets/slopmachine/scaffold-playbooks/platform-family-matrix.md +8 -8
  46. package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +8 -8
  47. package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +7 -89
  48. package/assets/slopmachine/scaffold-playbooks/tauri-default.md +14 -26
  49. package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +8 -30
  50. package/assets/slopmachine/scaffold-playbooks/web-default.md +3 -3
  51. package/assets/slopmachine/templates/AGENTS.md +6 -4
  52. package/assets/slopmachine/templates/CLAUDE.md +5 -4
  53. package/assets/slopmachine/templates/plan.md +47 -30
  54. package/assets/slopmachine/utils/claude_live_common.mjs +69 -1
  55. package/assets/slopmachine/utils/claude_live_launch.mjs +33 -7
  56. package/assets/slopmachine/utils/claude_live_stop.mjs +22 -4
  57. package/assets/slopmachine/utils/claude_worker_common.mjs +2 -0
  58. package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +1 -0
  59. package/assets/slopmachine/utils/prepare_evaluation_prompt.mjs +29 -7
  60. package/package.json +1 -1
  61. package/src/init.js +1 -0
package/MANUAL.md CHANGED
@@ -48,6 +48,7 @@ slopmachine init -o
  ## What `init` does
 
  - creates `.ai/` workflow files plus `.ai/artifacts`
+ - creates hidden `.ai/worktrees/` as the default location for parallel git worktrees
  - initializes git when needed
  - updates `.gitignore`
  - bootstraps beads_rust (`br`)
@@ -59,19 +60,28 @@ slopmachine init -o
  - seeds `.ai/startup-context.md` plus the parent-root planning docs under `docs/`
  - creates the initial git commit so the workspace starts with a clean tree
  - optionally opens `opencode` in `repo/`
+ - parallel worktrees should stay under hidden parent-root `.ai/worktrees/` so the visible workspace root stays clean
 
  ## Rough workflow
 
  1. Intake and setup
  2. Clarification
  3. Planning
- 4. Minimal scaffold
- 5. End-to-end development
- 6. Integrated verification and hardening
- 7. Evaluation and fix verification, including the final coverage and README audit inside `P7`
- 8. Final readiness decision
- 9. Submission packaging
- 10. Retrospective
+ 4. Development, starting with the scaffold step inside `plan.md`
+ 5. Rough integrated verification and hardening: repo coherence and small owner-side fixes only, with no Docker execution
+ 6. Evaluation and fix verification, including the final coverage and README audit inside `P7`
+ 7. Final readiness decision
+ 8. Submission packaging, including the owner-only Docker and `./run_tests.sh` check
+ 9. Retrospective
+
+ The intended fast path is:
+
+ - plan well
+ - land the minimal scaffold baseline
+ - execute the plan end to end
+ - make the repo coherent
+ - proceed through evaluation without Docker execution
+ - after evaluation is complete, have the owner run and fix `docker compose up --build` and `./run_tests.sh` before submission closes
 
  ## Important notes
 
package/README.md CHANGED
@@ -145,6 +145,7 @@ What it creates:
  - `metadata.json`
  - `.ai/metadata.json`
  - `.ai/startup-context.md`
+ - hidden `.ai/worktrees/` for parallel git worktrees when used
  - root `.beads/`
  - `repo/AGENTS.md`
  - `repo/CLAUDE.md`
@@ -154,6 +155,7 @@ What it creates:
  - `docs/questions.md`
  - `docs/design.md`
  - `docs/api-spec.md`
+ - `docs/plan.md`
  - `docs/test-coverage.md`
 
  Important details:
@@ -166,6 +168,8 @@ Important details:
  - `project_type` should use only `fullstack`, `backend`, `android`, `ios`, `desktop`, or `web` when known
  - Beads lives in the workspace root, not inside `repo/`
  - `repo/.claude/settings.json` seeds Claude Code to use the custom `developer` agent by default for that repo
+ - planned parallel git worktrees should live under hidden parent-root `.ai/worktrees/` by default so root-level `repo-lane-*` folders do not clutter the workspace
+ - final packaging moves `repo/plan.md` to parent-root `docs/plan.md` and removes repo-local `AGENTS.md`, `CLAUDE.md`, and `plan.md` from the delivered `repo/`
  - after non-`-o` bootstrap, the command prints the exact `cd repo` next step so you can continue immediately
  - `--adopt` moves the current project files into `repo/`, preserves root workflow state in the parent workspace, and skips the automatic bootstrap commit
  - `--continue-from <PX>` is a smoother alias for existing-project bootstrap; it implies adoption mode and seeds the requested start phase in one step
package/assets/agents/developer.md CHANGED
@@ -154,11 +154,12 @@ Broad commands you are not allowed to run during ordinary work:
 
  - never run `./run_tests.sh`
  - never run `docker compose up --build`
+ - never run any other Docker runtime, Compose, or containerized broad-verification command that stands in for those documented final commands
  - never run browser E2E or Playwright during ordinary implementation work
  - never run full test suites during ordinary implementation work unless explicitly instructed to run that exact command
- - do not use those commands even if they are documented in the repo or look convenient for debugging
+ - do not use those commands even if they are documented in the repo, requested by the owner, suggested by a playbook, implied by `plan.md`, or look convenient for debugging
  - if your work would normally call for one of those commands, stop at targeted local verification and report that the change is ready for broader verification
- - if explicitly instructed to run Docker-based runtime/test commands, or for a repo-root `./run_tests.sh` that shells out to Docker, run them through `node ~/slopmachine/utils/run_with_timeout.mjs --label docker-gate -- <command ...>` instead of invoking Docker directly; the helper default is one 30 minute attempt, then one 45 minute retry after 30 seconds of backoff, and no single Docker attempt may exceed 60 minutes
+ - do not run Docker-based runtime/test commands under any circumstances during planning, development, `P5`, or `P7`; the owner handles the first broad Docker and `./run_tests.sh` verification in `P5` and may rerun it in `P9` for final confirmation
 
  Your job is to make the broader verification likely to pass without running it yourself.
 
package/assets/agents/slopmachine-claude.md CHANGED
@@ -51,7 +51,8 @@ Planned human-stop moments do not exist.
 
  Claude-capacity rule:
 
- - if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over implementation work yourself
+ - if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over core product implementation work yourself
+ - small owner-side non-core fixes are still allowed while waiting, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn
  - preserve the current developer session record, mark it blocked by rate limit, and automatically wait until the reset time specified by Claude using the packaged wait helper before resuming the same session
  - only surface this as a user-visible blocker if the reset time cannot be determined or the wait or resume path itself fails
 
@@ -65,7 +66,9 @@ Claude-capacity rule:
 
  ## Prime Directive
 
- Manage the work. Do not become the developer.
+ Manage the work. Do not become the developer for core product implementation.
+
+ You may still directly patch small non-core owner-side issues when that is the fastest correct way to keep the workflow moving, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn.
 
  You own:
 
@@ -137,7 +140,7 @@ Do not create another competing workflow-state system.
  Use git to preserve meaningful workflow checkpoints.
 
  - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
- - meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted integrated-verification-and-hardening correction rounds, accepted evaluation-fix rounds, and other clearly reviewable milestones
+ - meaningful work includes accepted scaffold-step completion inside development, accepted `P5` opening reviews, accepted `P5` stabilization work when major fixes are truly needed, accepted evaluation-fix rounds, and other clearly reviewable milestones
  - keep the git flow simple and checkpoint-oriented
  - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
  - keep commit messages descriptive and easy to reason about later
@@ -187,7 +190,8 @@ Phase rules:
  - exactly one root phase should normally be active at a time
  - enter the phase before real work for that phase begins
  - do not close multiple root phases in one transition block
- - `P5 Integrated Verification and Hardening` may loop with the developer lane until release alignment is explicit
+ - `P5 Integrated Verification and Hardening` should normally be one fast stabilization pass that includes the first real owner-run `docker compose up --build` and `./run_tests.sh` gate; owner-fixable Docker/config/wrapper churn should be fixed there directly, and only major brokenness should trigger a bounded Claude developer reroute before returning to evaluation readiness
+ - `P8 Final Readiness Decision` should be one fast owner-run reconciliation sweep after `P7`: reread the delivered repo, `README.md`, parent-root `../docs/`, and carried `../.tmp/` audit artifacts together, fix small docs or README or repo-hygiene drift directly, and only reopen evaluation or packaging-adjacent follow-up when a material inconsistency remains
  - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 
  ## Developer Session Model
@@ -202,14 +206,22 @@ Maintain exactly one active developer session at a time.
  - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
  - if adopted or resumed work needs Claude developer execution but no recoverable tracked Claude session exists yet, determine the correct lane for the current boundary, launch and orient that lane through `claude-worker-management`, persist the returned session id, and only then continue the substantive work
  - when `P7` begins, do not automatically switch away from `develop-N`
- - each fresh evaluation result decides the remediation lane:
- - `fail` -> route the issue list back to the latest `develop-N` Claude session and discard the working audit report file after triage
- - `partial pass` -> start the next `bugfix-N` Claude session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
- - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
- - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
- - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
+ - `P7` uses exactly 2 audit sessions
+ - each audit session starts from one fresh evaluator session and stays in that same evaluator session through fail regenerations and later fix checks
+ - the final coverage/README audit then uses one additional fresh evaluator session and stays in that same session through its reruns, so the whole `P7` flow uses exactly 3 evaluator sessions total
+ - after any kept audit report is saved, reread it and reject it if it hints at prior runs or if it has degraded materially from the original evaluation prompt's required depth, structure, sections, tables, verdict blocks, or evidence style
+ - each audit result decides the remediation lane:
+ - audit session `1` keeps all of its remediation in `bugfix-1`, including fail regenerations and later kept-report fixes
+ - audit session `2` keeps all of its remediation in `bugfix-2`, including fail regenerations and later kept-report fixes
+ - `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, send the full issue list from that failed attempt to that audit session's exact `bugfix-N` Claude lane, require that whole list to be fixed, and then rerun the full prepared evaluation prompt inside the same evaluator session
+ - `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane, and treat that kept report's full issue list as the authoritative fix-check scope for the rest of that audit session
+ - `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane for every actionable reported issue and recommendation in that report, and if there are no actionable items mark the audit session complete without inventing new issues
+ - `audit_report-<N>-fix_check.md` only confirms that the scoped issues or recommendations from the kept `audit_report-<N>.md` are fixed; if it is not clean, send only the unresolved subset back for remediation, then repeat the same-session fix-check loop against the full kept-report scope, and once that scoped set is confirmed fixed move on to the next audit session or next `P7` subphase
+ - require both audit sessions to complete before the final post-audit coverage/README audit can run
+ - after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, reread each generated report and reject prior-run wording such as `previously` or `remaining` when it refers to report history, and judge the result by the owner's reading of the report as a whole: if it does not read as an overall pass, move the displaced report into `../.ai/archive/`, route the fixes to `bugfix-2` when that lane exists or else to the current recoverable Claude developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
  - track the active evaluator session separately in metadata during `P7`
  - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
+ - once `P7` starts, keep looping inside `P7` until its exit criteria are actually satisfied; do not stop between audits, remediation turns, fix-check passes, or coverage/README reruns
 
  ## Parallelism Policy
 
@@ -237,46 +249,42 @@ If adopted or repaired work reaches development, integrated verification and har
  During `P1 Clarification`, use this clarification handshake:
 
  1. launch one short-lived `General` clarification worker
- 2. use the packaged `~/slopmachine/clarifier-agent-prompt.md` as the worker prompt, injecting the original prompt and supporting stack/context notes
- 3. require the worker to output only `../docs/questions.md`
- 4. review `../docs/questions.md`; if it misses material ambiguity, contains filler, or drifts from the prompt, correct clarification before continuing
- 5. parse `../docs/questions.md` into the approved clarification package for planning: the accepted clarification list plus any short additional locked deltas that are not already captured there
- 6. only after that package is strong enough should `P2` begin and the live `develop-1` lane be launched
+ 2. use the packaged `~/slopmachine/clarifier-agent-prompt.md` verbatim as the worker prompt, injecting only the original prompt and supporting stack/context notes, and require it to output only `../docs/questions.md`
+ 3. use `clarification-gate` to review `../docs/questions.md` and turn it into the approved clarification package
+ 4. only when that package is complete and unambiguous enough to serve as the clarified prompt for planning should `P2` begin and the live `develop-1` lane be launched
 
  When the first develop developer session begins in `P2`, start it in this exact order through the live bridge:
 
  1. launch the live `develop-1` Claude `developer` lane
  2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for design direction
  3. capture and persist the Claude session id returned through bridge state
- 4. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require only `../docs/design.md` and say explicitly not to start execution planning yet
- 5. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and correct until the design is accepted
- 6. send the accepted design plus a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require only `plan.md` and say explicitly not to start implementation yet
- 7. in that Phase 2 request, require the lane map to be derived from the directory tree and owned-file boundaries, require as many bounded branches or worktrees or helper-agent lanes as safely possible, target at least 5 lanes when the codebase clearly supports it, require preplanned shared-file overlap and merge checkpoints, require exact serial-only justifications, require a dedicated git worktree plus explicit branch name for every planned parallel lane, and identify which named safe lanes must actually launch during implementation unless a blocker forces a reviewed revision
- 8. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject and correct until `plan.md` is accepted
- 9. only after both planning phases are accepted may the broad `plan.md` development run begin
+ 4. send the inline approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; require `../docs/design.md` and, when backend/fullstack APIs exist, `../docs/api-spec.md`, and say explicitly not to start execution planning yet
+ 5. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until the design is accepted
+ 6. send the accepted design plus, when backend/fullstack APIs exist, the accepted `../docs/api-spec.md`, with a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require `plan.md` plus an updated parent-root `../docs/test-coverage.md`, and say explicitly not to start implementation yet
+ 7. in that Phase 2 request, explicitly require a directory-tree-derived lane map, explicit shared-file control, exact serial-only justifications, a dedicated git worktree plus explicit branch name for every planned parallel lane, and at least 5 lanes when the codebase clearly supports that level of safe fan-out; also identify which named safe lanes must actually launch during implementation unless a blocker forces a reviewed revision
+ 8. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; before leaving `P2`, do one final combined no-drift reread of the accepted design plus accepted plan against the original prompt and accepted clarifications, confirm `../docs/api-spec.md` when applicable and `../docs/test-coverage.md` are fulfilled from the accepted plan, and reject any remaining critical security weakness or planning drift
+ 9. only after that final planning reread passes may the broad `plan.md` development run begin
 
  Do not reorder that sequence.
  Do not ask for Phase 1 and Phase 2 in the same turn.
  Do not create fresh Claude lanes or fresh Claude sessions for ordinary follow-up turns inside the same developer session.
- After planning is accepted, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. After that scaffold step is stable, it should establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the same lane to create the planned git worktrees and spawn all planned internal branches or helper agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
+ After planning is accepted, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should tell the same lane to land the scaffold step from section 3 of `plan.md` first without running Docker or `./run_tests.sh`, then stabilize the shared-file and pre-fan-out security contract in the main lane, then create the planned git worktrees and launch the named internal branches or helper agents, keep implementation plus matching tests together inside each lane, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
  During `P1`, choose `CLAUDE.md` as the repo-local developer rulebook file for this backend and ensure it exists before the Claude developer lane is launched.
  If `repo/CLAUDE.md` is missing, restore it directly from `~/slopmachine/templates/CLAUDE.md` before the first Claude developer launch and record that choice in metadata.
 
  ## Verification Budget
 
- Broad project-standard gate commands are expensive and must stay rare.
+ Docker and `./run_tests.sh` are deferred until the owner-run gate in `P5`.
 
  Target budget for the whole workflow:
 
- - at most 2 broad owner-run verification moments using the selected stack's full verification path
+ - one owner-side Docker/runtime and broad-test gate in `P5`, with immediate reruns there for owner-fixable Docker/config/wrapper/test-harness issues
+ - one final confirmation rerun in `P9` when late fixes or packaging changes could still affect the runtime/test contract
 
  Selected-stack rule:
 
  - follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
- - for web projects, the broad path includes required `docker compose up --build` plus the full test command and browser E2E when applicable
- - for Electron or other Linux-targetable desktop projects, the broad path includes required `docker compose up --build` plus a Dockerized desktop build/test flow and headless UI/runtime verification
- - for Android projects, the broad path includes required `docker compose up --build` plus a Dockerized Android build/test flow without an emulator
- - for iOS-targeted projects on Linux, the broad path includes required `docker compose up --build` plus `./run_tests.sh` and static/code review evidence; do not assume native iOS runtime proof exists without a real macOS/Xcode checkpoint
+ - do not run Docker-based broad verification before `P5`; use static review and local non-Docker evidence before that point, then keep `P7` non-Docker and reserve `P9` for final confirmation reruns only
 
  Every project must end up with:
 
@@ -295,17 +303,19 @@ Broad test command rule:
  - do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
  - `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
  - if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
+ - design the deferred runtime and broad-test paths for first-real-run reliability: no manual exports, no hidden prep steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
 
  Default moments:
 
- 1. development complete -> direct fused `P5` entry, where the first broad owner-run verification and `plan.md` integrity review happen
- 2. final qualified state before packaging
+ 1. development complete -> direct fused `P5` entry with the owner-run Docker/runtime and broad-test gate
+ 2. after `P7` completes -> `P9` final confirmation rerun when the latest changes could affect the runtime/test contract
 
- For web projects, enforce this cadence:
+ For all project types, enforce this cadence:
 
- - do not run Docker during the opening scaffold step or ordinary development work unless a real blocker forces earlier escalation
- - the first Docker-based run is in the opening pass of fused `P5` unless a real blocker forces earlier escalation
- - in between broad checks, development should rely on local fast verification only
+ - do not run Docker during planning, development, or `P7`
+ - do not ask the developer session to run Docker or `./run_tests.sh` under any circumstances; the owner handles the first broad gate in `P5` and may rerun it in `P9` for final confirmation
+ - after `P3` completes, the owner should run the documented Docker/runtime path and `./run_tests.sh` in `P5`, fix owner-side Docker/config/wrapper/test-harness issues directly if needed, and rerun there before moving to evaluation
+ - after `P7` completes, rerun those commands in `P9` only when final confirmation is still needed because late fixes or packaging changes touched the contract
 
  Docker timeout rule:
 
@@ -319,7 +329,7 @@ Between those moments, rely on:
319
329
  - targeted unit tests
320
330
  - targeted integration tests
321
331
  - targeted module or route-family reruns
322
- - targeted local non-E2E UI-adjacent checks when UI is material; keep browser E2E and Playwright for the owner-run broad gate moments unless a concrete blocker justifies earlier escalation
+ - targeted local non-E2E UI-adjacent checks when UI is material
 
  If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.
 
@@ -367,10 +377,12 @@ When talking to the Claude developer worker:
  - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
  - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
  - during ordinary development you may allow fast local iteration, but before final release-readiness review closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
- - when a bounded follow-up or gate requires Docker-based runtime/test commands, tell the Claude developer worker to run them through `node ~/slopmachine/utils/run_with_timeout.mjs --label docker-gate -- <command ...>` rather than invoking Docker directly
+ - do not tell the Claude developer worker to run Docker-based runtime/test commands; the owner handles the first broad Docker gate in `P5`
  - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
  - use the canonical prompt-shape discipline from `claude-worker-management`, but keep the actual message natural and low-noise: do not send labeled sections like `Context snapshot` or `This turn only`, and do not mention turns, workflow state, or prompt-contract jargon in the message itself
- - for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, exact stop boundary if that scaffold step is isolated, and exact evidence required
+ - for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, explicit no-Docker execution before `P5`, exact stop boundary if that scaffold step is isolated, and exact evidence required
+ - for development-completion review and the opening pass of fused `P5`, collect findings across the whole review sweep and send one consolidated fix request unless a hard blocker stops further checking
+ - treat fused `P5` as a fast handoff phase: if rough repo-coherence review passes, proceed to evaluation instead of asking for more `P5` cleanup
  - default to one bounded engineering objective per Claude turn, except for the intentional broad `plan.md` execution run after planning acceptance where the worker is expected to complete the whole implementation checklist end to end
  - reject broad development responses that silently collapse named parallel helper lanes into serial work without an exact blocker and revised lane map
  - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
@@ -411,11 +423,15 @@ To the developer, this should feel like a normal engineering conversation with a
 
  - review before acceptance
  - prefer one strong correction request over many tiny nudges
+ - when several issues are found in one review sweep, batch them into one correction request grouped by failure class or surface instead of drip-feeding one issue at a time
+ - for small non-core fixes such as README cleanup, docs sync, test config, Docker config, wrapper/config glue, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
+ - for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md`, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
+ - during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`, and carried audit artifacts before packaging starts; prefer direct owner fixes for small drift instead of turning that sweep into another Claude developer loop
  - keep work moving without low-information continuation chatter
  - read only what is needed to answer the current decision
  - keep routine review inside the main owner session; do not use `Explore` or `General` subagents to verify Claude developer work
  - clarification and evaluation may still use their dedicated subagent flows, but owner verification of Claude developer work stays in the main session
- - at planning, scaffold-step review inside development, the opening review inside fused `P5`, later integrated-verification-and-hardening correction rounds, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
+ - at planning, scaffold-step review inside development, the opening review inside fused `P5`, any rare major `P5` reroute, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
  - keep comments and metadata auditable and specific
  - keep external docs owner-maintained and repo-local README developer-maintained
 
@@ -433,6 +449,8 @@ To the developer, this should feel like a normal engineering conversation with a
  - treat bridge-managed Claude lanes as owner-controlled and do not manually type into them during ordinary workflow operation
  - at every gate exit, require the result to be checked against the relevant accepted plan sections and an explicit current-boundary checklist before accepting it
  - be especially strict before leaving planning and before leaving development: require explicit section coverage, concrete evidence, and no known prompt-critical gap hidden behind future work
+ - in `P5`, prefer fast rough release-alignment over perfectionism; reserve evaluation for the stricter final check
+ - prefer moving into evaluation from `P5` once the repo is coherent enough by the owner-run Docker/runtime gate, prompt review, and security review; `P9` is final confirmation rather than first execution
  - before every substantive Claude turn, review the last normalized result, decide whether the next turn is a correction, continuation, resume, or new bounded objective, and compose the prompt accordingly rather than sending vague nudges
 
  ## Claude Live Bridge Discipline
@@ -441,8 +459,8 @@ All Claude developer lane launch and turn actions should go through the packaged
 
  Evaluation-prompt rule:
 
- - backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
- - the test-coverage prompt must be sent verbatim with no additions or reductions
+ - backend and frontend evaluation prompts must be prepared through `node ~/slopmachine/utils/prepare_evaluation_prompt.mjs --workspace-root .. --prompt-file <chosen-prompt-file>`, which reads parent-root `../metadata.json`, injects the real project prompt where needed, and writes the exact sendable body to a deterministic file under `../.ai/`
+ - the owner must send the exact contents of that prepared file to the evaluator; on same-session reruns, prepare the file again with `--mode rerun` and then send the exact saved rerun file contents
 
  Operation map:
 
@@ -457,7 +475,7 @@ Operation map:
  - package the Claude project session folder for final delivery as one root zip bundle:
  - `node ~/slopmachine/utils/package_claude_session.mjs`
  - this resolves the tracked relevant Claude session artifacts from the tracked `session_id` values plus the project `cwd` under `~/.claude/projects/`, packages the normalized tracked transcript JSONL files together with the raw matching session directories once, and avoids sweeping unrelated random Claude sessions into the archive
- - after Claude session packaging is fully complete, stop each tracked live Claude lane with `node ~/slopmachine/utils/claude_live_stop.mjs --runtime-dir <dir>` and verify the tmux session is gone before closing `P9`
+ - after Claude session packaging is fully complete, attempt to stop each tracked live Claude lane with `node ~/slopmachine/utils/claude_live_stop.mjs --runtime-dir <dir>`, but only when the bridge can prove the tmux session belongs to the current task runtime; if that check fails or the stop fails, leave the tmux session alone rather than risking another tmux instance
 
  Timeout rule: