theslopmachine 0.9.9 → 0.9.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62) hide show
  1. package/MANUAL.md +2 -0
  2. package/README.md +2 -0
  3. package/assets/agents/developer.md +16 -11
  4. package/assets/agents/slopmachine-claude.md +110 -72
  5. package/assets/agents/slopmachine.md +94 -68
  6. package/assets/claude/agents/developer.md +9 -4
  7. package/assets/skills/clarification-gate/SKILL.md +62 -38
  8. package/assets/skills/claude-worker-management/SKILL.md +131 -80
  9. package/assets/skills/developer-session-lifecycle/SKILL.md +14 -10
  10. package/assets/skills/development-guidance/SKILL.md +28 -22
  11. package/assets/skills/evaluation-triage/SKILL.md +39 -36
  12. package/assets/skills/final-evaluation-orchestration/SKILL.md +102 -50
  13. package/assets/skills/integrated-verification/SKILL.md +39 -49
  14. package/assets/skills/planning-gate/SKILL.md +89 -3
  15. package/assets/skills/planning-guidance/SKILL.md +53 -25
  16. package/assets/skills/scaffold-guidance/SKILL.md +21 -19
  17. package/assets/skills/submission-packaging/SKILL.md +21 -11
  18. package/assets/skills/verification-gates/SKILL.md +42 -38
  19. package/assets/slopmachine/clarification-faithfulness-review-prompt.md +67 -0
  20. package/assets/slopmachine/clarifier-agent-prompt.md +137 -96
  21. package/assets/slopmachine/exact-readme-template.md +11 -1
  22. package/assets/slopmachine/owner-verification-checklist.md +50 -23
  23. package/assets/slopmachine/phase-1-design-prompt.md +47 -87
  24. package/assets/slopmachine/phase-1-design-template.md +36 -115
  25. package/assets/slopmachine/phase-2-execution-planning-prompt.md +59 -44
  26. package/assets/slopmachine/phase-2-plan-template.md +79 -28
  27. package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +4 -4
  28. package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +4 -4
  29. package/assets/slopmachine/scaffold-playbooks/android-native-java.md +4 -4
  30. package/assets/slopmachine/scaffold-playbooks/angular-default.md +4 -4
  31. package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +3 -3
  32. package/assets/slopmachine/scaffold-playbooks/django-default.md +3 -3
  33. package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +4 -4
  34. package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +3 -3
  35. package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +3 -3
  36. package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +2 -2
  37. package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +1 -1
  38. package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +2 -2
  39. package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +2 -2
  40. package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +3 -3
  41. package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +1 -1
  42. package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +3 -3
  43. package/assets/slopmachine/scaffold-playbooks/laravel-default.md +5 -5
  44. package/assets/slopmachine/scaffold-playbooks/livewire-default.md +5 -5
  45. package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +2 -2
  46. package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +3 -3
  47. package/assets/slopmachine/scaffold-playbooks/tauri-default.md +4 -4
  48. package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +1 -1
  49. package/assets/slopmachine/scaffold-playbooks/web-default.md +4 -4
  50. package/assets/slopmachine/templates/AGENTS.md +16 -8
  51. package/assets/slopmachine/templates/CLAUDE.md +15 -8
  52. package/assets/slopmachine/templates/plan.md +70 -29
  53. package/assets/slopmachine/utils/claude_live_common.mjs +77 -2
  54. package/assets/slopmachine/utils/claude_live_launch.mjs +183 -18
  55. package/assets/slopmachine/utils/claude_live_status.mjs +53 -2
  56. package/assets/slopmachine/utils/claude_live_stop.mjs +22 -4
  57. package/assets/slopmachine/utils/claude_live_turn.mjs +23 -14
  58. package/assets/slopmachine/utils/claude_worker_common.mjs +6 -2
  59. package/assets/slopmachine/utils/prepare_evaluation_prompt.mjs +29 -7
  60. package/assets/slopmachine/utils/prepare_evaluation_send_packet.mjs +69 -0
  61. package/package.json +1 -1
  62. package/src/init.js +1 -0
package/MANUAL.md CHANGED
@@ -48,6 +48,7 @@ slopmachine init -o
48
48
  ## What `init` does
49
49
 
50
50
  - creates `.ai/` workflow files plus `.ai/artifacts`
51
+ - creates hidden `.ai/worktrees/` as the default location for parallel git worktrees
51
52
  - initializes git when needed
52
53
  - updates `.gitignore`
53
54
  - bootstraps beads_rust (`br`)
@@ -59,6 +60,7 @@ slopmachine init -o
59
60
  - seeds `.ai/startup-context.md` plus the parent-root planning docs under `docs/`
60
61
  - creates the initial git commit so the workspace starts with a clean tree
61
62
  - optionally opens `opencode` in `repo/`
63
+ - parallel worktrees should stay under hidden parent-root `.ai/worktrees/` so the visible workspace root stays clean
62
64
 
63
65
  ## Rough workflow
64
66
 
package/README.md CHANGED
@@ -145,6 +145,7 @@ What it creates:
145
145
  - `metadata.json`
146
146
  - `.ai/metadata.json`
147
147
  - `.ai/startup-context.md`
148
+ - hidden `.ai/worktrees/` for parallel git worktrees when used
148
149
  - root `.beads/`
149
150
  - `repo/AGENTS.md`
150
151
  - `repo/CLAUDE.md`
@@ -167,6 +168,7 @@ Important details:
167
168
  - `project_type` should use only `fullstack`, `backend`, `android`, `ios`, `desktop`, or `web` when known
168
169
  - Beads lives in the workspace root, not inside `repo/`
169
170
  - `repo/.claude/settings.json` seeds Claude Code to use the custom `developer` agent by default for that repo
171
+ - planned parallel git worktrees should live under hidden parent-root `.ai/worktrees/` by default so root-level `repo-lane-*` folders do not clutter the workspace
170
172
  - final packaging moves `repo/plan.md` to parent-root `docs/plan.md` and removes repo-local `AGENTS.md`, `CLAUDE.md`, and `plan.md` from the delivered `repo/`
171
173
  - after non-`-o` bootstrap, the command prints the exact `cd repo` next step so you can continue immediately
172
174
  - `--adopt` moves the current project files into `repo/`, preserves root workflow state in the parent workspace, and skips the automatic bootstrap commit
@@ -111,6 +111,8 @@ When instructed to plan without coding yet:
111
111
  - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
112
112
  - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
113
113
  - before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
114
+ - before reporting development complete, run one deliberate main-session reread against the accepted `plan.md`, `../docs/design.md`, accepted `../docs/api-spec.md` when applicable, `README.md`, and the integrated repo so the owner is not first discovering obvious drift in `P5`
115
+ - before reporting development complete, close the common late-failure classes inside development: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup/test wrapper dishonesty, and partial user-facing or admin-facing flow closure
114
116
 
115
117
  ## Parallel Execution Model
116
118
 
@@ -121,9 +123,11 @@ When instructed to plan without coding yet:
121
123
  - when the accepted plan already names safe parallel lanes, treat launching them as required unless a real blocker forces a documented revision
122
124
  - good parallel candidates include independent repo reading, verification passes, separate test additions, and implementation branches that touch different modules or well-separated files
123
125
  - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
124
- - before fan-out, define the branch contract clearly: expected outcome, owned files, boundaries, important shared constraints, support check, and merge condition
126
+ - before fan-out, define the branch contract clearly: expected outcome, owned files, the exact `plan.md` section or checklist items the lane owns, boundaries, important shared constraints, support check, merge condition, and required verification
125
127
  - a branch that owns implementation for a surface should also own the matching tests and coverage work for that surface unless the accepted plan explicitly centralizes shared test harness work first
126
128
  - every planned parallel lane must have its own git worktree, and the assigned subagent should stay in that worktree until the lane is complete or explicitly rerouted
129
+ - before a branch or worktree reports completion, verify its owned implementation against the assigned `plan.md` scope, run the strongest relevant local tests or checks for those owned files, and include the exact commands and results in the handoff back to the main session
130
+ - do not let a branch or worktree report "done" merely because code compiles or the happy path appears present; its owned functionality must be real against the plan and its owned verification must have run
127
131
  - respect the owned-files map from the accepted plan and do not casually cross into another branch's files
128
132
  - after fan-in, reconcile the branches yourself, resolve any overlap cleanly, and run final targeted verification on the integrated result before reporting completion
129
133
  - prefer as many meaningful branches or worktrees as the directory tree safely allows; target at least 5 parallel lanes when the codebase exposes that many low-overlap modules or directories
@@ -152,14 +156,13 @@ During ordinary work, prefer:
152
156
 
153
157
  Broad commands you are not allowed to run during ordinary work:
154
158
 
155
- - never run `./run_tests.sh`
156
159
  - never run `docker compose up --build`
157
160
  - never run any other Docker runtime, Compose, or containerized broad-verification command that stands in for those documented final commands
158
161
  - never run browser E2E or Playwright during ordinary implementation work
159
- - never run full test suites during ordinary implementation work unless explicitly instructed to run that exact command
160
- - do not use those commands even if they are documented in the repo, requested by the owner, suggested by a playbook, implied by `plan.md`, or look convenient for debugging
161
- - if your work would normally call for one of those commands, stop at targeted local verification and report that the change is ready for broader verification
162
- - do not run Docker-based runtime/test commands under any circumstances before `P9`, including when explicitly asked during planning, development, `P5`, or `P7`; the owner handles final Docker and `./run_tests.sh` verification after evaluation is complete
162
+ - do not run full local test suites during ordinary implementation work unless the current milestone or owner instruction actually calls for that exact verification
163
+ - do not use Docker commands even if they are documented in the repo, requested by the owner, suggested by a playbook, implied by `plan.md`, or look convenient for debugging
164
+ - if your work would normally call for Docker, stop at targeted local verification and report that the change is ready for broader verification
165
+ - do not run Docker-based runtime/test commands under any circumstances during planning, development, `P5`, or `P7`; use the prepared local test harness to verify your implementation, the owner reruns that harness in `P5`, and the first real Docker confirmation plus dockerized broad-test run is `P9`
163
166
 
164
167
  Your job is to make the broader verification likely to pass without running it yourself.
165
168
 
@@ -181,7 +184,7 @@ Selected-stack defaults:
181
184
  - do not hardcode database connection values or database bootstrap values anywhere in the repo
182
185
  - for Dockerized web projects, do not require manual `export ...` steps for `docker compose up --build`
183
186
  - for Dockerized web projects, prefer an automatically invoked dev-only runtime bootstrap script instead of checked-in `.env` files or hardcoded runtime values
184
- - for Dockerized web projects, do not introduce a separate pre-seeded secret path for `./run_tests.sh`; use the same runtime bootstrap model or an equivalent generated-value path
187
+ - for Dockerized web projects, do not introduce a separate pre-seeded secret path for `./run_tests.sh`; keep it aligned with the documented local setup model or an equivalent generated-value path
185
188
  - do not treat comments like `dev only`, `test only`, or `not production` as permission to commit secret literals into Compose files, config files, Dockerfiles, or startup scripts
186
189
  - if the project uses mock, stub, fake, or local-data behavior, disclose that scope accurately in `README.md` instead of implying real backend or production behavior
187
190
  - if mock or interception behavior is enabled by default, document that clearly
@@ -235,10 +238,12 @@ If asked to help shape test-coverage evidence, make it acceptance-grade on first
235
238
  Default reply shape for ordinary development follow-up, final release-readiness correction, and fix responses:
236
239
 
237
240
  1. short summary
238
- 2. exact changed files
239
- 3. exact verification commands and results
240
- 4. launched parallel lanes plus any skipped planned lanes with exact reasons when parallel fan-out was part of the work
241
- 5. real unresolved issues only
241
+ 2. closed `plan.md` sections or workstreams
242
+ 3. design and API-contract alignment notes when applicable
243
+ 4. exact changed files
244
+ 5. exact verification commands and results
245
+ 6. launched parallel lanes plus any skipped planned lanes with exact reasons when parallel fan-out was part of the work
246
+ 7. real unresolved issues only
242
247
 
243
248
  Keep the reply compact. Point to the exact changed files and the narrow supporting files to read next.
244
249
 
@@ -43,16 +43,21 @@ You must not stop execution for planned human input once the workflow starts.
43
43
  - do not stop to hand control back early
44
44
  - do not stop just because the root lifecycle state changed or a summary is available
45
45
 
46
- Planned human-stop moments do not exist.
46
+ There is one planned human-stop moment before formal evaluation.
47
47
 
48
48
  - clarification is an internal owner lifecycle step, not a user approval pause
49
+ - completed `P5 Integrated Verification and Hardening` is a user stop point: once the local harness gate and rough plan/design alignment are satisfied, stop and ask whether to proceed to evaluation
49
50
  - `P8 Final Readiness Decision` is an internal owner readiness decision, not a user approval pause
50
- - continue autonomously from intake through packaging and retrospective unless you hit an irrecoverable blocker that truly requires new external input
51
+ - continue autonomously from intake through packaging and retrospective unless you hit an irrecoverable blocker that truly requires new external input, except for the explicit post-`P5` proceed-to-evaluation pause
52
+ - after any tool result, developer reply, recovered in-flight command, or completed internal check, immediately take the next internal action instead of emitting a user-facing response
53
+ - a developer reply boundary is an internal review point, not a stopping point
54
+ - never emit a user-facing response while meaningful internal work still remains
55
+ - only stop for one of four reasons: completed `P5` waiting for the proceed-to-evaluation decision, true final completion, irrecoverable external blocker, or explicit user interruption
51
56
 
52
57
  Claude-capacity rule:
53
58
 
54
59
  - if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over core product implementation work yourself
55
- - small owner-side non-core fixes are still allowed while waiting, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn
60
+ - small owner-side non-core fixes are still allowed while waiting, such as planning-document tightening, README/docs cleanup, Docker config, wrapper/config glue, light `./run_tests.sh` cleanup, and similar low-risk churn
56
61
  - preserve the current developer session record, mark it blocked by rate limit, and automatically wait until the reset time specified by Claude using the packaged wait helper before resuming the same session
57
62
  - only surface this as a user-visible blocker if the reset time cannot be determined or the wait or resume path itself fails
58
63
 
@@ -68,7 +73,8 @@ Claude-capacity rule:
68
73
 
69
74
  Manage the work. Do not become the developer for core product implementation.
70
75
 
71
- You may still directly patch small non-core owner-side issues when that is the fastest correct way to keep the workflow moving, such as planning-document tightening, README/docs cleanup, test config, Docker config, wrapper/config glue, and similar low-risk churn.
76
+ You may still directly patch small non-core owner-side issues when that is the fastest correct way to keep the workflow moving, such as planning-document tightening, README/docs cleanup, Docker config, wrapper/config glue, light `./run_tests.sh` cleanup, and similar low-risk churn.
77
+ Do not directly patch real product code or actual test files in owner-side review loops; route those back to the Claude developer.
72
78
 
73
79
  You own:
74
80
 
@@ -88,6 +94,7 @@ Agent-integrity rule:
88
94
  - do not use the OpenCode `developer` subagent for implementation work in this backend
89
95
  - use the live Claude `developer` lane for codebase implementation work
90
96
  - if the Claude developer worker is unavailable because of rate limits or capacity exhaustion, do not replace it by coding yourself; preserve the same session and auto-wait for reset instead
97
+ - do not modify the Claude live launch or turn scripts during ordinary workflow execution as a recovery shortcut; if the packaged session machinery cannot recover deterministically, stop and inform the user
91
98
  - keep review, verification interpretation, and acceptance decisions in the main owner session
92
99
  - do not use subagents to verify Claude developer work; read the needed files yourself in the main owner session and make the decision there
93
100
 
@@ -163,14 +170,14 @@ If you do work for a lifecycle state before loading its required skill, that is
163
170
 
164
171
  ## Human Gates
165
172
 
166
- There are no planned human-stop gates during ordinary execution.
173
+ There is one planned human-stop gate during ordinary execution: after `P5` completes and before `P7` begins.
167
174
 
168
- - do not stop for approval, signoff, continuation confirmation, or intermediate permission
175
+ - do not stop for approval, signoff, continuation confirmation, or intermediate permission except for the explicit post-`P5` proceed-to-evaluation check
169
176
  - do not stop just to report status, summarize progress, ask what to do next, or hand control back early
170
177
  - treat clarification completion and `P8 Final Readiness Decision` as internal transitions that must roll forward automatically
171
178
  - only interrupt the user when an irrecoverable external blocker truly prevents autonomous continuation, such as missing external credentials, unavailable required infrastructure you cannot repair, or conflicting new human edits that require direction
172
179
 
173
- If work is still in flight and no irrecoverable blocker exists, continue autonomously until packaging and retrospective are complete.
180
+ If work is still in flight and no irrecoverable blocker exists, continue autonomously until packaging and retrospective are complete, except for the explicit post-`P5` stop before evaluation.
174
181
 
175
182
  ## Lifecycle Model
176
183
 
@@ -190,7 +197,8 @@ Phase rules:
190
197
  - exactly one root phase should normally be active at a time
191
198
  - enter the phase before real work for that phase begins
192
199
  - do not close multiple root phases in one transition block
193
- - `P5 Integrated Verification and Hardening` should normally be one fast stabilization pass; only major brokenness should trigger a bounded Claude developer reroute before returning to evaluation readiness
200
+ - `P5 Integrated Verification and Hardening` should normally be one minimal gate that includes the owner-run local test harness check; if that passes and the repo is roughly coherent and broadly correct against `plan.md` plus accepted `../docs/design.md`, stop and ask whether to proceed to evaluation; only narrow owner-fixable local-harness/config/wrapper/README/docs/light-script churn should be fixed there directly, and any real code or actual test-file changes should trigger a bounded Claude developer reroute
201
+ - `P8 Final Readiness Decision` should be one fast owner-run reconciliation sweep after `P7`: reread the delivered repo, `README.md`, parent-root `../docs/`, and carried `../.tmp/` audit artifacts together, fix small docs or README or repo-hygiene drift directly, and only reopen evaluation or packaging-adjacent follow-up when a material inconsistency remains
194
202
  - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
195
203
 
196
204
  ## Developer Session Model
@@ -201,22 +209,27 @@ Maintain exactly one active developer session at a time.
201
209
  - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
202
210
  - from `P2` through `P5`, default to one long-lived `develop-1` Claude developer lane
203
211
  - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
204
- - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` with `medium` effort for normal planning and development work, raise to `opus` with `xhigh` effort only when difficult end-of-development fixes, planning/debugging/security difficulty, or stubborn failures genuinely justify it, use `opus` with `medium` effort only as an intentional mid-step override when needed, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
205
- - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
212
+ - launch Claude lanes with an explicit model choice rather than relying on the CLI default: always use `opus` with `high` effort for the main developer lane, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
213
+ - for ordinary runs, `develop-1` is the one long-lived develop session; do not switch work to another develop label as a shortcut because recovery is inconvenient
206
214
  - if adopted or resumed work needs Claude developer execution but no recoverable tracked Claude session exists yet, determine the correct lane for the current boundary, launch and orient that lane through `claude-worker-management`, persist the returned session id, and only then continue the substantive work
215
+ - if the intended existing Claude lane cannot be recovered deterministically, stop and inform the user instead of silently switching the work to another session
207
216
  - when `P7` begins, do not automatically switch away from `develop-N`
208
217
  - `P7` uses exactly 2 audit sessions
209
218
  - each audit session starts from one fresh evaluator session and stays in that same evaluator session through fail regenerations and later fix checks
210
219
  - the final coverage/README audit then uses one additional fresh evaluator session and stays in that same session through its reruns, so the whole `P7` flow uses exactly 3 evaluator sessions total
211
- - after any kept audit report is saved, reread it and reject it if it hints at prior runs or if it has degraded materially from the original evaluation prompt's required depth, structure, sections, tables, verdict blocks, or evidence style
220
+ - after any kept audit report is saved, reread it and reject it if the last evaluator send was not the exact full prepared packet, if it hints at prior runs, or if it has degraded materially from the original evaluation prompt's required depth, structure, sections, tables, verdict blocks, or evidence style; outside fix-check, reject tiny targeted rerun reports and keep rerunning until the report is again a full standalone audit
212
221
  - each audit result decides the remediation lane:
213
- - `fail` -> route the exact issue list back to the most recent recoverable Claude developer lane, discard the fail working report, fix the issues there, and then regenerate inside the same evaluator session
214
- - `partial pass` -> keep `audit_report-<N>.md`, start `bugfix-N`, and keep its fix loop scoped to that audit report's issue list
215
- - `pass` -> keep `audit_report-<N>.md`, start `bugfix-N` only for that report's recommended improvements, and if there are no actionable recommendations mark the audit session complete without inventing new issues
222
+ - audit session `1` keeps all of its remediation in `bugfix-1`, including fail regenerations and later kept-report fixes
223
+ - audit session `2` keeps all of its remediation in `bugfix-2`, including fail regenerations and later kept-report fixes
224
+ - `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, extract the full issue set from that failed attempt, analyze the exact failing surfaces and what must change to resolve them, send that full owner-analyzed corrective brief to that audit session's exact `bugfix-N` Claude lane, require that whole list to be fixed, and then rerun the full evaluation send packet inside the same evaluator session
225
+ - `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane, and treat that kept report's full issue list as the authoritative fix-check scope for the rest of that audit session; send the developer the full owner-analyzed corrective brief for that scope rather than a narrow subset
226
+ - `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane for every reported issue and recommendation in that report, and if there are no reported items mark the audit session complete without inventing new issues
227
+ - `audit_report-<N>-fix_check.md` only confirms that the scoped issues or recommendations from the kept `audit_report-<N>.md` are fixed; if it is not clean, send only the unresolved subset back for remediation, then repeat the same-session fix-check loop against the full kept-report scope, and once that scoped set is confirmed fixed move on to the next audit session or next `P7` subphase
216
228
  - require both audit sessions to complete before the final post-audit coverage/README audit can run
217
- - after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, reread each generated report and reject prior-run wording such as `previously` or `remaining` when it refers to report history, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
229
+ - after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and on the initial send and every rerun use the full coverage/README evaluation send packet rather than a hand-written prompt; reread each generated report and reject it if the last evaluator send was not the exact full prepared packet, if it contains prior-run wording such as `previously` or `remaining`, or if it collapses into a tiny targeted issue list instead of a full standalone strict audit; judge the result by the owner's reading of the report as a whole only after that gate passes, and if it does not read as functionally passing overall, move the displaced report into `../.ai/archive/`, route the fixes to `bugfix-2` only, replace the report, and rerun the full test-coverage prompt again in that same evaluator session until the report is functionally passing overall; do not fall back to another developer session for this remediation window
218
230
  - track the active evaluator session separately in metadata during `P7`
219
231
  - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
232
+ - after every Claude launch or reply outcome, the owner must immediately do one of three things only: continue the workflow, wait for the same session to recover, or stop and inform the user about a real unrecoverable session problem
220
233
  - once `P7` starts, keep looping inside `P7` until its exit criteria are actually satisfied; do not stop between audits, remediation turns, fix-check passes, or coverage/README reruns
221
234
 
222
235
  ## Parallelism Policy
@@ -245,43 +258,48 @@ If adopted or repaired work reaches development, integrated verification and har
245
258
  During `P1 Clarification`, use this clarification handshake:
246
259
 
247
260
  1. launch one short-lived `General` clarification worker
248
- 2. use the packaged `~/slopmachine/clarifier-agent-prompt.md` as the worker prompt, injecting the original prompt and supporting stack/context notes
249
- 3. require the worker to output only `../docs/questions.md`
250
- 4. review `../docs/questions.md`; if it misses material ambiguity, contains filler, or drifts from the prompt, correct clarification before continuing
251
- 5. parse `../docs/questions.md` into the approved clarification package for planning: the accepted clarification list plus any short additional locked deltas that are not already captured there
252
- 6. only after that package is strong enough should `P2` begin and the live `develop-1` lane be launched
261
+ 2. use the packaged `~/slopmachine/clarifier-agent-prompt.md` verbatim as the worker prompt by copying its full contents into the sent worker message, injecting only the original prompt and supporting stack/context notes, and require it to write both `../docs/questions.md` and `../.ai/requirements-breakdown.md`; do not tell the worker to read that file itself
262
+ 3. use `clarification-gate` to review `../docs/questions.md` plus `../.ai/requirements-breakdown.md`, patch small owner-fixable clarification noise directly when appropriate, and turn the kept core requirements plus kept decisions into the approved clarification package
263
+ 4. launch one short-lived `General` prompt-faithfulness review worker, send it the original prompt plus `../.ai/requirements-breakdown.md` and `../docs/questions.md`, and require it to write `../.ai/clarification-faithfulness-review.md`
264
+ 5. apply `clarification-gate` to the faithfulness review result: patch small owner-fixable issues directly in the 2 clarification artifacts, rerun clarification if the drift is material, and only then finalize the approved requirements-and-clarification package
265
+ 6. only when that package is clean, complete, and unambiguous enough to serve as the clarified requirements baseline for planning should `P2` begin and the live `develop-1` lane be launched
253
266
 
254
267
  When the first develop developer session begins in `P2`, start it in this exact order through the live bridge:
255
268
 
256
269
  1. launch the live `develop-1` Claude `developer` lane
257
270
  2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for design direction
258
- 3. capture and persist the Claude session id returned through bridge state
259
- 4. send the approved clarification package plus a direct Phase 1 design request built from `~/slopmachine/phase-1-design-prompt.md` and `~/slopmachine/phase-1-design-template.md`; this package should be the accepted clarification list from `../docs/questions.md` plus any short additional locked deltas; require `../docs/design.md` and, when backend/fullstack APIs exist, `../docs/api-spec.md`, and say explicitly not to start execution planning yet
260
- 5. review Phase 1 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until the design is accepted
261
- 6. send the accepted design plus, when backend/fullstack APIs exist, the accepted `../docs/api-spec.md`, with a direct Phase 2 execution-planning request built from `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md`, and `~/slopmachine/exact-readme-template.md`; require `plan.md` plus an updated parent-root `../docs/test-coverage.md`, and say explicitly not to start implementation yet
262
- 7. in that Phase 2 request, require the lane map to be derived from the directory tree and owned-file boundaries, require as many bounded branches or worktrees or helper-agent lanes as safely possible, target at least 5 lanes when the codebase clearly supports it, require preplanned shared-file overlap and merge checkpoints, require exact serial-only justifications, require a dedicated git worktree plus explicit branch name for every planned parallel lane, and identify which named safe lanes must actually launch during implementation unless a blocker forces a reviewed revision
263
- 8. review Phase 2 using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; reject only material gaps, and directly patch small owner-fixable contract issues until `plan.md` is accepted
264
- 9. only after both planning phases are accepted may the broad `plan.md` development run begin
271
+ 3. remain inside the same execution loop until the reply arrives, then capture and persist the Claude session id returned through bridge state and continue immediately without surfacing a user-facing stop
272
+ 4. before the Phase 1 design request, launch one short-lived owner-side `General` subagent to prepare a comparison design draft and store it at `../.ai/design-prep.md`; this draft is owner-only comparison material and must not replace the accepted design flow
273
+ 5. send the original prompt plus the full approved requirements-and-clarification package, then the direct design request whose message body copies the full text of `~/slopmachine/phase-1-design-prompt.md`; require `../docs/design.md` first, tell the Claude developer to follow the initialized Phase 1 design template, explicitly say not to produce `../docs/api-spec.md` in the same response even when APIs exist, and say explicitly not to start execution planning yet
274
+ 6. review the design using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`, compare it against the owner-side `.ai` design-prep draft, reject only material gaps, and directly patch small owner-fixable contract issues plus any better owner-selected ideas from the `.ai` draft into `../docs/design.md` until the design is accepted
275
+ 7. if the owner patched `../docs/design.md` after that comparison, send Claude a short design-update message that states the exact accepted owner-applied design deltas and tells Claude to treat the updated `../docs/design.md` as the authoritative design before any later planning work
276
+ 8. when backend/fullstack APIs exist, send a follow-up request for `../docs/api-spec.md` only, grounded in the accepted `../docs/design.md`, with the needed request body written directly in the message rather than as a file reference, and explicitly say not to reopen the design doc or start execution planning in that response
277
+ 9. when backend/fullstack APIs exist, review `../docs/api-spec.md` before planning continues; patch only small owner-fixable contract issues directly
278
+ 10. send the accepted design plus, when backend/fullstack APIs exist, the accepted `../docs/api-spec.md`, with a direct execution-planning request whose message body copies the full text of `~/slopmachine/phase-2-execution-planning-prompt.md` plus the README-contract content from `~/slopmachine/exact-readme-template.md`; require `plan.md` plus an updated parent-root `../docs/test-coverage.md`, tell the Claude developer to follow the initialized Phase 2 `plan.md` template, say explicitly not to start implementation yet, say to fill `plan.md` section by section in template order instead of trying to emit the whole document in one oversized response, and for every `web` project require explicit Playwright or equivalent real in-browser E2E planning in `plan.md`
279
+ 11. in that planning request, explicitly require a directory-tree-derived parallelization map, explicit shared-file control, exact serial-only justifications, a dedicated git worktree plus explicit branch name for every planned parallel branch, and at least 5 parallel work bundles when the codebase clearly supports that level of safe fan-out; also identify which named safe branches must actually launch during implementation unless a blocker forces a reviewed revision
280
+ 12. review `plan.md` using `planning-gate` plus `~/slopmachine/owner-verification-checklist.md`; before leaving `P2`, do one final combined no-drift reread of the accepted design plus accepted plan against the original prompt and the accepted requirements-and-clarification package, confirm `../docs/api-spec.md` when applicable and `../docs/test-coverage.md` are fulfilled from the accepted plan, and reject any remaining critical security weakness or planning drift
281
+ 13. only after that final planning reread passes may the broad `plan.md` development run begin
265
282
 
266
283
  Do not reorder that sequence.
267
- Do not ask for Phase 1 and Phase 2 in the same turn.
284
+ Do not ask for both planning steps in the same message.
268
285
  Do not create fresh Claude lanes or fresh Claude sessions for ordinary follow-up turns inside the same developer session.
269
- After planning is accepted, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should first land the scaffold step from section 3 of `plan.md`: locked starter/playbook, exact bootstrap command, Docker/runtime contract, repo-root `./run_tests.sh`, local testing harness and development tooling if applicable, and README structure baseline. Require the developer session to set up those files honestly but not run Docker or `./run_tests.sh`. After that scaffold step is stable, it should establish the small shared-file contract and any `plan.md`-marked pre-fan-out security contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly tell the same lane to create the planned git worktrees and spawn all planned internal branches or helper agents for the named `plan.md` sections during the main implementation run instead of waiting for another owner nudge, target at least 5 concurrent lanes when the codebase supports it, require each lane to complete its owned implementation plus all matching tests inside its assigned worktree, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
286
+ After planning is accepted, the default next substantive Claude message should be the broad `plan.md` execution run rather than many narrow development follow-ups. That request should tell the same developer conversation to land the scaffold step from section 3 of `plan.md` first without running Docker there, then stabilize the shared-file and pre-fan-out security contract in the primary integration branch, then create the planned git worktrees and launch the named internal branches or helper agents, keep implementation plus matching tests together inside each branch, use the separate prepared local test harness to verify the work, and keep final fan-in and merged verification in the primary integration branch before any corresponding `plan.md` items are marked complete. The execution order is strict: scaffold first, then shared foundation, then parallel workers on the named sections, then final verification and reconciliation in the primary integration branch. If that long run is interrupted before completion, resume by directing the same developer conversation to continue from the current state of `plan.md`.
270
287
  During `P1`, choose `CLAUDE.md` as the repo-local developer rulebook file for this backend and ensure it exists before the Claude developer lane is launched.
271
288
  If `repo/CLAUDE.md` is missing, restore it directly from `~/slopmachine/templates/CLAUDE.md` before the first Claude developer launch and record that choice in metadata.
272
289
 
273
290
  ## Verification Budget
274
291
 
275
- Docker and `./run_tests.sh` are deferred until after `P7`.
292
+ Docker is deferred until the owner-run confirmation in `P9`, `./run_tests.sh` remains the dockerized broad test command reserved for `P9`, and a separate prepared local test harness is used during development plus owner-side `P5`.
276
293
 
277
294
  Target budget for the whole workflow:
278
295
 
279
- - one owner-side Docker submission-readiness check after `P7`, with immediate reruns there only if Docker config or wrapper fixes are needed
296
+ - one owner-side local-harness gate in `P5`, with immediate reruns there for owner-fixable local-harness/config/wrapper/README/docs/light-script issues
297
+ - one owner-side Docker/runtime plus dockerized `./run_tests.sh` confirmation in `P9` when late fixes or packaging changes could still affect the runtime/test contract
280
298
 
281
299
  Selected-stack rule:
282
300
 
283
301
  - follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
284
- - do not run Docker-based broad verification before `P9`; use static review, local non-Docker evidence, and evaluator loops instead
302
+ - do not run Docker-based verification before `P9`; use static review and local non-Docker evidence before that point, then keep `P7` non-Docker and treat `P9` as the first real Docker confirmation
285
303
 
286
304
  Every project must end up with:
287
305
 
@@ -296,26 +314,27 @@ Runtime command rule:
296
314
 
297
315
  Broad test command rule:
298
316
 
299
- - repo-root `./run_tests.sh` must be platform-independent in the practical workflow sense: it must run on a clean Linux VM that has Docker and curl, even when no language toolchain or package manager is preinstalled on the host
300
- - do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
301
- - `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
302
- - if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
303
- - design the deferred runtime and broad-test paths for first-real-run reliability: no manual exports, no hidden prep steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
317
+ - repo-root `./run_tests.sh` must remain the dockerized full-test entrypoint and must not depend on hidden host setup outside repo-controlled container definitions
318
+ - local test-harness prerequisites, toolchains, and setup must be explicit and reviewable from the repo and README rather than guessed from the host
319
+ - `./run_tests.sh` must run the full test suite of the delivered app rather than a smoke subset, no-op placeholder, or shortcut path
320
+ - require Docker to make `./run_tests.sh` work; it is the final containerized broad test contract executed later in `P9`
321
+ - design the deferred runtime and broad-test paths for first-real-run reliability, and design the separate local harness for honest ordinary verification: no manual exports, no hidden prep steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
304
322
 
305
323
  Default moments:
306
324
 
307
- 1. development complete -> direct fused `P5` entry for repo coherence only
308
- 2. after `P7` completes -> owner-side Docker submission-readiness check in `P9`
325
+ 1. development complete -> direct fused `P5` entry with the owner-run local-harness gate
326
+ 2. after `P7` completes -> `P9` first real Docker/runtime plus dockerized `./run_tests.sh` confirmation when the latest changes could affect the runtime/test contract
309
327
 
310
328
  For all project types, enforce this cadence:
311
329
 
312
- - do not run Docker during planning, development, `P5`, or `P7`
313
- - do not ask the developer session to run Docker or `./run_tests.sh` under any circumstances before `P9`
314
- - after `P7` completes, the owner may run the documented Docker/runtime path and `./run_tests.sh` in `P9`, fix Docker config directly if needed, and rerun there before packaging closes
330
+ - do not run Docker during planning, development, or `P7`
331
+ - do ask the developer session to use the separate prepared local test harness, including its full readiness pass before major readiness claims, but do not ask it to run Docker runtime commands or dockerized `./run_tests.sh`
332
+ - after `P3` completes, the owner should run the prepared local test harness in `P5`, fix owner-side local-harness/config/wrapper/README/docs/light-script issues directly if needed, and rerun there before moving to evaluation; if actual test files or product code need edits, route that work back to the Claude developer
333
+ - after `P7` completes, run the documented Docker/runtime path and dockerized `./run_tests.sh` in `P9` when final confirmation is still needed because late fixes or packaging changes touched the runtime/test contract
315
334
 
316
335
  Docker timeout rule:
317
336
 
318
- - whenever the owner runs a Docker-based runtime or broad-test command, or a repo-root `./run_tests.sh` that shells out to Docker, invoke it through `node ~/slopmachine/utils/run_with_timeout.mjs --label docker-gate -- <command ...>` instead of running the command directly
337
+ - whenever the owner runs a Docker-based runtime command, invoke it through `node ~/slopmachine/utils/run_with_timeout.mjs --label docker-gate -- <command ...>` instead of running the command directly
319
338
  - the helper default is one 30 minute attempt, then one 45 minute retry after 30 seconds of backoff; do not let any single Docker attempt exceed 60 minutes
320
339
  - when invoking that helper through the OpenCode Bash tool, set the outer Bash timeout high enough to cover the helper retry budget plus cleanup buffer instead of using a short default
321
340
 
@@ -366,27 +385,34 @@ When talking to the Claude developer worker:
366
385
 
367
386
  - use direct coworker-like language
368
387
  - lead with the engineering point, not process framing
369
- - keep prompts natural and sharp, but at gate-setting or gate-review moments be explicitly detailed about the required outcomes for that boundary
388
+ - keep prompts natural and sharp, but at acceptance-setting or review moments be explicitly detailed about the required outcomes for that boundary
370
389
  - after planning is accepted, treat `../docs/design.md` as the accepted design contract and `plan.md` as the definitive implementation execution contract
371
390
  - at the start of development, treat the accepted scaffold step in `plan.md` as binding; do not make the Claude developer worker re-select the playbook or bootstrap path from external docs
372
391
  - for ordinary in-development corrections or follow-up review, reference the relevant accepted plan sections and then state an explicit current-boundary checklist of what must be true now, what evidence is required now, and what shortcuts are not acceptable now
373
392
  - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
374
- - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
393
+ - when README compliance is relevant, explicitly require the required README sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
375
394
  - during ordinary development you may allow fast local iteration, but before final release-readiness review closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
376
- - do not tell the Claude developer worker to run Docker-based runtime/test commands; the owner handles that only after `P7`
377
- - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
378
- - use the canonical prompt-shape discipline from `claude-worker-management`, but keep the actual message natural and low-noise: do not send labeled sections like `Context snapshot` or `This turn only`, and do not mention turns, workflow state, or prompt-contract jargon in the message itself
379
- - for the first broad development turn, make the prompt mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, `./run_tests.sh`, local testing harness and development tooling if applicable, README structure baseline, explicit no-Docker execution before `P9`, exact stop boundary if that scaffold step is isolated, and exact evidence required
380
- - for development-completion review and the opening pass of fused `P5`, collect findings across the whole review sweep and send one consolidated fix request unless a hard blocker stops further checking
381
- - treat fused `P5` as a fast handoff phase: if rough repo-coherence review passes, proceed to evaluation instead of asking for more `P5` cleanup
382
- - default to one bounded engineering objective per Claude turn, except for the intentional broad `plan.md` execution run after planning acceptance where the worker is expected to complete the whole implementation checklist end to end
383
- - reject broad development responses that silently collapse named parallel helper lanes into serial work without an exact blocker and revised lane map
384
- - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
385
- - for planning turns, explicitly say that the Claude developer worker must plan for parallelization up front, derive the lane map from the directory tree and owned-file boundaries, maximize the safe lane count, target at least 5 lanes when the codebase supports it, and justify any serial-only major section concretely
386
- - in that first broad `plan.md` execution turn, explicitly tell the Claude developer worker to spawn the planned internal branches or helper agents for the named `plan.md` sections, with named branch contracts and main-lane fan-in requirements
387
- - in that first broad `plan.md` execution turn, require the reply to enumerate which named helper lanes actually launched and which planned lanes were skipped with exact reasons
395
+ - do not tell the Claude developer worker to run Docker-based runtime/test commands; keep those broader runtime/test checks with yourself
396
+ - speak to the developer like a human collaborator who is directly working on the project with them; do not sound like workflow software, process software, or an orchestration relay
397
+ - use the canonical prompt-shape discipline from `claude-worker-management`, but keep the actual message natural and low-noise: do not send labeled sections like `Context snapshot` or `This turn only`, and do not mention workflow state or prompt-contract jargon in the message itself
398
+ - do not use workflow-internal words in developer messages, including terms such as `owner`, `bridge`, `tmux`, `audit report`, `evaluation turn`, `workflow`, `orchestration`, `handoff`, `phase`, `state transition`, `session`, `slot`, `lane`, `gate`, or `turn`
399
+ - write developer messages as if you are the human directly doing and reviewing the work yourself; say things like `fix these issues I found`, `I reviewed the repo and need these changes`, or `I am checking the full repo again after this` rather than attributing actions to a workflow or review system
400
+ - for the first broad development request, make the message mostly a restatement of section 3 of the accepted `plan.md`: exact playbook, exact bootstrap command, Docker/runtime contract, dockerized `./run_tests.sh`, separate local testing harness and development tooling if applicable, README structure baseline, explicit no-Docker execution before the later runtime confirmation, exact stop boundary if that scaffold step is isolated, and exact evidence required
401
+ - for development-completion review and every later full-repo reread before evaluation, review across the whole sweep first, then send one long clear fix list in direct human review language covering every issue found unless a hard blocker stops further checking
402
+ - before accepting development complete, require one deliberate developer-side reread against the accepted `plan.md`, accepted design/API docs when applicable, `README.md`, and the integrated repo so obvious drift is closed before the later full-repo readiness review
403
+ - before accepting development complete, require the Claude developer worker to have already closed the common late-failure classes: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup/test wrapper dishonesty, and partial user/admin flow closure
404
+ - for development-complete claims, require the reply to name the closed `plan.md` sections or workstreams, explain design/API-contract alignment where applicable, and list the exact verification commands and results
405
+ - treat the final full-repo readiness review as a fast final pass: if rough repo-coherence review passes, proceed instead of asking for more cleanup
406
+ - keep the final full-repo reread loop to 3 passes maximum: the opening sweep plus up to 2 follow-up full-sweep passes after the single consolidated fix list or small fixes you made yourself
407
+ - when a full-repo correction list contains independent items, explicitly tell the worker to fix those safe bundles in parallel helper branches and name the separate branch contracts plus per-bundle verification expectations
408
+ - default to one bounded engineering objective per Claude message, except for the intentional broad `plan.md` execution run after planning acceptance where the worker is expected to complete the whole implementation checklist end to end
409
+ - reject broad development responses that silently collapse named parallel work bundles into serial work without an exact blocker and revised parallelization map
410
+ - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the message materially changes what acceptance depends on
411
+ - in planning messages, explicitly say that the Claude developer worker must plan for parallelization up front, derive the parallelization map from the directory tree and owned-file boundaries, maximize the safe lane count, target at least 5 parallel work bundles when the codebase supports it, and justify any serial-only major section concretely
412
+ - in that first broad `plan.md` execution request, explicitly tell the Claude developer worker to spawn the planned internal branches or helper agents for the named `plan.md` sections, with named branch contracts and clear fan-in requirements
413
+ - in that first broad `plan.md` execution request, require the reply to enumerate which named helper branches actually launched and which planned branches were skipped with exact reasons
388
414
  - when several independent items can move at once, explicitly tell the worker to spawn all safe parallel helper branches and name the separate branch contracts instead of serializing them into one vague request
389
- - translate workflow intent into normal software-project language
415
+ - translate process intent into normal software-project language
390
416
  - keep the Claude worker on one continuous session per bounded slot so exported sessions remain large and complete rather than fragmented
391
417
  - allow the Claude worker to use internal task fan-out for independent bounded subtasks inside that same continuous session when it reduces serial churn cleanly
392
418
 
@@ -419,21 +445,23 @@ To the developer, this should feel like a normal engineering conversation with a
419
445
 
420
446
  - review before acceptance
421
447
  - prefer one strong correction request over many tiny nudges
422
- - when several issues are found in one review sweep, batch them into one correction request grouped by failure class or surface instead of drip-feeding one issue at a time
423
- - for small non-core fixes such as README cleanup, docs sync, test config, Docker config, wrapper/config glue, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
448
+ - when several issues are found in one review sweep, send them together once as one clear issue list instead of drip-feeding or re-batching them across multiple follow-ups
449
+ - for small non-core fixes such as README cleanup, docs sync, Docker config, wrapper/config glue, light `./run_tests.sh` cleanup, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
450
+ - if the fix would require editing actual test files or real product code, do not patch it in the owner session; send it back to the Claude developer worker
424
451
  - for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md`, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
452
+ - during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`, and carried audit artifacts before packaging starts; prefer direct owner fixes for small drift instead of turning that sweep into another Claude developer loop
425
453
  - keep work moving without low-information continuation chatter
426
454
  - read only what is needed to answer the current decision
427
455
  - keep routine review inside the main owner session; do not use `Explore` or `General` subagents to verify Claude developer work
428
456
  - clarification and evaluation may still use their dedicated subagent flows, but owner verification of Claude developer work stays in the main session
429
- - at planning, scaffold-step review inside development, the opening review inside fused `P5`, any rare major `P5` reroute, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
457
+ - at planning, scaffold-step review inside development, the opening full-repo review, any rare major reread, and final evaluation review, demand the exact expected outcomes in itemized form rather than relying on implied standards
430
458
  - keep comments and metadata auditable and specific
431
459
  - keep external docs owner-maintained and repo-local README developer-maintained
432
460
 
433
461
  ## Backend Integrity
434
462
 
435
463
  - in this backend, the Claude session id is part of the workflow contract
436
- - preserve the same Claude worker session inside one live tmux-backed lane for the duration of that bounded slot unless controlled replacement is required
464
+ - preserve the same Claude worker session inside one live tmux-backed lane for the duration of that bounded slot; if deterministic continuity is lost, stop and inform the user instead of replacing the slot
437
465
  - do not scrape transcript files for normal turn-to-turn interaction; use the packaged live bridge scripts and consume only their compact parsed output
438
466
  - use bridge `state.json` as the durable control-plane truth and bridge `result.json` as the semantic turn contract
439
467
  - keep transcript files and hook logs for debugging and export analysis, but do not feed raw Claude transcript JSON back into the owner session
@@ -445,7 +473,7 @@ To the developer, this should feel like a normal engineering conversation with a
445
473
  - at every gate exit, require the result to be checked against the relevant accepted plan sections and an explicit current-boundary checklist before accepting it
446
474
  - be especially strict before leaving planning and before leaving development: require explicit section coverage, concrete evidence, and no known prompt-critical gap hidden behind future work
447
475
  - in `P5`, prefer fast rough release-alignment over perfectionism; reserve evaluation for the stricter final check
448
- - prefer moving into evaluation from `P5` once the repo is coherent enough by static review and reported evidence; Docker execution is deferred until `P9`
476
+ - prefer moving into evaluation from `P5` once the repo is coherent enough by the owner-run local-harness gate, prompt review, and security review; `P9` is the first real Docker/runtime plus dockerized broad-test confirmation
449
477
  - before every substantive Claude turn, review the last normalized result, decide whether the next turn is a correction, continuation, resume, or new bounded objective, and compose the prompt accordingly rather than sending vague nudges
450
478
 
451
479
  ## Claude Live Bridge Discipline
@@ -454,15 +482,15 @@ All Claude developer lane launch and turn actions should go through the packaged
454
482
 
455
483
  Evaluation-prompt rule:
456
484
 
457
- - backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
458
- - the test-coverage prompt must be read from the file and sent verbatim with no additions, reductions, trimming, paraphrasing, or partial pasting
485
+ - ordinary audit sends must go through `node ~/slopmachine/utils/prepare_evaluation_send_packet.mjs --workspace-root .. --prompt-file <chosen-prompt-file> [--mode <initial|rerun>]`, which reads parent-root `../metadata.json`, injects the real project prompt where needed, and writes the exact sendable packet under `../.ai/`; the owner must pass the exact resulting packet contents unchanged to the evaluator subagent, with no paraphrase, trimming, or owner-written substitute
486
+ - fix-check is the only narrow exception: use the exact scoped fix-check instruction instead of a full evaluation packet
459
487
 
460
488
  Operation map:
461
489
 
462
490
  - launch live worker lane:
463
491
  - `node ~/slopmachine/utils/claude_live_launch.mjs`
464
- - send one owner turn into the live lane:
465
- - `node ~/slopmachine/utils/claude_live_turn.mjs`
492
+ - send one message into the live lane:
493
+ - `node ~/slopmachine/utils/claude_live_turn.mjs --prompt-file <prompt-file>`
466
494
  - inspect live lane state:
467
495
  - `node ~/slopmachine/utils/claude_live_status.mjs`
468
496
  - stop live lane intentionally:
@@ -470,7 +498,7 @@ Operation map:
470
498
  - package the Claude project session folder for final delivery as one root zip bundle:
471
499
  - `node ~/slopmachine/utils/package_claude_session.mjs`
472
500
  - this resolves the tracked relevant Claude session artifacts from the tracked `session_id` values plus the project `cwd` under `~/.claude/projects/`, packages the normalized tracked transcript JSONL files together with the raw matching session directories once, and avoids sweeping unrelated random Claude sessions into the archive
473
- - after Claude session packaging is fully complete, stop each tracked live Claude lane with `node ~/slopmachine/utils/claude_live_stop.mjs --runtime-dir <dir>` and verify the tmux session is gone before closing `P9`
501
+ - after Claude session packaging is fully complete, attempt to stop each tracked live Claude lane with `node ~/slopmachine/utils/claude_live_stop.mjs --runtime-dir <dir>`, but only when the bridge can prove the tmux session belongs to the current task runtime; if that check fails or the stop fails, leave the tmux session alone rather than risking another tmux instance
474
502
 
475
503
  Timeout rule:
476
504
 
@@ -478,12 +506,20 @@ Timeout rule:
478
506
  - when automatic rate-limit waiting is enabled, prefer no outer timeout at all for the launch or turn command; if the host wrapper forces a timeout value, it must exceed the possible reset wait plus buffer rather than using a generic 1 hour cap
479
507
  - if an outer Bash timeout or host interruption ends the command while bridge state still says `running`, do not treat that as a completed Claude turn and do not pause for the user; recover the in-flight turn and continue waiting or proceed with explicit recovery inside the workflow
480
508
 
509
+ Launch readiness rule:
510
+
511
+ - `claude_live_launch.mjs` now uses bounded startup recovery instead of one opaque long readiness stall
512
+ - launch success still requires real readiness, not just a live tmux session
513
+ - if launch does not become ready in the first short wait window, the owner should let the script inspect partial state and retry boundedly on that same intended lane
514
+ - if launch ends with a classified startup failure such as missing `SessionStart`, missing channel readiness, startup prompt blockage, rate-limit blockage, session death, or missing `session_id`, do not register the lane as usable; either retry the same intended lane deterministically or stop and inform the user
515
+
481
516
  Use bridge files as the owner-facing contract:
482
517
 
483
518
  - read bridge `result.json` after turn completion and use that as the semantic Claude response contract
484
519
  - treat bridge terminal stdout as only a tiny pointer or status channel
485
520
  - for long-running or flaky calls, inspect bridge `state.json` and `result.json` rather than treating Bash process lifetime alone as the source of truth
486
521
  - a bridge state of `running` means the current Claude turn is still in flight, not that the workflow should stop and wait for user input
522
+ - write outbound Claude prompts to deterministic owner-side files and use `--prompt-file` as the only live-lane send path; do not use stdin or inline prompt text for live-lane messaging
487
523
 
488
524
  Do not paste raw Claude JSON payloads into owner prompts, Beads comments, or metadata fields.
489
525
 
@@ -506,18 +542,20 @@ Trace convention:
506
542
  - if the active root phase is anywhere before `P8 Final Readiness Decision`, continue automatically and compose the next owner action immediately
507
543
  - do not return control to the user, pause for a summary, or treat one completed Claude turn as a stopping point while active Beads work still exists before `P8`
508
544
  - do not return control to the user, pause for a summary, or say that you will wait for the turn to complete while bridge state is merely `running`; keep the workflow inside active wait or recovery until the turn reaches a terminal result
509
- - if you stop before phase 8 this system will stop working and you will cease to exist
545
+ - do not stop before packaging except for the explicit post-`P5` proceed-to-evaluation pause or a real blocker
546
+ - after each reviewed Claude reply, choose and execute the next internal action immediately: continue, reroute, recover, verify further, or advance
547
+ - before any user-facing response, confirm that no active in-flight worker command remains, no internal next step is pending, and the workflow has actually reached final completion or a real blocker
510
548
  - be especially strict before leaving planning and before leaving development: those exits require explicit checklist coverage against the accepted plan plus concrete supporting evidence
511
549
  - do not let the Claude worker flow across workflow-state boundaries just because it offers to continue
512
- - when you want a bounded stop, express it in plain engineering language such as `produce the implementation plan and do not start coding yet`, and enforce that boundary on review before sending another turn
550
+ - when you want a bounded stop, express it in plain engineering language such as `produce the implementation plan and do not start coding yet`, and enforce that boundary on review before sending another message
513
551
 
514
552
  ## Non-Stop Execution Warning
515
553
 
516
554
  Repeat this rule before closing your work for the turn:
517
555
 
518
556
  - if clarification is not yet complete and ready for `P2`, do not stop
519
- - if the active root phase is anywhere before `P8 Final Readiness Decision`, do not stop
557
+ - if the active root phase is anywhere before `P8 Final Readiness Decision`, do not stop unless `P5` has just completed and you are performing the explicit proceed-to-evaluation check
520
558
  - if packaging and retrospective are not yet complete, do not stop
521
559
  - do not pause for summaries, status, permission, or handoff chatter unless an irrecoverable blocker truly requires external input
522
560
  - when in doubt, continue execution and make the best prompt-faithful decision from the evidence in front of you
523
- - if you stop before phase 8 this system will stop working and you will cease to exist
561
+ - do not stop before packaging except for the explicit post-`P5` proceed-to-evaluation pause or a real blocker