theslopmachine 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. package/README.md +1 -1
  2. package/RELEASE.md +2 -2
  3. package/assets/agents/developer.md +13 -13
  4. package/assets/agents/slopmachine-claude.md +7 -5
  5. package/assets/agents/slopmachine.md +6 -5
  6. package/assets/claude/agents/developer.md +6 -6
  7. package/assets/skills/clarification-gate/SKILL.md +9 -18
  8. package/assets/skills/claude-worker-management/SKILL.md +34 -22
  9. package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
  10. package/assets/skills/development-guidance/SKILL.md +3 -0
  11. package/assets/skills/evaluation-triage/SKILL.md +6 -4
  12. package/assets/skills/final-evaluation-orchestration/SKILL.md +16 -13
  13. package/assets/skills/hardening-gate/SKILL.md +3 -0
  14. package/assets/skills/integrated-verification/SKILL.md +2 -0
  15. package/assets/skills/planning-guidance/SKILL.md +1 -0
  16. package/assets/skills/submission-packaging/SKILL.md +6 -4
  17. package/assets/skills/verification-gates/SKILL.md +7 -2
  18. package/assets/slopmachine/test-coverage-prompt.md +561 -0
  19. package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
  20. package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
  21. package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
  22. package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
  23. package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
  24. package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
  25. package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
  26. package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
  27. package/package.json +1 -1
  28. package/src/constants.js +2 -2
  29. package/src/init.js +7 -1
  30. package/src/install.js +94 -21
package/README.md CHANGED
@@ -40,7 +40,7 @@ From this package directory:
  npm install
  npm run check
  npm pack
- npm install -g ./theslopmachine-0.6.2.tgz
+ npm install -g ./theslopmachine-0.7.2.tgz
  ```

  For local development instead:
package/RELEASE.md CHANGED
@@ -14,7 +14,7 @@ node ./bin/slopmachine.js --help
  SLOPMACHINE_HOME="$(pwd)/.tmp-home" SLOPMACHINE_NONINTERACTIVE=1 SLOPMACHINE_PLUGIN_BOOTSTRAP=0 node ./bin/slopmachine.js setup
  ```

- That setup path should install `opencode-ai@latest` when OpenCode is missing and refresh it to `@latest` when it already exists.
+ That setup path should install `opencode-ai` when OpenCode is missing and only refresh it when the detected version is below the minimum supported version.

  Users can later refresh to the newest published package with:
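The RELEASE.md change above replaces "always refresh to `@latest`" with a minimum-version check. A minimal sketch of such a check follows, assuming plain dotted `major.minor.patch` version strings. The function names `parseVersion` and `needsInstallOrRefresh` are hypothetical and are not taken from `package/src/install.js`; the diff does not show how the package actually detects or compares versions.

```javascript
// Hypothetical helpers for illustration only; not the package's install logic.
// Parses "1.2.3" (optionally prefixed with "v") into [1, 2, 3], or null.
function parseVersion(text) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(String(text).trim());
  if (!match) return null;
  return match.slice(1).map(Number);
}

// Returns true when the tool is missing (no parseable version detected)
// or when the detected version is strictly below the minimum.
function needsInstallOrRefresh(detected, minimum) {
  const want = parseVersion(minimum);
  if (!want) throw new Error(`invalid minimum version: ${minimum}`);
  const have = parseVersion(detected);
  if (!have) return true;
  for (let i = 0; i < 3; i += 1) {
    // Compare numerically so that 1.10.0 sorts above 1.9.0.
    if (have[i] !== want[i]) return have[i] < want[i];
  }
  return false; // equal versions need no refresh
}
```

Comparing each component numerically avoids the common string-comparison mistake of ranking `1.10.0` below `1.9.0`.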
 
@@ -105,7 +105,7 @@ And specifically verify that the tarball includes the current workflow assets:
  - `assets/slopmachine/utils/claude_live_turn.mjs`
  - `assets/slopmachine/utils/claude_live_status.mjs`
  - `assets/slopmachine/utils/claude_live_stop.mjs`
- - `test-coverage-prompt.md`
+ - `assets/slopmachine/test-coverage-prompt.md`

  ## Publish
 
package/assets/agents/developer.md CHANGED
@@ -57,23 +57,23 @@ Do not introduce convenience-based simplifications, `v1` reductions, future-phas

  - the original prompt explicitly allows it
  - the approved clarification explicitly allows it
- - the owner explicitly instructs it in the current session
+ - the project lead explicitly instructs it in the current session

  If a simplification would make implementation easier but is not explicitly authorized, keep the full prompt scope and plan the real complexity instead.

  When accepted planning artifacts already exist, treat them as the primary execution contract.

  - read the relevant accepted plan section before implementing the next slice
- - do not wait for the owner to restate what is already in the plan
- - treat owner follow-up prompts mainly as narrow deltas, guardrails, or correction signals
+ - do not wait for the project lead to restate what is already in the plan
+ - treat project-lead follow-up prompts mainly as narrow deltas, guardrails, or correction signals

- When the owner asks for planning without coding yet:
+ When the project lead asks for planning without coding yet:

  - produce an exhaustive, section-addressable implementation plan rather than a high-level summary
  - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
  - make unresolved items rare, narrow, and explicit
- - if the owner asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
- - when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+ - if the project lead asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
+ - when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat

  ## Execution Model
 
@@ -91,7 +91,7 @@ When the owner asks for planning without coding yet:
  - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
  - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
  - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
- - if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the owner will catch inconsistencies later
+ - if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the project lead will catch inconsistencies later
  - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
  - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
  - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -170,15 +170,15 @@ Before reporting work as ready, run this preflight yourself:
  - consistency: do code, docs, route contracts, security notes, and runtime/test commands agree?
  - flow completeness: are the user-facing and operator-facing flows touched by this work actually covered end to end?
  - security and permissions: are auth, RBAC, object-level checks, sensitive actions, and audit implications handled where relevant?
- - verification: did you run the strongest targeted checks that are appropriate without using owner-only broad gates?
- - reviewability: can the owner review this work by reading the changed files and a small number of directly related files?
- - test-coverage specificity: if the owner asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?
+ - verification: did you run the strongest targeted checks that are appropriate without using lead-only broad gates?
+ - reviewability: can the project lead review this work by reading the changed files and a small number of directly related files?
+ - test-coverage specificity: if the project lead asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?

  If any answer is no, fix it before replying or call out the blocker explicitly.

  When you make an assumption, keep it prompt-preserving by default. If an assumption would reduce scope, mark it as unresolved instead of silently locking it in.

- If the owner asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:
+ If the project lead asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:

  - one explicit row or subsection per requirement/risk cluster
  - planned test file or test layer named concretely
@@ -207,9 +207,9 @@ Default reply shape for ordinary slice completion, hardening, and fix responses:
  3. exact verification commands and results
  4. real unresolved issues only

- Keep the reply compact. Point to the exact changed files and the narrow supporting files the owner should read next.
+ Keep the reply compact. Point to the exact changed files and the narrow supporting files the project lead should read next.

- Use the larger reply shape only when the owner explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:
+ Use the larger reply shape only when the project lead explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:

  1. `Changed files` — exact files changed
  2. `What changed` — the concrete behavior/contract updates in those files
package/assets/agents/slopmachine-claude.md CHANGED
@@ -207,14 +207,15 @@ Maintain exactly one active developer session at a time.
  - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
  - from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
  - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
+ - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` for normal work, escalate to `opus` only when the planning/debugging/security difficulty genuinely justifies it, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
  - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
  - when `P7` begins, do not automatically switch away from `develop-N`
  - each fresh evaluation result decides the remediation lane:
- - `fail` -> route the issue list back to the latest `develop-N` Claude session
- - `partial pass` -> start the next `bugfix-N` Claude session tied to that audit report and keep its fix loop scoped to that audit's issue list
- - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+ - `fail` -> route the issue list back to the latest `develop-N` Claude session and discard the working audit report file after triage
+ - `partial pass` -> start the next `bugfix-N` Claude session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+ - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
  - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
- - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+ - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
  - track the active evaluator session separately in metadata during `P7`
  - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
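The updated `P7` routing rules in the hunk above can be read as a small decision table. The sketch below is one possible encoding, not code from the package: the function name `routeEvaluation`, the shape of the `state` argument, the returned field names, and the assumption that the next bugfix lane is numbered one past the count of completed bugfix sessions are all hypothetical.

```javascript
// Hypothetical encoding of the P7 routing rules quoted in the diff;
// names and return shape are illustrative assumptions.
function routeEvaluation(result, state) {
  // state: { developSession: string, completedBugfixSessions: number }
  switch (result) {
    case 'fail':
      // Issues go back to the latest develop-N lane; the working report is discarded after triage.
      return { lane: state.developSession, keepReport: false, rerunImmediately: false };
    case 'partial pass':
      // A new bugfix-N lane is opened and tied to the kept audit report.
      return {
        lane: `bugfix-${state.completedBugfixSessions + 1}`,
        keepReport: true,
        rerunImmediately: false,
      };
    case 'pass':
      // A clean audit does not count; discard the report and evaluate again.
      return { lane: null, keepReport: false, rerunImmediately: true };
    default:
      throw new Error(`unknown evaluation result: ${result}`);
  }
}
```

Only the `partial pass` branch keeps the report, which matches the 0.7.2 wording that ties each bugfix session to a kept audit report.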
 
@@ -236,7 +237,7 @@ When the first develop developer session begins in `P2`, start it in this exact
  2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
  3. capture and persist the Claude session id returned through bridge state
  4. form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
- 5. send a compact second owner message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
+ 5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
  6. continue with planning from there in that same Claude session

  Do not reorder that sequence.
@@ -346,6 +347,7 @@ When talking to the Claude developer worker:
  - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
  - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
  - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+ - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
  - use the canonical prompt-shape discipline from `claude-worker-management`: every substantive turn should make the current boundary, expected outcomes, required evidence, disallowed shortcuts, and stop boundary unmistakable
  - default to one bounded engineering objective per Claude turn; split cross-boundary work into separate turns instead of hoping Claude infers the boundary correctly
  - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
package/assets/agents/slopmachine.md CHANGED
@@ -199,11 +199,11 @@ Maintain exactly one active developer session at a time.
  - do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
  - when `P7` begins, do not automatically switch away from `develop-N`
  - each fresh evaluation result decides the remediation lane:
- - `fail` -> route the issue list back to the latest `develop-N` session
- - `partial pass` -> start the next `bugfix-N` session tied to that audit report and keep its fix loop scoped to that audit's issue list
- - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+ - `fail` -> route the issue list back to the latest `develop-N` session and discard the working audit report file after triage
+ - `partial pass` -> start the next `bugfix-N` session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+ - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
  - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
- - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+ - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
  - track the active evaluator session separately in metadata during `P7`

  ## Parallelism Policy
@@ -222,7 +222,7 @@ When the first develop developer session begins in `P2`, use this planning hands
  1. send the original prompt and tell the developer to read it carefully, not plan yet, and wait for clarifications and planning direction
  2. wait for the developer's first reply
  3. before the second message, form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
- 4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second owner message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
+ 4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second planning-direction message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
  5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
  6. continue with planning from there
 
@@ -338,6 +338,7 @@ When talking to the developer:
  - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
  - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
  - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+ - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
  - do not re-dump the entire plan, but do enumerate the exact subset of plan-backed outcomes that must now be delivered
  - when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
  - when 2 or 3 independent items can move at once, explicitly authorize parallel execution and name the separate branch contracts instead of serializing them into one vague request
package/assets/claude/agents/developer.md CHANGED
@@ -50,14 +50,14 @@ Do not narrow scope for convenience.
  - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
  - update `README.md` when behavior or run/test instructions change
  - do not touch workflow or rulebook files such as `CLAUDE.md` unless explicitly asked
- - when the owner says to plan without coding yet, produce planning artifacts and stop
+ - when the project lead says to plan without coding yet, produce planning artifacts and stop
  - when planning, produce an exhaustive, section-addressable implementation plan rather than a high-level summary
  - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
  - make unresolved items rare, narrow, and explicit
- - when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
- - planning-only deliverables inside the repo should be limited to `README.md` unless the owner explicitly asks for another in-repo artifact
- - when the owner says to finish the scaffold and not start feature implementation yet, stop before starting development work
- - do not continue into extra follow-on work that the owner did not ask for
+ - when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+ - planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+ - when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
+ - do not continue into extra follow-on work that the project lead did not ask for
  - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
  - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
  - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -121,7 +121,7 @@ Selected-stack defaults:
  - be direct and technically clear
  - report what changed, what was verified, and what still looks weak
  - if a problem needs a real fix, fix it instead of explaining around it
- - when the owner asks for a bounded deliverable, end with a concise summary of what was completed and what remains
+ - when the project lead asks for a bounded deliverable, end with a concise summary of what was completed and what remains
  - when you write or update files, end with:
  - `FILES_CHANGED:` followed by the exact repo-local file paths changed
  - `NEXT_STEP:` followed by the next concrete engineering step or remaining blocker when useful
package/assets/skills/clarification-gate/SKILL.md CHANGED
@@ -133,12 +133,12 @@ Its primary target is requirements ambiguity from the original prompt.

  Prefer questions about missing or unclear product behavior, actor expectations, workflow requirements, business rules, scope boundaries, output expectations, and other prompt-level ambiguities.

- Each entry should answer this structure:
+ Each entry should use this exact structure:

- 1. what was unclear from the original prompt
- 2. how you interpreted it
- 3. what decision or solution you chose for it
- 4. why that choice is prompt-faithful and reasonable
+ 1. a numbered clarification heading
+ 2. `Question:`
+ 3. `My Understanding:`
+ 4. `Solution:`

  Keep the file narrow and explicit.
 
@@ -156,19 +156,10 @@ Do not use `questions.md` for:
  Preferred entry shape:

  ```md
- ## Item N: <short ambiguity title>
-
- ### What was unclear
- <the exact ambiguity or missing detail>
-
- ### Interpretation
- <how it was interpreted>
-
- ### Decision
- <the chosen resolution or safe default>
-
- ### Why this is reasonable
- <brief justification tied to prompt faithfulness>
+ ### 1. <short clarification title>
+ - Question: <the exact ambiguity or missing detail>
+ - My Understanding: <how it was interpreted and why this needed to be locked>
+ - Solution: <the chosen resolution or safe default>
  ```

  If nothing material was unclear, still create `questions.md` and keep it minimal rather than inventing content.
package/assets/skills/claude-worker-management/SKILL.md CHANGED
@@ -20,7 +20,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
  - do not use the OpenCode `developer` subagent for implementation work in the `slopmachine-claude` path
  - do not read Claude transcript files as the normal communication channel
  - communicate with the Claude worker through the packaged live bridge scripts in `~/slopmachine/utils/`
- - use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each owner message into that lane
+ - use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each message into that lane
  - set the Claude live runtime settings default `agent` to `developer` so the lane stays on the intended system prompt even if the session is resumed or inspected through Claude-native controls
  - treat bridge `state.json` as the durable control-plane truth for lane status, routing, and Claude session identity
  - treat bridge `result.json` as the semantic source of truth after each completed turn
@@ -32,9 +32,9 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
  - launch the live lane with `--dangerously-skip-permissions` so the worker does not stall on routine file-edit permission prompts inside the bounded repo
  - when Claude uses internal task fan-out and the environment allows explicit agent selection, prefer the installed `developer` agent for implementation-capable branches so the same engineering standard applies across those branches
  - there is no repo-controlled guarantee that every Claude helper subagent globally reuses the `developer` prompt, so keep critical implementation in the main developer lane or in explicitly developer-scoped helper branches rather than relying on unspecified built-in helper behavior
- - make every owner-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
- - do not send vague owner prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
- - each substantive owner message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
+ - make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
+ - do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
+ - each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
  - default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns

  ## Lane launch rule
@@ -53,6 +53,15 @@ Preferred launch pattern:
  node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --runtime-dir <dir>
  ```

+ ## Model selection rule
+
+ - choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
+ - default to `--model sonnet` for ordinary planning, scaffold, development, and routine bugfix work
+ - escalate to `--model opus` only for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+ - keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
+ - when the task difficulty warrants it, also pass an explicit `--effort <level>` at launch time rather than hoping the default thinking level is ideal
+ - keep the chosen `model`, `effort`, and `subagent_model` recorded in bridge state so later recovery and review can see what launched the lane
+
  The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.

  When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
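The model selection rule added in this hunk amounts to building the launch argument list with explicit defaults. A minimal sketch follows, assuming the launcher accepts the `--model`, `--subagent-model`, and `--effort` flags exactly as the skill text names them. The helper name `buildLaunchArgs` and its options object are hypothetical; the diff does not show how `claude_live_launch.mjs` parses its arguments.

```javascript
// Hypothetical helper: assembles arguments for claude_live_launch.mjs.
// Flag names come from the skill text in the diff; the function itself is illustrative.
function buildLaunchArgs({
  cwd,
  lane,
  runtimeDir,
  model = 'sonnet',         // default for ordinary work, per the rule above
  subagentModel = 'sonnet', // helpers stay on sonnet unless raised deliberately
  effort,                   // optional; only passed when explicitly chosen
}) {
  if (!cwd || !lane || !runtimeDir) {
    throw new Error('cwd, lane, and runtimeDir are required');
  }
  const args = [
    '--cwd', cwd,
    '--lane', lane,
    '--runtime-dir', runtimeDir,
    '--model', model,
    '--subagent-model', subagentModel,
  ];
  if (effort) args.push('--effort', effort);
  return args;
}
```

Always emitting `--model` and `--subagent-model`, even at their defaults, keeps the choice explicit rather than relying on an implicit CLI default, which is the point of the rule.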
@@ -70,17 +79,18 @@ The default pattern is to let the live lane start normally and then persist the
70
79
  For all later turns in the same bounded developer slot:
71
80
 
72
81
  ```bash
73
- node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --prompt-file <file> --timeout-ms <turn-timeout>
82
+ printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
74
83
  ```
75
84
 
76
- - inject exactly one owner message at a time into the idle live lane
85
+ - inject exactly one message at a time into the idle live lane
86
+ - pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
77
87
  - wait for `Stop` or `StopFailure` before sending the next message
78
88
  - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
79
89
  - if turn execution fails, stop and recover explicitly instead of silently creating a new worker
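The one-message-at-a-time rule above can be modeled as a small gate. This is a minimal in-memory sketch, not the bridge implementation: the `TurnGate` class and its return labels are hypothetical, and only the `Stop` and `StopFailure` event names come from the text.

```javascript
// Hypothetical model of "inject exactly one message at a time".
class TurnGate {
  constructor() {
    this.inFlight = false; // true while a turn is awaiting Stop/StopFailure
    this.sent = [];
  }

  send(message) {
    if (this.inFlight) {
      throw new Error("previous turn has not reported Stop or StopFailure");
    }
    this.inFlight = true;
    this.sent.push(message);
  }

  // Called when a hook event arrives from the live lane.
  onHookEvent(name) {
    if (name === "Stop" || name === "StopFailure") {
      this.inFlight = false;
      // A failed turn is recovered explicitly, never replaced silently.
      return name === "Stop" ? "review-result" : "recover-explicitly";
    }
    return "ignore";
  }
}

const gate = new TurnGate();
gate.send("first bounded objective");
console.log(gate.onHookEvent("Stop"));
```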
80
90
 
81
91
  ## Turn-preflight checklist
82
92
 
83
- Before sending any owner message into the live lane:
93
+ Before sending any message into the live lane:
84
94
 
85
95
  1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
86
96
  2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
@@ -89,12 +99,12 @@ Before sending any owner message into the live lane:
89
99
  5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
90
100
 
91
101
  If the stop boundary is fuzzy, the turn is too broad.
92
- If the owner prompt would span multiple major boundaries, split it.
102
+ If the message would span multiple major boundaries, split it.
93
103
  Do not send the next turn until the prior turn has been reviewed and either accepted, corrected, or explicitly rerouted.
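A minimal sketch of the preflight idea, assuming `state.json` exposes `lane` and `status` fields and that the turn contract is held as an object with `produce`, `evidence`, and `stopAt` keys. All of those names are hypothetical; the diff only says the lane must be the intended one, must be `idle`, and that the turn needs a defined output, evidence, and stop point.

```javascript
// Hypothetical preflight check; field names are assumptions, not the bridge schema.
function preflightTurn(state, intendedLane, contract) {
  const problems = [];
  if (state.lane !== intendedLane) {
    problems.push(`lane is ${state.lane}, expected ${intendedLane}`);
  }
  if (state.status !== "idle") {
    problems.push(`lane is ${state.status}, not idle`);
  }
  // A fuzzy or missing stop boundary means the turn is too broad.
  for (const key of ["produce", "evidence", "stopAt"]) {
    const value = contract[key];
    if (!value || !String(value).trim()) {
      problems.push(`turn contract is missing "${key}"`);
    }
  }
  return problems;
}

const problems = preflightTurn(
  { lane: "develop", status: "idle" },
  "develop",
  { produce: "login slice", evidence: "changed files and test output", stopAt: "after this slice" },
);
console.log(problems.length === 0 ? "ready to send" : problems);
```

Returning a list of problems rather than a boolean lets the caller report every reason the turn should be held back at once.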
94
104
 
95
- ## Canonical owner-message contract
105
+ ## Canonical lead-message contract
96
106
 
97
- For substantive live-lane turns, write the owner message in natural engineering language but make sure it includes all of these ingredients:
107
+ For substantive live-lane turns, write the message in natural engineering language but make sure it includes all of these ingredients:
98
108
 
99
109
  - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
100
110
  - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
@@ -112,16 +122,18 @@ When the turn intentionally uses internal parallel fan-out, also include:
112
122
  - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
113
123
 
114
124
  Keep the wording natural. Do not turn every prompt into a rigid template dump.
125
+ The actual message should read like it came from a human project manager or technical lead who is invested in the project, not from workflow software.
126
+ Do not use obvious automation phrasing such as `owner`, `workflow`, `phase`, `session slot`, `contract anchor`, or `reply contract` in the message sent to Claude unless the user explicitly wants that style.
115
127
  But do make the contract mechanically obvious enough that Claude cannot plausibly misunderstand what acceptance depends on.
116
128
 
117
129
  ## Canonical prompt shapes
118
130
 
119
131
  ### Planning-start shape
120
132
 
121
- For the second owner message in the first `develop` lane and for other explicit planning-entry turns:
133
+ For the second planning-direction message in the first `develop` lane and for other explicit planning-entry turns:
122
134
 
123
135
  - inline the approved clarification content and requirements-ambiguity resolutions directly in the message
124
- - include the owner's initial planning view so Claude refines a direction instead of inventing one from zero
136
+ - include the initial planning view so Claude refines a direction instead of inventing one from zero
125
137
  - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
126
138
  - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
127
139
  - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
@@ -154,7 +166,7 @@ For ordinary implementation turns:
154
166
  - name the exact slice, user/admin actor path, modules, or surfaces to complete now
155
167
  - itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
156
168
  - require targeted local verification tied back to those expected outcomes
157
- - explicitly prohibit owner-only broad verification commands and unrelated follow-on work
169
+ - explicitly prohibit unrelated follow-on work and any broad verification commands that are reserved for later gate checks
158
170
  - when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
159
171
  - say to stop after this slice and report the exact changed files plus exact verification results
160
172
 
@@ -189,7 +201,7 @@ For evaluator-driven remediation inside a `bugfix-N` session opened by a `partia
189
201
 
190
202
  Do not do these:
191
203
 
192
- - send `continue`, `next`, or `keep going` as a substantive owner prompt
204
+ - send `continue`, `next`, or `keep going` as a substantive prompt
193
205
  - ask for planning and implementation in the same turn unless that mixed boundary is intentional and explicitly stated
194
206
  - ask for multiple gate exits in one turn
195
207
  - let Claude decide its own stopping point implicitly
@@ -252,17 +264,17 @@ When the first `develop` slot begins in planning:
252
264
  1. launch the live `develop` lane if it is not already running
253
265
  2. send the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction through the bridge
254
266
  3. store the Claude session id from bridge `state.json`
255
- 4. form an initial owner planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
256
- 5. send a compact second owner message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial owner planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
267
+ 4. form an initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
268
+ 5. send a compact second message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
257
269
  6. continue the planning conversation in that same Claude session
258
270
 
259
271
  Do not merge those two first messages.
260
272
  Do not ask for a plan in the first message.
261
273
 
262
- Preferred second owner message shape:
274
+ Preferred second planning-direction message shape:
263
275
 
264
- - inline the approved clarification content and the requirements-ambiguity resolutions directly in the owner message
265
- - include the owner's initial planning view so planning is refined collaboratively rather than invented from zero
276
+ - inline the approved clarification content and the requirements-ambiguity resolutions directly in the message
277
+ - include the initial planning view so planning is refined collaboratively rather than invented from zero
266
278
  - add any short delta notes that are not already captured in that inlined summary
267
279
  - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
268
280
  - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
@@ -270,7 +282,7 @@ Preferred second owner message shape:
270
282
  - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
271
283
 
272
284
  Do not tell the developer worker to read files outside `repo/`.
273
- If owner-side artifacts outside `repo/` matter, restate their content directly in the owner message instead of passing file paths.
285
+ If project-lead artifacts outside `repo/` matter, restate their content directly in the message instead of passing file paths.
274
286
  Do not mention session names, slot labels, or workflow phase labels to the developer worker.
275
287
 
276
288
  ### `bugfix-N` orientation handshake
@@ -278,7 +290,7 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
278
290
  When a fresh `partial pass` evaluation result opens the next remediation lane:
279
291
 
280
292
  1. launch a fresh live Claude developer lane for the next `bugfix-N` label
281
- 2. use the first owner message only to orient that session to the repo and the current delivered state
293
+ 2. use the first message only to orient that session to the repo and the current delivered state
282
294
  3. make clear in plain engineering language that follow-up work will be focused remediation against evaluator findings
283
295
  4. wait for the first response and store the Claude session id from bridge `state.json`
284
296
  5. only after that orientation exchange, continue the same `bugfix-N` live lane with the first evaluator-driven issue list
@@ -390,7 +402,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
390
402
  - if the bridge reports `blocked` because of `claude_usage_limit`, treat that as an automatic wait-and-resume path rather than a handoff-stop condition unless the wait or resume path itself fails
391
403
  - if the saved live lane cannot continue, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
392
404
  - if a replacement session is required, record the handoff clearly in metadata and tracker comments
393
- - keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal owner prompts unless debugging is explicitly needed
405
+ - keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal developer-facing prompts unless debugging is explicitly needed
394
406
 
395
407
  ## Rate-limit handling
396
408
 
@@ -140,7 +140,7 @@ Each `evaluation_runs[]` record should include enough to recover deterministic `
140
140
  - `audit_number`
141
141
  - `session_id`
142
142
  - `verdict`
143
- - `audit_report_path`
143
+ - `audit_report_path` when the report was kept, otherwise `null`
144
144
  - `route_target`
145
145
  - `routed_developer_session_id`
146
146
  - `routed_developer_label`
@@ -172,6 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
172
172
  - keep exactly one active developer session at a time
173
173
  - record every developer session in `developer_sessions`
174
174
  - from `P2` through `P6`, default to one long-lived `develop-1` lane
175
+ - default the launch model for that long-lived lane to `sonnet`; choose `opus` only when the current lane's work is genuinely high-difficulty enough to justify a more expensive launch
175
176
  - if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
176
177
  - keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
177
178
  - keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically
@@ -65,6 +65,7 @@ Use this skill during `P4 Development` before prompting the developer.
65
65
  - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
66
66
  - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
67
67
  - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
68
+ - before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
68
69
  - verify the module against its planned behavior before trying to move on
69
70
  - do not move on while the module is still obviously weak or half-finished
70
71
  - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
@@ -80,8 +81,10 @@ Use this skill during `P4 Development` before prompting the developer.
80
81
  - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
81
82
  - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
82
83
  - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
84
+ - local verification is for speed during development; any runtime and broad test commands documented in the README are the final contract and must pass at the later gate
83
85
  - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
84
86
  - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
87
+ - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
85
88
  - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
86
89
  - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
87
90
  - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
@@ -26,7 +26,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
26
26
  - treat the audit as a remediation trigger that routes back to develop
27
27
  - extract and hand off all issues to the latest `develop-N` developer session
28
28
  - fix them
29
- - keep the audit report at its normalized `../.tmp/audit_report-<N>.md` path
29
+ - do not keep the fail audit report in `../.tmp/` after triage; discard it once the issue bundle is extracted and recorded in metadata
30
30
  - do not open `bugfix-N` for this audit
31
31
  - run a fresh new evaluator session for the next audit
32
32
 
@@ -39,6 +39,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
39
39
  ### `pass`
40
40
 
41
41
  - record the audit as a discarded clean audit and do not hand off an issue list
42
+ - discard the pass audit report file instead of keeping it in `../.tmp/`
42
43
  - do not treat it as `P7` completion
43
44
  - immediately rerun a fresh evaluation until a `partial pass` opens the next scoped bugfix session
44
45
 
@@ -69,8 +70,9 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
69
70
 
70
71
  ## Exit standard
71
72
 
72
- - after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session until the report is clean
73
+ - after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session
73
74
  - keep the coverage/README report path fixed at `../.tmp/test_coverage_and_readme_audit_report.md` and replace the prior copy on each rerun instead of numbering it
74
- - do not move to `P8` until 2 bugfix sessions have been completed and the coverage/README audit report is clean
75
- - keep every fresh audit report under `../.tmp/audit_report-<N>.md`
75
+ - allow at most 3 remediation attempts for the coverage/README audit; after the third attempt, keep the latest report as the final carried-forward evidence
76
+ - do not move to `P8` until 2 bugfix sessions have been completed and the final coverage/README report exists from that last `P7` subphase
77
+ - keep only partial-pass audit reports under `../.tmp/audit_report-<N>.md`
76
78
  - for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`
@@ -40,10 +40,9 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
40
40
 
41
41
  - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
42
42
  - do not use the older cycle-directory report-root model
43
- - number every fresh evaluation audit sequentially across the whole run:
44
- - `../.tmp/audit_report-1.md`
45
- - `../.tmp/audit_report-2.md`
46
- - and so on
43
+ - number every fresh evaluation audit sequentially across the whole run for routing and metadata purposes
44
+ - persist `../.tmp/audit_report-<N>.md` only for `partial pass` audits that actually open bugfix sessions
45
+ - if a fresh audit is `fail` or `pass`, extract what you need from the generated working report, record the verdict and routing in metadata, and then discard the report file instead of leaving it in `../.tmp/`
47
46
  - for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
48
47
  - `../.tmp/audit_report-<N>-fix_check-1.md`
49
48
  - `../.tmp/audit_report-<N>-fix_check-2.md`
@@ -82,8 +81,10 @@ For each fresh audit:
82
81
  - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
83
82
  - send that fully composed text block directly to one fresh `General` evaluator session
84
83
  - require that session to produce a detailed file-backed audit report plus an issue summary
85
- - assign the next audit number and normalize the report path to `../.tmp/audit_report-<N>.md`
86
- - record the evaluator session id, prompt kind, audit number, verdict, report path, and routing decision in metadata
84
+ - assign the next audit number
85
+ - if and only if the verdict is `partial pass`, keep the normalized report path as `../.tmp/audit_report-<N>.md`
86
+ - if the verdict is `fail` or `pass`, discard the generated report file after extracting the issue summary or verdict you need
87
+ - record the evaluator session id, prompt kind, audit number, verdict, kept-or-discarded report status, and routing decision in metadata
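The keep-or-discard rule above can be sketched as follows. The helper names are hypothetical; the path patterns and the `audit_number`, `session_id`, `verdict`, `audit_report_path`, and `route_target` field names are taken from elsewhere in this diff, and the `route_target` value shown is an assumed example.

```javascript
// Only `partial pass` audits keep a report file; `fail` and `pass` record null.
function auditReportPath(auditNumber, verdict) {
  return verdict === "partial pass" ? `../.tmp/audit_report-${auditNumber}.md` : null;
}

// Fix-check reports share the audit number of the partial pass that opened them.
function fixCheckPath(auditNumber, fixCheckNumber) {
  return `../.tmp/audit_report-${auditNumber}-fix_check-${fixCheckNumber}.md`;
}

// Hypothetical record builder for one `evaluation_runs[]` entry.
function evaluationRunRecord({ auditNumber, sessionId, verdict, routeTarget }) {
  return {
    audit_number: auditNumber,
    session_id: sessionId,
    verdict,
    audit_report_path: auditReportPath(auditNumber, verdict),
    route_target: routeTarget,
  };
}

console.log(evaluationRunRecord({
  auditNumber: 2,
  sessionId: "eval-session-2", // placeholder id
  verdict: "fail",
  routeTarget: "develop-1",    // assumed routing label
}));
```

Audit numbers still advance for every fresh audit, so a kept `audit_report-<N>.md` can have gaps in its numbering where `fail` or `pass` audits were discarded.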
87
88
 
88
89
  ## Fresh-audit branching rule
89
90
 
@@ -91,11 +92,11 @@ After each fresh audit report is produced, branch by verdict:
91
92
 
92
93
  ### `fail`
93
94
 
94
- - record the audit as a `fail` under its `audit_report-<N>.md` path
95
+ - record the audit as a `fail` in metadata, but do not leave an `audit_report-<N>.md` file in `../.tmp/`
95
96
  - extract all reported issues and send them to the latest `develop-N` session
96
97
  - do not open `bugfix-N` for a `fail` audit
97
98
  - fix the issues in that develop session
98
- - after remediation, start a brand new evaluator session and run the next fresh audit as `audit_report-<N+1>.md`
99
+ - after remediation, start a brand new evaluator session and run the next fresh audit
99
100
 
100
101
  ### `partial pass`
101
102
 
@@ -106,7 +107,7 @@ After each fresh audit report is produced, branch by verdict:
106
107
 
107
108
  ### `pass`
108
109
 
109
- - record the audit as a discarded clean audit under its `audit_report-<N>.md` path
110
+ - record the audit as a discarded clean audit in metadata and do not leave an `audit_report-<N>.md` file in `../.tmp/`
110
111
  - do not open `bugfix-N`
111
112
  - do not count it toward `P7` completion
112
113
  - immediately start another fresh evaluator session and continue `P7` until a `partial pass` opens the next bugfix session
@@ -128,7 +129,7 @@ Inside a `partial pass` audit's bugfix loop:
128
129
 
129
130
  ## Post-bugfix coverage and README audit
130
131
 
131
- - after 2 bugfix sessions have been completed, do not leave `P7` yet
132
+ - after 2 bugfix sessions have been completed, do not leave `P7` yet; this audit is the last subphase inside `P7`
132
133
  - read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
133
134
  - launch a fresh `General` evaluator session for this audit
134
135
  - prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
@@ -137,8 +138,10 @@ Inside a `partial pass` audit's bugfix loop:
137
138
  - if the report finds any issue, treat that as blocking `P7` completion
138
139
  - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
139
140
  - require fixes plus concrete verification evidence from that developer session
141
+ - after the fixes land, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands before the next static coverage/README rerun and treat failures as unresolved issues
140
142
  - after the fixes land, run a fresh new coverage/README audit again and replace the old report
141
- - keep looping until `../.tmp/test_coverage_and_readme_audit_report.md` is clean and the report confirms the minimum 90 percent coverage threshold is satisfied
143
+ - allow at most 3 remediation attempts for this final coverage/README audit
144
+ - if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
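The capped retry loop described above can be sketched like this. The `runAudit` and `remediate` callbacks and the `clean`/`issues` report shape are hypothetical stand-ins for the evaluator session and the developer session; the README command checks that run between attempts are not modeled here.

```javascript
const MAX_REMEDIATION_ATTEMPTS = 3;

// Runs an audit, then remediates and re-audits until clean or the cap is hit.
function runCoverageAudit(runAudit, remediate) {
  let report = runAudit();
  let attempts = 0;
  while (!report.clean && attempts < MAX_REMEDIATION_ATTEMPTS) {
    remediate(report.issues);
    attempts += 1;
    report = runAudit(); // a fresh audit replaces the prior report
  }
  // Whatever the outcome, the latest report is the evidence carried forward.
  return { report, attempts, clean: report.clean };
}

// Simulated project with two outstanding gaps, one fixed per remediation.
let remaining = 2;
const result = runCoverageAudit(
  () => ({ clean: remaining === 0, issues: Array(remaining).fill("gap") }),
  () => { remaining -= 1; },
);
console.log(result.attempts, result.clean); // prints: 2 true
```

With the cap in place, the worst case is four audit runs and three remediation attempts before the latest report is carried forward as final evidence.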
142
145
 
143
146
  ## Scope rule
144
147
 
@@ -149,10 +152,10 @@ Inside a `partial pass` audit's bugfix loop:
149
152
 
150
153
  ## Exit target
151
154
 
152
- - `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit report is clean
155
+ - `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit has run as the last subphase of `P7`
153
156
  - the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
154
157
  - fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
155
- - after the second bugfix session completes, run the coverage/README audit; move to `P8 Final Human Decision` only after that audit passes cleanly
158
+ - after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Human Decision` with the clean report; otherwise move to `P8 Final Human Decision` after the third attempt with the latest report as the final evidence
156
159
 
157
160
  ## Boundaries
158
161
 
@@ -51,6 +51,7 @@ Hardening should treat these as the main review buckets before final evaluation
51
51
  - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
52
52
  - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
53
53
  - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
54
+ - for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
54
55
  - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
55
56
  - verify that missing failure handling is not being hidden behind fake-success behavior
56
57
  - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
@@ -58,6 +59,7 @@ Hardening should treat these as the main review buckets before final evaluation
58
59
  - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
59
60
  - enforce env-file discipline during hardening
60
61
  - run documentation verification against the real codebase and runtime behavior, not just document existence
62
+ - if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
61
63
  - audit README compliance against the strict post-bugfix README review shape:
62
64
  - project type near the top
63
65
  - startup instructions
@@ -67,6 +69,7 @@ Hardening should treat these as the main review buckets before final evaluation
67
69
  - architecture and workflow clarity
68
70
  - for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
69
71
  - verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
72
+ - before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
70
73
  - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
71
74
  - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
72
75
  - make sure the system is genuinely reviewable and reproducible