npm - theslopmachine - Versions diffs - 0.7.0 → 0.7.2 - Mend

theslopmachine 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.md +1 -1
package/RELEASE.md +2 -2
package/assets/agents/developer.md +13 -13
package/assets/agents/slopmachine-claude.md +7 -5
package/assets/agents/slopmachine.md +6 -5
package/assets/claude/agents/developer.md +6 -6
package/assets/skills/clarification-gate/SKILL.md +9 -18
package/assets/skills/claude-worker-management/SKILL.md +34 -22
package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
package/assets/skills/development-guidance/SKILL.md +3 -0
package/assets/skills/evaluation-triage/SKILL.md +6 -4
package/assets/skills/final-evaluation-orchestration/SKILL.md +16 -13
package/assets/skills/hardening-gate/SKILL.md +3 -0
package/assets/skills/integrated-verification/SKILL.md +2 -0
package/assets/skills/planning-guidance/SKILL.md +1 -0
package/assets/skills/submission-packaging/SKILL.md +6 -4
package/assets/skills/verification-gates/SKILL.md +7 -2
package/assets/slopmachine/test-coverage-prompt.md +561 -0
package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
package/package.json +1 -1
package/src/constants.js +2 -2
package/src/init.js +7 -1
package/src/install.js +94 -21

package/README.md CHANGED

@@ -40,7 +40,7 @@ From this package directory:
 npm install
 npm run check
 npm pack
-npm install -g ./theslopmachine-0.6.2.tgz
+npm install -g ./theslopmachine-0.7.2.tgz
 ```
 For local development instead:

package/RELEASE.md CHANGED

@@ -14,7 +14,7 @@ node ./bin/slopmachine.js --help
 SLOPMACHINE_HOME="$(pwd)/.tmp-home" SLOPMACHINE_NONINTERACTIVE=1 SLOPMACHINE_PLUGIN_BOOTSTRAP=0 node ./bin/slopmachine.js setup
 ```
-That setup path should install `opencode-ai@latest` when OpenCode is missing and refresh it to `@latest` when it already exists.
+That setup path should install `opencode-ai` when OpenCode is missing and only refresh it when the detected version is below the minimum supported version.
 Users can later refresh to the newest published package with:
@@ -105,7 +105,7 @@ And specifically verify that the tarball includes the current workflow assets:
 - `assets/slopmachine/utils/claude_live_turn.mjs`
 - `assets/slopmachine/utils/claude_live_status.mjs`
 - `assets/slopmachine/utils/claude_live_stop.mjs`
-- `test-coverage-prompt.md`
+- `assets/slopmachine/test-coverage-prompt.md`
 ## Publish
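The two RELEASE.md changes can be spot-checked with a small script. The sketch below is hypothetical. `MIN_OPENCODE_VERSION` is a placeholder because the real minimum lives somewhere in the package source and its value does not appear in this diff. The helper names are invented. `REQUIRED_ASSETS` holds only the four paths visible in this hunk, not the full RELEASE.md checklist. The sketch also assumes plain `MAJOR.MINOR.PATCH` version strings.

```javascript
// Placeholder only: the real minimum supported OpenCode version is not shown in this diff.
const MIN_OPENCODE_VERSION = "1.0.0";

// Only the asset paths visible in this RELEASE.md hunk; the full checklist is longer.
const REQUIRED_ASSETS = [
  "assets/slopmachine/utils/claude_live_turn.mjs",
  "assets/slopmachine/utils/claude_live_status.mjs",
  "assets/slopmachine/utils/claude_live_stop.mjs",
  "assets/slopmachine/test-coverage-prompt.md",
];

// Parse "MAJOR.MINOR.PATCH" (optional leading "v"); pre-release suffixes are ignored.
function parseVersion(text) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(String(text).trim());
  return match ? match.slice(1, 4).map(Number) : null;
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// 0.7.2 policy: install when missing, refresh only when below the minimum.
function opencodeAction(detected, minimum = MIN_OPENCODE_VERSION) {
  if (detected == null) return "install";
  const found = parseVersion(detected);
  // Assumption: an unparseable detected version is treated as too old.
  if (!found) return "refresh";
  return compareVersions(found, parseVersion(minimum)) < 0 ? "refresh" : "keep";
}

// Return the required paths that are absent from a packed file list.
function missingAssets(packedPaths, required = REQUIRED_ASSETS) {
  const present = new Set(packedPaths);
  return required.filter((path) => !present.has(path));
}
```

`npm pack --dry-run --json` prints a JSON array whose entries include a `files` list with a `path` per packed file. Passing those paths to `missingAssets` should therefore yield an empty array for a correctly built tarball.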

package/assets/agents/developer.md CHANGED

@@ -57,23 +57,23 @@ Do not introduce convenience-based simplifications, `v1` reductions, future-phas
 - the original prompt explicitly allows it
 - the approved clarification explicitly allows it
-- the owner explicitly instructs it in the current session
+- the project lead explicitly instructs it in the current session
 If a simplification would make implementation easier but is not explicitly authorized, keep the full prompt scope and plan the real complexity instead.
 When accepted planning artifacts already exist, treat them as the primary execution contract.
 - read the relevant accepted plan section before implementing the next slice
-- do not wait for the owner to restate what is already in the plan
-- treat owner follow-up prompts mainly as narrow deltas, guardrails, or correction signals
+- do not wait for the project lead to restate what is already in the plan
+- treat project-lead follow-up prompts mainly as narrow deltas, guardrails, or correction signals
-When the owner asks for planning without coding yet:
+When the project lead asks for planning without coding yet:
 - produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- if the owner asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
-- when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+- if the project lead asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
 ## Execution Model
@@ -91,7 +91,7 @@ When the owner asks for planning without coding yet:
 - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
 - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
-- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the owner will catch inconsistencies later
+- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the project lead will catch inconsistencies later
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -170,15 +170,15 @@ Before reporting work as ready, run this preflight yourself:
 - consistency: do code, docs, route contracts, security notes, and runtime/test commands agree?
 - flow completeness: are the user-facing and operator-facing flows touched by this work actually covered end to end?
 - security and permissions: are auth, RBAC, object-level checks, sensitive actions, and audit implications handled where relevant?
-- verification: did you run the strongest targeted checks that are appropriate without using owner-only broad gates?
-- reviewability: can the owner review this work by reading the changed files and a small number of directly related files?
-- test-coverage specificity: if the owner asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?
+- verification: did you run the strongest targeted checks that are appropriate without using lead-only broad gates?
+- reviewability: can the project lead review this work by reading the changed files and a small number of directly related files?
+- test-coverage specificity: if the project lead asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?
 If any answer is no, fix it before replying or call out the blocker explicitly.
 When you make an assumption, keep it prompt-preserving by default. If an assumption would reduce scope, mark it as unresolved instead of silently locking it in.
-If the owner asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:
+If the project lead asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:
 - one explicit row or subsection per requirement/risk cluster
 - planned test file or test layer named concretely
@@ -207,9 +207,9 @@ Default reply shape for ordinary slice completion, hardening, and fix responses:
 3. exact verification commands and results
 4. real unresolved issues only
-Keep the reply compact. Point to the exact changed files and the narrow supporting files the owner should read next.
+Keep the reply compact. Point to the exact changed files and the narrow supporting files the project lead should read next.
-Use the larger reply shape only when the owner explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:
+Use the larger reply shape only when the project lead explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:
 1. `Changed files` — exact files changed
 2. `What changed` — the concrete behavior/contract updates in those files

package/assets/agents/slopmachine-claude.md CHANGED

@@ -207,14 +207,15 @@ Maintain exactly one active developer session at a time.
 - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
 - from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
 - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
+- launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` for normal work, escalate to `opus` only when the planning/debugging/security difficulty genuinely justifies it, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
 - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
 - when `P7` begins, do not automatically switch away from `develop-N`
 - each fresh evaluation result decides the remediation lane:
-  - `fail` -> route the issue list back to the latest `develop-N` Claude session
-  - `partial pass` -> start the next `bugfix-N` Claude session tied to that audit report and keep its fix loop scoped to that audit's issue list
-  - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+  - `fail` -> route the issue list back to the latest `develop-N` Claude session and discard the working audit report file after triage
+  - `partial pass` -> start the next `bugfix-N` Claude session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+  - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
 - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
-- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
 - track the active evaluator session separately in metadata during `P7`
 - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
@@ -236,7 +237,7 @@ When the first develop developer session begins in `P2`, start it in this exact
 2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
 3. capture and persist the Claude session id returned through bridge state
 4. form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second owner message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
+5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there in that same Claude session
 Do not reorder that sequence.
@@ -346,6 +347,7 @@ When talking to the Claude developer worker:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - use the canonical prompt-shape discipline from `claude-worker-management`: every substantive turn should make the current boundary, expected outcomes, required evidence, disallowed shortcuts, and stop boundary unmistakable
 - default to one bounded engineering objective per Claude turn; split cross-boundary work into separate turns instead of hoping Claude infers the boundary correctly
 - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
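The new model rule in this file pairs with the model-selection rule added to `claude-worker-management`: use `sonnet` by default, escalate to `opus` only for genuinely hard planning, debugging, or security work, and keep helpers on `sonnet`. The hedged sketch below turns that rule into launch arguments. It assumes `claude_live_launch.mjs` accepts the `--model`, `--subagent-model`, and `--effort` flags that the worker-management rule names. The task shape, the helper names, and the two-failure threshold for "repeated stubborn failures" are all hypothetical.

```javascript
// Hypothetical threshold for "repeated stubborn failures"; the rule gives no number.
const STUBBORN_FAILURE_THRESHOLD = 2;

// task: { kind, genuinelyHard, securityCritical, failedAttempts } (hypothetical shape)
function chooseLaneModel(task) {
  const {
    kind,
    genuinelyHard = false,
    securityCritical = false,
    failedAttempts = 0,
  } = task;
  const escalate =
    (kind === "planning" && genuinelyHard) ||
    (kind === "debugging" && genuinelyHard) ||
    (kind === "hardening" && securityCritical) ||
    failedAttempts >= STUBBORN_FAILURE_THRESHOLD;
  return escalate ? "opus" : "sonnet";
}

// Build argv for claude_live_launch.mjs (flag support is assumed, not verified).
function buildLaunchArgs({ cwd, lane, runtimeDir, task, effort, subagentModel = "sonnet" }) {
  const args = [
    "--cwd", cwd,
    "--lane", lane,
    "--runtime-dir", runtimeDir,
    "--model", chooseLaneModel(task),
    "--subagent-model", subagentModel,
  ];
  // Pass --effort only when the caller chose a level deliberately.
  if (effort) args.push("--effort", effort);
  return args;
}
```

Keeping the escalation decision in `chooseLaneModel` separate from argv construction makes it easy to record the chosen model alongside `effort` and `subagent_model` in bridge state, as the worker-management rule asks.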

package/assets/agents/slopmachine.md CHANGED

@@ -199,11 +199,11 @@ Maintain exactly one active developer session at a time.
 - do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
 - when `P7` begins, do not automatically switch away from `develop-N`
 - each fresh evaluation result decides the remediation lane:
-  - `fail` -> route the issue list back to the latest `develop-N` session
-  - `partial pass` -> start the next `bugfix-N` session tied to that audit report and keep its fix loop scoped to that audit's issue list
-  - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+  - `fail` -> route the issue list back to the latest `develop-N` session and discard the working audit report file after triage
+  - `partial pass` -> start the next `bugfix-N` session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+  - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
 - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
-- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
 - track the active evaluator session separately in metadata during `P7`
 ## Parallelism Policy
@@ -222,7 +222,7 @@ When the first develop developer session begins in `P2`, use this planning hands
 1. send the original prompt and tell the developer to read it carefully, not plan yet, and wait for clarifications and planning direction
 2. wait for the developer's first reply
 3. before the second message, form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second owner message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
+4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second planning-direction message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
 5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there
@@ -338,6 +338,7 @@ When talking to the developer:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - do not re-dump the entire plan, but do enumerate the exact subset of plan-backed outcomes that must now be delivered
 - when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
 - when 2 or 3 independent items can move at once, explicitly authorize parallel execution and name the separate branch contracts instead of serializing them into one vague request
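The `P7` remediation routing changed in both orchestrator files is effectively a small state machine. The sketch below is one hypothetical reading of it. The verdict strings match the diff, but the function names, the `state` shape, and the report-keeping flag are invented. It also reads "rerun up to 3 times" as up to three reruns after the first coverage/README audit. That is an interpretation; the diff does not pin it down.

```javascript
const REQUIRED_BUGFIX_SESSIONS = 2;

// state: { developN, bugfixN } (hypothetical: latest develop index, bugfix sessions opened so far)
function routeEvaluation(verdict, state) {
  switch (verdict) {
    case "fail":
      // Issues go back to the latest develop lane; the working report is discarded after triage.
      return { action: "route-to-develop", session: `develop-${state.developN}`, keepReport: false };
    case "partial pass":
      // Opens the next bugfix lane, scoped to this kept report's issue list.
      return { action: "start-bugfix", session: `bugfix-${state.bugfixN + 1}`, keepReport: true };
    case "pass":
      // Non-counting clean audit: discard it and evaluate again.
      return { action: "rerun-evaluation", session: null, keepReport: false };
    default:
      throw new Error(`unknown verdict: ${verdict}`);
  }
}

function coverageAuditUnlocked(completedBugfixSessions) {
  return completedBugfixSessions >= REQUIRED_BUGFIX_SESSIONS;
}

// runAudit() -> { issues: string[] }; fixIssues(issues) routes fixes to the active developer lane.
function runCoverageAuditLoop(runAudit, fixIssues, maxReruns = 3) {
  let report = runAudit();
  let reruns = 0;
  while (report.issues.length > 0 && reruns < maxReruns) {
    fixIssues(report.issues);
    report = runAudit(); // the new report replaces the previous one
    reruns += 1;
  }
  // The latest report is carried forward even if issues remain.
  return { report, reruns };
}
```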

package/assets/claude/agents/developer.md CHANGED

@@ -50,14 +50,14 @@ Do not narrow scope for convenience.
 - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
 - update `README.md` when behavior or run/test instructions change
 - do not touch workflow or rulebook files such as `CLAUDE.md` unless explicitly asked
-- when the owner says to plan without coding yet, produce planning artifacts and stop
+- when the project lead says to plan without coding yet, produce planning artifacts and stop
 - when planning, produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
-- planning-only deliverables inside the repo should be limited to `README.md` unless the owner explicitly asks for another in-repo artifact
-- when the owner says to finish the scaffold and not start feature implementation yet, stop before starting development work
-- do not continue into extra follow-on work that the owner did not ask for
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+- planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+- when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
+- do not continue into extra follow-on work that the project lead did not ask for
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -121,7 +121,7 @@ Selected-stack defaults:
 - be direct and technically clear
 - report what changed, what was verified, and what still looks weak
 - if a problem needs a real fix, fix it instead of explaining around it
-- when the owner asks for a bounded deliverable, end with a concise summary of what was completed and what remains
+- when the project lead asks for a bounded deliverable, end with a concise summary of what was completed and what remains
 - when you write or update files, end with:
   - `FILES_CHANGED:` followed by the exact repo-local file paths changed
   - `NEXT_STEP:` followed by the next concrete engineering step or remaining blocker when useful
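The `FILES_CHANGED:` / `NEXT_STEP:` reply trailer gives the orchestrator something it can parse mechanically. The diff does not define how the paths are laid out. This hypothetical parser (the names are invented) therefore accepts either a comma-separated list on the `FILES_CHANGED:` line or `- path` bullet lines after it, and strips surrounding backticks.

```javascript
// Remove surrounding whitespace and Markdown backticks from a path token.
function cleanPath(token) {
  return token.trim().replace(/^`+|`+$/g, "");
}

function parseReplyTrailer(reply) {
  const files = [];
  let nextStep = null;
  let inFiles = false;
  for (const raw of reply.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("FILES_CHANGED:")) {
      inFiles = true;
      // Inline form: "FILES_CHANGED: a.js, b.js"
      for (const token of line.slice("FILES_CHANGED:".length).split(",")) {
        const path = cleanPath(token);
        if (path) files.push(path);
      }
      continue;
    }
    if (line.startsWith("NEXT_STEP:")) {
      inFiles = false;
      nextStep = line.slice("NEXT_STEP:".length).trim() || null;
      continue;
    }
    if (inFiles && line.startsWith("- ")) {
      files.push(cleanPath(line.slice(2)));
    } else if (inFiles && line !== "") {
      inFiles = false; // any other non-blank line ends the file list
    }
  }
  return { files, nextStep };
}
```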

package/assets/skills/clarification-gate/SKILL.md CHANGED

@@ -133,12 +133,12 @@ Its primary target is requirements ambiguity from the original prompt.
 Prefer questions about missing or unclear product behavior, actor expectations, workflow requirements, business rules, scope boundaries, output expectations, and other prompt-level ambiguities.
-Each entry should answer this structure:
+Each entry should use this exact structure:
-1. what was unclear from the original prompt
-2. how you interpreted it
-3. what decision or solution you chose for it
-4. why that choice is prompt-faithful and reasonable
+1. a numbered clarification heading
+2. `Question:`
+3. `My Understanding:`
+4. `Solution:`
 Keep the file narrow and explicit.
@@ -156,19 +156,10 @@ Do not use `questions.md` for:
 Preferred entry shape:
 ```md
-## Item N: <short ambiguity title>
-### What was unclear
-<the exact ambiguity or missing detail>
-### Interpretation
-<how it was interpreted>
-### Decision
-<the chosen resolution or safe default>
-### Why this is reasonable
-<brief justification tied to prompt faithfulness>
+### 1. <short clarification title>
+- Question: <the exact ambiguity or missing detail>
+- My Understanding: <how it was interpreted and why this needed to be locked>
+- Solution: <the chosen resolution or safe default>
 ```
 If nothing material was unclear, still create `questions.md` and keep it minimal rather than inventing content.
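The new `questions.md` entry shape is regular enough to generate. This is a minimal, hypothetical renderer; the function and field names are invented. It emits the `### N. title` heading followed by the `Question:`, `My Understanding:`, and `Solution:` bullets from the new template, numbering entries from 1.

```javascript
// item: { title, question, understanding, solution } (hypothetical field names)
function renderClarification(index, { title, question, understanding, solution }) {
  return [
    `### ${index}. ${title}`,
    `- Question: ${question}`,
    `- My Understanding: ${understanding}`,
    `- Solution: ${solution}`,
  ].join("\n");
}

// Entries are numbered from 1 in the order given and separated by a blank line.
function renderQuestionsFile(items) {
  return items.map((item, i) => renderClarification(i + 1, item)).join("\n\n") + "\n";
}
```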

package/assets/skills/claude-worker-management/SKILL.md CHANGED

@@ -20,7 +20,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - do not use the OpenCode `developer` subagent for implementation work in the `slopmachine-claude` path
 - do not read Claude transcript files as the normal communication channel
 - communicate with the Claude worker through the packaged live bridge scripts in `~/slopmachine/utils/`
-- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each owner message into that lane
+- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each message into that lane
 - set the Claude live runtime settings default `agent` to `developer` so the lane stays on the intended system prompt even if the session is resumed or inspected through Claude-native controls
 - treat bridge `state.json` as the durable control-plane truth for lane status, routing, and Claude session identity
 - treat bridge `result.json` as the semantic source of truth after each completed turn
@@ -32,9 +32,9 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - launch the live lane with `--dangerously-skip-permissions` so the worker does not stall on routine file-edit permission prompts inside the bounded repo
 - when Claude uses internal task fan-out and the environment allows explicit agent selection, prefer the installed `developer` agent for implementation-capable branches so the same engineering standard applies across those branches
 - there is no repo-controlled guarantee that every Claude helper subagent globally reuses the `developer` prompt, so keep critical implementation in the main developer lane or in explicitly developer-scoped helper branches rather than relying on unspecified built-in helper behavior
-- make every owner-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
-- do not send vague owner prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
-- each substantive owner message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
+- make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
+- do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
+- each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
 - default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns
 ## Lane launch rule
@@ -53,6 +53,15 @@ Preferred launch pattern:
 node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --runtime-dir <dir>
 ```
+## Model selection rule
+- choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
+- default to `--model sonnet` for ordinary planning, scaffold, development, and routine bugfix work
+- escalate to `--model opus` only for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+- keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
+- when the task difficulty warrants it, also pass an explicit `--effort <level>` at launch time rather than hoping the default thinking level is ideal
+- keep the chosen `model`, `effort`, and `subagent_model` recorded in bridge state so later recovery and review can see what launched the lane
 The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.
 When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
@@ -70,17 +79,18 @@ The default pattern is to let the live lane start normally and then persist the
 For all later turns in the same bounded developer slot:
 ```bash
-node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --prompt-file <file> --timeout-ms <turn-timeout>
+printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
 ```
-- inject exactly one owner message at a time into the idle live lane
+- inject exactly one message at a time into the idle live lane
+- pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
 - wait for `Stop` or `StopFailure` before sending the next message
 - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
 - if turn execution fails, stop and recover explicitly instead of silently creating a new worker
 ## Turn-preflight checklist
-Before sending any owner message into the live lane:
+Before sending any message into the live lane:
 1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
 2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
@@ -89,12 +99,12 @@ Before sending any owner message into the live lane:
 5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
 If the stop boundary is fuzzy, the turn is too broad.
-If the owner prompt would span multiple major boundaries, split it.
+If the message would span multiple major boundaries, split it.
 Do not send the next turn until the prior turn has been reviewed and either accepted, corrected, or explicitly rerouted.
-## Canonical owner-message contract
+## Canonical lead-message contract
-For substantive live-lane turns, write the owner message in natural engineering language but make sure it includes all of these ingredients:
+For substantive live-lane turns, write the message in natural engineering language but make sure it includes all of these ingredients:
 - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
 - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
@@ -112,16 +122,18 @@ When the turn intentionally uses internal parallel fan-out, also include:
 - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
 Keep the wording natural. Do not turn every prompt into a rigid template dump.
+The actual message should read like it came from a human project manager or technical lead who is invested in the project, not from workflow software.
+Do not use obvious automation phrasing such as `owner`, `workflow`, `phase`, `session slot`, `contract anchor`, or `reply contract` in the message sent to Claude unless the user explicitly wants that style.
 But do make the contract mechanically obvious enough that Claude cannot plausibly misunderstand what acceptance depends on.
 ## Canonical prompt shapes
 ### Planning-start shape
-For the second owner message in the first `develop` lane and for other explicit planning-entry turns:
+For the second planning-direction message in the first `develop` lane and for other explicit planning-entry turns:
 - inline the approved clarification content and requirements-ambiguity resolutions directly in the message
-- include the owner's initial planning view so Claude refines a direction instead of inventing one from zero
+- include the initial planning view so Claude refines a direction instead of inventing one from zero
 - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
 - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
 - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
@@ -154,7 +166,7 @@ For ordinary implementation turns:
 - name the exact slice, user/admin actor path, modules, or surfaces to complete now
 - itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
 - require targeted local verification tied back to those expected outcomes
-- explicitly prohibit owner-only broad verification commands and unrelated follow-on work
+- explicitly prohibit broad verification commands that are reserved for later gate checks and unrelated follow-on work
 - when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
 - say to stop after this slice and report the exact changed files plus exact verification results
@@ -189,7 +201,7 @@ For evaluator-driven remediation inside a `bugfix-N` session opened by a `partia
 Do not do these:
-- send `continue`, `next`, or `keep going` as a substantive owner prompt
+- send `continue`, `next`, or `keep going` as a substantive prompt
 - ask for planning and implementation in the same turn unless that mixed boundary is intentional and explicitly stated
 - ask for multiple gate exits in one turn
 - let Claude decide its own stopping point implicitly
@@ -252,17 +264,17 @@ When the first `develop` slot begins in planning:
 1. launch the live `develop` lane if it is not already running
 2. send the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction through the bridge
 3. store the Claude session id from bridge `state.json`
-4. form an initial owner planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second owner message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial owner planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
+4. form an initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
+5. send a compact second message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
 6. continue the planning conversation in that same Claude session
 Do not merge those two first messages.
 Do not ask for a plan in the first message.
-Preferred second owner message shape:
+Preferred second planning-direction message shape:
-- inline the approved clarification content and the requirements-ambiguity resolutions directly in the owner message
-- include the owner's initial planning view so planning is refined collaboratively rather than invented from zero
+- inline the approved clarification content and the requirements-ambiguity resolutions directly in the message
+- include the initial planning view so planning is refined collaboratively rather than invented from zero
 - add any short delta notes that are not already captured in that inlined summary
 - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
 - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
@@ -270,7 +282,7 @@ Preferred second owner message shape:
 - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
 Do not tell the developer worker to read files outside `repo/`.
-If owner-side artifacts outside `repo/` matter, restate their content directly in the owner message instead of passing file paths.
+If project-lead artifacts outside `repo/` matter, restate their content directly in the message instead of passing file paths.
 Do not mention session names, slot labels, or workflow phase labels to the developer worker.
 ### `bugfix-N` orientation handshake
@@ -278,7 +290,7 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 When a fresh `partial pass` evaluation result opens the next remediation lane:
 1. launch a fresh live Claude developer lane for the next `bugfix-N` label
-2. use the first owner message only to orient that session to the repo and the current delivered state
+2. use the first message only to orient that session to the repo and the current delivered state
 3. make clear in plain engineering language that follow-up work will be focused remediation against evaluator findings
 4. wait for the first response and store the Claude session id from bridge `state.json`
 5. only after that orientation exchange, continue the same `bugfix-N` live lane with the first evaluator-driven issue list
@@ -390,7 +402,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
 - if the bridge reports `blocked` because of `claude_usage_limit`, treat that as an automatic wait-and-resume path rather than a handoff-stop condition unless the wait or resume path itself fails
 - if the saved live lane cannot continue, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
 - if a replacement session is required, record the handoff clearly in metadata and tracker comments
-- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal owner prompts unless debugging is explicitly needed
+- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal developer-facing prompts unless debugging is explicitly needed
 ## Rate-limit handling

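The turn-preflight checklist and the new stdin input path for `claude_live_turn.mjs` can be combined into a small shell guard. This is a minimal sketch and not a helper shipped with the package. It makes two assumptions that the diff does not confirm. First, `state.json` sits directly in the runtime directory. Second, it stores the lane status as a top-level `"status"` string. The guard only pipes the prompt to the wrapper when that status is `idle`.

```shell
# Hypothetical preflight helper; not part of theslopmachine.
# ASSUMPTION: <runtime-dir>/state.json has a top-level "status" string, e.g. {"status": "idle"}.

# lane_status <state.json>: print the status value, or nothing if it is absent.
lane_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# send_turn <runtime-dir> <timeout-ms> <prompt>: refuse unless the lane is idle,
# then pass the prompt on stdin (the primary input path described in 0.7.2).
send_turn() {
  runtime_dir=$1
  timeout_ms=$2
  prompt=$3
  status=$(lane_status "$runtime_dir/state.json")
  if [ "$status" != "idle" ]; then
    echo "lane not idle (status: ${status:-unknown}); not sending" >&2
    return 1
  fi
  printf '%s' "$prompt" | node ~/slopmachine/utils/claude_live_turn.mjs \
    --runtime-dir "$runtime_dir" --timeout-ms "$timeout_ms"
}
```

Because the check refuses to send rather than queueing, a busy lane surfaces as an explicit failure. This matches the rule to stop and recover instead of silently starting a new worker.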
package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -140,7 +140,7 @@ Each `evaluation_runs[]` record should include enough to recover deterministic `
 - `audit_number`
 - `session_id`
 - `verdict`
-- `audit_report_path`
+- `audit_report_path` when the report was kept, otherwise `null`
 - `route_target`
 - `routed_developer_session_id`
 - `routed_developer_label`
@@ -172,6 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
 - keep exactly one active developer session at a time
 - record every developer session in `developer_sessions`
 - from `P2` through `P6`, default to one long-lived `develop-1` lane
+- default the launch model for that long-lived lane to `sonnet`; choose `opus` only when the current lane's work is genuinely high-difficulty enough to justify a more expensive launch
 - if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
 - keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
 - keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically

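In 0.7.2, `audit_report_path` becomes nullable. Taken together with the orchestration rule that only `partial pass` reports are persisted, a record writer could derive the path from the verdict. The sketch below emits only a subset of the `evaluation_runs[]` fields. The function name is hypothetical, and it assumes the values contain no characters that would need JSON escaping.

```shell
# Hypothetical record writer; emits a subset of evaluation_runs[] fields.
# ASSUMPTION: arguments contain no quotes or backslashes (no JSON escaping is done).

# evaluation_run_json <audit_number> <session_id> <verdict>
evaluation_run_json() {
  audit_number=$1
  session_id=$2
  verdict=$3
  # Only partial-pass audits keep a report file; fail and pass record null.
  if [ "$verdict" = "partial pass" ]; then
    path_json="\"../.tmp/audit_report-$audit_number.md\""
  else
    path_json=null
  fi
  printf '{"audit_number": %s, "session_id": "%s", "verdict": "%s", "audit_report_path": %s}\n' \
    "$audit_number" "$session_id" "$verdict" "$path_json"
}
```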
package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -65,6 +65,7 @@ Use this skill during `P4 Development` before prompting the developer.
 - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
 - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
 - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
+- before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
 - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
@@ -80,8 +81,10 @@ Use this skill during `P4 Development` before prompting the developer.
 - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
 - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
 - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
+- local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
 - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
 - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
+- for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
 - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
 - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
 - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically

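The new development-guidance bullets treat README-documented runtime and broad test commands as the final contract. The diff names two such commands elsewhere: `docker compose up --build` and `./run_tests.sh`. A minimal way to list which of them a README actually promises is sketched below. The helper is hypothetical, and it assumes that a literal substring match is good enough. It does not parse code fences or distinguish commands from prose mentions.

```shell
# Hypothetical helper: print each broad command that README.md mentions verbatim.
# ASSUMPTION: a fixed-string match is sufficient; prose mentions also count.
readme_contract_commands() {
  readme=$1
  for cmd in 'docker compose up --build' './run_tests.sh'; do
    if grep -qF -- "$cmd" "$readme"; then
      printf '%s\n' "$cmd"
    fi
  done
}
```

Any command this prints is one that the later gate is expected to run exactly as written. Local targeted tests used during slice work do not substitute for it.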
package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -26,7 +26,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 - treat the audit as a remediation trigger that routes back to develop
 - extract and hand off all issues to the latest `develop-N` developer session
 - fix them
-- keep the audit report at its normalized `../.tmp/audit_report-<N>.md` path
+- do not keep the fail audit report in `../.tmp/` after triage; discard it once the issue bundle is extracted and recorded in metadata
 - do not open `bugfix-N` for this audit
 - run a fresh new evaluator session for the next audit
@@ -39,6 +39,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 ### `pass`
 - record the audit as a discarded clean audit and do not hand off an issue list
+- discard the pass audit report file instead of keeping it in `../.tmp/`
 - do not treat it as `P7` completion
 - immediately rerun a fresh evaluation until a `partial pass` opens the next scoped bugfix session
@@ -69,8 +70,9 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 ## Exit standard
-- after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session until the report is clean
+- after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session
 - keep the coverage/README report path fixed at `../.tmp/test_coverage_and_readme_audit_report.md` and replace the prior copy on each rerun instead of numbering it
-- do not move to `P8` until 2 bugfix sessions have been completed and the coverage/README audit report is clean
-- keep every fresh audit report under `../.tmp/audit_report-<N>.md`
+- allow at most 3 remediation attempts for the coverage/README audit; after the third attempt, keep the latest report as the final carried-forward evidence
+- do not move to `P8` until 2 bugfix sessions have been completed and the final coverage/README report exists from that last `P7` subphase
+- keep only partial-pass audit reports under `../.tmp/audit_report-<N>.md`
 - for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`

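After these changes, the triage rules reduce to a per-verdict lookup: where the work routes and whether the report file survives in `../.tmp/`. The sketch below encodes that table. The route labels are hypothetical names chosen for illustration, not identifiers used by the package.

```shell
# Hypothetical lookup: print "<route> <report-handling>" for a fresh audit verdict.
triage_route() {
  case $1 in
    fail)           echo "latest-develop-session discard-report" ;;
    "partial pass") echo "open-bugfix-session keep-report" ;;
    pass)           echo "rerun-fresh-evaluation discard-report" ;;
    *)              echo "unknown verdict: $1" >&2; return 1 ;;
  esac
}
```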
package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -40,10 +40,9 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
 - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
 - do not use the older cycle-directory report-root model
-- number every fresh evaluation audit sequentially across the whole run:
-  - `../.tmp/audit_report-1.md`
-  - `../.tmp/audit_report-2.md`
-  - and so on
+- number every fresh evaluation audit sequentially across the whole run for routing and metadata purposes
+- persist `../.tmp/audit_report-<N>.md` only for `partial pass` audits that actually open bugfix sessions
+- if a fresh audit is `fail` or `pass`, extract what you need from the generated working report, record the verdict and routing in metadata, and then discard the report file instead of leaving it in `../.tmp/`
 - for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
   - `../.tmp/audit_report-<N>-fix_check-1.md`
   - `../.tmp/audit_report-<N>-fix_check-2.md`
@@ -82,8 +81,10 @@ For each fresh audit:
 - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
 - send that fully composed text block directly to one fresh `General` evaluator session
 - require that session to produce a detailed file-backed audit report plus an issue summary
-- assign the next audit number and normalize the report path to `../.tmp/audit_report-<N>.md`
-- record the evaluator session id, prompt kind, audit number, verdict, report path, and routing decision in metadata
+- assign the next audit number
+- if and only if the verdict is `partial pass`, keep the normalized report path as `../.tmp/audit_report-<N>.md`
+- if the verdict is `fail` or `pass`, discard the generated report file after extracting the issue summary or verdict you need
+- record the evaluator session id, prompt kind, audit number, verdict, kept-or-discarded report status, and routing decision in metadata
 ## Fresh-audit branching rule
@@ -91,11 +92,11 @@ After each fresh audit report is produced, branch by verdict:
 ### `fail`
-- record the audit as a `fail` under its `audit_report-<N>.md` path
+- record the audit as a `fail` in metadata, but do not leave an `audit_report-<N>.md` file in `../.tmp/`
 - extract all reported issues and send them to the latest `develop-N` session
 - do not open `bugfix-N` for a `fail` audit
 - fix the issues in that develop session
-- after remediation, start a brand new evaluator session and run the next fresh audit as `audit_report-<N+1>.md`
+- after remediation, start a brand new evaluator session and run the next fresh audit
 ### `partial pass`
@@ -106,7 +107,7 @@ After each fresh audit report is produced, branch by verdict:
 ### `pass`
-- record the audit as a discarded clean audit under its `audit_report-<N>.md` path
+- record the audit as a discarded clean audit in metadata and do not leave an `audit_report-<N>.md` file in `../.tmp/`
 - do not open `bugfix-N`
 - do not count it toward `P7` completion
 - immediately start another fresh evaluator session and continue `P7` until a `partial pass` opens the next bugfix session
@@ -128,7 +129,7 @@ Inside a `partial pass` audit's bugfix loop:
 ## Post-bugfix coverage and README audit
-- after 2 bugfix sessions have been completed, do not leave `P7` yet
+- after 2 bugfix sessions have been completed, do not leave `P7` yet; this audit is the last subphase inside `P7`
 - read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
 - launch a fresh `General` evaluator session for this audit
 - prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
@@ -137,8 +138,10 @@ Inside a `partial pass` audit's bugfix loop:
 - if the report finds any issue, treat that as blocking `P7` completion
 - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
 - require fixes plus concrete verification evidence from that developer session
+- after the fixes land, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands before the next static coverage/README rerun and treat failures as unresolved issues
 - after the fixes land, run a fresh new coverage/README audit again and replace the old report
-- keep looping until `../.tmp/test_coverage_and_readme_audit_report.md` is clean and the report confirms the minimum 90 percent coverage threshold is satisfied
+- allow at most 3 remediation attempts for this final coverage/README audit
+- if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
 ## Scope rule
@@ -149,10 +152,10 @@ Inside a `partial pass` audit's bugfix loop:
 ## Exit target
-- `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit report is clean
+- `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit has run as the last subphase of `P7`
 - the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
 - fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
-- after the second bugfix session completes, run the coverage/README audit; move to `P8 Final Human Decision` only after that audit passes cleanly
+- after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Human Decision` with a clean report, otherwise move to `P8 Final Human Decision` with the latest final report after the third attempt
 ## Boundaries

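The post-bugfix coverage/README audit is no longer an unbounded "loop until clean." It now runs one initial audit, followed by at most 3 remediate-then-rerun attempts, and then proceeds to `P8` either way. The control flow is sketched below. `run_audit` and `remediate` are hypothetical caller-supplied hooks. In practice, `remediate` would also run any README-documented `docker compose up --build` or `./run_tests.sh` commands before the next rerun.

```shell
# Hypothetical control-flow sketch of the capped coverage/README audit loop.
# Caller supplies:
#   run_audit -> exit 0 when the report is clean, non-zero otherwise
#   remediate -> route issues to the developer session and verify the fixes
coverage_readme_loop() {
  max_attempts=3
  attempt=0
  while ! run_audit; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      # Stop retrying; the latest report is carried forward as final evidence.
      echo "not clean after $max_attempts remediation attempts; carrying latest report forward"
      return 0
    fi
    attempt=$((attempt + 1))
    remediate "$attempt"
  done
  echo "clean after $attempt remediation attempt(s)"
}
```

In the worst case, this runs the audit 4 times (the initial run plus one rerun per remediation) and remediates 3 times.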
package/assets/skills/hardening-gate/SKILL.md CHANGED

@@ -51,6 +51,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
 - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
 - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
+- for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
 - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
 - verify that missing failure handling is not being hidden behind fake-success behavior
 - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
@@ -58,6 +59,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
 - enforce env-file discipline during hardening
 - run documentation verification against the real codebase and runtime behavior, not just document existence
+- if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
 - audit README compliance against the strict post-bugfix README review shape:
   - project type near the top
   - startup instructions
@@ -67,6 +69,7 @@ Hardening should treat these as the main review buckets before final evaluation
   - architecture and workflow clarity
 - for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
 - verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
+- before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
 - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
 - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
 - make sure the system is genuinely reviewable and reproducible
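The hardening README check for backend, fullstack, and web projects requires two literal strings: the canonical `docker compose up --build` and the legacy `docker-compose up`. The sketch below shows that check as a fixed-string test. The function name and the project-type argument are illustrative assumptions. Project types outside those three pass through unchecked.

```shell
# Hypothetical check: readme_compose_check <project_type> <README.md>
readme_compose_check() {
  case $1 in
    backend|fullstack|web)
      # Both the canonical command and the legacy compatibility string must appear.
      grep -qF -- 'docker compose up --build' "$2" &&
        grep -qF -- 'docker-compose up' "$2"
      ;;
    *)
      return 0
      ;;
  esac
}
```

Passing this string check does not prove the commands work. The hardening gate still requires the README-declared commands to actually pass.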