npm - theslopmachine - Versions diffs - 0.7.1 → 0.7.2 - Mend

theslopmachine 0.7.1 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/README.md +1 -1
package/assets/agents/developer.md +13 -13
package/assets/agents/slopmachine-claude.md +2 -1
package/assets/agents/slopmachine.md +2 -1
package/assets/claude/agents/developer.md +6 -6
package/assets/skills/clarification-gate/SKILL.md +9 -18
package/assets/skills/claude-worker-management/SKILL.md +23 -21
package/assets/skills/development-guidance/SKILL.md +3 -0
package/assets/skills/final-evaluation-orchestration/SKILL.md +1 -0
package/assets/skills/hardening-gate/SKILL.md +3 -0
package/assets/skills/integrated-verification/SKILL.md +2 -0
package/assets/skills/planning-guidance/SKILL.md +1 -0
package/assets/skills/submission-packaging/SKILL.md +2 -0
package/assets/skills/verification-gates/SKILL.md +5 -0
package/package.json +1 -1
package/src/init.js +7 -1

package/README.md CHANGED

@@ -40,7 +40,7 @@ From this package directory:
 npm install
 npm run check
 npm pack
-npm install -g ./theslopmachine-0.6.2.tgz
+npm install -g ./theslopmachine-0.7.2.tgz
 ```
 For local development instead:

package/assets/agents/developer.md CHANGED

@@ -57,23 +57,23 @@ Do not introduce convenience-based simplifications, `v1` reductions, future-phas
 - the original prompt explicitly allows it
 - the approved clarification explicitly allows it
-- the owner explicitly instructs it in the current session
+- the project lead explicitly instructs it in the current session
 If a simplification would make implementation easier but is not explicitly authorized, keep the full prompt scope and plan the real complexity instead.
 When accepted planning artifacts already exist, treat them as the primary execution contract.
 - read the relevant accepted plan section before implementing the next slice
-- do not wait for the owner to restate what is already in the plan
-- treat owner follow-up prompts mainly as narrow deltas, guardrails, or correction signals
+- do not wait for the project lead to restate what is already in the plan
+- treat project-lead follow-up prompts mainly as narrow deltas, guardrails, or correction signals
-When the owner asks for planning without coding yet:
+When the project lead asks for planning without coding yet:
 - produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- if the owner asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
-- when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+- if the project lead asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
 ## Execution Model
@@ -91,7 +91,7 @@ When the owner asks for planning without coding yet:
 - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
 - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
-- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the owner will catch inconsistencies later
+- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the project lead will catch inconsistencies later
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -170,15 +170,15 @@ Before reporting work as ready, run this preflight yourself:
 - consistency: do code, docs, route contracts, security notes, and runtime/test commands agree?
 - flow completeness: are the user-facing and operator-facing flows touched by this work actually covered end to end?
 - security and permissions: are auth, RBAC, object-level checks, sensitive actions, and audit implications handled where relevant?
-- verification: did you run the strongest targeted checks that are appropriate without using owner-only broad gates?
-- reviewability: can the owner review this work by reading the changed files and a small number of directly related files?
-- test-coverage specificity: if the owner asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?
+- verification: did you run the strongest targeted checks that are appropriate without using lead-only broad gates?
+- reviewability: can the project lead review this work by reading the changed files and a small number of directly related files?
+- test-coverage specificity: if the project lead asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?
 If any answer is no, fix it before replying or call out the blocker explicitly.
 When you make an assumption, keep it prompt-preserving by default. If an assumption would reduce scope, mark it as unresolved instead of silently locking it in.
-If the owner asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:
+If the project lead asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:
 - one explicit row or subsection per requirement/risk cluster
 - planned test file or test layer named concretely
@@ -207,9 +207,9 @@ Default reply shape for ordinary slice completion, hardening, and fix responses:
 3. exact verification commands and results
 4. real unresolved issues only
-Keep the reply compact. Point to the exact changed files and the narrow supporting files the owner should read next.
+Keep the reply compact. Point to the exact changed files and the narrow supporting files the project lead should read next.
-Use the larger reply shape only when the owner explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:
+Use the larger reply shape only when the project lead explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:
 1. `Changed files` — exact files changed
 2. `What changed` — the concrete behavior/contract updates in those files

package/assets/agents/slopmachine-claude.md CHANGED

@@ -237,7 +237,7 @@ When the first develop developer session begins in `P2`, start it in this exact
 2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
 3. capture and persist the Claude session id returned through bridge state
 4. form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second owner message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
+5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there in that same Claude session
 Do not reorder that sequence.
@@ -347,6 +347,7 @@ When talking to the Claude developer worker:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - use the canonical prompt-shape discipline from `claude-worker-management`: every substantive turn should make the current boundary, expected outcomes, required evidence, disallowed shortcuts, and stop boundary unmistakable
 - default to one bounded engineering objective per Claude turn; split cross-boundary work into separate turns instead of hoping Claude infers the boundary correctly
 - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on

package/assets/agents/slopmachine.md CHANGED

@@ -222,7 +222,7 @@ When the first develop developer session begins in `P2`, use this planning hands
 1. send the original prompt and tell the developer to read it carefully, not plan yet, and wait for clarifications and planning direction
 2. wait for the developer's first reply
 3. before the second message, form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second owner message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
+4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second planning-direction message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
 5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there
@@ -338,6 +338,7 @@ When talking to the developer:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - do not re-dump the entire plan, but do enumerate the exact subset of plan-backed outcomes that must now be delivered
 - when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
 - when 2 or 3 independent items can move at once, explicitly authorize parallel execution and name the separate branch contracts instead of serializing them into one vague request

package/assets/claude/agents/developer.md CHANGED

@@ -50,14 +50,14 @@ Do not narrow scope for convenience.
 - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
 - update `README.md` when behavior or run/test instructions change
 - do not touch workflow or rulebook files such as `CLAUDE.md` unless explicitly asked
-- when the owner says to plan without coding yet, produce planning artifacts and stop
+- when the project lead says to plan without coding yet, produce planning artifacts and stop
 - when planning, produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- when the owner asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
-- planning-only deliverables inside the repo should be limited to `README.md` unless the owner explicitly asks for another in-repo artifact
-- when the owner says to finish the scaffold and not start feature implementation yet, stop before starting development work
-- do not continue into extra follow-on work that the owner did not ask for
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+- planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+- when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
+- do not continue into extra follow-on work that the project lead did not ask for
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -121,7 +121,7 @@ Selected-stack defaults:
 - be direct and technically clear
 - report what changed, what was verified, and what still looks weak
 - if a problem needs a real fix, fix it instead of explaining around it
-- when the owner asks for a bounded deliverable, end with a concise summary of what was completed and what remains
+- when the project lead asks for a bounded deliverable, end with a concise summary of what was completed and what remains
 - when you write or update files, end with:
   - `FILES_CHANGED:` followed by the exact repo-local file paths changed
   - `NEXT_STEP:` followed by the next concrete engineering step or remaining blocker when useful

package/assets/skills/clarification-gate/SKILL.md CHANGED

@@ -133,12 +133,12 @@ Its primary target is requirements ambiguity from the original prompt.
 Prefer questions about missing or unclear product behavior, actor expectations, workflow requirements, business rules, scope boundaries, output expectations, and other prompt-level ambiguities.
-Each entry should answer this structure:
+Each entry should use this exact structure:
-1. what was unclear from the original prompt
-2. how you interpreted it
-3. what decision or solution you chose for it
-4. why that choice is prompt-faithful and reasonable
+1. a numbered clarification heading
+2. `Question:`
+3. `My Understanding:`
+4. `Solution:`
 Keep the file narrow and explicit.
@@ -156,19 +156,10 @@ Do not use `questions.md` for:
 Preferred entry shape:
 ```md
-## Item N: <short ambiguity title>
-### What was unclear
-<the exact ambiguity or missing detail>
-### Interpretation
-<how it was interpreted>
-### Decision
-<the chosen resolution or safe default>
-### Why this is reasonable
-<brief justification tied to prompt faithfulness>
+### 1. <short clarification title>
+- Question: <the exact ambiguity or missing detail>
+- My Understanding: <how it was interpreted and why this needed to be locked>
+- Solution: <the chosen resolution or safe default>
 ```
 If nothing material was unclear, still create `questions.md` and keep it minimal rather than inventing content.

package/assets/skills/claude-worker-management/SKILL.md CHANGED

@@ -20,7 +20,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - do not use the OpenCode `developer` subagent for implementation work in the `slopmachine-claude` path
 - do not read Claude transcript files as the normal communication channel
 - communicate with the Claude worker through the packaged live bridge scripts in `~/slopmachine/utils/`
-- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each owner message into that lane
+- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each message into that lane
 - set the Claude live runtime settings default `agent` to `developer` so the lane stays on the intended system prompt even if the session is resumed or inspected through Claude-native controls
 - treat bridge `state.json` as the durable control-plane truth for lane status, routing, and Claude session identity
 - treat bridge `result.json` as the semantic source of truth after each completed turn
@@ -32,9 +32,9 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - launch the live lane with `--dangerously-skip-permissions` so the worker does not stall on routine file-edit permission prompts inside the bounded repo
 - when Claude uses internal task fan-out and the environment allows explicit agent selection, prefer the installed `developer` agent for implementation-capable branches so the same engineering standard applies across those branches
 - there is no repo-controlled guarantee that every Claude helper subagent globally reuses the `developer` prompt, so keep critical implementation in the main developer lane or in explicitly developer-scoped helper branches rather than relying on unspecified built-in helper behavior
-- make every owner-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
-- do not send vague owner prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
-- each substantive owner message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
+- make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
+- do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
+- each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
 - default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns
 ## Lane launch rule
@@ -82,7 +82,7 @@ For all later turns in the same bounded developer slot:
 printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
 ```
-- inject exactly one owner message at a time into the idle live lane
+- inject exactly one message at a time into the idle live lane
 - pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
 - wait for `Stop` or `StopFailure` before sending the next message
 - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
@@ -90,7 +90,7 @@ printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-
 ## Turn-preflight checklist
-Before sending any owner message into the live lane:
+Before sending any message into the live lane:
 1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
 2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
@@ -99,12 +99,12 @@ Before sending any owner message into the live lane:
 5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
 If the stop boundary is fuzzy, the turn is too broad.
-If the owner prompt would span multiple major boundaries, split it.
+If the message would span multiple major boundaries, split it.
 Do not send the next turn until the prior turn has been reviewed and either accepted, corrected, or explicitly rerouted.
-## Canonical owner-message contract
+## Canonical lead-message contract
-For substantive live-lane turns, write the owner message in natural engineering language but make sure it includes all of these ingredients:
+For substantive live-lane turns, write the message in natural engineering language but make sure it includes all of these ingredients:
 - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
 - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
@@ -122,16 +122,18 @@ When the turn intentionally uses internal parallel fan-out, also include:
 - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
 Keep the wording natural. Do not turn every prompt into a rigid template dump.
+The actual message should read like it came from a human project manager or technical lead who is invested in the project, not from workflow software.
+Do not use obvious automation phrasing such as `owner`, `workflow`, `phase`, `session slot`, `contract anchor`, or `reply contract` in the message sent to Claude unless the user explicitly wants that style.
 But do make the contract mechanically obvious enough that Claude cannot plausibly misunderstand what acceptance depends on.
 ## Canonical prompt shapes
 ### Planning-start shape
-For the second owner message in the first `develop` lane and for other explicit planning-entry turns:
+For the second planning-direction message in the first `develop` lane and for other explicit planning-entry turns:
 - inline the approved clarification content and requirements-ambiguity resolutions directly in the message
-- include the owner's initial planning view so Claude refines a direction instead of inventing one from zero
+- include the initial planning view so Claude refines a direction instead of inventing one from zero
 - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
 - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
 - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
@@ -164,7 +166,7 @@ For ordinary implementation turns:
 - name the exact slice, user/admin actor path, modules, or surfaces to complete now
 - itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
 - require targeted local verification tied back to those expected outcomes
-- explicitly prohibit owner-only broad verification commands and unrelated follow-on work
+- explicitly prohibit broad verification commands that are reserved for later gate checks and unrelated follow-on work
 - when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
 - say to stop after this slice and report the exact changed files plus exact verification results
@@ -199,7 +201,7 @@ For evaluator-driven remediation inside a `bugfix-N` session opened by a `partia
 Do not do these:
-- send `continue`, `next`, or `keep going` as a substantive owner prompt
+- send `continue`, `next`, or `keep going` as a substantive prompt
 - ask for planning and implementation in the same turn unless that mixed boundary is intentional and explicitly stated
 - ask for multiple gate exits in one turn
 - let Claude decide its own stopping point implicitly
@@ -262,17 +264,17 @@ When the first `develop` slot begins in planning:
 1. launch the live `develop` lane if it is not already running
 2. send the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction through the bridge
 3. store the Claude session id from bridge `state.json`
-4. form an initial owner planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second owner message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial owner planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
+4. form an initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
+5. send a compact second message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
 6. continue the planning conversation in that same Claude session
 Do not merge those two first messages.
 Do not ask for a plan in the first message.
-Preferred second owner message shape:
+Preferred second planning-direction message shape:
-- inline the approved clarification content and the requirements-ambiguity resolutions directly in the owner message
-- include the owner's initial planning view so planning is refined collaboratively rather than invented from zero
+- inline the approved clarification content and the requirements-ambiguity resolutions directly in the message
+- include the initial planning view so planning is refined collaboratively rather than invented from zero
 - add any short delta notes that are not already captured in that inlined summary
 - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
 - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
@@ -280,7 +282,7 @@ Preferred second owner message shape:
 - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
 Do not tell the developer worker to read files outside `repo/`.
-If owner-side artifacts outside `repo/` matter, restate their content directly in the owner message instead of passing file paths.
+If project-lead artifacts outside `repo/` matter, restate their content directly in the message instead of passing file paths.
 Do not mention session names, slot labels, or workflow phase labels to the developer worker.
 ### `bugfix-N` orientation handshake
@@ -288,7 +290,7 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 When a fresh `partial pass` evaluation result opens the next remediation lane:
 1. launch a fresh live Claude developer lane for the next `bugfix-N` label
-2. use the first owner message only to orient that session to the repo and the current delivered state
+2. use the first message only to orient that session to the repo and the current delivered state
 3. make clear in plain engineering language that follow-up work will be focused remediation against evaluator findings
 4. wait for the first response and store the Claude session id from bridge `state.json`
 5. only after that orientation exchange, continue the same `bugfix-N` live lane with the first evaluator-driven issue list
@@ -400,7 +402,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
 - if the bridge reports `blocked` because of `claude_usage_limit`, treat that as an automatic wait-and-resume path rather than a handoff-stop condition unless the wait or resume path itself fails
 - if the saved live lane cannot continue, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
 - if a replacement session is required, record the handoff clearly in metadata and tracker comments
-- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal owner prompts unless debugging is explicitly needed
+- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal developer-facing prompts unless debugging is explicitly needed
 ## Rate-limit handling

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -65,6 +65,7 @@ Use this skill during `P4 Development` before prompting the developer.
 - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
 - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
 - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
+- before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
 - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
@@ -80,8 +81,10 @@ Use this skill during `P4 Development` before prompting the developer.
 - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
 - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
 - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
+- local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
 - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
 - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
+- for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
 - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
 - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
 - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -138,6 +138,7 @@ Inside a `partial pass` audit's bugfix loop:
 - if the report finds any issue, treat that as blocking `P7` completion
 - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
 - require fixes plus concrete verification evidence from that developer session
+- after the fixes land, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands before the next static coverage/README rerun and treat failures as unresolved issues
 - after the fixes land, run a fresh new coverage/README audit again and replace the old report
 - allow at most 3 remediation attempts for this final coverage/README audit
 - if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
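
The capped remediation loop described in this hunk can be sketched roughly as follows. This is an illustration only, not code from the package: `runFinalAuditLoop` and its callbacks `runAudit`, `remediate`, and `runContractCommands` are hypothetical names, and the callbacks are assumed to be synchronous and to return plain arrays of issue strings.

```javascript
// Hypothetical sketch of the final coverage/README audit loop.
// None of these names exist in theslopmachine; all callbacks are supplied by the caller.
function runFinalAuditLoop({ runAudit, remediate, runContractCommands, maxAttempts = 3 }) {
  // Initial audit: a report is assumed to look like { issues: string[] }.
  let report = runAudit();
  let attempts = 0;

  while (report.issues.length > 0 && attempts < maxAttempts) {
    attempts += 1;
    // Route the findings to the developer session and wait for fixes.
    remediate(report.issues);
    // Run the exact README-documented commands before the rerun;
    // any failure is carried forward as an unresolved issue.
    const contractFailures = runContractCommands();
    // A fresh audit replaces the old report.
    const fresh = runAudit();
    report = { issues: [...contractFailures, ...fresh.issues] };
  }

  // After the last allowed attempt the latest report is kept as final evidence.
  return { report, attempts, clean: report.issues.length === 0 };
}
```

With this shape, a report that never becomes clean stops after the third remediation attempt and the caller still receives the latest report, which matches the "carry the latest report forward" rule above.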

package/assets/skills/hardening-gate/SKILL.md CHANGED

@@ -51,6 +51,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
 - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
 - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
+- for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
 - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
 - verify that missing failure handling is not being hidden behind fake-success behavior
 - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
@@ -58,6 +59,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
 - enforce env-file discipline during hardening
 - run documentation verification against the real codebase and runtime behavior, not just document existence
+- if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
 - audit README compliance against the strict post-bugfix README review shape:
   - project type near the top
   - startup instructions
@@ -67,6 +69,7 @@ Hardening should treat these as the main review buckets before final evaluation
   - architecture and workflow clarity
 - for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
 - verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
+- before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
 - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
 - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
 - make sure the system is genuinely reviewable and reproducible
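
One way to picture the README contract check added in this hunk is a small helper that reads the README text and reports which of the two documented commands must be run verbatim. This is a minimal sketch under assumptions: `readmeContract` and the constant names are hypothetical, and plain substring matching is a simplification of whatever review the workflow actually performs.

```javascript
// Hypothetical helper; the two command strings and the legacy string are quoted
// from the skill text, everything else is an illustrative assumption.
const CONTRACT_COMMANDS = ['docker compose up --build', './run_tests.sh'];
const LEGACY_COMPAT_STRING = 'docker-compose up';

function readmeContract(readmeText) {
  return {
    // Commands the README promises, which the gate must then run exactly as written.
    requiredCommands: CONTRACT_COMMANDS.filter((cmd) => readmeText.includes(cmd)),
    // The strict README audit also looks for this exact legacy string.
    hasLegacyString: readmeText.includes(LEGACY_COMPAT_STRING),
  };
}
```

If `requiredCommands` comes back empty, the new bullets do not apply; otherwise each listed command has to pass or the phase is failed explicitly.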

package/assets/skills/integrated-verification/SKILL.md CHANGED

@@ -33,6 +33,7 @@ Once a failure class is known:
 - for applicable UI-bearing work, this owner-run phase may use the selected stack's platform-appropriate UI/E2E tool for the affected flows, capture screenshots or equivalent artifacts, and verify the UI behavior and quality directly
 - verify requirement closure, not just feature existence
 - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
+- verify the delivered runtime and broad-test behavior against `README.md`; if the README says a command is how the project should be run or verified, treat that command as part of the real external contract
 - verify end-to-end flow behavior where the change affects real workflows
 - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
 - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
@@ -51,6 +52,7 @@ Once a failure class is known:
 - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
 - when integrated verification repeatedly finds the same avoidable failure class, treat that as evidence that earlier slice execution or slice-close acceptance must become more system-aware in future runs
 - before closing the phase, verify the delivered startup path is genuinely runnable, the documented tests really execute, frontend behavior is usable when applicable, UI quality is acceptable, core running logic is complete, and Docker startup works when Docker is the runtime contract
+- before closing the phase, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands here as part of the final integrated proof for the phase
 - tighten parent-root `../docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
 - when security-bearing behavior changes, tighten parent-root `../docs/design.md` and `../docs/api-spec.md` as needed so enforcement points and mapped tests stay accurate
 - when frontend-bearing behavior changes, tighten `README.md` plus parent-root `../docs/design.md` as needed so key pages, interactions, and required UI states stay accurate

package/assets/skills/planning-guidance/SKILL.md CHANGED

@@ -210,6 +210,7 @@ Selected-stack defaults:
 - for backend or fullstack projects, explicitly plan coverage for 401, 403, 404, conflicts or duplicate submission when relevant, object-level authorization, tenant or user isolation, sensitive-log exposure, and pagination/filter/sort when those behaviors exist
 - for frontend-bearing projects, explicitly plan a layered frontend test story when UI state or routing is material: unit, component, page or route integration, and E2E where applicable
 - for non-trivial frontend projects, explicitly plan a frontend test layer beyond runtime-only confidence: component, page, route, or state-focused tests when UI state complexity is meaningful
+- for `fullstack` and `web` projects, explicitly plan real frontend unit tests and make it possible for later audit output to state `Frontend unit tests: PRESENT` with direct file-level evidence rather than inference
 - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable, but treat Playwright as a real verified dependency rather than a decorative default
 - for mobile work, plan Jest plus React Native Testing Library as the local default test layer and add a platform-appropriate mobile UI/E2E tool when real device-flow proof is needed
 - for desktop work, plan a local desktop test runner plus Playwright Electron support or another platform-appropriate desktop UI/E2E tool when real window-flow proof is needed
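
The "direct file-level evidence rather than inference" requirement added here could look something like the sketch below. It assumes, purely for illustration, that unit tests follow the common `*.test.*` / `*.spec.*` naming for JS/TS files; the package does not define this pattern, and `frontendUnitTestVerdict` is a hypothetical name. A real audit would also need to separate frontend tests from backend and E2E tests.

```javascript
// Assumption: unit tests use the common *.test.* / *.spec.* file naming.
// This pattern is illustrative and is not taken from theslopmachine.
const UNIT_TEST_PATTERN = /\.(test|spec)\.(js|jsx|ts|tsx)$/;

function frontendUnitTestVerdict(filePaths) {
  // Only real test files count as evidence; manifests and tool configs do not match.
  const evidence = filePaths.filter((path) => UNIT_TEST_PATTERN.test(path));
  const verdict = evidence.length > 0 ? 'PRESENT' : 'MISSING';
  return { line: `Frontend unit tests: ${verdict}`, evidence };
}
```

Because the verdict is derived from matching file paths only, a repo that ships a test runner config but no test files is reported as `MISSING`, which is the case the new bullet is guarding against.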

package/assets/skills/submission-packaging/SKILL.md CHANGED

@@ -64,6 +64,7 @@ No screenshots are required as packaging artifacts.
 - ensure `README.md` matches the delivered codebase, functionality, runtime steps, test steps, main repo contents, and important new-developer information, and stays friendly to a junior developer
 - ensure `README.md` also describes the delivered architecture at an implementation-review level rather than only listing commands
 - ensure `README.md` remains the primary in-repo documentation surface
+- treat `README.md` as the final public output format for runtime and broad test expectations: the packaged repo must comply exactly with the commands and constraints it documents
 - verify no repo-local file depends on parent-root docs or sibling workflow artifacts for startup, build/preview, configuration, static review, or basic project understanding
 - if the project uses mock, stub, fake, interception, or local-data behavior, ensure `README.md` discloses that scope accurately and does not imply undisclosed real integration
 - if mock or interception behavior is enabled by default, ensure `README.md` says so clearly
@@ -141,6 +142,7 @@ After those steps:
 - do one final package review before declaring packaging complete
 - confirm the package is coherent as a delivered project, not just a working repo snapshot
 - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, make sure the final package review uses those exact commands rather than a substitute path
 - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
 - if packaging reveals a real defect or missing artifact, fix it before closing the phase
 - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied

package/assets/skills/verification-gates/SKILL.md CHANGED

@@ -26,6 +26,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
 - do not require the README to carry a full API catalog
 - require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+- treat the README as the final public contract for runtime and broad-test behavior: if it documents a runtime command or a broad test command, the delivered output must satisfy that exact contract
 - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
 - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
 - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
@@ -188,11 +189,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
 - when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
 - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
+- integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; when `README.md` documents `docker compose up --build` and/or `./run_tests.sh`, those exact commands are expected here as part of the final external-contract proof
 - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
 - before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
 - before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
 - integrated verification completion requires explicit full-system evidence before the phase can close
 - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- before leaving development, hardening, or packaging, if `README.md` documents a containerized final runtime or broad test command, require those exact commands to be run at the appropriate final gate and verify that the README still matches the real output
 - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
 - mobile and desktop integrated verification must include the selected stack's platform-appropriate UI/E2E coverage for every major user flow when UI-bearing flows are material
 - for Electron or other Linux-targetable desktop projects, integrated verification should use the Dockerized desktop build/test path plus headless UI/runtime verification artifacts
@@ -207,9 +210,11 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
 - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
 - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
+- before `P7`, for `fullstack` and `web` projects, require an explicit frontend unit-test verdict backed by direct file-level evidence; if frontend unit tests are missing or insufficient, treat that as a critical gap
 - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
 - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
 - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
+- before leaving `P7`, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered external contract, run those exact commands on the final state and require them to pass before moving to `P8`
 - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
 - before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
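
The audit-numbered report naming mentioned in this hunk (`audit_report-<N>.md` and `audit_report-<N>-fix_check-<M>.md`) can be illustrated with a tiny helper. `auditReportName` is a hypothetical function written for this note, not something the package exports; only the two filename shapes come from the skill text.

```javascript
// Hypothetical helper that builds the report filenames described in the skill text.
function auditReportName(auditNumber, fixCheckNumber) {
  const base = `audit_report-${auditNumber}`;
  // Without a fix-check number this is the persisted partial-pass audit report;
  // with one, it is the fix check for a scoped bugfix session.
  return fixCheckNumber === undefined
    ? `${base}.md`
    : `${base}-fix_check-${fixCheckNumber}.md`;
}
```

For example, `auditReportName(2)` and `auditReportName(2, 1)` produce the names for the second audit and its first fix check; both files would live under `../.tmp/` according to the text above.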

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "theslopmachine",
-  "version": "0.7.1",
+  "version": "0.7.2",
   "description": "SlopMachine installer and project bootstrap CLI",
   "license": "MIT",
   "type": "module",

package/src/init.js CHANGED

@@ -288,7 +288,13 @@ async function createInitialPhaseArtifacts(targetPath, options) {
     `## Bootstrap Status\n\n` +
     `- Workspace initialized by slopmachine.\n` +
     `${options.adoptExisting ? '- Existing project adoption mode is active.\n' : ''}` +
-    `${options.requestedStartPhase ? `- Requested start phase: ${options.requestedStartPhase}.\n` : ''}`
+    `${options.requestedStartPhase ? `- Requested start phase: ${options.requestedStartPhase}.\n` : ''}` +
+    `\n## Entry Template\n\n` +
+    `Copy this exact structure for each clarification item:\n\n` +
+    `### 1. Clarification Defaults for Planning\n` +
+    `- Question: Can the drafted clarification defaults be used for planning?\n` +
+    `- My Understanding: The prompt was large enough that planning needed explicit confirmation that the clarification package was acceptable. We needed to lock this in rather than carrying uncertainty forward into the planning phase.\n` +
+    `- Solution: Yes. Proceed with the drafted defaults, allowing planning to start from the approved clarification brief instead of an uncertain baseline.\n`
   const prePlanningBriefContent = `# Pre-Planning Brief\n\n` +
     `Capture the planning-critical project shape here before real planning begins.\n\n` +