npm - theslopmachine - Versions diffs - 0.7.0 → 0.7.1 - Mend

theslopmachine 0.7.0 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/RELEASE.md +2 -2
package/assets/agents/slopmachine-claude.md +5 -4
package/assets/agents/slopmachine.md +4 -4
package/assets/skills/claude-worker-management/SKILL.md +11 -1
package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
package/assets/skills/evaluation-triage/SKILL.md +6 -4
package/assets/skills/final-evaluation-orchestration/SKILL.md +15 -13
package/assets/skills/submission-packaging/SKILL.md +4 -4
package/assets/skills/verification-gates/SKILL.md +2 -2
package/assets/slopmachine/test-coverage-prompt.md +561 -0
package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
package/package.json +1 -1
package/src/constants.js +2 -2
package/src/install.js +94 -21

package/RELEASE.md CHANGED

@@ -14,7 +14,7 @@ node ./bin/slopmachine.js --help
 SLOPMACHINE_HOME="$(pwd)/.tmp-home" SLOPMACHINE_NONINTERACTIVE=1 SLOPMACHINE_PLUGIN_BOOTSTRAP=0 node ./bin/slopmachine.js setup
 ```
-That setup path should install `opencode-ai@latest` when OpenCode is missing and refresh it to `@latest` when it already exists.
+That setup path should install `opencode-ai` when OpenCode is missing and only refresh it when the detected version is below the minimum supported version.
 Users can later refresh to the newest published package with:
@@ -105,7 +105,7 @@ And specifically verify that the tarball includes the current workflow assets:
 - `assets/slopmachine/utils/claude_live_turn.mjs`
 - `assets/slopmachine/utils/claude_live_status.mjs`
 - `assets/slopmachine/utils/claude_live_stop.mjs`
-- `test-coverage-prompt.md`
+- `assets/slopmachine/test-coverage-prompt.md`
 ## Publish
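
The revised setup rule in this hunk (install when OpenCode is missing, refresh only when the detected version is below a minimum) can be sketched as below. This is a minimal illustration, not the code in `package/src/install.js`: the function names, the `'install'`/`'refresh'`/`'keep'` labels, and the choice to treat an unparseable version as needing a refresh are all assumptions made for the example.

```javascript
// Hypothetical helper: extract the first "major.minor.patch" triple from a string.
function parseVersion(text) {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(String(text));
  if (!match) return null;
  return match.slice(1).map(Number);
}

// Returns true when `detected` sorts strictly below `minimum`.
function isBelowMinimum(detected, minimum) {
  const a = parseVersion(detected);
  const b = parseVersion(minimum);
  // Assumption: an unparseable version is treated as needing a refresh.
  if (!a || !b) return true;
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false; // equal versions satisfy the minimum
}

// Hypothetical decision function mirroring the RELEASE.md wording.
function decideOpencodeAction(detectedVersion, minimumVersion) {
  if (detectedVersion == null) return 'install';
  return isBelowMinimum(detectedVersion, minimumVersion) ? 'refresh' : 'keep';
}
```

Comparing numeric components rather than raw strings matters here: a plain string comparison would rank `1.10.0` below `1.9.0`.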

package/assets/agents/slopmachine-claude.md CHANGED

@@ -207,14 +207,15 @@ Maintain exactly one active developer session at a time.
 - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
 - from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
 - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
+- launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` for normal work, escalate to `opus` only when the planning/debugging/security difficulty genuinely justifies it, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
 - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
 - when `P7` begins, do not automatically switch away from `develop-N`
 - each fresh evaluation result decides the remediation lane:
-  - `fail` -> route the issue list back to the latest `develop-N` Claude session
-  - `partial pass` -> start the next `bugfix-N` Claude session tied to that audit report and keep its fix loop scoped to that audit's issue list
-  - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+  - `fail` -> route the issue list back to the latest `develop-N` Claude session and discard the working audit report file after triage
+  - `partial pass` -> start the next `bugfix-N` Claude session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+  - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
 - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
-- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
 - track the active evaluator session separately in metadata during `P7`
 - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation

package/assets/agents/slopmachine.md CHANGED

@@ -199,11 +199,11 @@ Maintain exactly one active developer session at a time.
 - do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
 - when `P7` begins, do not automatically switch away from `develop-N`
 - each fresh evaluation result decides the remediation lane:
-  - `fail` -> route the issue list back to the latest `develop-N` session
-  - `partial pass` -> start the next `bugfix-N` session tied to that audit report and keep its fix loop scoped to that audit's issue list
-  - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
+  - `fail` -> route the issue list back to the latest `develop-N` session and discard the working audit report file after triage
+  - `partial pass` -> start the next `bugfix-N` session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
+  - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
 - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
-- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
+- after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
 - track the active evaluator session separately in metadata during `P7`
 ## Parallelism Policy

package/assets/skills/claude-worker-management/SKILL.md CHANGED

@@ -53,6 +53,15 @@ Preferred launch pattern:
 node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --runtime-dir <dir>
 ```
+## Model selection rule
+- choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
+- default to `--model sonnet` for ordinary planning, scaffold, development, and routine bugfix work
+- escalate to `--model opus` only for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+- keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
+- when the task difficulty warrants it, also pass an explicit `--effort <level>` at launch time rather than hoping the default thinking level is ideal
+- keep the chosen `model`, `effort`, and `subagent_model` recorded in bridge state so later recovery and review can see what launched the lane
 The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.
 When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
@@ -70,10 +79,11 @@ The default pattern is to let the live lane start normally and then persist the
 For all later turns in the same bounded developer slot:
 ```bash
-node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --prompt-file <file> --timeout-ms <turn-timeout>
+printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
 ```
 - inject exactly one owner message at a time into the idle live lane
+- pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
 - wait for `Stop` or `StopFailure` before sending the next message
 - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
 - if turn execution fails, stop and recover explicitly instead of silently creating a new worker
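
The switch from `--prompt-file` to piping the prompt through stdin implies that the wrapper collects its whole input before delivering the turn. A minimal sketch of that read step, assuming a Node.js readable stream; `readPrompt` is a hypothetical name, and the actual logic in `claude_live_turn.mjs` may differ.

```javascript
// Hypothetical helper: collect an entire readable stream (for example
// process.stdin) into one UTF-8 string.
async function readPrompt(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks may arrive as Buffers or as strings; normalize both to Buffer.
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Usage (not executed here): readPrompt(process.stdin).then((prompt) => { /* deliver turn */ });
```

Concatenating the Buffers before decoding avoids splitting a multi-byte UTF-8 character across chunk boundaries.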

package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -140,7 +140,7 @@ Each `evaluation_runs[]` record should include enough to recover deterministic `
 - `audit_number`
 - `session_id`
 - `verdict`
-- `audit_report_path`
+- `audit_report_path` when the report was kept, otherwise `null`
 - `route_target`
 - `routed_developer_session_id`
 - `routed_developer_label`
@@ -172,6 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
 - keep exactly one active developer session at a time
 - record every developer session in `developer_sessions`
 - from `P2` through `P6`, default to one long-lived `develop-1` lane
+- default the launch model for that long-lived lane to `sonnet`; choose `opus` only when the current lane's work is genuinely high-difficulty enough to justify a more expensive launch
 - if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
 - keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
 - keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically

package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -26,7 +26,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 - treat the audit as a remediation trigger that routes back to develop
 - extract and hand off all issues to the latest `develop-N` developer session
 - fix them
-- keep the audit report at its normalized `../.tmp/audit_report-<N>.md` path
+- do not keep the fail audit report in `../.tmp/` after triage; discard it once the issue bundle is extracted and recorded in metadata
 - do not open `bugfix-N` for this audit
 - run a fresh new evaluator session for the next audit
@@ -39,6 +39,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 ### `pass`
 - record the audit as a discarded clean audit and do not hand off an issue list
+- discard the pass audit report file instead of keeping it in `../.tmp/`
 - do not treat it as `P7` completion
 - immediately rerun a fresh evaluation until a `partial pass` opens the next scoped bugfix session
@@ -69,8 +70,9 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 ## Exit standard
-- after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session until the report is clean
+- after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session
 - keep the coverage/README report path fixed at `../.tmp/test_coverage_and_readme_audit_report.md` and replace the prior copy on each rerun instead of numbering it
-- do not move to `P8` until 2 bugfix sessions have been completed and the coverage/README audit report is clean
-- keep every fresh audit report under `../.tmp/audit_report-<N>.md`
+- allow at most 3 remediation attempts for the coverage/README audit; after the third attempt, keep the latest report as the final carried-forward evidence
+- do not move to `P8` until 2 bugfix sessions have been completed and the final coverage/README report exists from that last `P7` subphase
+- keep only partial-pass audit reports under `../.tmp/audit_report-<N>.md`
 - for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`
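
The verdict routing and the new keep-or-discard rule for audit reports can be summarized in one small function. The sketch below is illustrative only: `routeAudit`, the `sessions` argument, and the "completed bugfixes plus one" numbering are assumptions, while `route_target` and `audit_report_path` reuse the field names listed for `evaluation_runs[]` records.

```javascript
// Hypothetical routing helper for one fresh audit.
// sessions.latestDevelop: label of the latest recoverable develop-N session.
// sessions.completedBugfixes: number of bugfix-N sessions completed so far.
function routeAudit(verdict, auditNumber, sessions) {
  switch (verdict) {
    case 'fail':
      // Route back to develop; the working report is discarded after triage.
      return { route_target: sessions.latestDevelop, audit_report_path: null };
    case 'partial pass':
      // Open the next bugfix session and keep the numbered report.
      return {
        route_target: `bugfix-${sessions.completedBugfixes + 1}`,
        audit_report_path: `../.tmp/audit_report-${auditNumber}.md`,
      };
    case 'pass':
      // Discarded clean audit: no handoff, no kept report, rerun a fresh audit.
      return { route_target: null, audit_report_path: null };
    default:
      throw new Error(`unknown verdict: ${verdict}`);
  }
}
```

For example, `routeAudit('partial pass', 3, { latestDevelop: 'develop-1', completedBugfixes: 0 })` yields a `bugfix-1` target with the report kept at `../.tmp/audit_report-3.md`.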

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -40,10 +40,9 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
 - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
 - do not use the older cycle-directory report-root model
-- number every fresh evaluation audit sequentially across the whole run:
-  - `../.tmp/audit_report-1.md`
-  - `../.tmp/audit_report-2.md`
-  - and so on
+- number every fresh evaluation audit sequentially across the whole run for routing and metadata purposes
+- persist `../.tmp/audit_report-<N>.md` only for `partial pass` audits that actually open bugfix sessions
+- if a fresh audit is `fail` or `pass`, extract what you need from the generated working report, record the verdict and routing in metadata, and then discard the report file instead of leaving it in `../.tmp/`
 - for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
   - `../.tmp/audit_report-<N>-fix_check-1.md`
   - `../.tmp/audit_report-<N>-fix_check-2.md`
@@ -82,8 +81,10 @@ For each fresh audit:
 - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
 - send that fully composed text block directly to one fresh `General` evaluator session
 - require that session to produce a detailed file-backed audit report plus an issue summary
-- assign the next audit number and normalize the report path to `../.tmp/audit_report-<N>.md`
-- record the evaluator session id, prompt kind, audit number, verdict, report path, and routing decision in metadata
+- assign the next audit number
+- if and only if the verdict is `partial pass`, keep the normalized report path as `../.tmp/audit_report-<N>.md`
+- if the verdict is `fail` or `pass`, discard the generated report file after extracting the issue summary or verdict you need
+- record the evaluator session id, prompt kind, audit number, verdict, kept-or-discarded report status, and routing decision in metadata
 ## Fresh-audit branching rule
@@ -91,11 +92,11 @@ After each fresh audit report is produced, branch by verdict:
 ### `fail`
-- record the audit as a `fail` under its `audit_report-<N>.md` path
+- record the audit as a `fail` in metadata, but do not leave an `audit_report-<N>.md` file in `../.tmp/`
 - extract all reported issues and send them to the latest `develop-N` session
 - do not open `bugfix-N` for a `fail` audit
 - fix the issues in that develop session
-- after remediation, start a brand new evaluator session and run the next fresh audit as `audit_report-<N+1>.md`
+- after remediation, start a brand new evaluator session and run the next fresh audit
 ### `partial pass`
@@ -106,7 +107,7 @@ After each fresh audit report is produced, branch by verdict:
 ### `pass`
-- record the audit as a discarded clean audit under its `audit_report-<N>.md` path
+- record the audit as a discarded clean audit in metadata and do not leave an `audit_report-<N>.md` file in `../.tmp/`
 - do not open `bugfix-N`
 - do not count it toward `P7` completion
 - immediately start another fresh evaluator session and continue `P7` until a `partial pass` opens the next bugfix session
@@ -128,7 +129,7 @@ Inside a `partial pass` audit's bugfix loop:
 ## Post-bugfix coverage and README audit
-- after 2 bugfix sessions have been completed, do not leave `P7` yet
+- after 2 bugfix sessions have been completed, do not leave `P7` yet; this audit is the last subphase inside `P7`
 - read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
 - launch a fresh `General` evaluator session for this audit
 - prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
@@ -138,7 +139,8 @@ Inside a `partial pass` audit's bugfix loop:
 - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
 - require fixes plus concrete verification evidence from that developer session
 - after the fixes land, run a fresh new coverage/README audit again and replace the old report
-- keep looping until `../.tmp/test_coverage_and_readme_audit_report.md` is clean and the report confirms the minimum 90 percent coverage threshold is satisfied
+- allow at most 3 remediation attempts for this final coverage/README audit
+- if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
 ## Scope rule
@@ -149,10 +151,10 @@ Inside a `partial pass` audit's bugfix loop:
 ## Exit target
-- `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit report is clean
+- `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit has run as the last subphase of `P7`
 - the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
 - fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
-- after the second bugfix session completes, run the coverage/README audit; move to `P8 Final Human Decision` only after that audit passes cleanly
+- after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Human Decision` with a clean report, otherwise move to `P8 Final Human Decision` with the latest final report after the third attempt
 ## Boundaries
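
The change from "loop until clean" to "at most 3 remediation attempts" amounts to a bounded retry loop. The sketch below shows one reading of that rule, in which each remediation is followed by a fresh audit that replaces the prior report. `runCoverageAudit`, `runAudit`, `remediate`, and the `clean` flag are hypothetical names, and the callbacks are synchronous only to keep the example short.

```javascript
// Hypothetical bounded loop for the final coverage/README audit.
// runAudit(): returns the latest report object, e.g. { clean: boolean }.
// remediate(report): routes the report's issues to the developer session.
function runCoverageAudit(runAudit, remediate, maxRemediations = 3) {
  let report = runAudit();
  let attempts = 0;
  while (!report.clean && attempts < maxRemediations) {
    remediate(report);
    attempts += 1;
    report = runAudit(); // the fresh report replaces the previous one
  }
  // Either clean, or the latest report is carried forward as final evidence.
  return { report, attempts, clean: report.clean };
}
```

Under this reading the loop always terminates after at most four audits (one initial plus three reruns), and the caller moves on to `P8` with whichever report was produced last.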

package/assets/skills/submission-packaging/SKILL.md CHANGED

@@ -36,12 +36,12 @@ The final delivery layout in the parent project root must be:
   - no `sessions/` directory is required when all tracked developer sessions are Claude-backed
 - `metadata.json`
 - `.tmp/`
-  - `audit_report-<N>.md`
+  - `audit_report-<N>.md` only for bugfix-triggering `partial pass` audits
   - `audit_report-<N>-fix_check-<M>.md` when present
   - `test_coverage_and_readme_audit_report.md`
 - `repo/`
-In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included, though extra fresh audits or extra fix checks may legitimately increase that count.
+In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included: 2 kept partial-pass audit reports, at least 2 corresponding fix-check reports, and the final coverage/README audit report. Extra fix checks may legitimately increase that count.
 Inside the delivered `repo/`, the repository must remain self-sufficient:
@@ -90,7 +90,7 @@ For session export:
 Where `<backend>` comes from the tracked developer session record in metadata.
 Use `opencode` when no explicit backend field exists or when the backend is not Claude-backed.
-For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd` and packages that folder once.
+For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd`, normalizes the copied JSONL session files by flattening channel-originated user turns, and packages that folder once.
 After those steps:
@@ -125,7 +125,7 @@ After those steps:
 - when the project has database dependencies, confirm database setup is injected through initialization scripts rather than packaged local database dependency artifacts
 - confirm the cleanup helper has been run and that no known recursive cleanup targets remain in the delivered repo tree
 - confirm no environment-dependent dependency directories, editor-state folders, runtime caches, or workflow utility scripts are packaged into the delivered product
-- confirm parent-root `../.tmp/` exists and contains the required `audit_report-<N>.md` files
+- confirm parent-root `../.tmp/` exists and contains the required kept `audit_report-<N>.md` files for partial-pass audits only
 - confirm every bugfix-triggering audit number has its matching `audit_report-<N>-fix_check-<M>.md` files when fix checks were required
 - confirm parent-root `../.tmp/test_coverage_and_readme_audit_report.md` exists and is the final replaced copy rather than a numbered variant
 - confirm parent-root `../docs/test-coverage.md` explains the tested flows, mapped tests, and coverage boundaries

package/assets/skills/verification-gates/SKILL.md CHANGED

@@ -209,9 +209,9 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
 - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
 - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
-- final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; every fresh evaluation produces `audit_report-<N>.md`, `fail` audits route back to the latest `develop-N` session, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, clean `pass` audits before the required bugfix sessions are discarded and rerun, and `P7` cannot finish until 2 bugfix sessions have been completed plus a clean `test_coverage_and_readme_audit_report.md`
+- final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
 - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
-- before leaving `P7`, require a clean parent-root `../.tmp/test_coverage_and_readme_audit_report.md`; if it finds any issue, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit until clean
+- before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
 ## Acceptance rule