theslopmachine 0.7.0 → 0.7.1

package/RELEASE.md CHANGED
@@ -14,7 +14,7 @@ node ./bin/slopmachine.js --help
14
14
  SLOPMACHINE_HOME="$(pwd)/.tmp-home" SLOPMACHINE_NONINTERACTIVE=1 SLOPMACHINE_PLUGIN_BOOTSTRAP=0 node ./bin/slopmachine.js setup
15
15
  ```
16
16
 
17
- That setup path should install `opencode-ai@latest` when OpenCode is missing and refresh it to `@latest` when it already exists.
17
+ That setup path should install `opencode-ai` when OpenCode is missing and only refresh it when the detected version is below the minimum supported version.
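The new setup rule (install when missing, refresh only when the detected version is below the minimum) can be sketched as a small decision function. This is an illustrative sketch, not the package's implementation: `setup_action`, `parse_version`, and the `"1.0.0"` minimum are hypothetical names and values, and the parser assumes plain dotted numeric versions with no pre-release suffix.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def setup_action(installed_version, minimum_version):
    """Decide what setup should do with the OpenCode package."""
    if installed_version is None:
        return "install"  # OpenCode is missing
    if parse_version(installed_version) < parse_version(minimum_version):
        return "refresh"  # detected version is below the supported minimum
    return "keep"  # new enough already; no forced move to @latest


# "1.0.0" is a placeholder minimum, not the package's real threshold.
print(setup_action(None, "1.0.0"))      # install
print(setup_action("0.9.12", "1.0.0"))  # refresh
print(setup_action("1.2.0", "1.0.0"))   # keep
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.10.0"` would sort below `"1.9.0"`.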
18
18
 
19
19
  Users can later refresh to the newest published package with:
20
20
 
@@ -105,7 +105,7 @@ And specifically verify that the tarball includes the current workflow assets:
105
105
  - `assets/slopmachine/utils/claude_live_turn.mjs`
106
106
  - `assets/slopmachine/utils/claude_live_status.mjs`
107
107
  - `assets/slopmachine/utils/claude_live_stop.mjs`
108
- - `test-coverage-prompt.md`
108
+ - `assets/slopmachine/test-coverage-prompt.md`
109
109
 
110
110
  ## Publish
111
111
 
@@ -207,14 +207,15 @@ Maintain exactly one active developer session at a time.
207
207
  - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
208
208
  - from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
209
209
  - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
210
+ - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` for normal work, escalate to `opus` only when the planning/debugging/security difficulty genuinely justifies it, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
210
211
  - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
211
212
  - when `P7` begins, do not automatically switch away from `develop-N`
212
213
  - each fresh evaluation result decides the remediation lane:
213
- - `fail` -> route the issue list back to the latest `develop-N` Claude session
214
- - `partial pass` -> start the next `bugfix-N` Claude session tied to that audit report and keep its fix loop scoped to that audit's issue list
215
- - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
214
+ - `fail` -> route the issue list back to the latest `develop-N` Claude session and discard the working audit report file after triage
215
+ - `partial pass` -> start the next `bugfix-N` Claude session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
216
+ - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
216
217
  - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
217
- - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
218
+ - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
218
219
  - track the active evaluator session separately in metadata during `P7`
219
220
  - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
220
221
 
@@ -199,11 +199,11 @@ Maintain exactly one active developer session at a time.
199
199
  - do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
200
200
  - when `P7` begins, do not automatically switch away from `develop-N`
201
201
  - each fresh evaluation result decides the remediation lane:
202
- - `fail` -> route the issue list back to the latest `develop-N` session
203
- - `partial pass` -> start the next `bugfix-N` session tied to that audit report and keep its fix loop scoped to that audit's issue list
204
- - `pass` -> discard it as a non-counting clean audit and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
202
+ - `fail` -> route the issue list back to the latest `develop-N` session and discard the working audit report file after triage
203
+ - `partial pass` -> start the next `bugfix-N` session tied to that kept audit report and keep its fix loop scoped to that audit's issue list
204
+ - `pass` -> discard it as a non-counting clean audit, discard the working audit report file, and immediately rerun a fresh evaluation until a `partial pass` opens the next bugfix session
205
205
  - require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
206
- - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun until clean before leaving `P7`
206
+ - after the second bugfix session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in a fresh `General` audit session, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and if it finds any issue route the fixes back to the currently active recoverable developer session, replace the report, and rerun up to 3 times before carrying the latest report forward
207
207
  - track the active evaluator session separately in metadata during `P7`
208
208
 
209
209
  ## Parallelism Policy
@@ -53,6 +53,15 @@ Preferred launch pattern:
53
53
  node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --runtime-dir <dir>
54
54
  ```
55
55
 
56
+ ## Model selection rule
57
+
58
+ - choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
59
+ - default to `--model sonnet` for ordinary planning, scaffold, development, and routine bugfix work
60
+ - escalate to `--model opus` only for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
61
+ - keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
62
+ - when the task difficulty warrants it, also pass an explicit `--effort <level>` at launch time rather than hoping the default thinking level is ideal
63
+ - keep the chosen `model`, `effort`, and `subagent_model` recorded in bridge state so later recovery and review can see what launched the lane
64
+
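As a rough illustration of the model selection rule above, the sketch below assembles launch arguments and the values to record in bridge state. The `--model`, `--subagent-model`, and `--effort` flag names come from the rule text, but it is an assumption that the launch script accepts them in exactly this form. `launch_args`, `hard_task`, and the `"high"` effort level are hypothetical.

```python
def launch_args(lane, runtime_dir, cwd, hard_task=False,
                subagent_model="sonnet", effort=None):
    """Build explicit launch arguments instead of relying on CLI defaults."""
    # Default to sonnet; escalate to opus only when the caller flags the
    # work as genuinely difficult.
    model = "opus" if hard_task else "sonnet"
    args = [
        "--cwd", cwd,
        "--lane", lane,
        "--runtime-dir", runtime_dir,
        "--model", model,
        "--subagent-model", subagent_model,
    ]
    if effort is not None:
        args += ["--effort", effort]
    # Values to persist so later recovery and review can see the launch choice.
    state = {"model": model, "effort": effort, "subagent_model": subagent_model}
    return args, state


args, state = launch_args("develop-1", "/tmp/runtime", "/work/repo")
print(state["model"], state["subagent_model"])

# "high" is a placeholder effort level, not a documented value.
args, state = launch_args("develop-1", "/tmp/runtime", "/work/repo",
                          hard_task=True, effort="high")
print(args[-2:], state["model"])
```

Returning the recorded state alongside the arguments keeps the two from drifting apart: whatever launched the lane is exactly what gets written down.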
56
65
  The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.
57
66
 
58
67
  When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
@@ -70,10 +79,11 @@ The default pattern is to let the live lane start normally and then persist the
70
79
  For all later turns in the same bounded developer slot:
71
80
 
72
81
  ```bash
73
- node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --prompt-file <file> --timeout-ms <turn-timeout>
82
+ printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
74
83
  ```
75
84
 
76
85
  - inject exactly one owner message at a time into the idle live lane
86
+ - pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
77
87
  - wait for `Stop` or `StopFailure` before sending the next message
78
88
  - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
79
89
  - if turn execution fails, stop and recover explicitly instead of silently creating a new worker
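The stdin-based turn delivery shown above (`printf '%s' "$PROMPT" | node ...`) could be driven from a script roughly as follows. This is a sketch under assumptions: `send_turn` is a hypothetical helper, and a small echo process stands in for `claude_live_turn.mjs` so the example runs without the package installed.

```python
import subprocess
import sys


def send_turn(command, prompt, timeout_s):
    """Run the turn wrapper and feed the prompt through stdin."""
    result = subprocess.run(
        command,
        input=prompt,         # delivered on stdin, with no trailing newline added
        text=True,
        capture_output=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        # Stop and surface the failure instead of silently retrying elsewhere.
        raise RuntimeError(f"turn failed: {result.stderr.strip()}")
    return result.stdout


# Stand-in child process that echoes stdin back, used only for this sketch.
echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(send_turn(echo, "fix the failing login test", timeout_s=30))
```

In real use the command list would presumably be the `node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>` invocation from the diff, with the tilde expanded by the caller.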
@@ -140,7 +140,7 @@ Each `evaluation_runs[]` record should include enough to recover deterministic `
140
140
  - `audit_number`
141
141
  - `session_id`
142
142
  - `verdict`
143
- - `audit_report_path`
143
+ - `audit_report_path` when the report was kept, otherwise `null`
144
144
  - `route_target`
145
145
  - `routed_developer_session_id`
146
146
  - `routed_developer_label`
@@ -172,6 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
172
172
  - keep exactly one active developer session at a time
173
173
  - record every developer session in `developer_sessions`
174
174
  - from `P2` through `P6`, default to one long-lived `develop-1` lane
175
+ - default the launch model for that long-lived lane to `sonnet`; choose `opus` only when the current lane's work is genuinely high-difficulty enough to justify a more expensive launch
175
176
  - if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
176
177
  - keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
177
178
  - keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically
@@ -26,7 +26,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
26
26
  - treat the audit as a remediation trigger that routes back to develop
27
27
  - extract and hand off all issues to the latest `develop-N` developer session
28
28
  - fix them
29
- - keep the audit report at its normalized `../.tmp/audit_report-<N>.md` path
29
+ - do not keep the fail audit report in `../.tmp/` after triage; discard it once the issue bundle is extracted and recorded in metadata
30
30
  - do not open `bugfix-N` for this audit
31
31
  - run a fresh new evaluator session for the next audit
32
32
 
@@ -39,6 +39,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
39
39
  ### `pass`
40
40
 
41
41
  - record the audit as a discarded clean audit and do not hand off an issue list
42
+ - discard the pass audit report file instead of keeping it in `../.tmp/`
42
43
  - do not treat it as `P7` completion
43
44
  - immediately rerun a fresh evaluation until a `partial pass` opens the next scoped bugfix session
44
45
 
@@ -69,8 +70,9 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
69
70
 
70
71
  ## Exit standard
71
72
 
72
- - after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session until the report is clean
73
+ - after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session
73
74
  - keep the coverage/README report path fixed at `../.tmp/test_coverage_and_readme_audit_report.md` and replace the prior copy on each rerun instead of numbering it
74
- - do not move to `P8` until 2 bugfix sessions have been completed and the coverage/README audit report is clean
75
- - keep every fresh audit report under `../.tmp/audit_report-<N>.md`
75
+ - allow at most 3 remediation attempts for the coverage/README audit; after the third attempt, keep the latest report as the final carried-forward evidence
76
+ - do not move to `P8` until 2 bugfix sessions have been completed and the final coverage/README report exists from that last `P7` subphase
77
+ - keep only partial-pass audit reports under `../.tmp/audit_report-<N>.md`
76
78
  - for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`
@@ -40,10 +40,9 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
40
40
 
41
41
  - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
42
42
  - do not use the older cycle-directory report-root model
43
- - number every fresh evaluation audit sequentially across the whole run:
44
- - `../.tmp/audit_report-1.md`
45
- - `../.tmp/audit_report-2.md`
46
- - and so on
43
+ - number every fresh evaluation audit sequentially across the whole run for routing and metadata purposes
44
+ - persist `../.tmp/audit_report-<N>.md` only for `partial pass` audits that actually open bugfix sessions
45
+ - if a fresh audit is `fail` or `pass`, extract what you need from the generated working report, record the verdict and routing in metadata, and then discard the report file instead of leaving it in `../.tmp/`
47
46
  - for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
48
47
  - `../.tmp/audit_report-<N>-fix_check-1.md`
49
48
  - `../.tmp/audit_report-<N>-fix_check-2.md`
@@ -82,8 +81,10 @@ For each fresh audit:
82
81
  - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
83
82
  - send that fully composed text block directly to one fresh `General` evaluator session
84
83
  - require that session to produce a detailed file-backed audit report plus an issue summary
85
- - assign the next audit number and normalize the report path to `../.tmp/audit_report-<N>.md`
86
- - record the evaluator session id, prompt kind, audit number, verdict, report path, and routing decision in metadata
84
+ - assign the next audit number
85
+ - if and only if the verdict is `partial pass`, keep the normalized report path as `../.tmp/audit_report-<N>.md`
86
+ - if the verdict is `fail` or `pass`, discard the generated report file after extracting the issue summary or verdict you need
87
+ - record the evaluator session id, prompt kind, audit number, verdict, kept-or-discarded report status, and routing decision in metadata
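A minimal sketch of the keep-or-discard step described above, assuming the evaluator first writes a working report file somewhere under the `.tmp/` directory. `settle_report` and `evaluation_record` are hypothetical helpers; the record keys `audit_number`, `session_id`, `verdict`, and `audit_report_path` follow the field names listed elsewhere in this diff.

```python
import tempfile
from pathlib import Path


def settle_report(working_report, tmp_dir, audit_number, verdict):
    """Keep the report only for 'partial pass'; return the kept path or None."""
    working_report = Path(working_report)
    if verdict == "partial pass":
        kept = Path(tmp_dir) / f"audit_report-{audit_number}.md"
        working_report.replace(kept)  # move to the normalized path
        return str(kept)
    if verdict in ("fail", "pass"):
        working_report.unlink()  # discard after the needed data is extracted
        return None
    raise ValueError(f"unknown verdict: {verdict!r}")


def evaluation_record(session_id, audit_number, verdict, kept_path):
    """Metadata entry; the path is None when the report was discarded."""
    return {
        "audit_number": audit_number,
        "session_id": session_id,
        "verdict": verdict,
        "audit_report_path": kept_path,
    }


with tempfile.TemporaryDirectory() as tmp:
    draft = Path(tmp) / "working_report.md"
    draft.write_text("# audit\n")
    kept = settle_report(draft, tmp, 3, "partial pass")
    print(Path(kept).name)  # audit_report-3.md
    print(evaluation_record("eval-session", 3, "partial pass", kept)["verdict"])
```

The audit number is still assigned for `fail` and `pass` audits, so kept report numbers may have gaps.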
87
88
 
88
89
  ## Fresh-audit branching rule
89
90
 
@@ -91,11 +92,11 @@ After each fresh audit report is produced, branch by verdict:
91
92
 
92
93
  ### `fail`
93
94
 
94
- - record the audit as a `fail` under its `audit_report-<N>.md` path
95
+ - record the audit as a `fail` in metadata, but do not leave an `audit_report-<N>.md` file in `../.tmp/`
95
96
  - extract all reported issues and send them to the latest `develop-N` session
96
97
  - do not open `bugfix-N` for a `fail` audit
97
98
  - fix the issues in that develop session
98
- - after remediation, start a brand new evaluator session and run the next fresh audit as `audit_report-<N+1>.md`
99
+ - after remediation, start a brand new evaluator session and run the next fresh audit
99
100
 
100
101
  ### `partial pass`
101
102
 
@@ -106,7 +107,7 @@ After each fresh audit report is produced, branch by verdict:
106
107
 
107
108
  ### `pass`
108
109
 
109
- - record the audit as a discarded clean audit under its `audit_report-<N>.md` path
110
+ - record the audit as a discarded clean audit in metadata and do not leave an `audit_report-<N>.md` file in `../.tmp/`
110
111
  - do not open `bugfix-N`
111
112
  - do not count it toward `P7` completion
112
113
  - immediately start another fresh evaluator session and continue `P7` until a `partial pass` opens the next bugfix session
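The three verdict branches above amount to a small routing table, sketched below. `route` and its return keys other than `route_target` are hypothetical, and numbering the next bugfix session as "sessions opened so far plus one" is an assumption about how `bugfix-N` labels are assigned.

```python
def route(verdict, latest_develop_label, bugfix_sessions_opened):
    """Map a fresh-audit verdict to its remediation lane."""
    if verdict == "fail":
        # Issues go back to the latest develop session; no bugfix session opens.
        return {"route_target": latest_develop_label,
                "opens_bugfix": False, "counts_toward_p7": False}
    if verdict == "partial pass":
        # Opens the next scoped bugfix session tied to this audit.
        return {"route_target": f"bugfix-{bugfix_sessions_opened + 1}",
                "opens_bugfix": True, "counts_toward_p7": True}
    if verdict == "pass":
        # Discarded clean audit: nothing to hand off, rerun a fresh evaluation.
        return {"route_target": None,
                "opens_bugfix": False, "counts_toward_p7": False}
    raise ValueError(f"unknown verdict: {verdict!r}")


print(route("fail", "develop-1", 0)["route_target"])          # develop-1
print(route("partial pass", "develop-1", 1)["route_target"])  # bugfix-2
print(route("pass", "develop-1", 1)["route_target"])          # None
```

Only the `partial pass` branch moves `P7` toward its two-bugfix-session requirement; the other two always lead to another fresh audit.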
@@ -128,7 +129,7 @@ Inside a `partial pass` audit's bugfix loop:
128
129
 
129
130
  ## Post-bugfix coverage and README audit
130
131
 
131
- - after 2 bugfix sessions have been completed, do not leave `P7` yet
132
+ - after 2 bugfix sessions have been completed, do not leave `P7` yet; this audit is the last subphase inside `P7`
132
133
  - read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
133
134
  - launch a fresh `General` evaluator session for this audit
134
135
  - prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
@@ -138,7 +139,8 @@ Inside a `partial pass` audit's bugfix loop:
138
139
  - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
139
140
  - require fixes plus concrete verification evidence from that developer session
140
141
  - after the fixes land, run a fresh new coverage/README audit again and replace the old report
141
- - keep looping until `../.tmp/test_coverage_and_readme_audit_report.md` is clean and the report confirms the minimum 90 percent coverage threshold is satisfied
142
+ - allow at most 3 remediation attempts for this final coverage/README audit
143
+ - if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
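One way to read the bounded retry rule above is the loop below: audit, then alternate remediation and re-audit until the report is clean or three remediation attempts are used. `run_final_audit`, `run_audit`, and `remediate` are hypothetical callables, and the report is modeled as a dict with an `issues` list purely for illustration.

```python
MAX_REMEDIATION_ATTEMPTS = 3


def run_final_audit(run_audit, remediate):
    """Run the coverage/README audit with a bounded fix-and-rerun loop."""
    report = run_audit()
    attempts = 0
    while report["issues"] and attempts < MAX_REMEDIATION_ATTEMPTS:
        remediate(report["issues"])  # route issues to the developer session
        attempts += 1
        report = run_audit()  # fresh audit; replaces the previous report
    # Clean or not, the latest report is the one carried forward.
    return {"report": report, "attempts": attempts, "clean": not report["issues"]}


# Simulated audit whose issue list shrinks by one after each remediation.
pending = ["low coverage in auth", "README missing setup step"]
outcome = run_final_audit(
    run_audit=lambda: {"issues": list(pending)},
    remediate=lambda issues: pending.pop(),
)
print(outcome["attempts"], outcome["clean"])  # 2 True
```

Under this reading a run that never becomes clean performs four audits in total: the initial one plus one after each of the three remediation attempts.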
142
144
 
143
145
  ## Scope rule
144
146
 
@@ -149,10 +151,10 @@ Inside a `partial pass` audit's bugfix loop:
149
151
 
150
152
  ## Exit target
151
153
 
152
- - `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit report is clean
154
+ - `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit has run as the last subphase of `P7`
153
155
  - the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
154
156
  - fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
155
- - after the second bugfix session completes, run the coverage/README audit; move to `P8 Final Human Decision` only after that audit passes cleanly
157
+ - after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Human Decision` with a clean report, otherwise move to `P8 Final Human Decision` with the latest final report after the third attempt
156
158
 
157
159
  ## Boundaries
158
160
 
@@ -36,12 +36,12 @@ The final delivery layout in the parent project root must be:
36
36
  - no `sessions/` directory is required when all tracked developer sessions are Claude-backed
37
37
  - `metadata.json`
38
38
  - `.tmp/`
39
- - `audit_report-<N>.md`
39
+ - `audit_report-<N>.md` only for bugfix-triggering `partial pass` audits
40
40
  - `audit_report-<N>-fix_check-<M>.md` when present
41
41
  - `test_coverage_and_readme_audit_report.md`
42
42
  - `repo/`
43
43
 
44
- In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included, though extra fresh audits or extra fix checks may legitimately increase that count.
44
+ In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included: 2 kept partial-pass audit reports, at least 2 corresponding fix-check reports, and the final coverage/README audit report. Extra fix checks may legitimately increase that count.
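The report count described above can be checked mechanically. The following is a sketch, not the package's packaging check: `check_tmp_reports` is a hypothetical helper, and requiring a fix-check report for every kept audit reflects the clean two-bugfix path only.

```python
import re

AUDIT = re.compile(r"^audit_report-(\d+)\.md$")
FIX_CHECK = re.compile(r"^audit_report-(\d+)-fix_check-\d+\.md$")
FINAL = "test_coverage_and_readme_audit_report.md"


def _audit_numbers(pattern, names):
    """Collect the audit numbers of all file names matching the pattern."""
    return {int(m.group(1)) for m in map(pattern.match, names) if m}


def check_tmp_reports(names):
    """Return a list of problems with the .tmp/ report set (empty if fine)."""
    audits = _audit_numbers(AUDIT, names)
    checked = _audit_numbers(FIX_CHECK, names)
    problems = []
    if len(audits) < 2:
        problems.append("fewer than 2 kept partial-pass audit reports")
    if audits - checked:
        problems.append(f"audits without a fix check: {sorted(audits - checked)}")
    if checked - audits:
        problems.append(f"fix checks without an audit: {sorted(checked - audits)}")
    if FINAL not in names:
        problems.append("missing final coverage/README audit report")
    return problems


# Audit numbers 2 and 4: discarded fail/pass audits still consume numbers.
print(check_tmp_reports([
    "audit_report-2.md", "audit_report-2-fix_check-1.md",
    "audit_report-4.md", "audit_report-4-fix_check-1.md",
    FINAL,
]))  # []
```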
45
45
 
46
46
  Inside the delivered `repo/`, the repository must remain self-sufficient:
47
47
 
@@ -90,7 +90,7 @@ For session export:
90
90
 
91
91
  Where `<backend>` comes from the tracked developer session record in metadata.
92
92
  Use `opencode` when no explicit backend field exists or when the backend is not Claude-backed.
93
- For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd` and packages that folder once.
93
+ For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd`, normalizes the copied JSONL session files by flattening channel-originated user turns, and packages that folder once.
94
94
 
95
95
  After those steps:
96
96
 
@@ -125,7 +125,7 @@ After those steps:
125
125
  - when the project has database dependencies, confirm database setup is injected through initialization scripts rather than packaged local database dependency artifacts
126
126
  - confirm the cleanup helper has been run and that no known recursive cleanup targets remain in the delivered repo tree
127
127
  - confirm no environment-dependent dependency directories, editor-state folders, runtime caches, or workflow utility scripts are packaged into the delivered product
128
- - confirm parent-root `../.tmp/` exists and contains the required `audit_report-<N>.md` files
128
+ - confirm parent-root `../.tmp/` exists and contains the required kept `audit_report-<N>.md` files for partial-pass audits only
129
129
  - confirm every bugfix-triggering audit number has its matching `audit_report-<N>-fix_check-<M>.md` files when fix checks were required
130
130
  - confirm parent-root `../.tmp/test_coverage_and_readme_audit_report.md` exists and is the final replaced copy rather than a numbered variant
131
131
  - confirm parent-root `../docs/test-coverage.md` explains the tested flows, mapped tests, and coverage boundaries
@@ -209,9 +209,9 @@ Use evidence such as internal metadata files, structured Beads comments, verific
209
209
  - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
210
210
  - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
211
211
  - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
212
- - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; every fresh evaluation produces `audit_report-<N>.md`, `fail` audits route back to the latest `develop-N` session, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, clean `pass` audits before the required bugfix sessions are discarded and rerun, and `P7` cannot finish until 2 bugfix sessions have been completed plus a clean `test_coverage_and_readme_audit_report.md`
212
+ - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
213
213
  - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
214
- - before leaving `P7`, require a clean parent-root `../.tmp/test_coverage_and_readme_audit_report.md`; if it finds any issue, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit until clean
214
+ - before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
215
215
 
216
216
  ## Acceptance rule
217
217