theslopmachine 0.7.2 → 0.7.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. package/MANUAL.md +1 -1
  2. package/README.md +11 -1
  3. package/RELEASE.md +16 -0
  4. package/assets/agents/developer.md +2 -1
  5. package/assets/agents/slopmachine-claude.md +28 -20
  6. package/assets/agents/slopmachine.md +22 -18
  7. package/assets/claude/agents/developer.md +2 -1
  8. package/assets/skills/beads-operations/SKILL.md +1 -1
  9. package/assets/skills/clarification-gate/SKILL.md +6 -4
  10. package/assets/skills/claude-worker-management/SKILL.md +6 -6
  11. package/assets/skills/developer-session-lifecycle/SKILL.md +11 -9
  12. package/assets/skills/development-guidance/SKILL.md +1 -0
  13. package/assets/skills/evaluation-triage/SKILL.md +3 -2
  14. package/assets/skills/final-evaluation-orchestration/SKILL.md +12 -19
  15. package/assets/skills/hardening-gate/SKILL.md +1 -0
  16. package/assets/skills/planning-guidance/SKILL.md +1 -0
  17. package/assets/skills/scaffold-guidance/SKILL.md +1 -0
  18. package/assets/skills/submission-packaging/SKILL.md +14 -11
  19. package/assets/skills/verification-gates/SKILL.md +5 -4
  20. package/assets/slopmachine/scaffold-playbooks/docker-shared-contract.md +4 -0
  21. package/assets/slopmachine/templates/AGENTS.md +3 -1
  22. package/assets/slopmachine/templates/CLAUDE.md +3 -1
  23. package/assets/slopmachine/utils/__pycache__/normalize_claude_session.cpython-311.pyc +0 -0
  24. package/assets/slopmachine/utils/claude_live_launch.mjs +2 -2
  25. package/assets/slopmachine/utils/normalize_claude_session.py +162 -27
  26. package/assets/slopmachine/utils/package_claude_session.mjs +120 -23
  27. package/assets/slopmachine/utils/prepare_evaluation_prompt.mjs +41 -0
  28. package/assets/slopmachine/workflow-init.js +1 -1
  29. package/package.json +1 -1
  30. package/src/cli.js +1 -1
  31. package/src/constants.js +1 -0
  32. package/src/init.js +117 -28
  33. package/src/send-data.js +4 -4
package/MANUAL.md CHANGED
@@ -80,4 +80,4 @@ slopmachine init -o
80
80
  - The workflow-owner agents use mandatory skills for specific phases; skipping them is considered a workflow failure.
81
81
  - `slopmachine` is the lighter current engine: it keeps the owner prompt smaller, uses more specialized skills, and keeps one active developer session at a time while preserving rollover history when new sessions are intentionally started.
82
82
  - the scaffold playbook inventory now covers the main repeated families used in current tasks: React/Vite, Vue/Vite, Angular, FastAPI, Spring Boot, Django, Laravel, Livewire, Go/Chi, Android Java Views, Android Kotlin Compose, Electron/Vite, Tauri, Expo iOS-on-Linux, plus honest Linux partial-proof native Swift and Objective-C iOS playbooks.
83
- - Submission packaging collects the final docs, accepted evaluation reports, cleaned OpenCode session exports or one Claude project session zip bundle, and the cleaned repo into the required final structure.
83
+ - Submission packaging collects the final docs, accepted evaluation reports, cleaned OpenCode session exports or one Claude session zip bundle containing only the tracked relevant Claude sessions, and the cleaned repo into the required final structure.
package/README.md CHANGED
@@ -40,7 +40,7 @@ From this package directory:
40
40
  npm install
41
41
  npm run check
42
42
  npm pack
43
- npm install -g ./theslopmachine-0.7.2.tgz
43
+ npm install -g ./theslopmachine-0.7.5.tgz
44
44
  ```
45
45
 
46
46
  For local development instead:
@@ -128,6 +128,12 @@ To adopt an existing project into a SlopMachine workspace and request a later wo
128
128
  slopmachine init --adopt --phase P4
129
129
  ```
130
130
 
131
+ Equivalent smoother existing-project bootstrap:
132
+
133
+ ```bash
134
+ slopmachine init --continue-from P4
135
+ ```
136
+
131
137
  What it creates:
132
138
 
133
139
  - `repo/`
@@ -156,10 +162,14 @@ Important details:
156
162
  - the workspace root is the parent directory containing `repo/`
157
163
  - parent-root `.tmp/` is the audit and fix-check artifact directory used during `P7`
158
164
  - parent-root `.tmp/` also holds `test_coverage_and_readme_audit_report.md` after the final post-bugfix audit
165
+ - parent-root `metadata.json` is strict project metadata only and must contain exactly these keys: `prompt`, `project_type`, `frontend_language`, `backend_language`, `database`, `frontend_framework`, `backend_framework`
166
+ - `project_type` should use only `fullstack`, `backend`, `android`, `ios`, `desktop`, or `web` when known
159
167
  - Beads lives in the workspace root, not inside `repo/`
160
168
  - `repo/.claude/settings.json` seeds Claude Code to use the custom `developer` agent by default for that repo
161
169
  - after non-`-o` bootstrap, the command prints the exact `cd repo` next step so you can continue immediately
162
170
  - `--adopt` moves the current project files into `repo/`, preserves root workflow state in the parent workspace, and skips the automatic bootstrap commit
171
+ - `--continue-from <PX>` is a smoother alias for existing-project bootstrap; it implies adoption mode and seeds the requested start phase in one step
172
+ - if `--continue-from <PX>` is run while your current working directory is already the real project `repo/`, SlopMachine automatically treats `..` as the workspace root and writes the workflow state there instead of creating `repo/repo`
163
173
  - `--phase <PX>` seeds the initial `current_phase` for adoption/recovery bootstrap; the owner should still fall back if the real repo evidence does not support that later phase
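The strict `metadata.json` contract above can be checked mechanically. This is a minimal Node.js sketch, not the package's own implementation. `validateProjectMetadata` is a hypothetical name. Treating an empty or `null` `project_type` as "not known yet" is an assumption drawn from the "when known" wording.

```javascript
// Hypothetical checker for the strict parent-root metadata.json contract.
// Key list and project_type values are taken from the README text above.
const REQUIRED_KEYS = [
  "prompt",
  "project_type",
  "frontend_language",
  "backend_language",
  "database",
  "frontend_framework",
  "backend_framework",
];

const PROJECT_TYPES = new Set(["fullstack", "backend", "android", "ios", "desktop", "web"]);

// Returns a list of human-readable problems; an empty list means the file conforms.
function validateProjectMetadata(metadata) {
  const errors = [];
  const keys = Object.keys(metadata);

  // Every required key must be present...
  for (const key of REQUIRED_KEYS) {
    if (!keys.includes(key)) errors.push(`missing key: ${key}`);
  }
  // ...and no other key may appear (workflow/session state belongs elsewhere).
  for (const key of keys) {
    if (!REQUIRED_KEYS.includes(key)) errors.push(`unexpected key: ${key}`);
  }
  // Assumption: null, undefined, or "" means "project type not known yet".
  const type = metadata.project_type;
  if (type != null && type !== "" && !PROJECT_TYPES.has(type)) {
    errors.push(`unsupported project_type: ${type}`);
  }
  return errors;
}
```

The function returns a list of errors instead of throwing, so one pass reports every violation. For example, a leaked `session_id` shows up as `unexpected key: session_id`.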
164
174
 
165
175
  ### `slopmachine set-token`
package/RELEASE.md CHANGED
@@ -44,6 +44,22 @@ printf 'console.log("hello")\n' > .tmp-project-adopt/index.js
44
44
  SLOPMACHINE_HOME="$(pwd)/.tmp-home" node ./bin/slopmachine.js init --adopt --phase P4 .tmp-project-adopt
45
45
  ```
46
46
 
47
+ 6. Test smoother existing-project bootstrap alias:
48
+
49
+ ```bash
50
+ mkdir -p .tmp-project-continue
51
+ printf 'console.log("hello")\n' > .tmp-project-continue/index.js
52
+ SLOPMACHINE_HOME="$(pwd)/.tmp-home" node ./bin/slopmachine.js init --continue-from P4 .tmp-project-continue
53
+ ```
54
+
55
+ 7. Test `repo/` auto-wrap for `--continue-from`:
56
+
57
+ ```bash
58
+ mkdir -p .tmp-project-continue-parent/repo
59
+ printf 'console.log("hello")\n' > .tmp-project-continue-parent/repo/index.js
60
+ (cd .tmp-project-continue-parent/repo && SLOPMACHINE_HOME="$(pwd)/../../.tmp-home" node ../../../bin/slopmachine.js init --continue-from P4)
61
+ ```
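Steps 6 and 7 exercise an alias and an auto-wrap behavior, which can be sketched as two small functions. This is an illustration under assumptions, not the code in `src/init.js`. `expandContinueFrom` and `resolveWorkspaceRoot` are hypothetical names. The sketch also guesses that "already inside the real project `repo/`" is detected by checking whether the directory is literally named `repo`.

```javascript
const path = require("node:path");

// Hypothetical: `--continue-from <PX>` expands to `--adopt --phase <PX>`.
function expandContinueFrom(options) {
  if (!options.continueFrom) return { ...options };
  const { continueFrom, ...rest } = options;
  return { ...rest, adopt: true, phase: continueFrom };
}

// Hypothetical: when --continue-from runs inside a directory that is already
// named `repo`, use its parent as the workspace root instead of nesting
// a second `repo/` inside it.
function resolveWorkspaceRoot(targetDir, options) {
  const resolved = path.resolve(targetDir);
  if (options.continueFrom && path.basename(resolved) === "repo") {
    return { workspaceRoot: path.dirname(resolved), repoDir: resolved };
  }
  return { workspaceRoot: resolved, repoDir: path.join(resolved, "repo") };
}
```

Step 7 corresponds to the first branch of `resolveWorkspaceRoot`. When it runs from `.tmp-project-continue-parent/repo`, the workspace root becomes `.tmp-project-continue-parent`.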
62
+
47
63
  Note:
48
64
 
49
65
  - `slopmachine init` is Node-driven.
package/assets/agents/developer.md CHANGED
@@ -36,6 +36,7 @@ Read and follow `AGENTS.md` before implementing.
36
36
  - keep moving until the assigned work is materially complete or concretely blocked
37
37
  - do not stop for unnecessary intermediate check-ins
38
38
  - use independent engineering judgment; do not behave like a passive worker waiting to be corrected later
39
+ - once given a bounded engineering objective, keep going autonomously until that objective or explicit stop boundary is complete; do not pause for reassurance or permission when prompt-faithful defaults let you proceed
39
40
 
40
41
  ## Requirements And Planning
41
42
 
@@ -46,7 +47,7 @@ Before coding:
46
47
  - make the important business rules explicit before coding, including defaults, thresholds, limits, uniqueness, conflicts, reversals, retry behavior, and ownership rules when those dimensions matter
47
48
  - define or confirm the relevant state machine when the feature has meaningful lifecycle state
48
49
  - keep explicit out-of-scope boundaries in mind so you do not overbuild speculative features
49
- - surface meaningful ambiguity instead of silently guessing
50
+ - surface meaningful ambiguity only when it is genuinely blocking or materially changes the product contract; otherwise choose the safest prompt-faithful default and keep moving
50
51
  - make the plan concrete enough to drive real implementation
51
52
  - keep frontend/backend surfaces aligned when both sides matter
52
53
  - check prompt-fit before reporting completion; if the requested result still has visible gaps, keep working or call them out explicitly
package/assets/agents/slopmachine-claude.md CHANGED
@@ -35,7 +35,7 @@ You are the operational engine, not the primary coder.
35
35
 
36
36
  ## Non-Stop Execution Warning
37
37
 
38
- Outside the two allowed human gates, you must not stop execution.
38
+ You must not stop execution for planned human input once the workflow starts.
39
39
 
40
40
  - do not stop to give status updates
41
41
  - do not stop to ask what to do next
@@ -43,12 +43,11 @@ Outside the two allowed human gates, you must not stop execution.
43
43
  - do not stop to hand control back early
44
44
  - do not stop just because a phase changed or a summary is available
45
45
 
46
- The only allowed human-stop moments are:
46
+ Planned human-stop moments do not exist.
47
47
 
48
- - when clarification is complete and the run is ready to enter `P2 Planning`
49
- - `P8 Final Human Decision`
50
-
51
- If you are not at one of those two gates, continue working.
48
+ - clarification is an internal owner phase, not a user approval pause
49
+ - `P8 Final Readiness Decision` is an internal owner readiness decision, not a user approval pause
50
+ - continue autonomously from intake through packaging and retrospective unless you hit an irrecoverable blocker that truly requires new external input
52
51
 
53
52
  Claude-capacity rule:
54
53
 
@@ -86,6 +85,9 @@ Agent-integrity rule:
86
85
  - do not use the OpenCode `developer` subagent for implementation work in this backend
87
86
  - use the live Claude `developer` lane for codebase implementation work
88
87
  - if the Claude developer worker is unavailable because of rate limits or capacity exhaustion, do not replace it by coding yourself; preserve the same session and auto-wait for reset instead
88
+ - keep most review, verification interpretation, and acceptance decisions in the main owner session
89
+ - when verifying Claude developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded slices in parallel so the main owner session saves tokens
90
+ - do not offload ordinary small reviews or the final acceptance judgment; the main owner session should synthesize the evidence and make the decision
89
91
 
90
92
  ## Optimization Goal
91
93
 
@@ -159,16 +161,14 @@ If you do work for a phase before loading its required skill, that is a workflow
159
161
 
160
162
  ## Human Gates
161
163
 
162
- Execution may stop for human input only at two points:
163
-
164
- - when clarification is complete and the run is ready to enter `P2 Planning`
165
- - `P8 Final Human Decision`
164
+ There are no planned human-stop gates during ordinary execution.
166
165
 
167
- Outside those two moments, do not stop for approval, signoff, or intermediate permission.
168
- Outside those two moments, do not stop just to report status, summarize progress, ask what to do next, or hand control back early.
166
+ - do not stop for approval, signoff, continuation confirmation, or intermediate permission
167
+ - do not stop just to report status, summarize progress, ask what to do next, or hand control back early
168
+ - treat clarification completion and `P8 Final Readiness Decision` as internal transitions that must roll forward automatically
169
+ - only interrupt the user when an irrecoverable external blocker truly prevents autonomous continuation, such as missing external credentials, unavailable required infrastructure you cannot repair, or conflicting new human edits that require direction
169
170
 
170
- If the work is outside those two gates, continue execution and make the best prompt-faithful decision from the available evidence.
171
- If work is still in flight outside those two gates, your default is to continue autonomously until the phase objective or the next required gate is actually reached.
171
+ If work is still in flight and no irrecoverable blocker exists, continue autonomously until packaging and retrospective are complete.
172
172
 
173
173
  Claude-capacity rule:
174
174
 
@@ -187,7 +187,7 @@ Use these exact root phases:
187
187
  - `P5 Integrated Verification`
188
188
  - `P6 Hardening`
189
189
  - `P7 Evaluation and Fix Verification`
190
- - `P8 Final Human Decision`
190
+ - `P8 Final Readiness Decision`
191
191
  - `P9 Submission Packaging`
192
192
  - `P10 Retrospective`
193
193
 
@@ -207,7 +207,7 @@ Maintain exactly one active developer session at a time.
207
207
  - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
208
208
  - from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
209
209
  - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
210
- - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `sonnet` for normal work, escalate to `opus` only when the planning/debugging/security difficulty genuinely justifies it, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
210
+ - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `opus` with `medium` effort for normal work, raise to `opus` with `xhigh` effort only when the planning/debugging/security difficulty genuinely justifies it, use `sonnet` with `medium` effort for documentation-heavy or otherwise simpler work, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
211
211
  - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
212
212
  - when `P7` begins, do not automatically switch away from `develop-N`
213
213
  - each fresh evaluation result decides the remediation lane:
@@ -387,6 +387,8 @@ To the developer, this should feel like a normal engineering conversation with a
387
387
  - prefer one strong correction request over many tiny nudges
388
388
  - keep work moving without low-information continuation chatter
389
389
  - read only what is needed to answer the current decision
390
+ - keep routine review inside the main owner session; use `Explore` or `General` review subagents only when the file-reading surface is large enough that parallel bounded reads will materially reduce token waste
391
+ - when using review subagents, give each one a narrow file set or question, then synthesize their findings in the main session instead of turning the whole review over to them
390
392
  - at planning, scaffold, development, integrated-verification, hardening, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
391
393
  - keep comments and metadata auditable and specific
392
394
  - keep external docs owner-maintained and repo-local README developer-maintained
@@ -400,7 +402,7 @@ To the developer, this should feel like a normal engineering conversation with a
400
402
  - keep transcript files and hook logs for debugging and export analysis, but do not feed raw Claude transcript JSON back into the owner session
401
403
  - constrain the Claude worker to the single-session developer lane by using the packaged live bridge scripts with bypassed local permission prompts
402
404
  - if the saved Claude worker session becomes unusable, stop and recover explicitly instead of silently replacing it
403
- - after each bridge launch or turn, read bridge `state.json` and mirror the relevant fields into `../.ai/metadata.json`, `../metadata.json`, and Beads comments before advancing workflow state
405
+ - after each bridge launch or turn, read bridge `state.json`, mirror workflow/session fields into `../.ai/metadata.json`, keep `../metadata.json` limited to its exact seven project-fact keys, and update Beads comments before advancing workflow state
404
406
  - when metadata disagrees with bridge `state.json`, repair metadata from the bridge state before continuing
405
407
  - treat bridge-managed Claude lanes as owner-controlled and do not manually type into them during ordinary workflow operation
406
408
  - at every stage exit, require the result to be checked against the relevant accepted plan sections and an explicit stage-exclusive checklist before accepting it
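The metadata split described in the list above might look like this sketch. Only `session_id` and `cwd` are bridge `state.json` fields named in this diff. The function name, the merge order, and filling absent project keys with empty strings are all assumptions.

```javascript
// Hypothetical sketch of the metadata split after a bridge launch or turn.
const PROJECT_FACT_KEYS = [
  "prompt", "project_type", "frontend_language", "backend_language",
  "database", "frontend_framework", "backend_framework",
];

function splitBridgeState(bridgeState, aiMetadata, projectMetadata) {
  // Workflow/session fields go to ../.ai/metadata.json; bridge state is spread
  // last so it wins wherever the two disagree.
  const nextAi = { ...aiMetadata, ...bridgeState };

  // ../metadata.json is rebuilt from the seven project-fact keys only, so any
  // session field that leaked into it is dropped.
  // Assumption: absent keys become empty strings.
  const nextProject = {};
  for (const key of PROJECT_FACT_KEYS) {
    nextProject[key] = projectMetadata[key] ?? "";
  }
  return { aiMetadata: nextAi, projectMetadata: nextProject };
}
```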
@@ -411,6 +413,11 @@ To the developer, this should feel like a normal engineering conversation with a
411
413
 
412
414
  All Claude developer lane launch and turn actions should go through the packaged scripts in `~/slopmachine/utils/`.
413
415
 
416
+ Evaluation-prompt rule:
417
+
418
+ - backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
419
+ - the test-coverage prompt must be sent verbatim with no additions or reductions
420
+
414
421
  Operation map:
415
422
 
416
423
  - launch live worker lane:
@@ -423,7 +430,8 @@ Operation map:
423
430
  - `node ~/slopmachine/utils/claude_live_stop.mjs`
424
431
  - package the Claude project session folder for final delivery as one root zip bundle:
425
432
  - `node ~/slopmachine/utils/package_claude_session.mjs`
426
- - this resolves the Claude project folder from the tracked `session_id` plus the project `cwd` under `~/.claude/projects/` and packages it once rather than per tracked session id
433
+ - this resolves the tracked relevant Claude session artifacts from the tracked `session_id` values plus the project `cwd` under `~/.claude/projects/`, packages only those tracked session files/directories once, and avoids sweeping unrelated random Claude sessions into the archive
434
+ - after Claude session packaging is fully complete, stop each tracked live Claude lane with `node ~/slopmachine/utils/claude_live_stop.mjs --runtime-dir <dir>` and verify the tmux session is gone before closing `P9`
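Restricting the bundle to tracked sessions could look like the sketch below. It assumes Claude Code keeps each session transcript as `<session_id>.jsonl`, plus an optional `<session_id>/` directory, inside the per-project folder under `~/.claude/projects/`. That layout and the helper name `selectTrackedSessionEntries` are assumptions, not details taken from `package_claude_session.mjs`.

```javascript
// Hypothetical: keep only the entries of one ~/.claude/projects/<project>/
// folder that belong to a tracked session id. Assumes transcripts are named
// `<session_id>.jsonl` and any per-session directory is named `<session_id>`.
function selectTrackedSessionEntries(entryNames, trackedSessionIds) {
  // Drop empty ids and duplicates.
  const tracked = new Set(trackedSessionIds.filter(Boolean));
  return entryNames
    .filter((name) => {
      const id = name.endsWith(".jsonl") ? name.slice(0, -".jsonl".length) : name;
      return tracked.has(id);
    })
    .sort(); // deterministic archive order
}
```

Everything else in the folder is left out of the zip bundle, including sessions from unrelated runs.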
427
435
 
428
436
  Timeout rule:
429
437
 
@@ -462,6 +470,6 @@ Trace convention:
462
470
  Repeat this rule before closing your work for the turn:
463
471
 
464
472
  - if clarification is not yet complete and ready for `P2`, do not stop
465
- - if `P8 Final Human Decision` has not been reached, do not stop
466
- - do not pause for summaries, status, permission, or handoff chatter outside those two gates
473
+ - if packaging and retrospective are not yet complete, do not stop
474
+ - do not pause for summaries, status, permission, or handoff chatter unless an irrecoverable blocker truly requires external input
467
475
  - when in doubt, continue execution and make the best prompt-faithful decision from the evidence in front of you
package/assets/agents/slopmachine.md CHANGED
@@ -35,7 +35,7 @@ You are the operational engine, not the primary coder.
35
35
 
36
36
  ## Non-Stop Execution Warning
37
37
 
38
- Outside the two allowed human gates, you must not stop execution.
38
+ You must not stop execution for planned human input once the workflow starts.
39
39
 
40
40
  - do not stop to give status updates
41
41
  - do not stop to ask what to do next
@@ -43,12 +43,11 @@ Outside the two allowed human gates, you must not stop execution.
43
43
  - do not stop to hand control back early
44
44
  - do not stop just because a phase changed or a summary is available
45
45
 
46
- The only allowed human-stop moments are:
46
+ Planned human-stop moments do not exist.
47
47
 
48
- - when clarification is complete and the run is ready to enter `P2 Planning`
49
- - `P8 Final Human Decision`
50
-
51
- If you are not at one of those two gates, continue working.
48
+ - clarification is an internal owner phase, not a user approval pause
49
+ - `P8 Final Readiness Decision` is an internal owner readiness decision, not a user approval pause
50
+ - continue autonomously from intake through packaging and retrospective unless you hit an irrecoverable blocker that truly requires new external input
52
51
 
53
52
  ## Core Role
54
53
 
@@ -80,6 +79,9 @@ Agent-integrity rule:
80
79
  - use `developer` for codebase implementation work
81
80
  - use `General` for internal validation, evaluation, or non-code support tasks
82
81
  - use `Explore` for focused repo investigation when needed
82
+ - keep most review, verification interpretation, and acceptance decisions in the main owner session
83
+ - when verifying developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded slices in parallel so the main session saves tokens
84
+ - do not offload ordinary small reviews or the final acceptance judgment; the main owner session should synthesize the evidence and make the decision
83
85
  - if the work does not fit those agents, do it yourself with your own tools
84
86
 
85
87
  ## Optimization Goal
@@ -155,16 +157,14 @@ If you do work for a phase before loading its required skill, that is a workflow
155
157
 
156
158
  ## Human Gates
157
159
 
158
- Execution may stop for human input only at two points:
159
-
160
- - when clarification is complete and the run is ready to enter `P2 Planning`
161
- - `P8 Final Human Decision`
160
+ There are no planned human-stop gates during ordinary execution.
162
161
 
163
- Outside those two moments, do not stop for approval, signoff, or intermediate permission.
164
- Outside those two moments, do not stop just to report status, summarize progress, ask what to do next, or hand control back early.
162
+ - do not stop for approval, signoff, continuation confirmation, or intermediate permission
163
+ - do not stop just to report status, summarize progress, ask what to do next, or hand control back early
164
+ - treat clarification completion and `P8 Final Readiness Decision` as internal transitions that must roll forward automatically
165
+ - only interrupt the user when an irrecoverable external blocker truly prevents autonomous continuation, such as missing external credentials, unavailable required infrastructure you cannot repair, or conflicting new human edits that require direction
165
166
 
166
- If the work is outside those two gates, continue execution and make the best prompt-faithful decision from the available evidence.
167
- If work is still in flight outside those two gates, your default is to continue autonomously until the phase objective or the next required gate is actually reached.
167
+ If work is still in flight and no irrecoverable blocker exists, continue autonomously until packaging and retrospective are complete.
168
168
 
169
169
  ## Lifecycle Model
170
170
 
@@ -177,7 +177,7 @@ Use these exact root phases:
177
177
  - `P5 Integrated Verification`
178
178
  - `P6 Hardening`
179
179
  - `P7 Evaluation and Fix Verification`
180
- - `P8 Final Human Decision`
180
+ - `P8 Final Readiness Decision`
181
181
  - `P9 Submission Packaging`
182
182
  - `P10 Retrospective`
183
183
 
@@ -188,7 +188,7 @@ Phase rules:
188
188
  - do not close multiple root phases in one transition block
189
189
  - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
190
190
  - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
191
- - post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Human Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
191
+ - post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Readiness Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
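The reopen rule can be sketched as a small state transition. The phase names come from the list above. The `{ current_phase, completed }` state shape and both function names are hypothetical simplifications, not the workflow's real metadata format.

```javascript
// Illustrative tail of the root phase list as renamed in 0.7.5.
const TAIL_PHASES = [
  "P7 Evaluation and Fix Verification",
  "P8 Final Readiness Decision",
  "P9 Submission Packaging",
  "P10 Retrospective",
];

// Post-packaging external feedback reopens P7; P8, P9 and P10 must run
// again, so they are removed from the completed set.
function reopenForExternalFeedback(state) {
  const completed = state.completed.filter((phase) => !TAIL_PHASES.includes(phase));
  return { current_phase: TAIL_PHASES[0], completed };
}

// P8 is an internal readiness decision, so the owner rolls forward with no
// user pause. Returns the next tail phase, or null after P10.
function nextTailPhase(current) {
  const i = TAIL_PHASES.indexOf(current);
  if (i === -1) throw new Error(`not a tail phase: ${current}`);
  return i + 1 < TAIL_PHASES.length ? TAIL_PHASES[i + 1] : null;
}
```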
192
192
 
193
193
  ## Developer Session Model
194
194
 
@@ -379,6 +379,8 @@ Do not speak as a relay for a third party.
379
379
  - avoid `grep` by default; prefer `glob` to identify exact files and `read` with targeted offsets
380
380
  - use `grep` only for an exact low-cardinality string after the relevant file set is already known
381
381
  - do not run broad parent-root searches during ordinary review when exact project files are already known
382
+ - keep routine review inside the main owner session; use review subagents only when the file-reading surface is large enough that parallel bounded reads will materially reduce token waste
383
+ - when using review subagents, give each one a narrow file set or question, then synthesize their findings in the main session instead of turning the whole review over to them
382
384
  - for planning review, start with `README.md`, parent-root `../docs/design.md`, and parent-root `../docs/test-coverage.md`, then read only the specific supporting docs needed to answer the current gate question
383
385
  - when a planning defect is about one document contract, read that document and the smallest number of cross-check docs needed to confirm it; do not fan out across the whole planning set
384
386
  - prefer section-targeted reads over whole-document rereads when the relevant section is already known
@@ -407,6 +409,8 @@ Treat packaging as a first-class delivery contract from the start, not as late c
407
409
 
408
410
  - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
409
411
  - the packaged source copies of those prompts live under `assets/slopmachine/`, and the installed runtime copies live under `~/slopmachine/`; ordinary evaluation runs should use the installed runtime copies
412
+ - backend and frontend evaluation prompts may only be changed by injecting the original project prompt into `{prompt}`; otherwise send them verbatim
413
+ - the test-coverage prompt must be sent verbatim with no additions or reductions
410
414
  - load `submission-packaging` before any packaging action
411
415
  - follow its exact artifact, export, cleanup, and output contract
412
416
  - do not invent extra artifact structures during ordinary packaging
@@ -426,8 +430,8 @@ After `P9 Submission Packaging` closes successfully:
426
430
  Repeat this rule before closing your work for the turn:
427
431
 
428
432
  - if clarification is not yet complete and ready for `P2`, do not stop
429
- - if `P8 Final Human Decision` has not been reached, do not stop
430
- - do not pause for summaries, status, permission, or handoff chatter outside those two gates
433
+ - if packaging and retrospective are not yet complete, do not stop
434
+ - do not pause for summaries, status, permission, or handoff chatter unless an irrecoverable blocker truly requires external input
431
435
  - when in doubt, continue execution and make the best prompt-faithful decision from the evidence in front of you
432
436
 
433
437
  The workflow is not done until:
package/assets/claude/agents/developer.md CHANGED
@@ -21,6 +21,7 @@ Read and follow `CLAUDE.md` before implementing.
21
21
  - do real verification, not confidence theater
22
22
  - keep moving until the assigned work is materially complete or concretely blocked
23
23
  - do not stop for unnecessary intermediate check-ins
24
+ - once given a bounded engineering objective, keep going autonomously until that objective or explicit stop boundary is complete; do not pause for reassurance or permission when prompt-faithful defaults let you proceed
24
25
 
25
26
  ## Requirements And Planning
26
27
 
@@ -31,7 +32,7 @@ Before coding:
31
32
  - make the important business rules explicit before coding, including defaults, thresholds, limits, uniqueness, conflicts, reversals, retry behavior, and ownership rules when those dimensions matter
32
33
  - define or confirm the relevant state machine when the feature has meaningful lifecycle state
33
34
  - keep explicit out-of-scope boundaries in mind so you do not overbuild speculative features
34
- - surface meaningful ambiguity instead of silently guessing
35
+ - surface meaningful ambiguity only when it is genuinely blocking or materially changes the product contract; otherwise choose the safest prompt-faithful default and keep moving
35
36
  - make the plan concrete enough to drive real implementation
36
37
  - keep frontend/backend surfaces aligned when both sides matter
37
38
 
package/assets/skills/beads-operations/SKILL.md CHANGED
@@ -43,7 +43,7 @@ Use comments with fixed prefixes such as:
43
43
 
44
44
  - use explicit dependencies only for real sibling or cross-phase gating
45
45
  - do not add explicit dependencies from a parent Beads item to its own child Beads item
46
- - technical blockers may set Beads items to `blocked`, but they must not create new human-stop points unless the workflow is at the initial clarification approval or final human decision
46
+ - technical blockers may set Beads items to `blocked`, but they must not create new human-stop points; only irrecoverable blockers that truly require external input may interrupt execution
47
47
 
48
48
  ## Forbidden workflow-state shortcuts
49
49
 
package/assets/skills/clarification-gate/SKILL.md CHANGED
@@ -22,13 +22,15 @@ Use this skill only during `P1 Clarification`.
22
22
  - keep clarification work inside `P1`
23
23
  - treat this as internal clarification workflow guidance, not developer-visible text
24
24
  - do not start planning or developer launch while clarification is still active
25
- - stop for human approval only after the clarification artifact is ready and validated
25
+ - do not stop for human approval after the clarification artifact is ready and validated; lock prompt-faithful defaults and continue autonomously into planning
26
+ - if a prompt-faithful safe default exists, choose it and keep moving instead of asking the user to unblock you
26
27
 
27
28
  ## Clarification standard
28
29
 
29
30
  - preserve the full original prompt text in parent-root `../metadata.json` under `prompt`
30
31
  - if the user appended stack/context lines after the prompt block, keep those out of `prompt` and treat them as separate startup context
31
32
  - fill known project metadata fields in `../metadata.json` from the prompt and any defensible existing-repo evidence while clarification is in progress
33
+ - keep `../metadata.json` on this exact project-only schema and do not add any other keys: `prompt`, `project_type`, `frontend_language`, `backend_language`, `database`, `frontend_framework`, `backend_framework`
32
34
  - repair or normalize meaning-bearing metadata fields when this is a resume or adopted-project flow
33
35
  - decompose the prompt thoroughly into explicit requirements, implied requirements, user flows, constraints, boundaries, risks, quality expectations, and verification expectations
34
36
  - build an owner-only intake package in `../.ai/pre-planning-brief.md` that captures at least:
@@ -55,7 +57,7 @@ Use this skill only during `P1 Clarification`.
55
57
  - keep clarification aligned with the original prompt
56
58
  - do not let clarification reduce, weaken, narrow, or silently reinterpret the prompt
57
59
  - use clarification to sharpen the build and improve output quality only when that improvement stays fully consistent with the prompt intent
58
- - do not start tracked development until the human approval step is complete
60
+ - once clarification is strong enough and validated, start tracked development without waiting for human approval
59
61
 
60
62
  Before planning begins, do a deliberate internal gap sweep across at least these categories and capture the important unresolved items in the owner-only intake package when they matter:
61
63
 
@@ -193,7 +195,7 @@ Even when ambiguity is low, still perform a serious clarification sweep first; d
193
195
  - keep the validation loop bounded and intentional; prefer one strong pass plus a small number of revision cycles over repeated loose churn
  - once prompt-faithfulness is satisfied and the remaining notes are minor or cosmetic, stop iterating and proceed
  - only treat the clarification prompt as approved for developer use after this validation loop passes and your own review agrees
- - requesting human approval before this validation loop passes is illegal
+ - treating an unvalidated clarification artifact as ready is illegal; finish the validation loop before continuing
 
  ## Exit conditions
 
@@ -204,4 +206,4 @@ Even when ambiguity is low, still perform a serious clarification sweep first; d
  - material ambiguities are resolved or safely locked and documented
  - `../docs/questions.md` exists and reflects the accepted clarification record
  - prompt drift has been checked and rejected
- - human approval exists
+ - the clarification record is internally accepted and ready to roll directly into planning without a user-stop pause
@@ -56,10 +56,11 @@ node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --run
  ## Model selection rule
 
  - choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
- - default to `--model sonnet` for ordinary planning, scaffold, development, and routine bugfix work
- - escalate to `--model opus` only for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+ - default to `--model opus --effort medium` for ordinary planning, scaffold, development, and routine bugfix work
+ - escalate to `--model opus --effort xhigh` for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+ - use `--model sonnet --effort medium` for documentation-heavy, lightweight, or otherwise materially simpler work where the lower-cost lane is sufficient
  - keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
- - when the task difficulty warrants it, also pass an explicit `--effort <level>` at launch time rather than hoping the default thinking level is ideal
+ - pass an explicit `--effort <level>` at launch time instead of relying on the CLI default; `medium` is the normal baseline and `xhigh` is the difficult-task override
  - keep the chosen `model`, `effort`, and `subagent_model` recorded in bridge state so later recovery and review can see what launched the lane
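The model selection rule amounts to a small lookup from task difficulty to an explicit model/effort pair. A minimal Python sketch, assuming hypothetical difficulty labels (`simple`, `ordinary`, `difficult`) that slopmachine itself does not define, and assuming the launcher accepts the `--model`, `--effort`, and `--subagent-model` flags named in the rule; `launch_flags` and `bridge_record` are illustrative names only.

```python
# Hypothetical difficulty labels mapped to the (model, effort) pairs named in
# the rule above; slopmachine does not define these label names.
LANE_PROFILES = {
    "simple": ("sonnet", "medium"),   # documentation-heavy or materially simpler work
    "ordinary": ("opus", "medium"),   # planning, scaffold, development, routine bugfix
    "difficult": ("opus", "xhigh"),   # hard planning, security hardening, tangled debugging
}


def launch_flags(difficulty: str, subagent_model: str = "sonnet") -> list[str]:
    """Build explicit flags so the lane never relies on an implicit CLI default."""
    model, effort = LANE_PROFILES[difficulty]
    return ["--model", model, "--effort", effort, "--subagent-model", subagent_model]


def bridge_record(difficulty: str, subagent_model: str = "sonnet") -> dict[str, str]:
    """Fields to store in bridge state so recovery can see what launched the lane."""
    model, effort = LANE_PROFILES[difficulty]
    return {"model": model, "effort": effort, "subagent_model": subagent_model}
```

Joined with spaces, `launch_flags("difficult")` gives `--model opus --effort xhigh --subagent-model sonnet`. Deriving both the flags and the bridge record from one table keeps what was launched and what was recorded from drifting apart.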
 
  The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.
@@ -192,7 +193,7 @@ When resuming a long-lived lane:
 
  For evaluator-driven remediation inside a `bugfix-N` session opened by a `partial pass` audit:
 
- - lead with the concrete evaluator finding or owner-reviewed issue statement
+ - lead with the concrete evaluator finding or reviewed issue statement
  - state the expected fix and the affected non-regression surfaces
  - require proof for the issue path plus the nearby happy path and security/ownership boundary when relevant
  - say to stop after the named issue set rather than reopening unrelated refactors
@@ -340,7 +341,7 @@ Recommended additional fields when useful:
 
  Bridge lane state is the authoritative transport state for Claude-backed developer work.
 
- After each meaningful bridge action, immediately read bridge `state.json` and mirror the important fields into `../.ai/metadata.json`, `../metadata.json`, and Beads comments before advancing workflow state.
+ After each meaningful bridge action, immediately read bridge `state.json`, mirror workflow/session fields into `../.ai/metadata.json`, keep `../metadata.json` limited to its exact project-fact schema, and update Beads comments before advancing workflow state.
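One way to read the mirroring rule is as a one-way copy of selected bridge fields into the owner's workflow metadata, with project metadata kept out of the function entirely. A minimal sketch, assuming `state.json` and `../.ai/metadata.json` are flat JSON objects (a simplification); the key names in `WORKFLOW_KEYS` are guesses drawn from fields mentioned in this document, not a confirmed bridge schema, and `mirror_bridge_state` is a hypothetical helper.

```python
# Illustrative bridge field names taken from this document; the real
# state.json schema may use different or additional keys.
WORKFLOW_KEYS = ("session_id", "transcript_path", "model", "effort", "subagent_model")


def mirror_bridge_state(state: dict, workflow_meta: dict) -> dict:
    """Return an updated copy of the ../.ai/metadata.json content.

    Only workflow/session fields are mirrored. Project metadata
    (../metadata.json) is deliberately not an input, so session state
    cannot leak into it through this path.
    """
    updated = dict(workflow_meta)  # never mutate the caller's dict
    for key in WORKFLOW_KEYS:
        if key in state:
            updated[key] = state[key]
    return updated
```

A caller would `json.load` the bridge `state.json` and `../.ai/metadata.json`, pass both in, and write the result back to `../.ai/metadata.json` only.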
 
  ### After lane launch
 
@@ -362,7 +363,6 @@ After each meaningful bridge action, immediately read bridge `state.json` and mi
  - `transcript_path`
  - `opened_from_audit_number` when the session was opened from a `partial pass` audit
  - `orientation_completed: false`
- - mirror `session_id` into `../metadata.json` as `session_id`
  - record the session in Beads using `SESSION:`
 
  ### After each successful turn
@@ -10,7 +10,7 @@ Use this skill at `P1` startup, whenever workflow/session state is uncertain, an
  ## Purpose
 
  - verify that workflow state is consistent before real owner work continues
- - keep `../.ai/metadata.json`, `../metadata.json`, and Beads session comments aligned
+ - keep `../.ai/metadata.json`, the exact project-only `../metadata.json`, and Beads session comments aligned without leaking workflow state into project metadata
  - manage the universal developer-session policy across the whole run
  - recover deterministic continuity for existing sessions instead of guessing
 
@@ -20,7 +20,7 @@ Use this skill at `P1` startup, whenever workflow/session state is uncertain, an
  - deterministic workspace creation belongs in `slopmachine init`, not in this skill
  - clarification execution belongs to `clarification-gate`, not to this skill
  - the first planning handshake belongs to `P2` plus the worker-management skill, not to this skill
- - do not create extra human stops, status hand-backs, or permission pauses beyond the two allowed gates
+ - do not create human stops, status hand-backs, or permission pauses during ordinary execution; only irrecoverable blockers may interrupt the run
 
  ## State inspection sequence
 
@@ -146,25 +146,25 @@ Each `evaluation_runs[]` record should include enough to recover deterministic `
  - `routed_developer_label`
  - `started_bugfix_session_id`
  - `started_bugfix_label`
- - `fix_check_paths`
+ - `fix_check_path`
  - `status`
  - `completed_at`
 
  ## Project metadata fields
 
- Keep `../metadata.json` focused on project facts and exported project metadata, including when relevant:
+ Keep `../metadata.json` focused on project facts and exported project metadata with this exact schema only:
 
  - `prompt`
  - `project_type`
  - `frontend_language`
  - `backend_language`
  - `database`
- - `session_id`
  - `frontend_framework`
  - `backend_framework`
 
+ - use only `fullstack`, `backend`, `android`, `ios`, `desktop`, or `web` for `project_type` when known
  - fill known values early and keep them current
- - prefer explicit values; use `null` only when a field is genuinely unknown or not applicable
+ - prefer explicit values; use empty strings instead of `null` or extra workflow fields
  - do not use `../metadata.json` as owner workflow scratch state
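The exact-schema rule can be checked mechanically before each write. A minimal sketch of a validator for `../metadata.json`; treating every key as required is an assumption (the rule says the schema is exact, not whether keys may be omitted), and `validate_project_metadata` and `empty_project_metadata` are hypothetical helpers.

```python
import json

# Exact project-only key set for ../metadata.json, in the order listed above.
PROJECT_KEYS = (
    "prompt", "project_type", "frontend_language", "backend_language",
    "database", "frontend_framework", "backend_framework",
)
PROJECT_TYPES = {"fullstack", "backend", "android", "ios", "desktop", "web"}


def validate_project_metadata(meta: dict) -> list[str]:
    """Return schema problems; an empty list means the metadata is acceptable."""
    problems = []
    extra = sorted(set(meta) - set(PROJECT_KEYS))
    missing = [key for key in PROJECT_KEYS if key not in meta]
    if extra:
        problems.append(f"unexpected keys: {extra}")
    if missing:
        problems.append(f"missing keys: {missing}")
    for key in PROJECT_KEYS:
        value = meta.get(key, "")
        if not isinstance(value, str):
            # Unknown values are empty strings, never null.
            problems.append(f"{key} must be a string, got {type(value).__name__}")
    project_type = meta.get("project_type", "")
    if isinstance(project_type, str) and project_type and project_type not in PROJECT_TYPES:
        problems.append(f"invalid project_type: {project_type!r}")
    return problems


def empty_project_metadata(prompt: str) -> str:
    """Serialize fresh metadata with every not-yet-known field as an empty string."""
    meta = {key: "" for key in PROJECT_KEYS}
    meta["prompt"] = prompt
    return json.dumps(meta, indent=2)
```

A non-empty result, such as one reporting an unexpected `session_id` key, signals that workflow state has leaked into project metadata and belongs in `../.ai/metadata.json` instead.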
 
  ## Session-lane model
@@ -172,7 +172,9 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
  - keep exactly one active developer session at a time
  - record every developer session in `developer_sessions`
  - from `P2` through `P6`, default to one long-lived `develop-1` lane
- - default the launch model for that long-lived lane to `sonnet`; choose `opus` only when the current lane's work is genuinely high-difficulty enough to justify a more expensive launch
+ - default the launch model for that long-lived lane to `opus` with `medium` effort
+ - raise that lane to `opus` with `xhigh` effort only when the work is genuinely difficult enough to justify it
+ - when launching a documentation-heavy or otherwise materially simpler lane, prefer `sonnet` with `medium` effort
  - if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
  - keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
  - keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically
@@ -222,7 +224,7 @@ On recovery, inspect at least:
 
  - store the active developer session id in Beads comments using `SESSION:`
  - mirror the active developer session id in `../.ai/metadata.json`
- - mirror the active developer session id in `../metadata.json` as `session_id`
+ - do not write session ids or other workflow-only state into `../metadata.json`
  - for Claude-backed sessions, include backend, runtime directory, tmux session name, and transcript path in the recorded session state so recovery and export remain deterministic
  - if these records disagree, repair them before continuing
  - do not silently create a replacement developer session if the intended existing one can still be recovered
@@ -253,7 +255,7 @@ For live Claude lanes specifically:
  - parent-root `../docs/` is the owner-maintained external documentation directory
  - parent-root `../sessions/` is the cleaned raw session-export directory for non-Claude developer sessions
  - Claude-backed developer sessions are packaged once as parent-root `claude-sessions.zip` instead of per-session `../sessions/` entries
- - parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check-<M>.md`, and `test_coverage_and_readme_audit_report.md`
+ - parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check.md`, and `test_coverage_and_readme_audit_report.md`
  - parent-root `../.ai/claude-live/` is the live Claude bridge runtime directory root
  - `../docs/questions.md` is the mandatory clarification record artifact
  - do not treat repo-local `docs/` as the active external documentation location
@@ -82,6 +82,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
  - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
  - local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
+ - do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the slice is considered complete
  - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
  - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
  - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
@@ -59,8 +59,9 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
  - that same evaluator session should receive only the exact audit-scoped issue list or the current unresolved subset
  - that same evaluator session should only confirm whether those exact earlier items are fixed; it should not perform a broader new review
  - the follow-up report should describe what is resolved, what remains open, and any important verification caveats
- - store follow-up reports as `../.tmp/audit_report-<N>-fix_check-<M>.md`
+ - store the follow-up report as `../.tmp/audit_report-<N>-fix_check.md`
  - do not rewrite the report text after generation except for file moves and filename normalization
+ - when the bugfix loop needs another pass, replace the existing `audit_report-<N>-fix_check.md` so it always reflects the latest whole-audit status rather than one narrow loop pass
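The replace-in-place rule means each audit number owns exactly one fix-check file, overwritten on every pass. A minimal sketch, assuming the evaluator has already written its report to some working path on the same filesystem as `../.tmp/`; `install_fix_check` is a hypothetical helper. It only moves and renames the file, which fits the rule that report text is not rewritten after generation.

```python
import os
from pathlib import Path


def install_fix_check(generated_report: Path, tmp_dir: Path, audit_number: int) -> Path:
    """Move a freshly generated fix-check report over the audit's single report file.

    The report text is not modified; only the file is moved and renamed.
    """
    target = tmp_dir / f"audit_report-{audit_number}-fix_check.md"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    # os.replace silently overwrites an existing file, so earlier passes disappear.
    # It may fail if the two paths are on different filesystems.
    os.replace(generated_report, target)
    return target
```

On POSIX the rename is atomic, so a reader of `../.tmp/` sees either the previous pass or the latest one, never a half-written report.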
 
  ## Scope discipline
 
@@ -75,4 +76,4 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
  - allow at most 3 remediation attempts for the coverage/README audit; after the third attempt, keep the latest report as the final carried-forward evidence
  - do not move to `P8` until 2 bugfix sessions have been completed and the final coverage/README report exists from that last `P7` subphase
  - keep only partial-pass audit reports under `../.tmp/audit_report-<N>.md`
- - for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`
+ - for each bugfix session, keep its starting partial-pass audit report and its single replace-in-place `audit_report-<N>-fix_check.md` report together by shared audit number in `../.tmp/`
@@ -43,10 +43,8 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
  - number every fresh evaluation audit sequentially across the whole run for routing and metadata purposes
  - persist `../.tmp/audit_report-<N>.md` only for `partial pass` audits that actually open bugfix sessions
  - if a fresh audit is `fail` or `pass`, extract what you need from the generated working report, record the verdict and routing in metadata, and then discard the report file instead of leaving it in `../.tmp/`
- - for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
- - `../.tmp/audit_report-<N>-fix_check-1.md`
- - `../.tmp/audit_report-<N>-fix_check-2.md`
- - and so on
+ - for a `partial pass` audit that opens a bugfix session, keep one replace-in-place fix-check report under that audit number:
+ - `../.tmp/audit_report-<N>-fix_check.md`
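The retention rule reduces to a branch on the verdict: only `partial pass` reports survive, under their audit number. A minimal sketch, assuming the verdict strings `partial pass`, `fail`, and `pass` exactly as written in this document, and assuming the caller has already extracted the verdict and issue list it needs; `settle_audit_report` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def settle_audit_report(
    working_report: Path, tmp_dir: Path, audit_number: int, verdict: str
) -> Optional[Path]:
    """Keep partial-pass reports as audit_report-<N>.md; discard fail/pass reports.

    Call this only after the verdict and any issue list have been extracted.
    """
    if verdict == "partial pass":
        tmp_dir.mkdir(parents=True, exist_ok=True)
        kept = tmp_dir / f"audit_report-{audit_number}.md"
        working_report.replace(kept)  # move and normalize the filename only
        return kept
    if verdict in ("fail", "pass"):
        working_report.unlink(missing_ok=True)
        return None
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Raising on an unrecognized verdict, rather than silently discarding, keeps a malformed evaluator result from erasing the only copy of a report.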
 
  ## Evaluator-session model
 
@@ -72,18 +70,14 @@ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation
 
  For each fresh audit:
 
- - compose the chosen evaluation prompt yourself; do not tell the evaluator to read prompt files on its own
- - use the original project prompt from metadata
+ - do not prefix, suffix, summarize, or otherwise rewrite the chosen evaluation prompt yourself
  - read the chosen evaluation prompt file contents yourself before launching evaluation
- - compose one large final prompt block
- - prefix the request with a clear instruction that the reviewer must work in the current project directory and evaluate the delivered project
- - make sure `README.md` and the code inside the current project directory are sufficient for evaluation; do not assume the evaluator will rely on parent-root docs or sibling workflow artifacts
- - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
- - send that fully composed text block directly to one fresh `General` evaluator session
- - require that session to produce a detailed file-backed audit report plus an issue summary
+ - inject only the full original project prompt into the `{prompt}` placeholder and leave the rest of the template body unchanged
+ - send that resulting evaluation prompt text verbatim to one fresh `General` evaluator session with zero additions or reductions beyond the `{prompt}` substitution
+ - let the prompt itself define the evaluator output contract; do not append extra response requirements outside the prompt body
  - assign the next audit number
  - if and only if the verdict is `partial pass`, keep the normalized report path as `../.tmp/audit_report-<N>.md`
- - if the verdict is `fail` or `pass`, discard the generated report file after extracting the issue summary or verdict you need
+ - if the verdict is `fail` or `pass`, discard the generated report file after extracting the verdict or issue list you need from the evaluator result and/or report contents
  - record the evaluator session id, prompt kind, audit number, verdict, kept-or-discarded report status, and routing decision in metadata
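Because the chosen evaluation prompt must go out verbatim apart from the `{prompt}` substitution, plain string replacement is safer than template formatting. A minimal sketch under that reading; `build_evaluation_prompt` is a hypothetical helper, not a description of any slopmachine utility.

```python
def build_evaluation_prompt(template: str, project_prompt: str) -> str:
    """Substitute the {prompt} placeholder and leave the rest of the template untouched.

    str.replace is used instead of str.format because str.format would also try to
    interpret any other braces in the template body (for example a JSON example).
    """
    if "{prompt}" not in template:
        raise ValueError("evaluation template has no {prompt} placeholder")
    return template.replace("{prompt}", project_prompt)
```

With `str.format`, a template containing `{"verdict": "..."}` would raise `KeyError`; `str.replace` touches only the literal placeholder, so the template body reaches the evaluator exactly as written.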
 
  ## Fresh-audit branching rule
@@ -123,17 +117,16 @@ Inside a `partial pass` audit's bugfix loop:
  - after the developer claims the fixes are done, run a rough targeted owner-side verification pass on the affected behavior before asking for evaluator confirmation
  - then return to the same evaluator session and send only the exact issue list or current unresolved subset for scoped fix confirmation
  - require a file-backed fix-check report for that scoped verification pass
- - store each fix-check report as `../.tmp/audit_report-<N>-fix_check-<M>.md`
- - if unresolved issues remain, take only that unresolved subset back to the same bugfix session and repeat the same-session fix-check loop
+ - store the scoped fix-check report as `../.tmp/audit_report-<N>-fix_check.md`
+ - if unresolved issues remain, take only that unresolved subset back to the same bugfix session and repeat the same-session fix-check loop, replacing the same fix-check report each time so it always covers the whole audit issue list in its latest state
  - once all issues from `audit_report-<N>.md` are resolved, mark that bugfix session completed in metadata
 
  ## Post-bugfix coverage and README audit
 
  - after 2 bugfix sessions have been completed, do not leave `P7` yet; this audit is the last subphase inside `P7`
- - read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
  - launch a fresh `General` evaluator session for this audit
  - prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
- - compose the request yourself and make clear that the reviewer is working in the current project directory and must write the report to `../.tmp/test_coverage_and_readme_audit_report.md`
+ - send `~/slopmachine/test-coverage-prompt.md` verbatim with zero additions or reductions; do not prepend cwd notes, workflow notes, or custom audit instructions because that prompt already defines its own report path and audit workspace assumptions
  - before each rerun, remove or replace the previous `../.tmp/test_coverage_and_readme_audit_report.md`; do not keep numbered variants for this report
  - if the report finds any issue, treat that as blocking `P7` completion
  - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
@@ -155,10 +148,10 @@ Inside a `partial pass` audit's bugfix loop:
  - `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit has run as the last subphase of `P7`
  - the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
  - fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
- - after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Human Decision` with a clean report, otherwise move to `P8 Final Human Decision` with the latest final report after the third attempt
+ - after the second bugfix session completes, run the coverage/README audit; if it becomes clean within 3 remediation attempts, move to `P8 Final Readiness Decision` with a clean report, otherwise move to `P8 Final Readiness Decision` with the latest final report after the third attempt
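The `P7` exit rule can be sketched as a small decision function. A minimal sketch, assuming a hypothetical `P7State` record that the owner would derive from metadata, and assuming the coverage/README audit is rerun after each remediation attempt so the report always reflects the latest state; `p7_next_step` and its returned labels are illustrative.

```python
from dataclasses import dataclass

REQUIRED_BUGFIX_SESSIONS = 2
MAX_COVERAGE_REMEDIATIONS = 3


@dataclass
class P7State:
    completed_bugfix_sessions: int
    coverage_report_exists: bool
    coverage_issues_open: int
    coverage_remediation_attempts: int


def p7_next_step(state: P7State) -> str:
    """Return an illustrative next-step label for the P7 exit rule."""
    if state.completed_bugfix_sessions < REQUIRED_BUGFIX_SESSIONS:
        return "continue bugfix sessions"
    if not state.coverage_report_exists:
        return "run coverage/README audit"
    if state.coverage_issues_open == 0:
        return "P8 Final Readiness Decision (clean report)"
    if state.coverage_remediation_attempts >= MAX_COVERAGE_REMEDIATIONS:
        return "P8 Final Readiness Decision (latest report)"
    return "remediate and rerun coverage/README audit"
```

Checking the bugfix-session count first mirrors the rule that clean fresh audits never replace the two-session requirement.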
 
  ## Boundaries
 
- - this phase is owner-side evaluation orchestration, not the final human decision gate
+ - this phase is owner-side evaluation orchestration, not a user approval gate
  - keep audit numbering deterministic and monotonic across the whole run
  - do not reopen the old counted-cycle report-root model