npm - theslopmachine - Versions diffs - 0.7.6 → 0.7.7 - Mend

theslopmachine 0.7.6 → 0.7.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29) hide show

package/MANUAL.md +7 -8
package/README.md +7 -3
package/assets/agents/developer.md +30 -11
package/assets/agents/slopmachine-claude.md +32 -22
package/assets/agents/slopmachine.md +40 -26
package/assets/claude/agents/developer.md +28 -5
package/assets/skills/claude-worker-management/SKILL.md +34 -23
package/assets/skills/developer-session-lifecycle/SKILL.md +6 -4
package/assets/skills/development-guidance/SKILL.md +35 -28
package/assets/skills/evaluation-triage/SKILL.md +1 -1
package/assets/skills/hardening-gate/SKILL.md +4 -94
package/assets/skills/integrated-verification/SKILL.md +42 -41
package/assets/skills/planning-gate/SKILL.md +32 -6
package/assets/skills/planning-guidance/SKILL.md +37 -10
package/assets/skills/scaffold-guidance/SKILL.md +19 -5
package/assets/skills/submission-packaging/SKILL.md +3 -1
package/assets/skills/verification-gates/SKILL.md +36 -32
package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +1 -1
package/assets/slopmachine/templates/AGENTS.md +25 -6
package/assets/slopmachine/templates/CLAUDE.md +25 -6
package/assets/slopmachine/templates/plan.md +49 -0
package/assets/slopmachine/utils/claude_live_common.mjs +45 -0
package/assets/slopmachine/utils/claude_live_turn.mjs +2 -0
package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.mjs +9 -2
package/assets/slopmachine/workflow-init.js +6 -7
package/package.json +1 -1
package/src/constants.js +1 -0
package/src/init.js +41 -28
package/assets/slopmachine/utils/__pycache__/normalize_claude_session.cpython-311.pyc +0 -0

package/MANUAL.md CHANGED Viewed

@@ -62,14 +62,13 @@ slopmachine init -o
 1. Intake and setup
 2. Clarification
 3. Planning
-4. Scaffold/foundation
-5. Development
-6. Integrated verification
-7. Hardening
-8. Evaluation and fix verification, including the final coverage and README audit inside `P7`
-9. Final human decision
-10. Submission packaging
-11. Retrospective
+4. Minimal scaffold
+5. End-to-end development
+6. Integrated verification and hardening
+7. Evaluation and fix verification, including the final coverage and README audit inside `P7`
+8. Final readiness decision
+9. Submission packaging
+10. Retrospective
 ## Important notes

package/README.md CHANGED Viewed

@@ -40,9 +40,12 @@ From this package directory:
 npm install
 npm run check
 npm pack
-npm install -g ./theslopmachine-0.7.5.tgz
+npm install -g ./theslopmachine-<version>.tgz
+slopmachine setup
 ```
+If you install a freshly packed local tarball, rerun `slopmachine setup` before bootstrapping a workspace so the installed home assets under `~/slopmachine`, `~/.config/opencode/agents`, and `~/.agents/skills` are refreshed to match the package you just installed.
 For local development instead:
 ```bash
@@ -104,7 +107,7 @@ Current scaffold inventory includes:
   - native Swift iOS
   - native Objective-C iOS
-These playbooks are baseline-only scaffold references. Prompt-specific product behavior still begins after scaffold acceptance.
+These playbooks are baseline-only scaffold references. The redesigned workflow uses them to establish a thin but real scaffold baseline before the single broad implementation run begins.
 ### `slopmachine init`
@@ -139,7 +142,6 @@ What it creates:
 - `repo/`
 - `docs/`
 - `.tmp/`
-- `sessions/`
 - `metadata.json`
 - `.ai/metadata.json`
 - `.ai/pre-planning-brief.md`
@@ -148,6 +150,7 @@ What it creates:
 - `.ai/startup-context.md`
 - root `.beads/`
 - `repo/AGENTS.md`
+- `repo/plan.md`
 - `repo/.claude/settings.json`
 - `repo/CLAUDE.md` is not created by default, but `slopmachine-claude` may choose it during `P1`
 - `repo/README.md`
@@ -173,6 +176,7 @@ Important details:
 - when a later start phase is seeded for adoption or recovery, the Beads workflow phases before that requested phase are created and immediately marked completed so tracker state matches the seeded entry point
 - in the `slopmachine-claude` path, if adopted or resumed later-phase work has no recoverable tracked Claude developer session yet, the owner must launch and orient the needed Claude lane first and only then continue the substantive work in that same session
 - `--phase <PX>` seeds the initial `current_phase` for adoption/recovery bootstrap; the owner should still fall back if the real repo evidence does not support that later phase
+- `repo/plan.md` is seeded at bootstrap and becomes the definitive repo-local execution checklist during planning
 ### `slopmachine set-token`

package/assets/agents/developer.md CHANGED Viewed

@@ -25,12 +25,12 @@ You are a senior software engineer working inside a bounded execution session.
 Treat the current working directory as the project. Ignore files outside it unless explicitly asked to use them. Do not treat parent-directory workflow notes, session exports, or research folders as hidden implementation instructions.
-Read and follow `AGENTS.md` before implementing.
+Read and follow `AGENTS.md` before implementing. If `plan.md` exists and has been populated, treat it as the definitive execution checklist.
 ## Core Standard
 - think before coding
-- build in coherent vertical slices
+- build in coherent end-to-end workstreams
 - keep architecture intentional and reviewable
 - do real verification, not confidence theater
 - keep moving until the assigned work is materially complete or concretely blocked
@@ -64,9 +64,16 @@ If a simplification would make implementation easier but is not explicitly autho
 When accepted planning artifacts already exist, treat them as the primary execution contract.
-- read the relevant accepted plan section before implementing the next slice
+- read the relevant accepted plan section before implementing the next `plan.md` workstream
 - do not wait for the project lead to restate what is already in the plan
 - treat project-lead follow-up prompts mainly as narrow deltas, guardrails, or correction signals
+- if the current work is scaffold, treat the accepted scaffold playbook contract in `plan.md` as binding; do not re-choose the playbook, starter, or bootstrap path unless the project lead explicitly reopens planning
+- if scaffold instructions are still vague about the playbook or bootstrap command, raise that as a planning gap instead of improvising a new scaffold contract
+- treat the execution file tree and owned-file map in `plan.md` as real execution boundaries, not decorative planning notes
+- for adopted projects, inspect the current repo tree first and use the accepted `plan.md` delta tree rather than assuming a greenfield layout
+- keep `plan.md` main-session-owned during parallel work; branch tasks should report completion and let the main developer session update `plan.md` after merge
+- when `plan.md` marks independent sections as parallelizable, default to worktree-backed or branch-backed `Task` fan-out for those bounded sections when support exists, and otherwise still use parallel `Task` fan-out rather than serializing by habit
+- after any parallel fan-out, reconcile the work in the main developer session, verify the integrated result yourself, and only then mark the relevant `plan.md` items complete
 When the project lead asks for planning without coding yet:
@@ -87,13 +94,15 @@ When the project lead asks for planning without coding yet:
 - verify the changed area locally and realistically before reporting completion
 - when backend or fullstack API endpoints are added or changed, prefer real HTTP tests for the exact `METHOD + PATH` over controller or service bypasses when practical
 - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
-- when closing a slice, think briefly about what adjacent flows, runtime paths, or doc/spec claims this slice could have affected before claiming readiness
-- keep `README.md` as the only documentation file inside the repo unless the user explicitly asks for something else
+- when closing a `plan.md` workstream or bounded follow-up, think briefly about what adjacent flows, runtime paths, or doc/spec claims it could have affected before claiming readiness
+- keep `README.md` as the primary documentation file inside the repo; `plan.md` is the explicit execution-plan exception
+- treat `README.md` and other shared integration-heavy files as main-session-owned by default during parallel work unless the accepted plan explicitly delegates them
 - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
 - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
 - if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the project lead will catch inconsistencies later
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+- keep repo-root `./run_tests.sh` as the primary broad test entrypoint; do not relocate it into subdirectories or replace it with a different primary script path
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
 - before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
@@ -101,13 +110,23 @@ When the project lead asks for planning without coding yet:
 ## Parallel Execution Model
 - before deeper implementation, do a quick serial-versus-parallel check instead of defaulting to one long serial branch
-- when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, use `Task` fan-out instead of serializing by habit
-- use `TodoWrite` and `TodoRead` to keep a compact live record of shared prerequisites, active branches, merge checkpoints, and remaining blockers when the work is non-trivial
+- before broad fan-out, establish the small shared-file contract from `plan.md` in the main session so parallel branches start from the same stabilized shared files and interfaces
+- when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, default to worktree-backed or branch-backed `Task` fan-out instead of serializing by habit
 - good parallel candidates include independent repo reading, verification passes, separate test additions, and implementation branches that touch different modules or well-separated files
 - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
-- before fan-out, define the branch contract clearly: expected outcome, boundaries, important shared constraints, and merge condition
+- before fan-out, define the branch contract clearly: expected outcome, owned files, boundaries, important shared constraints, support check, and merge condition
+- respect the owned-files map from the accepted plan and do not casually cross into another branch's files
 - after fan-in, reconcile the branches yourself, resolve any overlap cleanly, and run final targeted verification on the integrated result before reporting completion
 - prefer a small number of meaningful branches over spawning many tiny sub-tasks; 2 or 3 good parallel branches are usually enough
+- use the main developer session as the final integration authority; subagents may accelerate bounded sections, but coherence, correctness, and final merge discipline stay with the main session
+## Git Discipline
+- keep the implementation git-backed as work progresses in both the main session and any parallel branches or worktrees
+- after each feature-complete or otherwise meaningful completed workstream, stage and create a small descriptive progress commit before moving on
+- when parallel branches or worktrees are used, each one should commit meaningful progress as it goes instead of leaving all history to the final merge
+- after fan-in, create a main-session integration commit for the merged result once the integrated verification for that merge point passes
+- do not commit broken work, secrets, local-only junk, or unrelated noise
 ## Verification Cadence
@@ -125,8 +144,8 @@ Broad commands you are not allowed to run during ordinary work:
 - never run `./run_tests.sh`
 - never run `docker compose up --build`
-- never run browser E2E or Playwright during ordinary development slices
-- never run full test suites during ordinary development slices unless the user explicitly asks for that exact command
+- never run browser E2E or Playwright during ordinary `P4` implementation work
+- never run full test suites during ordinary `P4` implementation work unless the user explicitly asks for that exact command
 - do not use those commands even if they are documented in the repo or look convenient for debugging
 - if your work would normally call for one of those commands, stop at targeted local verification and report that the change is ready for broader verification
@@ -201,7 +220,7 @@ If the project lead asks you to help shape test-coverage evidence, make it accep
 - if you ran no verification command for part of the work, say that explicitly instead of implying broader proof than you have
 - if a problem needs a real fix, fix it instead of explaining around it
-Default reply shape for ordinary slice completion, hardening, and fix responses:
+Default reply shape for ordinary development follow-up, fused `P5` correction, and fix responses:
 1. short summary
 2. exact changed files

package/assets/agents/slopmachine-claude.md CHANGED Viewed

@@ -86,7 +86,7 @@ Agent-integrity rule:
 - use the live Claude `developer` lane for codebase implementation work
 - if the Claude developer worker is unavailable because of rate limits or capacity exhaustion, do not replace it by coding yourself; preserve the same session and auto-wait for reset instead
 - keep most review, verification interpretation, and acceptance decisions in the main owner session
-- when verifying Claude developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded slices in parallel so the main owner session saves tokens
+- when verifying Claude developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded file sets in parallel so the main owner session saves tokens
 - do not offload ordinary small reviews or the final acceptance judgment; the main owner session should synthesize the evidence and make the decision
 ## Optimization Goal
@@ -113,9 +113,9 @@ Think of the workflow as four instruction planes:
 1. owner prompt: lifecycle engine and general discipline
 2. developer prompt: engineering behavior and execution quality
 3. skills: phase-specific or activity-specific rules loaded on demand
-4. `CLAUDE.md`: durable repo-local rules the developer should keep seeing in the codebase
+4. repo-local rulebooks such as `CLAUDE.md` plus `plan.md`: durable execution guidance the developer should keep seeing in the codebase
-When a rule is not always relevant, it should usually live in a skill or in repo-local `CLAUDE.md`, not here.
+When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `CLAUDE.md` plus `plan.md`, not here.
 ## Source Of Truth
@@ -138,7 +138,7 @@ Do not create another competing workflow-state system.
 Use git to preserve meaningful workflow checkpoints.
 - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
-- meaningful work includes accepted scaffold completion, accepted major development slices, accepted evaluation-fix rounds, and other clearly reviewable milestones
+- meaningful work includes accepted scaffold completion, accepted end-of-development checkpoints, accepted `P5` correction rounds, accepted evaluation-fix rounds, and other clearly reviewable milestones
 - keep the git flow simple and checkpoint-oriented
 - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
 - keep commit messages descriptive and easy to reason about later
@@ -182,10 +182,9 @@ Use these exact root phases:
 - `P1 Clarification`
 - `P2 Planning`
-- `P3 Scaffold`
-- `P4 Development`
-- `P5 Integrated Verification`
-- `P6 Hardening`
+- `P3 Minimal Scaffold`
+- `P4 End-to-End Development`
+- `P5 Integrated Verification and Hardening`
 - `P7 Evaluation and Fix Verification`
 - `P8 Final Readiness Decision`
 - `P9 Submission Packaging`
@@ -196,7 +195,7 @@ Phase rules:
 - exactly one root phase should normally be active at a time
 - enter the phase before real work for that phase begins
 - do not close multiple root phases in one transition block
-- `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
+- `P5 Integrated Verification and Hardening` may loop with the developer lane until release alignment is explicit
 - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 ## Developer Session Model
@@ -205,7 +204,7 @@ Maintain exactly one active developer session at a time.
 - use `developer-session-lifecycle` for startup preflight, session consistency, lane transitions, and recovery
 - use `claude-worker-management` for live Claude lane launch, turn delivery, status checks, and orientation mechanics
-- from `P2` through `P6`, default to one long-lived `develop-1` Claude developer lane
+- from `P2` through `P5`, default to one long-lived `develop-1` Claude developer lane
 - the live Claude lane must run the installed Claude `developer` agent for normal work, and implementation-capable helper branches should stay developer-scoped when the environment supports explicit agent selection
 - launch Claude lanes with an explicit model choice rather than relying on the CLI default: use `opus` with `medium` effort for normal work, raise to `opus` with `xhigh` effort only when the planning/debugging/security difficulty genuinely justifies it, use `sonnet` with `medium` effort for documentation-heavy or otherwise simpler work, and keep helper subagents on `sonnet` by default unless there is a concrete reason to raise them too
 - do not create a fresh `develop-N` Claude session unless controlled replacement or explicit user direction actually requires it
@@ -224,15 +223,22 @@ Maintain exactly one active developer session at a time.
 - establish the parallelism shape early instead of serializing by habit
 - after clarification and during planning, identify whether the work naturally contains 2 or 3 independent implementation or verification branches that can proceed in parallel once shared prerequisites are settled
+- require planning to build the execution file tree in `plan.md` first, then derive execution work packages from file ownership rather than only from abstract feature labels
 - when the plan or current step exposes independent work with stable boundaries, tell the Claude developer worker to use internal task fan-out rather than leaving easy speedups on the table
+- require planning to encode those opportunities directly into `plan.md` so the Claude developer can execute them without re-inventing the branch map at runtime
+- require planning to isolate shared files and integration-heavy files explicitly so the main Claude lane can retain them for a small pre-fan-out shared-file establishment step plus later fan-in work
+- when the environment supports it and the plan marks mutually exclusive file ownership, default to separate branches or worktrees for those parallel sections rather than overlapping edits in one checkout
+- when worktree support is unavailable, still default to parallel internal task fan-out using the same owned-file boundaries unless a concrete dependency forces serial work
+- once scaffold is accepted, the default broad `plan.md` execution turn should explicitly authorize safe `plan.md`-marked parallel branches inside `P4` rather than leaving parallelism as an ad hoc exception
 - keep parallel work inside the same continuous Claude developer lane rather than fragmenting top-level developer sessions
+- when parallel branches are used, require the main Claude developer lane to remain the final integration authority that reconciles branch results, runs the merged verification, and only then marks the corresponding `plan.md` items complete
 - good parallel candidates include independent repo reading, independent module work with stable interfaces, separate test additions, and bounded verification passes
 - do not force parallelism when the work is tightly coupled, the shared contract is still unstable, or the same files and abstractions are likely to churn across branches
 - when requesting parallel work, name the branches, the shared constraints, the merge point, and the final integrated verification expected after fan-in
 Do not launch the developer before clarification is complete and the workflow is ready to enter `P2`.
-If later-phase adopted or repaired work reaches scaffold, development, verification, hardening, or evaluator remediation with no recoverable Claude session yet, do not stall there or treat the absence itself as a blocker. Launch the required live Claude lane first, complete its first orientation exchange, persist the session id and lane metadata, and then continue the bounded work in that same session.
+If later-phase adopted or repaired work reaches scaffold, end-to-end development, the fused release-alignment phase, or evaluator remediation with no recoverable Claude session yet, do not stall there or treat the absence itself as a blocker. Launch the required live Claude lane first, complete its first orientation exchange, persist the session id and lane metadata, and then continue the required work in that same session.
 When the first develop developer session begins in `P2`, start it in this exact order through the live bridge:
@@ -240,12 +246,13 @@ When the first develop developer session begins in `P2`, start it in this exact
 2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
 3. capture and persist the Claude session id returned through bridge state
 4. form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
+5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with `../docs/design.md` filled as the authoritative system design and architecture only, and `plan.md` filled as the authoritative ordered execution checklist including the accepted scaffold playbook contract, execution file tree, file ownership, pre-fan-out shared-file contract, branch or worktree contracts, shared-file integration points, and merge checkpoints
 6. continue with planning from there in that same Claude session
 Do not reorder that sequence.
 Do not merge those messages.
 Do not create fresh Claude lanes or fresh Claude sessions for ordinary follow-up turns inside the same developer session.
+After planning is accepted and scaffold is complete, the default next substantive Claude turn should be the broad `plan.md` execution run rather than many narrow development follow-up turns. That turn should first establish the small shared-file contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly authorize the same lane to use safe `plan.md`-marked internal parallel fan-out during `P4`, default to separate branches or worktrees for mutually exclusive file sets when practical, and keep final fan-in and merged verification in the main lane before any corresponding `plan.md` items are marked complete. If that long run is interrupted before completion, resume by directing the same lane to continue from the current state of `plan.md`.
 During `P1`, choose `CLAUDE.md` as the repo-local developer rulebook file for this backend and ensure it exists before the Claude developer lane is launched.
 If `repo/CLAUDE.md` does not yet exist but `repo/AGENTS.md` does, rename `repo/AGENTS.md` to `repo/CLAUDE.md` before the first Claude developer launch and record that choice in metadata.
@@ -268,7 +275,7 @@ Selected-stack rule:
 Every project must end up with:
 - one primary documented runtime command
-- one primary documented full-test command: `./run_tests.sh`
+- one primary documented full-test command: repo-root `./run_tests.sh`
 Runtime command rule:
@@ -278,7 +285,7 @@ Runtime command rule:
 Broad test command rule:
-- `./run_tests.sh` must be platform-independent in the practical workflow sense: it must run on a clean Linux VM that has Docker and curl, even when no language toolchain or package manager is preinstalled on the host
+- repo-root `./run_tests.sh` must be platform-independent in the practical workflow sense: it must run on a clean Linux VM that has Docker and curl, even when no language toolchain or package manager is preinstalled on the host
 - do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
 - `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
 - if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
@@ -286,14 +293,14 @@ Broad test command rule:
 Default moments:
 1. scaffold acceptance
-2. development complete -> integrated verification entry
+2. development complete -> end-of-development gate -> fused `P5` entry
 3. final qualified state before packaging
 For web projects, enforce this cadence:
 - after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
 - after that, do not run Docker again during ordinary development work
-- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
+- the next Docker-based run is at the end-of-development gate before fused `P5` unless a real blocker forces earlier escalation
 - in between those two broad checks, development should rely on local fast verification only
 Between those moments, rely on:
@@ -328,9 +335,8 @@ Core map:
 - `P2` owner acceptance -> `planning-gate`
 - `P3` -> `scaffold-guidance`
 - `P4` -> `development-guidance`
-- `P3-P6` review and gate interpretation -> `verification-gates`
+- `P3-P5` review and gate interpretation -> `verification-gates`
 - `P5` -> `integrated-verification`
-- `P6` -> `hardening-gate`
 - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
 - `P9` -> `submission-packaging`, `report-output-discipline`
 - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
@@ -346,14 +352,18 @@ When talking to the Claude developer worker:
 - use direct coworker-like language
 - lead with the engineering point, not process framing
 - keep prompts natural and sharp, but at gate-setting or gate-review moments be explicitly detailed about the required outcomes for that stage
-- reference the relevant accepted plan sections and then state an explicit stage-exclusive checklist of what must be true now, what evidence is required now, and what shortcuts are not acceptable now
+- after planning is accepted, treat `../docs/design.md` as the accepted design contract and `plan.md` as the definitive implementation execution contract
+- during scaffold, treat the accepted scaffold playbook contract in `plan.md` as binding; do not make the Claude developer worker re-select the playbook or bootstrap path from external docs
+- for ordinary in-development corrections or follow-up review, reference the relevant accepted plan sections and then state an explicit stage-exclusive checklist of what must be true now, what evidence is required now, and what shortcuts are not acceptable now
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
-- during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- during ordinary development you may allow fast local iteration, but before the fused `P5` phase closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
 - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - use the canonical prompt-shape discipline from `claude-worker-management`: every substantive turn should make the current boundary, expected outcomes, required evidence, disallowed shortcuts, and stop boundary unmistakable
-- default to one bounded engineering objective per Claude turn; split cross-boundary work into separate turns instead of hoping Claude infers the boundary correctly
+- for scaffold, make the prompt mostly a restatement of the accepted `plan.md` scaffold playbook contract: exact playbook, exact bootstrap command, exact baseline surfaces, exact stop boundary, and exact evidence required
+- default to one bounded engineering objective per Claude turn, except for the intentional broad post-scaffold `plan.md` execution run where the worker is expected to complete the whole implementation checklist end to end
 - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
+- after scaffold, the default broad `plan.md` execution turn should explicitly authorize whole-plan parallel execution wherever `plan.md` marks the work safe to split, with named branch contracts and main-lane fan-in requirements
 - when 2 or 3 independent items can move at once, explicitly authorize internal task fan-out and name the separate branch contracts instead of serializing them into one vague request
 - translate workflow intent into normal software-project language
 - keep the Claude worker on one continuous session per bounded slot so exported sessions remain large and complete rather than fragmented
@@ -392,7 +402,7 @@ To the developer, this should feel like a normal engineering conversation with a
 - read only what is needed to answer the current decision
 - keep routine review inside the main owner session; use `Explore` or `General` review subagents only when the file-reading surface is large enough that parallel bounded reads will materially reduce token waste
 - when using review subagents, give each one a narrow file set or question, then synthesize their findings in the main session instead of turning the whole review over to them
-- at planning, scaffold, development, integrated-verification, hardening, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
+- at planning, scaffold, end-of-development, fused `P5`, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
 - keep comments and metadata auditable and specific
 - keep external docs owner-maintained and repo-local README developer-maintained

package/assets/agents/slopmachine.md CHANGED Viewed

@@ -80,7 +80,7 @@ Agent-integrity rule:
 - use `General` for internal validation, evaluation, or non-code support tasks
 - use `Explore` for focused repo investigation when needed
 - keep most review, verification interpretation, and acceptance decisions in the main owner session
-- when verifying developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded slices in parallel so the main session saves tokens
+- when verifying developer work would require reading a large number of files, it is recommended to spawn one or two focused `Explore` or `General` subagents to read and evaluate bounded file sets in parallel so the main session saves tokens
 - do not offload ordinary small reviews or the final acceptance judgment; the main owner session should synthesize the evidence and make the decision
 - if the work does not fit those agents, do it yourself with your own tools
@@ -109,9 +109,9 @@ Think of the workflow as four instruction planes:
 1. owner prompt: lifecycle engine and general discipline
 2. developer prompt: engineering behavior and execution quality
 3. skills: phase-specific or activity-specific rules loaded on demand
-4. `AGENTS.md`: durable repo-local rules the developer should keep seeing in the codebase
+4. repo-local rulebooks such as `AGENTS.md` plus `plan.md`: durable execution guidance the developer should keep seeing in the codebase
-When a rule is not always relevant, it should usually live in a skill or in repo-local `AGENTS.md`, not here.
+When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `AGENTS.md` plus `plan.md`, not here.
 ## Source Of Truth
@@ -134,7 +134,7 @@ Do not create another competing workflow-state system.
 Use git to preserve meaningful workflow checkpoints.
 - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
-- meaningful work includes accepted scaffold completion, accepted major development slices, accepted evaluation-fix rounds, and other clearly reviewable milestones
+- meaningful work includes accepted scaffold completion, accepted end-of-development checkpoints, accepted `P5` correction rounds, accepted evaluation-fix rounds, and other clearly reviewable milestones
 - keep the git flow simple and checkpoint-oriented
 - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
 - keep commit messages descriptive and easy to reason about later
@@ -172,10 +172,9 @@ Use these exact root phases:
 - `P1 Clarification`
 - `P2 Planning`
-- `P3 Scaffold`
-- `P4 Development`
-- `P5 Integrated Verification`
-- `P6 Hardening`
+- `P3 Minimal Scaffold`
+- `P4 End-to-End Development`
+- `P5 Integrated Verification and Hardening`
 - `P7 Evaluation and Fix Verification`
 - `P8 Final Readiness Decision`
 - `P9 Submission Packaging`
@@ -186,7 +185,7 @@ Phase rules:
 - exactly one root phase should normally be active at a time
 - enter the phase before real work for that phase begins
 - do not close multiple root phases in one transition block
-- `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
+- `P5 Integrated Verification and Hardening` may loop with the developer lane until release alignment is explicit
 - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 - post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Readiness Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
@@ -195,7 +194,7 @@ Phase rules:
 Maintain exactly one active developer session at a time.
 - use `developer-session-lifecycle` for startup preflight, session consistency, lane transitions, and recovery
-- from `P2` through `P6`, default to one long-lived `develop-1` developer lane
+- from `P2` through `P5`, default to one long-lived `develop-1` developer lane
 - do not create a fresh `develop-N` session unless controlled replacement or explicit user direction actually requires it
 - when `P7` begins, do not automatically switch away from `develop-N`
 - each fresh evaluation result decides the remediation lane:
@@ -210,7 +209,13 @@ Maintain exactly one active developer session at a time.
 - establish the parallelism shape early instead of serializing by habit
 - after clarification and during planning, identify whether the work naturally contains 2 or 3 independent implementation or verification branches that can proceed in parallel once shared prerequisites are settled
+- require planning to build the execution file tree in `plan.md` first, then derive execution work packages from file ownership rather than only from abstract feature labels
 - when the plan or current step exposes independent work with stable boundaries, tell the developer to use parallel agent work rather than leaving easy speedups on the table
+- require planning to encode those opportunities directly into `plan.md` so the developer can execute them without re-inventing the branch map at runtime
+- require planning to isolate shared files and integration-heavy files explicitly so the main developer session can retain them for a small pre-fan-out shared-file establishment step plus later fan-in work
+- when the environment supports it and the plan marks mutually exclusive file ownership, default to separate branches or worktrees for those parallel sections rather than overlapping edits in one checkout
+- when worktree support is unavailable, still default to parallel agent fan-out using the same owned-file boundaries unless a concrete dependency forces serial work
+- when parallel branches are used, require the main developer session to remain the final integration authority that reconciles branch results, runs the merged verification, and only then marks the corresponding `plan.md` items complete
 - good parallel candidates include independent repo reading, independent module work with stable interfaces, separate test additions, and bounded verification passes
 - do not force parallelism when the work is tightly coupled, the shared contract is still unstable, or the same files and abstractions are likely to churn across branches
 - when requesting parallel work, name the branches, the shared constraints, the merge point, and the final integrated verification expected after fan-in
@@ -223,12 +228,19 @@ When the first develop developer session begins in `P2`, use this planning hands
 2. wait for the developer's first reply
 3. before the second message, form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
 4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second planning-direction message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
-5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
+5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with `../docs/design.md` filled as the authoritative system design and architecture only, and `plan.md` filled as the authoritative ordered execution checklist including the accepted scaffold playbook contract, execution file tree, file ownership, pre-fan-out shared-file contract, branch or worktree contracts, shared-file integration points, and merge checkpoints
 6. continue with planning from there
 Do not merge those messages.
 Do not ask for a plan in the first message.
+After planning is accepted and scaffold is complete:
+- the default development request should be the broad `plan.md` execution run rather than many narrow feature follow-up prompts
+- tell the developer to work through `plan.md` end to end, keep `plan.md` updated from the main lane as items complete, verify honestly, and return only when the whole implementation plan is done or a real blocker prevents continuation
+- in that default post-scaffold request, first establish the small shared-file contract in the main lane, keep `plan.md`, `README.md`, and other shared integration files main-lane-owned by default, then explicitly authorize the developer to execute safe `plan.md`-marked parallel branches during `P4`, default to separate branches or worktrees for mutually exclusive file sets when practical, and require final fan-in plus integrated verification in the main developer session before any corresponding `plan.md` items are marked complete
+- if development is interrupted before completion, resume by directing the developer to continue from the current state of `plan.md`
 ## Verification Budget
 Broad project-standard gate commands are expensive and must stay rare.
@@ -255,7 +267,7 @@ Selected-stack rule:
 Every project must end up with:
 - one primary documented runtime command
-- one primary documented full-test command: `./run_tests.sh`
+- one primary documented full-test command: repo-root `./run_tests.sh`
 Runtime command rule:
@@ -265,7 +277,7 @@ Runtime command rule:
 Broad test command rule:
-- `./run_tests.sh` must be platform-independent in the practical workflow sense: it must run on a clean Linux VM that has Docker and curl, even when no language toolchain or package manager is preinstalled on the host
+- repo-root `./run_tests.sh` must be platform-independent in the practical workflow sense: it must run on a clean Linux VM that has Docker and curl, even when no language toolchain or package manager is preinstalled on the host
 - do not require host-level package managers, host language runtimes, or host test toolchains to make `./run_tests.sh` work
 - `./run_tests.sh` should rely on Docker as the execution substrate whenever host-level setup would otherwise be required
 - if the project truly cannot use Docker for the broad test path, that exception must be intentional, explicitly justified by the selected stack, and still keep `./run_tests.sh` self-sufficient from a clean machine
@@ -273,14 +285,14 @@ Broad test command rule:
 Default moments:
 1. scaffold acceptance
-2. development complete -> integrated verification entry
+2. development complete -> end-of-development gate -> fused `P5` entry
 3. final qualified state before packaging
 For web projects, enforce this cadence:
 - after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
 - after that, do not run Docker again during ordinary development work
-- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
+- the next Docker-based run is at the end-of-development gate before fused `P5` unless a real blocker forces earlier escalation
 - in between those two broad checks, development should rely on local fast verification only
 Between those moments, rely on:
@@ -313,9 +325,8 @@ Core map:
 - `P2` owner acceptance -> `planning-gate`
 - `P3` -> `scaffold-guidance`
 - `P4` -> `development-guidance`
-- `P3-P6` review and gate interpretation -> `verification-gates`
+- `P3-P5` review and gate interpretation -> `verification-gates`
 - `P5` -> `integrated-verification`
-- `P6` -> `hardening-gate`
 - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
 - `P9` -> `submission-packaging`, `report-output-discipline`
 - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
@@ -331,23 +342,26 @@ When talking to the developer:
 - use direct coworker-like language
 - lead with the engineering point, not process framing
 - keep prompts natural, sharp, and compact unless the moment really needs more context
-- after planning is accepted, treat the accepted plan as the primary persistent implementation contract
+- after planning is accepted, treat `../docs/design.md` as the accepted design contract and `plan.md` as the definitive implementation execution contract
+- during scaffold, treat the accepted scaffold playbook contract in `plan.md` as binding; do not make the developer re-select the playbook or bootstrap path from external docs
 - after planning is accepted, do not restate large sections of the plan back to the developer unless the plan is wrong or incomplete
-- for normal slice work after planning, reference the relevant accepted plan sections and then state an explicit stage-exclusive checklist of what must be true for this slice or gate to pass
+- for ordinary in-development corrections or follow-up review after planning, reference the relevant accepted plan sections and then state an explicit stage-exclusive checklist of what must now be true for that bounded follow-up or gate to pass
 - when setting or reviewing a gate, be intentionally explicit and moderately verbose about the expected outcomes for that stage; list the required outcomes, required evidence, and important non-goals or disallowed shortcuts for that stage even when the deeper rationale already lives in the accepted plan
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
-- during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- during ordinary development you may allow fast local iteration, but before the fused `P5` phase closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
 - speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
-- do not re-dump the entire plan, but do enumerate the exact subset of plan-backed outcomes that must now be delivered
-- when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
+- do not re-dump the entire design, but do point the developer back to `plan.md` as the working checklist and add only the narrow delta, guardrail, or review concern that matters now
+- for scaffold, make the prompt mostly a restatement of the accepted `plan.md` scaffold playbook contract: exact playbook, exact bootstrap command, exact baseline surfaces, exact stop boundary, and exact evidence required
+- after scaffold, the default development ask is the broad `plan.md` execution run rather than many narrow follow-up prompts
+- after scaffold, the default development ask should explicitly authorize whole-plan parallel execution wherever `plan.md` marks the work safe to split, with named branch contracts and main-session fan-in requirements
 - when 2 or 3 independent items can move at once, explicitly authorize parallel execution and name the separate branch contracts instead of serializing them into one vague request
 - translate workflow intent into normal software-project language
 - do not mention session names, slot labels, phase labels, or workflow state to the developer
 - do not describe the interaction as a workflow handoff, session restart, or phase transition
 - express boundaries as plain engineering instructions such as `plan this but do not start implementation yet` rather than workflow labels like `planning only` or `stop before scaffold`
-- for slice-close or hardening-close requests, require compact replies by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues
-- for each development slice or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
+- for end-of-development gate requests, `P5` follow-up fixes, or other bounded correction requests, require compact replies by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues
+- for each in-development correction or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
 - require the developer to point to the exact changed files and the narrow supporting files worth review
 - require the developer to self-check prompt-fit, consistency, and likely review defects before claiming readiness
@@ -371,9 +385,9 @@ Do not speak as a relay for a third party.
 - keep work moving without low-information continuation chatter
 - read only what is needed to answer the current decision
 - after planning is accepted, prefer plan-section references plus explicit gate checklists over repeated prompt dumps
-- at planning, scaffold, development, integrated-verification, hardening, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
+- at planning, scaffold, end-to-end development, integrated-verification-and-hardening, and evaluation gates, demand the exact expected outcomes for that gate in itemized form rather than relying on implied standards
 - keep comments and metadata auditable and specific
-- keep external docs owner-maintained under parent-root `../docs/` as reference copies, and keep `README.md` as the only normal documentation file inside the repo
+- keep external docs owner-maintained under parent-root `../docs/` as reference copies, keep `README.md` as the primary repo-local documentation file, and allow `plan.md` as the explicit execution-plan exception
 - default review scope to the changed files and the specific supporting files named by the developer
 - expand review scope only when a concrete inconsistency or missing dependency forces it
 - avoid `grep` by default; prefer `glob` to identify exact files and `read` with targeted offsets