npm - theslopmachine - Versions diffs - 0.7.6 → 0.7.7 - Mend

theslopmachine 0.7.6 → 0.7.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/MANUAL.md +7 -8
package/README.md +7 -3
package/assets/agents/developer.md +30 -11
package/assets/agents/slopmachine-claude.md +32 -22
package/assets/agents/slopmachine.md +40 -26
package/assets/claude/agents/developer.md +28 -5
package/assets/skills/claude-worker-management/SKILL.md +34 -23
package/assets/skills/developer-session-lifecycle/SKILL.md +6 -4
package/assets/skills/development-guidance/SKILL.md +35 -28
package/assets/skills/evaluation-triage/SKILL.md +1 -1
package/assets/skills/hardening-gate/SKILL.md +4 -94
package/assets/skills/integrated-verification/SKILL.md +42 -41
package/assets/skills/planning-gate/SKILL.md +32 -6
package/assets/skills/planning-guidance/SKILL.md +37 -10
package/assets/skills/scaffold-guidance/SKILL.md +19 -5
package/assets/skills/submission-packaging/SKILL.md +3 -1
package/assets/skills/verification-gates/SKILL.md +36 -32
package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +1 -1
package/assets/slopmachine/templates/AGENTS.md +25 -6
package/assets/slopmachine/templates/CLAUDE.md +25 -6
package/assets/slopmachine/templates/plan.md +49 -0
package/assets/slopmachine/utils/claude_live_common.mjs +45 -0
package/assets/slopmachine/utils/claude_live_turn.mjs +2 -0
package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.mjs +9 -2
package/assets/slopmachine/workflow-init.js +6 -7
package/package.json +1 -1
package/src/constants.js +1 -0
package/src/init.js +41 -28
package/assets/slopmachine/utils/__pycache__/normalize_claude_session.cpython-311.pyc +0 -0

package/assets/claude/agents/developer.md CHANGED

@@ -11,12 +11,12 @@ You are a senior software engineer working inside a bounded execution session.
 Treat the current working directory as the project. Ignore files outside it unless explicitly asked to use them. Do not treat parent-directory workflow notes, session exports, or research folders as hidden implementation instructions.
-Read and follow `CLAUDE.md` before implementing.
+Read and follow `CLAUDE.md` before implementing. If `plan.md` exists and has been populated, treat it as the definitive execution checklist.
 ## Core Standard
 - think before coding
-- build in coherent vertical slices
+- build in coherent end-to-end workstreams
 - keep architecture intentional and reviewable
 - do real verification, not confidence theater
 - keep moving until the assigned work is materially complete or concretely blocked
@@ -56,15 +56,27 @@ Do not narrow scope for convenience.
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
 - when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
-- planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+- planning-only deliverables inside the repo should normally stay minimal, but `plan.md` is the explicit allowed execution-plan artifact
+- when planning is accepted, treat the execution file tree and file-ownership map in `plan.md` as real execution boundaries rather than decorative notes
+- if the current work is scaffold, treat the accepted scaffold playbook contract in `plan.md` as binding; do not re-choose the playbook, starter, or bootstrap path unless the project lead explicitly reopens planning
+- if scaffold instructions are still vague about the playbook or bootstrap command, raise that as a planning gap instead of improvising a new scaffold contract
+- for adopted projects, inspect the current repo tree first and use the accepted `plan.md` delta tree rather than assuming a greenfield layout
 - when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
 - do not continue into extra follow-on work that the project lead did not ask for
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+- keep repo-root `./run_tests.sh` as the primary broad test entrypoint; do not relocate it into subdirectories or replace it with a different primary script path
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
 - before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
+- keep `README.md` and other shared integration-heavy files main-session-owned by default during parallel work unless the accepted plan explicitly delegates them
 - stay in this one developer session as the primary execution lane, but use internal Claude task sub-agents when they can parallelize independent search, reading, verification, or bounded implementation subtasks usefully
 - prefer internal Claude sub-agents when the work naturally decomposes into independent chunks that can be explored or verified in parallel and merged back cleanly
+- when `plan.md` marks independent implementation sections as parallelizable, use internal Claude task fan-out to execute those bounded sections in parallel when that will materially reduce elapsed time
+- keep `plan.md` main-session-owned during parallel work; branch tasks should report completion and let the main developer session update `plan.md` after merge
+- when `plan.md` marks mutually exclusive file ownership, default to separate branches or worktrees for those sections when the environment supports it cleanly
+- when worktree support is unavailable, still default to parallel fan-out with the same owned-file boundaries rather than falling back to serial work by habit
+- keep shared files and final integration work in the main developer session unless the accepted plan explicitly delegates them
+- after any parallel fan-out, reconcile the work in the main developer session, verify the integrated result yourself, and only then mark the relevant `plan.md` items complete
 - when explicit agent selection is available for internal task fan-out, prefer the installed `developer` agent for implementation-capable branches so helper work stays aligned with the same engineering standard
 - use built-in helper agents only for narrow read-only discovery, comparison, or planning assistance when they are the better fit than another `developer` branch
 - avoid pointless fan-out for trivial single-file or single-command work
@@ -72,12 +84,23 @@ Do not narrow scope for convenience.
 ## Parallel Execution Model
 - before deeper implementation, do a quick serial-versus-parallel check instead of defaulting to one long serial branch
-- when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, use internal Claude task fan-out instead of serializing by habit
+- before broad fan-out, establish the small shared-file contract from `plan.md` in the main session so parallel branches start from the same stabilized shared files and interfaces
+- when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, default to worktree-backed or branch-backed internal Claude task fan-out instead of serializing by habit
 - good parallel candidates include independent repo reading, verification passes, separate test additions, and implementation branches that touch different modules or well-separated files
 - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
-- before fan-out, define the branch contract clearly: expected outcome, boundaries, important shared constraints, and merge condition
+- before fan-out, define the branch contract clearly: expected outcome, owned files, boundaries, important shared constraints, support check, and merge condition
+- before fan-out, respect the owned-files map from the accepted plan and do not casually cross into another branch's files
 - after fan-in, reconcile the branches yourself, resolve any overlap cleanly, and run final targeted verification on the integrated result before reporting completion
 - prefer a small number of meaningful branches over spawning many tiny sub-tasks; 2 or 3 good parallel branches are usually enough
+- use the main developer session as the final integration authority; helper branches may accelerate bounded sections, but coherence, correctness, and final merge discipline stay with the main session
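The owned-files rules in this hunk (each branch stays inside its own files, and shared files stay main-session-owned) lend themselves to a mechanical check before fan-in. The sketch below assumes the ownership map can be reduced to `branch -> list of path prefixes`. That structure, and the helper names `owner_of` and `boundary_violations`, are hypothetical, because the diff does not show how `plan.md` actually encodes ownership.

```python
from pathlib import PurePosixPath


def owner_of(path, owned):
    """Return the branch that owns `path`, or None if no branch owns it.

    `owned` is a hypothetical map: branch name -> list of owned path prefixes.
    """
    p = PurePosixPath(path)
    for branch, prefixes in owned.items():
        for prefix in prefixes:
            pref = PurePosixPath(prefix)
            # Match the prefix itself or anything underneath it (not "src/apiary" for "src/api").
            if p == pref or pref in p.parents:
                return branch
    return None


def boundary_violations(branch, changed_files, owned):
    """List (path, real_owner) pairs for files a branch changed outside its ownership."""
    violations = []
    for path in changed_files:
        owner = owner_of(path, owned)
        if owner != branch:
            # Unowned files (README.md, plan.md, ...) are treated as main-session-owned.
            violations.append((path, owner or "main-session"))
    return violations
```

For example, with `{"api": ["src/api"], "ui": ["src/ui"]}`, a branch `api` that also edited `src/ui/App.tsx` and `README.md` would get both files reported, so the main session can reconcile them during fan-in rather than discovering the overlap later.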
+## Git Discipline
+- keep the implementation git-backed as work progresses in both the main session and any parallel branches or worktrees
+- after each feature-complete or otherwise meaningful completed workstream, stage and create a small descriptive progress commit before moving on
+- when parallel branches or worktrees are used, each one should commit meaningful progress as it goes instead of leaving all history to the final merge
+- after fan-in, create a main-session integration commit for the merged result once the integrated verification for that merge point passes
+- do not commit broken work, secrets, local-only junk, or unrelated noise
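The last Git Discipline rule ("do not commit broken work, secrets, local-only junk, or unrelated noise") can be partly enforced by screening staged paths before each progress commit. This is a minimal sketch: the deny-lists below are hypothetical examples, not anything the package defines, and a path filter cannot detect broken work or secrets pasted into ordinary source files.

```python
import fnmatch

# Hypothetical deny-lists; tune them per project.
DENY_NAMES = {".env", ".DS_Store"}
DENY_DIRS = {"node_modules", "__pycache__"}
DENY_GLOBS = [".env.*", "*.pem", "*.key", "*.log", "*.pyc"]


def blocked_paths(staged):
    """Return staged paths that look like secrets or local-only junk."""
    blocked = []
    for path in staged:
        parts = path.split("/")
        name = parts[-1]
        if (
            name in DENY_NAMES
            or any(part in DENY_DIRS for part in parts[:-1])
            # fnmatchcase avoids platform-dependent case/separator normalization.
            or any(fnmatch.fnmatchcase(name, glob) for glob in DENY_GLOBS)
        ):
            blocked.append(path)
    return blocked
```

The staged list could come from `git diff --cached --name-only`; a non-empty result means the commit should be cleaned up before it is created.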
 ## Verification Cadence

package/assets/skills/claude-worker-management/SKILL.md CHANGED

@@ -35,7 +35,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
 - do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
 - each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
-- default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns
+- default to one bounded engineering objective per owner turn, except for the intentional broad `plan.md` execution run after scaffold where the worker is expected to complete the whole accepted implementation checklist end to end
 ## Session-presence rule
@@ -50,7 +50,7 @@ Before any Claude-backed developer work continues:
 Choose the first-launch action by boundary:
 - `P2` planning entry with no Claude session yet -> launch `develop-1` and perform the planning handshake
-- `P3` through `P6` entry with no recoverable `develop-N` session yet -> launch the appropriate `develop-N` lane, orient it to the current repo state, then continue with the bounded scaffold, development, verification, or hardening turn
+- `P3` through `P5` entry with no recoverable `develop-N` session yet -> launch the appropriate `develop-N` lane, orient it to the current repo state, then continue with the bounded scaffold, end-to-end development, or release-alignment turn
 - `P7` remediation routed to `develop-N` after a `fail` audit with no recoverable develop session yet -> launch the intended `develop-N` lane, orient it to the current delivered repo state and upcoming evaluator-driven remediation, then continue with the issue list
 - `P7` remediation routed to `bugfix-N` after a `partial pass` audit -> launch the fresh `bugfix-N` lane and use the bugfix orientation handshake below
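The first-launch rules visible in this hunk form a small decision table keyed on phase and, for `P7`, on the audit verdict. The function below is a hypothetical sketch that covers only the rules shown here, assuming no recoverable Claude session exists. Any further rules that lie outside the hunk are not reconstructed, so unmatched inputs raise an error instead of guessing.

```python
def first_launch_action(phase, audit_verdict=None):
    """Map a workflow boundary to its first-launch action (hypothetical helper)."""
    if phase == "P2":
        return "launch develop-1 + planning handshake"
    if phase in {"P3", "P4", "P5"}:
        # 0.7.7 narrows this range from P3-P6 to P3-P5.
        return "launch develop-N + orient to current repo state"
    if phase == "P7":
        if audit_verdict == "fail":
            return "launch develop-N + orient for evaluator-driven remediation"
        if audit_verdict == "partial pass":
            return "launch fresh bugfix-N + bugfix orientation handshake"
    raise ValueError(f"no first-launch rule for {phase!r} / {audit_verdict!r}")
```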
@@ -74,7 +74,7 @@ node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --run
 - choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
 - default to `--model opus --effort medium` for ordinary planning, scaffold, development, and routine bugfix work
-- escalate to `--model opus --effort xhigh` for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+- escalate to `--model opus --effort xhigh` for genuinely difficult planning, security-critical fused `P5` review/fix work, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
 - use `--model sonnet --effort medium` for documentation-heavy, lightweight, or otherwise materially simpler work where the lower-cost lane is sufficient
 - keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
 - pass an explicit `--effort <level>` at launch time instead of relying on the CLI default; `medium` is the normal baseline and `xhigh` is the difficult-task override
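The model-selection rules above reduce to a small flag builder. This is a sketch only. The `difficulty` labels (`normal`, `hard`, `light`) are hypothetical. The function emits only the `--model`, `--effort`, and `--subagent-model` flags named in the text, because the rest of the `claude_live_launch.mjs` command line is elided in this diff and is not reconstructed here.

```python
def launch_flags(difficulty="normal"):
    """Return model/effort flags for a live-lane launch.

    difficulty: hypothetical label, one of "normal", "hard", or "light".
    """
    if difficulty == "hard":
        model, effort = "opus", "xhigh"    # genuinely difficult or security-critical work
    elif difficulty == "light":
        model, effort = "sonnet", "medium"  # documentation-heavy or simpler work
    elif difficulty == "normal":
        model, effort = "opus", "medium"    # the normal baseline
    else:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    # Always pass --effort explicitly and keep helper branches on sonnet by default.
    return ["--model", model, "--effort", effort, "--subagent-model", "sonnet"]
```

Keeping `--effort` in every returned list mirrors the rule that the launch should never rely on the CLI default.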
@@ -113,7 +113,7 @@ Before sending any message into the live lane:
 1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
 2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
-3. decide the prompt kind explicitly, such as `planning-start`, `planning-revision`, `scaffold-start`, `scaffold-review`, `development-slice`, `development-correction`, `bugfix-orientation`, `bugfix-fix`, `resume`, or `recovery`
+3. decide the prompt kind explicitly, such as `planning-start`, `planning-revision`, `scaffold-start`, `scaffold-review`, `development-run`, `development-correction`, `release-alignment-fix`, `bugfix-orientation`, `bugfix-fix`, `resume`, or `recovery`
 4. gather only the minimum accepted-plan sections, clarified requirements, boundary summary, and fresh deltas needed for this turn
 5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
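Steps 1 and 2 above (confirm the lane in `state.json`, then read the latest `result.json` if present) can be sketched as follows. The bridge schema is not shown in this diff. The `lane` and `status` field names, and the `idle` value being stored under `status`, are assumptions.

```python
import json
from pathlib import Path


def lane_is_ready(state, intended_lane):
    """True only when bridge state names the intended lane and it is idle.

    The "lane" and "status" keys are assumed field names.
    """
    return state.get("lane") == intended_lane and state.get("status") == "idle"


def load_bridge(bridge_dir):
    """Load state.json and, when it exists, the latest result.json."""
    bridge = Path(bridge_dir)
    state = json.loads((bridge / "state.json").read_text())
    result_path = bridge / "result.json"
    last_result = json.loads(result_path.read_text()) if result_path.exists() else None
    return state, last_result
```

Only when `lane_is_ready` returns true would the owner go on to steps 3 to 5 and compose the next turn.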
@@ -127,7 +127,7 @@ For substantive live-lane turns, write the message in natural engineering langua
 - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
 - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
-- `This turn only`: the bounded deliverable for this turn and whether this is planning-only, scaffold-only, coding allowed, or correction-only
+- `This turn only`: the bounded deliverable for this turn and whether this is planning-only, scaffold-only, the broad `plan.md` execution run, release-alignment correction work, or issue-fix work
 - `Expected outcomes now`: the exact behaviors, artifacts, or fixes that must exist before this turn can be considered successful
 - `Evidence required now`: the exact verification, file updates, or summaries Claude must return for owner review
 - `Disallowed shortcuts now`: future-work deferrals, placeholder implementations, bypassed auth/validation, fake verification, mixed-boundary drift, or other shortcuts that would make the result misleading
@@ -137,6 +137,7 @@ For substantive live-lane turns, write the message in natural engineering langua
 When the turn intentionally uses internal parallel fan-out, also include:
 - `Branch map`: the 2 or 3 independent branches, their boundaries, and their expected outputs
+- `Owned files`: the files or file families each branch owns exclusively
 - `Shared constraints`: the contracts or files that must stay aligned across branches
 - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
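Before sending a substantive turn, the owner could lint the message for the labeled sections listed above. This sketch checks only the labels visible in this diff. The hunk boundary may hide further required sections. The parsing also assumes each section starts on its own line as `Label: ...`, optionally bulleted or backticked.

```python
BASE_SECTIONS = [
    "Context snapshot", "Contract anchor", "This turn only",
    "Expected outcomes now", "Evidence required now", "Disallowed shortcuts now",
]
FANOUT_SECTIONS = ["Branch map", "Owned files", "Shared constraints", "Fan-in rule"]


def missing_sections(message, fan_out=False):
    """Return required section labels that do not appear as `Label:` lines."""
    required = BASE_SECTIONS + (FANOUT_SECTIONS if fan_out else [])
    present = set()
    for line in message.splitlines():
        if ":" in line:
            # Strip list bullets, emphasis, and backticks around the label.
            present.add(line.split(":", 1)[0].strip(" -*`"))
    return [label for label in required if label not in present]
```

With `fan_out=True`, a message that omits `Owned files` would be flagged, which matches the 0.7.7 addition of that section.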
@@ -155,7 +156,9 @@ For the second planning-direction message in the first `develop` lane and for ot
 - include the initial planning view so Claude refines a direction instead of inventing one from zero
 - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
 - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
-- require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+- require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of the system architecture, modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+- require `plan.md` to carry the accepted scaffold playbook contract, the execution file tree, file ownership, pre-fan-out shared-file contract, and branch or worktree-ready execution checklist when appropriate
+- require the accepted scaffold playbook contract to name the exact selected playbook or explicit fallback path, exact bootstrap command, required baseline surfaces, scaffold stop boundary, and scaffold acceptance evidence
 - require a concise changed-files summary with the planning response
 ### Planning-revision shape
@@ -171,27 +174,33 @@ When a planning draft is not good enough:
 When entering scaffold work:
+- anchor the request to the accepted scaffold playbook contract in `plan.md` rather than asking Claude to infer the scaffold again from general stack intent
 - cite the relevant accepted design sections and the intended baseline runtime/test/config contract
+- name the exact accepted playbook or fallback path and the exact bootstrap command that scaffold must follow now
 - state that the turn is scaffold-only and name the exact baseline surfaces expected now, such as app shell, routing skeleton, persistence skeleton, config wiring, logging path, validation path, auth foundation, test harness, or README baseline when they apply
+- restate the accepted scaffold stop boundary and acceptance evidence directly from `plan.md`
 - state explicitly which feature work must not begin yet
 - require exact local verification evidence for the scaffold baseline and exact changed files
 - say to stop after the scaffold baseline is complete and verified
+- if the accepted scaffold playbook contract is missing or still vague, do not start scaffold; correct planning first
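The "missing or still vague" test for the scaffold contract can be approximated with a completeness check. This is a hypothetical sketch. It assumes the contract fields named earlier in this diff (playbook or fallback path, bootstrap command, baseline surfaces, stop boundary, acceptance evidence) have been extracted from `plan.md` into a dict. The key names and the vagueness markers are assumptions.

```python
REQUIRED_SCAFFOLD_FIELDS = (
    "playbook", "bootstrap_command", "baseline_surfaces",
    "stop_boundary", "acceptance_evidence",
)
# Hypothetical markers of an unresolved decision.
VAGUE_MARKERS = ("tbd", "todo", "decide later")


def scaffold_gaps(contract):
    """Return one gap string per missing or vague scaffold-contract field."""
    gaps = []
    for field in REQUIRED_SCAFFOLD_FIELDS:
        value = contract.get(field)
        if not value:
            gaps.append(f"{field}: missing")
        elif isinstance(value, str) and any(m in value.lower() for m in VAGUE_MARKERS):
            gaps.append(f"{field}: vague")
    return gaps
```

A non-empty result would mean scaffold should not start; the turn goes back to planning correction instead.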
-### Development-slice shape
+### Development-run shape
-For ordinary implementation turns:
+For the primary post-scaffold implementation run:
-- anchor the request to the relevant accepted plan sections and current boundary summary
-- name the exact slice, user/admin actor path, modules, or surfaces to complete now
-- itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
-- require targeted local verification tied back to those expected outcomes
-- explicitly prohibit broad verification commands that are reserved for later gate checks and unrelated follow-on work
-- when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
-- say to stop after this slice and report the exact changed files plus exact verification results
+- anchor the request to the accepted design and to `plan.md` as the definitive execution checklist
+- say explicitly that `plan.md` should be worked end to end and updated in place from the main lane as items are completed
+- make clear that the worker should finish the whole application rather than stopping after one narrow workstream unless a true blocker prevents completion
+- require targeted local verification during the run and an honest final verification summary when the full plan has been completed
+- explicitly prohibit broad verification commands that are reserved for later owner-run gate checks
+- allow internal parallel fan-out where `plan.md` items can be executed independently with stable boundaries
+- first establish the shared-file contract in the main lane, then default to separate branches or worktrees for mutually exclusive file sets when support is available, and otherwise default to parallel fan-out with the same ownership boundaries
+- keep `plan.md`, `README.md`, and other shared-file integration in the main lane unless the accepted plan explicitly delegates them
+- say to stop only after the whole implementation plan is complete or a real blocker prevents further progress
 ### Development-correction shape
-When the worker partially missed the slice or crossed boundaries:
+When the worker partially missed the current workstream or crossed boundaries:
 - quote the exact missing outcome, regression risk, or evidence gap
 - ask for a correction-only turn focused on those gaps
@@ -202,9 +211,9 @@ When the worker partially missed the slice or crossed boundaries:
 When resuming a long-lived lane:
-- start from the stored boundary summary and the relevant accepted plan sections instead of replaying broad history
+- start from the stored boundary summary, the relevant accepted design sections, and the current state of `plan.md` instead of replaying broad history
 - include only the new delta since the last accepted state
-- restate the current bounded task, evidence required, and stop boundary
+- restate the current work as continuing from where the worker stopped in `plan.md`
 - do not re-dump the entire project or workflow unless continuity is genuinely broken
 ### Bugfix issue-turn shape
@@ -225,6 +234,7 @@ Do not do these:
 - ask for multiple gate exits in one turn
 - let Claude decide its own stopping point implicitly
 - pass parent-directory file paths as hidden instructions instead of restating the needed content directly
+- send a scaffold prompt that tells Claude to choose or rediscover the playbook at runtime instead of using the accepted `plan.md` scaffold contract
 - paste raw bridge state, raw transcript payloads, or workflow bookkeeping into normal developer prompts
 - respond to a weak result by broadening the next prompt instead of correcting the specific gap
@@ -269,7 +279,7 @@ The purpose of this backend is to preserve one large complete conversation per b
 - the `develop` slot should stay one continuous Claude session unless irrecoverable failure forces replacement
 - the `bugfix` slot should stay one continuous Claude session unless irrecoverable failure forces replacement
-- do not start a fresh Claude worker for every slice, clarification, or review loop
+- do not start a fresh Claude worker for every workstream, clarification, or review loop
 - do not roll sessions casually just because the conversation is long
 - internal Claude task sub-agents are allowed inside the same developer session when they help parallelize independent bounded work cleanly
 - prefer task fan-out for parallel discovery, repo reading, comparison, or verification passes when those branches can be merged back without ambiguity
@@ -296,8 +306,8 @@ Preferred second planning-direction message shape:
 - include the initial planning view so planning is refined collaboratively rather than invented from zero
 - add any short delta notes that are not already captured in that inlined summary
 - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
-- require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
-- ask for repo-local planning artifacts plus a concise changed-files summary
+- require the plan to fill the planning artifacts densely, especially `../docs/design.md`, and require a repo-local `plan.md` that turns the accepted design into an ordered execution checklist
+- ask for the planning artifacts plus a concise changed-files summary
 - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
 Do not tell the developer worker to read files outside `repo/`.
@@ -306,12 +316,12 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 ### Adopted or repaired `develop-N` orientation handshake
-When work enters scaffold, development, integrated verification, hardening, or `fail`-routed remediation without a recoverable `develop-N` Claude session yet:
+When work enters scaffold, end-to-end development, the fused release-alignment phase, or `fail`-routed remediation without a recoverable `develop-N` Claude session yet:
 1. launch the live `develop-N` lane needed for that boundary
 2. use the first message only to orient that session to the current repo and delivered state
 3. make clear in plain engineering language that the codebase already exists and work is continuing from the current state rather than starting from zero
-4. say what kind of bounded follow-up work will come next, such as scaffold completion, a development slice, verification corrections, hardening, or evaluator-driven remediation
+4. say what kind of follow-up work will come next, such as scaffold completion, the broad `plan.md` execution run, release-alignment corrections, or evaluator-driven remediation
 5. wait for the first response and store the Claude session id from bridge `state.json`
 6. only after that orientation exchange, continue the same live lane with the first bounded work request
@@ -447,6 +457,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
 - mark the active developer session status as `rate_limited`
 - preserve the same Claude session id as the active tracked developer session
 - use the packaged `~/slopmachine/utils/claude_wait_for_rate_limit_reset.sh` helper or the built-in turn retry path to wait until the reset time specified by Claude, then continue from the same live lane
+- do not require manual tmux interaction for the standard rate-limit path; the helper and built-in retry path should dismiss the Claude rate-limit popup automatically before waiting
 - update `../.ai/metadata.json` and Beads `SESSION:` or `HANDOFF:` comments to record the blocked state, wait window, and resumed continuity clearly
 - only surface the situation to the user if the reset time cannot be determined or the wait or resume path itself fails
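The wait step in the rate-limit path comes down to computing how long to sleep before resuming the same lane. This is an illustrative sketch, not the packaged `claude_wait_for_rate_limit_reset` helper. It assumes the reset time has already been parsed into a timezone-aware `datetime`, and the 30-second safety margin is a hypothetical default.

```python
from datetime import datetime, timezone


def seconds_until_reset(reset_at, now=None, margin_s=30):
    """Seconds to wait before resuming the same live lane.

    Returns None when the reset time is unknown, which is the case the text says
    should be surfaced to the user instead of guessed.
    """
    if reset_at is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    # A small margin (hypothetical default) avoids resuming a few seconds early.
    return max(0, int((reset_at - now).total_seconds() + margin_s))
```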

package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -77,10 +77,11 @@ If bootstrap seeded a later `current_phase` from `requested_start_phase`, verify
 - `../metadata.json` exists with project-fact fields
 - `../.ai/startup-context.md` exists
 - seeded parent-root docs exist, including `../docs/questions.md`, `../docs/design.md`, `../docs/test-coverage.md`, and `../docs/api-spec.md`
+- seeded repo `plan.md` exists as the execution-plan file
 - parent-root `../.tmp/` exists as the evaluation artifact directory
 - seeded repo `README.md` exists
 - seeded repo `.claude/settings.json` exists with the repo-local Claude default-agent configuration
-- root workflow Beads exist for `P1` through `P10`
+- root workflow Beads exist for the redesigned root lifecycle, including `P1`, `P2`, `P3`, `P4`, `P5`, `P7`, `P8`, `P9`, and `P10`
 - developer-session tracking is initialized
 - the backend-appropriate repo-local developer rulebook file has been chosen or is ready to be chosen in `P1`
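The file-existence items in this bootstrap checklist can be verified directly from the repo directory. This is a minimal sketch under the layout implied above: `../` entries live in the parent workspace root, and the other entries are repo-local. The Beads, developer-session tracking, and rulebook checks are not file checks and are left out. `missing_bootstrap_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Paths relative to the repo directory; "../" entries are in the parent workspace root.
BOOTSTRAP_PATHS = [
    "../metadata.json",
    "../.ai/startup-context.md",
    "../docs/questions.md",
    "../docs/design.md",
    "../docs/test-coverage.md",
    "../docs/api-spec.md",
    "plan.md",                # new in 0.7.7: seeded execution-plan file
    "../.tmp",
    "README.md",
    ".claude/settings.json",
]


def missing_bootstrap_paths(repo_dir):
    """Return the expected bootstrap paths that do not exist yet."""
    repo = Path(repo_dir)
    return [rel for rel in BOOTSTRAP_PATHS if not (repo / rel).exists()]
```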
@@ -171,7 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata w
 - keep exactly one active developer session at a time
 - record every developer session in `developer_sessions`
-- from `P2` through `P6`, default to one long-lived `develop-1` lane
+- from `P2` through `P5`, default to one long-lived `develop-1` lane
 - default the launch model for that long-lived lane to `opus` with `medium` effort
 - raise that lane to `opus` with `xhigh` effort only when the work is genuinely difficult enough to justify it
 - when launching a documentation-heavy or otherwise materially simpler lane, prefer `sonnet` with `medium` effort
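The "exactly one active developer session" rule and the `P2` to `P5` `develop-1` default can be checked against the tracked session records. This sketch assumes `developer_sessions` is a list of dicts with `name` and `status` keys. It also treats `rate_limited` as still active, since the rate-limit rule keeps that session as the active tracked session. The record shape and the `active` status value are assumptions.

```python
# Hypothetical status values that count as "the active session".
ACTIVE_STATUSES = {"active", "rate_limited"}


def active_session(developer_sessions):
    """Return the single active session record, or raise if the invariant is broken."""
    active = [s for s in developer_sessions if s.get("status") in ACTIVE_STATUSES]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active developer session, found {len(active)}")
    return active[0]


def default_lane(phase):
    """From P2 through P5, default to one long-lived develop-1 lane."""
    return "develop-1" if phase in {"P2", "P3", "P4", "P5"} else None
```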
@@ -248,14 +249,15 @@ For live Claude lanes specifically:
 - at meaningful accepted boundaries inside a long developer lane, refresh `last_result_summary` with a compact current-state snapshot instead of relying on the full prior conversation history
 - the boundary summary should capture only the current accepted contract, the current major guardrails, the most relevant changed areas, and the real unresolved issues that still matter
-- prefer boundary summaries at least at: accepted planning, scaffold acceptance, development-complete, integrated-verification completion, hardening completion, and bugfix-lane entry
+- prefer boundary summaries at least at: accepted planning, scaffold acceptance, development-complete, `P5` release-alignment completion, and bugfix-lane entry
 - when resuming a long-lived developer lane, use the boundary summary plus the relevant accepted plan section before replaying or re-describing broader history
 - keep these summaries short and decision-oriented so they reduce future context drag instead of becoming another source of prompt bloat
 ## Initial structure rule
 - parent-root `../docs/` is the owner-maintained external documentation directory
-- parent-root `../sessions/` is the cleaned raw session-export directory for non-Claude developer sessions
+- parent-root `../sessions/` is not part of bootstrap and should not be created during workspace init
+- create parent-root `../sessions/` only during packaging when non-Claude developer sessions actually need cleaned export files
 - Claude-backed developer sessions are packaged once as parent-root `claude-sessions.zip` instead of per-session `../sessions/` entries
 - parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check.md`, and `test_coverage_and_readme_audit_report.md`
 - parent-root `../.ai/claude-live/` is the live Claude bridge runtime directory root
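Under the revised structure rule, whether `../sessions/` exists at all depends on which session backends were used. This sketch shows that packaging decision. It assumes each session record carries a hypothetical `backend` field, where `"claude"` marks Claude-backed sessions.

```python
def packaging_plan(developer_sessions):
    """Decide session-related packaging outputs (the `backend` field is hypothetical)."""
    claude = [s["name"] for s in developer_sessions if s.get("backend") == "claude"]
    other = [s["name"] for s in developer_sessions if s.get("backend") != "claude"]
    return {
        # Claude-backed sessions are packaged once as ../claude-sessions.zip.
        "claude_sessions_zip": bool(claude),
        # ../sessions/ is created only at packaging time, and only for non-Claude exports.
        "create_sessions_dir": bool(other),
        "session_exports": other,
    }
```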

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -1,21 +1,24 @@
 ---
 name: development-guidance
-description: Developer-facing slice execution and local verification guidance for slopmachine.
+description: Developer-facing end-to-end execution and local verification guidance for slopmachine.
 ---
 # Development Guidance
-Use this skill during `P4 Development` before prompting the developer.
+Use this skill during `P4 End-to-End Development` before prompting the developer.
-## Slice model
+## Plan execution model
-- work in bounded vertical slices
-- complete the real user-facing and admin-facing surface for the slice
-- keep slice-local planning, implementation, verification, and doc sync together
-- after planning is accepted, use the relevant accepted plan section as the slice baseline instead of expecting the owner to restate the full slice contract
-- when the owner provides a stage-exclusive checklist for the current slice or gate, treat that checklist as a hard acceptance contract and respond against it explicitly rather than answering loosely
-- before deeper implementation, do a quick serial-versus-parallel check for the current slice instead of defaulting to one long serial branch
-- when the slice contains 2 or 3 independent units with stable interfaces and low shared-file overlap, use parallel task fan-out for those units and then merge back into one reviewed result
+- treat `plan.md` as the definitive implementation execution checklist
+- after planning is accepted, execute the whole application by working through `plan.md` end to end rather than waiting for many narrow owner prompts
+- update `plan.md` in place from the main developer lane as items move from not started to in progress to done
+- use `../docs/design.md` for system intent and architecture only, and use `plan.md` for the execution file tree, exact execution order, file ownership, and progress state
+- read the planned file tree and file-ownership map before deeper implementation so parallel work is driven by real file boundaries instead of vague feature labels
+- before broad fan-out, establish the small shared-file contract in the main lane so parallel branches or worktrees start from the same stabilized shared files and interfaces
+- treat `plan.md` as main-lane-owned during parallel work; branch lanes should report completion and let the main lane update `plan.md` after merge
+- when the owner provides a bounded correction or release-alignment checklist, treat it as a hard acceptance contract and respond against it explicitly
+- if interrupted, resume from the current state of `plan.md` instead of inventing a new hidden plan
+- still use good engineering decomposition internally, but keep the visible execution contract anchored to `plan.md`
 ## Module implementation guidance
@@ -23,7 +26,7 @@ Use this skill during `P4 Development` before prompting the developer.
 - define the module purpose, constraints, and edge cases before coding
 - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
 - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
-- when working inside a slice, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims this slice could affect before reporting readiness
+- when working inside a `plan.md` workstream, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims that workstream could affect before reporting readiness
 - implement real behavior, not partial scattered logic
 - handle failure paths and boundary conditions
 - add or update tests as part of the module work
@@ -31,15 +34,18 @@ Use this skill during `P4 Development` before prompting the developer.
 - when backend or fullstack API endpoints are added or changed, add or update real HTTP tests for the exact `METHOD + PATH` where practical instead of relying only on controller/service-level tests
 - when mocked HTTP coverage or unit-only coverage still exists, keep it explicit in the coverage notes instead of overstating it as equivalent to true no-mock endpoint coverage
 - when backend or fullstack API tests are material, keep the test names, fixtures, or assertions audit-readable enough that a reviewer can trace the endpoint, request input, and expected response behavior statically
-- keep track of important modules that still lack meaningful tests so hardening does not have to rediscover them from scratch
-- define the branch contract before parallelizing: expected outcome, boundaries, shared constraints, merge condition, and required verification
+- keep track of important modules that still lack meaningful tests so fused `P5` does not have to rediscover them from scratch
+- define the branch contract before parallelizing: expected outcome, owned files, boundaries, shared constraints, merge condition, and required verification
+- when `plan.md` marks mutually exclusive owned files, default to separate branches or worktrees for those sections when the environment supports it cleanly
+- when worktree support is unavailable, still default to parallel branch or subagent execution using the same owned-file boundaries instead of falling back to serial work by habit
+- keep `plan.md`, `README.md`, and other shared integration-heavy files in the main lane unless the accepted plan explicitly delegates them
 - keep parent-root `../docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
 - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
 - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
 - for backend or fullstack work, keep configuration reads on the shared config path instead of introducing new scattered direct environment access in feature code
 - keep frontend and backend contracts synchronized when the module spans both sides
 - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
-- before closing the slice, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this slice lands?
+- before closing the current workstream, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this work lands?
 - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
 - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
 - verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
@@ -66,38 +72,39 @@ Use this skill during `P4 Development` before prompting the developer.
 - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
 - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
 - before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
-- verify the module against its planned behavior before trying to move on
-- do not move on while the module is still obviously weak or half-finished
-- do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
+- verify the current `plan.md` workstream against its planned behavior before trying to move on
+- do not move on while the current `plan.md` workstream is still obviously weak or half-finished
+- do not spread broad partial logic across many modules; bias toward completed trustworthy workstreams before opening the next major chunk
 - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
+- do not cross into files owned by another planned branch unless the accepted plan or current owner instruction explicitly opens that boundary
 - after parallel fan-in, run final targeted verification on the integrated result rather than trusting the branch-local checks alone
 ## Verification model
 - use targeted local verification by default
-- avoid broad project-standard gate commands during ordinary slice work
+- avoid broad project-standard gate commands during ordinary `P4` implementation work
 - prefer fast local language-native or framework-native test commands for the changed area during normal iteration
 - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
-- if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
+- if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary `P4` implementation work
 - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
 - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
 - local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
-- do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the slice is considered complete
-- do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
-- for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
+- do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the current `plan.md` workstream is considered complete
+- do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary `P4` implementation work
+- for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary `P4` implementation work
 - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
-- for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
-- when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
+- for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary `P4` implementation work rather than broad checkpoint commands
+- when the current workstream materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
 - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
 - for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
 - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
 - when backend logging matters, keep request or route outcomes, exceptions, and background failure logging on the shared structured logging path with redaction intact
 - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
-- keep the test surface moving toward the hard minimum 90 percent coverage threshold as slices are completed, and do not defer obvious coverage debt to hardening
+- keep the test surface moving toward the hard minimum 90 percent coverage threshold as `plan.md` workstreams are completed, and do not defer obvious coverage debt to fused `P5`
 - for backend or fullstack APIs, keep `../docs/test-coverage.md` moving toward an endpoint inventory plus API test mapping table, not just a generic risk matrix
-- in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
-- when the owner names specific expected outcomes for the slice or gate, tie the reported verification and changed files back to those expected outcomes explicitly
-- keep ordinary slice-complete replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
+- in each development follow-up or completion reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
+- when the owner names specific expected outcomes for the current workstream or gate, tie the reported verification and changed files back to those expected outcomes explicitly
+- keep ordinary development follow-up replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
 ## Quality rules

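The new `plan.md` rules in the development-guidance diff above (mutually exclusive owned files per workstream, main-lane ownership of `plan.md` and `README.md`, and the not started → in progress → done progression) can be checked mechanically. The sketch below is illustrative only. It assumes `plan.md` has already been parsed into per-workstream file sets. The `Workstream` type, the helper names, and the `MAIN_LANE_FILES` set are hypothetical and are not part of theslopmachine.

```python
from dataclasses import dataclass

# Hypothetical status values mirroring the progression described for plan.md.
STATUS_ORDER = ["not started", "in progress", "done"]

# Hypothetical default set of shared, integration-heavy files kept in the main lane.
MAIN_LANE_FILES = {"plan.md", "README.md"}


@dataclass
class Workstream:
    name: str
    owned_files: set[str]
    status: str = "not started"


def ownership_conflicts(workstreams):
    """Return {file: [workstream names]} for every file claimed by more than one workstream."""
    owners = {}
    for ws in workstreams:
        for path in ws.owned_files:
            owners.setdefault(path, []).append(ws.name)
    return {path: names for path, names in owners.items() if len(names) > 1}


def main_lane_violations(workstreams):
    """Return {workstream name: [files]} for branch workstreams that claim main-lane files."""
    return {
        ws.name: sorted(ws.owned_files & MAIN_LANE_FILES)
        for ws in workstreams
        if ws.owned_files & MAIN_LANE_FILES
    }


def advance(ws):
    """Move a workstream exactly one status step forward; refuse to advance past done."""
    idx = STATUS_ORDER.index(ws.status)
    if idx == len(STATUS_ORDER) - 1:
        raise ValueError(f"{ws.name} is already done")
    ws.status = STATUS_ORDER[idx + 1]
    return ws.status


if __name__ == "__main__":
    streams = [
        Workstream("auth-api", {"src/auth/routes.ts", "src/auth/service.ts"}),
        Workstream("billing-api", {"src/billing/routes.ts", "src/auth/service.ts"}),
    ]
    print(ownership_conflicts(streams))  # {'src/auth/service.ts': ['auth-api', 'billing-api']}
```

A non-empty result from `ownership_conflicts` would mean the two sections are not safe to fan out to separate branches or worktrees. This matches the diff's rule against crossing into files owned by another planned branch. Keeping `advance` to single steps reflects the guidance that only the main lane updates `plan.md`, and only as work actually moves forward.
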
package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -17,7 +17,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
 - do not silently drop, merge away, or wave through issues from the current audit report
 - the owner must read the current audit report and extract the issues before talking to the developer
 - after the developer claims the fixes are complete for a `partial pass` audit, return to the same evaluator session that produced that audit report
-- keep ordinary post-hardening evaluation remediation inside `P7`
+- keep ordinary post-`P5` evaluation remediation inside `P7`
 ## Fresh-audit result handling

package/assets/skills/hardening-gate/SKILL.md CHANGED

@@ -1,102 +1,12 @@
 ---
 name: hardening-gate
-description: Release-readiness hardening rules for slopmachine.
+description: Compatibility shim for the merged integrated-verification-and-hardening phase.
 ---
 # Hardening Gate
-Use this skill only during `P6 Hardening`.
+`P6 Hardening` no longer exists as a standalone workflow phase.
-## Hardening audit priorities
+Use `integrated-verification` as the canonical source of truth for the merged `P5 Integrated Verification and Hardening` phase.
-The hardening phase should explicitly prepare the project to pass the final audit in these priority areas:
-1. prompt-fit
-2. security-critical flaws
-3. test sufficiency
-4. major engineering quality
-Hardening should treat these as the main review buckets before final evaluation begins.
-## Hardening scope
-- dependency hygiene
-- secret and config hygiene
-- prototype residue cleanup
-- docs honesty
-- observability and redaction hygiene
-- fragile-test and release-readiness cleanup
-## Hardening guidance
-- run a prompt-fit sweep for silent requirement substitution, partially delivered hard requirements, frontend/backend mismatch, and business-flow drift
-- audit security boundaries, validation, ownership, and secret handling
-- prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
-- audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
-- audit whether parent-root `../docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, gaps, and the intended minimum 90 percent threshold in a way the owner can follow quickly
-- audit whether the project actually meets the minimum 90 percent coverage threshold for the relevant behavior surface rather than relying on a thin happy-path suite
-- require concrete coverage evidence during hardening, such as a stack-native coverage report, configured threshold, or equally explicit proof; do not accept approximate claims here
-- when backend or fullstack APIs exist, audit whether `../docs/test-coverage.md` includes a resolved endpoint inventory, API test mapping, mock classification, and the important modules that still lack meaningful tests
-- when backend or fullstack APIs exist, audit whether core endpoint coverage is truly no-mock HTTP where it matters, and whether mocked or indirect tests are being overstated as stronger evidence than they are
-- audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
-- inspect architecture, coupling, file size, and maintainability risks
-- focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
-- check for bad engineering practices that accumulated during implementation
-- tighten weak tests, weak docs, and weak operational instructions
-- audit static review readiness: entry points, routes, config, README, and test commands should be traceably consistent without depending on runtime tribal knowledge
-- audit that the repo is self-sufficient and does not rely on parent-root docs or sibling workflow artifacts for static reviewability
-- audit owner-maintained external docs under `../docs/` when relevant, especially `design.md`, `api-spec.md`, `test-coverage.md`, and `questions.md`
-- audit static security-boundary readiness: a fresh reviewer should be able to trace auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation from repository artifacts when applicable
-- if mock, stub, fake, interception, or local-data behavior exists, verify that its scope, default state, and boundaries are disclosed accurately and do not imply undisclosed real integration
-- audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
-- audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
-- audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
-- for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
-- audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
-- verify that missing failure handling is not being hidden behind fake-success behavior
-- run exploratory testing around awkward states, repeated actions, and realistic edge behavior
-- re-check frontend and backend observability, redaction, and operator visibility paths
-- run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
-- enforce env-file discipline during hardening
-- run documentation verification against the real codebase and runtime behavior, not just document existence
-- if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
-- verify that every dependency needed by the README-documented `docker compose up --build` and `./run_tests.sh` paths is declared in Dockerfiles or other repo-controlled container build definitions rather than relying on host-installed packages or runtimes
-- audit README compliance against the strict post-bugfix README review shape:
-  - project type near the top
-  - startup instructions
-  - access method
-  - verification method
-  - demo credentials for every known role or the exact statement `No authentication required`
-  - architecture and workflow clarity
-- for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
-- verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
-- before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
-- re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
-- enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
-- make sure the system is genuinely reviewable and reproducible
-- keep hardening narrow: do not turn this phase into a hidden extra development slice or a broad rediscovery pass
-- prefer final honesty, consistency, static-review, and release-readiness cleanup over new implementation work
-## Required hardening output
-Before `P6` can close, the owner should have a clear answer for each of these:
-- prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
-- security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
-- test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
-- coverage depth: does the current evidence prove the minimum 90 percent coverage threshold for the relevant behavior surface, and if not, what remains weak?
-- endpoint coverage readiness: if backend or fullstack APIs exist, could a strict static reviewer map the important `METHOD + PATH` surfaces to true no-mock HTTP tests, mocked HTTP tests, or unit-only coverage without guessing?
-- major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
-- static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
-- security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
-- coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
-- README hard-gate readiness: would a fresh static reviewer find the required project type, startup, access, verification, and auth-disclosure sections in `README.md` without reconstructing them from code?
-- frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
-- repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?
-## Rules
-- do not start hardening until integrated verification is explicitly stable
-- hardening is not a disguised second integrated phase
-- if hardening exposes unresolved integrated instability, reopen the earlier phase cleanly
-- do not use hardening for broad feature work
+This file remains only as a compatibility shim so older references do not silently point to stale behavior.