theslopmachine 0.4.1 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -164,16 +164,15 @@ The v2 workflow also expects:
164
164
 
165
165
  Every bootstrapped project should expose:
166
166
 
167
- - one primary documented launch/run command for its selected stack
168
- - one primary documented full-test command for its selected stack
167
+ - one primary documented runtime command
168
+ - one primary documented broad test command: `./run_tests.sh`
169
169
 
170
170
  Follow the original prompt and the existing repository first. Use the examples below only when they do not already specify the platform or stack.
171
171
 
172
172
  Examples:
173
173
 
174
174
  - web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
175
- - Expo mobile: `npx expo start` and the project's single full-test command
176
- - Electron desktop: `npm run dev` and the project's single full-test command
175
+ - mobile or desktop when Docker runtime is not the direct run path: `./run_app.sh` and `./run_tests.sh`
177
176
 
178
177
  ## Files And Locations
179
178
 
@@ -109,6 +109,18 @@ State split:
109
109
 
110
110
  Do not create another competing workflow-state system.
111
111
 
112
+ ## Git Traceability
113
+
114
+ Use git to preserve meaningful workflow checkpoints.
115
+
116
+ - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
117
+ - meaningful work includes accepted scaffold completion, accepted major development slices, accepted remediation passes, and other clearly reviewable milestones
118
+ - keep the git flow simple and checkpoint-oriented
119
+ - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
120
+ - keep commit messages descriptive and easy to reason about later
121
+ - do not push unless explicitly requested
122
+ - do not commit secrets, local-only junk, or accidental noise
123
+
112
124
  ## Mandatory Operating Order
113
125
 
114
126
  Operate in this order:
@@ -149,6 +161,7 @@ Use these exact root phases:
149
161
  - `P8 Final Human Decision`
150
162
  - `P9 Remediation`
151
163
  - `P10 Submission Packaging`
164
+ - `P11 Retrospective`
152
165
 
153
166
  Phase rules:
154
167
 
@@ -157,6 +170,7 @@ Phase rules:
157
170
  - do not close multiple root phases in one transition block
158
171
  - `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
159
172
  - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
173
+ - `P11 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
160
174
 
161
175
  ## Developer Session Model
162
176
 
@@ -199,8 +213,13 @@ Selected-stack rule:
199
213
 
200
214
  Every project must end up with:
201
215
 
202
- - one primary documented launch/run command for the selected stack
203
- - one primary documented full-test command for the selected stack
216
+ - one primary documented runtime command
217
+ - one primary documented full-test command: `./run_tests.sh`
218
+
219
+ Runtime command rule:
220
+
221
+ - for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
222
+ - when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
204
223
 
205
224
  Default moments:
206
225
 
@@ -208,6 +227,12 @@ Default moments:
208
227
  2. development complete -> integrated verification entry
209
228
  3. final qualified state before packaging
210
229
 
230
+ For Dockerized web backend/fullstack projects, enforce this cadence:
231
+
232
+ - after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
233
+ - after that, do not run Docker again during ordinary development work
234
+ - the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
235
+
211
236
  Between those moments, rely on:
212
237
 
213
238
  - local runtime checks
@@ -245,6 +270,7 @@ Core map:
245
270
  - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
246
271
  - `P9` -> `remediation-guidance`
247
272
  - `P10` -> `submission-packaging`, `report-output-discipline`
273
+ - `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
248
274
  - state mutations -> `beads-operations`
249
275
  - evidence-heavy review -> `owner-evidence-discipline`
250
276
  - planned developer-session switch -> `session-rollover`
@@ -327,6 +353,16 @@ When `P10 Submission Packaging` begins:
327
353
  - follow its exact artifact, export, cleanup, and output contract
328
354
  - do not close packaging until every required final artifact path has been verified
329
355
 
356
+ ## Retrospective
357
+
358
+ After `P10 Submission Packaging` closes successfully:
359
+
360
+ - automatically enter `P11 Retrospective`
361
+ - load `retrospective-analysis`
362
+ - write dated retrospective output under `~/slopmachine/retrospectives/`
363
+ - keep it owner-only and non-blocking by default
364
+ - reopen packaging only if the retrospective finds a real packaged-result defect
365
+
330
366
  ## Completion Standard
331
367
 
332
368
  The workflow is not done until:
@@ -335,6 +371,7 @@ The workflow is not done until:
335
371
  - the current root phase closed cleanly
336
372
  - the workflow ledger closed cleanly
337
373
  - the final package is assembled and verified in its final structure
374
+ - the retrospective phase has either documented improvements or reopened and resolved any real packaging defect it found
338
375
 
339
376
  Success means:
340
377
 
@@ -45,6 +45,8 @@ Use this skill only during `P1 Clarification`.
45
45
  - never use defaults that drift from the original prompt
46
46
  - do not use quick, loose, or simplifying assumptions that shrink what the prompt asked for
47
47
  - do not guess through material ambiguity
48
+ - do not expand the clarification artifact just to exhaust every minor edge case when the scope is already clear enough to plan correctly
49
+ - once the core scope is understood, prefer a compact clarification record plus explicit safe defaults over a giant exhaustive rewrite
48
50
 
49
51
  ## Required outputs
50
52
 
@@ -52,16 +54,63 @@ Use this skill only during `P1 Clarification`.
52
54
  - developer-facing clarification prompt in `../.ai/clarification-prompt.md`
53
55
  - explicit list of safe defaults and resolved ambiguities
54
56
 
57
+ ## `questions.md` contract
58
+
59
+ `../docs/questions.md` is not a general project summary.
60
+
61
+ It exists only for prompt items that needed interpretation because they were unclear, incomplete, or materially ambiguous.
62
+
63
+ Each entry should answer this structure:
64
+
65
+ 1. what was unclear from the original prompt
66
+ 2. how you interpreted it
67
+ 3. what decision or solution you chose for it
68
+ 4. why that choice is prompt-faithful and reasonable
69
+
70
+ Keep the file narrow and explicit.
71
+
72
+ Do not use `questions.md` for:
73
+
74
+ - a full restatement of the entire prompt
75
+ - broad planning notes
76
+ - general project requirements that were already clear
77
+ - implementation details that belong in planning or design docs
78
+
79
+ Preferred entry shape:
80
+
81
+ ```md
82
+ ## Item N: <short ambiguity title>
83
+
84
+ ### What was unclear
85
+ <the exact ambiguity or missing detail>
86
+
87
+ ### Interpretation
88
+ <how it was interpreted>
89
+
90
+ ### Decision
91
+ <the chosen resolution or safe default>
92
+
93
+ ### Why this is reasonable
94
+ <brief justification tied to prompt faithfulness>
95
+ ```
96
+
97
+ If nothing material was unclear, keep `questions.md` minimal rather than inventing content.
98
+
55
99
  ## Clarification-prompt validation loop
56
100
 
57
- - compare the original prompt and the prepared clarification prompt using a fresh ephemeral `General` session, never the developer session
58
- - build one self-contained validation prompt block for that `General` session every time
59
- - include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md` in that block
101
+ - compare the original prompt and the prepared clarification prompt using one dedicated `General` validation session, never the developer session
102
+ - do not create a new validation session for every retry unless the session became unusable or a fundamental misunderstanding requires a clean restart
103
+ - on the first validation pass, build one self-contained validation prompt block for that `General` session
104
+ - on that first pass, include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md`
60
105
  - do not use placeholders such as `same as previous`, `from context`, `see above`, or `latest artifact`
61
106
  - ask that `General` session whether the clarification prompt deviates from, weakens, narrows, or violates the original prompt in any way
62
107
  - require it to judge whether the clarification prompt is a genuine improvement in execution quality while remaining faithful to the original intent
63
- - if mismatches or prompt drift are found, revise the questions record and clarification prompt, then build a newly composed full validation block and run the check again
108
+ - if the validator suggests real fixes, patch the existing questions record and clarification prompt directly; do not restart the clarification phase from scratch unless the validator found a fundamental scope misunderstanding
109
+ - treat validator output as a correction list, not as a reason to regenerate giant clarification blocks repeatedly
110
+ - when rerunning validation in the same validator session, send only the improved clarification payload and the concrete fixes you made; do not resend the original prompt block if the session already has that context
111
+ - rerun validation only after applying the concrete fixes that matter
64
112
  - keep the validation loop bounded and intentional; prefer one strong pass plus a small number of revision cycles over repeated loose churn
113
+ - once prompt-faithfulness is satisfied and the remaining notes are minor or cosmetic, stop iterating and proceed
65
114
  - only treat the clarification prompt as approved for developer use after this validation loop passes and your own review agrees
66
115
  - requesting human approval before this validation loop passes is illegal
67
116
 
@@ -102,6 +102,12 @@ Track at least:
102
102
  - `awaiting_human`
103
103
  - `clarification_approved`
104
104
  - `remediation_round`
105
+ - `clarification_validator_session_id`
106
+ - `evaluation_pass`
107
+ - `backend_evaluation_session_id`
108
+ - `frontend_evaluation_session_id`
109
+ - `last_evaluation_session_id`
110
+ - `passed_evaluation_tracks`
105
111
  - `developer_sessions`
106
112
  - `active_developer_session_index`
107
113
 
@@ -15,15 +15,54 @@ Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
15
15
  - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
16
16
  - if no remediation is needed, move directly to the final human decision
17
17
 
18
+ ## Non-negotiable evaluation buckets
19
+
20
+ These areas are hard gates and should not be passed with known meaningful failures:
21
+
22
+ 1. prompt compliance
23
+ 2. requirement fulfillment / delivery completeness
24
+ 3. security-critical flaws
25
+
26
+ If evaluation finds a real issue in one of those buckets, the default outcome is remediation, not leniency.
27
+
28
+ Do not wave through:
29
+
30
+ - prompt drift or meaningful requirement mismatch
31
+ - missing core flows or partial delivery of prompt-critical functionality
32
+ - real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
33
+
34
+ ## Leniency buckets
35
+
36
+ These areas may pass with minor residual issues when the product is still clearly acceptable overall:
37
+
38
+ 1. testing cases / test sufficiency
39
+ 2. engineering architecture / engineering quality
40
+ 3. aesthetics
41
+
42
+ Leniency is allowed only when the issue is:
43
+
44
+ - minor in impact
45
+ - not hiding a likely blocker in another bucket
46
+ - not undermining overall confidence in the delivered product
47
+
48
+ High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
49
+
50
+ If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
51
+
18
52
  ## Triage rules
19
53
 
20
54
  - read both reports and merge the findings into one explicit triage set before deciding what happens next
21
55
  - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
22
- - any finding marked `Blocker` or `High` should normally be returned for remediation
56
+ - any finding in the non-negotiable buckets should normally be returned for remediation if it is real
57
+ - findings marked `Blocker` should normally be returned for remediation
58
+ - findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
23
59
  - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
24
60
  - findings marked `Low` may be passed without remediation
25
61
  - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
26
62
  - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
63
+ - minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
64
+ - minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
65
+ - if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
27
66
  - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
28
67
  - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
29
68
  - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
@@ -46,6 +46,19 @@ These two files are the only evaluation prompt sources for evaluation runs.
46
46
  - keep reports file-backed and bring only short summaries into chat
47
47
  - rerun only the evaluation track that still needs re-evaluation after remediation
48
48
 
49
+ ## Evaluation pass strategy
50
+
51
+ - use a maximum of 3 full evaluation passes
52
+ - after each evaluation pass, extract a detailed concrete issue list from the failing report(s)
53
+ - send that list back to the active developer session with a direct instruction like: `fix these issues found in evaluation, verify affected flows don't regress after your fixes`
54
+ - if one evaluation track passes, mark it as passed and do not rerun that track in later passes unless a later fix clearly reopens it
55
+ - do not rerun both backend and frontend evaluation tracks when only one still needs re-evaluation
56
+ - after pass 1 and pass 2, use the detailed issue list from the latest failing report(s) to drive the next remediation pass
57
+ - after pass 3, do not create a new evaluation session for the still-failing track
58
+ - after pass 3, send the final fix list back to the developer, then return to the last evaluation session used for that still-failing track and ask whether the last reported issues are now fixed
59
+ - if they are fixed, have that same evaluation session update the report to reflect the current state cleanly, without mentioning recheck, retest, previous issues, or iterative review history
60
+ - the final report should read like a normal current-state evaluation report, not like a patch log
61
+
49
62
  ## Remediation loop
50
63
 
51
64
  - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
@@ -55,7 +68,8 @@ These two files are the only evaluation prompt sources for evaluation runs.
55
68
  - the selected stack's platform-appropriate UI/E2E verification where applicable, with fresh screenshots or equivalent artifacts
56
69
  - if remediation materially reopens an owner-run broad milestone boundary, route the project back to that boundary before re-evaluation instead of treating every remediation pass as an automatic broad rerun moment
57
70
  - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
58
- - remember the external process allows a maximum of 3 repair rounds
71
+ - store backend, frontend, and last-used evaluation session ids in metadata so later passes and packaging can safely reuse the correct session when needed
72
+ - remember the evaluation flow allows a maximum of 3 full evaluation passes before the final issue-verification update path must be used
59
73
 
60
74
  ## Boundaries
61
75
 
@@ -82,9 +82,15 @@ Selected-stack defaults:
82
82
  - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
83
83
  - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
84
84
  - define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
85
- - define a project-standard launch command and a project-standard full-test command early, and keep both compatible with the selected stack
86
- - for web backend/fullstack projects, default those to `docker compose up --build` and `./run_tests.sh` only when the prompt or existing repo does not already dictate another stack-compatible contract
87
- - for mobile, desktop, CLI, library, or other non-web projects, define the selected stack's appropriate single documented launch command and single documented full-test command instead of forcing Docker conventions
85
+ - define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
86
+ - for Dockerized web backend/fullstack projects, the runtime contract may be `docker compose up --build` directly when the prompt or existing repo does not already dictate another stack-compatible contract
87
+ - when `docker compose up --build` is not the runtime contract, require `./run_app.sh` as the single primary runtime wrapper for the project
88
+ - for mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow instead of assuming host tooling conventions
89
+ - `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
90
+ - `./run_tests.sh` must prepare or install anything required before running the tests when that setup is needed for a clean environment
91
+ - for Dockerized web backend/fullstack projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
92
+ - for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
93
+ - local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
88
94
  - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
89
95
  - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
90
96
  - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
@@ -0,0 +1,91 @@
1
+ ---
2
+ name: retrospective-analysis
3
+ description: Owner-only final retrospective rules for slopmachine.
4
+ ---
5
+
6
+ # Retrospective Analysis
7
+
8
+ Use this skill only after `P10 Submission Packaging` has materially and formally succeeded.
9
+
10
+ ## Purpose
11
+
12
+ - inspect what happened across the whole workflow run
13
+ - identify what caused churn, waste, late defects, or preventable corrections
14
+ - capture lessons that should improve future runs
15
+ - write package-specific retrospective files under `~/slopmachine/retrospectives/`
16
+
17
+ ## Phase role
18
+
19
+ - this is an automatic owner-only phase
20
+ - it is quiet and non-blocking by default
21
+ - it does not create a new human stop
22
+ - it does not rerun broad verification by default
23
+ - it should not reopen development unless it finds a real defect in the already-packaged result
24
+
25
+ ## Output location
26
+
27
+ Write dated retrospective files under:
28
+
29
+ - `~/slopmachine/retrospectives/`
30
+
31
+ Preferred filenames:
32
+
33
+ - `retrospective-YYYY-MM-DD.md`
34
+ - `improvement-actions-YYYY-MM-DD.md`
35
+
36
+ If only one file is needed, the retrospective file is sufficient.
37
+
38
+ ## Evidence sources
39
+
40
+ Prefer existing workflow artifacts first:
41
+
42
+ - root metadata
43
+ - questions/clarification record
44
+ - clarification prompt
45
+ - planning artifacts
46
+ - Beads comments and transitions
47
+ - developer-session handoffs
48
+ - review and rejection history
49
+ - verification gate notes
50
+ - evaluation reports
51
+ - remediation records
52
+ - packaging outputs
53
+
54
+ Do not reread the entire codebase unless a real inconsistency requires it.
55
+ Do not rerun broad Docker or full-suite verification just for retrospective analysis.
56
+
57
+ ## Required retrospective sections
58
+
59
+ 1. outcome summary
60
+ 2. what worked well
61
+ 3. what caused waste or looping
62
+ 4. what was caught too late
63
+ 5. findings by phase
64
+ 6. findings by instruction plane:
65
+ - owner shell
66
+ - developer prompt
67
+ - skills
68
+ - `AGENTS.md`
69
+ 7. actionable improvements
70
+
71
+ ## Audit buckets
72
+
73
+ Evaluate at least these buckets in hindsight:
74
+
75
+ 1. prompt-fit
76
+ 2. security-critical flaws
77
+ 3. test sufficiency
78
+ 4. major engineering quality
79
+ 5. token/time waste
80
+
81
+ For each meaningful finding, prefer:
82
+
83
+ - what happened
84
+ - why it happened
85
+ - where the fix belongs
86
+ - how it should change future runs
87
+
88
+ ## Rule for reopening work
89
+
90
+ - if retrospective finds a real packaging or delivery defect, reopen `P10` and fix it
91
+ - if it finds only improvements, document them and close the retrospective phase
@@ -14,14 +14,24 @@ Use this skill during `P3 Scaffold` before prompting the developer.
14
14
  - establish the local verification path and the standardized gate path
15
15
  - make prompt-critical baseline behavior real where required
16
16
  - keep repo-local `README.md` honest from the start
17
- - make the selected-stack primary launch command and primary full-test command real from the scaffold stage
17
+ - make the selected-stack primary runtime command and the universal `./run_tests.sh` broad test command real from the scaffold stage
18
+
19
+ For Dockerized web backend/fullstack projects, scaffold must make these commands real and working before scaffold can pass:
20
+
21
+ - `docker compose up --build`
22
+ - `./run_tests.sh`
18
23
 
19
24
  ## Scaffold and foundation guidance
20
25
 
21
26
  - create the initial project structure intentionally
22
27
  - follow the original prompt and existing repository first; only use the package defaults below when they do not already specify the platform or stack
23
- - create the selected-stack primary full-test command during scaffold; for web backend/fullstack projects this is usually `./run_tests.sh`, while non-web projects should expose their own single documented full-test command
24
- - create the selected-stack primary launch command during scaffold; for web backend/fullstack projects this is usually `docker compose up --build`, while non-web projects should expose their own single documented launch command
28
+ - create `./run_tests.sh` during scaffold for every project as the single broad test entrypoint
29
+ - for Dockerized web backend/fullstack projects, make `docker compose up --build` real as the primary runtime command during scaffold
30
+ - when `docker compose up --build` is not the runtime contract, create `./run_app.sh` during scaffold as the single primary runtime wrapper
31
+ - make `./run_tests.sh` self-sufficient from a clean environment by preparing or installing anything it needs before executing the tests
32
+ - for Dockerized web backend/fullstack projects, `./run_tests.sh` must execute the broad test path through Docker and should own that Dockerized test flow directly instead of requiring separate manual pre-setup
33
+ - for non-web or non-Docker projects, `./run_tests.sh` must execute the selected stack's platform-equivalent broad test flow while preserving the same single-command interface
34
+ - local non-Docker test commands should still be installed and working for normal development iteration
25
35
  - create required testing directories and baseline docs structure
26
36
  - put baseline config and logging structure in place
27
37
  - install and configure the local test tooling needed for ordinary iteration during scaffold rather than deferring local testing setup to later phases
@@ -42,6 +52,7 @@ Use this skill during `P3 Scaffold` before prompting the developer.
42
52
  - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
43
53
  - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
44
54
  - establish README structure early instead of leaving it until the end
55
+ - ensure `README.md` clearly documents the primary runtime command and the broad `./run_tests.sh` contract for the selected stack
45
56
  - prove the scaffold in a clean state before deeper feature work
46
57
  - verify clean startup and teardown behavior under the selected stack's runtime contract
47
58
  - for Dockerized web projects, verify clean startup and teardown behavior under the chosen project namespace
@@ -66,3 +77,5 @@ Scaffold should make later slices easier, not force them to retrofit missing fun
66
77
  - use local and narrow checks while correcting scaffold work
67
78
  - reserve one broad owner-run scaffold gate for actual scaffold acceptance
68
79
  - do not spend extra broad reruns once the acceptance question is already answered
80
+ - for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the baseline actually works
81
+ - after that scaffold confirmation, do not run Docker again during ordinary development work; the next Docker-based run should be at development completion when integrated behavior is checked
@@ -79,11 +79,9 @@ The final submission layout in the parent project root must be:
79
79
  - relocated screenshots and proof materials needed for submission review
80
80
  - current working directory delivered as parent-root `repo/`
81
81
  - `../sessions/`
82
+ - `develop-N.json`
83
+ - `bugfix-N.json`
82
84
  - `../metadata.json`
83
- - `../session.json`
84
- - `../session-N.json` when multiple exported sessions exist
85
- - `../trajectory.json`
86
- - `../trajectory-N.json` when multiple trajectories exist
87
85
  - parent-root `../.tmp/` directory moved out of current `.tmp/` when it exists
88
86
 
89
87
  ## Required packaging actions
@@ -101,32 +99,49 @@ The final submission layout in the parent project root must be:
101
99
  - `~/slopmachine/implementation-comparison.md`
102
100
  - `~/slopmachine/quality-document.md`
103
101
  - ensure `README.md` matches the delivered codebase, functionality, runtime steps, and test steps, stays friendly to a junior developer, and does not reference the external docs set in `../docs/`
104
- - include the selected stack's primary full-test command and any supporting runner script or wrapper needed to execute it from a clean environment
102
+ - include `./run_tests.sh` and any supporting runner logic it needs to execute the project's broad test path from a clean environment
105
103
  - relocate evaluation artifacts into parent-root `../submission/`
106
104
  - relocate screenshots and proof materials relevant to runtime behavior and major flows into parent-root `../submission/`
107
- - include exported session artifacts at the parent project root using the naming rules:
108
- - `../session.json` for a single exported session
109
- - `../session-N.json` when multiple exported sessions exist
110
- - include trajectory artifacts at the parent project root using the naming rules:
111
- - `../trajectory.json` for a single trajectory
112
- - `../trajectory-N.json` when multiple trajectories exist
113
- - preserve parent-root `../sessions/` as the session artifact directory for any additional exported conversation traces the package needs to retain
105
+ - preserve parent-root `../sessions/` as the session artifact directory for converted workflow session exports
106
+ - export all tracked workflow sessions before generating the final submission documents
107
+ - after the session exports are complete, generate the final submission report content using the last evaluation session recorded in metadata, so the report answers come from cached evaluation context instead of rebuilding that context from scratch
114
108
 
115
109
  ## Session export sequence
116
110
 
117
- For the developer session, run these exact steps:
111
+ This export sequence must happen first in packaging, before final submission documents are generated.
118
112
 
119
- 1. `opencode export <developer-session-id> > ../session-export.json`
120
- 2. `python3 ~/utils/strip_session_parent.py ../session-export.json --output ../session.json`
121
- 3. `python3 ~/utils/convert_ai_session.py -i ../session.json -o ../trajectory.json`
113
+ Export every tracked workflow session recorded in metadata.
114
+
115
+ For each tracked session:
116
+
117
+ 1. `opencode export <session-id> > ../session-export-<label>.json`
118
+ 2. `python3 ~/utils/strip_session_parent.py ../session-export-<label>.json --output ../session-clean-<label>.json`
119
+ 3. `python3 ~/utils/convert_ai_session.py -i ../session-clean-<label>.json -o ../sessions/<final-name>.json`
120
+
121
+ Naming rule for converted files under `../sessions/`:
122
+
123
+ - development-phase sessions become `develop-N.json`
124
+ - hardening or remediation sessions become `bugfix-N.json`
122
125
 
123
126
  After those steps:
124
127
 
125
- - keep the cleaned final exported session as parent-root `../session.json` unless multiple exports require `../session-N.json`
126
- - keep the generated final trajectory as parent-root `../trajectory.json` unless multiple trajectories require `../trajectory-N.json`
127
- - treat parent-root `../session-export.json` as a temporary packaging intermediate
128
- - immediately verify that all expected directories and required files exist before running later packaging steps
129
- - if the required utilities or output files are missing, packaging is not ready to continue
128
+ - verify every planned developer session has been exported and converted before continuing packaging
129
+ - keep the converted session outputs in `../sessions/` using the naming rules above
130
+ - treat the `../session-export-*.json` and `../session-clean-*.json` files as temporary packaging intermediates unless the package contract later says otherwise
131
+ - if the required utilities, metadata session ids, or output files are missing, packaging is not ready to continue
132
+ - only after these exports are complete may you generate the final submission documents
133
+
134
+ ## Final report generation order
135
+
136
+ After all session exports are complete:
137
+
138
+ 1. recover the last evaluation session id from metadata
139
+ 2. use that last evaluation session to answer the final submission-document questions from its cached context
140
+ 3. generate the required final submission documents from that evaluation context plus the canonical `~/slopmachine/` reference files
141
+
142
+ Do not start generating the final submission documents before the session exports are complete.
143
+ Do not create a new evaluation session for final report generation if the last evaluation session is still available.
144
+ If the last evaluation session id is missing or unusable, stop and repair metadata/session recovery before continuing packaging.
130
145
 
131
146
  ## Required file moves
132
147
 
@@ -155,7 +170,9 @@ After those steps:
155
170
  - confirm shared project docs live in parent-root `../docs/` and any accidental repo-local `docs/` copy has been removed from the delivered tree
156
171
  - confirm required screenshots have been relocated into parent-root `../submission/`
157
172
  - confirm parent-root metadata fields are populated correctly
158
- - confirm session export naming rules are followed
173
+ - confirm session export naming rules are followed under `../sessions/`:
174
+ - `develop-N.json` for development-phase sessions
175
+ - `bugfix-N.json` for hardening/remediation sessions
159
176
 
160
177
  ## Submission artifact and response contract
161
178
 
@@ -21,7 +21,12 @@ Use this skill after development begins whenever you are reviewing work, decidin
21
21
  - do not allow `.env` files or env-file variants anywhere in the repo tree
22
22
  - do not allow a project that requires a preexisting `.env` file in the repo or package to start from scratch
23
23
  - if env-file format is needed at runtime, it must be generated ephemerally from the selected runtime environment rather than stored in the repo or package
24
- - require the README to show one primary launch command and one primary full-test command for the selected stack
24
+ - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
25
+ - for Dockerized web backend/fullstack projects, that runtime command may be `docker compose up --build` directly
26
+ - when `docker compose up --build` is not the runtime contract, require `./run_app.sh` to be the documented primary runtime wrapper
27
+ - require `./run_tests.sh` to be self-sufficient enough to run from a clean environment, including any required install/setup steps when applicable
28
+ - for Dockerized web backend/fullstack projects, require `./run_tests.sh` to be the Dockerized broad test path used for final broad verification rather than a local-only test wrapper
29
+ - for non-web or non-Docker projects, require `./run_tests.sh` to be the platform-equivalent broad test path used for final broad verification
25
30
 
26
31
  ## Review standard
27
32
 
@@ -50,6 +55,8 @@ Use this skill after development begins whenever you are reviewing work, decidin
50
55
  - use targeted local verification as the default during scaffold corrections, development, hardening, and remediation
51
56
  - reserve the selected stack's broad verification path for the limited owner-run gate moments in the workflow budget
52
57
  - do not turn ordinary acceptance into repeated integrated-style gate runs
58
+ - for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
59
+ - after that scaffold confirmation, the next Docker-based run should be at development completion or integrated-verification entry unless a real blocker forces earlier escalation
53
60
 
54
61
  ## Verify-fix loop
55
62
 
@@ -69,10 +76,18 @@ Use this skill after development begins whenever you are reviewing work, decidin
69
76
 
70
77
  - a broad gate is an owner-run integrated verification boundary, not every ordinary phase change
71
78
  - a phase change alone does not automatically require a broad gate unless that phase exit explicitly calls for one
72
- - a broad gate normally means some combination of full clean runtime proof, the selected stack's primary full-test command, and platform-appropriate UI/E2E evidence when UI-bearing flows exist
79
+ - a broad gate normally means some combination of full clean runtime proof, `./run_tests.sh`, and platform-appropriate UI/E2E evidence when UI-bearing flows exist
73
80
  - in v2, the workflow target is at most 3 broad owner-run verification moments across the whole cycle
74
81
  - ordinary planning, ordinary slice acceptance, and routine in-phase verification are not broad gates by default and should rely on targeted local verification unless the risk profile says otherwise
75
82
 
83
+ For Dockerized web backend/fullstack projects, the default Docker cadence is:
84
+
85
+ 1. one owner-run `docker compose up --build` plus one owner-run `./run_tests.sh` after scaffold completion
86
+ 2. no more Docker-based runs during ordinary development work
87
+ 3. the next Docker-based run at development completion or integrated-verification entry
88
+
89
+ Any earlier extra Docker run needs a concrete blocker-based justification.
90
+
76
91
  ## Runtime gate interpretation
77
92
 
78
93
  Use evidence such as internal metadata files, structured Beads comments, verification command results, and file/project-state checks.
@@ -80,10 +95,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
80
95
  - clarification requires the `clarification-gate` conditions plus explicit approval record
81
96
  - planning requires the `developer-session-lifecycle` and planning-gate conditions plus a fresh planning-oriented start and the required documentation and repo hygiene state when relevant
82
97
  - scaffold requires evidence for the bounded scaffold gate, baseline logging/config, and when relevant the chosen frontend stack and UI approach being set intentionally
83
- - scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, a usable local test environment in the current working directory, and the primary launch/test commands documented and working for the selected stack when practical
98
+ - scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, a usable local test environment in the current working directory, and the correct primary runtime command plus `./run_tests.sh` documented and working when practical
99
+ - scaffold also requires `./run_tests.sh` to handle its own required setup from a clean environment when applicable
100
+ - local tests should still exist for ordinary development work even when the primary broad test command is Dockerized
84
101
  - when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
85
102
  - for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
86
103
  - scaffold acceptance also requires clean startup and teardown behavior in the selected runtime model; for Dockerized web projects this includes self-contained Compose namespacing and no unnecessary fragile `container_name` usage
104
+ - for Dockerized web backend/fullstack projects, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
87
105
  - module implementation requires platform-appropriate local verification and selected-stack UI/E2E evidence when UI-bearing flows are material
88
106
  - module implementation acceptance should challenge tenant isolation, path confinement, sanitized error behavior, prototype residue, integration seams, and cross-cutting consistency when those concerns are in scope
89
107
  - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete
@@ -26,16 +26,25 @@ This file is the repo-local engineering rulebook for `slopmachine` projects.
26
26
 
27
27
  Every project must expose:
28
28
 
29
- - one primary documented command to launch or run the application in its selected stack
30
- - one primary documented command to run the full supported test suite
31
- - follow the original prompt and existing repository first; use the defaults below only when they do not already specify the platform or stack
29
+ - one primary documented runtime command
30
+ - one primary documented broad test command: `./run_tests.sh`
31
+ - follow the original prompt and existing repository first for the runtime stack; `./run_tests.sh` should exist regardless of project type
32
+ - the primary full-test command should install or prepare what it needs first when that setup is required for a clean environment
32
33
 
33
34
  For web backend/fullstack projects, those are usually:
34
35
 
35
36
  - `docker compose up --build`
36
37
  - `./run_tests.sh`
37
38
 
38
- For mobile, desktop, CLI, library, or other non-web projects, use the selected stack's appropriate commands instead, but keep them to one clear documented launch command and one clear documented full-test command.
39
+ For Dockerized web backend/fullstack projects:
40
+
41
+ - `./run_tests.sh` must run the broad full-test path through Docker
42
+ - local non-Docker tests should still exist for normal development work
43
+ - final broad verification should use the Dockerized `./run_tests.sh` path, not only local test commands
44
+
45
+ When `docker compose up --build` is not the runtime contract, provide `./run_app.sh` as the single primary runtime wrapper.
46
+
47
+ For mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow, while `./run_tests.sh` remains the single broad test wrapper calling the platform-equivalent full test path.
39
48
 
40
49
  ## Testing Rules
41
50
 
@@ -55,6 +64,8 @@ Selected-stack defaults:
55
64
 
56
65
  - Keep `README.md` and any codebase-local docs accurate.
57
66
  - The README must explain what the project is, what it does, how to run it, and how to test it.
67
+ - The README must clearly document whether the primary runtime command is `docker compose up --build` or `./run_app.sh`.
68
+ - The README must clearly document `./run_tests.sh` as the broad test command.
58
69
  - The README must stand on its own for basic codebase use.
59
70
 
60
71
  ## Secret And Runtime Rules
package/package.json CHANGED
@@ -1,25 +1,25 @@
1
1
  {
2
- "name": "theslopmachine",
3
- "version": "0.4.1",
4
- "description": "SlopMachine installer and project bootstrap CLI",
5
- "license": "MIT",
6
- "type": "module",
7
- "bin": {
8
- "slopmachine": "bin/slopmachine.js"
9
- },
10
- "scripts": {
11
- "start": "node ./bin/slopmachine.js",
12
- "check": "node ./bin/slopmachine.js --help"
13
- },
14
- "engines": {
15
- "node": ">=18"
16
- },
17
- "files": [
18
- "bin",
19
- "src",
20
- "assets",
21
- "README.md",
22
- "RELEASE.md",
23
- "MANUAL.md"
24
- ]
2
+ "name": "theslopmachine",
3
+ "version": "0.4.2",
4
+ "description": "SlopMachine installer and project bootstrap CLI",
5
+ "license": "MIT",
6
+ "type": "module",
7
+ "bin": {
8
+ "slopmachine": "bin/slopmachine.js"
9
+ },
10
+ "scripts": {
11
+ "start": "node ./bin/slopmachine.js",
12
+ "check": "node ./bin/slopmachine.js --help"
13
+ },
14
+ "engines": {
15
+ "node": ">=18"
16
+ },
17
+ "files": [
18
+ "bin",
19
+ "src",
20
+ "assets",
21
+ "README.md",
22
+ "RELEASE.md",
23
+ "MANUAL.md"
24
+ ]
25
25
  }
package/src/constants.js CHANGED
@@ -39,6 +39,7 @@ export const REQUIRED_SKILL_DIRS = [
39
39
  'evaluation-triage',
40
40
  'remediation-guidance',
41
41
  'submission-packaging',
42
+ 'retrospective-analysis',
42
43
  'owner-evidence-discipline',
43
44
  'report-output-discipline',
44
45
  'frontend-design',
package/src/init.js CHANGED
@@ -177,17 +177,31 @@ async function maybeOpenOpencode(targetPath, openAfterInit) {
177
177
  return
178
178
  }
179
179
 
180
- const opencodeCommand = await resolveCommand('opencode')
181
- if (!opencodeCommand) {
182
- warn('OpenCode is not available in PATH, so the project was initialized but could not be opened automatically. Launch OpenCode manually inside repo/.')
183
- return
180
+ log('Opening OpenCode in repo/')
181
+ const repoPath = path.join(targetPath, 'repo')
182
+ let result
183
+
184
+ if (process.platform === 'win32') {
185
+ result = await runCommand('cmd', ['/c', 'opencode'], {
186
+ stdio: 'inherit',
187
+ cwd: repoPath,
188
+ })
189
+ } else {
190
+ const shellPath = process.env.SHELL && await pathExists(process.env.SHELL)
191
+ ? process.env.SHELL
192
+ : await resolveCommand('zsh') || await resolveCommand('bash') || await resolveCommand('sh')
193
+
194
+ if (!shellPath) {
195
+ warn('No usable shell was found to launch OpenCode automatically. Launch OpenCode manually inside repo/.')
196
+ return
197
+ }
198
+
199
+ result = await runCommand(shellPath, ['-lc', 'opencode'], {
200
+ stdio: 'inherit',
201
+ cwd: repoPath,
202
+ })
184
203
  }
185
204
 
186
- log('Opening OpenCode in repo/')
187
- const result = await runCommand(opencodeCommand, [], {
188
- stdio: 'inherit',
189
- cwd: path.join(targetPath, 'repo'),
190
- })
191
205
  if (result.code !== 0) {
192
206
  warn(`Failed to launch OpenCode automatically (${result.stderr || `exit code ${result.code}`}). Launch it manually inside repo/.`)
193
207
  }
package/src/install.js CHANGED
@@ -670,6 +670,7 @@ async function installSkills(paths) {
670
670
  async function installSlopmachineAssets(paths) {
671
671
  const source = path.join(assetsRoot(), 'slopmachine')
672
672
  await ensureDir(paths.slopmachineDir)
673
+ await ensureDir(path.join(paths.slopmachineDir, 'retrospectives'))
673
674
  const summary = { installed: [], refreshed: [] }
674
675
 
675
676
  for (const relativePath of REQUIRED_SLOPMACHINE_FILES) {
@@ -736,7 +737,17 @@ async function mergeOpencodeConfig(paths, options) {
736
737
  log(`Updated ${paths.opencodeConfigPath}`)
737
738
  }
738
739
 
739
- async function maybeInstallPluginBinary() {
740
+ function hasConfiguredPlugin(existingConfig, pluginName) {
741
+ const plugins = Array.isArray(existingConfig?.plugin) ? existingConfig.plugin : []
742
+ return plugins.includes(pluginName)
743
+ }
744
+
745
+ async function maybeInstallPluginBinary(existingConfig) {
746
+ if (hasConfiguredPlugin(existingConfig, 'oc-chatgpt-multi-auth')) {
747
+ log('OpenCode plugin already configured: oc-chatgpt-multi-auth')
748
+ return
749
+ }
750
+
740
751
  if (process.env.SLOPMACHINE_PLUGIN_BOOTSTRAP === '0') {
741
752
  return
742
753
  }
@@ -812,8 +823,8 @@ export async function runInstall() {
812
823
  const assetSummary = await installSlopmachineAssets(paths)
813
824
 
814
825
  section('OpenCode Config')
815
- await maybeInstallPluginBinary()
816
826
  const existingConfig = (await readJsonIfExists(paths.opencodeConfigPath)) || null
827
+ await maybeInstallPluginBinary(existingConfig)
817
828
  const keys = await collectApiKeys(existingConfig)
818
829
  await mergeOpencodeConfig(paths, keys)
819
830