npm - theslopmachine - Versions diffs - 0.4.1 → 0.4.3 - Mend

theslopmachine 0.4.1 → 0.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

package/README.md +72 -8
package/RELEASE.md +2 -2
package/assets/agents/slopmachine.md +44 -8
package/assets/skills/clarification-gate/SKILL.md +53 -4
package/assets/skills/developer-session-lifecycle/SKILL.md +11 -6
package/assets/skills/evaluation-triage/SKILL.md +40 -1
package/assets/skills/final-evaluation-orchestration/SKILL.md +15 -1
package/assets/skills/planning-guidance/SKILL.md +10 -4
package/assets/skills/retrospective-analysis/SKILL.md +91 -0
package/assets/skills/scaffold-guidance/SKILL.md +16 -3
package/assets/skills/session-rollover/SKILL.md +1 -2
package/assets/skills/submission-packaging/SKILL.md +39 -22
package/assets/skills/verification-gates/SKILL.md +22 -4
package/assets/slopmachine/document-completeness.md +46 -32
package/assets/slopmachine/engineering-results.md +43 -39
package/assets/slopmachine/implementation-comparison.md +40 -33
package/assets/slopmachine/quality-document.md +45 -86
package/assets/slopmachine/templates/AGENTS.md +16 -5
package/package.json +23 -23
package/src/constants.js +61 -57
package/src/init.js +23 -9
package/src/install.js +13 -2

package/README.md CHANGED

@@ -37,7 +37,7 @@ The current engine is the lighter workflow line:
 - smaller always-loaded owner shell
 - smaller developer rulebook
 - richer phase-specific skills loaded when needed
-- bounded 2/3 developer-session model
+- bounded 2-session developer-session model
 - `beads_rust` bootstrap path
 ## Requirements
@@ -66,13 +66,13 @@ npm pack
 This produces a tarball such as:
 ```bash
-theslopmachine-0.4.0.tgz
+theslopmachine-0.4.3.tgz
 ```
 You can then install it globally with:
 ```bash
-npm install -g ./theslopmachine-0.4.0.tgz
+npm install -g ./theslopmachine-0.4.3.tgz
 ```
 For local development instead of global install:
@@ -143,6 +143,7 @@ The expected high-level lifecycle is:
 8. final human decision
 9. remediation when needed
 10. submission packaging
+11. retrospective
 ## How It Is Intended To Operate
@@ -154,7 +155,7 @@ That means:
 - planning, scaffold, development, verification, hardening, remediation, and packaging load detailed skills only when needed
 - early and late phases do not carry each other's full instruction payloads all the time
-The v2 workflow also expects:
+The current workflow also expects:
 - targeted reads over broad rereads
 - local and narrow verification during ordinary iteration
@@ -164,16 +165,79 @@ The v2 workflow also expects:
 Every bootstrapped project should expose:
-- one primary documented launch/run command for its selected stack
-- one primary documented full-test command for its selected stack
+- one primary documented runtime command
+- one primary documented broad test command: `./run_tests.sh`
 Follow the original prompt and the existing repository first. Use the examples below only when they do not already specify the platform or stack.
 Examples:
 - web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
-- Expo mobile: `npx expo start` and the project's single full-test command
-- Electron desktop: `npm run dev` and the project's single full-test command
+- mobile or desktop when Docker runtime is not the direct run path: `./run_app.sh` and `./run_tests.sh`
+## What It Does Well
+- keeps the owner shell strict without carrying a giant monolith prompt
+- loads detailed phase and activity skills only when they are actually needed
+- uses a bounded 2-session model to reduce long-run context drag
+- pushes prompt-fit, security, testing, and engineering-quality concerns earlier into planning and hardening
+- standardizes runtime and broad-test expectations with `docker compose up --build` or `./run_app.sh` plus `./run_tests.sh`
+- preserves strong packaging/report discipline with canonical files in `~/slopmachine/`
+## Installed Assets
+The package installs:
+- owner and developer agents
+- phase and activity skills
+- canonical evaluation and report templates in `~/slopmachine/`
+- workflow bootstrap helper
+- repo rulebook template
+- session export utilities
+Canonical files in `~/slopmachine/`:
+- `backend-evaluation-prompt.md`
+- `frontend-evaluation-prompt.md`
+- `document-completeness.md`
+- `engineering-results.md`
+- `implementation-comparison.md`
+- `quality-document.md`
+- `retrospectives/`
+## Dependencies And Assumptions
+- Node.js 18+ is required for the package CLI itself
+- OpenCode must already be available on the machine
+- git must be available
+- `beads_rust` / `br` is installed or verified by `slopmachine setup`
+Generated projects follow the original prompt and the existing repository first.
+Default runtime/test wrapper expectations:
+- Dockerized web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
+- non-web or non-Docker runtime cases: `./run_app.sh` and `./run_tests.sh`
+`./run_tests.sh` is always the broad test wrapper.
+## Command Summary
+Package CLI:
+- `slopmachine setup`
+- `slopmachine init`
+- `slopmachine init -o`
+Package validation:
+- `npm run check`
+- `npm pack`
+Generated project conventions:
+- `docker compose up --build` or `./run_app.sh`
+- `./run_tests.sh`
 ## Files And Locations

package/RELEASE.md CHANGED

@@ -41,13 +41,13 @@ npm pack
 This should produce a tarball such as:
 ```bash
-theslopmachine-0.4.0.tgz
+theslopmachine-0.4.3.tgz
 ```
 ## Inspect package contents
 ```bash
-tar -tzf theslopmachine-0.4.0.tgz
+tar -tzf theslopmachine-0.4.3.tgz
 ```
 Check that the tarball includes:

package/assets/agents/slopmachine.md CHANGED

@@ -67,7 +67,7 @@ Agent-integrity rule:
 ## Optimization Goal
-The main v2 target is:
+The main target is:
 - less token waste
 - less elapsed time
@@ -109,6 +109,18 @@ State split:
 Do not create another competing workflow-state system.
+## Git Traceability
+Use git to preserve meaningful workflow checkpoints.
+- after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
+- meaningful work includes accepted scaffold completion, accepted major development slices, accepted remediation passes, and other clearly reviewable milestones
+- keep the git flow simple and checkpoint-oriented
+- commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
+- keep commit messages descriptive and easy to reason about later
+- do not push unless explicitly requested
+- do not commit secrets, local-only junk, or accidental noise
 ## Mandatory Operating Order
 Operate in this order:
@@ -149,6 +161,7 @@ Use these exact root phases:
 - `P8 Final Human Decision`
 - `P9 Remediation`
 - `P10 Submission Packaging`
+- `P11 Retrospective`
 Phase rules:
@@ -157,21 +170,21 @@ Phase rules:
 - do not close multiple root phases in one transition block
 - `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
 - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
+- `P11 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 ## Developer Session Model
-Use up to three bounded developer sessions:
+Use up to two bounded developer sessions:
-1. build session: planning, scaffold, development
-2. stabilization session: integrated verification and hardening, only if needed
-3. remediation session: evaluation-response remediation, only if needed
+1. develop session: planning, scaffold, development
+2. bugfix session: integrated verification, hardening, and remediation, only if needed
 Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
 Use `session-rollover` only for planned transitions between those bounded developer sessions.
 Do not launch the developer during `P0` or `P1`.
-When the first build developer session begins in `P2`, start it in this exact order:
+When the first develop developer session begins in `P2`, start it in this exact order:
 1. send `lets plan this <original-prompt>`
 2. wait for the developer's first reply
@@ -199,8 +212,13 @@ Selected-stack rule:
 Every project must end up with:
-- one primary documented launch/run command for the selected stack
-- one primary documented full-test command for the selected stack
+- one primary documented runtime command
+- one primary documented full-test command: `./run_tests.sh`
+Runtime command rule:
+- for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
+- when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
 Default moments:
@@ -208,6 +226,12 @@ Default moments:
 2. development complete -> integrated verification entry
 3. final qualified state before packaging
+For Dockerized web backend/fullstack projects, enforce this cadence:
+- after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
+- after that, do not run Docker again during ordinary development work
+- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
 Between those moments, rely on:
 - local runtime checks
@@ -245,6 +269,7 @@ Core map:
 - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
 - `P9` -> `remediation-guidance`
 - `P10` -> `submission-packaging`, `report-output-discipline`
+- `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
 - state mutations -> `beads-operations`
 - evidence-heavy review -> `owner-evidence-discipline`
 - planned developer-session switch -> `session-rollover`
@@ -327,6 +352,16 @@ When `P10 Submission Packaging` begins:
 - follow its exact artifact, export, cleanup, and output contract
 - do not close packaging until every required final artifact path has been verified
+## Retrospective
+After `P10 Submission Packaging` closes successfully:
+- automatically enter `P11 Retrospective`
+- load `retrospective-analysis`
+- write dated retrospective output under `~/slopmachine/retrospectives/`
+- keep it owner-only and non-blocking by default
+- reopen packaging only if the retrospective finds a real packaged-result defect
 ## Completion Standard
 The workflow is not done until:
@@ -335,6 +370,7 @@ The workflow is not done until:
 - the current root phase closed cleanly
 - the workflow ledger closed cleanly
 - the final package is assembled and verified in its final structure
+- the retrospective phase has either documented improvements or reopened and resolved any real packaging defect it found
 Success means:

package/assets/skills/clarification-gate/SKILL.md CHANGED

@@ -45,6 +45,8 @@ Use this skill only during `P1 Clarification`.
 - never use defaults that drift from the original prompt
 - do not use quick, loose, or simplifying assumptions that shrink what the prompt asked for
 - do not guess through material ambiguity
+- do not expand the clarification artifact just to exhaust every minor edge case when the scope is already clear enough to plan correctly
+- once the core scope is understood, prefer a compact clarification record plus explicit safe defaults over a giant exhaustive rewrite
 ## Required outputs
@@ -52,16 +54,63 @@ Use this skill only during `P1 Clarification`.
 - developer-facing clarification prompt in `../.ai/clarification-prompt.md`
 - explicit list of safe defaults and resolved ambiguities
+## `questions.md` contract
+`../docs/questions.md` is not a general project summary.
+It exists only for prompt items that needed interpretation because they were unclear, incomplete, or materially ambiguous.
+Each entry should answer this structure:
+1. what was unclear from the original prompt
+2. how you interpreted it
+3. what decision or solution you chose for it
+4. why that choice is prompt-faithful and reasonable
+Keep the file narrow and explicit.
+Do not use `questions.md` for:
+- a full restatement of the entire prompt
+- broad planning notes
+- general project requirements that were already clear
+- implementation details that belong in planning or design docs
+Preferred entry shape:
+```md
+## Item N: <short ambiguity title>
+### What was unclear
+<the exact ambiguity or missing detail>
+### Interpretation
+<how it was interpreted>
+### Decision
+<the chosen resolution or safe default>
+### Why this is reasonable
+<brief justification tied to prompt faithfulness>
+```
+If nothing material was unclear, keep `questions.md` minimal rather than inventing content.
 ## Clarification-prompt validation loop
-- compare the original prompt and the prepared clarification prompt using a fresh ephemeral `General` session, never the developer session
-- build one self-contained validation prompt block for that `General` session every time
-- include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md` in that block
+- compare the original prompt and the prepared clarification prompt using one dedicated `General` validation session, never the developer session
+- do not create a new validation session for every retry unless the session became unusable or a fundamental misunderstanding requires a clean restart
+- on the first validation pass, build one self-contained validation prompt block for that `General` session
+- on that first pass, include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md`
 - do not use placeholders such as `same as previous`, `from context`, `see above`, or `latest artifact`
 - ask that `General` session whether the clarification prompt deviates from, weakens, narrows, or violates the original prompt in any way
 - require it to judge whether the clarification prompt is a genuine improvement in execution quality while remaining faithful to the original intent
-- if mismatches or prompt drift are found, revise the questions record and clarification prompt, then build a newly composed full validation block and run the check again
+- if the validator suggests real fixes, patch the existing questions record and clarification prompt directly; do not restart the clarification phase from scratch unless the validator found a fundamental scope misunderstanding
+- treat validator output as a correction list, not as a reason to regenerate giant clarification blocks repeatedly
+- when rerunning validation in the same validator session, send only the improved clarification payload and the concrete fixes you made; do not resend the original prompt block if the session already has that context
+- rerun validation only after applying the concrete fixes that matter
 - keep the validation loop bounded and intentional; prefer one strong pass plus a small number of revision cycles over repeated loose churn
+- once prompt-faithfulness is satisfied and the remaining notes are minor or cosmetic, stop iterating and proceed
 - only treat the clarification prompt as approved for developer use after this validation loop passes and your own review agrees
 - requesting human approval before this validation loop passes is illegal
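
Reviewer sketch: the `questions.md` entry shape introduced in this hunk can be rendered mechanically. The snippet below is a minimal illustration only; `ClarificationItem` and `render_item` are hypothetical names and are not part of the package.

```python
from dataclasses import dataclass


@dataclass
class ClarificationItem:
    """One ambiguity record. Field names are illustrative assumptions."""
    number: int
    title: str
    unclear: str
    interpretation: str
    decision: str
    rationale: str


def render_item(item: ClarificationItem) -> str:
    """Render one entry using the four headings from the preferred entry shape."""
    return "\n".join([
        f"## Item {item.number}: {item.title}",
        "### What was unclear",
        item.unclear,
        "### Interpretation",
        item.interpretation,
        "### Decision",
        item.decision,
        "### Why this is reasonable",
        item.rationale,
    ])
```

If nothing material was unclear, no items would be rendered at all, which matches the instruction to keep the file minimal rather than inventing content.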

package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -59,7 +59,7 @@ Optional startup inputs may include:
 6. wait only for the initial clarification approval before development starts
 7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
 8. initialize the bounded developer-session slots
-9. start the build developer session only after `P2` is ready to begin
+9. start the develop developer session only after `P2` is ready to begin
 10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
 11. wait for the developer's first exchange
 12. send the approved clarification prompt as the second owner message in that same session
@@ -69,7 +69,7 @@ Optional startup inputs may include:
 The first bounded developer session must begin in this exact order:
-1. owner starts the build developer session
+1. owner starts the develop developer session
 2. owner sends: `lets plan this <original-prompt>`
 3. developer responds
 4. owner sends the approved clarification prompt
@@ -102,6 +102,12 @@ Track at least:
 - `awaiting_human`
 - `clarification_approved`
 - `remediation_round`
+- `clarification_validator_session_id`
+- `evaluation_pass`
+- `backend_evaluation_session_id`
+- `frontend_evaluation_session_id`
+- `last_evaluation_session_id`
+- `passed_evaluation_tracks`
 - `developer_sessions`
 - `active_developer_session_index`
@@ -132,11 +138,10 @@ Required project metadata fields in `../metadata.json` when relevant:
 ## Bounded session model
-Track up to three planned developer sessions:
+Track up to two planned developer sessions:
-1. build
-2. stabilization
-3. remediation
+1. develop
+2. bugfix
 Later session slots may remain unused if the workflow never needs them.

package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -15,15 +15,54 @@ Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
 - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
 - if no remediation is needed, move directly to the final human decision
+## Non-negotiable evaluation buckets
+These areas are hard gates and should not be passed with known meaningful failures:
+1. prompt compliance
+2. requirement fulfillment / delivery completeness
+3. security-critical flaws
+If evaluation finds a real issue in one of those buckets, the default outcome is remediation, not leniency.
+Do not wave through:
+- prompt drift or meaningful requirement mismatch
+- missing core flows or partial delivery of prompt-critical functionality
+- real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
+## Leniency buckets
+These areas may pass with minor residual issues when the product is still clearly acceptable overall:
+1. testing cases / test sufficiency
+2. engineering architecture / engineering quality
+3. aesthetics
+Leniency is allowed only when the issue is:
+- minor in impact
+- not hiding a likely blocker in another bucket
+- not undermining overall confidence in the delivered product
+High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
+If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
 ## Triage rules
 - read both reports and merge the findings into one explicit triage set before deciding what happens next
 - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
-- any finding marked `Blocker` or `High` should normally be returned for remediation
+- any finding in the non-negotiable buckets should normally be returned for remediation if it is real
+- findings marked `Blocker` should normally be returned for remediation
+- findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
 - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
 - findings marked `Low` may be passed without remediation
 - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
 - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
+- minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
+- minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
+- if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
 - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
 - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
 - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
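
Reviewer sketch: the new hard-gate and leniency buckets, combined with the severity rules, amount to a small decision procedure. The snippet below is one possible reading, not the package's implementation. The bucket identifiers, the `Finding` type, and the `materially_relevant` flag are assumptions made for illustration, and the "normally" and "usually" qualifiers in the skill text are flattened into fixed outcomes.

```python
from dataclasses import dataclass

# Bucket identifiers are hypothetical labels for the buckets named in the skill.
HARD_GATES = {"prompt_compliance", "requirement_fulfillment", "security"}
LENIENCY = {"testing", "engineering_quality", "aesthetics"}


@dataclass
class Finding:
    bucket: str
    severity: str                     # "Blocker", "High", "Medium", or "Low"
    is_real: bool = True              # owner's direct evidence confirms the finding
    materially_relevant: bool = True  # matters for actual acceptance readiness


def triage(finding: Finding) -> str:
    """Return "remediate" or "pass" for a single evaluator finding."""
    if not finding.is_real:
        return "pass"       # weak or overreaching findings are challenged, not fixed
    if finding.bucket in HARD_GATES:
        return "remediate"  # hard gates: default outcome is remediation
    if finding.severity == "Blocker":
        return "remediate"
    if finding.severity == "High":
        # Deliberate exception: leniency bucket and not materially relevant.
        if finding.bucket in LENIENCY and not finding.materially_relevant:
            return "pass"
        return "remediate"
    if finding.severity == "Medium":
        return "remediate" if finding.materially_relevant else "pass"
    return "pass"           # Low findings may pass without remediation
```

Under this reading, a real security finding is remediated regardless of severity, while a `High` aesthetics finding passes only when direct evidence shows it does not affect acceptance.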

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -46,6 +46,19 @@ These two files are the only evaluation prompt sources for evaluation runs.
 - keep reports file-backed and bring only short summaries into chat
 - rerun only the evaluation track that still needs re-evaluation after remediation
+## Evaluation pass strategy
+- use a maximum of 3 full evaluation passes
+- after each evaluation pass, extract a detailed concrete issue list from the failing report(s)
+- send that list back to the active developer session with a direct instruction like: `fix these issues found in evaluation, verify affected flows dont regress after your fixes`
+- if one evaluation track passes, mark it as passed and do not rerun that track in later passes unless a later fix clearly reopens it
+- do not rerun both backend and frontend evaluation tracks when only one still needs re-evaluation
+- after pass 1 and pass 2, use the detailed issue list from the latest failing report(s) to drive the next remediation pass
+- after pass 3, do not create a new evaluation session for the still-failing track
+- after pass 3, send the final fix list back to the developer, then return to the last evaluation session used for that still-failing track and ask whether the last reported issues are now fixed
+- if they are fixed, have that same evaluation session update the report to reflect the current state cleanly, without mentioning recheck, retest, previous issues, or iterative review history
+- the final report should read like a normal current-state evaluation report, not like a patch log
 ## Remediation loop
 - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
@@ -55,7 +68,8 @@ These two files are the only evaluation prompt sources for evaluation runs.
     - the selected stack's platform-appropriate UI/E2E verification where applicable, with fresh screenshots or equivalent artifacts
 - if remediation materially reopens an owner-run broad milestone boundary, route the project back to that boundary before re-evaluation instead of treating every remediation pass as an automatic broad rerun moment
 - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
-- remember the external process allows a maximum of 3 repair rounds
+- store backend, frontend, and last-used evaluation session ids in metadata so later passes and packaging can safely reuse the correct session when needed
+- remember the evaluation flow allows a maximum of 3 full evaluation passes before the final issue-verification update path must be used
 ## Boundaries
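
Reviewer sketch: the evaluation pass strategy added in this file can be read as a small scheduler over the backend and frontend tracks. The snippet below is a hedged illustration under that reading. The function name, the action strings, and the `reopened_tracks` parameter are hypothetical; the limit of 3 full passes and the reuse of the last evaluation session come from the diff.

```python
MAX_FULL_PASSES = 3
TRACKS = ("backend", "frontend")


def next_evaluation_step(completed_passes, passed_tracks, reopened_tracks=()):
    """Decide what the owner does next for evaluation.

    completed_passes: number of full evaluation passes already run.
    passed_tracks:    tracks already marked as passed.
    reopened_tracks:  passed tracks that a later fix clearly reopened.
    """
    pending = [
        track for track in TRACKS
        if track not in passed_tracks or track in reopened_tracks
    ]
    if not pending:
        return {"action": "done", "tracks": []}
    if completed_passes < MAX_FULL_PASSES:
        # Only the tracks that still need it are re-evaluated.
        return {"action": "full_pass", "tracks": pending}
    # After pass 3: no new session; return to the last session for each track.
    return {"action": "verify_in_last_session", "tracks": pending}
```

The `verify_in_last_session` outcome corresponds to the final issue-verification update path, which is why the diff also stores the backend, frontend, and last-used evaluation session ids in metadata.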

package/assets/skills/planning-guidance/SKILL.md CHANGED

@@ -82,9 +82,15 @@ Selected-stack defaults:
 - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
 - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
 - define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
-- define a project-standard launch command and a project-standard full-test command early, and keep both compatible with the selected stack
-- for web backend/fullstack projects, default those to `docker compose up --build` and `./run_tests.sh` only when the prompt or existing repo does not already dictate another stack-compatible contract
-- for mobile, desktop, CLI, library, or other non-web projects, define the selected stack's appropriate single documented launch command and single documented full-test command instead of forcing Docker conventions
+- define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
+- for Dockerized web backend/fullstack projects, the runtime contract may be `docker compose up --build` directly when the prompt or existing repo does not already dictate another stack-compatible contract
+- when `docker compose up --build` is not the runtime contract, require `./run_app.sh` as the single primary runtime wrapper for the project
+- for mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow instead of assuming host tooling conventions
+- `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
+- `./run_tests.sh` must prepare or install anything required before running the tests when that setup is needed for a clean environment
+- for Dockerized web backend/fullstack projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
+- for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
+- local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
 - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
 - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
 - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
@@ -104,7 +110,7 @@ Selected-stack defaults:
 - for each major module, define how it integrates with existing modules and which shared contracts it must follow consistently
 - define verification plans that include cross-module scenarios and seam checks, not just isolated feature checks
 - surface real unresolved risks honestly
-- keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the v2 verification cadence
+- keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the current verification cadence
 ## Exit target

package/assets/skills/retrospective-analysis/SKILL.md ADDED

@@ -0,0 +1,91 @@
+---
+name: retrospective-analysis
+description: Owner-only final retrospective rules for slopmachine.
+---
+# Retrospective Analysis
+Use this skill only after `P10 Submission Packaging` has materially and formally succeeded.
+## Purpose
+- inspect what happened across the whole workflow run
+- identify what caused churn, waste, late defects, or preventable corrections
+- capture lessons that should improve future runs
+- write package-specific retrospective files under `~/slopmachine/retrospectives/`
+## Phase role
+- this is an automatic owner-only phase
+- it is quiet and non-blocking by default
+- it does not create a new human stop
+- it does not rerun broad verification by default
+- it should not reopen development unless it finds a real defect in the already-packaged result
+## Output location
+Write dated retrospective files under:
+- `~/slopmachine/retrospectives/`
+Preferred filenames:
+- `retrospective-YYYY-MM-DD.md`
+- `improvement-actions-YYYY-MM-DD.md`
+If only one file is needed, the retrospective file is sufficient.
+## Evidence sources
+Prefer existing workflow artifacts first:
+- root metadata
+- questions/clarification record
+- clarification prompt
+- planning artifacts
+- Beads comments and transitions
+- developer-session handoffs
+- review and rejection history
+- verification gate notes
+- evaluation reports
+- remediation records
+- packaging outputs
+Do not reread the entire codebase unless a real inconsistency requires it.
+Do not rerun broad Docker or full-suite verification just for retrospective analysis.
+## Required retrospective sections
+1. outcome summary
+2. what worked well
+3. what caused waste or looping
+4. what was caught too late
+5. findings by phase
+6. findings by instruction plane:
+   - owner shell
+   - developer prompt
+   - skills
+   - `AGENTS.md`
+7. actionable improvements
+## Audit buckets
+Evaluate at least these buckets in hindsight:
+1. prompt-fit
+2. security-critical flaws
+3. test sufficiency
+4. major engineering quality
+5. token/time waste
+For each meaningful finding, prefer:
+- what happened
+- why it happened
+- where the fix belongs
+- how it should change future runs
+## Rule for reopening work
+- if retrospective finds a real packaging or delivery defect, reopen `P10` and fix it
+- if it finds only improvements, document them and close the retrospective phase
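
Reviewer sketch: the preferred retrospective filenames follow a dated pattern under `~/slopmachine/retrospectives/`. The helper below shows how such paths could be derived; `retrospective_paths` is a hypothetical name, and the skill itself does not prescribe any code for this.

```python
from datetime import date
from pathlib import Path

# Default location taken from the skill text; the constant name is illustrative.
DEFAULT_BASE = Path.home() / "slopmachine" / "retrospectives"


def retrospective_paths(day: date, base: Path = DEFAULT_BASE):
    """Return (retrospective file, improvement-actions file) for the given day."""
    stamp = day.isoformat()  # YYYY-MM-DD
    return (
        base / f"retrospective-{stamp}.md",
        base / f"improvement-actions-{stamp}.md",
    )
```

When only one file is needed, the first path alone is sufficient, as the skill states.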

package/assets/skills/scaffold-guidance/SKILL.md CHANGED

@@ -14,14 +14,24 @@ Use this skill during `P3 Scaffold` before prompting the developer.
 - establish the local verification path and the standardized gate path
 - make prompt-critical baseline behavior real where required
 - keep repo-local `README.md` honest from the start
-- make the selected-stack primary launch command and primary full-test command real from the scaffold stage
+- make the selected-stack primary runtime command and the universal `./run_tests.sh` broad test command real from the scaffold stage
+For Dockerized web backend/fullstack projects, scaffold must make these commands real and working before scaffold can pass:
+- `docker compose up --build`
+- `./run_tests.sh`
 ## Scaffold and foundation guidance
 - create the initial project structure intentionally
 - follow the original prompt and existing repository first; only use the package defaults below when they do not already specify the platform or stack
-- create the selected-stack primary full-test command during scaffold; for web backend/fullstack projects this is usually `./run_tests.sh`, while non-web projects should expose their own single documented full-test command
-- create the selected-stack primary launch command during scaffold; for web backend/fullstack projects this is usually `docker compose up --build`, while non-web projects should expose their own single documented launch command
+- create `./run_tests.sh` during scaffold for every project as the single broad test entrypoint
+- for Dockerized web backend/fullstack projects, make `docker compose up --build` real as the primary runtime command during scaffold
+- when `docker compose up --build` is not the runtime contract, create `./run_app.sh` during scaffold as the single primary runtime wrapper
+- make `./run_tests.sh` self-sufficient from a clean environment by preparing or installing anything it needs before executing the tests
+- for Dockerized web backend/fullstack projects, `./run_tests.sh` must execute the broad test path through Docker and should own that Dockerized test flow directly instead of requiring separate manual pre-setup
+- for non-web or non-Docker projects, `./run_tests.sh` must execute the selected stack's platform-equivalent broad test flow while preserving the same single-command interface
+- local non-Docker test commands should still be installed and working for normal development iteration
 - create required testing directories and baseline docs structure
 - put baseline config and logging structure in place
 - install and configure the local test tooling needed for ordinary iteration during scaffold rather than deferring local testing setup to later phases
@@ -42,6 +52,7 @@ Use this skill during `P3 Scaffold` before prompting the developer.
 - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
 - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
 - establish README structure early instead of leaving it until the end
+- ensure `README.md` clearly documents the primary runtime command and the broad `./run_tests.sh` contract for the selected stack
 - prove the scaffold in a clean state before deeper feature work
 - verify clean startup and teardown behavior under the selected stack's runtime contract
 - for Dockerized web projects, verify clean startup and teardown behavior under the chosen project namespace
@@ -66,3 +77,5 @@ Scaffold should make later slices easier, not force them to retrofit missing fun
 - use local and narrow checks while correcting scaffold work
 - reserve one broad owner-run scaffold gate for actual scaffold acceptance
 - do not spend extra broad reruns once the acceptance question is already answered
+- for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the baseline actually works
+- after that scaffold confirmation, do not run Docker again during ordinary development work; the next Docker-based run should be at development completion when integrated behavior is checked
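
Reviewer sketch: the runtime and test contract that scaffold must make real reduces to a two-way choice for the runtime command plus a fixed test entrypoint. The snippet below illustrates that rule only; `scaffold_commands` and its parameter are hypothetical names.

```python
def scaffold_commands(compose_is_runtime_contract: bool) -> dict[str, str]:
    """Return the primary runtime and broad test commands for a project.

    compose_is_runtime_contract: True for Dockerized web backend/fullstack
    projects where `docker compose up --build` is the runtime contract.
    """
    runtime = (
        "docker compose up --build"
        if compose_is_runtime_contract
        else "./run_app.sh"  # single primary runtime wrapper otherwise
    )
    # ./run_tests.sh is the broad test wrapper for every project.
    return {"runtime": runtime, "tests": "./run_tests.sh"}
```

In both cases the test command stays the same, which is the point of the change: only the runtime command varies with the selected stack.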

package/assets/skills/session-rollover/SKILL.md CHANGED

@@ -9,8 +9,7 @@ Use this skill only when intentionally moving from one planned developer session
 ## Typical uses
-- build session -> stabilization session
-- stabilization session -> remediation session
+- develop session -> bugfix session
 ## Rules