npm - theslopmachine - Versions diffs - 0.4.7 → 0.4.8 - Mend

theslopmachine 0.4.7 → 0.4.8

This diff shows the contents of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registry.

Files changed (25)

package/MANUAL.md +13 -12
package/README.md +18 -22
package/RELEASE.md +6 -3
package/assets/agents/developer.md +10 -1
package/assets/agents/slopmachine.md +15 -96
package/assets/skills/developer-session-lifecycle/SKILL.md +4 -13
package/assets/skills/development-guidance/SKILL.md +17 -1
package/assets/skills/final-evaluation-orchestration/SKILL.md +1 -0
package/assets/skills/hardening-gate/SKILL.md +18 -0
package/assets/skills/integrated-verification/SKILL.md +5 -0
package/assets/skills/planning-gate/SKILL.md +24 -2
package/assets/skills/planning-guidance/SKILL.md +38 -9
package/assets/skills/scaffold-guidance/SKILL.md +23 -6
package/assets/skills/submission-packaging/SKILL.md +17 -3
package/assets/skills/verification-gates/SKILL.md +39 -6
package/assets/slopmachine/templates/AGENTS.md +33 -3
package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +124 -0
package/package.json +1 -1
package/src/constants.js +5 -7
package/src/init.js +35 -11
package/assets/skills/session-rollover/SKILL.md +0 -47
package/assets/slopmachine/document-completeness.md +0 -59
package/assets/slopmachine/engineering-results.md +0 -63
package/assets/slopmachine/implementation-comparison.md +0 -43
package/assets/slopmachine/quality-document.md +0 -67

package/MANUAL.md CHANGED

@@ -23,7 +23,7 @@ The installed agent set includes the current `slopmachine` and `developer` agent
 ## Start a project
-Inside the project root, run:
+Inside a new or empty project directory, run:
 ```bash
 slopmachine init
@@ -43,25 +43,26 @@ slopmachine init -o
 - bootstraps beads_rust (`br`)
 - creates `repo/`
 - copies the packaged repo rulebook into `repo/AGENTS.md`
-- creates the initial git checkpoint
+- creates the initial git commit so the workspace starts with a clean tree
 - optionally opens `opencode` in `repo/`
 ## Rough workflow
-1. Clarification
-2. Planning
-3. Scaffold/foundation
-4. Development
-5. Integrated verification
-6. Hardening
-7. Evaluation and triage
-8. Final human decision
-9. Remediation when needed
+1. Intake and setup
+2. Clarification
+3. Planning
+4. Scaffold/foundation
+5. Development
+6. Integrated verification
+7. Hardening
+8. Evaluation and fix verification
+9. Final human decision
 10. Submission packaging
+11. Retrospective
 ## Important notes
 - theslopmachine depends on OpenCode, beads_rust (`br`), git, python3, and Docker being available.
 - The workflow-owner agents use mandatory skills for specific phases; skipping them is considered a workflow failure.
 - `slopmachine` is the lighter current engine: it keeps the owner prompt smaller, uses more specialized skills, and keeps one active developer session at a time while preserving rollover history when new sessions are intentionally started.
-- Submission packaging collects the final docs, accepted evaluation reports, screenshots, cleaned session exports, converted session traces, and cleaned repo into the required final structure.
+- Submission packaging collects the final docs, accepted evaluation reports, cleaned session exports, converted session traces, and the cleaned repo into the required final structure.

package/README.md CHANGED

@@ -7,7 +7,6 @@
 - installs packaged OpenCode agents into `~/.config/opencode/agents/`
 - installs packaged skills into `~/.agents/skills/`
 - installs packaged workflow support files into `~/slopmachine/`
-- installs Claude worker runtime assets under `~/.claude/`
 - bootstraps a new project workspace with `repo/`, `docs/`, `sessions/`, `metadata.json`, `AGENTS.md`, and initialized `br` state
 - configures required OpenCode plugins and MCP entries without overwriting existing `context7` or `exa` configuration
@@ -26,9 +25,11 @@ Build and install the package:
 npm install
 npm run check
 npm pack
-npm install -g ./theslopmachine-0.4.4.tgz
+npm install -g ./theslopmachine-<version>.tgz
 ```
+`package.json` is the package-version source of truth. The packed tarball name and CLI version banner both derive from that version.
 For local package development instead of global install:
 ```bash
@@ -52,7 +53,6 @@ slopmachine setup
 - installs or refreshes packaged agents
 - installs or refreshes packaged skills
 - installs or refreshes packaged workflow files into `~/slopmachine/`
-- installs or refreshes Claude runtime assets under `~/.claude/`
 - updates `~/.config/opencode/opencode.json`
 - prompts for missing MCP API keys when needed
@@ -77,7 +77,7 @@ opencode auth list
 ## Startup
-Create and initialize a new project workspace:
+Create and initialize a new project workspace in a new or empty directory:
 ```bash
 mkdir my-project
@@ -110,6 +110,8 @@ Bootstrapped workspace layout:
 - `metadata.json` for project workflow metadata
 - `repo/AGENTS.md` for the repo-local agent instructions
+`slopmachine init` creates the initial git commit so the workspace starts from a clean tree.
 ## Testing
 Package-level checks:
@@ -142,17 +144,17 @@ Operating model:
 High-level lifecycle:
-1. clarification
-2. planning
-3. scaffold
-4. development
-5. integrated verification
-6. hardening
-7. evaluation and triage
-8. final human decision
-9. remediation when needed
-10. submission packaging
-11. retrospective
+1. `P0 Intake and Setup`
+2. `P1 Clarification`
+3. `P2 Planning`
+4. `P3 Scaffold`
+5. `P4 Development`
+6. `P5 Integrated Verification`
+7. `P6 Hardening`
+8. `P7 Evaluation and Fix Verification`
+9. `P8 Final Human Decision`
+10. `P9 Submission Packaging`
+11. `P10 Retrospective`
 Design constraints:
@@ -177,7 +179,6 @@ Main locations:
 - skills: `~/.agents/skills/`
 - OpenCode config: `~/.config/opencode/opencode.json`
 - packaged workflow files: `~/slopmachine/`
-- Claude runtime assets: `~/.claude/`
 Installed agents:
@@ -188,7 +189,6 @@ Installed skills:
 - `~/.agents/skills/clarification-gate/`
 - `~/.agents/skills/developer-session-lifecycle/`
-- `~/.agents/skills/session-rollover/`
 - `~/.agents/skills/final-evaluation-orchestration/`
 - `~/.agents/skills/beads-operations/`
 - `~/.agents/skills/planning-guidance/`
@@ -199,7 +199,6 @@ Installed skills:
 - `~/.agents/skills/integrated-verification/`
 - `~/.agents/skills/hardening-gate/`
 - `~/.agents/skills/evaluation-triage/`
-- `~/.agents/skills/remediation-guidance/`
 - `~/.agents/skills/submission-packaging/`
 - `~/.agents/skills/retrospective-analysis/`
 - `~/.agents/skills/owner-evidence-discipline/`
@@ -210,14 +209,11 @@ Installed workflow files under `~/slopmachine/`:
 - `backend-evaluation-prompt.md`
 - `frontend-evaluation-prompt.md`
-- `document-completeness.md`
-- `engineering-results.md`
-- `implementation-comparison.md`
-- `quality-document.md`
 - `templates/AGENTS.md`
 - `workflow-init.js`
 - `utils/strip_session_parent.py`
 - `utils/convert_ai_session.py`
+- `utils/cleanup_delivery_artifacts.py`
 OpenCode config entries ensured by `setup`:

package/RELEASE.md CHANGED

@@ -39,16 +39,18 @@ Note:
 npm pack
 ```
-This should produce a tarball such as:
+This should produce a tarball named like:
 ```bash
-theslopmachine-0.4.7.tgz
+theslopmachine-<version>.tgz
 ```
+`<version>` comes from `package.json`, which is the single package-version source of truth.
 ## Inspect package contents
 ```bash
-tar -tzf theslopmachine-0.4.7.tgz
+tar -tzf theslopmachine-<version>.tgz
 ```
 Check that the tarball includes:
@@ -87,6 +89,7 @@ npm publish --dry-run
 ## Versioning
+- `package.json` is the single package-version source of truth for the tarball name and CLI version banner
 - bump `package.json` version before each release
 - keep the CLI command as `slopmachine`
 - keep the npm package name as `theslopmachine`
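
As an illustration of the versioning rule in the hunk above, here is a minimal Python sketch that derives the expected tarball name from `package.json`. The helper names `tarball_name` and `demo_tarball_name` are hypothetical and not part of the package. It assumes an unscoped package name, for which `npm pack` produces `<name>-<version>.tgz`.

```python
import json
import tempfile
from pathlib import Path


def tarball_name(package_json_path: Path) -> str:
    """Return the tarball file name expected from `npm pack` for an unscoped package."""
    manifest = json.loads(package_json_path.read_text(encoding="utf-8"))
    # Assumption: unscoped package name, so the tarball is "<name>-<version>.tgz".
    return f"{manifest['name']}-{manifest['version']}.tgz"


def demo_tarball_name() -> str:
    """Write a throwaway package.json and compute its tarball name."""
    with tempfile.TemporaryDirectory() as tmp:
        manifest_path = Path(tmp) / "package.json"
        manifest_path.write_text(
            json.dumps({"name": "theslopmachine", "version": "0.4.8"}),
            encoding="utf-8",
        )
        return tarball_name(manifest_path)
```

A check like this could run before `tar -tzf` so the inspected file name always matches the version in `package.json` rather than a hardcoded one.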

package/assets/agents/developer.md CHANGED

@@ -49,7 +49,9 @@ Do not narrow scope for convenience.
 - implement real behavior, not placeholders
 - keep user-facing and admin-facing flows complete through their real surfaces
 - verify the changed area locally and realistically before reporting completion
-- update repo-local docs such as `README.md` when behavior or run/test instructions change
+- update repo-local docs such as `README.md` and `./docs/*` when behavior or run/test instructions change
+- keep repo-local docs and code structure statically reviewable; do not rely on runtime success alone to make the project understandable
+- keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
 ## Verification Cadence
@@ -87,6 +89,13 @@ Selected-stack defaults:
 - do not hardcode secrets or leave prototype residue behind
 - when the project has database dependencies, keep database setup in `./init_db.sh` rather than scattered repo logic
 - do not hardcode database connection values or database bootstrap values anywhere in the repo
+- if the project uses mock, stub, fake, or local-data behavior, disclose that scope accurately in the repo-local documentation instead of implying real backend or production behavior
+- if mock or interception behavior is enabled by default, document that clearly
+- disclose feature flags, debug/demo surfaces, and default enabled states clearly in repo-local docs when they exist
+- keep frontend state requirements explicit in code and repo-local docs for prompt-critical flows
+- use a shared logging path and avoid random print-style debugging as the durable implementation pattern
+- use a shared validation/error-handling path when validation materially affects the flow
+- do not hide missing failure handling behind fake-success paths
 ## Skills

package/assets/agents/slopmachine.md CHANGED

@@ -176,36 +176,23 @@ Phase rules:
 Maintain exactly one active developer session at a time.
-Track every developer session in metadata, but create a new one only in these cases:
-1. you explicitly request a new session
-All tracked developer sessions use the `develop-N` naming line.
-There may be multiple `develop` sessions over the life of one project.
-During the first full run from planning through initial packaging, keep all work in the `develop-N` sequence, including integrated verification, hardening, evaluation issue fixing inside `P7`, and packaging follow-through.
-If the project is reopened after packaging because of later reported issues, continue with the existing developer session unless you explicitly request a new one.
-Fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule.
-If you explicitly request a new session while one is active, ask the current developer exactly `give me a summary of all the work that has been done`, then use that handoff to seed the next session.
-Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
-Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request.
+- track developer sessions in metadata using the `develop-N` line
+- keep the same active developer session through planning, development, verification, hardening, evaluation fixes, and packaging follow-through unless you explicitly request a new one
+- if the project is reopened later, recover and continue the active developer session unless you explicitly request a replacement
+- fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule
+- use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery
 Do not launch the developer during `P0` or `P1`.
-When the first develop developer session begins in `P2`, start it in this exact order:
+When the first develop developer session begins in `P2`, use this planning handshake:
-1. send `lets plan this <original-prompt>`
+1. send the original prompt and ask for an initial plan plus major risks or assumptions
 2. wait for the developer's first reply
-3. send the approved clarification prompt
+3. send the approved clarification prompt as the second owner message in that same session
 4. continue with planning from there
-Do not reorder that sequence.
 Do not merge those messages.
+Do not send the clarification prompt first.
 ## Verification Budget
@@ -218,50 +205,7 @@ Owner-side discipline:
 - do not rerun expensive local test or E2E commands just because the developer already ran them
 - when the developer reports the exact verification command and its result clearly, use that evidence unless there is a concrete reason to challenge it
 - rerun expensive verification only when the developer evidence is weak, contradictory, flaky, high-risk, needed for a true broad gate, or needed to answer a new question
-Target budget for the whole workflow:
-- at most 3 broad owner-run verification moments using the selected stack's full verification path
-Selected-stack rule:
-- follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
-- for backend and fullstack web projects, the broad path is usually Docker/runtime plus the full test command
-- for pure frontend web projects, the broad path is the documented production build plus the full test command and browser E2E when applicable
-- for mobile projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI/device verification when applicable
-- for desktop projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI verification when applicable
-Every project must end up with:
-- one primary documented runtime command
-- one primary documented full-test command: `./run_tests.sh`
-Runtime command rule:
-- for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
-- when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
-Default moments:
-1. scaffold acceptance
-2. development complete -> integrated verification entry
-3. final qualified state before packaging
-For Dockerized web backend/fullstack projects, enforce this cadence:
-- after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
-- after that, do not run Docker again during ordinary development work
-- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
-Between those moments, rely on:
-- local runtime checks
-- targeted unit tests
-- targeted integration tests
-- targeted module or route-family reruns
-- the selected stack's local UI or E2E tool when UI is material
-If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.
+- use phase skills and `verification-gates` for stack-specific runtime and broad-gate cadence details
 ## Mandatory Skill Discipline
@@ -272,10 +216,6 @@ Named skills are mandatory, not optional.
 - if the required skill is not loaded, stop immediately and load it before continuing
 - do not prompt the developer first and load the skill later
-## Mandatory Skill Usage
-Load the required skill before the corresponding phase or activity work begins.
 Core map:
 - `P0` -> `developer-session-lifecycle`
@@ -292,7 +232,6 @@ Core map:
 - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
 - state mutations -> `beads-operations`
 - evidence-heavy review -> `owner-evidence-discipline`
-- intentional new developer session -> `session-rollover`
 Do not improvise a phase from memory when a phase skill exists.
@@ -308,21 +247,6 @@ When talking to the developer:
 Do not leak workflow internals such as:
-- Beads
-- phases
-- overlays
-- `.ai/` files
-- approval-state machinery
-- session-slot bookkeeping
-- packaging-stage orchestration details
-Do not sound like workflow software talking to a worker.
-Do not speak as a relay for a third party.
-## Developer Isolation
-The developer must not be told about:
 - Beads workflow mechanics
 - `.ai/` orchestration files
 - approval-state machinery
@@ -330,6 +254,8 @@ The developer must not be told about:
 - packaging-stage orchestration details
 To the developer, this should feel like a normal engineering conversation with a strong technical lead.
+Do not sound like workflow software talking to a worker.
+Do not speak as a relay for a third party.
 ## Operating Discipline
@@ -338,7 +264,7 @@ To the developer, this should feel like a normal engineering conversation with a
 - keep work moving without low-information continuation chatter
 - read only what is needed to answer the current decision
 - keep comments and metadata auditable and specific
-- keep external docs owner-maintained and repo-local README developer-maintained
+- keep external docs owner-maintained as reference copies and repo-local docs developer-maintained for the repo's self-sufficient source of truth
 ## Review Posture
@@ -357,19 +283,14 @@ After each substantive developer reply, do one of four things:
 3. request clarification or justification
 4. require verification before deciding
-## Packaging Explicitness
+## Packaging
 Treat packaging as a first-class delivery contract from the start, not as late cleanup.
 - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
-- `../self-test-run.md`, `../self-test-fixes.md`, `../sessions/`, `../metadata.json`, `../docs/`, and the delivered `repo/` are the mandatory late-stage artifacts
-- do not invent `submission/`, packaging-only report files, screenshots, or other extra artifact structures during ordinary packaging
-When `P9 Submission Packaging` begins:
 - load `submission-packaging` before any packaging action
 - follow its exact artifact, export, cleanup, and output contract
-- do not close packaging until every required final artifact path has been verified
+- do not invent extra artifact structures during ordinary packaging
 ## Retrospective
@@ -377,8 +298,6 @@ After `P9 Submission Packaging` closes successfully:
 - automatically enter `P10 Retrospective`
 - load `retrospective-analysis`
-- write `run_id`-scoped retrospective output under `~/slopmachine/retrospectives/`
-- keep it owner-only and non-blocking by default
 - reopen packaging only if the retrospective finds a real packaged-result defect
 ## Completion Standard

package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -60,22 +60,21 @@ Optional startup inputs may include:
 7. wait only for the initial clarification approval before development starts
 8. initialize developer-session tracking for the run
 9. start the develop developer session only after `P2` is ready to begin
-10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
+10. send the original prompt and ask for an initial plan plus major risks or assumptions as the first owner message in that session
 11. wait for the developer's first exchange
 12. send the approved clarification prompt as the second owner message in that same session
 13. only after that second message, continue with the normal planning conversation
 ## First developer-session handshake
-The first developer session of the run must begin in this exact order:
+The first developer session of the run should begin in this order:
 1. owner starts the develop developer session
-2. owner sends: `lets plan this <original-prompt>`
+2. owner sends the original prompt and asks for an initial plan plus major risks or assumptions
 3. developer responds
 4. owner sends the approved clarification prompt
 5. planning proceeds from there
-Do not skip the initial planning opener.
 Do not send the clarification prompt first.
 Do not merge those two messages into one.
@@ -119,8 +118,6 @@ Each developer session record should include enough to recover and export it lat
 - `created_phase`
 - `session_id`
 - `status`
-- `handoff_in`
-- `handoff_out`
 Required project metadata fields in `../metadata.json` when relevant:
@@ -143,13 +140,7 @@ Required project metadata fields in `../metadata.json` when relevant:
 - record every developer session in `developer_sessions`
 - label every developer session using `develop-N`
 - create a new developer session only when the user explicitly requests a new session
-If the user explicitly requests a new session while one is active:
-1. ask the current developer exactly: `give me a summary of all the work that has been done`
-2. treat that reply as the handoff summary
-3. start the new developer session with that summary as the handoff-in context
-4. assign the next `develop-N` label in sequence
+- when a replacement session is explicitly requested, treat it as a manual restart rather than a formal handoff workflow and assign the next `develop-N` label in sequence
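
The `develop-N` labeling rule above could be sketched as follows. This is a hypothetical helper, not code from the package. It assumes labels are plain strings, that numbering starts at 1, and that "next in sequence" means one past the highest existing number.

```python
import re

# Matches labels of the form "develop-<number>" and captures the number.
LABEL_PATTERN = re.compile(r"^develop-(\d+)$")


def next_develop_label(existing_labels):
    """Return the next `develop-N` label after the highest one already recorded."""
    numbers = []
    for label in existing_labels:
        match = LABEL_PATTERN.match(label)
        if match:
            numbers.append(int(match.group(1)))
    # Assumption: the first session is develop-1 when nothing is recorded yet.
    return f"develop-{max(numbers, default=0) + 1}"
```

Labels that do not follow the `develop-N` form are ignored, so other session kinds recorded alongside developer sessions would not disturb the sequence.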
 ## Initial structure rule

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -17,16 +17,20 @@ Use this skill during `P4 Development` before prompting the developer.
 - define lightweight planning notes for the module before coding
 - define the module purpose, constraints, and edge cases before coding
+- define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
 - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
 - implement real behavior, not partial scattered logic
 - handle failure paths and boundary conditions
 - add or update tests as part of the module work
+- prefer TDD when the behavior is well defined and the module is practical to drive test-first; otherwise define the expected tests before implementation and keep them tied to the module plan
+- keep `./docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
 - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
 - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
 - keep frontend and backend contracts synchronized when the module spans both sides
 - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
 - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
 - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
+- verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
 - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
 - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
 - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
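
One way to read the "validated and confined to allowed roots" bullet in the hunk above is a resolve-then-check helper. The sketch below is illustrative only: `resolve_within` is a hypothetical name, and it assumes Python 3.9 or newer for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_within(root, candidate):
    """Resolve `candidate` under `root` and reject anything that escapes it."""
    root_path = Path(root).resolve()
    # Resolving collapses ".." segments and symlinks before the containment check.
    target = (root_path / candidate).resolve()
    if not target.is_relative_to(root_path):
        raise ValueError(f"path escapes allowed root: {candidate}")
    return target
```

An absolute `candidate` replaces the root when joined, so it fails the same containment check unless it already points inside the allowed root.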
@@ -41,9 +45,15 @@ Use this skill during `P4 Development` before prompting the developer.
 - use the `frontend-design` skill for frontend component or page work
 - use the `frontend-design` skill during web or desktop UI verification when reviewing screenshots and tightening the interface
 - do not hardcode secrets or persist local sensitive values in the repo while implementing
-- explain behavior changes clearly enough that the documentation discipline can be satisfied accurately
+- explain behavior changes clearly enough that the repo-local documentation discipline can be satisfied accurately
+- update repo-local docs such as `README.md` and `./docs/*` when runtime, build/preview, configuration, routes, tests, security boundaries, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
+- do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
+- keep `./docs/reviewer-guide.md` aligned when app entry points, route registration, build/preview commands, configuration surfaces, feature flags, debug/demo surfaces, mock defaults, logging structure, validation structure, or major module boundaries change
+- keep `./docs/security-boundaries.md` aligned when auth, authorization, admin/debug, or isolation logic changes
+- keep `./docs/frontend-flow-matrix.md` aligned when frontend pages, interactions, state transitions, or required UI states change
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
+- do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
 ## Verification model
@@ -54,9 +64,15 @@ Use this skill during `P4 Development` before prompting the developer.
 - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
 - if the local toolchain is missing, try to install or enable it before falling back to the broad gate path
 - for web UI projects, default local UI/E2E verification to Playwright when that stack is in use
+- for frontend-bearing projects, use the component/page-or-route/E2E layers intentionally instead of relying on only one frontend test layer for every kind of proof
 - for mobile projects, default local UI testing to the selected mobile test stack and use a platform-appropriate mobile UI/E2E tool when device-flow proof matters
 - for desktop projects, default local UI verification to Playwright's Electron support or another platform-appropriate desktop UI/E2E tool when window-flow proof matters
 - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
+- for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
+- for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
+- use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
+- use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
+- keep the test surface moving toward at least 90 percent meaningful coverage of the relevant behavior area as slices are completed
 - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
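
The "shared logging path" bullet in the hunk above does not prescribe an implementation. As a rough sketch in Python, a project might route all module loggers through one configured parent. The `app` logger name and the `get_logger` helper are assumptions made for illustration.

```python
import logging

_CONFIGURED = False


def get_logger(name):
    """Return a child of the shared `app` logger, configuring the parent once."""
    global _CONFIGURED
    parent = logging.getLogger("app")
    if not _CONFIGURED:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        parent.addHandler(handler)
        parent.setLevel(logging.INFO)
        _CONFIGURED = True
    # Child loggers propagate to the parent, so format and level stay consistent.
    return parent.getChild(name)
```

Modules would call `get_logger("orders")` instead of printing directly, which keeps output format and level in one place and makes the logging structure easier to trace in static review.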
 ## Quality rules

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -36,6 +36,7 @@ These two files are the only evaluation prompt sources for evaluation runs.
 - read the chosen evaluation prompt file contents yourself before launching evaluation
 - compose one large final prompt block
 - prefix the request with a clear instruction that the reviewer must work in the current project directory and evaluate the delivered project
+- make sure the repo-local docs and code inside the current project directory are sufficient for evaluation; do not assume the evaluator will rely on parent-root docs or sibling workflow artifacts
 - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
 - send that fully composed text block directly to one fresh `General` evaluator session
 - require that session to produce a detailed file-backed report plus an issue summary

package/assets/skills/hardening-gate/SKILL.md CHANGED

@@ -33,11 +33,23 @@ Hardening should treat these as the main review buckets before final evaluation
 - audit security boundaries, validation, ownership, and secret handling
 - prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
 - audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
+- audit whether `./docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, and gaps in a way a static evaluator can follow quickly
+- audit whether the project is actually approaching or achieving at least 90 percent meaningful coverage of the relevant behavior surface rather than relying on a thin happy-path suite
 - audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
 - inspect architecture, coupling, file size, and maintainability risks
 - focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
 - check for bad engineering practices that accumulated during implementation
 - tighten weak tests, weak docs, and weak operational instructions
+- audit static review readiness: entry points, routes, config, README, and test commands should be traceably consistent without depending on runtime tribal knowledge
+- audit that the repo is self-sufficient and does not rely on parent-root docs or sibling workflow artifacts for static reviewability
+- audit repo-local evaluator docs: `./docs/reviewer-guide.md`, `./docs/test-coverage.md`, `./docs/security-boundaries.md`, `./docs/frontend-flow-matrix.md`, and `./docs/api-spec.md` when relevant
+- audit static security-boundary readiness: a fresh reviewer should be able to trace auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation from repository artifacts when applicable
+- if mock, stub, fake, interception, or local-data behavior exists, verify that its scope, default state, and boundaries are disclosed accurately and do not imply undisclosed real integration
+- audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in repo-local docs when they exist
+- audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
+- audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
+- audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
+- verify that missing failure handling is not being hidden behind fake-success behavior
 - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
 - re-check frontend and backend observability, redaction, and operator visibility paths
 - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
@@ -54,7 +66,13 @@ Before `P6` can close, the owner should have a clear answer for each of these:
 - prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
 - security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
 - test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
+- coverage depth: does the current evidence support roughly 90 percent meaningful coverage of the relevant behavior surface, and if not, what remains weak?
 - major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
+- static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
+- security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
+- coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
+- frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
+- repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?
 ## Rules
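The repo-local doc and self-sufficiency audits added in the hunks above can be partly mechanized. The following is a minimal sketch, not part of the package: the function names are hypothetical, and the `../docs/` substring scan is a crude heuristic that can flag legitimate relative links written from inside `./docs/`.

```python
from pathlib import Path

# Evaluator-facing docs named in the checklist above. Which ones are
# "relevant" is a per-project decision, so callers can pass a subset.
EVALUATOR_DOCS = (
    "docs/reviewer-guide.md",
    "docs/test-coverage.md",
    "docs/security-boundaries.md",
    "docs/frontend-flow-matrix.md",
    "docs/api-spec.md",
)


def missing_evaluator_docs(repo_root, relevant=EVALUATOR_DOCS):
    """Return the relevant evaluator docs that are absent or empty (hypothetical helper)."""
    root = Path(repo_root)
    missing = []
    for rel in relevant:
        path = root / rel
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(rel)
    return missing


def parent_root_references(repo_root):
    """Return repo-local Markdown files that mention parent-root ../docs/ paths.

    Heuristic only: a plain substring match, so every hit needs manual review.
    """
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*.md")):
        if "../docs/" in path.read_text(encoding="utf-8", errors="ignore"):
            hits.append(path.relative_to(root).as_posix())
    return hits
```

An empty result from both helpers only shows that the files exist and do not obviously point outside the repo; it says nothing about whether their content is accurate.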

package/assets/skills/integrated-verification/SKILL.md CHANGED

@@ -33,12 +33,17 @@ Once a failure class is known:
 - for mobile and desktop work, run the selected stack's platform-appropriate UI/E2E coverage for major flows and review screenshots or equivalent artifacts for real UI behavior and regressions
 - end-to-end coverage must use the real intended user-facing or admin-facing surfaces for the flow; if the flow cannot be exercised that way, treat the missing surface as incomplete work
 - verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
+- verify 401, 403, 404, conflict or duplicate-submission, object-authorization, tenant or user-isolation, and sensitive-log-exposure paths where those risks exist
 - verify security-sensitive behavior where applicable
 - verify multi-tenant and cross-user isolation where applicable, including negative checks rather than single-actor happy paths only
 - verify file/path safety for file-bearing flows where applicable, including traversal-style negative cases
 - verify secrets are not committed, hardcoded, or leaking through logs/config/docs
 - verify error surfaces and auth-related failures are sanitized for users and operators appropriately
 - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
+- tighten `./docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
+- when security-bearing behavior changes, tighten `./docs/security-boundaries.md` so enforcement points and mapped tests stay accurate
+- when frontend-bearing behavior changes, tighten `./docs/frontend-flow-matrix.md` so key pages, interactions, and required UI states stay accurate
+- when routes, entry points, build/preview/config, feature flags, debug/demo surfaces, or mock defaults change, tighten `./docs/reviewer-guide.md` so static traceability stays current
 - challenge integration seams and adjacent-module behavior, not just the changed module's local path
 ## Rules
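The negative-path bullet added above (401, 403, 404, object authorization, tenant or user isolation) can be expressed as a small case table rather than scattered single-actor checks. This is an illustrative sketch only: the `Doc` model, the users, and `get_doc_status` are hypothetical stand-ins for a real handler, and answering cross-tenant reads with 404 instead of 403 is an assumed design choice, not something the skill prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: int
    owner: str
    tenant: str


# Hypothetical in-memory fixtures standing in for a real data store.
USERS = {"alice": "acme", "carol": "acme", "bob": "globex"}  # user -> tenant
DOCS = {1: Doc(1, "alice", "acme")}


def get_doc_status(user, doc_id):
    """Return the status a hypothetical GET /docs/<id> handler would produce."""
    if user not in USERS:
        return 401  # unauthenticated or unknown caller
    doc = DOCS.get(doc_id)
    if doc is None or doc.tenant != USERS[user]:
        return 404  # missing, or hidden across tenants (assumed policy)
    if doc.owner != user:
        return 403  # same tenant, but object-level authorization fails
    return 200


# (caller, doc id, expected status): one row per risk named in the checklist.
CASES = [
    (None, 1, 401),     # no session
    ("carol", 1, 403),  # same tenant, not the owner
    ("alice", 99, 404), # object does not exist
    ("bob", 1, 404),    # cross-tenant isolation
    ("alice", 1, 200),  # happy path, kept as a control
]


def failing_cases():
    """Return the cases whose actual status differs from the expected one."""
    return [
        (user, doc_id, expected, get_doc_status(user, doc_id))
        for user, doc_id, expected in CASES
        if get_doc_status(user, doc_id) != expected
    ]
```

In a real project the same table would drive requests against the actual user-facing or admin-facing surface, so that each row doubles as an entry in `./docs/test-coverage.md`.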

package/assets/skills/planning-gate/SKILL.md CHANGED

@@ -42,9 +42,12 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
 ## Cross-document discipline
-- require owner-maintained planning docs under parent-root `../docs/` when relevant, especially `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md`
-- require cross-document consistency so design, API/spec, and test-planning artifacts do not drift on lifecycle/state models, permissions, flow coverage, or operational behavior
+- require owner-maintained planning docs under parent-root `../docs/` when relevant, but do not let the repo depend on them for normal use or evaluation readiness
+- require repo-local evaluator-facing docs under `./docs/` when relevant, especially `./docs/reviewer-guide.md`, `./docs/test-coverage.md`, `./docs/security-boundaries.md`, `./docs/frontend-flow-matrix.md`, and `./docs/api-spec.md`
+- require cross-document consistency so external references, repo-local docs, API/spec notes, and test-planning artifacts do not drift on lifecycle/state models, permissions, flow coverage, or operational behavior
 - if planning docs disagree on core system behavior, planning is still in progress
+- when `./docs/test-coverage.md` is relevant, require it to be structured as explicit requirement or risk mappings rather than generic narrative
+- require the accepted plan to cover system overview, architecture reasoning, major modules or chunks, domain model, data model where relevant, interface contracts, failure paths, state transitions, logging strategy, testing strategy, README implications, and Docker execution assumptions when those dimensions apply
 ## Cross-cutting planning requirements
@@ -57,9 +60,17 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
   - auth/session edge cases such as expiry, refresh, or clock skew tolerance
 - when the prompt says behavior is configurable, require the real configuration surface, permissions, operator flow, and backend support to be planned explicitly
 - when a feature must be admin-manageable or operator-manageable, require a real usable UI surface for that management flow, not just API endpoints or data-model notes
+- for web projects, require Docker-first runtime planning unless the prompt or existing repository clearly dictates otherwise
 - when the project has database dependencies, require a dedicated `./init_db.sh` plan as the only project-standard database initialization path
 - when the project has database dependencies, require runtime and test entrypoints to rely on `./init_db.sh` for database preparation rather than scattered manual setup
 - do not accept planning that leaves database connection values or database bootstrap values hardcoded in repo logic instead of driven through `./init_db.sh`
+- when the project uses mock, stub, fake, interception, or local-data behavior, require the plan to state how that scope will be disclosed accurately in repo-local docs and visible adaptor/config boundaries
+- do not accept planning that lets a mock-only or local-data-only project look like undisclosed real integration delivery
+- do not accept planning that hides missing failure handling behind fake-success branches
+- when the project has meaningful auth or access control, require a static security-boundary inventory in planning artifacts covering auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug surfaces, and tenant or user isolation rules when applicable
+- require repo-local disclosure planning for feature flags, debug or demo surfaces, default enabled states, and mock or interception defaults whenever they exist
+- require traceability planning for build, preview, configuration, app entry points, route registration, module boundaries, and test entry points through repo-local docs rather than parent-root references
+- require logging and validation contracts to be planned concretely enough for repo-local static review
 ## Architecture-depth requirements
@@ -75,6 +86,13 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
 - major user-facing flows are mapped to backend support and verification targets
 - security-critical areas are planned early enough that they will not be left to accidental late cleanup
 - test sufficiency has been considered at the level of core happy path, major failure paths, security-critical paths, and obvious high-risk boundaries
+- the plan explicitly defines module-level responsibilities, flows, boundaries, and completion tests before implementation
+- TDD is planned where the behavior is well defined and practical, and where TDD is not practical the expected tests are still defined before implementation
+- test sufficiency is mapped explicitly enough that a fresh reviewer can trace each requirement or risk point to test evidence and remaining gaps without guesswork
+- backend or fullstack plans explicitly cover 401, 403, 404, conflict or duplicate submission when relevant, object-level authorization, tenant or user isolation, and sensitive-log exposure in the coverage plan
+- frontend-bearing plans explicitly cover the required state model for major flows, including loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
+- frontend-bearing plans explicitly include component, page or route integration, and E2E coverage where applicable; where UI state complexity is meaningful, they also include state-focused tests rather than relying only on E2E or runtime confidence
+- the coverage plan is strong enough to reach at least 90 percent meaningful coverage of the relevant behavior surface
 - major engineering quality has been addressed through maintainable boundaries, clear decomposition, and shared contracts
 - frontend route, page, component, and state boundaries are planned when the UI is material
 - configurable behaviors are concretely planned where the prompt requires configurability
@@ -82,6 +100,10 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
 - prompt-critical operational obligations and operator visibility paths are concretely planned
 - prompt-literal storage, partitioning, indexing, retention, or performance requirements are explicitly represented
 - database-bearing projects explicitly plan `./init_db.sh`, its runtime/test integration points, and how it grows with real schema/bootstrap needs
+- static review readiness is explicitly planned, including how a fresh reviewer can trace entry points, routes, config, test commands, and any mock or local-data boundaries from repository artifacts alone
+- static security-boundary readiness is explicitly planned in docs or code structure where applicable
+- repo-local docs are sufficient for a new reviewer without requiring parent-root docs for startup, build/preview, config, feature flags, security boundaries, or coverage mapping
+- web projects default to Docker-first runtime planning unless a prompt-faithful exception is clearly justified
 - relevant cross-cutting system contracts are explicitly defined rather than left to per-module invention
 - each major module has a clear integration contract with existing modules and shared patterns
 - verification plans include cross-module seam checks, not just isolated feature tests
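The requirement that `./docs/test-coverage.md` hold explicit requirement or risk mappings rather than generic narrative suggests a shape that can be checked statically. The sketch below assumes a four-column Markdown table (requirement or risk, mapped tests, status, remaining gaps); that layout, the sample rows, and both function names are hypothetical, since the package does not fix a format here.

```python
FIELDS = ("point", "tests", "status", "gaps")

# Hypothetical sample of what a mapping table might look like.
SAMPLE = """
| Requirement or risk | Mapped tests | Status | Remaining gaps |
| --- | --- | --- | --- |
| 401 on missing session | tests/test_auth.py::test_no_session | covered | |
| Tenant isolation on exports | | partial | no cross-tenant export test yet |
| Duplicate submission | | | |
"""


def parse_coverage_rows(markdown):
    """Parse four-column Markdown table rows into dicts, skipping header and separator."""
    rows = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) != len(FIELDS):
            continue
        first = cells[0]
        if first.lower().startswith("requirement") or set(first) <= set("-: "):
            continue  # header row, separator row, or row with no named point
        rows.append(dict(zip(FIELDS, cells)))
    return rows


def untraceable_points(rows):
    """Return points with neither a mapped test nor a recorded gap."""
    return [row["point"] for row in rows if not row["tests"] and not row["gaps"]]
```

With the sample above, `untraceable_points(parse_coverage_rows(SAMPLE))` flags only the duplicate-submission row: the tenant-isolation row has no test yet, but its gap is written down, which is what lets a fresh reviewer avoid inventing the matrix themselves.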