npm - theslopmachine - Versions diffs - 1.0.2 → 1.0.3 - Mend

theslopmachine 1.0.2 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -30,6 +30,8 @@ Use this skill before prompting the developer for the main implementation run.
 - before implementing lane-owned product code in the main checkout, either launch the planned helper lane or record the concrete blocker and revise sequencing; convenience serialization is a process failure
 - each module must verify every file it created or changed before reporting completion: the files must be real, relevant, integrated with their imports/routes/config/tests, free of placeholder/demo-only completion, and aligned with the module packet
 - each module must run all tests assigned to its owned module/files before reporting completion, plus the strongest relevant local checks for those files; if any assigned test cannot run, the module is incomplete unless it reports a concrete blocker for main-lane decision
+- use the plan-row execution ledger as the P3 scoreboard: update each actionable plan row as work lands, and do not report clean development completion while any row is still `planned`, `in progress`, delegated without a receiving module, or unverified
+- use safe parallelism to reduce elapsed time without reducing proof: independent module packets, test-coverage work, documentation reconciliation, and verification passes may run in helper worktrees when the plan marks them safe, but every helper result must be integrated, reread, and verified before its ledger rows can close
 - if a planned lane cannot be launched, record the exact skipped lane, blocker, and revised sequencing before falling back to serial work
 - use the rest of development to make the repo coherent enough for the owner-run local-harness gate in `P5` and the later owner-run Docker/runtime plus dockerized `./run_tests.sh` confirmation in `P9`
 - when the owner provides a bounded correction or final release-readiness checklist, treat it as a hard acceptance contract and respond against it explicitly
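The plan-row execution ledger rule in this hunk works as a small completion gate. The sketch below is illustrative only and is not part of the package. The `LedgerRow` shape and its field names are hypothetical, because the skill does not define a ledger file format. It also assumes that rows marked `not applicable` or `risk-accepted` do not need a separate verification flag.

```python
from dataclasses import dataclass
from typing import List, Optional

OPEN_STATUSES = {"planned", "in progress"}


@dataclass
class LedgerRow:
    """Hypothetical ledger row; the skill does not prescribe a schema."""
    row_id: str
    status: str  # e.g. "planned", "in progress", "complete", "not applicable", "risk-accepted"
    verified: bool = False
    delegated_to: Optional[str] = None
    receiving_module: Optional[str] = None


def blocking_rows(rows: List[LedgerRow]) -> List[str]:
    """Return ids of rows that block a clean P3 development-complete claim."""
    blockers = []
    for row in rows:
        if row.status in OPEN_STATUSES:
            blockers.append(row.row_id)  # work has not landed yet
        elif row.delegated_to and not row.receiving_module:
            blockers.append(row.row_id)  # delegated without a receiving module
        elif row.status == "complete" and not row.verified:
            blockers.append(row.row_id)  # landed but unverified
    return blockers
```

If the result is non-empty, the completion report should list those rows as open rather than claim clean completion.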
@@ -43,6 +45,14 @@ Use this skill before prompting the developer for the main implementation run.
 - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
 - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
 - when working inside a `plan.md` workstream, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims that workstream could affect before reporting readiness
+- implement vertically, not breadth-first: build one complete user/operator flow end-to-end before starting the next
+- for every form, implement template + route + handler + service + persistence + response + test together before moving to the next form
+- for every page link, register and render the target page and prove it works for the intended role before claiming the source page complete
+- for every background job, wire it from the entrypoint and verify it is reachable from startup before claiming it complete
+- for every security control, enforce it at the correct layer (service, middleware, DB, template, runtime) before claiming it complete
+- a feature is complete only when the intended actor can perform the task end-to-end through the real app path, or it is explicitly marked incomplete with a named residual risk
+- do not call a module complete because files, routes, templates, or tests exist
+- do not report completion counts (e.g., "12 modules done", "76 endpoints implemented", "104 tests passing") as completion evidence
 - implement real behavior, not partial scattered logic
 - do not count route registration, page shells, form shells, CRUD shells, placeholder handlers, or static demos as completion when the planned behavior is richer than that
 - do not count generated folders, package manifests, build wrappers, or smoke-only checks as feature completion when the prompt requires role workflows, real persistence, runtime policy behavior, or FE↔BE task closure
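The per-form rule in this hunk names seven layers that must land together. The sketch below covers only the checklist side of that rule. It assumes a hypothetical mapping from each layer to the file or test that implements it. It can detect a missing layer, but it cannot prove the end-to-end actor path that the same hunk also requires.

```python
from typing import Dict, List, Optional

# Layer names taken from the per-form rule; evidence values are hypothetical paths.
FORM_LAYERS = ("template", "route", "handler", "service", "persistence", "response", "test")


def missing_form_layers(evidence: Dict[str, Optional[str]]) -> List[str]:
    """Return layers with no recorded evidence; an empty list means no layer is missing."""
    return [layer for layer in FORM_LAYERS if not evidence.get(layer)]


if __name__ == "__main__":
    signup_form = {
        "template": "templates/signup.html",
        "route": "app/routes.py",
        "handler": "app/handlers/signup.py",
        "service": "app/services/accounts.py",
        "response": "app/handlers/signup.py",
    }
    print(missing_form_layers(signup_form))  # ['persistence', 'test']
```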
@@ -80,6 +90,9 @@ Use this skill before prompting the developer for the main implementation run.
 - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
 - when the project has database dependencies, keep `./init_db.sh` aligned with the real schema, migrations, bootstrap data, and dependency setup as implementation evolves
 - do not leave `./init_db.sh` as a scaffold placeholder once real database requirements are known
+- when the app needs accounts or sample records to be useful quickly, provide deterministic idempotent seeded data through the normal bootstrap/database/runtime path and document the exact values in `README.md`
+- if no seeded data is needed, make the README say exactly `No seeded data required; the app is useful from an empty state.` and make sure that claim is true
+- seeded data may be local demo/test fixture data, but it must not replace real persistence, validation, authorization, side effects, or task completion with static fake-success paths
 - do not hardcode database connection values or database bootstrap values anywhere in the repo; database setup must stay driven by `./init_db.sh`
 - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
 - do not treat a module as complete just because it renders, routes, stores data, or returns 200s if the business rules, failure handling, or operator-facing closure expected by `plan.md` are still missing
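The seeded-data rules in this hunk ask for data that is deterministic and idempotent. One simple way to get both is to insert fixed rows with conflict-skipping. This sketch assumes SQLite through Python's standard `sqlite3` module and a hypothetical `users` table with placeholder emails. It takes an already-open connection instead of hardcoding connection values, so a real project would call it from the bootstrap path that `./init_db.sh` drives.

```python
import sqlite3

# Hypothetical fixed demo rows; a real project documents its exact values in README.md.
SEED_USERS = [
    ("admin@example.test", "admin"),
    ("viewer@example.test", "viewer"),
]


def seed(conn: sqlite3.Connection) -> None:
    """Create the demo table if needed and insert fixed rows; safe to run repeatedly."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY, role TEXT NOT NULL)"
    )
    # INSERT OR IGNORE skips rows whose primary key already exists, so a second
    # bootstrap run leaves the table unchanged instead of failing or duplicating.
    conn.executemany("INSERT OR IGNORE INTO users (email, role) VALUES (?, ?)", SEED_USERS)
    conn.commit()
```

Running `seed` twice against the same database still leaves exactly two rows. The sketch deliberately stores no credentials. Real seeded accounts would go through the app's normal password-hashing path.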
@@ -93,6 +106,7 @@ Use this skill before prompting the developer for the main implementation run.
 - explain behavior changes clearly enough that the README and owner-maintained external documentation can be kept accurate
 - update `README.md` when runtime, build/preview, configuration, routes, tests, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
 - keep `README.md` aligned with the strict audit contract as the implementation matures: project type near the top, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
+- keep `README.md` aligned with the quick-start seeded data contract: seeded accounts/sample records/IDs/URLs and main-flow steps, or the exact empty-state statement
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance for the strict README audit
 - for Android, iOS, and desktop work, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections the strict README audit expects
 - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
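The README contract lines in this hunk mix exact strings with looser structural expectations. The rough lint below checks only the exact-string parts and is not the strict README audit itself. The `## Seeded data` heading and the `credentials` keyword match are hypothetical conventions chosen for illustration.

```python
import re
from typing import List

EMPTY_STATE = "No seeded data required; the app is useful from an empty state."
NO_AUTH = "No authentication required"
DOCKER_PROJECT_TYPES = {"backend", "fullstack", "web"}


def readme_problems(text: str, project_type: str) -> List[str]:
    """Return human-readable problems; an empty list means the exact-string checks pass."""
    problems = []
    if project_type in DOCKER_PROJECT_TYPES:
        if "docker compose up --build" not in text:
            problems.append("missing canonical `docker compose up --build`")
        # The legacy hyphenated form is a separate string, not a substring of the one above.
        if "docker-compose up" not in text:
            problems.append("missing legacy compatibility string `docker-compose up`")
    has_seed_section = re.search(r"(?im)^#+\s*seeded data", text) is not None
    if EMPTY_STATE not in text and not has_seed_section:
        problems.append("missing seeded-data section or exact empty-state statement")
    if NO_AUTH not in text and re.search(r"(?i)credentials", text) is None:
        problems.append("missing demo credentials or exact `No authentication required`")
    return problems
```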
@@ -102,12 +116,15 @@ Use this skill before prompting the developer for the main implementation run.
 - before reporting development complete, do not leave obvious repo-coherence, local-harness, startup, or Docker wiring issues for `P5` or `P9`; `P5` should only need a rough correctness pass over the prepared local harness before evaluation
 - before reporting development complete, run one deliberate main-lane pre-`P5` reread against the original prompt plus accepted requirements-and-clarification package, accepted `plan.md`, `../docs/design.md`, accepted `../docs/api-spec.md` when applicable, `README.md`, and the integrated repo state so the owner is not first discovering obvious contract drift in `P5`
 - before reporting development complete, the main lane must perform module-by-module fan-in verification: reread every planned module row, inspect the files delivered for that module, confirm the files are real and integrated, run the module's assigned tests after merge, and record the result in the development-exit module verification matrix in `plan.md`
-- before reporting development complete, run the full non-Docker local test suite available for development in addition to module-targeted tests; Docker and dockerized `./run_tests.sh` remain deferred to `P9`
+- before reporting development complete, run the full non-Docker local test suite available for development in addition to module-targeted tests; for web/fullstack/frontend-bearing projects, also run the planned local E2E or platform-equivalent checks when the accepted plan requires them; Docker and dockerized `./run_tests.sh` remain deferred to `P9`
+- before reporting development complete, close the plan-row execution ledger and coverage closure ledger: every actionable row must be complete/not-applicable/risk-accepted, API true no-mock HTTP coverage must be 100% for documented prompt-relevant endpoints unless per-endpoint exceptions are recorded, unit-testable product-code coverage must be at least 90% where measurable, and planned E2E/platform-critical flow coverage must be at least 90% closed or explicitly risk-accepted
 - before reporting development complete, fill or update the planned-but-missing proof ledger for core semantic path, prompt-critical rules, role surface matrix, runtime lifecycle behavior, security fail-closed expectations, README command honesty, and behavioral coverage proof
 - before reporting development complete, prove the core semantic path with the exact input/setup, user/API path, expected state/artifact, and failure behavior named in `plan.md`, or report the exact residual risk rather than claiming readiness
 - before reporting development complete, for lifecycle-sensitive behavior, include entrypoint-level proof that the scheduler/worker/timed/export/import/polling/startup/cleanup path is wired and mutates state or artifacts as planned
 - before reporting development complete, close the common `P5` failure classes inside development rather than leaving them for owner rediscovery: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup or wrapper dishonesty, and partial user-facing or admin-facing flow closure
 - before reporting development complete, self-check the integrated repo against the release-readiness requirements already absorbed into `plan.md` `Delivery Review Requirements`; do not leave prompt-fit, static-reviewability, logging/validation, security-boundary, or coverage-structure defects unresolved
+- before reporting development complete, self-check static architecture credibility: README/docs/scripts/routes/config/examples/manifests/env examples agree, pages/routes/app shell are connected, state/data flow is traceable, service/adaptor/mock/storage boundaries are clear, redundant/unnecessary files are removed or justified, and core logic is not excessively piled into one file
+- for pure frontend `web` projects with no backend service, local/mock/sample data is acceptable only when README/UI boundaries are honest; do not imply real backend integration, and do not use fake-success paths to hide missing validation, failure, or state handling
 - before reporting development complete, self-check the no-orphan requirement ledger and every module requirement closure checklist; no accepted requirement, clarification, API route, actor path, data object, security boundary, report/export/notification, or documentation obligation may remain unchecked, vaguely delegated, or proven only by broad smoke coverage
 - before reporting development complete, verify that assertion-level module proof rows have corresponding implementation and test evidence; if a row says `Given/When/Then`, the delivered proof must exercise that condition and observable outcome, not only route registration, rendering, or a mocked success response
 - before reporting development complete, when backend/fullstack APIs exist, make sure endpoint inventory, `METHOD + PATH` mapping, and true no-mock HTTP coverage expectations in `../docs/test-coverage.md` and the repo are genuinely aligned rather than only implied
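The coverage closure rule in this hunk combines three floors. The sketch below shows the arithmetic using a hypothetical `CoverageEvidence` record. Two assumptions apply. First, the E2E floor is read as "at least 90% of planned rows closed, unless the shortfall is explicitly risk-accepted", which is one interpretation of the wording. Second, a unit coverage of `None` stands for "not measurable", the case where the skill expects a substitute ledger result instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CoverageEvidence:
    """Hypothetical evidence record; endpoints are "METHOD /path" strings."""
    documented_endpoints: Set[str]
    no_mock_http_covered: Set[str]
    endpoint_exceptions: Set[str] = field(default_factory=set)  # recorded per-endpoint exceptions
    unit_coverage_pct: Optional[float] = None  # None when not measurable
    e2e_rows_total: int = 0
    e2e_rows_closed: int = 0
    e2e_shortfall_risk_accepted: bool = False


def coverage_gaps(ev: CoverageEvidence) -> List[str]:
    """Return the floors that are not met; an empty list means coverage can close."""
    gaps = []
    uncovered = ev.documented_endpoints - ev.no_mock_http_covered - ev.endpoint_exceptions
    if uncovered:
        gaps.append("no true no-mock HTTP test: " + ", ".join(sorted(uncovered)))
    if ev.unit_coverage_pct is not None and ev.unit_coverage_pct < 90.0:
        gaps.append(f"unit coverage {ev.unit_coverage_pct:.1f}% is below the 90% floor")
    # Integer comparison avoids float rounding: closed/total < 0.9  <=>  closed*10 < total*9.
    if ev.e2e_rows_closed * 10 < ev.e2e_rows_total * 9 and not ev.e2e_shortfall_risk_accepted:
        gaps.append(f"E2E rows closed {ev.e2e_rows_closed}/{ev.e2e_rows_total} is below 90%")
    return gaps
```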
@@ -141,7 +158,7 @@ Use this skill before prompting the developer for the main implementation run.
 - even though Docker execution and dockerized `./run_tests.sh` are deferred until the owner-run confirmation in `P9`, build that Docker/runtime path as if it will be exercised by a cold reviewer on the first try: no hidden setup, no manual export steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
 - do not add runtime/test scripts, Compose services, or Docker entrypoints that assume hidden global setup for the final delivered path; keep both the local harness and the Docker/runtime path explicit and repo-controlled before the current `plan.md` workstream is considered complete
 - do not run Docker or dockerized `./run_tests.sh` during ordinary implementation work; use targeted local tests during iteration and run the prepared local harness before material readiness claims, while `P7` still remains non-Docker
-- for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary implementation work
+- for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests during ordinary module iteration; when all modules are complete and the plan includes Playwright/browser E2E or another local platform-equivalent check, run that planned local E2E/platform check before reporting P3 development complete
 - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
 - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary implementation work rather than broad checkpoint commands
 - when the current workstream materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
@@ -150,17 +167,19 @@ Use this skill before prompting the developer for the main implementation run.
 - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
 - when backend logging matters, keep request or route outcomes, exceptions, and background failure logging on the shared structured logging path with redaction intact
 - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
-- keep the test surface moving toward the planned confident roughly `90%` overall coverage goal with real tests where they matter, and do not defer obvious coverage debt just because evaluation is later
-- for backend or fullstack APIs, keep the work moving toward the `plan.md` target of 100 percent true no-mock HTTP coverage for the resolved prompt-relevant `METHOD + PATH` surfaces rather than leaving endpoint coverage as optional follow-up
+- keep the test surface moving toward the plan's hard coverage floors with real tests where they matter; do not defer obvious coverage debt just because evaluation is later
+- for backend or fullstack APIs, treat the `plan.md` target of 100 percent true no-mock HTTP coverage for resolved prompt-relevant `METHOD + PATH` surfaces as blocking for clean development completion unless narrow endpoint-level exceptions are recorded with compensating proof
 - for backend or fullstack APIs, keep `../docs/test-coverage.md` moving toward an endpoint inventory plus API test mapping table, not just a generic risk matrix
 - in each development follow-up or completion reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
 - when the owner names specific expected outcomes for the current workstream or gate, tie the reported verification and changed files back to those expected outcomes explicitly
-- before reporting overall development complete, run the prepared local test harness and report the exact command plus concrete result; this should normally be the current stack's real local suite rather than an invented placeholder wrapper
+- before reporting overall development complete, run the prepared local test harness and report the exact command plus concrete result; this should normally be the current stack's real local suite rather than an invented placeholder wrapper; also run the planned local E2E/platform-equivalent command when the accepted plan includes one, and report exact command plus result
 - before reporting overall development complete, report the module-by-module main-lane verification results, including files reviewed, tests run, FE↔BE/API wiring status, and any integration defects fixed during fan-in
 - before reporting development complete, compare planned optional helper branches to actually used helper branches; report launched transcript/session references, branch/worktree paths, commits, verification, skipped optional branches, and the reason for any skipped helper branch
 - before reporting development complete, require the main lane to consume every module completion packet: inspect ownership, reject incomplete packets, wire shared surfaces, rerun targeted module tests, and record integrated verification evidence before moving to the next module
 - before reporting development complete, use the P3 Development Completion Report shape: module packet closure, optional helper branches used, shared integration work, full-suite/unit/integration/E2E command results, cross-module integration proof, and remaining risks
+- before reporting development complete, include Plan Section Closure Evidence: cite each major accepted `plan.md` section or matrix row that is claimed complete, the concrete repo evidence, the test or verification result, and any residual risk or blocker
 - before reporting development complete, include a no-orphan closure summary in the P3 Development Completion Report: total requirement rows, closed rows, delegated rows with receiving module, risk-accepted rows, and any rows not closed; any non-zero unclosed row count must block a clean completion claim
+- before reporting development complete, include coverage closure in the P3 Development Completion Report: API endpoint count and true no-mock HTTP count, unit coverage percentage or substitute ledger result, E2E/platform-critical flow row count and closed count, exceptions, and exact commands/results
 - in the development-complete reply, explicitly report the core semantic path proof, prompt-critical rule proof status, role surface proof status if applicable, lifecycle proof status if applicable, and any accepted or unresolved residual risks
 - in the development-complete reply for fullstack or backend-backed frontend projects, explicitly report FE↔BE integration proof status, including any frontend surface not backed by real backend behavior and any backend feature not exposed through required frontend UI
 - when optional helper branches were part of the planned work, report which helpers actually launched, which were skipped, and the exact reason or revised sequencing for any skipped helper
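Several lines in this hunk require reporting "the exact command plus concrete result". That is easiest when each command is captured as it runs. The minimal sketch below uses Python's `subprocess` module. The `CommandResult` record and the 20-line output tail are assumptions. The actual commands (the stack's local suite, any planned E2E command) come from the project itself.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class CommandResult:
    command: str      # exact command line, shell-quoted for copy-paste
    exit_code: int
    output_tail: str  # last lines of stdout followed by stderr


def run_and_record(argv: List[str], tail_lines: int = 20) -> CommandResult:
    """Run one verification command and keep the exact command plus its concrete result."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    # stdout and stderr are concatenated here, not interleaved in arrival order.
    lines = (proc.stdout + proc.stderr).splitlines()
    return CommandResult(
        command=shlex.join(argv),
        exit_code=proc.returncode,
        output_tail="\n".join(lines[-tail_lines:]),
    )


if __name__ == "__main__":
    result = run_and_record([sys.executable, "-c", "print('ok')"])
    print(f"$ {result.command}\nexit {result.exit_code}\n{result.output_tail}")
```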

package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -40,7 +40,7 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
 - use its exact issue list extracted from the saved kept report file as the scope of that exact audit session's `bugfix-N` lane
 - save the report as `../.tmp/audit_report-<N>.md`
 - once that report is kept, treat its exact full issue list as the authoritative fix-check scope for the rest of that audit session; later remediation may narrow to the unresolved subset from that kept scope
-- send the full kept-report issue set to the developer in direct human review language, with explicit owner analysis of the failing surfaces and the expected fixes
+- translate the full kept-report issue set into direct owner instructions before messaging the developer: state what is broken, why it matters, affected files/surfaces, expected fixes, and verification to run
 ### `pass`
@@ -50,13 +50,13 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
 ## Issue handoff standard
-- send the developer the exact full scoped issues from the current report in direct human review language and explicit, detailed corrective form
-- do not tell the developer to read the audit report directly
+- send the developer only the owner-written corrective brief, in direct human language, with explicit expected behavior and verification
+- do not tell the developer to read the audit, evaluation, workflow, phase, lane, or report artifact directly
 - phrase the request as your own review, for example `fix these issues I found`, rather than attributing the list to a workflow event or report file
 - require the developer to address every currently unresolved item from the kept-report scope on later loop passes until the fix-check confirms the whole kept report scope is fixed
 - require the developer to report the exact verification commands that were run and the concrete results they produced
 - require the developer to provide an AI self-test report or concise self-test summary that can be attached or mentioned in the evaluator follow-up
-- if the developer claims an issue is invalid or already fixed, require a concrete justification against the audit output instead of silently omitting it
+- if the developer claims an issue is invalid or already fixed, require a concrete justification against your owner-written issue summary and the underlying evidence instead of silently omitting it
 - do not reduce the handoff to a small issue subset or a thin summary; the developer-facing prompt should contain the full issue set for the current scope
 - do not reduce the handoff to a small issue subset, top issue cluster list, or thin summary; the developer-facing prompt should contain the full issue set extracted from the saved report file for the current scope
 - for every issue, analyze and state as clearly as possible:
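The handoff standard in this file asks for an owner-written corrective brief instead of a pointer to the report. The sketch below shows one possible shape for that brief, using a hypothetical `Issue` record with the fields the standard lists. The path check is a hypothetical guard against pasting artifact locations such as `../.tmp/audit_report-<N>.md`. It cannot judge whether the prose itself is clear.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    broken: str            # what is broken
    why: str               # why it matters
    surfaces: List[str]    # affected files or surfaces
    expected_fix: str
    verification: str      # command or check the developer should run


# Hypothetical guard: artifact locations the developer must not be pointed at.
ARTIFACT_PATH = re.compile(r"audit_report-\d+\.md|\.\./\.tmp/|\.\./\.ai/")


def render_brief(issues: List[Issue]) -> str:
    """Render the full issue set as the owner's own review, in plain corrective language."""
    lines = ["Fix these issues I found:"]
    for n, issue in enumerate(issues, start=1):
        lines.append(f"{n}. {issue.broken}")
        lines.append(f"   Why it matters: {issue.why}")
        lines.append(f"   Affected: {', '.join(issue.surfaces)}")
        lines.append(f"   Expected fix: {issue.expected_fix}")
        lines.append(f"   Verify with: {issue.verification}")
    lines.append(
        "Address every item. Report the exact verification commands you ran and their results, "
        "plus a short self-test summary; if you believe an item is invalid or already fixed, "
        "justify that against the evidence above."
    )
    text = "\n".join(lines)
    if ARTIFACT_PATH.search(text):
        raise ValueError("brief points at a workflow artifact; rewrite it in your own words")
    return text
```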

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -94,10 +94,22 @@ Reject and archive the report if shape validation fails. A fix-check report is t
   - session 2 for audit session `2`
   - session 3 for the final coverage/README audit
+## Evaluator session hygiene
+The same evaluator session is reused for all reruns within an audit session. This is intentional and required. The fix for contamination is not switching sessions; it is stronger rerun instructions that prevent the evaluator from referencing prior runs.
+Rules:
+- keep using the same evaluator session for all reruns, fail regenerations, and fix-checks within an audit session
+- do not start a fresh evaluator session for rerun contamination; re-send with stronger anti-contamination instructions instead
+- after receiving a rerun report, reject it if it contains any prior-run framing such as `previously`, `remaining`, `still remaining`, `fixed from the prior run`, `rerun`, `regenerated`, `again`, `previous inspection`, or similar
+- archive contaminated reports under `../.ai/archive/` with a `stale-reference-contamination` reason
+- the standalone-audit expectation applies to all ordinary audits and coverage/README reruns; fix-check loops are the only narrow exception
 ## Report root and naming
 - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
 - when a report must be discarded or replaced during `P7`, move it out of `../.tmp/` into parent-root `../.ai/archive/` with a unique name instead of deleting it; archived copies are trace artifacts only and are not part of the submission contract
+- maintain a lightweight parent-root `../.ai/p7-issue-lineage.md` register during `P7`; each kept report issue or recommendation that drives remediation should record audit number, source report path, severity/verdict context, origin classification, remediation lane/turn, closure evidence, and any reusable workflow lesson
 - do not use the older cycle-directory report-root model
 - audit session `1` owns:
   - `../.tmp/audit_report-1.md`
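The session-hygiene and archive rules in this hunk come down to two steps: a phrase check, then an archive move that never deletes. The minimal sketch below assumes reports are plain text files. The phrase list is copied from the rule, and this literal match does not catch the "or similar" paraphrases. The `archive_report` naming scheme (stem, reason, timestamp) is a hypothetical way to produce the unique archive name the rules require.

```python
import re
import shutil
import time
from pathlib import Path
from typing import List

# Prior-run framing listed in the evaluator session hygiene rules.
PRIOR_RUN_PHRASES = (
    "previously", "still remaining", "remaining", "fixed from the prior run",
    "rerun", "regenerated", "again", "previous inspection",
)
_PRIOR_RUN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PRIOR_RUN_PHRASES) + r")\b", re.IGNORECASE
)


def contamination_hits(report_text: str) -> List[str]:
    """Return the distinct prior-run phrases found; any hit means the report is rejected."""
    return sorted({m.group(1).lower() for m in _PRIOR_RUN.finditer(report_text)})


def archive_report(report: Path, archive_dir: Path, reason: str) -> Path:
    """Move (never delete) a rejected report into the archive under a unique name."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = archive_dir / f"{report.stem}-{reason}-{stamp}{report.suffix}"
    counter = 1
    while target.exists():  # keep names unique if two reports are archived in the same second
        target = archive_dir / f"{report.stem}-{reason}-{stamp}-{counter}{report.suffix}"
        counter += 1
    shutil.move(str(report), str(target))
    return target
```

In the workflow, `archive_dir` would be the parent-root `../.ai/archive/` and `reason` would be `stale-reference-contamination`. Because `remaining` and `again` are on the list, ordinary prose such as "remaining budget" also trips the check. A literal list like this is strict by construction.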
@@ -145,7 +157,21 @@ When rerunning an audit inside the same evaluator session after a remediation pa
 - let that utility append this exact short footer beneath the full prompt copy for the rerun:
 ```text
-Check the entire current project again against the full prompt above and regenerate the complete report from scratch. Save the report to the same required path. Make the report read as a fresh standalone audit of the current repo state and do not mention or imply any previous run, rerun, regeneration, prior inspection, earlier fixes, or earlier remaining items.
+You are performing a fresh standalone audit of the current project state. This is not a rerun, regeneration, or follow-up to any prior audit. You have no memory of any previous inspection of this project.
+Rules:
+- Audit the current repo state as if for the first time
+- Do not mention, imply, or reference any previous run, prior inspection, earlier findings, fixes made, items remaining from before, regeneration, or rerun
+- Do not use framing such as "previously", "remaining", "still", "fixed from the prior run", "again", "as before", "continues to", or similar prior-run language
+- Do not compare current state to any imagined prior state
+- Generate the complete report from scratch based only on the prompt above and the current repo files
+- Save the report to a new file; do not overwrite or append to any existing audit file
+- Report only the current verdict and only issues or recommendations that exist in the current repo state
+- If an issue no longer exists, do not mention it at all; do not state that it was fixed or resolved
+- If a new issue exists, report it as a fresh finding with no reference to prior state
+- Use only present-tense, current-state language; avoid temporal markers that suggest sequence or history
+Violation of these rules makes the report invalid and it will be rejected.
 ```
 - use this footer for same-session fail regenerations and for same-session final coverage/README reruns after fixes land
@@ -215,7 +241,7 @@ Inside a kept audit session after `audit_report-<N>.md` exists:
 - treat the exact issue list extracted from the saved `audit_report-<N>.md` file as the scope of the loop for `partial pass`
 - treat every reported issue and recommendation found in the saved kept report file as the scope of the loop for `pass`
 - send that full scoped issue set to `bugfix-<N>` in direct human review language with owner analysis of the exact surfaces and expected fixes
-- do not tell the developer to read the audit report file directly
+- do not tell the developer to read audit, evaluation, workflow, phase, lane, or report artifacts directly; translate findings into an owner-written corrective brief with concrete engineering instructions
 - phrase the fix request as your own review, for example `fix these issues I found`, rather than as a report handoff
 - do not ask the evaluator for `top issues`, `major issue clusters`, `summary issues`, or any other reduced remediation scope when the full report file already exists; the owner must read the whole file and extract the whole issue set instead
 - require the developer to fix the scoped issue set and report:
@@ -291,7 +317,7 @@ If a report appears stale, contradicted by current files, or degraded:
   - audit session `2` is complete
   - the post-audit coverage/README audit has run as the last subphase of `P7`
 - a clean `pass` or `partial pass` verdict alone does not end an audit session; the corresponding `bugfix-<N>` issue/recommendation loop must also close unless the kept `pass` report had no reported items at all
-- after the second audit session completes, run the coverage/README audit; only when the fresh report is a full standalone pass-level report and the owner has extracted no remaining issue/recommendation set from the saved file may the workflow move to `P8 Final Readiness Decision`; `P8` then performs one fast reconciliation sweep across the repo, parent-root docs, and carried audit artifacts before packaging begins
+- after the second audit session completes, run the coverage/README audit; only when the fresh report is a full standalone pass-level report and the owner has extracted no remaining issue/recommendation set from the saved file may the workflow move to `P8 Final Readiness Decision`; `P8` then loads `p8-readiness-reconciliation` and follows that skill before packaging begins
 - until that exit target is actually met, never stop merely because one audit attempt, one remediation turn, or one fix-check loop pass has finished
 ## Boundaries

package/assets/skills/integrated-verification/SKILL.md CHANGED

@@ -15,11 +15,12 @@ It is a minimal local-verification-and-coherence gate, not a perfection gate.
 It starts immediately after accepted P3 architecture execution ends. There is no separate post-development gate before this phase.
-During `P5`, the working execution plan is still the repo-local `plan.md`. Before surfacing the proceed-to-evaluation pause, preserve its final truthful contents in parent-root `../docs/plan.md` and remove the repo-local copy, because the execution plan has become reference documentation at that point.
+During `P5`, the working execution plan is still the repo-local `plan.md`. Before entering `P7`, preserve its final truthful contents in parent-root `../docs/plan.md` and remove the repo-local copy, because the execution plan has become reference documentation at that point.
 The default goal is to get to an evaluation-ready checkpoint quickly.
-Run the prepared local test harness here as the owner-side integration gate, run the required internal evaluation loop below, fix only narrow owner-side docs, README, config, wrapper, or light script glue directly, reroute any real code or actual test-file work back to the developer, and stop for a user proceed-to-evaluation check as soon as:
+Run the prepared local test harness here as the owner-side integration gate, run the planned local E2E/platform-equivalent checks when the accepted plan requires them, run the required internal evaluation loop below, fix only narrow owner-side docs, README, config, wrapper, or light script glue directly, route any real code or actual test-file work to the active P5 bugfix lane, and enter `P7` as soon as:
 - the local test harness is green
+- the planned local E2E/platform-equivalent checks are green when the accepted plan requires them
 - the repo roughly aligns with `plan.md` and accepted `../docs/design.md`
 - the developer-side development-exit module verification matrix in `plan.md` is filled for every prompt-relevant module and shows main-lane file integration plus assigned module tests completed
 - the no-orphan requirement ledger and module requirement closure checklists in `plan.md` have no unchecked, vaguely delegated, or generic-smoke-proven requirement rows
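The `P5` exit conditions listed in this hunk read naturally as a checklist gate. The sketch below uses a hypothetical `P5Evidence` record in which each flag stands for a judgment or command result the owner records. It models only the conditions visible in this hunk, and it reports which of them are unmet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class P5Evidence:
    """Hypothetical record of the P5 exit conditions visible in this hunk."""
    local_harness_green: bool
    plan_requires_e2e: bool
    e2e_green: bool
    roughly_aligned_with_plan_and_design: bool
    module_matrix_filled: bool
    unclosed_requirement_rows: int


def p5_blockers(ev: P5Evidence) -> List[str]:
    """Return unmet conditions; an empty list means none of the modeled conditions block P7."""
    blockers = []
    if not ev.local_harness_green:
        blockers.append("local test harness is not green")
    if ev.plan_requires_e2e and not ev.e2e_green:
        blockers.append("planned local E2E/platform-equivalent checks are not green")
    if not ev.roughly_aligned_with_plan_and_design:
        blockers.append("repo does not roughly align with plan.md and ../docs/design.md")
    if not ev.module_matrix_filled:
        blockers.append("development-exit module verification matrix is incomplete")
    if ev.unclosed_requirement_rows:
        blockers.append(f"{ev.unclosed_requirement_rows} requirement rows are not closed")
    return blockers
```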
@@ -47,8 +48,8 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
 - keep that pass focused on the minimal `P5` gate: green local verification and rough correctness/coherence against `plan.md` plus accepted `../docs/design.md`
 - include a planned-but-missing challenge table in the `P5` evidence: `planned claim`, `proof`, `gap`, `decision`
 - explicitly inspect the `Core Semantic Path Proof`, `Prompt-Critical Rule Matrix`, `Role Surface Matrix`, and `Runtime Lifecycle Checklist` sections of `plan.md` when they apply
-- explicitly inspect the development-exit module verification matrix in `plan.md`; if it is missing, stale, or does not show that every module's files were inspected and assigned tests were run after fan-in, route the gap back before completing `P5`
-- explicitly inspect the no-orphan requirement ledger and module requirement closure checklists; if any accepted requirement, API route, actor path, data object, security boundary, or report/export/notification path is unmapped, vaguely mapped, or marked complete without assertion-level proof, route the gap back before completing `P5`
+- explicitly inspect the development-exit module verification matrix in `plan.md`; if it is missing, stale, or does not show that every module's files were inspected and assigned tests were run after fan-in, route the gap to the active P5 bugfix lane before completing `P5`
+- explicitly inspect the no-orphan requirement ledger and module requirement closure checklists; if any accepted requirement, API route, actor path, data object, security boundary, or report/export/notification path is unmapped, vaguely mapped, or marked complete without assertion-level proof, route the gap to the active P5 bugfix lane before completing `P5`
 - explicitly inspect a sample of assertion-level proof rows across modules, including at least the core semantic path, one security/authorization row when applicable, one failure/edge row when applicable, and one FE↔BE/API/data row when applicable; if these rows do not correspond to real test or verification evidence, reject the development completion claim
 - do not accept shallow status/enqueue/helper-only evidence for the core semantic path when the plan requires a state transition, persisted artifact, frontend-to-backend behavior, export/render output, or failure-mode proof
 - for lifecycle-sensitive apps, require at least one entrypoint-level proof of scheduler/worker/timed/export/import/polling/startup/cleanup behavior rather than only unit tests of helpers
@@ -57,7 +58,7 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
 - for coverage-sensitive apps, separate route/API/surface inventory completeness from behavioral proof sufficiency rather than using percentages or route counts alone
 - treat `plan.md` plus accepted `../docs/design.md` as the primary owner comparison baseline in `P5`; use the raw prompt only to catch obvious major drift rather than to run a fresh literal requirement nitpick pass here
 - do not let `P5` compensate for planning gaps by manually reinterpreting requirements; if a prompt-required behavior is missing from the accepted no-orphan ledger or module packets, classify it as a planning/clarification miss, route the repo fix normally, and capture the workflow-learning classification
-- turn the collected findings into one complete analyzed correction brief after the full review sweep, write it in direct human review language, and only then route fixes back when a real coding reroute is truly needed
+- turn the collected findings into one complete analyzed correction brief after the full review sweep, write it in direct human review language, and only then send fixes to the P5 bugfix lane when real coding work is truly needed
 - every owner correction brief in `P5` must include, for each issue or grouped issue set:
   - what is wrong
   - why it matters
@@ -72,25 +73,25 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
   - what verification should prove it is fixed
 - do not classify issues by where they surfaced; classify them by the earliest artifact that should have made the correct behavior unavoidable
 - if `plan.md` or accepted `../docs/design.md` already names the expected behavior, treat the fix as a developer execution problem rather than reopening clarification or planning
-- if a prompt-required behavior is absent or materially vague in the accepted plan/design, capture it as a planning or clarification weakness while still routing the repo fix normally
+- if a prompt-required behavior is absent or materially vague in the accepted plan/design, capture it as a planning or clarification weakness while still routing the repo fix to the active P5 bugfix lane when code or tests must change
 - when a failure class is known, isolate the likely affected module, route family, helper family, or shared flow and direct narrow proof first
 - fix small owner-side issues directly only when they are low-risk and clearly not core product implementation, such as documentation cleanup, design or `plan.md` tightening, `README.md` sync, deferred Docker configuration, wrapper/config glue, local-harness glue, or light `./run_tests.sh` script cleanup
-- if the needed fix touches real product code, application logic, or actual test files or suites, route it back to the developer lane instead of patching it in-owner
-- route real coding fixes back to the developer lane for runtime failures, broken tests, or material `plan.md` coherence gaps that the owner should not patch directly
+- if the needed fix touches real product code, application logic, or actual test files or suites, send it to the active P5 bugfix lane instead of patching it in-owner or reopening `develop-*`
+- route real coding fixes to the P5 bugfix lane for runtime failures, broken tests, or material `plan.md` coherence gaps that the owner should not patch directly
 - start by comparing `plan.md` against the actual repo state and the developer's completion claim; missing or weak plan obligations are first-class `P5` work, not a reason to insert another pre-`P5` gate
 - require the developer to report the exact rerun commands and concrete results for any requested fixes
 - verify behavior against `plan.md`, accepted `../docs/design.md`, the documented local verification path, and rough repo coherence
 - verify that the delivered local test harness matches the documented development/review path in `README.md` or the accepted plan, and use that path here in `P5`
 - check for obvious correctness failures against the accepted design and plan, such as shell/demo/placeholder delivery where the plan required fuller closure, broken role/flow wiring, or major missing behavior that makes the repo clearly not ready for evaluation
-- run the prepared local test harness as the first real integrated gate for this phase
+- run the prepared local test harness as the first real integrated gate for this phase; when the accepted plan includes local E2E/platform-equivalent checks, run those before completing `P5` rather than deferring first execution to evaluation
 - if that path fails because of wrapper, README, docs, or local-harness glue issues, fix them directly in the owner session and rerun quickly
-- if making the gate pass would require editing actual test files, test suites, or product code, send that work back to the developer lane instead of patching it in-owner
-- if those commands expose failures that should not be owner-patched, include them in the same single complete analyzed correction brief and route that brief back to the developer lane
-- if the local test/coherence gate is green, run the internal `P5` evaluation loop before asking whether to proceed to formal evaluation
-- if the local test harness is green, the repo state roughly aligns with `plan.md` plus accepted `../docs/design.md` without major correctness/runtime breakage, and the internal `P5` evaluation loop has no unresolved Blocker/High findings, that is sufficient to complete `P5` and ask whether to proceed to formal evaluation
-- this sufficiency rule does not override no-orphan evidence: if accepted requirement rows or module closure checklist rows are visibly unchecked, vaguely delegated, or proven only by shell/smoke evidence, `P5` must route back rather than proceed
-- if the core semantic path remains unproven, stop and route proof/fix work back unless the user explicitly accepts the named residual risk
-- before asking whether to proceed, move the final truthful repo-local `plan.md` into parent-root `../docs/plan.md`, remove the repo-local copy, and then record in Beads that `P5` evidence is satisfied and that the workflow is waiting at the evaluation boundary; do not mutate into `P7` until the user explicitly says to continue
+- if making the gate pass would require editing actual test files, test suites, or product code, send that work to the active P5 bugfix lane instead of patching it in-owner
+- if those commands expose failures that should not be owner-patched, include them in the same single complete analyzed correction brief and route that brief to the P5 bugfix lane
+- if the local test/coherence gate is green, and any planned local E2E/platform-equivalent checks are green when required, run the internal `P5` evaluation loop before entering formal evaluation
+- if the local test harness is green, any planned local E2E/platform-equivalent checks are green when required, the repo state roughly aligns with `plan.md` plus accepted `../docs/design.md` without major correctness/runtime breakage, and the internal `P5` evaluation loop has no unresolved Blocker/High findings, that is sufficient to complete `P5` and enter `P7`
+- this sufficiency rule does not override no-orphan evidence: if accepted requirement rows or module closure checklist rows are visibly unchecked, vaguely delegated, or proven only by shell/smoke evidence, `P5` must route fixes to the active bugfix lane rather than proceed
+- if the core semantic path remains unproven, stop and route proof/fix work to the active bugfix lane unless the user explicitly accepts the named residual risk
+- before entering `P7`, move the final truthful repo-local `plan.md` into parent-root `../docs/plan.md`, remove the repo-local copy, and then record in Beads that `P5` evidence is satisfied
 ## Required Internal P5 Evaluation Loop
@@ -120,9 +121,9 @@ After round 5:
   - likely fix location and implementation guidance
   - potential regressions to check
   - whether it is `plan-required implementation miss`, `planning miss`, `clarification miss`, `planning mechanics miss`, or `owner review miss`
-- write the developer remediation prompt to `../.ai/p5-evaluation/developer-remediation-brief.md`
-- send the full remediation brief to the active developer lane
-- require the developer to fix all non-risk-accepted Blocker/High findings and report exact verification commands/results
+- write the bugfix remediation prompt to `../.ai/p5-evaluation/bugfix-remediation-brief.md`
+- open or reuse the active P5 bugfix lane, normally `bugfix-1`, and send the full remediation brief there
+- require the bugfix lane to fix all non-risk-accepted Blocker/High findings and report exact verification commands/results
 - verify the fixes against the consolidated findings before closing `P5`
 Do not run the final `P7` two-audit-session process inside this loop.
@@ -132,14 +133,14 @@ Do not count `P5` internal evaluator reports as formal acceptance artifacts.
 ## Rules
 - keep this integrated verification and hardening step practical, fast, and release-oriented rather than perfectionist
-- use the opening part of this phase to compare the repo against what `plan.md` and the accepted design claim, check that those claims are broadly true enough, then run the required internal evaluation loop before the proceed-to-evaluation check
+- use the opening part of this phase to compare the repo against what `plan.md` and the accepted design claim, check that those claims are broadly true enough, then run the required internal evaluation loop before entering `P7`
 - do not turn this phase into one-issue-at-a-time churn; do not remediate between internal evaluator rounds; after the owner broad pass and five-round issue-discovery loop, either proceed to the evaluation boundary, fix the small owner-side churn directly, or send one complete analyzed correction prompt listing all issues found if the repo is not yet evaluation-ready
 - do not create a lingering development-completion to `P5` mini-loop over anything that does not directly block green local verification or rough correctness/coherence against `plan.md` and accepted design
-- do not turn `P5` into an open-ended churn loop; run the owner-side local test harness, run exactly five internal evaluator issue-discovery rounds in one session without inter-round remediation, fix only narrow owner-side glue quickly after consolidation, reroute any real code or actual test-file work once, verify the consolidated Blocker/High fixes, and then stop for the proceed-to-evaluation check
+- do not turn `P5` into an open-ended churn loop; run the owner-side local test harness, run planned local E2E/platform-equivalent checks when required, run exactly five internal evaluator issue-discovery rounds in one session without inter-round remediation, fix only narrow owner-side glue quickly after consolidation, route any real code or actual test-file work once to the active P5 bugfix lane, verify the consolidated Blocker/High fixes, and then enter `P7`
 - every owner pass in `P5` should be a full design/API/plan/README/repo/evidence sweep rather than a targeted-section recheck
 - cap normal owner-side `P5` coherence iteration to 3 owner passes outside the required internal evaluation loop: one opening sweep plus up to two follow-up full-sweep passes after the consolidated correction list or owner-side glue fixes; if the repo is still not coherent enough after that, classify the remaining gap clearly instead of drifting into open-ended `P5` churn
-- when classifying remaining gaps, separate repo-remediation scope from workflow-learning scope: fixing the repo may be a developer task even when the retrospective lesson belongs to planning or clarification
-- when a `P5` correction list contains independent items, route those safe bundles back for parallel developer work where practical and require per-bundle verification before fan-in
+- when classifying remaining gaps, separate repo-remediation scope from workflow-learning scope: fixing the repo may be a bugfix-lane task even when the retrospective lesson belongs to planning or clarification
+- when a `P5` correction list contains independent items, route those safe bundles to the active bugfix lane for parallel helper work where practical and require per-bundle verification before fan-in
 - do not rerun the whole heavy suite after every single failure by default
 - if a broad rerun is not answering a new question, stop and go back to narrow proof
 - do not use this integrated verification and hardening step for broad new feature work

package/assets/skills/p8-readiness-reconciliation/SKILL.md ADDED

@@ -0,0 +1,98 @@
+---
+name: p8-readiness-reconciliation
+description: P8 final readiness reconciliation, D1-D9 developer-originated major-issue sweep, and agent-browser functional verification before packaging.
+---
+# P8 Readiness Reconciliation
+Use this skill only during `P8 Final Readiness Decision`.
+This skill is the single source of truth for the Developer D1-D9 major-issue categories used at P8. Do not duplicate these definitions in clarification, design, planning, development, or P5 assets. Earlier phases may require ordinary engineering evidence such as startup paths, Playwright E2E planning, validation tests, and README consistency, but the named D1-D9 sweep belongs here.
+## Required inputs
+Before deciding readiness, reread and reconcile:
+- delivered repo
+- repo-root `README.md`
+- parent-root `../docs/`
+- accepted final plan copy in `../docs/plan.md` when present
+- carried `../.tmp/` audit artifacts
+- archived stale/fail report lineage under `../.ai/archive/` when present
+- package-root expectations for `P9`
+- residual risks accepted earlier
+## Mandatory P8 output
+Record a readiness reconciliation note before entering packaging. It must include:
+- files/docs/artifacts checked
+- kept reports checked
+- archived/stale report lineage reviewed
+- package-root expectations checked
+- `agent-browser` verification result when applicable
+- D1-D9 table with `pass`, `fail`, `not applicable`, or `risk accepted`
+- concrete evidence for each non-`not applicable` D1-D9 row
+- final residual gaps and packaging decision
+## D1-D9 developer-originated major-issue sweep
+Use these exact categories and checks.
+| Category | Failure class | P8 evidence to check | Fail condition |
+|---|---|---|---|
+| D1 Execution and startup reliability | app cannot start, wrong command, broken path, hidden working-directory assumption, missing config/bootstrap | README startup command, actual repo scripts, app entrypoint, service health/readiness, runtime notes, local/P9 handoff expectations | documented startup path is missing, contradictory, non-repo-controlled, or has no credible evidence path |
+| D2 Implementation authenticity and logical correctness | fake implementation, hardcoded success, shell handler, uncalled service, wrong business rule, no persistence where persistence is required | handlers/services/jobs/components, tests proving real state/side effects, audit reports, plan closure evidence | core behavior is only route registration, static demo data, mock success, shallow status, or unproven logic |
+| D3 Product completeness and usability | product is not usable end to end, missing critical screen/action/entity setup, broken navigation/API path, unusable empty state | README quick-start, seeded data or empty-state rationale, key UI/API flows, `agent-browser` result where applicable, audit artifacts | main actor cannot complete prompt-critical task through delivered app path |
+| D4 Requirement alignment | prompt feature missing, narrowed, reassigned to wrong actor, API/UI/data/security requirement orphaned | original prompt, requirements breakdown, design/API docs, final plan, README, repo behavior | accepted requirement has no delivered surface, proof, or explicit accepted non-applicability rationale |
+| D5 Validation and testing | weak tests, missing validation/error/security tests, no meaningful E2E/platform proof, false-positive assertions | test files, coverage docs, audit outputs, Playwright/browser E2E evidence for web/fullstack, API/unit/integration evidence | tests do not exercise real behavior, prompt-critical validation is untested, or required Playwright/E2E proof is absent |
+| D6 Reproducibility and dependencies | clean environment cannot reproduce, local/private dependency, missing lockfile/manifest, manual setup hidden in docs | manifests, lockfiles, Docker/Compose files, init/seed scripts, README setup, no `.env`/`.env.example` dependency | delivered app depends on hidden local state, private services, manual installs not documented as host prerequisites, or missing runtime/test inputs |
+| D7 Documentation and verification consistency | README/docs claim behavior that repo does not deliver, wrong ports/commands/credentials, stale test claims | README, docs, scripts, manifests, route/app registration, audit reports, final plan | documentation and delivered repo disagree about commands, access, auth, seeded data, features, limitations, or verification |
+| D8 Dataset and session integrity | package lineage or task traceability is broken; prompt/session/provenance cannot anchor final metadata and docs | metadata, session/export package expectations, prompt lineage, docs/plan/report lineage, package-root manifest inputs | final package cannot prove which prompt, sessions, docs, reports, and repo state produced the delivery |
+| D9 Self-test and report integrity | audit/self-test artifacts are missing, stale, malformed, contaminated by prior-run wording, or inconsistent with fix scope | `.tmp` report set, archived failed/stale reports, fix-check lineage, coverage/README report, rerun packet lineage | required reports are missing/duplicated, stale reports remain in final outputs, report shape is invalid, or fix checks do not map to kept report issues |
+## Agent-browser functional verification
+During P8, perform one additional live functional verification with the `agent-browser` CLI when the delivered project type is `frontend`, `web`, `fullstack`, `backend`, or `server`.
+Required behavior:
+- launch the app using the documented repo command whenever feasible; P8 is the explicit exception to the normal Docker deferral rule for this live functional launch only
+- use `agent-browser` to launch or connect to the running app or API endpoint
+- interact with at least one prompt-critical path through the actual delivered interface
+- for frontend/web/fullstack: navigate the UI, perform a real user action, and observe resulting UI/API/state behavior
+- for backend/server with no UI: call the documented HTTP endpoint or API flow through `agent-browser` if it supports the target interaction; if `agent-browser` is not suitable for a pure API flow, record why and use the closest live HTTP interaction evidence instead
+- use seeded quick-start data from the README when applicable, or create data through the delivered app path
+- record command(s), URL(s), action(s), observed result, and whether the result supports D1/D2/D3/D7
+Failure handling:
+- if the app cannot be launched, D1 fails unless there is a previously accepted explicit platform limitation that makes launch impossible in the current environment
+- if `agent-browser` is unavailable, record the missing tool as a P8 blocker unless the user explicitly accepts the risk
+- if interaction shows a broken main path, D3 fails and packaging must not proceed without a fix or explicit user risk acceptance
+- do not use screenshots or visual inspection alone as proof of functionality; the interaction must exercise behavior
+Docker cadence:
+- P8 `agent-browser` live launch is an explicit, narrow exception to the normal Docker deferral policy
+- if the documented app launch command is `docker compose up --build`, use it for this P8 live interaction through the timeout helper and clean it up afterward; do not run dockerized `./run_tests.sh` here
+- if a documented non-Docker/local runtime is also available and equivalent for the delivered app, prefer the faster local runtime for P8
+- P8 live interaction does not replace P9 packaging/runtime confirmation; P9 still owns final Docker/runtime and dockerized broad-test closure when required
+## Frontend-design and Playwright reminders
+Do not let P8 replace earlier frontend obligations:
+- `frontend-design` remains mandatory whenever UI structure, usability, visual hierarchy, state, layout, or frontend quality matters
+- web/fullstack planning must still require Playwright or equivalent real in-browser E2E for critical browser flows
+- development must still run planned local Playwright/E2E/platform-equivalent checks before major development-complete claims when the accepted plan requires them
+- P8 `agent-browser` is an extra live readiness interaction, not a replacement for Playwright tests or frontend-design review
+## Decision rule
+Packaging can begin only when:
+- D1-D9 are `pass`, `not applicable`, or explicitly `risk accepted`
+- `agent-browser` verification passes or has an explicit accepted non-applicability/risk decision
+- final docs, reports, repo state, and package-root expectations agree
+- no material prompt-critical, security, runtime, report-lineage, or usability inconsistency remains
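The decision rule above can be read as a gate over the readiness reconciliation note. The TypeScript sketch below is a hypothetical illustration and is not part of the package. It assumes the note is recorded as structured data. The names `Status`, `SweepRow`, `ReadinessNote`, and `packagingBlockers` are invented for this example. The function returns the reasons packaging cannot begin. An empty list means all of the following hold:

- every D1-D9 row is `pass`, `not applicable`, or `risk accepted`
- every row other than `not applicable` carries evidence
- `agent-browser` has not failed
- the final artifacts agree

```typescript
// Hypothetical sketch of the P8 decision rule; all names are illustrative only.
type DCategory = "D1" | "D2" | "D3" | "D4" | "D5" | "D6" | "D7" | "D8" | "D9";
type Status = "pass" | "fail" | "not applicable" | "risk accepted";

interface SweepRow {
  status: Status;
  evidence?: string; // expected for every row that is not "not applicable"
}

interface ReadinessNote {
  rows: Partial<Record<DCategory, SweepRow>>;
  agentBrowser: Status; // "not applicable" / "risk accepted" stand for an explicit decision
  artifactsAgree: boolean; // final docs, reports, repo state, package-root expectations
  materialInconsistencies: string[]; // prompt-critical, security, runtime, lineage, usability
}

const CATEGORIES: readonly DCategory[] = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9"];

// Returns every reason packaging must not begin; an empty array means the gate is open.
function packagingBlockers(note: ReadinessNote): string[] {
  const blockers: string[] = [];
  for (const id of CATEGORIES) {
    const row = note.rows[id];
    if (row === undefined) {
      blockers.push(`${id}: row missing from readiness note`);
    } else if (row.status === "fail") {
      blockers.push(`${id}: fail`);
    } else if (row.status !== "not applicable" && !row.evidence) {
      blockers.push(`${id}: ${row.status} without concrete evidence`);
    }
  }
  if (note.agentBrowser === "fail") {
    blockers.push("agent-browser: fail");
  }
  if (!note.artifactsAgree) {
    blockers.push("docs, reports, repo state, and package-root expectations disagree");
  }
  for (const issue of note.materialInconsistencies) {
    blockers.push(`unresolved: ${issue}`);
  }
  return blockers;
}
```

The sketch treats a missing row as a blocker because the mandatory output requires a full D1-D9 table. It also flags `pass` and `risk accepted` rows that lack evidence, following the rule that every row other than `not applicable` carries concrete evidence.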

package/assets/skills/planning-gate/SKILL.md CHANGED

@@ -426,7 +426,7 @@ Reject if:
 ## C5. Test Coverage Execution Contract
 `plan.md` must explicitly state:
-- a confident overall coverage target around `90%`
+- at least `90%` unit-testable product-code coverage where measurable and at least `90%` closure of planned E2E/platform-critical flow rows
 - exact measurement path
 - confidence notes or known weak spots when relevant
 - full prompt-relevant surface inventory mapped to intended test layers
@@ -586,7 +586,7 @@ For database-bearing projects it must preserve:
 Reject if startup/test honesty is weak or if planning leaves these rules loose.
-It must also explicitly say that a separate local test harness is prepared during scaffold and used during development plus owner-side `P5`, that dockerized `./run_tests.sh` and Docker runtime are configured during development but not executed from planning through the end of `P7`, and that the owner performs the first real Docker check plus dockerized `./run_tests.sh` run in `P9`.
+It must also explicitly say that a separate local test harness is prepared during scaffold and used during development plus owner-side `P5`, that dockerized `./run_tests.sh` and Docker runtime are configured during development but not executed from planning through the end of `P7`, that `P8` may launch the app only for `agent-browser` functional verification under `p8-readiness-reconciliation`, and that the owner performs the final Docker broad check plus dockerized `./run_tests.sh` run in `P9`.
 ## C10. Plan Completeness Standard

package/assets/skills/planning-guidance/SKILL.md CHANGED

@@ -21,12 +21,14 @@ Its job is to ensure the owner:
 - uses the approved phased planning artifacts in the correct order
 - sends the developer the correct planning boundary at the correct time
 - does not improvise a new planning contract when the accepted planning package already defines it
-- for the Claude workflow, prepares one owner-side comparison design draft before the Phase 1 Claude design request and then merges the best ideas in-owner before Phase 2 continues
+- prepares one owner-side comparison design draft before the Phase 1 developer design request and then merges the best ideas in-owner before Phase 2 continues; this applies to both OpenCode developer-subagent and Claude live-lane backends
 ## Core Rule
 The owner should not synthesize planning from scratch when the accepted planning package already exists.
+Use the Context7 CLI/skill for any framework, library, SDK, API, CLI, or cloud-service documentation lookup needed during planning. Resolve first with `npx ctx7@latest library <name> "<question>"`, then fetch docs with `npx ctx7@latest docs <libraryId> "<question>"`; use external web search only after Context7 is insufficient or not applicable.
 Use the phased planning documents as the primary planning payload:
 - Phase 1 design prompt
 - Phase 1 design template
@@ -48,7 +50,7 @@ Use the phased planning documents as the primary planning payload:
 2. Start Phase 1 first.
 - Use the Phase 1 design prompt and Phase 1 design template.
 - Copy the needed Phase 1 prompt text into the design request itself, and tell the developer to follow the initialized Phase 1 design template.
-- In the Claude workflow, before sending the Phase 1 design request, launch one owner-side design-prep subagent to produce a comparison design draft at `../.ai/design-prep.md`.
+- Before sending the Phase 1 design request, launch one owner-side design-prep subagent to produce a comparison design draft at `../.ai/design-prep.md`.
 - Treat that `.ai` design-prep draft as owner-side comparison input, not as an accepted contract and not as a developer-visible artifact.
 - The Phase 1 output must become the accepted design contract in `../docs/design.md`.
 - Keep `../docs/design.md` focused on repo/system design; exact runtime/bootstrap/README contracts belong in Phase 2 `plan.md`, not in the design doc.
@@ -60,8 +62,8 @@ Use the phased planning documents as the primary planning payload:
 - Before accepting Phase 1, explicitly reread the original prompt, the accepted requirements-and-clarification package, and `../docs/design.md` together; do not rely on a vague sense that the design is probably faithful.
 - Before accepting Phase 1, explicitly check that `../docs/design.md` still preserves the accepted core requirements extracted during clarification instead of only preserving the narrower ambiguity resolutions.
 - Before accepting Phase 1, explicitly check that the design identifies the core semantic path, captures prompt-critical rules, and includes a role surface matrix for any role/auth/ownership/public-route/admin/audit/export/notification surface.
-- In the Claude workflow, also compare the Claude-produced design against the owner-side `.ai` design-prep draft and merge the better ideas into `../docs/design.md` directly when they improve the accepted design.
-- If the owner patches `../docs/design.md` using those comparison ideas, inform Claude of the exact accepted design changes before requesting `../docs/api-spec.md` or `plan.md`.
+- Compare the developer-produced design against the owner-side `.ai` design-prep draft and merge the better ideas into `../docs/design.md` directly when they improve the accepted design.
+- If the owner patches `../docs/design.md` using those comparison ideas, inform the developer of the exact accepted design changes before requesting `../docs/api-spec.md` or `plan.md`.
 - Reject weak, high-level, narrowed, or incomplete design work.
 - If the design is materially sound and only small owner-side contract or wording fixes remain, patch `../docs/design.md` directly instead of bouncing the request back.
 - Do not move into the API contract request or execution planning until the design contract is accepted.
@@ -132,6 +134,7 @@ When sending planning work to the developer:
 - include the clarification content itself as detailed text before planning starts; do not assume the developer can infer it from a label or from owner-only files
 - keep the clarification brief clean and decisive; do not carry rejected clarifier guesses, duplicated entries, or stale ambiguity text into planning
 - inline the real prompt body needed for the current design or planning request; do not send `read this file and do it` instructions for owner-side packaged prompts, but it is fine to tell the developer to follow the initialized template
+- remove owner-only lifecycle, evaluator, audit, gate, lane, and workflow mechanics from the developer-facing wording; speak as the owner requesting a concrete design, plan, README, test, runtime, or implementation outcome
 - do not restate massive planning content outside the accepted planning package unless you are correcting it
 - do not ask for a generic plan when the phased templates already define the expected outputs
 - for design work, request `../docs/design.md` first and request `../docs/api-spec.md` separately afterward when applicable

package/assets/skills/scaffold-guidance/SKILL.md CHANGED

@@ -73,6 +73,8 @@ At scaffold time, do not require:
 - use `shared-contract.md` as the common runtime/test/README/scaffold contract
 - compose independent type modules such as `type-web-spa.md`, `type-api-service.md`, `type-database.md`, `type-background-jobs.md`, `type-offline-local-first.md`, `type-mobile-android.md`, and `type-desktop.md` before applying stack-specific details
 - compose independent tech modules such as `tech-frontend-vue.md`, `tech-frontend-react.md`, `tech-backend-go.md`, `tech-backend-koa.md`, `tech-backend-laravel.md`, `tech-backend-gin-templ.md`, `tech-db-mysql.md`, `tech-db-postgres.md`, `tech-db-room.md`, `tech-db-localdb.md`, and `tech-rust-workspace.md` for the actual frontend/backend/database/language pieces in the prompt
+- when a web/frontend prompt and adopted repo do not specify frontend framework, styling library, or UI component library, default to Vue 3 + Vite + TypeScript, Tailwind CSS, and shadcn/ui when compatible; do not override explicit prompt or existing-repo choices
+- before locking bootstrap commands, package setup, or integration details for Vue, Tailwind, shadcn/ui, Playwright, backend frameworks, databases, SDKs, APIs, CLIs, or cloud services, use the Context7 CLI/skill to fetch current docs; resolve first with `npx ctx7@latest library <name> "<question>"`, then fetch docs with `npx ctx7@latest docs <libraryId> "<question>"`
 - when no findings-driven full-stack profile matches, use `stack-generic.md` plus the selected type and tech modules instead of falling back to the old framework-default model
 - do not tell the developer to read those files directly if they are outside `repo/`; restate the relevant directives in the developer prompt
 - when a matching findings-driven stack profile exists, prefer following it over inventing a new scaffold contract from scratch

package/assets/skills/submission-packaging/SKILL.md CHANGED

@@ -13,7 +13,7 @@ Use this skill only during `P9 Submission Packaging`.
 - keep packaging work inside the formal phase window
 - treat packaging as a minimal final-delivery contract, not a reporting exercise
 - `P9` should begin from a repo state that has already passed the quick `P8` reconciliation sweep across `repo/`, `README.md`, parent-root `../docs/`, and carried `../.tmp/` audit artifacts
-- `P5` is the ordinary phase that runs the separate local test harness; `P9` is the first real Docker/runtime and dockerized `./run_tests.sh` confirmation point
+- `P5` is the ordinary phase that runs the separate local test harness; `P8` may have launched the app only for `agent-browser` functional verification; `P9` is the final Docker/runtime and dockerized `./run_tests.sh` confirmation point
 - before closing `P9`, run the documented Docker/runtime path and dockerized `./run_tests.sh` when packaging changes, late bugfixes, or final confirmation needs make that necessary, then fix owner-side Docker/config/wrapper issues directly if needed
 - packaging does not close until runtime/test commands, parent-root docs, repo cleanup, session export, and final structure validation all agree with the delivered repo
 - do not invent extra reviewer artifacts beyond the required final structure
@@ -199,15 +199,42 @@ After those steps:
 - confirm workflow metadata marks `packaging_completed` as true
 - confirm no `submission/` directory or other obsolete packaging artifact structure remains
+## Docker environment cleanup
+After Docker confirmation and testing are complete, clean up project-specific Docker artifacts before closing packaging. This prevents leftover containers, volumes, and locally built images from accumulating on the host.
+Required cleanup (run from the repo directory containing `docker-compose.yml`):
+```bash
+docker compose down -v --rmi local
+```
+What this removes:
+- Containers created by this project's `docker-compose.yml`
+- Networks created by this project's `docker-compose.yml`
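+- Named volumes declared in this project's `docker-compose.yml` and anonymous volumes attached to its containers; volumes marked `external` are never removed
+- Images **built locally** by Compose for services that do not set a custom `image:` name; `--rmi local` keeps any image that has a custom tag, including a built service that also sets `image:`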
+What this preserves:
+- Base images pulled from registries (e.g., `postgres`, `golang`, `node`, `redis`, `alpine`)
+- Images referenced by an explicit `image:` name, such as images pulled from Docker Hub or other registries
+- Unrelated containers, volumes, networks, or images from other projects
+Rules:
+- run this cleanup after `docker compose up --build` and `./run_tests.sh` have been confirmed working
+- do not run broad Docker pruning commands (e.g., `docker system prune`, `docker volume prune`) that could affect unrelated projects
+- do not remove base images that were pulled rather than built
+- if the user explicitly asks to keep containers running for manual testing, skip this step and record that decision
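A minimal sketch of the cleanup step as a bash function. The name `p9_docker_cleanup` and the two switches are hypothetical: `KEEP_CONTAINERS=1` stands for the user's explicit request to keep containers running, and `DRY_RUN=1` prints the command instead of running it. The function only looks for the Compose file names Compose checks by default and never issues a prune command.

```bash
# Hypothetical helper: project-scoped Docker cleanup before closing P9.
# KEEP_CONTAINERS=1 -> skip cleanup (user asked to keep containers running).
# DRY_RUN=1         -> print the command instead of running it.
p9_docker_cleanup() {
  local dir="${1:-.}"
  if [ "${KEEP_CONTAINERS:-0}" = "1" ]; then
    echo "skip: KEEP_CONTAINERS=1, record this decision before closing P9"
    return 0
  fi
  # Require a Compose file in the repo directory before touching Docker.
  local f found=""
  for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
    if [ -f "$dir/$f" ]; then found="$f"; break; fi
  done
  if [ -z "$found" ]; then
    echo "error: no Compose file found in $dir" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker compose down -v --rmi local"
    return 0
  fi
  # Run from the repo directory so Compose resolves the same project name
  # that `docker compose up --build` used. No system/volume prune here.
  (cd "$dir" && docker compose down -v --rmi local)
}
```

For example, `DRY_RUN=1 p9_docker_cleanup repo` shows the command that would run against `repo/` without changing anything.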
 ## Final packaging verification
 - do one final package review before declaring packaging complete
 - confirm the package is coherent as a delivered project, not just a working repo snapshot
 - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
-- if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, treat `P9` as the first real Docker confirmation point and the first real dockerized broad-test run
+- if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, treat `P9` as the final Docker broad-confirmation point and the first real dockerized broad-test run; a prior `P8` app launch for `agent-browser` does not count as this packaging confirmation
 - when those commands fail because of Docker config, wrapper, or test-harness glue issues, fix them directly in the owner session and rerun before closing packaging
 - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
 - if packaging reveals a real defect or missing artifact, fix it before closing the phase
 - if packaging reveals parent-root doc drift or incomplete coverage/design/api reference docs, fix those docs before closing the phase
 - if packaging reveals that `plan.md` was not moved into parent-root `../docs/plan.md` when `P5` closed or that repo-local workflow rulebooks remain in `repo/`, treat that as missed earlier boundary cleanup and repair it before closing the phase
-- do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied
+- do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, Docker environment cleanup, and final structure checks are satisfied
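One way to check the Docker environment cleanup condition is to look for resources that still carry the `com.docker.compose.project` label Compose attaches to the containers, networks, and named volumes it creates. This is a sketch under stated assumptions: the function name `p9_verify_docker_clean` is hypothetical, the project name must be passed explicitly (by default Compose derives it from the project directory name), and the check does not inspect images or unlabeled anonymous volumes.

```bash
# Hypothetical post-cleanup check: succeeds only when no containers,
# volumes, or networks labeled for this Compose project remain.
p9_verify_docker_clean() {
  local project="${1:-}"
  if [ -z "$project" ]; then
    echo "usage: p9_verify_docker_clean <compose-project-name>" >&2
    return 2
  fi
  local filter="label=com.docker.compose.project=$project"
  local containers volumes networks
  # Fail loudly if the Docker CLI or daemon is unavailable instead of
  # reporting an empty (falsely clean) result.
  containers="$(docker ps -a -q --filter "$filter")" || return 2
  volumes="$(docker volume ls -q --filter "$filter")" || return 2
  networks="$(docker network ls -q --filter "$filter")" || return 2
  if [ -n "$containers$volumes$networks" ]; then
    echo "leftover Docker resources for project $project:" >&2
    echo "containers: ${containers:-none}" >&2
    echo "volumes: ${volumes:-none}" >&2
    echo "networks: ${networks:-none}" >&2
    return 1
  fi
  echo "clean: no containers, volumes, or networks labeled for $project"
}
```

A non-zero exit keeps packaging open: `1` means leftovers were found, `2` means the check could not run.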