theslopmachine 1.0.2 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/assets/agents/developer.md +38 -32
  2. package/assets/agents/slopmachine-claude.md +36 -25
  3. package/assets/agents/slopmachine.md +61 -45
  4. package/assets/claude/agents/developer.md +27 -10
  5. package/assets/skills/claude-worker-management/SKILL.md +4 -4
  6. package/assets/skills/developer-session-lifecycle/SKILL.md +13 -3
  7. package/assets/skills/development-guidance/SKILL.md +24 -5
  8. package/assets/skills/evaluation-triage/SKILL.md +4 -4
  9. package/assets/skills/final-evaluation-orchestration/SKILL.md +29 -3
  10. package/assets/skills/integrated-verification/SKILL.md +24 -23
  11. package/assets/skills/p8-readiness-reconciliation/SKILL.md +98 -0
  12. package/assets/skills/planning-gate/SKILL.md +2 -2
  13. package/assets/skills/planning-guidance/SKILL.md +7 -4
  14. package/assets/skills/scaffold-guidance/SKILL.md +2 -0
  15. package/assets/skills/submission-packaging/SKILL.md +30 -3
  16. package/assets/skills/verification-gates/SKILL.md +11 -7
  17. package/assets/slopmachine/clarification-faithfulness-review-prompt.md +69 -45
  18. package/assets/slopmachine/clarifier-agent-prompt.md +46 -40
  19. package/assets/slopmachine/exact-readme-template.md +38 -11
  20. package/assets/slopmachine/owner-verification-checklist.md +2 -2
  21. package/assets/slopmachine/phase-1-design-prompt.md +94 -17
  22. package/assets/slopmachine/phase-1-design-template.md +124 -21
  23. package/assets/slopmachine/phase-2-execution-planning-prompt.md +155 -87
  24. package/assets/slopmachine/phase-2-plan-template.md +169 -81
  25. package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +8 -1
  26. package/assets/slopmachine/scaffold-playbooks/tech-frontend-vue.md +2 -0
  27. package/assets/slopmachine/scaffold-playbooks/type-web-spa.md +1 -0
  28. package/assets/slopmachine/templates/AGENTS.md +18 -17
  29. package/assets/slopmachine/templates/CLAUDE.md +18 -17
  30. package/assets/slopmachine/templates/plan.md +115 -36
  31. package/package.json +9 -2
  32. package/src/constants.js +1 -0
  33. package/src/init.js +8 -0
  34. package/src/install.js +130 -0
  35. package/assets/slopmachine/utils/__pycache__/claude_live_hook.cpython-311.pyc +0 -0
  36. package/assets/slopmachine/utils/__pycache__/cleanup_delivery_artifacts.cpython-311.pyc +0 -0
  37. package/assets/slopmachine/utils/__pycache__/convert_ai_session.cpython-311.pyc +0 -0
  38. package/assets/slopmachine/utils/__pycache__/normalize_claude_session.cpython-311.pyc +0 -0
  39. package/assets/slopmachine/utils/__pycache__/strip_session_parent.cpython-311.pyc +0 -0
@@ -30,6 +30,8 @@ Use this skill before prompting the developer for the main implementation run.
30
30
  - before implementing lane-owned product code in the main checkout, either launch the planned helper lane or record the concrete blocker and revise sequencing; convenience serialization is a process failure
31
31
  - each module must verify every file it created or changed before reporting completion: the files must be real, relevant, integrated with their imports/routes/config/tests, free of placeholder/demo-only completion, and aligned with the module packet
32
32
  - each module must run all tests assigned to its owned module/files before reporting completion, plus the strongest relevant local checks for those files; if any assigned test cannot run, the module is incomplete unless it reports a concrete blocker for main-lane decision
33
+ - use the plan-row execution ledger as the P3 scoreboard: update each actionable plan row as work lands, and do not report clean development completion while any row is still `planned`, `in progress`, delegated without a receiving module, or unverified
34
+ - use safe parallelism to reduce elapsed time without reducing proof: independent module packets, test-coverage work, documentation reconciliation, and verification passes may run in helper worktrees when the plan marks them safe, but every helper result must be integrated, reread, and verified before its ledger rows can close
33
35
  - if a planned lane cannot be launched, record the exact skipped lane, blocker, and revised sequencing before falling back to serial work
34
36
  - use the rest of development to make the repo coherent enough for the owner-run local-harness gate in `P5` and the later owner-run Docker/runtime plus dockerized `./run_tests.sh` confirmation in `P9`
35
37
  - when the owner provides a bounded correction or final release-readiness checklist, treat it as a hard acceptance contract and respond against it explicitly
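The plan-row execution ledger rule added in the hunk above can be pictured as a simple gate. The following is a minimal sketch, not part of the package: the `LedgerRow` type, its field names, and the `delegated` and `complete` status strings are assumptions made for illustration; only `planned` and `in progress` are named by the diff itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Statuses the diff names explicitly as blocking a clean completion claim.
OPEN_STATUSES = {"planned", "in progress"}


@dataclass
class LedgerRow:
    """One actionable plan row (hypothetical shape, for illustration only)."""
    row_id: str
    status: str                          # assumed vocabulary: planned, in progress, delegated, complete
    receiving_module: Optional[str] = None
    verified: bool = False


def blocking_rows(rows: List[LedgerRow]) -> List[Tuple[str, str]]:
    """Return (row_id, reason) pairs that block a clean development-complete report."""
    blockers = []
    for row in rows:
        if row.status in OPEN_STATUSES:
            blockers.append((row.row_id, f"status is {row.status!r}"))
        elif row.status == "delegated":
            # Assumption: a delegated row is closed only when a receiving module is named.
            if not row.receiving_module:
                blockers.append((row.row_id, "delegated without a receiving module"))
        elif row.status == "complete" and not row.verified:
            blockers.append((row.row_id, "unverified"))
    return blockers
```

An empty result would correspond to a ledger that no longer blocks completion; any returned pair names the row and the reason it is still open.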
@@ -43,6 +45,14 @@ Use this skill before prompting the developer for the main implementation run.
43
45
  - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
44
46
  - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
45
47
  - when working inside a `plan.md` workstream, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims that workstream could affect before reporting readiness
48
+ - implement vertically, not breadth-first: build one complete user/operator flow end-to-end before starting the next
49
+ - for every form, implement template + route + handler + service + persistence + response + test together before moving to the next form
50
+ - for every page link, register and render the target page and prove it works for the intended role before claiming the source page complete
51
+ - for every background job, wire it from the entrypoint and verify it is reachable from startup before claiming it complete
52
+ - for every security control, enforce it at the correct layer (service, middleware, DB, template, runtime) before claiming it complete
53
+ - a feature is complete only when the intended actor can perform the task end-to-end through the real app path, or it is explicitly marked incomplete with a named residual risk
54
+ - do not call a module complete because files, routes, templates, or tests exist
55
+ - do not report completion counts (e.g., "12 modules done", "76 endpoints implemented", "104 tests passing") as completion evidence
46
56
  - implement real behavior, not partial scattered logic
47
57
  - do not count route registration, page shells, form shells, CRUD shells, placeholder handlers, or static demos as completion when the planned behavior is richer than that
48
58
  - do not count generated folders, package manifests, build wrappers, or smoke-only checks as feature completion when the prompt requires role workflows, real persistence, runtime policy behavior, or FE↔BE task closure
@@ -80,6 +90,9 @@ Use this skill before prompting the developer for the main implementation run.
80
90
  - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
81
91
  - when the project has database dependencies, keep `./init_db.sh` aligned with the real schema, migrations, bootstrap data, and dependency setup as implementation evolves
82
92
  - do not leave `./init_db.sh` as a scaffold placeholder once real database requirements are known
93
+ - when the app needs accounts or sample records to be useful quickly, provide deterministic idempotent seeded data through the normal bootstrap/database/runtime path and document the exact values in `README.md`
94
+ - if no seeded data is needed, make the README say exactly `No seeded data required; the app is useful from an empty state.` and make sure that claim is true
95
+ - seeded data may be local demo/test fixture data, but it must not replace real persistence, validation, authorization, side effects, or task completion with static fake-success paths
83
96
  - do not hardcode database connection values or database bootstrap values anywhere in the repo; database setup must stay driven by `./init_db.sh`
84
97
  - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
85
98
  - do not treat a module as complete just because it renders, routes, stores data, or returns 200s if the business rules, failure handling, or operator-facing closure expected by `plan.md` are still missing
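The "deterministic idempotent seeded data" requirement added in the hunk above can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: the `users` table, the fixture values, and the use of an in-memory SQLite database. In a real project the seed would run through the normal bootstrap path (for example `./init_db.sh`) rather than a hardcoded script like this one.

```python
import sqlite3

# Hypothetical fixture rows; a real project would document its own values in README.md.
SEED_USERS = [
    ("demo-admin", "admin"),
    ("demo-viewer", "viewer"),
]


def seed(conn: sqlite3.Connection) -> None:
    """Create the table if needed and insert the fixed seed rows at most once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, role TEXT NOT NULL)"
    )
    # INSERT OR IGNORE skips rows whose primary key already exists,
    # so running the seed twice leaves the same state as running it once.
    conn.executemany(
        "INSERT OR IGNORE INTO users (username, role) VALUES (?, ?)",
        SEED_USERS,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
seed(conn)
seed(conn)  # second run changes nothing
rows = conn.execute(
    "SELECT username, role FROM users ORDER BY username"
).fetchall()
```

The seed is deterministic because the values are fixed, and idempotent because repeated runs cannot create duplicates or alter existing rows.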
@@ -93,6 +106,7 @@ Use this skill before prompting the developer for the main implementation run.
93
106
  - explain behavior changes clearly enough that the README and owner-maintained external documentation can be kept accurate
94
107
  - update `README.md` when runtime, build/preview, configuration, routes, tests, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
95
108
  - keep `README.md` aligned with the strict audit contract as the implementation matures: project type near the top, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
109
+ - keep `README.md` aligned with the quick-start seeded data contract: seeded accounts/sample records/IDs/URLs and main-flow steps, or the exact empty-state statement
96
110
  - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance for the strict README audit
97
111
  - for Android, iOS, and desktop work, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections the strict README audit expects
98
112
  - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
@@ -102,12 +116,15 @@ Use this skill before prompting the developer for the main implementation run.
102
116
  - before reporting development complete, do not leave obvious repo-coherence, local-harness, startup, or Docker wiring issues for `P5` or `P9`; `P5` should only need a rough correctness pass over the prepared local harness before evaluation
103
117
  - before reporting development complete, run one deliberate main-lane pre-`P5` reread against the original prompt plus accepted requirements-and-clarification package, accepted `plan.md`, `../docs/design.md`, accepted `../docs/api-spec.md` when applicable, `README.md`, and the integrated repo state so the owner is not first discovering obvious contract drift in `P5`
104
118
  - before reporting development complete, the main lane must perform module-by-module fan-in verification: reread every planned module row, inspect the files delivered for that module, confirm the files are real and integrated, run the module's assigned tests after merge, and record the result in the development-exit module verification matrix in `plan.md`
105
- - before reporting development complete, run the full non-Docker local test suite available for development in addition to module-targeted tests; Docker and dockerized `./run_tests.sh` remain deferred to `P9`
119
+ - before reporting development complete, run the full non-Docker local test suite available for development in addition to module-targeted tests; for web/fullstack/frontend-bearing projects, also run the planned local E2E or platform-equivalent checks when the accepted plan requires them; Docker and dockerized `./run_tests.sh` remain deferred to `P9`
120
+ - before reporting development complete, close the plan-row execution ledger and coverage closure ledger: every actionable row must be complete/not-applicable/risk-accepted, API true no-mock HTTP coverage must be 100% for documented prompt-relevant endpoints unless per-endpoint exceptions are recorded, unit-testable product-code coverage must be at least 90% where measurable, and planned E2E/platform-critical flow coverage must be at least 90% closed or explicitly risk-accepted
106
121
  - before reporting development complete, fill or update the planned-but-missing proof ledger for core semantic path, prompt-critical rules, role surface matrix, runtime lifecycle behavior, security fail-closed expectations, README command honesty, and behavioral coverage proof
107
122
  - before reporting development complete, prove the core semantic path with the exact input/setup, user/API path, expected state/artifact, and failure behavior named in `plan.md`, or report the exact residual risk rather than claiming readiness
108
123
  - before reporting development complete, for lifecycle-sensitive behavior, include entrypoint-level proof that the scheduler/worker/timed/export/import/polling/startup/cleanup path is wired and mutates state or artifacts as planned
109
124
  - before reporting development complete, close the common `P5` failure classes inside development rather than leaving them for owner rediscovery: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup or wrapper dishonesty, and partial user-facing or admin-facing flow closure
110
125
  - before reporting development complete, self-check the integrated repo against the release-readiness requirements already absorbed into `plan.md` `Delivery Review Requirements`; do not leave prompt-fit, static-reviewability, logging/validation, security-boundary, or coverage-structure defects unresolved
126
+ - before reporting development complete, self-check static architecture credibility: README/docs/scripts/routes/config/examples/manifests/env examples agree, pages/routes/app shell are connected, state/data flow is traceable, service/adaptor/mock/storage boundaries are clear, redundant/unnecessary files are removed or justified, and core logic is not excessively piled into one file
127
+ - for pure frontend `web` projects with no backend service, local/mock/sample data is acceptable only when README/UI boundaries are honest; do not imply real backend integration, and do not use fake-success paths to hide missing validation, failure, or state handling
111
128
  - before reporting development complete, self-check the no-orphan requirement ledger and every module requirement closure checklist; no accepted requirement, clarification, API route, actor path, data object, security boundary, report/export/notification, or documentation obligation may remain unchecked, vaguely delegated, or proven only by broad smoke coverage
112
129
  - before reporting development complete, verify that assertion-level module proof rows have corresponding implementation and test evidence; if a row says `Given/When/Then`, the delivered proof must exercise that condition and observable outcome, not only route registration, rendering, or a mocked success response
113
130
  - before reporting development complete, when backend/fullstack APIs exist, make sure endpoint inventory, `METHOD + PATH` mapping, and true no-mock HTTP coverage expectations in `../docs/test-coverage.md` and the repo are genuinely aligned rather than only implied
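The coverage floors introduced in the hunk above (100% true no-mock HTTP coverage unless per-endpoint exceptions are recorded, at least 90% unit coverage where measurable, at least 90% of E2E/platform-critical flow rows closed or risk-accepted) can be expressed as a small check. This is a sketch under stated assumptions: the `CoverageLedger` type and its field names are hypothetical, and `unit_coverage_pct=None` is used here to mean "not measurable".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoverageLedger:
    """Hypothetical summary of the coverage closure ledger."""
    api_endpoints: int                   # documented prompt-relevant endpoints
    api_no_mock_covered: int             # endpoints with true no-mock HTTP tests
    api_exceptions: int                  # endpoints with a recorded per-endpoint exception
    unit_coverage_pct: Optional[float]   # None when coverage is not measurable
    e2e_rows: int                        # planned E2E/platform-critical flow rows
    e2e_closed: int                      # rows closed by proof
    e2e_risk_accepted: bool = False      # explicit risk acceptance for the E2E gap


def coverage_gaps(ledger: CoverageLedger) -> List[str]:
    """Return the names of coverage floors that are not yet met."""
    gaps = []
    if ledger.api_no_mock_covered + ledger.api_exceptions < ledger.api_endpoints:
        gaps.append("api")
    if ledger.unit_coverage_pct is not None and ledger.unit_coverage_pct < 90.0:
        gaps.append("unit")
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    e2e_below_floor = ledger.e2e_closed * 100 < ledger.e2e_rows * 90
    if e2e_below_floor and not ledger.e2e_risk_accepted:
        gaps.append("e2e")
    return gaps
```

An empty list would mean the three floors, as modeled here, no longer block a clean completion claim.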
@@ -141,7 +158,7 @@ Use this skill before prompting the developer for the main implementation run.
141
158
  - even though Docker execution and dockerized `./run_tests.sh` are deferred until the owner-run confirmation in `P9`, build that Docker/runtime path as if it will be exercised by a cold reviewer on the first try: no hidden setup, no manual export steps, no interactive prompts, real readiness gating where practical, deterministic cleanup, and useful failure output
142
159
  - do not add runtime/test scripts, Compose services, or Docker entrypoints that assume hidden global setup for the final delivered path; keep both the local harness and the Docker/runtime path explicit and repo-controlled before the current `plan.md` workstream is considered complete
143
160
  - do not run Docker or dockerized `./run_tests.sh` during ordinary implementation work; use targeted local tests during iteration and run the prepared local harness before material readiness claims, while `P7` still remains non-Docker
144
- - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary implementation work
161
+ - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests during ordinary module iteration; when all modules are complete and the plan includes Playwright/browser E2E or another local platform-equivalent check, run that planned local E2E/platform check before reporting P3 development complete
145
162
  - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
146
163
  - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary implementation work rather than broad checkpoint commands
147
164
  - when the current workstream materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
@@ -150,17 +167,19 @@ Use this skill before prompting the developer for the main implementation run.
150
167
  - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
151
168
  - when backend logging matters, keep request or route outcomes, exceptions, and background failure logging on the shared structured logging path with redaction intact
152
169
  - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
153
- - keep the test surface moving toward the planned confident roughly `90%` overall coverage goal with real tests where they matter, and do not defer obvious coverage debt just because evaluation is later
154
- - for backend or fullstack APIs, keep the work moving toward the `plan.md` target of 100 percent true no-mock HTTP coverage for the resolved prompt-relevant `METHOD + PATH` surfaces rather than leaving endpoint coverage as optional follow-up
170
+ - keep the test surface moving toward the plan's hard coverage floors with real tests where they matter; do not defer obvious coverage debt just because evaluation is later
171
+ - for backend or fullstack APIs, treat the `plan.md` target of 100 percent true no-mock HTTP coverage for resolved prompt-relevant `METHOD + PATH` surfaces as blocking for clean development completion unless narrow endpoint-level exceptions are recorded with compensating proof
155
172
  - for backend or fullstack APIs, keep `../docs/test-coverage.md` moving toward an endpoint inventory plus API test mapping table, not just a generic risk matrix
156
173
  - in each development follow-up or completion reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
157
174
  - when the owner names specific expected outcomes for the current workstream or gate, tie the reported verification and changed files back to those expected outcomes explicitly
158
- - before reporting overall development complete, run the prepared local test harness and report the exact command plus concrete result; this should normally be the current stack's real local suite rather than an invented placeholder wrapper
175
+ - before reporting overall development complete, run the prepared local test harness and report the exact command plus concrete result; this should normally be the current stack's real local suite rather than an invented placeholder wrapper; also run the planned local E2E/platform-equivalent command when the accepted plan includes one, and report exact command plus result
159
176
  - before reporting overall development complete, report the module-by-module main-lane verification results, including files reviewed, tests run, FE↔BE/API wiring status, and any integration defects fixed during fan-in
160
177
  - before reporting development complete, compare planned optional helper branches to actually used helper branches; report launched transcript/session references, branch/worktree paths, commits, verification, skipped optional branches, and the reason for any skipped helper branch
161
178
  - before reporting development complete, require the main lane to consume every module completion packet: inspect ownership, reject incomplete packets, wire shared surfaces, rerun targeted module tests, and record integrated verification evidence before moving to the next module
162
179
  - before reporting development complete, use the P3 Development Completion Report shape: module packet closure, optional helper branches used, shared integration work, full-suite/unit/integration/E2E command results, cross-module integration proof, and remaining risks
180
+ - before reporting development complete, include Plan Section Closure Evidence: cite each major accepted `plan.md` section or matrix row that is claimed complete, the concrete repo evidence, the test or verification result, and any residual risk or blocker
163
181
  - before reporting development complete, include a no-orphan closure summary in the P3 Development Completion Report: total requirement rows, closed rows, delegated rows with receiving module, risk-accepted rows, and any rows not closed; any non-zero unclosed row count must block a clean completion claim
182
+ - before reporting development complete, include coverage closure in the P3 Development Completion Report: API endpoint count and true no-mock HTTP count, unit coverage percentage or substitute ledger result, E2E/platform-critical flow row count and closed count, exceptions, and exact commands/results
164
183
  - in the development-complete reply, explicitly report the core semantic path proof, prompt-critical rule proof status, role surface proof status if applicable, lifecycle proof status if applicable, and any accepted or unresolved residual risks
165
184
  - in the development-complete reply for fullstack or backend-backed frontend projects, explicitly report FE↔BE integration proof status, including any frontend surface not backed by real backend behavior and any backend feature not exposed through required frontend UI
166
185
  - when optional helper branches were part of the planned work, report which helpers actually launched, which were skipped, and the exact reason or revised sequencing for any skipped helper
@@ -40,7 +40,7 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
40
40
  - use its exact issue list extracted from the saved kept report file as the scope of that exact audit session's `bugfix-N` lane
41
41
  - save the report as `../.tmp/audit_report-<N>.md`
42
42
  - once that report is kept, treat its exact full issue list as the authoritative fix-check scope for the rest of that audit session; later remediation may narrow to the unresolved subset from that kept scope
43
- - send the full kept-report issue set to the developer in direct human review language, with explicit owner analysis of the failing surfaces and the expected fixes
43
+ - translate the full kept-report issue set into direct owner instructions before messaging the developer: state what is broken, why it matters, affected files/surfaces, expected fixes, and verification to run
44
44
 
45
45
  ### `pass`
46
46
 
@@ -50,13 +50,13 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
50
50
 
51
51
  ## Issue handoff standard
52
52
 
53
- - send the developer the exact full scoped issues from the current report in direct human review language and explicit, detailed corrective form
54
- - do not tell the developer to read the audit report directly
53
+ - send the developer only the owner-written corrective brief, in direct human language, with explicit expected behavior and verification
54
+ - do not tell the developer to read the audit, evaluation, workflow, phase, lane, or report artifact directly
55
55
  - phrase the request as your own review, for example `fix these issues I found`, rather than attributing the list to a workflow event or report file
56
56
  - require the developer to address every currently unresolved item from the kept-report scope on later loop passes until the fix-check confirms the whole kept report scope is fixed
57
57
  - require the developer to report the exact verification commands that were run and the concrete results they produced
58
58
  - require the developer to provide an AI self-test report or concise self-test summary that can be attached or mentioned in the evaluator follow-up
59
- - if the developer claims an issue is invalid or already fixed, require a concrete justification against the audit output instead of silently omitting it
59
+ - if the developer claims an issue is invalid or already fixed, require a concrete justification against your owner-written issue summary and the underlying evidence instead of silently omitting it
60
60
  - do not reduce the handoff to a small issue subset or a thin summary; the developer-facing prompt should contain the full issue set for the current scope
61
61
  - do not reduce the handoff to a small issue subset, top issue cluster list, or thin summary; the developer-facing prompt should contain the full issue set extracted from the saved report file for the current scope
62
62
  - for every issue, analyze and state as clearly as possible:
@@ -94,10 +94,22 @@ Reject and archive the report if shape validation fails. A fix-check report is t
94
94
  - session 2 for audit session `2`
95
95
  - session 3 for the final coverage/README audit
96
96
 
97
+ ## Evaluator session hygiene
98
+
99
+ The same evaluator session is reused for all reruns within an audit session. This is intentional and required. The fix for contamination is not switching sessions; it is stronger rerun instructions that prevent the evaluator from referencing prior runs.
100
+
101
+ Rules:
102
+ - keep using the same evaluator session for all reruns, fail regenerations, and fix-checks within an audit session
103
+ - do not start a fresh evaluator session for rerun contamination; re-send with stronger anti-contamination instructions instead
104
+ - after receiving a rerun report, reject it if it contains any prior-run framing such as `previously`, `remaining`, `still remaining`, `fixed from the prior run`, `rerun`, `regenerated`, `again`, `previous inspection`, or similar
105
+ - archive contaminated reports under `../.ai/archive/` with a `stale-reference-contamination` reason
106
+ - the standalone-audit expectation applies to all ordinary audits and coverage/README reruns; fix-check loops are the only narrow exception
107
+
97
108
  ## Report root and naming
98
109
 
99
110
  - all `P7` audit and fix-check reports live under parent-root `../.tmp/`
100
111
  - when a report must be discarded or replaced during `P7`, move it out of `../.tmp/` into parent-root `../.ai/archive/` with a unique name instead of deleting it; archived copies are trace artifacts only and are not part of the submission contract
112
+ - maintain a lightweight parent-root `../.ai/p7-issue-lineage.md` register during `P7`; each kept report issue or recommendation that drives remediation should record audit number, source report path, severity/verdict context, origin classification, remediation lane/turn, closure evidence, and any reusable workflow lesson
101
113
  - do not use the older cycle-directory report-root model
102
114
  - audit session `1` owns:
103
115
  - `../.tmp/audit_report-1.md`
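The rejection rule in the "Evaluator session hygiene" section above (reject a rerun report that contains prior-run framing) could be approximated with a phrase scan. This is a minimal sketch, not the package's implementation: the function name is hypothetical, and the phrase list copies only the examples quoted in the diff, which itself says "or similar".

```python
import re
from typing import List

# Phrases quoted in the hygiene rules; the real check may cover more.
PRIOR_RUN_PHRASES = [
    "previously",
    "remaining",
    "still remaining",
    "fixed from the prior run",
    "rerun",
    "regenerated",
    "again",
    "previous inspection",
]

# Longest phrases first so "still remaining" wins over "remaining".
_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(p) for p in sorted(PRIOR_RUN_PHRASES, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def find_prior_run_framing(report_text: str) -> List[str]:
    """Return prior-run phrases found in the report, in order of appearance."""
    return [m.group(1).lower() for m in _PATTERN.finditer(report_text)]
```

A plain phrase scan will also flag innocent uses such as "remaining budget", so a hit is better treated as a prompt for owner review than as an automatic verdict.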
@@ -145,7 +157,21 @@ When rerunning an audit inside the same evaluator session after a remediation pa
145
157
  - let that utility append this exact short footer beneath the full prompt copy for the rerun:
146
158
 
147
159
  ```text
148
- Check the entire current project again against the full prompt above and regenerate the complete report from scratch. Save the report to the same required path. Make the report read as a fresh standalone audit of the current repo state and do not mention or imply any previous run, rerun, regeneration, prior inspection, earlier fixes, or earlier remaining items.
160
+ You are performing a fresh standalone audit of the current project state. This is not a rerun, regeneration, or follow-up to any prior audit. You have no memory of any previous inspection of this project.
161
+
162
+ Rules:
163
+ - Audit the current repo state as if for the first time
164
+ - Do not mention, imply, or reference any previous run, prior inspection, earlier findings, fixes made, items remaining from before, regeneration, or rerun
165
+ - Do not use framing such as "previously", "remaining", "still", "fixed from the prior run", "again", "as before", "continues to", or similar prior-run language
166
+ - Do not compare current state to any imagined prior state
167
+ - Generate the complete report from scratch based only on the prompt above and the current repo files
168
+ - Save the report to a new file; do not overwrite or append to any existing audit file
169
+ - Report only the current verdict and only issues or recommendations that exist in the current repo state
170
+ - If an issue no longer exists, do not mention it at all; do not state that it was fixed or resolved
171
+ - If a new issue exists, report it as a fresh finding with no reference to prior state
172
+ - Use only present-tense, current-state language; avoid temporal markers that suggest sequence or history
173
+
174
+ Violation of these rules makes the report invalid and it will be rejected.
149
175
  ```
150
176
 
151
177
  - use this footer for same-session fail regenerations and for same-session final coverage/README reruns after fixes land
@@ -215,7 +241,7 @@ Inside a kept audit session after `audit_report-<N>.md` exists:
215
241
  - treat the exact issue list extracted from the saved `audit_report-<N>.md` file as the scope of the loop for `partial pass`
216
242
  - treat every reported issue and recommendation found in the saved kept report file as the scope of the loop for `pass`
217
243
  - send that full scoped issue set to `bugfix-<N>` in direct human review language with owner analysis of the exact surfaces and expected fixes
218
- - do not tell the developer to read the audit report file directly
244
+ - do not tell the developer to read audit, evaluation, workflow, phase, lane, or report artifacts directly; translate findings into an owner-written corrective brief with concrete engineering instructions
219
245
  - phrase the fix request as your own review, for example `fix these issues I found`, rather than as a report handoff
220
246
  - do not ask the evaluator for `top issues`, `major issue clusters`, `summary issues`, or any other reduced remediation scope when the full report file already exists; the owner must read the whole file and extract the whole issue set instead
221
247
  - require the developer to fix the scoped issue set and report:
@@ -291,7 +317,7 @@ If a report appears stale, contradicted by current files, or degraded:
291
317
  - audit session `2` is complete
292
318
  - the post-audit coverage/README audit has run as the last subphase of `P7`
293
319
  - a clean `pass` or `partial pass` verdict alone does not end an audit session; the corresponding `bugfix-<N>` issue/recommendation loop must also close unless the kept `pass` report had no reported items at all
- - after the second audit session completes, run the coverage/README audit; only when the fresh report is a full standalone pass-level report and the owner has extracted no remaining issue/recommendation set from the saved file may the workflow move to `P8 Final Readiness Decision`; `P8` then performs one fast reconciliation sweep across the repo, parent-root docs, and carried audit artifacts before packaging begins
+ - after the second audit session completes, run the coverage/README audit; only when the fresh report is a full standalone pass-level report and the owner has extracted no remaining issue/recommendation set from the saved file may the workflow move to `P8 Final Readiness Decision`; `P8` then loads `p8-readiness-reconciliation` and follows that skill before packaging begins
  - until that exit target is actually met, never stop merely because one audit attempt, one remediation turn, or one fix-check loop pass has finished
 
  ## Boundaries
@@ -15,11 +15,12 @@ It is a minimal local-verification-and-coherence gate, not a perfection gate.
 
  It starts immediately after accepted P3 architecture execution ends. There is no separate post-development gate before this phase.
 
- During `P5`, the working execution plan is still the repo-local `plan.md`. Before surfacing the proceed-to-evaluation pause, preserve its final truthful contents in parent-root `../docs/plan.md` and remove the repo-local copy, because the execution plan has become reference documentation at that point.
+ During `P5`, the working execution plan is still the repo-local `plan.md`. Before entering `P7`, preserve its final truthful contents in parent-root `../docs/plan.md` and remove the repo-local copy, because the execution plan has become reference documentation at that point.
 
  The default goal is to get to an evaluation-ready checkpoint quickly.
- Run the prepared local test harness here as the owner-side integration gate, run the required internal evaluation loop below, fix only narrow owner-side docs, README, config, wrapper, or light script glue directly, reroute any real code or actual test-file work back to the developer, and stop for a user proceed-to-evaluation check as soon as:
+ Run the prepared local test harness here as the owner-side integration gate, run the planned local E2E/platform-equivalent checks when the accepted plan requires them, run the required internal evaluation loop below, fix only narrow owner-side docs, README, config, wrapper, or light script glue directly, route any real code or actual test-file work to the active P5 bugfix lane, and enter `P7` as soon as:
  - the local test harness is green
+ - the planned local E2E/platform-equivalent checks are green when the accepted plan requires them
  - the repo roughly aligns with `plan.md` and accepted `../docs/design.md`
  - the developer-side development-exit module verification matrix in `plan.md` is filled for every prompt-relevant module and shows main-lane file integration plus assigned module tests completed
  - the no-orphan requirement ledger and module requirement closure checklists in `plan.md` have no unchecked, vaguely delegated, or generic-smoke-proven requirement rows
@@ -47,8 +48,8 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
  - keep that pass focused on the minimal `P5` gate: green local verification and rough correctness/coherence against `plan.md` plus accepted `../docs/design.md`
  - include a planned-but-missing challenge table in the `P5` evidence: `planned claim`, `proof`, `gap`, `decision`
  - explicitly inspect the `Core Semantic Path Proof`, `Prompt-Critical Rule Matrix`, `Role Surface Matrix`, and `Runtime Lifecycle Checklist` sections of `plan.md` when they apply
- - explicitly inspect the development-exit module verification matrix in `plan.md`; if it is missing, stale, or does not show that every module's files were inspected and assigned tests were run after fan-in, route the gap back before completing `P5`
- - explicitly inspect the no-orphan requirement ledger and module requirement closure checklists; if any accepted requirement, API route, actor path, data object, security boundary, or report/export/notification path is unmapped, vaguely mapped, or marked complete without assertion-level proof, route the gap back before completing `P5`
+ - explicitly inspect the development-exit module verification matrix in `plan.md`; if it is missing, stale, or does not show that every module's files were inspected and assigned tests were run after fan-in, route the gap to the active P5 bugfix lane before completing `P5`
+ - explicitly inspect the no-orphan requirement ledger and module requirement closure checklists; if any accepted requirement, API route, actor path, data object, security boundary, or report/export/notification path is unmapped, vaguely mapped, or marked complete without assertion-level proof, route the gap to the active P5 bugfix lane before completing `P5`
  - explicitly inspect a sample of assertion-level proof rows across modules, including at least the core semantic path, one security/authorization row when applicable, one failure/edge row when applicable, and one FE↔BE/API/data row when applicable; if these rows do not correspond to real test or verification evidence, reject the development completion claim
  - do not accept shallow status/enqueue/helper-only evidence for the core semantic path when the plan requires a state transition, persisted artifact, frontend-to-backend behavior, export/render output, or failure-mode proof
  - for lifecycle-sensitive apps, require at least one entrypoint-level proof of scheduler/worker/timed/export/import/polling/startup/cleanup behavior rather than only unit tests of helpers
@@ -57,7 +58,7 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
  - for coverage-sensitive apps, separate route/API/surface inventory completeness from behavioral proof sufficiency rather than using percentages or route counts alone
  - treat `plan.md` plus accepted `../docs/design.md` as the primary owner comparison baseline in `P5`; use the raw prompt only to catch obvious major drift rather than to run a fresh literal requirement nitpick pass here
  - do not let `P5` compensate for planning gaps by manually reinterpreting requirements; if a prompt-required behavior is missing from the accepted no-orphan ledger or module packets, classify it as a planning/clarification miss, route the repo fix normally, and capture the workflow-learning classification
- - turn the collected findings into one complete analyzed correction brief after the full review sweep, write it in direct human review language, and only then route fixes back when a real coding reroute is truly needed
+ - turn the collected findings into one complete analyzed correction brief after the full review sweep, write it in direct human review language, and only then send fixes to the P5 bugfix lane when real coding work is truly needed
  - every owner correction brief in `P5` must include, for each issue or grouped issue set:
  - what is wrong
  - why it matters
@@ -72,25 +73,25 @@ Do not burn time on anything beyond that minimal gate unless a failure directly
  - what verification should prove it is fixed
  - do not classify issues by where they surfaced; classify them by the earliest artifact that should have made the correct behavior unavoidable
  - if `plan.md` or accepted `../docs/design.md` already names the expected behavior, treat the fix as a developer execution problem rather than reopening clarification or planning
- - if a prompt-required behavior is absent or materially vague in the accepted plan/design, capture it as a planning or clarification weakness while still routing the repo fix normally
+ - if a prompt-required behavior is absent or materially vague in the accepted plan/design, capture it as a planning or clarification weakness while still routing the repo fix to the active P5 bugfix lane when code or tests must change
  - when a failure class is known, isolate the likely affected module, route family, helper family, or shared flow and direct narrow proof first
  - fix small owner-side issues directly only when they are low-risk and clearly not core product implementation, such as documentation cleanup, design or `plan.md` tightening, `README.md` sync, deferred Docker configuration, wrapper/config glue, local-harness glue, or light `./run_tests.sh` script cleanup
- - if the needed fix touches real product code, application logic, or actual test files or suites, route it back to the developer lane instead of patching it in-owner
- - route real coding fixes back to the developer lane for runtime failures, broken tests, or material `plan.md` coherence gaps that the owner should not patch directly
+ - if the needed fix touches real product code, application logic, or actual test files or suites, send it to the active P5 bugfix lane instead of patching it in-owner or reopening `develop-*`
+ - route real coding fixes to the P5 bugfix lane for runtime failures, broken tests, or material `plan.md` coherence gaps that the owner should not patch directly
  - start by comparing `plan.md` against the actual repo state and the developer's completion claim; missing or weak plan obligations are first-class `P5` work, not a reason to insert another pre-`P5` gate
  - require the developer to report the exact rerun commands and concrete results for any requested fixes
  - verify behavior against `plan.md`, accepted `../docs/design.md`, the documented local verification path, and rough repo coherence
  - verify that the delivered local test harness matches the documented development/review path in `README.md` or the accepted plan, and use that path here in `P5`
  - check for obvious correctness failures against the accepted design and plan, such as shell/demo/placeholder delivery where the plan required fuller closure, broken role/flow wiring, or major missing behavior that makes the repo clearly not ready for evaluation
- - run the prepared local test harness as the first real integrated gate for this phase
+ - run the prepared local test harness as the first real integrated gate for this phase; when the accepted plan includes local E2E/platform-equivalent checks, run those before completing `P5` rather than deferring first execution to evaluation
  - if that path fails because of wrapper, README, docs, or local-harness glue issues, fix them directly in the owner session and rerun quickly
- - if making the gate pass would require editing actual test files, test suites, or product code, send that work back to the developer lane instead of patching it in-owner
- - if those commands expose failures that should not be owner-patched, include them in the same single complete analyzed correction brief and route that brief back to the developer lane
- - if the local test/coherence gate is green, run the internal `P5` evaluation loop before asking whether to proceed to formal evaluation
- - if the local test harness is green, the repo state roughly aligns with `plan.md` plus accepted `../docs/design.md` without major correctness/runtime breakage, and the internal `P5` evaluation loop has no unresolved Blocker/High findings, that is sufficient to complete `P5` and ask whether to proceed to formal evaluation
- - this sufficiency rule does not override no-orphan evidence: if accepted requirement rows or module closure checklist rows are visibly unchecked, vaguely delegated, or proven only by shell/smoke evidence, `P5` must route back rather than proceed
- - if the core semantic path remains unproven, stop and route proof/fix work back unless the user explicitly accepts the named residual risk
- - before asking whether to proceed, move the final truthful repo-local `plan.md` into parent-root `../docs/plan.md`, remove the repo-local copy, and then record in Beads that `P5` evidence is satisfied and that the workflow is waiting at the evaluation boundary; do not mutate into `P7` until the user explicitly says to continue
+ - if making the gate pass would require editing actual test files, test suites, or product code, send that work to the active P5 bugfix lane instead of patching it in-owner
+ - if those commands expose failures that should not be owner-patched, include them in the same single complete analyzed correction brief and route that brief to the P5 bugfix lane
+ - if the local test/coherence gate is green, and any planned local E2E/platform-equivalent checks are green when required, run the internal `P5` evaluation loop before entering formal evaluation
+ - if the local test harness is green, any planned local E2E/platform-equivalent checks are green when required, the repo state roughly aligns with `plan.md` plus accepted `../docs/design.md` without major correctness/runtime breakage, and the internal `P5` evaluation loop has no unresolved Blocker/High findings, that is sufficient to complete `P5` and enter `P7`
+ - this sufficiency rule does not override no-orphan evidence: if accepted requirement rows or module closure checklist rows are visibly unchecked, vaguely delegated, or proven only by shell/smoke evidence, `P5` must route fixes to the active bugfix lane rather than proceed
+ - if the core semantic path remains unproven, stop and route proof/fix work to the active bugfix lane unless the user explicitly accepts the named residual risk
+ - before entering `P7`, move the final truthful repo-local `plan.md` into parent-root `../docs/plan.md`, remove the repo-local copy, and then record in Beads that `P5` evidence is satisfied
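
A minimal sketch, assuming the `P5` sufficiency rule above is read as a plain conjunction of checks, of how the exit decision could be expressed. The `P5Evidence` record and its field names are hypothetical and do not appear in the package; the checks themselves are the ones the bullets name.

```python
from dataclasses import dataclass


@dataclass
class P5Evidence:
    """Hypothetical record of P5 evidence; field names are illustrative only."""
    harness_green: bool
    e2e_required: bool
    e2e_green: bool
    repo_roughly_aligned: bool
    unresolved_blocker_high: int
    open_no_orphan_rows: int
    core_path_proven: bool
    core_path_risk_accepted: bool = False


def p5_exit_blockers(ev: P5Evidence) -> list[str]:
    """Return the reasons P5 cannot yet enter P7 (an empty list means proceed)."""
    blockers = []
    if not ev.harness_green:
        blockers.append("local test harness is not green")
    # E2E/platform-equivalent checks only count when the accepted plan requires them.
    if ev.e2e_required and not ev.e2e_green:
        blockers.append("planned local E2E/platform-equivalent checks are not green")
    if not ev.repo_roughly_aligned:
        blockers.append("repo does not roughly align with plan.md and design.md")
    if ev.unresolved_blocker_high > 0:
        blockers.append("internal P5 evaluation loop has unresolved Blocker/High findings")
    # The sufficiency rule does not override no-orphan evidence.
    if ev.open_no_orphan_rows > 0:
        blockers.append("no-orphan ledger or closure checklist rows are still open")
    if not ev.core_path_proven and not ev.core_path_risk_accepted:
        blockers.append("core semantic path is unproven and the risk is not accepted")
    return blockers
```

Returning the list of reasons rather than a single boolean keeps the output usable as the skeleton of a correction brief for the bugfix lane.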
 
  ## Required Internal P5 Evaluation Loop
 
@@ -120,9 +121,9 @@ After round 5:
  - likely fix location and implementation guidance
  - potential regressions to check
  - whether it is `plan-required implementation miss`, `planning miss`, `clarification miss`, `planning mechanics miss`, or `owner review miss`
- - write the developer remediation prompt to `../.ai/p5-evaluation/developer-remediation-brief.md`
- - send the full remediation brief to the active developer lane
- - require the developer to fix all non-risk-accepted Blocker/High findings and report exact verification commands/results
+ - write the bugfix remediation prompt to `../.ai/p5-evaluation/bugfix-remediation-brief.md`
+ - open or reuse the active P5 bugfix lane, normally `bugfix-1`, and send the full remediation brief there
+ - require the bugfix lane to fix all non-risk-accepted Blocker/High findings and report exact verification commands/results
  - verify the fixes against the consolidated findings before closing `P5`
 
  Do not run the final `P7` two-audit-session process inside this loop.
@@ -132,14 +133,14 @@ Do not count `P5` internal evaluator reports as formal acceptance artifacts.
  ## Rules
 
  - keep this integrated verification and hardening step practical, fast, and release-oriented rather than perfectionist
- - use the opening part of this phase to compare the repo against what `plan.md` and the accepted design claim, check that those claims are broadly true enough, then run the required internal evaluation loop before the proceed-to-evaluation check
+ - use the opening part of this phase to compare the repo against what `plan.md` and the accepted design claim, check that those claims are broadly true enough, then run the required internal evaluation loop before entering `P7`
  - do not turn this phase into one-issue-at-a-time churn; do not remediate between internal evaluator rounds; after the owner broad pass and five-round issue-discovery loop, either proceed to the evaluation boundary, fix the small owner-side churn directly, or send one complete analyzed correction prompt listing all issues found if the repo is not yet evaluation-ready
  - do not create a lingering development-completion to `P5` mini-loop over anything that does not directly block green local verification or rough correctness/coherence against `plan.md` and accepted design
- - do not turn `P5` into an open-ended churn loop; run the owner-side local test harness, run exactly five internal evaluator issue-discovery rounds in one session without inter-round remediation, fix only narrow owner-side glue quickly after consolidation, reroute any real code or actual test-file work once, verify the consolidated Blocker/High fixes, and then stop for the proceed-to-evaluation check
+ - do not turn `P5` into an open-ended churn loop; run the owner-side local test harness, run planned local E2E/platform-equivalent checks when required, run exactly five internal evaluator issue-discovery rounds in one session without inter-round remediation, fix only narrow owner-side glue quickly after consolidation, route any real code or actual test-file work once to the active P5 bugfix lane, verify the consolidated Blocker/High fixes, and then enter `P7`
  - every owner pass in `P5` should be a full design/API/plan/README/repo/evidence sweep rather than a targeted-section recheck
  - cap normal owner-side `P5` coherence iteration to 3 owner passes outside the required internal evaluation loop: one opening sweep plus up to two follow-up full-sweep passes after the consolidated correction list or owner-side glue fixes; if the repo is still not coherent enough after that, classify the remaining gap clearly instead of drifting into open-ended `P5` churn
- - when classifying remaining gaps, separate repo-remediation scope from workflow-learning scope: fixing the repo may be a developer task even when the retrospective lesson belongs to planning or clarification
- - when a `P5` correction list contains independent items, route those safe bundles back for parallel developer work where practical and require per-bundle verification before fan-in
+ - when classifying remaining gaps, separate repo-remediation scope from workflow-learning scope: fixing the repo may be a bugfix-lane task even when the retrospective lesson belongs to planning or clarification
+ - when a `P5` correction list contains independent items, route those safe bundles to the active bugfix lane for parallel helper work where practical and require per-bundle verification before fan-in
  - do not rerun the whole heavy suite after every single failure by default
  - if a broad rerun is not answering a new question, stop and go back to narrow proof
  - do not use this integrated verification and hardening step for broad new feature work
@@ -0,0 +1,98 @@
+ ---
+ name: p8-readiness-reconciliation
+ description: P8 final readiness reconciliation, D1-D9 developer-originated major-issue sweep, and agent-browser functional verification before packaging.
+ ---
+
+ # P8 Readiness Reconciliation
+
+ Use this skill only during `P8 Final Readiness Decision`.
+
+ This skill is the single source of truth for the Developer D1-D9 major-issue categories used at P8. Do not duplicate these definitions in clarification, design, planning, development, or P5 assets. Earlier phases may require ordinary engineering evidence such as startup paths, Playwright E2E planning, validation tests, and README consistency, but the named D1-D9 sweep belongs here.
+
+ ## Required inputs
+
+ Before deciding readiness, reread and reconcile:
+
+ - delivered repo
+ - repo-root `README.md`
+ - parent-root `../docs/`
+ - accepted final plan copy in `../docs/plan.md` when present
+ - carried `../.tmp/` audit artifacts
+ - archived stale/fail report lineage under `../.ai/archive/` when present
+ - package-root expectations for `P9`
+ - residual risks accepted earlier
+
+ ## Mandatory P8 output
+
+ Record a readiness reconciliation note before entering packaging. It must include:
+
+ - files/docs/artifacts checked
+ - kept reports checked
+ - archived/stale report lineage reviewed
+ - package-root expectations checked
+ - `agent-browser` verification result when applicable
+ - D1-D9 table with `pass`, `fail`, `not applicable`, or `risk accepted`
+ - concrete evidence for each non-`not applicable` D1-D9 row
+ - final residual gaps and packaging decision
+
+ ## D1-D9 developer-originated major-issue sweep
+
+ Use these exact categories and checks.
+
+ | Category | Failure class | P8 evidence to check | Fail condition |
+ |---|---|---|---|
+ | D1 Execution and startup reliability | app cannot start, wrong command, broken path, hidden working-directory assumption, missing config/bootstrap | README startup command, actual repo scripts, app entrypoint, service health/readiness, runtime notes, local/P9 handoff expectations | documented startup path is missing, contradictory, non-repo-controlled, or has no credible evidence path |
+ | D2 Implementation authenticity and logical correctness | fake implementation, hardcoded success, shell handler, uncalled service, wrong business rule, no persistence where persistence is required | handlers/services/jobs/components, tests proving real state/side effects, audit reports, plan closure evidence | core behavior is only route registration, static demo data, mock success, shallow status, or unproven logic |
+ | D3 Product completeness and usability | product is not usable end to end, missing critical screen/action/entity setup, broken navigation/API path, unusable empty state | README quick-start, seeded data or empty-state rationale, key UI/API flows, `agent-browser` result where applicable, audit artifacts | main actor cannot complete prompt-critical task through delivered app path |
+ | D4 Requirement alignment | prompt feature missing, narrowed, reassigned to wrong actor, API/UI/data/security requirement orphaned | original prompt, requirements breakdown, design/API docs, final plan, README, repo behavior | accepted requirement has no delivered surface, proof, or explicit accepted non-applicability rationale |
+ | D5 Validation and testing | weak tests, missing validation/error/security tests, no meaningful E2E/platform proof, false-positive assertions | test files, coverage docs, audit outputs, Playwright/browser E2E evidence for web/fullstack, API/unit/integration evidence | tests do not exercise real behavior, prompt-critical validation is untested, or required Playwright/E2E proof is absent |
+ | D6 Reproducibility and dependencies | clean environment cannot reproduce, local/private dependency, missing lockfile/manifest, manual setup hidden in docs | manifests, lockfiles, Docker/Compose files, init/seed scripts, README setup, no `.env`/`.env.example` dependency | delivered app depends on hidden local state, private services, manual installs not documented as host prerequisites, or missing runtime/test inputs |
+ | D7 Documentation and verification consistency | README/docs claim behavior that repo does not deliver, wrong ports/commands/credentials, stale test claims | README, docs, scripts, manifests, route/app registration, audit reports, final plan | documentation and delivered repo disagree about commands, access, auth, seeded data, features, limitations, or verification |
+ | D8 Dataset and session integrity | package lineage or task traceability is broken; prompt/session/provenance cannot anchor final metadata and docs | metadata, session/export package expectations, prompt lineage, docs/plan/report lineage, package-root manifest inputs | final package cannot prove which prompt, sessions, docs, reports, and repo state produced the delivery |
+ | D9 Self-test and report integrity | audit/self-test artifacts are missing, stale, malformed, contaminated by prior-run wording, or inconsistent with fix scope | `.tmp` report set, archived failed/stale reports, fix-check lineage, coverage/README report, rerun packet lineage | required reports are missing/duplicated, stale reports remain in final outputs, report shape is invalid, or fix checks do not map to kept report issues |
+
+ ## Agent-browser functional verification
+
+ During P8, perform one additional live functional verification with the `agent-browser` CLI when the delivered project type is `frontend`, `web`, `fullstack`, `backend`, or `server`.
+
+ Required behavior:
+
+ - launch the app using the documented repo command whenever feasible; P8 is the explicit exception to the normal Docker deferral rule for this live functional launch only
+ - use `agent-browser` to launch or connect to the running app or API endpoint
+ - interact with at least one prompt-critical path through the actual delivered interface
+ - for frontend/web/fullstack: navigate the UI, perform a real user action, and observe resulting UI/API/state behavior
+ - for backend/server with no UI: call the documented HTTP endpoint or API flow through `agent-browser` if it supports the target interaction; if `agent-browser` is not suitable for a pure API flow, record why and use the closest live HTTP interaction evidence instead
+ - use seeded quick-start data from the README when applicable, or create data through the delivered app path
+ - record command(s), URL(s), action(s), observed result, and whether the result supports D1/D2/D3/D7
+
+ Failure handling:
+
+ - if the app cannot be launched, D1 fails unless there is a previously accepted explicit platform limitation that makes launch impossible in the current environment
+ - if `agent-browser` is unavailable, record the missing tool as a P8 blocker unless the user explicitly accepts the risk
+ - if interaction shows a broken main path, D3 fails and packaging must not proceed without a fix or explicit user risk acceptance
+ - do not use screenshots or visual inspection alone as proof of functionality; the interaction must exercise behavior
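
A minimal sketch, assuming each failure-handling bullet above maps to one blocking finding, of how the outcome of the live check could be recorded. The function name and the acceptance labels (`platform-limitation`, `tool-unavailable`, `broken-main-path`) are hypothetical; the skill itself defines no such identifiers.

```python
from typing import Optional


def live_check_blockers(
    app_launched: bool,
    tool_available: bool,
    main_path_ok: Optional[bool],
    accepted: frozenset = frozenset(),
) -> list[str]:
    """Map live-verification observations to blocking findings.

    `accepted` holds hypothetical labels for decisions already accepted
    explicitly (a platform limitation or a user risk acceptance).
    `main_path_ok` is None when no interaction could be attempted.
    """
    blockers = []
    if not app_launched and "platform-limitation" not in accepted:
        blockers.append("D1: app could not be launched")
    if not tool_available and "tool-unavailable" not in accepted:
        blockers.append("P8: agent-browser is unavailable")
    # Only an exercised, failing interaction counts; a screenshot is not proof.
    if main_path_ok is False and "broken-main-path" not in accepted:
        blockers.append("D3: main path is broken")
    return blockers
```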
+
+ Docker cadence:
+
+ - P8 `agent-browser` live launch is an explicit, narrow exception to the normal Docker deferral policy
+ - if the documented app launch command is `docker compose up --build`, use it for this P8 live interaction through the timeout helper and clean it up afterward; do not run dockerized `./run_tests.sh` here
+ - if a documented non-Docker/local runtime is also available and equivalent for the delivered app, prefer the faster local runtime for P8
+ - P8 live interaction does not replace P9 packaging/runtime confirmation; P9 still owns final Docker/runtime and dockerized broad-test closure when required
+
+ ## Frontend-design and Playwright reminders
+
+ Do not let P8 replace earlier frontend obligations:
+
+ - `frontend-design` remains mandatory whenever UI structure, usability, visual hierarchy, state, layout, or frontend quality matters
+ - web/fullstack planning must still require Playwright or equivalent real in-browser E2E for critical browser flows
+ - development must still run planned local Playwright/E2E/platform-equivalent checks before major development-complete claims when the accepted plan requires them
+ - P8 `agent-browser` is an extra live readiness interaction, not a replacement for Playwright tests or frontend-design review
+
+ ## Decision rule
+
+ Packaging can begin only when:
+
+ - D1-D9 are `pass`, `not applicable`, or explicitly `risk accepted`
+ - `agent-browser` verification passes or has an explicit accepted non-applicability/risk decision
+ - final docs, reports, repo state, and package-root expectations agree
+ - no material prompt-critical, security, runtime, report-lineage, or usability inconsistency remains
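
A minimal sketch, assuming the D1-D9 table is kept as a mapping from category to a `(status, evidence)` pair, of how the decision rule above could be checked. The data shape and the function name are hypothetical; the four status values and the evidence requirement for every non-`not applicable` row come from the skill text.

```python
CATEGORIES = tuple(f"D{i}" for i in range(1, 10))
# Statuses that allow packaging to begin, per the decision rule.
PASSING = {"pass", "not applicable", "risk accepted"}


def packaging_blockers(rows, agent_browser_ok, artifacts_agree, no_material_inconsistency):
    """Return the reasons packaging cannot begin (an empty list means proceed).

    `rows` maps "D1".."D9" to a (status, evidence) tuple. `agent_browser_ok`
    is True when the live check passed or has an explicit accepted
    non-applicability or risk decision.
    """
    blockers = []
    for category in CATEGORIES:
        status, evidence = rows.get(category, ("missing", ""))
        if status not in PASSING:
            blockers.append(f"{category} is {status}")
        elif status != "not applicable" and not evidence.strip():
            # Every non-"not applicable" row needs concrete evidence.
            blockers.append(f"{category} has no concrete evidence")
    if not agent_browser_ok:
        blockers.append("agent-browser verification is unresolved")
    if not artifacts_agree:
        blockers.append("docs, reports, repo state, and package-root expectations disagree")
    if not no_material_inconsistency:
        blockers.append("a material inconsistency remains")
    return blockers
```

A row that is absent from the mapping is reported as `missing`, so an incomplete table cannot pass by omission.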
@@ -426,7 +426,7 @@ Reject if:
  ## C5. Test Coverage Execution Contract
 
  `plan.md` must explicitly state:
- - a confident overall coverage target around `90%`
+ - at least `90%` unit-testable product-code coverage where measurable and at least `90%` closure of planned E2E/platform-critical flow rows
  - exact measurement path
  - confidence notes or known weak spots when relevant
  - full prompt-relevant surface inventory mapped to intended test layers
@@ -586,7 +586,7 @@ For database-bearing projects it must preserve:
 
  Reject if startup/test honesty is weak or if planning leaves these rules loose.
 
- It must also explicitly say that a separate local test harness is prepared during scaffold and used during development plus owner-side `P5`, that dockerized `./run_tests.sh` and Docker runtime are configured during development but not executed from planning through the end of `P7`, and that the owner performs the first real Docker check plus dockerized `./run_tests.sh` run in `P9`.
+ It must also explicitly say that a separate local test harness is prepared during scaffold and used during development plus owner-side `P5`, that dockerized `./run_tests.sh` and Docker runtime are configured during development but not executed from planning through the end of `P7`, that `P8` may launch the app only for `agent-browser` functional verification under `p8-readiness-reconciliation`, and that the owner performs the final Docker broad check plus dockerized `./run_tests.sh` run in `P9`.
 
  ## C10. Plan Completeness Standard
 
@@ -21,12 +21,14 @@ Its job is to ensure the owner:
  - uses the approved phased planning artifacts in the correct order
  - sends the developer the correct planning boundary at the correct time
  - does not improvise a new planning contract when the accepted planning package already defines it
- - for the Claude workflow, prepares one owner-side comparison design draft before the Phase 1 Claude design request and then merges the best ideas in-owner before Phase 2 continues
+ - prepare one owner-side comparison design draft before the Phase 1 developer design request and then merge the best ideas in-owner before Phase 2 continues; this applies to both OpenCode developer-subagent and Claude live-lane backends
 
  ## Core Rule
 
  The owner should not synthesize planning from scratch when the accepted planning package already exists.
 
+ Use the Context7 CLI/skill for any framework, library, SDK, API, CLI, or cloud-service documentation lookup needed during planning. Resolve first with `npx ctx7@latest library <name> "<question>"`, then fetch docs with `npx ctx7@latest docs <libraryId> "<question>"`; use external web search only after Context7 is insufficient or not applicable.
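
A minimal sketch, assuming only the two command shapes quoted above, of how the resolve-then-fetch sequence could be assembled without running anything. The helper names are hypothetical, and `library_id` stands for whatever identifier the resolve step reports; it is not guessed here.

```python
import shlex


def ctx7_lookup_commands(library_name, question, library_id=None):
    """Build the Context7 invocations as argv lists without executing them."""
    # Step 1: resolve the library by name.
    commands = [["npx", "ctx7@latest", "library", library_name, question]]
    if library_id is not None:
        # Step 2: fetch docs using the identifier reported by the resolve step.
        commands.append(["npx", "ctx7@latest", "docs", library_id, question])
    return commands


def render(command):
    """Quote an argv list so it can be pasted into a POSIX shell."""
    return shlex.join(command)
```

Keeping each command as an argv list avoids shell-quoting mistakes when the question contains spaces or quotes.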
+
  Use the phased planning documents as the primary planning payload:
  - Phase 1 design prompt
  - Phase 1 design template
@@ -48,7 +50,7 @@ Use the phased planning documents as the primary planning payload:
  2. Start Phase 1 first.
  - Use the Phase 1 design prompt and Phase 1 design template.
  - Copy the needed Phase 1 prompt text into the design request itself, and tell the developer to follow the initialized Phase 1 design template.
- - In the Claude workflow, before sending the Phase 1 design request, launch one owner-side design-prep subagent to produce a comparison design draft at `../.ai/design-prep.md`.
+ - Before sending the Phase 1 design request, launch one owner-side design-prep subagent to produce a comparison design draft at `../.ai/design-prep.md`.
  - Treat that `.ai` design-prep draft as owner-side comparison input, not as an accepted contract and not as a developer-visible artifact.
  - The Phase 1 output must become the accepted design contract in `../docs/design.md`.
54
56
  - Keep `../docs/design.md` focused on repo/system design; exact runtime/bootstrap/README contracts belong in Phase 2 `plan.md`, not in the design doc.
@@ -60,8 +62,8 @@ Use the phased planning documents as the primary planning payload:
60
62
  - Before accepting Phase 1, explicitly reread the original prompt, the accepted requirements-and-clarification package, and `../docs/design.md` together; do not rely on a vague sense that the design is probably faithful.
61
63
  - Before accepting Phase 1, explicitly check that `../docs/design.md` still preserves the accepted core requirements extracted during clarification instead of only preserving the narrower ambiguity resolutions.
62
64
  - Before accepting Phase 1, explicitly check that the design identifies the core semantic path, captures prompt-critical rules, and includes a role surface matrix for any role/auth/ownership/public-route/admin/audit/export/notification surface.
63
- - In the Claude workflow, also compare the Claude-produced design against the owner-side `.ai` design-prep draft and merge the better ideas into `../docs/design.md` directly when they improve the accepted design.
64
- - If the owner patches `../docs/design.md` using those comparison ideas, inform Claude of the exact accepted design changes before requesting `../docs/api-spec.md` or `plan.md`.
65
+ - Compare the developer-produced design against the owner-side `.ai` design-prep draft and merge the better ideas into `../docs/design.md` directly when they improve the accepted design.
66
+ - If the owner patches `../docs/design.md` using those comparison ideas, inform the developer of the exact accepted design changes before requesting `../docs/api-spec.md` or `plan.md`.
65
67
  - Reject weak, high-level, narrowed, or incomplete design work.
66
68
  - If the design is materially sound and only small owner-side contract or wording fixes remain, patch `../docs/design.md` directly instead of bouncing the request back.
67
69
  - Do not move into the API contract request or execution planning until the design contract is accepted.
@@ -132,6 +134,7 @@ When sending planning work to the developer:
132
134
  - include the clarification content itself as detailed text before planning starts; do not assume the developer can infer it from a label or from owner-only files
133
135
  - keep the clarification brief clean and decisive; do not carry rejected clarifier guesses, duplicated entries, or stale ambiguity text into planning
134
136
  - inline the real prompt body needed for the current design or planning request; do not send `read this file and do it` instructions for owner-side packaged prompts, but it is fine to tell the developer to follow the initialized template
137
+ - remove owner-only lifecycle, evaluator, audit, gate, lane, and workflow mechanics from the developer-facing wording; speak as the owner requesting a concrete design, plan, README, test, runtime, or implementation outcome
135
138
  - do not restate massive planning content outside the accepted planning package unless you are correcting it
136
139
  - do not ask for a generic plan when the phased templates already define the expected outputs
137
140
  - for design work, request `../docs/design.md` first and request `../docs/api-spec.md` separately afterward when applicable
@@ -73,6 +73,8 @@ At scaffold time, do not require:
73
73
  - use `shared-contract.md` as the common runtime/test/README/scaffold contract
74
74
  - compose independent type modules such as `type-web-spa.md`, `type-api-service.md`, `type-database.md`, `type-background-jobs.md`, `type-offline-local-first.md`, `type-mobile-android.md`, and `type-desktop.md` before applying stack-specific details
75
75
  - compose independent tech modules such as `tech-frontend-vue.md`, `tech-frontend-react.md`, `tech-backend-go.md`, `tech-backend-koa.md`, `tech-backend-laravel.md`, `tech-backend-gin-templ.md`, `tech-db-mysql.md`, `tech-db-postgres.md`, `tech-db-room.md`, `tech-db-localdb.md`, and `tech-rust-workspace.md` for the actual frontend/backend/database/language pieces in the prompt
76
+ - when a web/frontend prompt and adopted repo do not specify frontend framework, styling library, or UI component library, default to Vue 3 + Vite + TypeScript, Tailwind CSS, and shadcn/ui when compatible; do not override explicit prompt or existing-repo choices
77
+ - before locking bootstrap commands, package setup, or integration details for Vue, Tailwind, shadcn/ui, Playwright, backend frameworks, databases, SDKs, APIs, CLIs, or cloud services, use the Context7 CLI/skill to fetch current docs; resolve first with `npx ctx7@latest library <name> "<question>"`, then fetch docs with `npx ctx7@latest docs <libraryId> "<question>"`
76
78
  - when no findings-driven full-stack profile matches, use `stack-generic.md` plus the selected type and tech modules instead of falling back to the old framework-default model
77
79
  - do not tell the developer to read those files directly if they are outside `repo/`; restate the relevant directives in the developer prompt
78
80
  - when a matching findings-driven stack profile exists, prefer following it over inventing a new scaffold contract from scratch
@@ -13,7 +13,7 @@ Use this skill only during `P9 Submission Packaging`.
13
13
  - keep packaging work inside the formal phase window
14
14
  - treat packaging as a minimal final-delivery contract, not a reporting exercise
15
15
  - `P9` should begin from a repo state that has already passed the quick `P8` reconciliation sweep across `repo/`, `README.md`, parent-root `../docs/`, and carried `../.tmp/` audit artifacts
16
- - `P5` is the ordinary phase that runs the separate local test harness; `P9` is the first real Docker/runtime and dockerized `./run_tests.sh` confirmation point
16
+ - `P5` is the ordinary phase that runs the separate local test harness; `P8` may have launched the app only for `agent-browser` functional verification; `P9` is the final Docker/runtime and dockerized `./run_tests.sh` confirmation point
17
17
  - before closing `P9`, run the documented Docker/runtime path and dockerized `./run_tests.sh` when packaging changes, late bugfixes, or final confirmation needs make that necessary, then fix owner-side Docker/config/wrapper issues directly if needed
18
18
  - packaging does not close until runtime/test commands, parent-root docs, repo cleanup, session export, and final structure validation all agree with the delivered repo
19
19
  - do not invent extra reviewer artifacts beyond the required final structure
@@ -199,15 +199,42 @@ After those steps:
199
199
  - confirm workflow metadata marks `packaging_completed` as true
200
200
  - confirm no `submission/` directory or other obsolete packaging artifact structure remains
201
201
 
202
+ ## Docker environment cleanup
203
+
204
+ After Docker confirmation and testing are complete, clean up project-specific Docker artifacts before closing packaging. This prevents leftover containers, volumes, and locally built images from accumulating on the host.
205
+
206
+ Required cleanup (run from the repo directory containing `docker-compose.yml`):
207
+
208
+ ```bash
209
+ docker compose down -v --rmi local
210
+ ```
211
+
212
+ What this removes:
213
+ - Containers created by this project's `docker-compose.yml`
214
+ - Networks created by this project's `docker-compose.yml`
215
+ - Volumes declared in this project's `docker-compose.yml`
216
+ - Images **built locally** from this project's `Dockerfile` or `docker-compose.yml`
217
+
218
+ What this preserves:
219
+ - Base images pulled from registries (e.g., `postgres`, `golang`, `node`, `redis`, `alpine`)
220
+ - Images tagged from Docker Hub or other registries
221
+ - Unrelated containers, volumes, networks, or images from other projects
222
+
223
+ Rules:
224
+ - run this cleanup after `docker compose up --build` and `./run_tests.sh` have been confirmed working
225
+ - do not run broad Docker pruning commands (e.g., `docker system prune`, `docker volume prune`) that could affect unrelated projects
226
+ - do not remove base images that were pulled rather than built
227
+ - if the user explicitly asks to keep containers running for manual testing, skip this step and record that decision
228
+
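Reviewer note (not part of the package diff): the added cleanup rule can be sketched as a small wrapper, assuming a bash shell and a Docker Compose v2 CLI on the host. The function name `cleanup_project_docker` and the `DRY_RUN` variable are hypothetical and only illustrate that the cleanup stays scoped to the project's Compose file.

```bash
# Hypothetical helper, not shipped in theslopmachine.
# Runs the project-scoped cleanup named in the skill text, or only prints
# the command when DRY_RUN=1 so it can be reviewed before touching Docker.
cleanup_project_docker() {
  # Same command as in the skill text; no broad prune commands are used.
  local cmd=(docker compose down -v --rmi local)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Calling `DRY_RUN=1 cleanup_project_docker` from the directory that holds `docker-compose.yml` prints the command without running it. To my understanding of the Compose CLI, `--rmi local` removes only images that have no custom tag, so a service that sets `image:` alongside `build:` may keep its built image; that behavior is worth checking against the Compose documentation before relying on it.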
202
229
  ## Final packaging verification
203
230
 
204
231
  - do one final package review before declaring packaging complete
205
232
  - confirm the package is coherent as a delivered project, not just a working repo snapshot
206
233
  - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
207
- - if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, treat `P9` as the first real Docker confirmation point and the first real dockerized broad-test run
234
+ - if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, treat `P9` as the final Docker broad-confirmation point and the first real dockerized broad-test run; a prior P8 app launch for `agent-browser` does not count as this packaging confirmation
208
235
  - when those commands fail because of Docker config, wrapper, or test-harness glue issues, fix them directly in the owner session and rerun before closing packaging
209
236
  - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
210
237
  - if packaging reveals a real defect or missing artifact, fix it before closing the phase
211
238
  - if packaging reveals parent-root doc drift or incomplete coverage/design/api reference docs, fix those docs before closing the phase
212
239
  - if packaging reveals that `plan.md` was not moved into parent-root `../docs/plan.md` when `P5` closed or that repo-local workflow rulebooks remain in `repo/`, treat that as missed earlier boundary cleanup and repair it before closing the phase
213
- - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied
240
+ - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, Docker environment cleanup, and final structure checks are satisfied