npm - theslopmachine - Versions diffs - 1.0.13 → 1.0.22 - Mend

theslopmachine 1.0.13 → 1.0.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

package/assets/agents/developer.md +6 -7
package/assets/agents/slopmachine-claude.md +66 -9
package/assets/agents/slopmachine.md +68 -9
package/assets/claude/agents/developer.md +5 -1
package/assets/skills/clarification-gate/SKILL.md +56 -20
package/assets/skills/claude-worker-management/SKILL.md +14 -4
package/assets/skills/deep-retrospective/SKILL.md +179 -0
package/assets/skills/deep-retrospective/run.py +446 -0
package/assets/skills/deep-retrospective/workflow-reference.md +240 -0
package/assets/skills/developer-session-lifecycle/SKILL.md +18 -4
package/assets/skills/development-guidance/SKILL.md +52 -31
package/assets/skills/evaluation-triage/SKILL.md +21 -7
package/assets/skills/final-evaluation-orchestration/SKILL.md +92 -28
package/assets/skills/integrated-verification/SKILL.md +38 -42
package/assets/skills/p8-readiness-reconciliation/SKILL.md +31 -10
package/assets/skills/planning-gate/SKILL.md +10 -7
package/assets/skills/planning-guidance/SKILL.md +60 -52
package/assets/skills/retrospective-analysis/SKILL.md +172 -58
package/assets/skills/scaffold-guidance/SKILL.md +18 -6
package/assets/skills/submission-packaging/SKILL.md +11 -3
package/assets/slopmachine/clarifier-agent-prompt.md +7 -6
package/assets/slopmachine/exact-readme-template.md +8 -12
package/assets/slopmachine/owner-verification-checklist.md +1 -1
package/assets/slopmachine/phase-1-design-prompt.md +5 -10
package/assets/slopmachine/phase-1-design-template.md +15 -11
package/assets/slopmachine/phase-2-execution-planning-prompt.md +5 -2
package/assets/slopmachine/phase-2-plan-template.md +14 -4
package/assets/slopmachine/scaffold-playbooks/shared-contract.md +2 -1
package/assets/slopmachine/templates/AGENTS.md +3 -1
package/assets/slopmachine/templates/CLAUDE.md +3 -1
package/assets/slopmachine/test-coverage-prompt.md +8 -1
package/assets/slopmachine/utils/README.md +1 -5
package/assets/slopmachine/utils/claude_live_common.mjs +2 -5
package/assets/slopmachine/utils/prepare_evaluation_send_packet.mjs +3 -3
package/package.json +1 -1
package/src/constants.js +0 -9
package/src/init.js +17 -24
package/src/install.js +30 -28
package/assets/slopmachine/utils/prepare_evaluation_prompt.mjs +0 -81

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -7,7 +7,7 @@ description: Phase 5 evaluation/remediation cycles, fix-checks, and coverage/REA
 Use this skill during `Phase 5: Evaluation And Fix Verification`.
-This phase is strict. Follow the cycle rules exactly.
+This phase is strict. Follow the cycle rules exactly. If another section appears to conflict with the non-negotiable prompt block below, the non-negotiable prompt block wins.
 ## Core Shape
@@ -18,16 +18,47 @@ Phase 5 has three parts:
 The existing bugfix lane from Phase 4 remains active through Audit Cycle 1 and Audit Cycle 2. After both audit cycles complete and the 4 required audit reports exist, close that bugfix lane and start a new test-coverage/final-reconciliation lane.
-## Prompt Preparation
+## Non-Negotiable Full-Audit Prompt Block
-- Prepare the evaluation prompt with `prepare_evaluation_prompt.mjs` based on project type.
-- Build the exact evaluator send packet with `prepare_evaluation_send_packet.mjs` before every full audit send.
-- Use `backend-evaluation-prompt.md` for backend, fullstack, API, mobile, desktop, CLI, data, or any non-pure-frontend project.
-- Use `frontend-evaluation-prompt.md` for pure frontend projects.
-- Read the saved send packet file and send that exact content verbatim to the evaluator subagent.
-- Record the prepared prompt path and exact send packet path in metadata and Beads.
-- Save evaluator session ids in `../.ai/metadata.json` and Beads.
-- Treat evaluator reports as immutable once written.
+This block controls every full audit and fail-regeneration evaluator send. Do not reinterpret it from later sections.
+Only two evaluator prompts are allowed in the full-audit/fail-regeneration cycle:
+1. **Initial full audit prompt**
+   - Determine project type.
+   - Use and record both the installed full evaluation prompt asset and the generated send packet.
+   - The installed asset is the source prompt, located under the installed SlopMachine assets directory (`~/slopmachine/` or `$SLOPMACHINE_HOME/slopmachine/`), for example `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md`.
+   - Use `~/slopmachine/backend-evaluation-prompt.md` for backend, fullstack, API, mobile, desktop, CLI, data, or any non-pure-frontend project.
+   - Use `~/slopmachine/frontend-evaluation-prompt.md` for pure frontend projects.
+   - Prepare the send packet with `prepare_evaluation_send_packet.mjs` after prompt interpolation and exact output path insertion.
+   - The installed evaluation prompt content must be preserved word-for-word with nothing omitted. The evaluator must receive the whole long prompt packet, not a summary, not a path, not a shortened excerpt, and not an owner-authored substitute. If the installed prompt is 400+ lines, all 400+ lines must be present in the packet sent to the evaluator.
+   - Read the complete saved send packet before sending.
+   - Paste/send the complete saved send packet into the evaluator subagent word-for-word, with nothing added, removed, summarized, paraphrased, or appended.
+   - The owner must not add a preface, footer, clarification, instruction, file reference substitute, or any extra sentence.
+2. **Fail-regeneration prompt**
+   - If a full audit report is Fail, archive the failed report unchanged, extract all issues, fix them through the active bugfix lane, and owner-verify the fixes.
+   - Return to the same evaluator session that produced the failed report.
+   - Send exactly the regeneration prompt below, verbatim, with absolutely nothing added, removed, summarized, paraphrased, prefaced, suffixed, or otherwise modified.
+```text
+Regenerate the audit on a new file. Check, I fixed every issues you identified. but don't make it obvious that it is a regeneration, you shouldn't mention that the previous issues are fixed on the current audit, just put the new updated verdict and only mention remaining or newly found issues.
+Refrain from using words like "Remaining, still, now", and the likes that may expose this current audit is a regen.
+```
+No other full-audit or fail-regeneration prompt is allowed. In particular, the owner must not:
+- shorten the installed evaluation prompt;
+- send only a path to the prompt instead of the complete prompt packet;
+- add local context outside the prepared packet;
+- append rerun footers or status notes;
+- rewrite the regeneration prompt;
+- combine the regeneration prompt with issue summaries, fix evidence, or extra report instructions;
+- replace a fail-regeneration send with a different custom prompt.
+If any deviation is found from either allowed prompt send, the current audit cycle is invalid. Archive every report or candidate report generated during that invalid cycle unchanged under `../.ai/archive/`, record the restart reason in metadata and Beads, then restart that audit cycle from a fresh evaluator session using the installed asset and exact saved send packet. Do not patch, rename, or reuse invalid-cycle reports as kept reports.
+Record the installed prompt asset path, prepared prompt path, exact send packet path, evaluator session id, and report path in metadata and Beads. Treat evaluator reports as immutable once written.
 ## Report Paths And Names
@@ -40,6 +71,15 @@ Use task-local report paths under `./.tmp/`. The kept report paths are:
 Do not write final kept reports under `../.tmp` or `./tmp`.
+Cycle naming is fixed:
+- Audit Cycle 1 kept full audit report: `./.tmp/audit_report-1.md`.
+- Audit Cycle 1 fix-check report, when required: `./.tmp/audit_report-1-fix_check.md`.
+- Audit Cycle 2 kept full audit report: `./.tmp/audit_report-2.md`.
+- Audit Cycle 2 fix-check report, when required: `./.tmp/audit_report-2-fix_check.md`.
+- Coverage/README report: `./.tmp/test_coverage_and_readme_audit_report.md`.
+A cycle fix-check is required for every kept full audit report, including Pass reports with zero issues. The fix-check report must confirm that all issues from the kept audit report are fixed; when the kept audit report has zero scoped issues, the fix-check must explicitly state that the kept audit report had no scoped issues to close and that the cycle is clean.
 Archive failed/superseded reports unchanged under `../.ai/archive/`.
 ## Audit Cycle Procedure
@@ -50,9 +90,10 @@ Run this procedure twice: once for Audit Cycle 1 and once for Audit Cycle 2.
 - Start a new `evaluator` subagent session.
 - Save its session id.
-- Send the exact saved evaluator send packet verbatim.
+- Send the exact saved evaluator send packet verbatim under the non-negotiable full-audit prompt block.
 - Require the report to be written to the current cycle's candidate report path.
 - Reject the report if you cannot confirm the last evaluator send used the exact saved send packet content.
+- Reject and restart the cycle if you cannot cite both the installed prompt asset path and the exact saved send packet path used for that evaluator send.
 ### 2. If The Report Is Fail
@@ -63,16 +104,10 @@ Run this procedure twice: once for Audit Cycle 1 and once for Audit Cycle 2.
 - Convert the issue list into broad human language.
 - Send the full issue set to the existing bugfix lane without file/line details, report names, or evaluator language.
 - Do not run a fix-check for a Fail report. First fix every issue from the failed report and verify the fixes owner-side.
-- After the bugfix lane fixes every issue from the failed report and the owner verifies, return to the same evaluator session and send this exact regeneration instruction verbatim, with no additions, removals, preface, suffix, or wording changes:
-```text
-Regenerate the audit on a new file. Check, I fixed every issues you identified. but don't make it obvious that it is a regeneration, you shouldn't mention that the previous issues are fixed on the current audit, just put the new updated verdict and only mention remaining or newly found issues.
-Refrain from using words like "Remaining, still, now", and the likes that may expose this current audit is a regen.
-```
+- After the bugfix lane fixes every issue from the failed report and the owner verifies, return to the same evaluator session and send only the exact fail-regeneration prompt from the non-negotiable full-audit prompt block.
 - Validate the regenerated report before using it.
-- If the regenerated report is less than 150 lines or materially less complete than the archived report in content size/depth, reject it and resend the full prepared evaluation prompt.
+- If the regenerated report is less than 150 lines or materially less complete than the archived report in content size/depth, reject it as invalid and repeat the fail-remediation/regeneration path using only the exact fail-regeneration prompt from the non-negotiable block after fixes are verified.
 - Reject the regenerated report if it is continuation-shaped, fix-only, materially omits required sections/tables/verdict panels/evidence style, or appears stale against current repo state.
 - If the regenerated report is reasonably similar in depth and structure, proceed with its verdict.
 - If the regenerated report is still Fail, archive it unchanged, extract every issue again, fix every issue again through the bugfix lane, owner-verify again, then return to the same evaluator session and send the same exact regeneration instruction verbatim again.
@@ -103,15 +138,44 @@ Refrain from using words like "Remaining, still, now", and the likes that may ex
 - Keep the report as the cycle's audit report.
 - Save it under the cycle's kept report path in `./.tmp/`.
-- If it contains any issue or recommendation, treat those as the cycle issue set and run the same bugfix/fix-check discipline needed to close them.
-- If it contains no issues or recommendations, the cycle is complete with no fix-check required unless an explicit report requirement says otherwise.
+- If it contains any issue, recommendation, caveat, suggestion, action item, or requested change, treat those as the cycle issue set.
+- Send the full issue set to the existing bugfix lane in broad human language, then owner-verify the fixes.
+- Run the fix-check in the same evaluator session that wrote the kept Pass report.
+- The fix-check must address every scoped issue/recommendation/caveat/suggestion/action item/requested change from the kept Pass report.
+- Save the fix-check report as:
+  - Cycle 1: `./.tmp/audit_report-1-fix_check.md`
+  - Cycle 2: `./.tmp/audit_report-2-fix_check.md`
+- If the fix-check says any item is only partially fixed or not fixed, send those items back to the bugfix lane, then rerun the fix-check in the same evaluator session against the whole kept report issue set.
+- If it contains no issues, recommendations, caveats, suggestions, action items, or requested changes, run the same-session cycle fix-check anyway; the fix-check must state that the kept audit report had zero scoped items to close and that the cycle is clean.
+### 5. Cycle Completion Validation
+Before a cycle can close, validate the whole cycle as a hard gate. If any item fails, archive every candidate/kept/fix-check report produced in that cycle unchanged under `../.ai/archive/`, record the reason, and restart that audit cycle from a fresh evaluator session using the non-negotiable full-audit prompt block.
+Required for each cycle:
+- the installed evaluation prompt asset path is recorded and matches the project type;
+- the exact saved send packet path is recorded;
+- the full saved send packet was read before send and sent word-for-word with no owner additions, omissions, summaries, path-only substitutions, or footers;
+- if the cycle had any Fail report, the failed report was archived unchanged, fixes were verified owner-side, and the exact fail-regeneration prompt was sent word-for-word with no owner additions, omissions, summaries, issue lists, fix evidence, or footers;
+- the kept audit report exists at `./.tmp/audit_report-<N>.md`;
+- the kept audit report is rich and complete: at least 150 lines and not materially shallower than the installed prompt's required output structure;
+- the kept audit report includes the required verdict, scope/boundary, prompt/repository mapping, section review or blocker/high panel as applicable, issues/suggestions or explicit no-issue statement, security/data-risk review where applicable, and test/logging/coverage sections required by the installed prompt;
+- the cycle fix-check report exists at `./.tmp/audit_report-<N>-fix_check.md`;
+- the cycle fix-check report confirms every issue/recommendation/caveat/suggestion/action item/requested change from the kept audit report is fixed, or explicitly confirms the kept audit report had zero scoped items to close;
+- the fix-check report does not perform a broader new audit and does not use history-exposing comparison language.
+No audit cycle is complete without both `./.tmp/audit_report-<N>.md` and `./.tmp/audit_report-<N>-fix_check.md` passing this validation gate.
 ## Fix-Check Prompt
-Use this exact fix-check instruction after a kept Partial Pass report has been fixed:
+Use this exact fix-check instruction after a kept Pass or Partial Pass report with any scoped issue/recommendation/caveat/suggestion/action item/requested change has been fixed. Send it verbatim to the same evaluator session after providing concise developer fix evidence, exact verification results when available, and the exact full audit-scoped issue list from the kept `audit_report-<N>.md`:
 ```text
-Check the current repository state against every issue and recommendation from the kept audit report for this audit. Write the fix-check report to the requested file. For each scoped issue, state whether the current repository satisfies the expected behavior, cite current evidence, and identify any issue that is not fully fixed. Do not perform a broader new audit.
+Only confirm whether those exact earlier items are fixed; do not perform a broader new review.
+The follow-up report must describe each scoped issue's current status, current evidence, and any important verification caveats. It must not describe issues as `resolved`, `remaining`, `still open`, `no longer existing`, `left`, or compare against prior audit history.
+Write the follow-up report to `./.tmp/audit_report-<N>-fix_check.md`.
 ```
 Do not allow fix-check scope to shrink. It must cover all issues from the kept audit report.
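
Reviewer sketch (not part of the package): the new fix-check prompt bans a specific set of history-exposing words, which is easy to check mechanically. The Python below is a minimal illustration under that assumption; the function name `history_exposing_terms` is hypothetical, and the term list is copied from the prompt text in this hunk.

```python
import re

# Terms the fix-check prompt in this hunk says the follow-up report must not use.
FORBIDDEN_TERMS = ["resolved", "remaining", "still open", "no longer existing", "left"]

# Whole-word, case-insensitive match, so "unresolved" does not count as "resolved".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in FORBIDDEN_TERMS) + r")\b",
    re.IGNORECASE,
)


def history_exposing_terms(report_text: str) -> list[tuple[int, str]]:
    """Return (line_number, term) pairs for every forbidden term found in the report."""
    hits = []
    for line_number, line in enumerate(report_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits
```

A plain word match will also flag harmless uses such as "left sidebar", so any hit would need a human read before a report is rejected.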
@@ -119,7 +183,7 @@ Do not allow fix-check scope to shrink. It must cover all issues from the kept a
 ## Issue Handoff To Bugfix Lane
 Issue handoff must be human and broad:
-- say `I found these issues...`
+- speak as yourself, say I found these issues
 - group by module/product area
 - include all issues from the current scope
 - emphasize prompt-fit/compliance and security issues
@@ -129,16 +193,16 @@ Issue handoff must be human and broad:
 Example:
 ```text
-I found several issues that need cleanup. The auth behavior still has security gaps around access to other users' data, the API error handling is inconsistent, and the README/test coverage is missing some required flows. Please inspect those areas, fix the issues, and rerun the relevant checks.
+I found several issues that need cleanup. The auth behavior still has security gaps around access to other users data. The API error handling is inconsistent. The README and test coverage is missing some required flows. Please inspect those areas, fix the issues, and rerun the relevant checks.
 ```
 ## Completing Two Audit Cycles
 Phase 5 cannot move to coverage/README reconciliation until both cycles are complete and these reports exist:
 - `./.tmp/audit_report-1.md`
-- `./.tmp/audit_report-1-fix_check.md` when Cycle 1 required fixes
+- `./.tmp/audit_report-1-fix_check.md`
 - `./.tmp/audit_report-2.md`
-- `./.tmp/audit_report-2-fix_check.md` when Cycle 2 required fixes
+- `./.tmp/audit_report-2-fix_check.md`
 Once both cycles are complete:
 - close the existing bugfix lane
@@ -151,7 +215,7 @@ Once both cycles are complete:
 After the new reconciliation lane is established:
 1. Start a fresh evaluator session.
-2. Send `test-coverage-prompt.md` verbatim.
+2. Send `~/slopmachine/test-coverage-prompt.md` verbatim.
 3. Require `./.tmp/test_coverage_and_readme_audit_report.md`.
 4. Read the generated report.
 5. Require an overall Pass. Pass with caveats is acceptable only when caveats are explicit, bounded, non-blocking, and do not contradict README hard gates or required coverage surfaces.
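
Reviewer sketch (not part of the package): the file-level part of the new "Cycle Completion Validation" gate can be expressed as a small check. The Python below is a minimal illustration, assuming the report names and the 150-line threshold quoted in this diff; the function name `cycle_artifact_problems` and the `task_root` parameter are hypothetical. It covers only file existence and length, not the prompt-send or content checks the gate also requires.

```python
from pathlib import Path

MIN_REPORT_LINES = 150  # threshold stated in the skill text for a kept audit report


def cycle_artifact_problems(task_root: Path, cycle: int) -> list[str]:
    """Return file-level problems for one audit cycle; an empty list means these checks pass."""
    tmp_dir = task_root / ".tmp"
    kept = tmp_dir / f"audit_report-{cycle}.md"
    fix_check = tmp_dir / f"audit_report-{cycle}-fix_check.md"

    problems = []
    if not kept.is_file():
        problems.append(f"missing kept report: {kept.name}")
    else:
        line_count = len(kept.read_text(encoding="utf-8").splitlines())
        if line_count < MIN_REPORT_LINES:
            problems.append(
                f"{kept.name} has {line_count} lines, fewer than {MIN_REPORT_LINES}"
            )
    # Per the 1.0.22 text, the fix-check report is required even for a clean Pass.
    if not fix_check.is_file():
        problems.append(f"missing fix-check report: {fix_check.name}")
    return problems
```

Calling it for cycles 1 and 2 and requiring both results to be empty mirrors the rule that Phase 5 cannot move on until all four reports exist.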

package/assets/skills/integrated-verification/SKILL.md CHANGED

@@ -28,14 +28,14 @@ Phase 4 starts a new bugfix lane. The original development lane is no longer the
 1. Close the original development lane for normal implementation.
 2. Start a new bugfix lane/session.
-3. Brief the new lane in normal human language about the project, current state, and that it will be helping clean up issues.
+3. Brief the new lane in normal human language about the project and current status. The goal is orientation only: explain that development just finished and this session will help clean up issues after discovery is complete. Do not send the issue batch yet.
 4. Do not reveal phases, workflow state, Beads, metadata, evaluator mechanics, or hidden paths.
 5. Record lane/session id, backend, purpose, runtime dir if applicable, and handoff summary in `../.ai/metadata.json` and Beads.
 Example bugfix-lane briefing style:
 ```text
-I'm going to have you help clean up this project. Read the README and the docs in docs/ to understand what it is supposed to do. Don't start changing anything yet; first get oriented to the app and how it is structured.
+I am going to have you help clean up this project. Read the README and the docs in the docs directory to understand what it is supposed to do. Do not start changing anything yet, first get oriented to the app and how it is structured.
 ```
 ## Owner Plan-Based Review
@@ -56,57 +56,53 @@ Check:
 - security, authorization, ownership, validation, and data integrity
 - placeholder, shell, fake-success, disconnected UI, or static-demo behavior
-Create an internal issue list under `../.ai/` and record the artifact in metadata and Beads.
-Send the issues to the bugfix lane in broad human language. Do not pass file/line references, report paths, or internal plan rows unless explicitly needed.
-Example:
-```text
-I found several issues in the codebase. The auth area still has gaps around users accessing records they should not, the billing flow is missing negative tests, and the README startup notes don't match the current app. Please inspect those areas, fix the problems, and rerun the relevant checks.
-```
+Create a consolidated issue file under `../.ai/consolidated-internal-issues.md`. Record the artifact in metadata and Beads. This file will collect issues from the plan-based review and all evaluator passes. Do not send issues to the bugfix lane yet — all issues are batched and sent together after the evaluator loop completes.
 ## Internal Evaluator Loop
-This is an isolated owner-side loop. It is not the final evaluation phase.
+This is an isolated owner-side loop for discovery only. Do not send any issues to the bugfix lane between passes. All issues are batched and sent once after the loop completes.
 Purpose:
-- discover additional Blocker/High issues before local verification
+- discover Blocker, High, security, and prompt-fit issues before local verification
 - preserve issue evidence internally
-- send consolidated issue batches to the bugfix lane in human language
+- batch all discoveries before sending the bugfix lane its first fix request
 Procedure:
-1. Prepare the evaluator prompt packet using `prepare_evaluation_prompt.mjs` based on project type.
-2. Use `backend-evaluation-prompt.md` for backend/fullstack/mixed/non-pure-frontend work.
-3. Use `frontend-evaluation-prompt.md` for pure frontend work.
+1. Prepare the evaluator prompt packet using `prepare_evaluation_send_packet.mjs` based on project type, ensuring the installed prompt file body is included verbatim in the packet under the owner-level verbatim prompt paste rule.
+2. The installed asset is `~/slopmachine/backend-evaluation-prompt.md` for backend/fullstack/mixed/non-pure-frontend work.
+3. The installed asset is `~/slopmachine/frontend-evaluation-prompt.md` for pure frontend work.
 4. Start one isolated evaluator subagent session.
-5. Send the prepared packet and require a report file under `../.ai/internal-verification/` or another stable internal path.
-6. Archive each generated report unchanged.
-7. Extract Blocker and High issues into a saved internal issue list.
-8. In the same evaluator session, ask for another issue-discovery pass using the same strategy.
-9. Repeat until 5 total evaluator passes have completed, or stop earlier only if the evaluator cannot produce meaningful new findings and the reason is recorded.
-10. Consolidate all Blocker/High issues, deduplicate by root cause, and save the consolidated issue list.
-11. Record report paths, extracted issue paths, and consolidated issue paths in metadata and Beads.
+5. Send the prepared packet and require a report file under `../.ai/internal-verification/report-1.md`.
+6. After the report is generated, archive it unchanged. Extract all Blocker, High, security, and prompt-fit issues and append them to `../.ai/consolidated-internal-issues.md`.
+7. Return to the same evaluator session (before any fixes have been made). Ask it to find another set of different issues using the same prompt and rigor.
+8. Require a new report file under `../.ai/internal-verification/report-N.md`. Archive it and extract issues into the same consolidated file.
+9. Repeat for 5 total passes: the initial full evaluation prompt (step 5) plus 4 follow-up "find more issues" passes (step 7-8). **All 5 passes must be attempted.** Do not stop early unless the evaluator explicitly returns no new findings in two consecutive passes and the reason is recorded. Skipping passes is a workflow violation — 5 passes is the minimum, not a target. If the evaluator produces even one new finding in pass 4, pass 5 must still run.
+10. After all passes, the consolidated issue file contains every issue from all 5 evaluator reports plus the plan-based review.
-Follow-up prompt style for the same evaluator session:
+Follow-up prompt style for the same evaluator session (passes 2-5):
 ```text
-Using the same review strategy, do another pass and look for a different set of material issues that were not already covered. Focus on independent Blocker or High risks. Write the next report to the requested path.
+Using the same review strategy, do another pass and look for a different set of material issues that were not already covered. Focus on independent Blocker, High, security, or prompt-fit risks. Write the next report to the requested path.
 ```
-The evaluator loop may receive internal paths and report instructions because it is an owner-side verification tool. Do not forward evaluator mechanics or report excerpts to the bugfix lane.
+The evaluator loop may receive internal paths and report instructions because it is an owner-side verification tool. Do not forward evaluator mechanics, report excerpts, or the consolidated file to the bugfix lane.
-## Bugfix Loop From Internal Findings
+## Bugfix From Consolidated Issues
-After the internal evaluator loop:
-1. Convert the consolidated issue list into a concise human bugfix prompt.
-2. Group issues by broad module/product area.
-3. Avoid line numbers, file paths, report names, and evaluator language.
-4. Send the issue batch to the bugfix lane.
-5. Let the bugfix lane inspect, fix, and verify.
-6. After the result, owner checks changed files and targeted behavior privately against the issue list and plan.
-7. Send any remaining issues back in the same broad human style.
-8. Record fixes, verification evidence, remaining issues, and handoffs in metadata and Beads.
+After the owner plan-based review and all 5 evaluator passes are complete:
+1. The consolidated issue file under `../.ai/consolidated-internal-issues.md` contains all discoveries.
+2. Return to the already-oriented bugfix lane/session from Phase 4 entry. If it was not established yet due to recovery, establish and orient it now before sending fixes.
+3. Convert the full consolidated issue list into a single concise human bugfix prompt. Group issues by broad module/product area. Avoid line numbers, file paths, report names, and evaluator language. Send the full issue batch to the bugfix lane in broad human language. Do not pass file/line references, report paths, or internal plan rows unless explicitly needed.
+Example:
+```text
+I found several issues in the codebase. The auth area still has gaps around users accessing records they should not. The billing flow is missing negative tests. The README startup notes do not match the current app. Please inspect those areas, fix the problems, and rerun the relevant checks.
+```
+4. After the result, owner checks changed files and targeted behavior privately against the consolidated issue list and plan.
+5. If any issues were not properly fixed, send them back in the same broad human style.
+6. Record fixes, verification evidence, remaining issues, and handoffs in metadata and Beads.
 ## Local Non-Docker Verification
@@ -118,7 +114,7 @@ Rules:
 - Use the startup commands and expected flows supplied at the end of development.
 - Verify the app starts locally when feasible.
 - Verify key expected flows manually/API-wise/platform-wise as appropriate.
-- For browser-accessible apps, manually exercise representative core prompt requirements and every README-listed seeded/demo account, role, and seeded value where feasible. Record any unverified surface and route failures to the bugfix lane.
+- For browser-accessible apps, **use agent-browser to verify core prompt requirements, every README credential, and key user journeys.** This is mandatory for web/fullstack projects before closing Phase 4. Verify the app is reachable, login works with each README credential, key pages load, and at least one primary workflow per role completes end to end. Record any browser-found failures and route them to the bugfix lane. Do not accept "page loaded" as verification — each flow must prove the expected business outcome (state change, data persistence, authorization).
 - Run relevant unit/API/integration/E2E/platform checks locally when available.
 - If a command or local runtime check cannot run, record the exact blocker and risk.
 - Any issue found goes back to the bugfix lane in human language.
@@ -128,10 +124,10 @@ Rules:
 Phase 4 is complete only when:
 - the original development lane is closed for normal implementation
 - the bugfix lane is established and recorded
-- owner plan-based review issues are fixed or recorded as concrete risks
-- the internal evaluator loop completed up to 5 passes or stopped early with recorded justification
-- consolidated Blocker/High internal issues are fixed or recorded as concrete risks
+- owner plan-based review and internal evaluator loop (5 passes) completed, with all issues collected in `../.ai/consolidated-internal-issues.md`
+- consolidated issues are fixed or recorded as concrete risks
+- **browser verification (agent-browser) completed for web/fullstack projects** — every README credential tested, core user journeys verified, browser-found failures fixed or recorded
 - local non-Docker app/test verification has run where feasible
 - local verification issues have been fixed or recorded as concrete risks
 - README/runtime/test/design/API surfaces are coherent enough to proceed to final evaluation
-- metadata and Beads record the reports, issue lists, fixes, verification evidence, blockers, and closure decision
+- metadata and Beads record the report paths, issue files, fixes, verification evidence, blockers, and closure decision
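
Reviewer sketch (not part of the package): the revised internal evaluator loop says all 5 passes must be attempted, yet also allows an early stop when the evaluator returns no new findings in two consecutive passes. The Python below shows one possible reading of that rule; the names `should_run_next_pass` and `report_path` are hypothetical, and only the report path pattern and the pass count come from the diff.

```python
MAX_PASSES = 5  # the initial full evaluation plus 4 follow-up passes, per the skill text


def report_path(pass_number: int) -> str:
    """Report location for a given pass, following the pattern quoted in the diff."""
    return f"../.ai/internal-verification/report-{pass_number}.md"


def should_run_next_pass(new_findings_per_pass: list[int]) -> bool:
    """Decide whether another evaluator pass is due.

    new_findings_per_pass holds the count of new findings from each completed pass,
    in order. One reading of the rule: keep going until 5 passes are done, and stop
    early only after two consecutive passes with zero new findings.
    """
    completed = len(new_findings_per_pass)
    if completed >= MAX_PASSES:
        return False
    last_two_empty = completed >= 2 and new_findings_per_pass[-2:] == [0, 0]
    return not last_two_empty
```

Under this reading, a single new finding in pass 4 still forces pass 5, which matches the example given in the skill text. The early-stop reason would still have to be recorded separately.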

package/assets/skills/p8-readiness-reconciliation/SKILL.md CHANGED

@@ -1,3 +1,8 @@
+---
+name: p8-readiness-reconciliation
+description: Phase 6 final readiness reconciliation and issue classification after Phase 5.
+---
 # Phase 6 Readiness Reconciliation
 Use this skill for final readiness decision and issue classification after Phase 5.
@@ -23,7 +28,7 @@ Use these D1-D9 buckets for major issue classification:
 | D6 Reproducibility & Dependency Failure | System cannot run in a clean isolated environment. Examples: undeclared dependencies, reliance on local paths/local DB/private services, manual DB setup/import, dependency version mismatch, or environment-specific config. |
 | D7 Documentation & Verification Consistency Failure | Documentation does not match actual behavior. Examples: README startup steps fail, URLs/ports are wrong, verification steps cannot be executed, access instructions are missing, documented flow differs from implementation, or README omits startup/service/verification sections. |
 | D8 Dataset & Session Integrity Failure | Dataset/session artifacts lack authenticity, structure, or traceability. Examples: missing planning/execution sessions, reconstructed logs, irrelevant session content, missing prompt-to-validation trajectory, wrong task-root launch, absent initial prompt/planning steps, or unjustified context-losing splits. |
-| D9 Self-Test Report Integrity & Compliance Failure | Self-test reports are missing, altered, generated with wrong prompts/tools, inconsistent with actual behavior, missing required cycles, unresolved report issues remain, or report prompt type is wrong/distorted. |
+| D9 Evaluation Report Integrity & Compliance Failure | Evaluation reports are missing, altered, generated with wrong prompts/tools, inconsistent with actual behavior, missing required cycles, unresolved report issues remain, or report prompt type is wrong/distorted. |
 ## Readiness Decision
@@ -31,30 +36,46 @@ Use these D1-D9 buckets for major issue classification:
 - Partial Pass only if residual risks are explicit, bounded, and user-accepted.
 - Fail if core prompt-fit, security, runtime/test, delivery credibility, or evidence integrity remains materially broken.
 - Never imply unrun verification passed.
+- A check is "not applicable" only when the project type genuinely cannot support it (e.g. no browser UI for a pure-backend API) and the reason is user-confirmed. Cost, convenience, or time pressure are not valid not-applicable reasons — route the failing check to the active developer/Claude lane instead.
 ## Required Final Checks
-- Run the broad product test wrapper `./repo/run_tests.sh` when it exists and is applicable.
+- Run the broad product test wrapper `./repo/run_tests.sh` through Docker when it exists and is applicable. This is the first time Dockerized tests run — they were deferred from earlier phases. Do not run `run_tests.sh` outside Docker during readiness.
 - Run final runtime verification before packaging: `docker compose up --build` for web/backend/fullstack/container-supported projects, native/platform-equivalent startup for mobile/desktop projects, or a recorded not-applicable reason.
-- Use `agent-browser` for manual functionality verification where browser-accessible UI exists. The browser pass must walk the core prompt requirements and main user journeys, not just confirm that the app loads.
-- Exercise every relevant seeded/demo account, role/state, and README-listed seeded value. Confirm that documented credentials, seeded records, examples, IDs, statuses, roles, permissions, and expected default states are present, usable, and consistent with README claims.
-- For backend/API-only projects, replace browser checks with equivalent API/manual checks for every README-listed credential, seeded value, role/state, and core requirement.
-- If any final runtime, test, browser, account, or platform check cannot run, readiness cannot be `Pass` unless the user explicitly risk-accepts the unverified surface.
+### Agent-Browser Verification
+The `agent-browser` skill is installed as part of slopmachine setup. Use it for automated browser-based verification when the project has a browser-accessible UI. Load the skill and use its tools to walk every surface the prompt requires.
+The browser verification must cover, at minimum, these three categories:
+**1. Requirement surfaces.** Every prompt requirement that is user-facing must be exercised through agent-browser. Cross-reference `../.ai/requirements-breakdown.md` and extract every user-facing requirement, then verify each one through the browser. Walk through each core flow from beginning to task closure — not just opening pages but performing the actual interactions the user would do. For each requirement, verify that the system responds correctly: state changes, data persistence, error handling, auth enforcement, and task completion. If a requirement involves multiple roles, test each role's path separately. No requirement that has a user-visible surface may be skipped — every REQ-### with a user-facing implication must have a browser verification result.
+**2. Seeded values and credentials.** Exercise every README-listed demo credential, role, seeded record, example ID, status, and documented default. Log in with every role and confirm the permissions and seeded data described in the README are accurate. Flag any credential that does not work, any seeded value that does not exist, or any role that has incorrect access.
+**3. Core flows end to end.** Walk the main user journeys from start to finish — creating data, viewing it, modifying it, deleting it, and verifying the state at each step. If the system has multiple actors, test the interaction between them: one user creates, another views, a third approves or rejects. Every read, create, update, and delete operation that is prompt-relevant must be verified through agent-browser.
+**Batching and consolidation.** Do multiple agent-browser runs covering different surfaces before reporting issues. Test a group of related flows, note every failure, then test the next group. Collect all failures into a single consolidated issue list. Only when all surfaces have been tested, send the full consolidated list to the active developer/Claude lane once — do not send issues surface by surface. The lane should receive one batch of fixes to work through.
+For backend/API-only projects where browser verification is not possible, replace agent-browser checks with equivalent API-based checks for every README-listed credential, seeded value, role/state, and core requirement. Use curl or platform-equivalent CLI tools to exercise each endpoint and verify behavior.
+If any final runtime, test, browser, account, or platform check cannot run, readiness cannot be `Pass` unless the user explicitly risk-accepts the unverified surface.
 ## Failure Routing Loop
 - Phase 6 is the primary green gate for broad Docker/runtime and `./repo/run_tests.sh` verification.
 - Final reconciliation work belongs in the currently active developer/Claude implementation lane whenever it is more than a tiny, safe owner-side edit. Route product behavior, tests, README/runtime drift, Docker/runtime failures, browser/account issues, and coverage gaps to that lane in broad human language.
-- If `docker compose up --build`, native/platform startup, browser/API manual checks, account/seeded-value checks, or `./repo/run_tests.sh` fails, do not move to packaging.
+- If `docker compose up --build`, native/platform startup, or `./repo/run_tests.sh` fails, route the failure to the lane, fix, verify, and rerun.
+- For agent-browser or API verification failures: collect all issues across all surfaces into one consolidated list, then send the full list to the lane once. Do not send issues surface by surface — batch everything before the single handoff.
 - Route the failure to the currently active developer/Claude implementation lane in broad human language: describe the failing behavior, command, and user-visible/runtime impact without exposing evaluator or owner-private mechanics.
-- After the lane reports a fix, the owner verifies the changed surface and reruns the failed check.
-- Repeat fix, verify, and rerun until the check is green, not applicable for a documented reason, or explicitly risk-accepted by the user.
+- After the lane reports fixes, the owner verifies the changed surfaces and reruns the affected checks.
+- Repeat fix, verify, and rerun until the check is green, demonstrably not applicable to the project type and confirmed by the user, or explicitly risk-accepted by the user.
 - Record every failed command, routed issue, fix acknowledgement, rerun result, and final decision in metadata and Beads.
 ## Owner Direct Fixes
 - The owner may directly fix only minor, safe docs, wrapper, config, cleanup, or light glue issues when the change does not require product-design judgment, new tests, behavioral changes, or non-trivial debugging.
 - If the reconciliation issue is large enough to need real implementation work, meaningful test updates, runtime debugging, README/runtime restructuring, or product judgment, do not fix it owner-side. Send it to the currently active developer/Claude lane.
-- After any direct owner fix, send a minimal note to the currently active developer/Claude lane describing the changed surface and ask it to inspect/acknowledge the change before readiness continues.
+- Batch multiple direct fixes into a group, then inform the lane once. Do not notify turn by turn for every small edit. After all fixes in the batch are complete, send a single note describing all changed surfaces and ask the lane to inspect and acknowledge.
 - The note should be concise and developer-facing, not a workflow report.
 - Still rerun the affected command or check after acknowledgement.
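The batching rule added in this file (test every surface first, then hand the lane one consolidated issue list) can be sketched roughly as below. This is only an illustration in Python: `Issue`, `VerificationBatch`, and the surface names are hypothetical and do not appear in the package, and the sketch assumes the owner knows up front which surfaces are planned for verification.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    surface: str      # hypothetical surface label, e.g. "login"
    description: str  # failing behavior in plain human language


@dataclass
class VerificationBatch:
    """Accumulates failures across several runs and allows a single handoff."""
    planned_surfaces: set[str]
    tested: set[str] = field(default_factory=set)
    issues: list[Issue] = field(default_factory=list)

    def record_run(self, surface: str, failures: list[str]) -> None:
        # Note every failure for this surface, but do not report anything yet.
        self.tested.add(surface)
        self.issues.extend(Issue(surface, text) for text in failures)

    def ready_for_handoff(self) -> bool:
        # Handoff is allowed only once every planned surface has been tested.
        return self.planned_surfaces <= self.tested

    def handoff_message(self) -> str:
        # Build the one consolidated list that is sent to the lane once.
        if not self.ready_for_handoff():
            missing = sorted(self.planned_surfaces - self.tested)
            raise RuntimeError(f"untested surfaces remain: {missing}")
        return "\n".join(f"- [{i.surface}] {i.description}" for i in self.issues)
```

Calling `handoff_message()` before all planned surfaces are recorded raises, which mirrors the "do not send issues surface by surface" rule; how the message is actually delivered to the lane is outside this sketch.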

package/assets/skills/planning-gate/SKILL.md CHANGED

@@ -25,7 +25,9 @@ Accept `./docs/design.md` only if it:
 - defines modules as product/system responsibilities, not file-by-file work packets
 - handles auth, authorization, ownership/isolation, validation, logging/redaction, admin/debug boundaries, and sensitive data where relevant
 - defines frontend states and FE-BE expectations where relevant
-- visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, true HTTP/API tests for every runtime endpoint with positive and negative cases, identifiable frontend unit tests that import/render real components/modules where a frontend exists, fullstack FE-BE proof, and full E2E/platform coverage for main user journeys in user-facing apps
+- specifies test surface per module in the module design table: each module lists its API endpoints to test, unit test targets, and E2E flows
+- visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, true HTTP/API tests for every runtime endpoint with positive and negative cases, identifiable frontend unit tests that import/render real components/modules where a frontend exists, fullstack FE-BE proof, and E2E/platform test coverage for every prompt requirement (not just main journeys — every requirement, actor path, business rule, authorization rule, error state, and task-closure condition must have identifiable E2E coverage or explicit not-applicable rationale)
+- requires unit tests under `unit_tests/` and API/integration HTTP tests under `API_tests/` (both mandatory when the corresponding test surface exists)
 - defines strict README/runtime obligations: project type near the top, primary `docker compose up --build` for container-supported deliveries, legacy compatibility string `docker-compose up` without making it primary, access and verification method, all auth/demo credentials and roles or exact `No authentication required`, seeded data values or empty-state statement, no manual runtime installs/manual DB setup/hidden `.env` dependency, mock/local/debug disclosures, and known limitations
 - gives explicit not-applicable reasons and replacement proof layers for any missing unit/API/E2E coverage surface
 - avoids vague placeholders such as `TBD`, `later`, `standard CRUD`, `normal auth`, or `basic tests` for correctness-critical behavior
@@ -71,12 +73,13 @@ Reject the plan if it is mostly narrative, starts with files before modules, cen
 ## Review Procedure
 1. Reread the original prompt and accepted clarification package.
-2. Review `./docs/design.md` against that baseline.
-3. Review `./docs/api-spec.md` against the design if applicable.
-4. Review `../.ai/plan.md` against design/API and check that it can drive implementation without exposing itself.
-5. Patch small owner-fixable wording or traceability issues directly when meaning is unchanged.
-6. Send material design/API gaps back to Claude in human issue/fix wording.
-7. Regenerate or revise owner-private planning when docs change materially.
+2. Cross-reference every requirement from `../.ai/requirements-breakdown.md` by REQ-### against `./docs/design.md`. Each requirement must map to a design surface, an explicitly addressed section, or have an accepted not-applicable reason. Record unmatched requirements as gaps.
+3. Review `./docs/design.md` against the original prompt, clarifications, and requirements baseline.
+4. Review `./docs/api-spec.md` against the design if applicable. Repeat the REQ-### cross-reference to confirm every requirement is covered in design or API spec.
+5. Review `../.ai/plan.md` against design/API and check that it can drive implementation without exposing itself.
+6. Patch small owner-fixable wording or traceability issues directly when meaning is unchanged.
+7. Send material design/API gaps back to Claude in human issue/fix wording.
+8. Regenerate or revise owner-private planning when docs change materially.
 ## Exit Condition
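The REQ-### cross-reference that this diff adds to the review procedure could be tracked with something like the following sketch. It assumes requirement IDs appear in `requirements-breakdown.md` as `REQ-` followed by digits (as in the `REQ-001` and `REQ-042` examples elsewhere in this diff). The `coverage` mapping and the `N/A:` prefix convention are hypothetical bookkeeping invented for illustration, not part of the package.

```python
import re

# Assumed ID shape: "REQ-" followed by digits, e.g. REQ-001.
REQ_ID = re.compile(r"\bREQ-\d+\b")


def cross_reference(breakdown_text: str, coverage: dict[str, str]) -> dict[str, str]:
    """Classify every requirement ID as 'matched', 'not-applicable', or 'gap'.

    `coverage` is an owner-maintained note per requirement: either the design
    surface/section that addresses it, or a note starting with "N/A:" that
    records the accepted not-applicable reason.
    """
    result: dict[str, str] = {}
    for req_id in sorted(set(REQ_ID.findall(breakdown_text))):
        note = coverage.get(req_id, "").strip()
        if not note:
            result[req_id] = "gap"  # unmatched requirement, record as a gap
        elif note.lower().startswith("n/a:"):
            result[req_id] = "not-applicable"
        else:
            result[req_id] = "matched"
    return result
```

The IDs stay on the owner side only: per the planning guidance in this same release, gaps are described to the developer in plain language without mentioning the requirement IDs.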

package/assets/skills/planning-guidance/SKILL.md CHANGED

@@ -25,58 +25,66 @@ Phase 2 establishes the primary developer session and produces the accepted plan
 - Packaged design prompt and design template.
 - Packaged execution planning prompt and plan template.
-## Claude-Focused Flow
-1. Establish the Claude developer lane.
-   - Use `claude-worker-management`.
-   - Send the original prompt verbatim.
-   - Tell Claude in normal human wording that planning is starting and it should not write code.
-   - Record lane/session state in `../.ai/metadata.json` and Beads.
-   - Add a Beads `SESSION:` comment with the lane, backend, session id when available, runtime dir, and handshake status.
-2. Send accepted clarifications and requirements.
-   - Send only the clean accepted package the developer needs.
-   - Do not reveal hidden paths or workflow mechanics.
-   - Explain that these are accepted clarifications and requirements to use for design.
-3. Create `./docs/design.md`.
-   - Send the packaged design prompt and design template, or a direct instruction to use the seeded `./docs/design.md` template when it already exists.
-   - The design must be a true product/system design, not an execution plan.
-   - It should define product behavior, actors, flows, modules, data, security, UI/API surfaces, assumptions, and verification strategy at design level.
-   - It should not become a step-by-step coding checklist.
-   - When the design is returned, record the artifact path and Claude turn result in metadata and Beads.
-4. Owner reviews and cleans the design.
-   - Compare `./docs/design.md` against the original prompt, accepted requirements, and accepted clarifications.
-   - Patch small wording/traceability issues directly when meaning is unchanged.
-   - Send material gaps back to Claude in ordinary issue/fix wording.
-   - Record design acceptance, rejection, or owner patch decisions in metadata and Beads.
-5. Create `./docs/api-spec.md` when applicable.
-   - Use the accepted design as input.
-   - Require exact endpoint/interface contracts where APIs exist.
-   - If no meaningful API exists, mark `./docs/api-spec.md` as not applicable with a short reason.
-   - Record the API spec artifact path and applicability decision in metadata and Beads.
-6. Owner reviews design and API spec together.
-   - Confirm no prompt drift, no orphan requirements, and no conflict between design and API spec.
-   - Confirm frontend/backend exposure is coherent where applicable.
-   - Confirm security, data, validation, failure behavior, and testing expectations match.
-   - Record the combined integrity review decision in metadata and Beads.
-7. Create owner-private execution plan.
-   - The owner must delegate creation of `../.ai/plan.md` to one general owner-side planning subagent.
-   - Do not ask Claude/developer implementation lanes to create, edit, review, or know about `../.ai/plan.md`.
-   - Provide original prompt, stack/context, accepted questions, requirements breakdown, design, and API spec.
-   - The general subagent must use the packaged `phase-2-execution-planning-prompt.md` as its instruction prompt.
-   - The general subagent must use the packaged `phase-2-plan-template.md` as the required structure for `../.ai/plan.md`.
-- Require output to `../.ai/plan.md` and `../.ai/test-coverage.md` when useful.
-- Record private plan and coverage artifact paths in metadata and Beads after the subagent returns.
-- Ensure the private plan can be executed as small sequential developer prompts. Reject plans that require dumping multiple phases or the whole delivery contract into a single developer/Claude prompt.
-8. Owner accepts or rejects the planning package.
-   - Use `planning-gate`.
-   - Record accepted artifacts and phase closure evidence in metadata and Beads.
+## Deterministic Planning Sequence (Must Follow Exactly)
+This sequence is strict. Do not combine, reorder, skip, or batch these steps. Each step must complete and receive developer acknowledgement before the next step begins.
+### Step 1: Send Original Prompt
+Send the original product prompt from `./metadata.json` exactly as-is. No prefix, no introduction, no context, no clarifications. Append only this exact sentence at the end:
+```
+Don't write code yet — we'll plan this first.
+```
+That is the entire message. Wait for the developer to acknowledge before proceeding.
+### Step 2: Send Clarifications
+After the developer acknowledges the prompt, send:
+```
+Here are some clarifications I made:
+```
+Then list the accepted clarifications from `./docs/questions.md` in natural language. After the clarifications, list the core requirements from the approved clarification package as plain natural-language statements describing what the system must do. Do not use requirement IDs, numbered lists, traceability fields, or workflow metadata. Write the requirements as concise sentences: "The system must support user registration with email verification" rather than "REQ-001: User registration." Keep the whole message readable and conversational.
+Wait for the developer to acknowledge before proceeding.
+### Step 3: Send Design Prompt
+After the developer acknowledges the clarifications, read the installed `~/slopmachine/phase-1-design-prompt.md` fresh from its asset path and paste its full body verbatim under the non-negotiable verbatim prompt paste rule. The design prompt references the context already provided in Steps 1 and 2 and the template already seeded at `./docs/design.md`. Do not paste the template file body — it is already in the workspace.
+The design must:
+- be a true product/system design, not an execution plan
+- define product behavior, actors, flows, modules, data, security, UI/API surfaces, assumptions, and verification strategy at design level
+- not become a step-by-step coding checklist
+When the design is returned, record the artifact path and Claude turn result in metadata and Beads.
+### Step 4: Owner Reviews and Cleans the Design
+Compare `./docs/design.md` against the original prompt, accepted requirements, and accepted clarifications. The owner must cross-reference every requirement from `../.ai/requirements-breakdown.md` by its REQ-### ID against the design. Each requirement must map to a design surface, a design section that explicitly addresses it, or have an explicit accepted not-applicable reason recorded. Record any unmatched requirement as a design gap before proceeding.
+If a requirement is missing from the design, describe what is missing in plain language and tell the developer it needs to be included. Do not mention REQ-### IDs, traceability codes, or internal workflow labels — the developer does not know them. Say something like "the design does not cover how users approve invoices before they are finalized. This needs to be added to the design" rather than "REQ-042 is missing."
+Patch small wording/traceability issues directly when meaning is unchanged. Send material gaps back to Claude in ordinary issue/fix wording. Record design acceptance, rejection, owner patch decisions, and the REQ-### cross-reference result (matched, not-applicable, or gap) in metadata and Beads.
+### Step 5: Create `./docs/api-spec.md` When Applicable
+Use the accepted design as input. Require exact endpoint/interface contracts where APIs exist. If no meaningful API exists, mark `./docs/api-spec.md` as not applicable with a short reason. Record the API spec artifact path and applicability decision in metadata and Beads.
+### Step 6: Owner Reviews Design and API Spec Together
+Repeat the REQ-### cross-reference against the combined design and API spec. Confirm every requirement, clarification, and accepted constraint is addressed in either the design or the API spec with no orphan items. For each missing requirement, describe what is missing in plain language and send it back to the developer without mentioning requirement IDs. Confirm frontend/backend exposure is coherent where applicable. Confirm security, data, validation, failure behavior, and testing expectations match. Record the combined integrity review decision, the cross-reference result, and any residual orphan items in metadata and Beads.
+### Step 7: Create Owner-Private Execution Plan
+After the combined review (Step 6) confirms the design and API spec are coherent, the owner must delegate creation of `../.ai/plan.md` to one general owner-side planning subagent. Do not ask Claude/developer implementation lanes to create, edit, review, or know about `../.ai/plan.md`. Provide original prompt, stack/context, accepted questions, the minified requirements list extracted from `../.ai/requirements-breakdown.md`, the accepted design, and the accepted API spec. The general subagent must receive the complete body of the packaged `~/slopmachine/phase-2-execution-planning-prompt.md` as its instruction prompt, pasted verbatim into the sent message. The general subagent must receive the complete body of the packaged `~/slopmachine/phase-2-plan-template.md` as the required structure for `../.ai/plan.md`, pasted verbatim into the sent message. Require output to `../.ai/plan.md` and `../.ai/test-coverage.md` when useful. Record private plan and coverage artifact paths in metadata and Beads after the subagent returns. Ensure the private plan can be executed as small sequential developer prompts. Reject plans that require dumping multiple phases or the whole delivery contract into a single developer/Claude prompt.
+### Step 8: Owner Accepts or Rejects the Planning Package
+Use `planning-gate`. Record accepted artifacts and phase closure evidence in metadata and Beads.
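The strict ordering described in the new sequence (no combining, reordering, or skipping, with each step closed before the next begins) behaves like a small gated state machine. A minimal sketch, assuming every step is closed by a single acknowledgement event; the step labels and the `PlanningSequence` class are hypothetical, and in the real workflow some steps are closed by an owner decision rather than by the developer.

```python
# Hypothetical short labels for Steps 1-8 of the planning sequence.
STEPS = [
    "send original prompt",
    "send clarifications",
    "send design prompt",
    "owner reviews design",
    "create api spec",
    "owner reviews design and api spec",
    "create owner-private plan",
    "accept or reject planning package",
]


class PlanningSequence:
    """Allows steps to start only in order, one at a time."""

    def __init__(self) -> None:
        self._next = 0     # index of the next step allowed to start
        self._open = None  # step that was started but not yet acknowledged

    def start(self, step: str) -> None:
        if self._open is not None:
            raise RuntimeError(f"{self._open!r} has not been acknowledged yet")
        expected = STEPS[self._next] if self._next < len(STEPS) else None
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}, got {step!r}")
        self._open = step

    def acknowledge(self) -> None:
        if self._open is None:
            raise RuntimeError("no step is waiting for acknowledgement")
        self._open = None
        self._next += 1

    def finished(self) -> bool:
        return self._next == len(STEPS) and self._open is None
```

Starting step 2 before step 1 is acknowledged, or jumping straight to step 3, raises in this sketch, which is the property the "Must Follow Exactly" heading asks for.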
 ## Worker Communication Boundary