npm - theslopmachine - Versions diffs - 1.0.11 → 1.0.13 - Mend

theslopmachine 1.0.11 → 1.0.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/assets/agents/developer.md CHANGED

@@ -56,6 +56,8 @@ All communication, code comments, docs, tests, and user-facing strings you add m
 - Tests should prove behavior and side effects, not only existence or rendering.
 - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
 - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
+- API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
+- Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
 - Include negative and boundary coverage when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
 - For frontend work, test loading, empty, submitting, disabled, success, error, and re-entry states when those states are relevant.
 - For backend-backed frontend work, verify the frontend uses the real client/API path and the backend performs real handler/service/data work.
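
The two added rules in this hunk ask for API/integration tests that go through the real route rather than a mocked transport or controller. A minimal sketch of what that can look like, using only the Python standard library. The `/invoices/<id>` route, `InvoiceHandler`, `INVOICES`, `fetch`, and `run_checks` are hypothetical names chosen for illustration; none of them come from the package.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data; stands in for the project's real data layer.
INVOICES = {"1": {"id": "1", "total": 42}}


class InvoiceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: GET /invoices/<id> returns JSON or a 404."""

    def do_GET(self):
        prefix = "/invoices/"
        invoice = None
        if self.path.startswith(prefix):
            invoice = INVOICES.get(self.path[len(prefix):])
        status = 200 if invoice else 404
        body = json.dumps(invoice or {"error": "not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def fetch(port, path):
    """Send a real HTTP GET over a socket and return (status, JSON body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


def run_checks():
    """Start the server on a free port and hit one positive and one negative path."""
    server = HTTPServer(("127.0.0.1", 0), InvoiceHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        return fetch(port, "/invoices/1"), fetch(port, "/invoices/999")
    finally:
        server.shutdown()
        server.server_close()
```

Nothing in the request path is mocked here: the request goes through a socket, the handler, and the data lookup, and the not-found case is covered alongside the success case.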

package/assets/agents/slopmachine-claude.md CHANGED

@@ -148,11 +148,14 @@ Do not interact with Claude through raw `claude` commands, manual tmux typing, u
 - Claude messages must read like a lead engineer talking to another engineer.
 - Use private planning only to decide the next normal Claude instruction; do not mention private planning or its existence.
 - Include what to build or fix, why it matters, the broad affected area, expected behavior, and useful verification.
+- Prompt Claude phase-by-phase and slice-by-slice. Prefer one phase, scaffold, module, or fix batch per prompt; at most combine two adjacent tightly coupled slices when separating them would create needless churn.
+- Never give Claude the whole workflow, all phases, or a full end-to-end delivery packet at once.
 - For substantial Claude turns, you may include a normal human reminder that Claude can use its own built-in subagents for bounded investigation, implementation support, or verification inside the same Claude lane. Do not frame Claude subagents as separate workflow lanes, and do not create OpenCode subagents to help Claude implement.
 - Keep ordinary issue prompts at module/product level. Avoid file/line details unless the user explicitly asks you to pass exact references.
 - Do not paste, summarize, cite, name, or mention hidden plans.
 - Do not combine original-prompt orientation, design, implementation, verification, and bugfix work into one large prompt.
 - Do not send workflow mechanics, evaluator internals, Beads state, hidden-file paths, owner-state reasoning, or negative instructions about nonexistent artifacts to Claude.
+- After each Claude completion, verify the result against the original product prompt in `./metadata.json`, `./docs/design.md`, `./docs/api-spec.md` when applicable, and owner-private `../.ai/plan.md`. If there are issues, correct through the same active Claude lane before proceeding to the next slice.
 - If you make a direct owner-side code or docs change that affects the product repo, tell the active Claude lane exactly what changed and what remains to verify.
 ## Claude Utility Map
@@ -193,6 +196,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Scaffold first, then proceed module by module.
 - Prompt in casual human language using only visible project context.
 - Use internal planning privately for review and module acceptance.
+- Do not send more than the current module/slice, or two adjacent tightly coupled slices, in a single Claude prompt.
 - Record Claude turns, issues, verification evidence, and module acceptance in metadata and Beads.
 - After all modules are complete, ask the same Claude lane to check the implementation against the design/API docs and provide startup commands plus expected flows.
@@ -211,6 +215,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Preserve reports, extract complete issue sets, and route fixes in broad human language.
 - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
 - Complete only when the coverage/README audit passes with at least 90% test score.
+- Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active Claude lane before this phase closes.
 ### Phase 6: Final Readiness Decision
@@ -219,8 +224,9 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Run final runtime and test checks appropriate to the project.
 - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
 - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
+- Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
 - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active Claude lane in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
-- If the owner makes a direct safe fix, send a minimal note to the active Claude lane describing the changed surface and ask it to inspect/acknowledge before continuing.
+- Route final reconciliation work to the active Claude lane whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active Claude lane describing the changed surface and ask it to inspect/acknowledge before continuing.
 - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
 - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
@@ -232,6 +238,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Include only package docs: `docs/questions.md`, `docs/design.md`, and `docs/api-spec.md` when applicable.
 - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
 - Run final package boundary checks before closing.
+- If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.
 ### Phase 8: Retrospective
@@ -248,7 +255,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - API/integration HTTP tests belong under `API_tests/` where that convention exists.
 - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
 - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
-- README must truthfully document startup, tests, configuration, access, demo credentials or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, and known limitations.
+- README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.
 ## Evidence Discipline

package/assets/agents/slopmachine.md CHANGED

@@ -127,10 +127,13 @@ All other subagent types are forbidden for owner use unless the user explicitly
 - Developer messages must read like a lead engineer talking to another engineer.
 - Use private planning only to decide the next normal implementation instruction; do not mention private planning or its existence.
 - Include what to build or fix, why it matters, the broad affected area, expected behavior, and useful verification.
+- Prompt developers phase-by-phase and slice-by-slice. Prefer one phase, scaffold, module, or fix batch per prompt; at most combine two adjacent tightly coupled slices when separating them would create needless churn.
+- Never give the developer the whole workflow, all phases, or a full end-to-end delivery packet at once.
 - Keep ordinary issue prompts at module/product level. Avoid file/line details unless the user explicitly asks you to pass exact references.
 - Do not paste, summarize, cite, name, or mention hidden plans.
 - Do not combine original-prompt orientation, design, implementation, verification, and bugfix work into one large prompt.
 - Do not send workflow mechanics, evaluator internals, Beads state, hidden-file paths, owner-state reasoning, or negative instructions about nonexistent artifacts to developers.
+- After each developer completion, verify the result against the original product prompt in `./metadata.json`, `./docs/design.md`, `./docs/api-spec.md` when applicable, and owner-private `../.ai/plan.md`. If there are issues, correct through the same active developer session before proceeding to the next slice.
 - If you make a direct owner-side code or docs change that affects the product repo, tell the active developer session exactly what changed and what remains to verify.
 ## Phase Model
@@ -160,6 +163,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Scaffold first, then proceed module by module.
 - Prompt in casual human language using only visible project context.
 - Use internal planning privately for review and module acceptance.
+- Do not send more than the current module/slice, or two adjacent tightly coupled slices, in a single developer prompt.
 - Record session turns, issues, verification evidence, and module acceptance in metadata and Beads.
 - After all modules are complete, ask the same session to check the implementation against the design/API docs and provide startup commands plus expected flows.
@@ -178,6 +182,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Preserve reports, extract complete issue sets, and route fixes in broad human language.
 - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
 - Complete only when the coverage/README audit passes with at least 90% test score.
+- Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active lane before this phase closes.
 ### Phase 6: Final Readiness Decision
@@ -186,8 +191,9 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Run final runtime and test checks appropriate to the project.
 - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
 - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
+- Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
 - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active developer session in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
-- If the owner makes a direct safe fix, send a minimal note to the active developer session describing the changed surface and ask it to inspect/acknowledge before continuing.
+- Route final reconciliation work to the active developer session whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active developer session describing the changed surface and ask it to inspect/acknowledge before continuing.
 - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
 - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
@@ -199,6 +205,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - Include only package docs: `docs/questions.md`, `docs/design.md`, and `docs/api-spec.md` when applicable.
 - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
 - Run final package boundary checks before closing.
+- If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.
 ### Phase 8: Retrospective
@@ -215,7 +222,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
 - API/integration HTTP tests belong under `API_tests/` where that convention exists.
 - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
 - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
-- README must truthfully document startup, tests, configuration, access, demo credentials or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, and known limitations.
+- README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.
 ## Evidence Discipline

package/assets/claude/agents/developer.md CHANGED

@@ -42,6 +42,8 @@ All communication, code comments, docs, tests, and user-facing strings you add m
 - Tests must prove behavior and side effects, not only existence or rendering.
 - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
 - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
+- API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
+- Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
 - Cover negative and boundary paths when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
 - For frontend work, test loading, empty, submitting, disabled, success, error, and re-entry states when those states are relevant.
 - For backend-backed frontend work, verify the frontend uses the real client/API path and the backend performs real handler/service/data work.

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -20,6 +20,8 @@ Use this skill during `Phase 3: Development` before prompting the active develop
 Prompt like a human developer working with an AI coding assistant.
+Prompt one bounded slice at a time. The preferred unit is one phase-purpose, scaffold, module, work package, or fix batch. At most combine two adjacent tightly coupled slices in one prompt, and only when splitting them would make the work less coherent. Never send all phases, the full private plan, or a start-to-finish workflow packet to the developer/Claude lane.
 Use direct wording such as:
 - `I checked the user module and found a missing authorization test. Please add that and rerun the relevant tests.`
 - `Continue with the invoice module. Build the create/list/detail flow against the existing product contract and cover the main success and validation paths.`
@@ -29,7 +31,7 @@ Do not send robotic process language. Do not require a specific response format.
 Do not keep restating visible doc paths in routine follow-up prompts when the same session already knows the project contract. It is fine to say `existing product contract`, `accepted docs`, or simply name the module. Mention exact doc paths only when orienting a new session, resolving confusion, or asking for a final contract check.
-For larger module slices, group expectations by user/business behavior instead of turning every endpoint, field, and negative case into a long checklist. Ask for real backend-backed behavior, visible UI states, and meaningful success/failure tests, but keep the wording natural.
+For larger module slices, group expectations by user/business behavior instead of turning every endpoint, field, and negative case into a long checklist. Ask for real backend-backed behavior, visible UI states, and meaningful success/failure tests, but keep the wording natural. If a module is too large to explain without becoming a checklist packet, split it into smaller sequential prompts.
 Example of a good larger module prompt:
@@ -50,7 +52,7 @@ Do not say `the review found`, `the evaluation found`, or `the audit found`. The
 ## Development Sequence
 1. **Scaffold first.**
-   - Establish the framework/runtime/test/README baseline.
+   - Establish the framework/runtime/test/README baseline, including the strict README gates that final review will expect.
    - Keep it free of project-specific business logic except the minimum proof surface needed to verify the stack is wired.
    - Use `scaffold-guidance` and scaffold playbooks privately to shape the prompt.
@@ -61,11 +63,13 @@ Do not say `the review found`, `the evaluation found`, or `the audit found`. The
 3. **Proceed module by module.**
    - Select the next section/module from `./docs/design.md` and the private plan.
-   - Prompt the developer using the docs only.
+   - Prompt the developer using the docs only, one module/work package at a time by default.
    - Ask for the implementation and the relevant tests/checks for that module.
+   - Combine two adjacent modules/work packages only when they share the same user flow or data contract and are easier to verify together.
 4. **Owner checks after each module.**
    - Inspect changed files manually.
+   - Compare behavior against the original product prompt in `./metadata.json`.
    - Compare behavior against `./docs/design.md` and `./docs/api-spec.md`.
    - Privately compare against `../.ai/plan.md` for tests, coverage, discoverability, functionality, and module completeness.
    - Run targeted checks when practical.
@@ -100,11 +104,12 @@ For each scaffold/module, check:
 - no-orphan ledger items assigned to the module are closed
 - project-specific behavior is real, not placeholder/shell/demo-only behavior
 - tests exist for the implemented behavior or a concrete exception is recorded
-- planned API/interface proof is present when the module owns endpoints/interfaces
+- planned API/interface proof is present when the module owns endpoints/interfaces, with true no-mock HTTP/API endpoint tests where applicable
+- frontend unit tests are directly detectable and import/render real frontend components/modules when the module owns frontend behavior
 - planned FE-BE proof is present when the module crosses frontend/backend boundaries
 - failure, validation, authorization, ownership, empty, loading, error, and duplicate/re-entry cases are covered where relevant
 - frontend/backend wiring is real where applicable
-- README changes match delivered runtime, commands, auth/no-auth, seed/demo data, and verification behavior
+- README changes match delivered runtime, commands, auth/no-auth, seed/demo data, verification behavior, mock/local/debug boundaries, and strict startup/access gates
 - targeted checks ran or were clearly blocked
 ## Internal Plan Alignment

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -154,9 +154,9 @@ After the new reconciliation lane is established:
 2. Send `test-coverage-prompt.md` verbatim.
 3. Require `./.tmp/test_coverage_and_readme_audit_report.md`.
 4. Read the generated report.
-5. Require an overall Pass. Pass with caveats is acceptable.
+5. Require an overall Pass. Pass with caveats is acceptable only when caveats are explicit, bounded, non-blocking, and do not contradict README hard gates or required coverage surfaces.
 6. Require at least 90% test score.
-7. If verdict is not Pass or test score is below 90%, extract all missing items and send them to the reconciliation lane in broad human language.
+7. If verdict is not Pass or Pass with acceptable caveats, if test score is below 90%, or if the report identifies README hard-gate failures, mocked endpoint coverage presented as true API coverage, missing frontend unit tests for web/fullstack, or missing FE-BE proof, extract all missing items and send them to the reconciliation lane in broad human language.
 8. After fixes, start a new evaluator session and send the same full verbatim test coverage/README prompt again.
 9. Repeat until verdict is Pass or Pass with caveats and test score is at least 90%.
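
The 1.0.13 wording of steps 5 to 7 amounts to a small decision rule: reconcile unless the verdict is acceptable, the score is at least 90%, and none of the listed blocking findings appear. A rough sketch of that rule follows. The `AuditReport` shape, the finding flag names, and the `caveats_acceptable` field are assumptions made for illustration; the diff does not define a report schema.

```python
from dataclasses import dataclass, field

# Hypothetical flag names for the blocking findings listed in step 7.
HARD_FAILURES = {
    "readme_hard_gate_failure",
    "mocked_endpoint_coverage",
    "missing_frontend_unit_tests",
    "missing_fe_be_proof",
}


@dataclass
class AuditReport:
    verdict: str                      # "Pass", "Pass with caveats", or anything else
    test_score: float                 # percentage, 0-100
    findings: set = field(default_factory=set)
    caveats_acceptable: bool = True   # owner's judgment of the step 5 conditions


def needs_reconciliation(report: AuditReport) -> bool:
    """Return True when the report must go back to the reconciliation lane."""
    verdict_ok = report.verdict == "Pass" or (
        report.verdict == "Pass with caveats" and report.caveats_acceptable
    )
    return (
        not verdict_ok
        or report.test_score < 90
        or bool(report.findings & HARD_FAILURES)
    )
```

Under this reading, a `Pass` at 95% still goes back for reconciliation if the report flags, for example, missing FE-BE proof.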

package/assets/skills/integrated-verification/SKILL.md CHANGED

@@ -52,6 +52,7 @@ Check:
 - README gate matrix
 - risk/negative coverage matrix
 - runtime/test/config consistency
+- strict README gates: project type near the top, Docker/startup/access/verification commands, auth/no-auth, every documented demo credential/role and seeded value, mock/local/debug disclosure, and no hidden manual setup
 - security, authorization, ownership, validation, and data integrity
 - placeholder, shell, fake-success, disconnected UI, or static-demo behavior
@@ -117,6 +118,7 @@ Rules:
 - Use the startup commands and expected flows supplied at the end of development.
 - Verify the app starts locally when feasible.
 - Verify key expected flows manually/API-wise/platform-wise as appropriate.
+- For browser-accessible apps, manually exercise representative core prompt requirements and every README-listed seeded/demo account, role, and seeded value where feasible. Record any unverified surface and route failures to the bugfix lane.
 - Run relevant unit/API/integration/E2E/platform checks locally when available.
 - If a command or local runtime check cannot run, record the exact blocker and risk.
 - Any issue found goes back to the bugfix lane in human language.

package/assets/skills/p8-readiness-reconciliation/SKILL.md CHANGED

@@ -36,14 +36,16 @@ Use these D1-D9 buckets for major issue classification:
 - Run the broad product test wrapper `./repo/run_tests.sh` when it exists and is applicable.
 - Run final runtime verification before packaging: `docker compose up --build` for web/backend/fullstack/container-supported projects, native/platform-equivalent startup for mobile/desktop projects, or a recorded not-applicable reason.
-- Use `agent-browser` for manual functionality verification where browser-accessible UI exists.
-- Exercise every relevant seeded/demo account and role/state where credentials or seeded data exist.
+- Use `agent-browser` for manual functionality verification where browser-accessible UI exists. The browser pass must walk the core prompt requirements and main user journeys, not just confirm that the app loads.
+- Exercise every relevant seeded/demo account, role/state, and README-listed seeded value. Confirm that documented credentials, seeded records, examples, IDs, statuses, roles, permissions, and expected default states are present, usable, and consistent with README claims.
+- For backend/API-only projects, replace browser checks with equivalent API/manual checks for every README-listed credential, seeded value, role/state, and core requirement.
 - If any final runtime, test, browser, account, or platform check cannot run, readiness cannot be `Pass` unless the user explicitly risk-accepts the unverified surface.
 ## Failure Routing Loop
 - Phase 6 is the primary green gate for broad Docker/runtime and `./repo/run_tests.sh` verification.
-- If `docker compose up --build`, native/platform startup, browser checks, account checks, or `./repo/run_tests.sh` fails, do not move to packaging.
+- Final reconciliation work belongs in the currently active developer/Claude implementation lane whenever it is more than a tiny, safe owner-side edit. Route product behavior, tests, README/runtime drift, Docker/runtime failures, browser/account issues, and coverage gaps to that lane in broad human language.
+- If `docker compose up --build`, native/platform startup, browser/API manual checks, account/seeded-value checks, or `./repo/run_tests.sh` fails, do not move to packaging.
 - Route the failure to the currently active developer/Claude implementation lane in broad human language: describe the failing behavior, command, and user-visible/runtime impact without exposing evaluator or owner-private mechanics.
 - After the lane reports a fix, the owner verifies the changed surface and reruns the failed check.
 - Repeat fix, verify, and rerun until the check is green, not applicable for a documented reason, or explicitly risk-accepted by the user.
@@ -51,7 +53,8 @@ Use these D1-D9 buckets for major issue classification:
 ## Owner Direct Fixes
-- The owner may directly fix narrow docs, wrapper, config, cleanup, or light glue issues when the fix is safe and does not require product-design judgment.
+- The owner may directly fix only minor, safe docs, wrapper, config, cleanup, or light glue issues when the change does not require product-design judgment, new tests, behavioral changes, or non-trivial debugging.
+- If the reconciliation issue is large enough to need real implementation work, meaningful test updates, runtime debugging, README/runtime restructuring, or product judgment, do not fix it owner-side. Send it to the currently active developer/Claude lane.
 - After any direct owner fix, send a minimal note to the currently active developer/Claude lane describing the changed surface and ask it to inspect/acknowledge the change before readiness continues.
 - The note should be concise and developer-facing, not a workflow report.
 - Still rerun the affected command or check after acknowledgement.
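
The Failure Routing Loop in this file describes a fix, verify, rerun cycle that ends when the check is green or the user risk-accepts it. It could be sketched as below. The callables `check`, `route_fix`, and `risk_accepted` are hypothetical stand-ins for the real commands and lane messages, and the `max_rounds` cap and `"blocked"` result are assumptions added so the sketch terminates; the source only says to repeat.

```python
from typing import Callable


def run_until_resolved(
    check: Callable[[], bool],
    route_fix: Callable[[], None],
    risk_accepted: Callable[[], bool] = lambda: False,
    max_rounds: int = 5,
) -> str:
    """Rerun a failed check after each routed fix until it is resolved."""
    for _ in range(max_rounds):
        if check():
            return "green"
        if risk_accepted():
            return "risk-accepted"
        route_fix()  # hand the failure to the active lane, then loop to rerun
    # Final rerun after the last routed fix.
    return "green" if check() else "blocked"
```

Packaging would only proceed on `"green"` or `"risk-accepted"`; any other result keeps the phase open.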

package/assets/skills/planning-gate/SKILL.md CHANGED

@@ -25,7 +25,8 @@ Accept `./docs/design.md` only if it:
 - defines modules as product/system responsibilities, not file-by-file work packets
 - handles auth, authorization, ownership/isolation, validation, logging/redaction, admin/debug boundaries, and sensitive data where relevant
 - defines frontend states and FE-BE expectations where relevant
-- visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, API/interface tests for every endpoint with positive and negative cases, and full E2E/platform coverage for main user journeys in user-facing apps
+- visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, true HTTP/API tests for every runtime endpoint with positive and negative cases, identifiable frontend unit tests that import/render real components/modules where a frontend exists, fullstack FE-BE proof, and full E2E/platform coverage for main user journeys in user-facing apps
+- defines strict README/runtime obligations: project type near the top, primary `docker compose up --build` for container-supported deliveries, legacy compatibility string `docker-compose up` without making it primary, access and verification method, all auth/demo credentials and roles or exact `No authentication required`, seeded data values or empty-state statement, no manual runtime installs/manual DB setup/hidden `.env` dependency, mock/local/debug disclosures, and known limitations
 - gives explicit not-applicable reasons and replacement proof layers for any missing unit/API/E2E coverage surface
 - avoids vague placeholders such as `TBD`, `later`, `standard CRUD`, `normal auth`, or `basic tests` for correctness-critical behavior
@@ -55,11 +56,13 @@ Accept `../.ai/plan.md` only if it is strong enough for the owner to drive devel
 - security execution obligations
 - API/interface implementation and proof obligations
 - API coverage matrix when APIs exist, including true HTTP/API proof and exception rationale
+- frontend unit-test detectability when frontend exists: direct test files, framework evidence, and imports/renders of real frontend components/modules
 - frontend state and integration obligations where applicable
 - FE-BE integration matrix and backend-to-frontend exposure check where applicable
 - README/runtime/test obligations
 - README gate matrix covering startup, access, verification, auth/no-auth, seeded/empty-state, config/no-secret handling, mock/local-data disclosure, and known limitations
-- test coverage map with unit/API/integration/E2E or platform-equivalent expectations
+- README gate matrix covering strict audit requirements: project type near top, `docker compose up --build`, legacy `docker-compose up` string, startup, access, verification, auth/no-auth, seeded values/empty-state, config/no-secret/no-hidden-env/no-manual-install handling, mock/local/debug disclosure, and known limitations
+- test coverage map with unit/API/integration/E2E or platform-equivalent expectations, including true no-mock HTTP/API endpoint proof where applicable
 - risk/negative coverage matrix for validation, authorization, ownership/isolation, empty/not-found, duplicate/conflict, re-entry, and sensitive-data leakage where relevant
 - final integrated verification and readiness preparation
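
Several of the strict README gates added in this hunk are literal strings, so part of the gate matrix can be checked mechanically. The sketch below covers only those string-level gates. The function name `check_readme_gates`, the `top_lines` window, and the phrase heuristics (for example looking for `project type` or `demo credentials`) are assumptions for illustration; the package's actual audit logic is not shown in this diff.

```python
def check_readme_gates(readme: str, top_lines: int = 10) -> dict:
    """Return a gate-name -> bool map for string-level README gates."""
    head = "\n".join(readme.splitlines()[:top_lines]).lower()
    text = readme.lower()
    return {
        # Heuristic: the phrase "project type" appears in the first few lines.
        "project_type_near_top": "project type" in head,
        "primary_compose_command": "docker compose up --build" in text,
        # The legacy string uses a hyphen, so it is not matched by the primary command.
        "legacy_compose_string": "docker-compose up" in text,
        "auth_statement": "no authentication required" in text
        or "demo credentials" in text,
        "seed_statement": "no seeded data required" in text
        or "seeded data" in text,
        "known_limitations": "known limitations" in text,
    }


# Hypothetical README used only to exercise the checker.
SAMPLE_README = """# Invoice Tracker
Project type: fullstack web app

## Startup
docker compose up --build

## Access
No authentication required

## Data
No seeded data required; the app is useful from an empty state.

## Known limitations
None recorded.
"""

failed = [name for name, ok in check_readme_gates(SAMPLE_README).items() if not ok]
```

For the sample above, the only failing gate is the missing legacy `docker-compose up` string. Gates such as "no hidden manual setup" or "every documented credential works" need a runtime pass and are out of scope for a text check like this.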

package/assets/skills/planning-guidance/SKILL.md CHANGED

@@ -70,8 +70,9 @@ Phase 2 establishes the primary developer session and produces the accepted plan
    - Provide original prompt, stack/context, accepted questions, requirements breakdown, design, and API spec.
    - The general subagent must use the packaged `phase-2-execution-planning-prompt.md` as its instruction prompt.
    - The general subagent must use the packaged `phase-2-plan-template.md` as the required structure for `../.ai/plan.md`.
-   - Require output to `../.ai/plan.md` and `../.ai/test-coverage.md` when useful.
-   - Record private plan and coverage artifact paths in metadata and Beads after the subagent returns.
+- Require output to `../.ai/plan.md` and `../.ai/test-coverage.md` when useful.
+- Record private plan and coverage artifact paths in metadata and Beads after the subagent returns.
+- Ensure the private plan can be executed as small sequential developer prompts. Reject plans that require dumping multiple phases or the whole delivery contract into a single developer/Claude prompt.
 8. Owner accepts or rejects the planning package.
    - Use `planning-gate`.

package/assets/skills/scaffold-guidance/SKILL.md CHANGED

@@ -45,7 +45,7 @@ Adjust the exact wording to the project. Do not over-format the message.
 - product repo root `./repo/run_tests.sh` when required by the project contract
 - runtime/Docker files when relevant, wired honestly for later verification
 - database/bootstrap/seed path when the product will require seeded data or persistent storage
-- README baseline with project type, stack, startup/access, verification, auth/no-auth, seeded/empty-state note, and repo layout
+- README baseline with project type near the top, stack, primary startup/access command, legacy `docker-compose up` compatibility string where applicable, verification method, auth/no-auth, seeded/empty-state note, mock/local/debug disclosures, known limitations, and repo layout
 - no committed secrets, `.env`, `.env.example`, hidden host setup, no-op tests, or fake-success integration paths
 ## Scaffold Should Not Deliver

package/assets/skills/submission-packaging/SKILL.md CHANGED

@@ -39,7 +39,8 @@ Packaging must reject or remove stale workflow notes and scratch execution artif
 ## Packaging Checks
-- Confirm README, scripts, config, routes, docs, tests, and runtime instructions agree.
+- Confirm README, scripts, config, routes, docs, tests, browser/API manual evidence, and runtime instructions agree.
+- Confirm every README-listed demo credential, role, seeded value, documented example, and expected default state was verified in the final runtime/browser/API pass or explicitly risk-accepted by the user.
 - Confirm kept evaluation reports remain immutable evidence under `.tmp`, and that failed/stale/superseded reports are archived unchanged outside final `.tmp`.
 - Confirm Claude/session handoff artifacts are outside the product package path.
 - Confirm task-root rulebooks/settings are stripped from the final submission package when the packaging flow requires a product-only handoff.
@@ -55,7 +56,9 @@ Packaging must reject or remove stale workflow notes and scratch execution artif
 ## Final Runtime And Test Confirmation
 - Phase 7 owns the final Docker/runtime confirmation and dockerized broad `./repo/run_tests.sh` confirmation when those commands are part of the delivered contract or when late fixes/packaging changes could affect runtime/test behavior.
+- Phase 7 also owns final browser/API manual confirmation when late fixes, README edits, cleanup, package boundary changes, or seed/config changes could affect user-visible behavior or documented seeded values.
 - If `./repo/README.md` documents `docker compose up --build` or `./repo/run_tests.sh`, treat those as package contract commands, not aspirational notes.
+- If `./repo/README.md` documents demo accounts, roles, seeded data, example IDs, default statuses, or verification flows, treat those as package contract values that must be exercised through `agent-browser` or API/platform-equivalent checks before closure.
 - Fix owner-side Docker/config/wrapper/README/docs/light-script glue directly when safe; route real product-code or test-file defects back through the appropriate developer fix lane before packaging closes.
 - Never imply unrun Docker, runtime, browser, native/platform, or broad test commands passed.
 - End Docker verification with project-specific cleanup unless the user explicitly wants containers left running.
@@ -99,6 +102,7 @@ Phase 7 can close only when:
 - final package structure satisfies the allowlist;
 - stale visible execution artifacts are absent;
 - README, docs, scripts, config, routes, tests, audit artifacts, and repo behavior no longer contradict one another;
+- README-listed credentials, roles, seeded values, examples, default states, and verification flows have been exercised or explicitly risk-accepted;
 - runtime/test/package commands that were required have run or are explicitly risk-accepted by the user;
 - `.tmp` contains the final kept report set and no stale superseded reports;
 - session exports are complete and outside the package root;

package/assets/slopmachine/exact-readme-template.md CHANGED

@@ -215,10 +215,10 @@ Expected result:
 If `init_db.sh` is part of the standard test bootstrap, document that relationship clearly.
 ### Local verification harness
-- Document the separate local verification command(s) used for ordinary development and readiness checks.
+- Document the separate local verification command(s) used for ordinary development and readiness checks only if they do not become required reviewer setup.
 - Make clear that these local verification commands are distinct from the dockerized `./repo/run_tests.sh` broad test path.
 - Use the real stack-native local suite for the chosen language/framework where applicable, for example Vitest, Jest, PHPUnit, pytest, go test, cargo test, or another framework-native equivalent.
-- If that local suite needs machine-level installation or setup, document that clearly in the local verification notes.
+- Do not require reviewers to run manual installs or machine-level setup for the standard packaged verification path.
 ### Test entry points
 - Unit tests: `[command/path]`

package/assets/slopmachine/owner-verification-checklist.md CHANGED

@@ -18,7 +18,7 @@ Reject only for material defects that would mislead development, evaluation, or
 - [ ] `../.ai/plan.md` captures owner-private workstreams, module slices, tests, runtime rules, security obligations, and packaging checks.
 - [ ] `../.ai/plan.md` contains a no-orphan ledger mapping every accepted requirement, clarification, design trace row, API route, actor path, data object, security boundary, report/export/notification, and documentation obligation to a module/workstream and proof path.
 - [ ] `../.ai/plan.md` defines scaffold first, ordered module packets, owned files/tests, shared-file boundaries, FE<->BE/API proof, verification commands, completion checklist, and development-exit proof.
-- [ ] `../.ai/test-coverage.md` exists when meaningful coverage mapping is applicable and maps requirements/risks/API endpoints/frontend flows to planned tests, assertions, current status, and gaps.
+- [ ] `../.ai/test-coverage.md` exists when meaningful coverage mapping is applicable and maps requirements/risks/API endpoints/frontend flows to planned tests, assertions, current status, and gaps, including true no-mock HTTP/API classification and frontend unit-test detectability where applicable.
 - [ ] Private plan slices can be translated into normal developer prompts.
 - [ ] Developer prompts do not ask workers to read private workflow files.
@@ -32,6 +32,7 @@ Reject only for material defects that would mislead development, evaluation, or
 - [ ] A separate stack-native local harness exists for development/Phase 4, or the missing harness is explicitly user risk-accepted.
 - [ ] Tests prove behavior and side effects, not only route existence, component existence, mocked client returns, or status codes detached from state/artifact effects.
 - [ ] Fullstack/backend-backed frontend flows have real FE<->BE proof, not only separate backend and frontend tests.
+- [ ] Web/fullstack frontend unit tests are directly detectable and import/render real frontend components/modules.
 ## Development Completion
@@ -43,7 +44,7 @@ Reject only for material defects that would mislead development, evaluation, or
 ## Phase 4 And Phase 5
 - [ ] Phase 4 runs all available relevant tests except broad commands that require explicit user approval, asks the bugfix lane for verification guidance where useful, manually exercises relevant runtime/account surfaces, runs internal owner self-test cycles for issue discovery, and routes issues back to the bugfix lane.
-- [ ] Every provided seeded/demo account and every relevant role/state has been exercised or the unverified surface is explicitly user risk-accepted.
+- [ ] Every provided seeded/demo account, README-listed seeded value, example ID/status, and every relevant role/state has been exercised or the unverified surface is explicitly user risk-accepted.
 - [ ] Phase 5 uses fresh evaluator sessions for full self-test audits, evaluator subagent only, full prompt packets verbatim, no rerun footer, immutable reports, one bugfix/fix-check lane for issues from both final self-test audits, and same-evaluator scoped fix-check only for kept Partial Pass reports.
 - [ ] Every evaluator finding and recommendation is fixed and verified or explicitly risk-accepted by the user before Phase 5 closes.
 - [ ] Coverage/README/final reconciliation uses a dedicated developer session after Phase 5 findings.
@@ -52,7 +53,7 @@ Reject only for material defects that would mislead development, evaluation, or
 - [ ] Final runtime verification has run before packaging, or the unrun surface is explicitly risk-accepted.
 - [ ] `./repo/run_tests.sh` has run where applicable before packaging, or the unrun surface is explicitly risk-accepted.
-- [ ] `agent-browser` manual functionality verification has run for browser-accessible UI before packaging, or the unrun surface is explicitly risk-accepted.
+- [ ] `agent-browser` manual functionality verification has run through core prompt requirements, main user journeys, README-listed seeded values, demo credentials, and role/state behavior for browser-accessible UI before packaging, or the unrun surface is explicitly risk-accepted.
 - [ ] D1-D9 readiness categories are pass, not applicable, or explicitly risk-accepted.
 - [ ] Final docs, reports, repo state, and package-root expectations agree.
@@ -60,7 +61,7 @@ Reject only for material defects that would mislead development, evaluation, or
 - [ ] Final package root and docs allowlist are correct.
 - [ ] Workflow-private artifacts stay outside the product package.
-- [ ] README, scripts, config, tests, and runtime instructions agree.
+- [ ] README, scripts, config, tests, runtime instructions, browser/API manual evidence, and seeded/demo values agree.
 - [ ] Stale workflow notes and scratch execution artifacts are absent from final package.
 - [ ] `.tmp` contains final kept audit/fix-check/coverage reports only, with stale failed/superseded reports archived outside the package.
 - [ ] `repo/docker-compose.yml`, `repo/run_tests.sh`, and `repo/init_db.sh` where applicable match README claims.
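
Reviewer note: the checklist above asks that frontend unit tests be "directly detectable" and import or render real components/modules. The sketch below shows one way such a check could be approximated. The file patterns and the relative-import heuristic are assumptions made for illustration and are not part of the package.

```python
import re
from pathlib import Path

# Assumed test-file naming patterns; adjust to the project's framework.
TEST_PATTERNS = ("*.test.ts", "*.test.tsx", "*.test.js", "*.test.jsx",
                 "*.spec.ts", "*.spec.tsx")

# Heuristic: a module specifier that starts with ./ or ../ after `from`
# suggests the test pulls in project code rather than only libraries.
RELATIVE_SPECIFIER = re.compile(r"""from\s+['"](\.{1,2}/[^'"]+)['"]""")


def find_detectable_tests(root):
    """Map each matching test file to the relative modules it imports."""
    root = Path(root)
    report = {}
    for pattern in TEST_PATTERNS:
        for path in sorted(root.rglob(pattern)):
            source = path.read_text(encoding="utf-8")
            report[path.relative_to(root).as_posix()] = (
                RELATIVE_SPECIFIER.findall(source)
            )
    return report


def undetectable_tests(report):
    """Return test files that import no project-relative module at all."""
    return sorted(name for name, imports in report.items() if not imports)
```

A file listed by `undetectable_tests` is only a candidate for review. The heuristic cannot tell whether an imported module is really a frontend component, so a human or a stricter audit still has to confirm it.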

package/assets/slopmachine/phase-1-design-prompt.md CHANGED

@@ -23,7 +23,8 @@ The design must:
 - preserve the original business goal and required user outcomes
 - incorporate accepted clarifications and requirements without narrowing them
 - identify the project type, stack, actors, roles, main flows, modules, data, UI/API surfaces, security boundaries, assumptions, and verification strategy
-- define the testing contract as part of the visible design: every API/interface endpoint must have positive and negative tests, unit coverage must target 90%+ for meaningful business logic, and user-facing applications must include full E2E/platform coverage for the main user journeys unless a surface is genuinely not applicable
+- define the testing contract as part of the visible design: every API/interface endpoint must have positive and negative true HTTP/API tests where a runtime endpoint exists, unit coverage must target 90%+ for meaningful business logic, frontend unit tests must be identifiable and must import/render real frontend components where a frontend exists, fullstack/web apps must prove frontend-to-backend behavior, and user-facing applications must include full E2E/platform coverage for the main user journeys unless a surface is genuinely not applicable
+- define README/runtime obligations that satisfy strict review: project type near the top, `docker compose up --build` as the primary startup command for container-supported deliveries, the legacy compatibility string `docker-compose up` without making it primary, access URL/port or platform launch method, verification method, auth/demo credentials for every role or the exact statement `No authentication required`, seeded data or empty-state statement, no manual runtime installs, no hidden `.env` dependency, mock/local/debug disclosures, and known limitations
 - make meaningful assumptions explicit
 - mark unresolved items only when a real decision is still needed
 - identify API/interface surfaces that should be captured in `./docs/api-spec.md`

package/assets/slopmachine/phase-1-design-template.md CHANGED

@@ -88,6 +88,8 @@ Cover where relevant: authentication, route authorization, object authorization,
 - Configuration model:
 - Persistent storage:
 - Seed/demo data need:
+- README startup/access expectation:
+- README auth/seed expectation:
 - Background jobs or scheduled work:
 - External integrations:
@@ -96,8 +98,10 @@ Cover where relevant: authentication, route authorization, object authorization,
 This is a design-level strategy, not an execution checklist.
 Required testing contract:
-- All API/interface endpoints must have test coverage for successful behavior and important negative/error cases. If there is no API/interface surface, state `Not Applicable` with the reason.
+- All API/interface endpoints must have true HTTP/API test coverage for successful behavior and important negative/error cases where a runtime endpoint exists. If a non-HTTP interface or accepted exception requires another proof layer, state the exception and replacement proof. If there is no API/interface surface, state `Not Applicable` with the reason.
 - Meaningful business logic must target 90%+ unit coverage. If a component cannot be unit-tested meaningfully, state the exception and the replacement proof layer.
+- Frontend unit tests must be identifiable by file pattern/framework evidence and must import or render real frontend components/modules when a frontend exists.
+- Fullstack or backend-backed frontend work must include proof that real frontend actions reach the intended backend/service behavior.
 - User-facing applications must have full E2E/platform coverage for the main user journeys, including success, validation/failure, and recovery states. If E2E/platform testing is not applicable, state why and what proof replaces it.
 | Surface / risk | Expected proof layer | Notes |
@@ -107,10 +111,24 @@ Required testing contract:
 | security boundaries |  |  |
 | API/interface behavior | endpoint tests for every endpoint, including positive and negative cases |  |
 | UI states / interactions |  |  |
-| integration paths |  |  |
+| frontend-to-backend integration paths |  |  |
 | unit coverage | 90%+ meaningful business-logic coverage |  |
+| frontend unit/component tests | identifiable tests importing/rendering real components/modules |  |
 | E2E/platform journeys | full main-journey coverage for user-facing apps |  |
+## 11.1 README / Runtime Gate Strategy
+| README/runtime gate | Required design outcome |
+|---|---|
+| project type | `backend`, `fullstack`, `web`, `android`, `ios`, or `desktop` near the top of README |
+| startup | primary `docker compose up --build` for container-supported deliveries; include legacy compatibility string `docker-compose up` without making it primary |
+| access | URL + port, emulator/device steps, or desktop launch steps |
+| verification | concrete API/UI/mobile/desktop verification method |
+| environment | Docker-contained or platform-contained setup; no manual runtime installs, manual DB setup, hidden `.env`, or secret-bearing examples |
+| auth | all demo credentials and roles, or exact `No authentication required` statement |
+| seed/demo data | seeded values and how to exercise them, or an empty-state statement |
+| mock/local/debug | truthful disclosure of mock, stub, local-data, or debug boundaries |
 ## 12. API Spec Handoff
 - API spec required: [yes/no]
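
Reviewer note: the README / Runtime Gate Strategy table added in this hunk lists string-level expectations that can be partly checked mechanically. The sketch below is a minimal illustration under assumptions: the `top_lines` window, the `credentials` keyword heuristic, and the function name are hypothetical, and the startup checks only make sense for container-supported deliveries.

```python
import re

# Project type values taken from the gate table in the diff above.
PROJECT_TYPES = ("backend", "fullstack", "web", "android", "ios", "desktop")


def check_readme(text, top_lines=15):
    """Return a dict of gate name -> bool for a README body.

    `top_lines` is an assumed definition of "near the top".
    """
    head = "\n".join(text.splitlines()[:top_lines]).lower()
    lower = text.lower()
    return {
        # One of the listed project types appears as a whole word near the top.
        "project_type_near_top": any(
            re.search(rf"\b{name}\b", head) for name in PROJECT_TYPES
        ),
        # Primary startup command for container-supported deliveries.
        "primary_startup": "docker compose up --build" in text,
        # Legacy compatibility string, which must be present but not primary.
        "legacy_string": "docker-compose up" in text,
        # Exact no-auth statement, or (assumed heuristic) a credentials section.
        "auth_statement": (
            "No authentication required" in text or "credentials" in lower
        ),
        "known_limitations": "known limitations" in lower,
    }


if __name__ == "__main__":
    sample = (
        "# Demo\n\nProject type: backend\n\n"
        "## Start\n\ndocker compose up --build\n\n"
        "Legacy: docker-compose up\n\n"
        "No authentication required\n\n"
        "## Known limitations\n\nNone.\n"
    )
    for gate, passed in check_readme(sample).items():
        print(f"{gate}: {'ok' if passed else 'MISSING'}")
```

A check like this only proves the strings are present. It does not show that the legacy string is kept secondary or that the documented credentials actually work, which the diff assigns to the final browser/API pass.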

package/assets/slopmachine/phase-2-execution-planning-prompt.md CHANGED

@@ -37,11 +37,11 @@ Create a practical implementation plan that can be translated into concise imple
 - Security work needs negative proof where relevant.
 - README/runtime/test obligations must be assigned.
 - Maintain a no-orphan ledger so no accepted requirement, clarification, API/interface, data object, actor path, security boundary, report/export/notification, README obligation, or test obligation disappears between design and implementation.
-- Include an API coverage matrix when APIs exist: endpoint/interface, expected implementation owner, true HTTP/API proof, mocked or unit-only exceptions, negative cases, and exact proof expectation.
+- Include an API coverage matrix when APIs exist: exact `METHOD + PATH` or interface, expected implementation owner, true no-mock HTTP/API proof where applicable, mocked or unit-only exceptions, negative cases, and exact proof expectation.
 - Include a FE-BE integration matrix for fullstack/backend-backed frontend work: frontend action, backend endpoint/service/job, payload/state input, response/side effect, UI states, and proof path.
 - Include a backend-to-frontend exposure check when backend capabilities exist: every prompt-relevant backend capability must have visible exposure or a specific accepted internal/API-only reason.
-- Include README/runtime gates: project type, startup/access, verification, auth/no-auth, seeded/empty-state, configuration/no-secret handling, test commands, known limitations, and mock/local-data disclosures.
-- Include coverage rigor: unit, API/interface, integration, E2E/platform, frontend component/state, security/negative, and final local verification expectations.
+- Include README/runtime gates that match the strict README audit: project type near the top, primary `docker compose up --build` for container-supported deliveries, legacy compatibility string `docker-compose up` without making it primary, startup/access, verification, auth/no-auth, all demo credentials/roles when auth exists, seeded values or empty-state statement, configuration/no-secret handling, no manual runtime installs or manual DB setup, test commands, known limitations, and mock/local-data/debug disclosures.
+- Include coverage rigor: 90%+ unit target for meaningful business logic, exact true no-mock HTTP/API endpoint tests, identifiable frontend unit tests that import/render real components/modules, fullstack FE-BE proof, E2E/platform proof, security/negative cases, and final local verification expectations.
 - Include module acceptance checks that prevent shell/demo completion: observable behavior, persisted state/artifact or UI/API outcome, relevant negative paths, tests, README impact, and integration evidence.
 ## Output Requirements
@@ -50,6 +50,8 @@ Use the provided plan template.
 `../.ai/plan.md` must be actionable enough to support one clean human implementation prompt at a time without exposing this file.
+The plan must be sequenced so the owner can prompt the developer/Claude lane one bounded slice at a time. Prefer one work package per prompt. Mark any pair that should be sent together only when two adjacent slices are tightly coupled and easier to verify together. Do not create a plan that requires sending all phases, all modules, or the full delivery workflow to the implementation lane at once.
 `../.ai/test-coverage.md`, when written, should summarize planned coverage by module, API/interface, UI flow, risk, and final verification need.
 If the design or API spec has a material contradiction, record it as a planning exception instead of silently rewriting the contract.

package/assets/slopmachine/phase-2-plan-template.md CHANGED

@@ -68,6 +68,8 @@ If APIs exist, every accepted endpoint/interface must have a row.
 |---|---|---|---|---|---|---|
 |  |  |  | yes/no |  |  |  |
+True HTTP/API proof means a test sends a request to the exact runtime route/interface and reaches the real handler/business logic without mocking transport, controllers, services, or providers used in the execution path. If this is not possible or not applicable, record the accepted exception and replacement proof.
 ## 7. Frontend / Interaction Execution Plan
 If not applicable, state `Not Applicable` with the accepted reason.
@@ -76,6 +78,8 @@ If not applicable, state `Not Applicable` with the accepted reason.
 |---|---|---|---|---|
 |  |  | loading / empty / submitting / disabled / success / error |  |  |
+Frontend unit tests must be directly detectable by the final audit: test files must use the project test framework and import or render actual frontend components/modules, not only backend utilities or package scripts.
 ## 7.1 FE-BE Integration Matrix
 Required for fullstack or backend-backed frontend work. If not applicable, state `Not Applicable` with the accepted reason.
@@ -101,11 +105,12 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
 ## 9. README / Runtime / Configuration Plan
 - README obligations:
-- Startup/access documentation:
+- Startup/access documentation, including primary `docker compose up --build` where container-supported and legacy compatibility string `docker-compose up` without making it primary:
 - Test documentation:
-- Auth/no-auth documentation:
-- Seed/demo data documentation:
-- Config and no-secret handling:
+- Auth/no-auth documentation, including all demo credentials and roles when auth exists or exact `No authentication required` statement:
+- Seed/demo data documentation, including every seeded value the reviewer should be able to exercise or an empty-state statement:
+- Config and no-secret handling, including no hidden `.env` dependency and no secret-bearing examples:
+- Environment constraints, including no manual runtime installs or manual DB setup for packaged verification:
 - Known limitations documentation:
 ## 9.1 README Gate Matrix
@@ -113,13 +118,13 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
 | README requirement | Expected content | Owning work package | Proof / review point |
 |---|---|---|---|
 | project type near top |  |  |  |
-| startup command |  |  |  |
+| startup command | primary `docker compose up --build` where applicable, plus legacy compatibility string `docker-compose up` not presented as primary |  |  |
 | access method |  |  |  |
 | verification method |  |  |  |
 | broad test command |  |  |  |
 | auth credentials or no-auth statement |  |  |  |
 | seeded data or empty-state statement |  |  |  |
-| configuration / no secrets / no env-file dependency |  |  |  |
+| configuration / no secrets / no env-file dependency / no manual installs |  |  |  |
 | mock/local-data/debug disclosure |  |  |  |
 | known limitations |  |  |  |
@@ -147,6 +152,8 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
 - Integration coverage expectation:
 - E2E/platform coverage expectation:
 - Frontend state/component coverage expectation:
+- Final browser/manual core-flow expectation:
+- README seeded-value/account verification expectation:
 - Known accepted exceptions:
 ## 11. Integration And Hardening Plan
@@ -174,9 +181,11 @@ For each module, acceptance requires:
 Translate these into human prompts; do not paste this section verbatim.
-| Sequence | Human prompt intent | Visible context to mention | Expected completion report |
-|---|---|---|---|
-| 1 |  |  |  |
+Each row should represent one bounded prompt by default. Mark a combined prompt only when two adjacent rows are tightly coupled and should be implemented/verified together. Never combine all phases, the full plan, or the whole delivery into one developer/Claude prompt.
+| Sequence | Human prompt intent | Visible context to mention | Expected completion report | Combine with adjacent row? |
+|---|---|---|---|---|
+| 1 |  |  |  | no |
 ## 13. Plan Closure Checklist

package/assets/slopmachine/templates/AGENTS.md CHANGED

@@ -42,6 +42,7 @@ This file contains product engineering rules for the current project.
 - Use `unit_tests/` for unit tests and `API_tests/` for API/integration HTTP tests when those surfaces exist.
 - Every implementation change should include tests for the behavior it owns. Target full meaningful coverage across unit, API/integration, and E2E/platform layers where those surfaces exist.
 - API/interface endpoints should have real positive and negative tests for exact behavior. User-facing flows should have E2E/platform coverage for the main journeys and important failure/recovery states.
+- API/interface tests should hit the real route/interface and real business logic without mocking transport/controllers/execution-path services unless there is a documented exception. Frontend unit tests should import or render real components/modules so coverage is directly reviewable.
 - Prefer the fastest meaningful targeted checks during ordinary implementation.
 - Never claim a command passed unless you actually ran it and saw the result.
 - If required verification cannot run in the current environment, report it as unverified with the exact risk.
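
Reviewer note: the rule added in this hunk requires API tests to hit the real route and real business logic without mocking transport, controllers, or execution-path services. The sketch below illustrates that shape using only the Python standard library. The `/items` route, `create_item`, and the in-memory store are hypothetical stand-ins, not code from the package.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def create_item(store, name):
    """Hypothetical business logic that the test must really reach."""
    if not name:
        raise ValueError("name is required")
    item = {"id": len(store) + 1, "name": name}
    store.append(item)
    return item


def make_handler(store):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the body first so the connection is drained on every path.
            length = int(self.headers.get("Content-Length", 0))
            raw = self.rfile.read(length)
            if self.path != "/items":
                return self._send(404, {"error": "not found"})
            try:
                payload = json.loads(raw or b"{}")
                if not isinstance(payload, dict):
                    raise ValueError("body must be a JSON object")
                item = create_item(store, payload.get("name"))
            except ValueError as exc:
                return self._send(400, {"error": str(exc)})
            self._send(201, item)

        def _send(self, status, body):
            data = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args):
            pass  # keep test output quiet

    return Handler


def post(base_url, path, payload):
    """Send a real HTTP POST; no proxy, no mocked transport."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def run_checks():
    store = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(store))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        ok = post(base_url, "/items", {"name": "widget"})    # positive case
        bad = post(base_url, "/items", {"name": ""})         # invalid input
        missing = post(base_url, "/nope", {"name": "x"})     # not found
    finally:
        server.shutdown()
        server.server_close()
    return ok, bad, missing, store


if __name__ == "__main__":
    print(run_checks())
```

The checks assert on both the response and the side effect in `store`, which matches the checklist's requirement that tests prove behavior rather than status codes detached from state.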

package/assets/slopmachine/templates/CLAUDE.md CHANGED

@@ -42,6 +42,7 @@ This file contains product engineering rules for the current project.
 - Use `unit_tests/` for unit tests and `API_tests/` for API/integration HTTP tests when those surfaces exist.
 - Every implementation change should include tests for the behavior it owns. Target full meaningful coverage across unit, API/integration, and E2E/platform layers where those surfaces exist.
 - API/interface endpoints should have real positive and negative tests for exact behavior. User-facing flows should have E2E/platform coverage for the main journeys and important failure/recovery states.
+- API/interface tests should hit the real route/interface and real business logic without mocking transport/controllers/execution-path services unless there is a documented exception. Frontend unit tests should import or render real components/modules so coverage is directly reviewable.
 - Prefer the fastest meaningful targeted checks during ordinary implementation.
 - Never claim a command passed unless you actually ran it and saw the result.
 - If required verification cannot run in the current environment, report it as unverified with the exact risk.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "theslopmachine",
-  "version": "1.0.11",
+  "version": "1.0.13",
   "description": "SlopMachine installer and project bootstrap CLI",
   "license": "MIT",
   "type": "module",