theslopmachine 1.0.11 → 1.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -56,6 +56,8 @@ All communication, code comments, docs, tests, and user-facing strings you add m
  - Tests should prove behavior and side effects, not only existence or rendering.
  - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
  - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
+ - API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
+ - Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
  - Include negative and boundary coverage when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
  - For frontend work, test loading, empty, submitting, disabled, success, error, and re-entry states when those states are relevant.
  - For backend-backed frontend work, verify the frontend uses the real client/API path and the backend performs real handler/service/data work.
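The no-mock rule for API/integration tests can be illustrated with a small Python standard-library sketch. It is a hypothetical example, not code from this package: the `/items/<id>` route, the `ITEMS` store, and the bearer tokens are invented for illustration. The test starts a real HTTP server on an ephemeral port and sends real requests over the socket, so routing, authentication, ownership, and handler logic all execute, and both positive and negative cases are checked.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical business data and auth; a real project would use its own service/data layer.
ITEMS = {"1": {"id": "1", "owner": "alice", "name": "First item"}}
TOKENS = {"token-alice": "alice"}
# Bypass any environment proxy so requests really hit the local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ItemHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        token = self.headers.get("Authorization", "").removeprefix("Bearer ")
        user = TOKENS.get(token)
        if user is None:
            return self._send(401, {"error": "unauthenticated"})
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "items":
            return self._send(404, {"error": "not found"})
        item = ITEMS.get(parts[1])
        # Ownership check: another user's item is reported as not found.
        if item is None or item["owner"] != user:
            return self._send(404, {"error": "not found"})
        return self._send(200, item)

    def _send(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging during tests
        pass


def http_get(base, path, token=None):
    """Real HTTP GET over a socket: no mocked transport, controller, or service."""
    req = urllib.request.Request(base + path)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with _OPENER.open(req, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def run_endpoint_checks():
    server = ThreadingHTTPServer(("127.0.0.1", 0), ItemHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return {
            "owner_reads_item": http_get(base, "/items/1", "token-alice"),
            "unauthenticated": http_get(base, "/items/1"),
            "not_found": http_get(base, "/items/999", "token-alice"),
        }
    finally:
        server.shutdown()
        server.server_close()
```

In a real project the server would be the delivered app under its own framework and test runner; only the shape carries over: real transport, real handler, and assertions on both status and body.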
@@ -211,6 +211,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Preserve reports, extract complete issue sets, and route fixes in broad human language.
  - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
  - Complete only when the coverage/README audit passes with at least 90% test score.
+ - Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active Claude lane before this phase closes.

  ### Phase 6: Final Readiness Decision
 
@@ -219,8 +220,9 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Run final runtime and test checks appropriate to the project.
  - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
  - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
+ - Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
  - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active Claude lane in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
- - If the owner makes a direct safe fix, send a minimal note to the active Claude lane describing the changed surface and ask it to inspect/acknowledge before continuing.
+ - Route final reconciliation work to the active Claude lane whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active Claude lane describing the changed surface and ask it to inspect/acknowledge before continuing.
  - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
  - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
 
@@ -232,6 +234,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Include only package docs: `docs/questions.md`, `docs/design.md`, and `docs/api-spec.md` when applicable.
  - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
  - Run final package boundary checks before closing.
+ - If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.
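A sketch of how an owner might decide which checks a late change invalidates, assuming a hypothetical mapping from repo paths to checks. The patterns and check labels are illustrative assumptions, not part of the package. `fnmatch.fnmatchcase` is used because its `*` also matches across `/`, so one pattern covers nested seed or config files.

```python
import fnmatch

DOCKER = "docker compose up --build"
TESTS = "./repo/run_tests.sh"
MANUAL = "browser/API seeded-value checks"

# Hypothetical rules: path pattern -> checks a change there could invalidate.
# fnmatchcase's "*" also matches "/", so "repo/*seed*" covers nested seed files.
RERUN_RULES = [
    ("repo/README.md", {DOCKER, MANUAL}),
    ("repo/docker-compose.yml", {DOCKER, MANUAL}),
    ("repo/init_db.sh", {DOCKER, MANUAL}),
    ("repo/*seed*", {DOCKER, MANUAL}),
    ("repo/*config*", {DOCKER, MANUAL}),
    ("repo/*", {TESTS}),  # catch-all: any product-repo change reruns the broad test wrapper
]


def checks_to_rerun(changed_paths):
    """Return the sorted list of checks to rerun before packaging closes."""
    needed = set()
    for path in changed_paths:
        for pattern, checks in RERUN_RULES:
            if fnmatch.fnmatchcase(path, pattern):
                needed |= checks
    return sorted(needed)
```

With these rules, paths outside `repo/`, such as workflow-private `../.ai` files, trigger no reruns.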
 
  ### Phase 8: Retrospective
 
@@ -248,7 +251,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - API/integration HTTP tests belong under `API_tests/` where that convention exists.
  - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
  - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
- - README must truthfully document startup, tests, configuration, access, demo credentials or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, and known limitations.
+ - README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.
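The README requirement above can be approximated by a heuristic lint. This Python sketch assumes markdown headings with hypothetical names (Startup, Tests, Configuration, Access, Demo credentials, Seeded data, Known limitations) and checks for the two exact fallback strings. It cannot confirm that the README is truthful; that still requires running the documented commands.

```python
import re

NO_AUTH = "No authentication required"
NO_SEED = "No seeded data required; the app is useful from an empty state."
# Hypothetical heading keywords; a real audit would use the project's own README gate matrix.
REQUIRED_SECTIONS = ["startup", "tests", "configuration", "access", "known limitations"]


def readme_gate_failures(readme, top_lines=10):
    """Heuristic README gate lint; returns a list of human-readable failures."""
    lines = readme.splitlines()
    headings = [line.lstrip("#").strip().lower() for line in lines if line.startswith("#")]
    failures = []
    if not any("project type" in line.lower() for line in lines[:top_lines]):
        failures.append("project type is not stated near the top")
    for section in REQUIRED_SECTIONS:
        if not any(section in heading for heading in headings):
            failures.append(f"missing section: {section}")
    # Either real credentials/roles are documented or the exact no-auth line is present.
    if not any("demo credentials" in h for h in headings) and NO_AUTH not in readme:
        failures.append(f"needs demo credentials and roles, or exactly {NO_AUTH!r}")
    # Either seeded values are documented or the exact empty-state statement is present.
    if not any("seeded data" in h for h in headings) and NO_SEED not in readme:
        failures.append("needs seeded data values, or the exact empty-state statement")
    if not re.search(r"\b(mock|local|debug)\b", readme, re.IGNORECASE):
        failures.append("no mock/local/debug boundary disclosure")
    return failures
```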
 
  ## Evidence Discipline
 
@@ -178,6 +178,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Preserve reports, extract complete issue sets, and route fixes in broad human language.
  - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
  - Complete only when the coverage/README audit passes with at least 90% test score.
+ - Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active lane before this phase closes.

  ### Phase 6: Final Readiness Decision
 
@@ -186,8 +187,9 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Run final runtime and test checks appropriate to the project.
  - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
  - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
+ - Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
  - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active developer session in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
- - If the owner makes a direct safe fix, send a minimal note to the active developer session describing the changed surface and ask it to inspect/acknowledge before continuing.
+ - Route final reconciliation work to the active developer session whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active developer session describing the changed surface and ask it to inspect/acknowledge before continuing.
  - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
  - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
 
@@ -199,6 +201,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - Include only package docs: `docs/questions.md`, `docs/design.md`, and `docs/api-spec.md` when applicable.
  - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
  - Run final package boundary checks before closing.
+ - If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.

  ### Phase 8: Retrospective
 
@@ -215,7 +218,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
  - API/integration HTTP tests belong under `API_tests/` where that convention exists.
  - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
  - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
- - README must truthfully document startup, tests, configuration, access, demo credentials or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, and known limitations.
+ - README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.

  ## Evidence Discipline
 
@@ -42,6 +42,8 @@ All communication, code comments, docs, tests, and user-facing strings you add m
  - Tests must prove behavior and side effects, not only existence or rendering.
  - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
  - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
+ - API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
+ - Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
  - Cover negative and boundary paths when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
  - For frontend work, test loading, empty, submitting, disabled, success, error, and re-entry states when those states are relevant.
  - For backend-backed frontend work, verify the frontend uses the real client/API path and the backend performs real handler/service/data work.
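One way to make frontend unit tests "directly detectable" is to check, per test file, which real source modules it imports. This Python sketch assumes `*.test.*`/`*.spec.*` file naming, single-line ES `import ... from` statements, and a repo represented as a path-to-content map; all of these are assumptions, and a real check would follow the project's own test runner configuration.

```python
import posixpath
import re

# Assumed conventions: *.test.* / *.spec.* test files and single-line ES imports.
TEST_FILE = re.compile(r"\.(test|spec)\.[jt]sx?$")
IMPORT_FROM = re.compile(r"""^\s*import\s.*?\sfrom\s+['"]([^'"]+)['"]""", re.MULTILINE)
SOURCE_EXTS = (".tsx", ".ts", ".jsx", ".js")


def resolve(test_path, spec, files):
    """Resolve a relative import to a non-test source file in the repo map, or None."""
    if not spec.startswith("."):
        return None  # package imports (react, vitest, ...) are not project modules
    base = posixpath.normpath(posixpath.join(posixpath.dirname(test_path), spec))
    candidates = [base]
    candidates += [base + ext for ext in SOURCE_EXTS]
    candidates += [base + "/index" + ext for ext in SOURCE_EXTS]
    for candidate in candidates:
        if candidate in files and not TEST_FILE.search(candidate):
            return candidate
    return None


def detect_frontend_unit_tests(files):
    """Map each detectable test file to the real frontend modules it imports."""
    report = {}
    for path, text in files.items():
        if TEST_FILE.search(path):
            targets = (resolve(path, spec, files) for spec in IMPORT_FROM.findall(text))
            report[path] = sorted(t for t in targets if t is not None)
    return report
```

A test file mapped to an empty list is detectable by name but imports no real frontend module, which is the gap the rule above targets.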
@@ -50,7 +50,7 @@ Do not say `the review found`, `the evaluation found`, or `the audit found`. The
  ## Development Sequence

  1. **Scaffold first.**
- - Establish the framework/runtime/test/README baseline.
+ - Establish the framework/runtime/test/README baseline, including the strict README gates that final review will expect.
  - Keep it free of project-specific business logic except the minimum proof surface needed to verify the stack is wired.
  - Use `scaffold-guidance` and scaffold playbooks privately to shape the prompt.
 
@@ -100,11 +100,12 @@ For each scaffold/module, check:
  - no-orphan ledger items assigned to the module are closed
  - project-specific behavior is real, not placeholder/shell/demo-only behavior
  - tests exist for the implemented behavior or a concrete exception is recorded
- - planned API/interface proof is present when the module owns endpoints/interfaces
+ - planned API/interface proof is present when the module owns endpoints/interfaces, with true no-mock HTTP/API endpoint tests where applicable
+ - frontend unit tests are directly detectable and import/render real frontend components/modules when the module owns frontend behavior
  - planned FE-BE proof is present when the module crosses frontend/backend boundaries
  - failure, validation, authorization, ownership, empty, loading, error, and duplicate/re-entry cases are covered where relevant
  - frontend/backend wiring is real where applicable
- - README changes match delivered runtime, commands, auth/no-auth, seed/demo data, and verification behavior
+ - README changes match delivered runtime, commands, auth/no-auth, seed/demo data, verification behavior, mock/local/debug boundaries, and strict startup/access gates
  - targeted checks ran or were clearly blocked

  ## Internal Plan Alignment
@@ -154,9 +154,9 @@ After the new reconciliation lane is established:
  2. Send `test-coverage-prompt.md` verbatim.
  3. Require `./.tmp/test_coverage_and_readme_audit_report.md`.
  4. Read the generated report.
- 5. Require an overall Pass. Pass with caveats is acceptable.
+ 5. Require an overall Pass. Pass with caveats is acceptable only when caveats are explicit, bounded, non-blocking, and do not contradict README hard gates or required coverage surfaces.
  6. Require at least 90% test score.
- 7. If verdict is not Pass or test score is below 90%, extract all missing items and send them to the reconciliation lane in broad human language.
+ 7. If verdict is not Pass or Pass with acceptable caveats, if test score is below 90%, or if the report identifies README hard-gate failures, mocked endpoint coverage presented as true API coverage, missing frontend unit tests for web/fullstack, or missing FE-BE proof, extract all missing items and send them to the reconciliation lane in broad human language.
  8. After fixes, start a new evaluator session and send the same full verbatim test coverage/README prompt again.
  9. Repeat until verdict is Pass or Pass with caveats and test score is at least 90%.
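The verdict rules in steps 5 to 9 can be expressed as a single gate function. This is a hypothetical Python sketch: the field names, the finding labels in `BLOCKING_FINDINGS`, and the `Caveat` flags are assumptions standing in for whatever the audit report actually contains. An empty result means the loop can stop; a non-empty result is the list of items to route to the reconciliation lane.

```python
from dataclasses import dataclass, field

MIN_TEST_SCORE = 90.0
# Hypothetical labels for report findings that block closure even under a Pass verdict.
BLOCKING_FINDINGS = {
    "readme_hard_gate_failure",
    "mocked_endpoint_counted_as_true_api",
    "missing_frontend_unit_tests",
    "missing_fe_be_proof",
}


@dataclass
class Caveat:
    text: str
    bounded: bool = True
    blocking: bool = False


@dataclass
class CoverageAudit:
    verdict: str  # e.g. "Pass", "Pass with caveats", "Fail"
    test_score: float  # percentage reported by the audit
    caveats: list = field(default_factory=list)
    findings: set = field(default_factory=set)


def items_to_route(audit):
    """Return missing items for the reconciliation lane; an empty list means the gate passes."""
    items = []
    if audit.verdict not in ("Pass", "Pass with caveats"):
        items.append(f"verdict is {audit.verdict!r}, not Pass")
    if audit.verdict == "Pass with caveats":
        for caveat in audit.caveats:
            # Caveats must be explicit, bounded, and non-blocking to be acceptable.
            if not caveat.text.strip() or not caveat.bounded or caveat.blocking:
                items.append(f"unacceptable caveat: {caveat.text!r}")
    if audit.test_score < MIN_TEST_SCORE:
        items.append(f"test score {audit.test_score:g}% is below {MIN_TEST_SCORE:g}%")
    items.extend(sorted(audit.findings & BLOCKING_FINDINGS))
    return items
```

Each fresh evaluator report would be run through the same gate, so a Pass verdict that still carries a blocking finding keeps the loop open.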
 
@@ -52,6 +52,7 @@ Check:
  - README gate matrix
  - risk/negative coverage matrix
  - runtime/test/config consistency
+ - strict README gates: project type near the top, Docker/startup/access/verification commands, auth/no-auth, every documented demo credential/role and seeded value, mock/local/debug disclosure, and no hidden manual setup
  - security, authorization, ownership, validation, and data integrity
  - placeholder, shell, fake-success, disconnected UI, or static-demo behavior
 
@@ -117,6 +118,7 @@ Rules:
  - Use the startup commands and expected flows supplied at the end of development.
  - Verify the app starts locally when feasible.
  - Verify key expected flows manually/API-wise/platform-wise as appropriate.
+ - For browser-accessible apps, manually exercise representative core prompt requirements and every README-listed seeded/demo account, role, and seeded value where feasible. Record any unverified surface and route failures to the bugfix lane.
  - Run relevant unit/API/integration/E2E/platform checks locally when available.
  - If a command or local runtime check cannot run, record the exact blocker and risk.
  - Any issue found goes back to the bugfix lane in human language.
@@ -36,14 +36,16 @@ Use these D1-D9 buckets for major issue classification:
 
  - Run the broad product test wrapper `./repo/run_tests.sh` when it exists and is applicable.
  - Run final runtime verification before packaging: `docker compose up --build` for web/backend/fullstack/container-supported projects, native/platform-equivalent startup for mobile/desktop projects, or a recorded not-applicable reason.
- - Use `agent-browser` for manual functionality verification where browser-accessible UI exists.
- - Exercise every relevant seeded/demo account and role/state where credentials or seeded data exist.
+ - Use `agent-browser` for manual functionality verification where browser-accessible UI exists. The browser pass must walk the core prompt requirements and main user journeys, not just confirm that the app loads.
+ - Exercise every relevant seeded/demo account, role/state, and README-listed seeded value. Confirm that documented credentials, seeded records, examples, IDs, statuses, roles, permissions, and expected default states are present, usable, and consistent with README claims.
+ - For backend/API-only projects, replace browser checks with equivalent API/manual checks for every README-listed credential, seeded value, role/state, and core requirement.
  - If any final runtime, test, browser, account, or platform check cannot run, readiness cannot be `Pass` unless the user explicitly risk-accepts the unverified surface.

  ## Failure Routing Loop

  - Phase 6 is the primary green gate for broad Docker/runtime and `./repo/run_tests.sh` verification.
- - If `docker compose up --build`, native/platform startup, browser checks, account checks, or `./repo/run_tests.sh` fails, do not move to packaging.
+ - Final reconciliation work belongs in the currently active developer/Claude implementation lane whenever it is more than a tiny, safe owner-side edit. Route product behavior, tests, README/runtime drift, Docker/runtime failures, browser/account issues, and coverage gaps to that lane in broad human language.
+ - If `docker compose up --build`, native/platform startup, browser/API manual checks, account/seeded-value checks, or `./repo/run_tests.sh` fails, do not move to packaging.
  - Route the failure to the currently active developer/Claude implementation lane in broad human language: describe the failing behavior, command, and user-visible/runtime impact without exposing evaluator or owner-private mechanics.
  - After the lane reports a fix, the owner verifies the changed surface and reruns the failed check.
  - Repeat fix, verify, and rerun until the check is green, not applicable for a documented reason, or explicitly risk-accepted by the user.
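The fix, verify, and rerun loop can be sketched as a small driver. The callables are hypothetical hooks: `check` runs the failed command and returns `(ok, failure_text)`, while the others stand in for routing to the lane, owner verification, and asking the user about risk acceptance. `max_rounds` is an added safety bound that the workflow itself does not define; hitting it escalates to the user instead of passing.

```python
def run_until_green(check, route_to_lane, verify_fix, user_risk_accepts, max_rounds=5):
    """Drive the fix -> verify -> rerun loop for one failed check.

    check() -> (ok, failure_text); the other arguments are hypothetical hooks.
    """
    ok, failure = check()
    rounds = 0
    while not ok and rounds < max_rounds:
        route_to_lane(failure)  # broad human-language description for the active lane
        verify_fix(failure)  # owner inspects the changed surface
        ok, failure = check()  # rerun the same check that failed
        rounds += 1
    if ok:
        return "green"
    if user_risk_accepts(failure):
        return "risk-accepted"
    return "blocked: do not move to packaging"
```

Returning a blocked outcome when the bound is hit keeps the sketch consistent with the rule that a failing or unverified surface never reaches packaging on its own.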
@@ -51,7 +53,8 @@ Use these D1-D9 buckets for major issue classification:
 
  ## Owner Direct Fixes

- - The owner may directly fix narrow docs, wrapper, config, cleanup, or light glue issues when the fix is safe and does not require product-design judgment.
+ - The owner may directly fix only minor, safe docs, wrapper, config, cleanup, or light glue issues when the change does not require product-design judgment, new tests, behavioral changes, or non-trivial debugging.
+ - If the reconciliation issue is large enough to need real implementation work, meaningful test updates, runtime debugging, README/runtime restructuring, or product judgment, do not fix it owner-side. Send it to the currently active developer/Claude lane.
  - After any direct owner fix, send a minimal note to the currently active developer/Claude lane describing the changed surface and ask it to inspect/acknowledge the change before readiness continues.
  - The note should be concise and developer-facing, not a workflow report.
  - Still rerun the affected command or check after acknowledgement.
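The owner-side boundary above reduces to a simple decision rule. The sketch below is illustrative only; the area names and the boolean flags on `FixRequest` are hypothetical labels for the conditions listed in the text.

```python
from dataclasses import dataclass

# Hypothetical labels for the kinds of owner-side edits the text allows.
OWNER_SAFE_AREAS = {"docs", "wrapper", "config", "cleanup", "glue"}


@dataclass
class FixRequest:
    area: str
    needs_product_judgment: bool = False
    needs_test_changes: bool = False
    changes_behavior: bool = False
    needs_runtime_debugging: bool = False
    restructures_readme_or_runtime: bool = False


def route_fix(req):
    """Decide whether the owner may fix directly or must route to the active lane."""
    escalate = (
        req.needs_product_judgment
        or req.needs_test_changes
        or req.changes_behavior
        or req.needs_runtime_debugging
        or req.restructures_readme_or_runtime
    )
    if req.area in OWNER_SAFE_AREAS and not escalate:
        return "owner fix, then minimal note to active lane and rerun the check"
    return "route to active developer/Claude lane"
```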
@@ -25,7 +25,8 @@ Accept `./docs/design.md` only if it:
  - defines modules as product/system responsibilities, not file-by-file work packets
  - handles auth, authorization, ownership/isolation, validation, logging/redaction, admin/debug boundaries, and sensitive data where relevant
  - defines frontend states and FE-BE expectations where relevant
- - visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, API/interface tests for every endpoint with positive and negative cases, and full E2E/platform coverage for main user journeys in user-facing apps
+ - visibly defines the testing contract in the design itself: 90%+ unit coverage target for meaningful business logic, true HTTP/API tests for every runtime endpoint with positive and negative cases, identifiable frontend unit tests that import/render real components/modules where a frontend exists, fullstack FE-BE proof, and full E2E/platform coverage for main user journeys in user-facing apps
+ - defines strict README/runtime obligations: project type near the top, primary `docker compose up --build` for container-supported deliveries, legacy compatibility string `docker-compose up` without making it primary, access and verification method, all auth/demo credentials and roles or exact `No authentication required`, seeded data values or empty-state statement, no manual runtime installs/manual DB setup/hidden `.env` dependency, mock/local/debug disclosures, and known limitations
  - gives explicit not-applicable reasons and replacement proof layers for any missing unit/API/E2E coverage surface
  - avoids vague placeholders such as `TBD`, `later`, `standard CRUD`, `normal auth`, or `basic tests` for correctness-critical behavior
 
@@ -55,11 +56,13 @@ Accept `../.ai/plan.md` only if it is strong enough for the owner to drive devel
  - security execution obligations
  - API/interface implementation and proof obligations
  - API coverage matrix when APIs exist, including true HTTP/API proof and exception rationale
+ - frontend unit-test detectability when frontend exists: direct test files, framework evidence, and imports/renders of real frontend components/modules
  - frontend state and integration obligations where applicable
  - FE-BE integration matrix and backend-to-frontend exposure check where applicable
  - README/runtime/test obligations
  - README gate matrix covering startup, access, verification, auth/no-auth, seeded/empty-state, config/no-secret handling, mock/local-data disclosure, and known limitations
- - test coverage map with unit/API/integration/E2E or platform-equivalent expectations
+ - README gate matrix covering strict audit requirements: project type near top, `docker compose up --build`, legacy `docker-compose up` string, startup, access, verification, auth/no-auth, seeded values/empty-state, config/no-secret/no-hidden-env/no-manual-install handling, mock/local/debug disclosure, and known limitations
+ - test coverage map with unit/API/integration/E2E or platform-equivalent expectations, including true no-mock HTTP/API endpoint proof where applicable
  - risk/negative coverage matrix for validation, authorization, ownership/isolation, empty/not-found, duplicate/conflict, re-entry, and sensitive-data leakage where relevant
  - final integrated verification and readiness preparation
 
@@ -45,7 +45,7 @@ Adjust the exact wording to the project. Do not over-format the message.
  - product repo root `./repo/run_tests.sh` when required by the project contract
  - runtime/Docker files when relevant, wired honestly for later verification
  - database/bootstrap/seed path when the product will require seeded data or persistent storage
- - README baseline with project type, stack, startup/access, verification, auth/no-auth, seeded/empty-state note, and repo layout
+ - README baseline with project type near the top, stack, primary startup/access command, legacy `docker-compose up` compatibility string where applicable, verification method, auth/no-auth, seeded/empty-state note, mock/local/debug disclosures, known limitations, and repo layout
  - no committed secrets, `.env`, `.env.example`, hidden host setup, no-op tests, or fake-success integration paths

  ## Scaffold Should Not Deliver
@@ -39,7 +39,8 @@ Packaging must reject or remove stale workflow notes and scratch execution artif
 
  ## Packaging Checks

- - Confirm README, scripts, config, routes, docs, tests, and runtime instructions agree.
+ - Confirm README, scripts, config, routes, docs, tests, browser/API manual evidence, and runtime instructions agree.
+ - Confirm every README-listed demo credential, role, seeded value, documented example, and expected default state was verified in the final runtime/browser/API pass or explicitly risk-accepted by the user.
  - Confirm kept evaluation reports remain immutable evidence under `.tmp`, and that failed/stale/superseded reports are archived unchanged outside final `.tmp`.
  - Confirm Claude/session handoff artifacts are outside the product package path.
  - Confirm task-root rulebooks/settings are stripped from the final submission package when the packaging flow requires a product-only handoff.
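The packaging check on README-listed values amounts to a set difference between what the README promises and what the final pass exercised or the user risk-accepted. This Python sketch assumes a hypothetical `| Role | Username | Password |` markdown table for credentials; seeded values are added by hand in the usage because their README format varies by project.

```python
def parse_credential_rows(readme):
    """Extract ("credential", "role:username") items from a | Role | Username | Password | table."""
    items = []
    for line in readme.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0].lower() == "role" or set(cells[0]) <= set("-: "):
            continue  # skip non-credential rows, the header row, and the separator row
        items.append(("credential", f"{cells[0]}:{cells[1]}"))
    return items


def unresolved(listed, verified, risk_accepted):
    """README-listed items neither exercised in the final pass nor risk-accepted by the user."""
    return [item for item in listed if item not in verified and item not in risk_accepted]
```

Packaging would close only when `unresolved(...)` returns an empty list.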
@@ -55,7 +56,9 @@ Packaging must reject or remove stale workflow notes and scratch execution artif
  ## Final Runtime And Test Confirmation

  - Phase 7 owns the final Docker/runtime confirmation and dockerized broad `./repo/run_tests.sh` confirmation when those commands are part of the delivered contract or when late fixes/packaging changes could affect runtime/test behavior.
+ - Phase 7 also owns final browser/API manual confirmation when late fixes, README edits, cleanup, package boundary changes, or seed/config changes could affect user-visible behavior or documented seeded values.
  - If `./repo/README.md` documents `docker compose up --build` or `./repo/run_tests.sh`, treat those as package contract commands, not aspirational notes.
+ - If `./repo/README.md` documents demo accounts, roles, seeded data, example IDs, default statuses, or verification flows, treat those as package contract values that must be exercised through `agent-browser` or API/platform-equivalent checks before closure.
  - Fix owner-side Docker/config/wrapper/README/docs/light-script glue directly when safe; route real product-code or test-file defects back through the appropriate developer fix lane before packaging closes.
  - Never imply unrun Docker, runtime, browser, native/platform, or broad test commands passed.
  - End Docker verification with project-specific cleanup unless the user explicitly wants containers left running.
@@ -99,6 +102,7 @@ Phase 7 can close only when:
  - final package structure satisfies the allowlist;
  - stale visible execution artifacts are absent;
  - README, docs, scripts, config, routes, tests, audit artifacts, and repo behavior no longer contradict one another;
+ - README-listed credentials, roles, seeded values, examples, default states, and verification flows have been exercised or explicitly risk-accepted;
  - runtime/test/package commands that were required have run or are explicitly risk-accepted by the user;
  - `.tmp` contains the final kept report set and no stale superseded reports;
  - session exports are complete and outside the package root;
@@ -215,10 +215,10 @@ Expected result:
  If `init_db.sh` is part of the standard test bootstrap, document that relationship clearly.

  ### Local verification harness
- - Document the separate local verification command(s) used for ordinary development and readiness checks.
+ - Document the separate local verification command(s) used for ordinary development and readiness checks only if they do not become required reviewer setup.
  - Make clear that these local verification commands are distinct from the dockerized `./repo/run_tests.sh` broad test path.
  - Use the real stack-native local suite for the chosen language/framework where applicable, for example Vitest, Jest, PHPUnit, pytest, go test, cargo test, or another framework-native equivalent.
- - If that local suite needs machine-level installation or setup, document that clearly in the local verification notes.
+ - Do not require reviewers to run manual installs or machine-level setup for the standard packaged verification path.

  ### Test entry points
  - Unit tests: `[command/path]`
@@ -18,7 +18,7 @@ Reject only for material defects that would mislead development, evaluation, or
  - [ ] `../.ai/plan.md` captures owner-private workstreams, module slices, tests, runtime rules, security obligations, and packaging checks.
  - [ ] `../.ai/plan.md` contains a no-orphan ledger mapping every accepted requirement, clarification, design trace row, API route, actor path, data object, security boundary, report/export/notification, and documentation obligation to a module/workstream and proof path.
  - [ ] `../.ai/plan.md` defines scaffold first, ordered module packets, owned files/tests, shared-file boundaries, FE<->BE/API proof, verification commands, completion checklist, and development-exit proof.
- - [ ] `../.ai/test-coverage.md` exists when meaningful coverage mapping is applicable and maps requirements/risks/API endpoints/frontend flows to planned tests, assertions, current status, and gaps.
+ - [ ] `../.ai/test-coverage.md` exists when meaningful coverage mapping is applicable and maps requirements/risks/API endpoints/frontend flows to planned tests, assertions, current status, and gaps, including true no-mock HTTP/API classification and frontend unit-test detectability where applicable.
  - [ ] Private plan slices can be translated into normal developer prompts.
  - [ ] Developer prompts do not ask workers to read private workflow files.
 
@@ -32,6 +32,7 @@ Reject only for material defects that would mislead development, evaluation, or
  - [ ] A separate stack-native local harness exists for development/Phase 4, or the missing harness is explicitly user risk-accepted.
  - [ ] Tests prove behavior and side effects, not only route existence, component existence, mocked client returns, or status codes detached from state/artifact effects.
  - [ ] Fullstack/backend-backed frontend flows have real FE<->BE proof, not only separate backend and frontend tests.
+ - [ ] Web/fullstack frontend unit tests are directly detectable and import/render real frontend components/modules.

  ## Development Completion
 
@@ -43,7 +44,7 @@ Reject only for material defects that would mislead development, evaluation, or
  ## Phase 4 And Phase 5

  - [ ] Phase 4 runs all available relevant tests except broad commands that require explicit user approval, asks the bugfix lane for verification guidance where useful, manually exercises relevant runtime/account surfaces, runs internal owner self-test cycles for issue discovery, and routes issues back to the bugfix lane.
- - [ ] Every provided seeded/demo account and every relevant role/state has been exercised or the unverified surface is explicitly user risk-accepted.
+ - [ ] Every provided seeded/demo account, README-listed seeded value, example ID/status, and every relevant role/state has been exercised or the unverified surface is explicitly user risk-accepted.
  - [ ] Phase 5 uses fresh evaluator sessions for full self-test audits, evaluator subagent only, full prompt packets verbatim, no rerun footer, immutable reports, one bugfix/fix-check lane for issues from both final self-test audits, and same-evaluator scoped fix-check only for kept Partial Pass reports.
  - [ ] Every evaluator finding and recommendation is fixed and verified or explicitly risk-accepted by the user before Phase 5 closes.
  - [ ] Coverage/README/final reconciliation uses a dedicated developer session after Phase 5 findings.
@@ -52,7 +53,7 @@ Reject only for material defects that would mislead development, evaluation, or
52
53
 
53
54
  - [ ] Final runtime verification has run before packaging, or the unrun surface is explicitly risk-accepted.
54
55
  - [ ] `./repo/run_tests.sh` has run where applicable before packaging, or the unrun surface is explicitly risk-accepted.
55
- - [ ] `agent-browser` manual functionality verification has run for browser-accessible UI before packaging, or the unrun surface is explicitly risk-accepted.
56
+ - [ ] `agent-browser` manual functionality verification has run through core prompt requirements, main user journeys, README-listed seeded values, demo credentials, and role/state behavior for browser-accessible UI before packaging, or the unrun surface is explicitly risk-accepted.
56
57
  - [ ] D1-D9 readiness categories are pass, not applicable, or explicitly risk-accepted.
57
58
  - [ ] Final docs, reports, repo state, and package-root expectations agree.
58
59
 
@@ -60,7 +61,7 @@ Reject only for material defects that would mislead development, evaluation, or
60
61
 
61
62
  - [ ] Final package root and docs allowlist are correct.
62
63
  - [ ] Workflow-private artifacts stay outside the product package.
63
- - [ ] README, scripts, config, tests, and runtime instructions agree.
64
+ - [ ] README, scripts, config, tests, runtime instructions, browser/API manual evidence, and seeded/demo values agree.
64
65
  - [ ] Stale workflow notes and scratch execution artifacts are absent from final package.
65
66
  - [ ] `.tmp` contains final kept audit/fix-check/coverage reports only, with stale failed/superseded reports archived outside the package.
66
67
  - [ ] `repo/docker-compose.yml`, `repo/run_tests.sh`, and `repo/init_db.sh` where applicable match README claims.
@@ -23,7 +23,8 @@ The design must:
23
23
  - preserve the original business goal and required user outcomes
24
24
  - incorporate accepted clarifications and requirements without narrowing them
25
25
  - identify the project type, stack, actors, roles, main flows, modules, data, UI/API surfaces, security boundaries, assumptions, and verification strategy
26
- - define the testing contract as part of the visible design: every API/interface endpoint must have positive and negative tests, unit coverage must target 90%+ for meaningful business logic, and user-facing applications must include full E2E/platform coverage for the main user journeys unless a surface is genuinely not applicable
26
+ - define the testing contract as part of the visible design: every API/interface endpoint must have positive and negative true HTTP/API tests where a runtime endpoint exists, unit coverage must target 90%+ for meaningful business logic, frontend unit tests must be identifiable and must import/render real frontend components where a frontend exists, fullstack/web apps must prove frontend-to-backend behavior, and user-facing applications must include full E2E/platform coverage for the main user journeys unless a surface is genuinely not applicable
27
+ - define README/runtime obligations that satisfy strict review: project type near the top, `docker compose up --build` as the primary startup command for container-supported deliveries, the legacy compatibility string `docker-compose up` without making it primary, access URL/port or platform launch method, verification method, auth/demo credentials for every role or the exact statement `No authentication required`, seeded data or empty-state statement, no manual runtime installs, no hidden `.env` dependency, mock/local/debug disclosures, and known limitations
27
28
  - make meaningful assumptions explicit
28
29
  - mark unresolved items only when a real decision is still needed
29
30
  - identify API/interface surfaces that should be captured in `./docs/api-spec.md`
@@ -88,6 +88,8 @@ Cover where relevant: authentication, route authorization, object authorization,
88
88
  - Configuration model:
89
89
  - Persistent storage:
90
90
  - Seed/demo data need:
91
+ - README startup/access expectation:
92
+ - README auth/seed expectation:
91
93
  - Background jobs or scheduled work:
92
94
  - External integrations:
93
95
 
@@ -96,8 +98,10 @@ Cover where relevant: authentication, route authorization, object authorization,
96
98
  This is a design-level strategy, not an execution checklist.
97
99
 
98
100
  Required testing contract:
99
- - All API/interface endpoints must have test coverage for successful behavior and important negative/error cases. If there is no API/interface surface, state `Not Applicable` with the reason.
101
+ - All API/interface endpoints must have true HTTP/API test coverage for successful behavior and important negative/error cases where a runtime endpoint exists. If a non-HTTP interface or accepted exception requires another proof layer, state the exception and replacement proof. If there is no API/interface surface, state `Not Applicable` with the reason.
100
102
  - Meaningful business logic must target 90%+ unit coverage. If a component cannot be unit-tested meaningfully, state the exception and the replacement proof layer.
103
+ - Frontend unit tests must be identifiable by file pattern/framework evidence and must import or render real frontend components/modules when a frontend exists.
104
+ - Fullstack or backend-backed frontend work must include proof that real frontend actions reach the intended backend/service behavior.
101
105
  - User-facing applications must have full E2E/platform coverage for the main user journeys, including success, validation/failure, and recovery states. If E2E/platform testing is not applicable, state why and what proof replaces it.
102
106
 
103
107
  | Surface / risk | Expected proof layer | Notes |
@@ -107,10 +111,24 @@ Required testing contract:
107
111
  | security boundaries | | |
108
112
  | API/interface behavior | endpoint tests for every endpoint, including positive and negative cases | |
109
113
  | UI states / interactions | | |
110
- | integration paths | | |
114
+ | frontend-to-backend integration paths | | |
111
115
  | unit coverage | 90%+ meaningful business-logic coverage | |
116
+ | frontend unit/component tests | identifiable tests importing/rendering real components/modules | |
112
117
  | E2E/platform journeys | full main-journey coverage for user-facing apps | |
113
118
 
119
+ ## 11.1 README / Runtime Gate Strategy
120
+
121
+ | README/runtime gate | Required design outcome |
122
+ |---|---|
123
+ | project type | `backend`, `fullstack`, `web`, `android`, `ios`, or `desktop` near the top of README |
124
+ | startup | primary `docker compose up --build` for container-supported deliveries; include legacy compatibility string `docker-compose up` without making it primary |
125
+ | access | URL + port, emulator/device steps, or desktop launch steps |
126
+ | verification | concrete API/UI/mobile/desktop verification method |
127
+ | environment | Docker-contained or platform-contained setup; no manual runtime installs, manual DB setup, hidden `.env`, or secret-bearing examples |
128
+ | auth | all demo credentials and roles, or exact `No authentication required` statement |
129
+ | seed/demo data | seeded values and how to exercise them, or an empty-state statement |
130
+ | mock/local/debug | truthful disclosure of mock, stub, local-data, or debug boundaries |
131
+
114
132
  ## 12. API Spec Handoff
115
133
 
116
134
  - API spec required: [yes/no]
@@ -37,11 +37,11 @@ Create a practical implementation plan that can be translated into concise imple
37
37
  - Security work needs negative proof where relevant.
38
38
  - README/runtime/test obligations must be assigned.
39
39
  - Maintain a no-orphan ledger so no accepted requirement, clarification, API/interface, data object, actor path, security boundary, report/export/notification, README obligation, or test obligation disappears between design and implementation.
40
- - Include an API coverage matrix when APIs exist: endpoint/interface, expected implementation owner, true HTTP/API proof, mocked or unit-only exceptions, negative cases, and exact proof expectation.
40
+ - Include an API coverage matrix when APIs exist: exact `METHOD + PATH` or interface, expected implementation owner, true no-mock HTTP/API proof where applicable, mocked or unit-only exceptions, negative cases, and exact proof expectation.
41
41
  - Include a FE-BE integration matrix for fullstack/backend-backed frontend work: frontend action, backend endpoint/service/job, payload/state input, response/side effect, UI states, and proof path.
42
42
  - Include a backend-to-frontend exposure check when backend capabilities exist: every prompt-relevant backend capability must have visible exposure or a specific accepted internal/API-only reason.
43
- - Include README/runtime gates: project type, startup/access, verification, auth/no-auth, seeded/empty-state, configuration/no-secret handling, test commands, known limitations, and mock/local-data disclosures.
44
- - Include coverage rigor: unit, API/interface, integration, E2E/platform, frontend component/state, security/negative, and final local verification expectations.
43
+ - Include README/runtime gates that match the strict README audit: project type near the top, primary `docker compose up --build` for container-supported deliveries, legacy compatibility string `docker-compose up` without making it primary, startup/access, verification, auth/no-auth, all demo credentials/roles when auth exists, seeded values or empty-state statement, configuration/no-secret handling, no manual runtime installs or manual DB setup, test commands, known limitations, and mock/local-data/debug disclosures.
44
+ - Include coverage rigor: 90%+ unit target for meaningful business logic, exact true no-mock HTTP/API endpoint tests, identifiable frontend unit tests that import/render real components/modules, fullstack FE-BE proof, E2E/platform proof, security/negative cases, and final local verification expectations.
45
45
  - Include module acceptance checks that prevent shell/demo completion: observable behavior, persisted state/artifact or UI/API outcome, relevant negative paths, tests, README impact, and integration evidence.
46
46
 
47
47
  ## Output Requirements
@@ -68,6 +68,8 @@ If APIs exist, every accepted endpoint/interface must have a row.
68
68
  |---|---|---|---|---|---|---|
69
69
  | | | | yes/no | | | |
70
70
 
71
+ True HTTP/API proof means a test sends a request to the exact runtime route/interface and reaches the real handler/business logic without mocking transport, controllers, services, or providers used in the execution path. If this is not possible or not applicable, record the accepted exception and replacement proof.
72
+
71
73
  ## 7. Frontend / Interaction Execution Plan
72
74
 
73
75
  If not applicable, state `Not Applicable` with the accepted reason.
@@ -76,6 +78,8 @@ If not applicable, state `Not Applicable` with the accepted reason.
76
78
  |---|---|---|---|---|
77
79
  | | | loading / empty / submitting / disabled / success / error | | |
78
80
 
81
+ Frontend unit tests must be directly detectable by the final audit: test files must use the project test framework and import or render actual frontend components/modules, not only backend utilities or package scripts.
82
+
79
83
  ## 7.1 FE-BE Integration Matrix
80
84
 
81
85
  Required for fullstack or backend-backed frontend work. If not applicable, state `Not Applicable` with the accepted reason.
@@ -101,11 +105,12 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
101
105
  ## 9. README / Runtime / Configuration Plan
102
106
 
103
107
  - README obligations:
104
- - Startup/access documentation:
108
+ - Startup/access documentation, including primary `docker compose up --build` where container-supported and legacy compatibility string `docker-compose up` without making it primary:
105
109
  - Test documentation:
106
- - Auth/no-auth documentation:
107
- - Seed/demo data documentation:
108
- - Config and no-secret handling:
110
+ - Auth/no-auth documentation, including all demo credentials and roles when auth exists or exact `No authentication required` statement:
111
+ - Seed/demo data documentation, including every seeded value the reviewer should be able to exercise or an empty-state statement:
112
+ - Config and no-secret handling, including no hidden `.env` dependency and no secret-bearing examples:
113
+ - Environment constraints, including no manual runtime installs or manual DB setup for packaged verification:
109
114
  - Known limitations documentation:
110
115
 
111
116
  ## 9.1 README Gate Matrix
@@ -113,13 +118,13 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
113
118
  | README requirement | Expected content | Owning work package | Proof / review point |
114
119
  |---|---|---|---|
115
120
  | project type near top | | | |
116
- | startup command | | | |
121
+ | startup command | primary `docker compose up --build` where applicable, plus legacy compatibility string `docker-compose up` not presented as primary | | |
117
122
  | access method | | | |
118
123
  | verification method | | | |
119
124
  | broad test command | | | |
120
125
  | auth credentials or no-auth statement | | | |
121
126
  | seeded data or empty-state statement | | | |
122
- | configuration / no secrets / no env-file dependency | | | |
127
+ | configuration / no secrets / no env-file dependency / no manual installs | | | |
123
128
  | mock/local-data/debug disclosure | | | |
124
129
  | known limitations | | | |
125
130
 
@@ -147,6 +152,8 @@ Required when backend capabilities exist. If not applicable, state `Not Applicab
147
152
  - Integration coverage expectation:
148
153
  - E2E/platform coverage expectation:
149
154
  - Frontend state/component coverage expectation:
155
+ - Final browser/manual core-flow expectation:
156
+ - README seeded-value/account verification expectation:
150
157
  - Known accepted exceptions:
151
158
 
152
159
  ## 11. Integration And Hardening Plan
@@ -42,6 +42,7 @@ This file contains product engineering rules for the current project.
42
42
  - Use `unit_tests/` for unit tests and `API_tests/` for API/integration HTTP tests when those surfaces exist.
43
43
  - Every implementation change should include tests for the behavior it owns. Target full meaningful coverage across unit, API/integration, and E2E/platform layers where those surfaces exist.
44
44
  - API/interface endpoints should have real positive and negative tests for exact behavior. User-facing flows should have E2E/platform coverage for the main journeys and important failure/recovery states.
45
+ - API/interface tests should hit the real route/interface and real business logic without mocking transport/controllers/execution-path services unless there is a documented exception. Frontend unit tests should import or render real components/modules so coverage is directly reviewable.
45
46
  - Prefer the fastest meaningful targeted checks during ordinary implementation.
46
47
  - Never claim a command passed unless you actually ran it and saw the result.
47
48
  - If required verification cannot run in the current environment, report it as unverified with the exact risk.
@@ -42,6 +42,7 @@ This file contains product engineering rules for the current project.
42
42
  - Use `unit_tests/` for unit tests and `API_tests/` for API/integration HTTP tests when those surfaces exist.
43
43
  - Every implementation change should include tests for the behavior it owns. Target full meaningful coverage across unit, API/integration, and E2E/platform layers where those surfaces exist.
44
44
  - API/interface endpoints should have real positive and negative tests for exact behavior. User-facing flows should have E2E/platform coverage for the main journeys and important failure/recovery states.
45
+ - API/interface tests should hit the real route/interface and real business logic without mocking transport/controllers/execution-path services unless there is a documented exception. Frontend unit tests should import or render real components/modules so coverage is directly reviewable.
45
46
  - Prefer the fastest meaningful targeted checks during ordinary implementation.
46
47
  - Never claim a command passed unless you actually ran it and saw the result.
47
48
  - If required verification cannot run in the current environment, report it as unverified with the exact risk.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "theslopmachine",
3
- "version": "1.0.11",
3
+ "version": "1.0.12",
4
4
  "description": "SlopMachine installer and project bootstrap CLI",
5
5
  "license": "MIT",
6
6
  "type": "module",