theslopmachine 0.6.2 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/MANUAL.md +21 -6
  2. package/README.md +55 -7
  3. package/RELEASE.md +15 -0
  4. package/assets/agents/developer.md +41 -1
  5. package/assets/agents/slopmachine-claude.md +100 -60
  6. package/assets/agents/slopmachine.md +40 -17
  7. package/assets/claude/agents/developer.md +42 -5
  8. package/assets/skills/clarification-gate/SKILL.md +25 -5
  9. package/assets/skills/claude-worker-management/SKILL.md +280 -57
  10. package/assets/skills/developer-session-lifecycle/SKILL.md +81 -37
  11. package/assets/skills/development-guidance/SKILL.md +21 -1
  12. package/assets/skills/evaluation-triage/SKILL.md +32 -23
  13. package/assets/skills/final-evaluation-orchestration/SKILL.md +86 -50
  14. package/assets/skills/hardening-gate/SKILL.md +17 -3
  15. package/assets/skills/integrated-verification/SKILL.md +3 -3
  16. package/assets/skills/planning-gate/SKILL.md +32 -3
  17. package/assets/skills/planning-guidance/SKILL.md +72 -13
  18. package/assets/skills/retrospective-analysis/SKILL.md +2 -2
  19. package/assets/skills/scaffold-guidance/SKILL.md +129 -124
  20. package/assets/skills/submission-packaging/SKILL.md +33 -27
  21. package/assets/skills/verification-gates/SKILL.md +44 -14
  22. package/assets/slopmachine/backend-evaluation-prompt.md +1 -1
  23. package/assets/slopmachine/frontend-evaluation-prompt.md +5 -5
  24. package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +81 -0
  25. package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +191 -0
  26. package/assets/slopmachine/scaffold-playbooks/android-native-java.md +203 -0
  27. package/assets/slopmachine/scaffold-playbooks/angular-default.md +181 -0
  28. package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +142 -0
  29. package/assets/slopmachine/scaffold-playbooks/backend-family-matrix.md +80 -0
  30. package/assets/slopmachine/scaffold-playbooks/database-module-matrix.md +80 -0
  31. package/assets/slopmachine/scaffold-playbooks/django-default.md +166 -0
  32. package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +189 -0
  33. package/assets/slopmachine/scaffold-playbooks/docker-shared-contract.md +334 -0
  34. package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +124 -0
  35. package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +73 -0
  36. package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +134 -0
  37. package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +160 -0
  38. package/assets/slopmachine/scaffold-playbooks/frontend-family-matrix.md +134 -0
  39. package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +136 -0
  40. package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +160 -0
  41. package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +93 -0
  42. package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +151 -0
  43. package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +188 -0
  44. package/assets/slopmachine/scaffold-playbooks/laravel-default.md +216 -0
  45. package/assets/slopmachine/scaffold-playbooks/livewire-default.md +265 -0
  46. package/assets/slopmachine/scaffold-playbooks/overlay-module-matrix.md +130 -0
  47. package/assets/slopmachine/scaffold-playbooks/platform-family-matrix.md +79 -0
  48. package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +72 -0
  49. package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +182 -0
  50. package/assets/slopmachine/scaffold-playbooks/tauri-default.md +80 -0
  51. package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +162 -0
  52. package/assets/slopmachine/scaffold-playbooks/web-default.md +96 -0
  53. package/assets/slopmachine/templates/AGENTS.md +41 -3
  54. package/assets/slopmachine/templates/CLAUDE.md +111 -0
  55. package/assets/slopmachine/utils/claude_create_session.mjs +1 -0
  56. package/assets/slopmachine/utils/claude_live_channel.mjs +188 -0
  57. package/assets/slopmachine/utils/claude_live_common.mjs +406 -0
  58. package/assets/slopmachine/utils/claude_live_hook.py +47 -0
  59. package/assets/slopmachine/utils/claude_live_launch.mjs +181 -0
  60. package/assets/slopmachine/utils/claude_live_status.mjs +25 -0
  61. package/assets/slopmachine/utils/claude_live_stop.mjs +45 -0
  62. package/assets/slopmachine/utils/claude_live_turn.mjs +250 -0
  63. package/assets/slopmachine/utils/claude_resume_session.mjs +1 -0
  64. package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.mjs +23 -0
  65. package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.sh +5 -0
  66. package/assets/slopmachine/utils/claude_worker_common.mjs +224 -4
  67. package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +4 -0
  68. package/assets/slopmachine/utils/export_ai_session.mjs +1 -1
  69. package/assets/slopmachine/utils/normalize_claude_session.py +153 -0
  70. package/assets/slopmachine/utils/package_claude_session.mjs +96 -0
  71. package/assets/slopmachine/utils/prepare_strict_audit_workspace.mjs +65 -0
  72. package/package.json +1 -1
  73. package/src/constants.js +42 -3
  74. package/src/init.js +173 -28
  75. package/src/install.js +75 -0
  76. package/src/send-data.js +56 -57
@@ -37,9 +37,9 @@ Once a failure class is known:
  - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
  - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
  - for mobile and desktop work, run the selected stack's platform-appropriate UI/E2E coverage for major flows and review screenshots or equivalent artifacts for real UI behavior and regressions
- - for Electron or other Linux-targetable desktop work, use the Dockerized desktop build/test path plus headless UI/runtime verification through Xvfb or an equivalent Linux-capable harness
- - for Android work, use the Dockerized Android build/test path without requiring an emulator
- - for iOS-targeted work on Linux, rely on `./run_tests.sh`, portable test evidence, and static review evidence honestly; do not claim native iOS runtime verification unless a real macOS/Xcode checkpoint exists
+ - for Electron or other Linux-targetable desktop work, use `docker compose up --build` plus the Dockerized desktop build/test path and headless UI/runtime verification through Xvfb or an equivalent Linux-capable harness
+ - for Android work, use `docker compose up --build` plus the Dockerized Android build/test path without requiring an emulator
+ - for iOS-targeted work on Linux, use `docker compose up --build` plus `./run_tests.sh`, portable test evidence, and static review evidence honestly; do not claim native iOS runtime verification unless a real macOS/Xcode checkpoint exists
  - end-to-end coverage must use the real intended user-facing or admin-facing surfaces for the flow; if the flow cannot be exercised that way, treat the missing surface as incomplete work
  - verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
  - verify 401, 403, 404, conflict or duplicate-submission, object-authorization, tenant or user-isolation, and sensitive-log-exposure paths where those risks exist
@@ -37,11 +37,19 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
 
  - the developer should produce the first in-depth technical plan
  - once accepted, the plan should be detailed and section-addressable enough that later owner prompts can stay short and point the developer back to the relevant accepted section instead of re-dumping the implementation contract
+ - reject planning that stays at a high level when the project obviously needs deeper domain rules, lifecycle rules, permission rules, or verification detail to avoid later ambiguity
  - do not create deep execution sub-items before the technical plan is accepted
  - do not accept planning that reduces, weakens, narrows, or silently reinterprets the original prompt
  - do not accept convenience-based narrowing, including unauthorized `v1` simplifications, deferred workflows, reduced actor/role models, weaker enforcement, or omitted operator/admin surfaces
  - declare prompt-critical planning acceptance criteria before accepting the first planning pass when those criteria are already visible from the prompt
  - require relevant cross-cutting system contracts to be explicitly planned rather than left to per-module invention
+ - when the project has multiple major modules or workstreams, require the plan to distinguish shared prerequisites from the 2 or 3 main work packages that can later proceed in parallel, including where the merge points and no-parallel boundaries really are
+
+ ## Owner planning-demand rule
+
+ - when correcting or evaluating planning, reference the specific plan sections that matter and state an explicit planning-exit checklist rather than a generic request for `more detail`
+ - the planning-exit checklist should say exactly which sections must be deeper, what kind of specificity is missing, and what later-phase risk that missing detail would create
+ - demand enough planning detail that scaffold and development can mostly execute by following the accepted plan instead of inventing critical structure later
 
  ## Unauthorized narrowing rule
 
@@ -69,10 +77,19 @@ Examples of rejection-worthy narrowing include:
  - when `../docs/test-coverage.md` is relevant, require it to be structured as explicit requirement or risk mappings rather than generic narrative
  - review `../docs/test-coverage.md` only after `README.md` and `../docs/design.md` establish the claimed scope; then use the coverage doc to verify prompt-critical risks concretely instead of rereading unrelated planning docs
  - require the accepted plan to cover system overview, architecture reasoning, major modules or chunks, domain model, data model where relevant, interface contracts, failure paths, state transitions, logging strategy, testing strategy, README implications, and Docker execution assumptions when those dimensions apply
+ - require the accepted plan to have section-addressable coverage for authoritative tech stack summary, product overview, explicit out-of-scope items, actors and roles, actor success paths, authoritative business rules, permissions, validation, security/compliance expectations, non-functional requirements, implementation checkpoints, definition of done, and deliverables when those dimensions matter
+ - require the accepted plan to define the final README contract when the post-bugfix README audit will apply, including project-type declaration, startup instructions, access method, verification method, and demo-credentials or explicit no-auth disclosure
+ - require the accepted plan to explain how fast local iteration may be used during development without leaking local-only setup assumptions into the final delivered Docker-contained runtime and test contract
+ - reject planning that uses placeholder language such as `TBD`, `later`, `as needed`, `standard CRUD`, `normal auth`, or similarly vague stand-ins where concrete implementation guidance is expected
+ - reject planning that leaves large sections effectively empty or only restates the prompt without making forward-looking engineering decisions
 
  ## Cross-cutting planning requirements
 
  - require shared lifecycle and state models to be aligned across planning artifacts when the product has meaningful workflow state
+ - require actor-to-surface coverage to be explicit enough that each important persona has a real path to success and is not accidentally dropped in implementation
+ - require authoritative business-rule coverage when the prompt implies formulas, thresholds, limits, conflict rules, uniqueness, retries, reversals, or ownership constraints
+ - for backend or fullstack APIs, require an endpoint-inventory and API-test strategy that distinguishes true no-mock HTTP coverage from HTTP-with-mocking and unit-only or indirect coverage
+ - for backend or fullstack APIs, require the plan to call out how important endpoints will be exercised through the real HTTP layer and how important modules not yet tested will be made visible in `../docs/test-coverage.md`
  - require explicit cross-cutting system contracts when relevant, especially:
    - error normalization and user-visible error behavior
    - audit/logging and redaction patterns
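The new rule rejecting placeholder language such as `TBD`, `later`, `as needed`, `standard CRUD`, or `normal auth` lends itself to a mechanical first pass before a human review. Below is a minimal sketch, assuming the plan is available as plain text. The phrase list is copied from the rule, and the function name is hypothetical. Because words like `later` also appear in legitimate sentences, any hit should be treated as a prompt for review, not an automatic rejection.

```python
import re

# Placeholder phrases named in the planning-gate rule; extend as needed.
PLACEHOLDER_PHRASES = ("TBD", "later", "as needed", "standard CRUD", "normal auth")

# Word boundaries keep words such as "collateral" from matching "later".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PLACEHOLDER_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_placeholders(plan_text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) pairs for placeholder language."""
    hits = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

For example, running `find_placeholders` on a plan whose auth section reads `normal auth` and whose export section reads `Export format: TBD` reports both lines with their line numbers.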
@@ -81,7 +98,7 @@ Examples of rejection-worthy narrowing include:
    - auth/session edge cases such as expiry, refresh, or clock skew tolerance
  - when the prompt says behavior is configurable, require the real configuration surface, permissions, operator flow, and backend support to be planned explicitly
  - when a feature must be admin-manageable or operator-manageable, require a real usable UI surface for that management flow, not just API endpoints or data-model notes
- - for web projects, require Docker-first runtime planning unless the prompt or existing repository clearly dictates otherwise
+ - for web projects, require `docker compose up --build` as the runtime contract
  - for Dockerized web projects, require a concrete dev-only runtime bootstrap script plan so `docker compose up --build` works without user exports or `.env`
  - do not accept Dockerized web planning that depends on manual `export ...` steps, checked-in env files, or hardcoded runtime values for startup
  - do not accept Dockerized web planning where `./run_tests.sh` uses a different secret/bootstrap model than `docker compose up --build`
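The dev-only bootstrap requirement means `docker compose up --build` must start without user exports or `.env` files, and `./run_tests.sh` must share the same generated-value model. Below is a minimal sketch of such a bootstrap step in Python, under several assumptions. The key names, the `runtime-values.json` file name, and the idea that `state_dir` is a Docker-managed named volume are all hypothetical choices, not part of the package. Values are generated with `secrets` on first start and reused afterwards, so nothing is hardcoded in the repo.

```python
import json
import os
import secrets
from pathlib import Path

# Hypothetical runtime values the app needs; none are hardcoded in the repo.
REQUIRED_KEYS = ("DB_PASSWORD", "SESSION_SECRET")


def bootstrap_runtime_values(state_dir: Path) -> dict[str, str]:
    """Load runtime values from state_dir, generating any that are missing.

    Dev-only: state_dir is assumed to be a Docker-managed volume, so values
    survive container restarts without being committed to the repository.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    state_file = state_dir / "runtime-values.json"
    values: dict[str, str] = {}
    if state_file.exists():
        values = json.loads(state_file.read_text())
    changed = False
    for key in REQUIRED_KEYS:
        if not values.get(key):
            # Generate a fresh random value instead of shipping a literal.
            values[key] = secrets.token_urlsafe(32)
            changed = True
    if changed:
        state_file.write_text(json.dumps(values))
        os.chmod(state_file, 0o600)  # keep generated values owner-readable only
    return values
```

The design intent is that the app container's entrypoint and `./run_tests.sh` both call the same function before starting their processes. That keeps a single secret/bootstrap model for both paths, which is what the rule above demands.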
@@ -95,6 +112,7 @@ Examples of rejection-worthy narrowing include:
  - do not accept planning that lets a mock-only or local-data-only project look like undisclosed real integration delivery
  - do not accept planning that hides missing failure handling behind fake-success branches
  - when the project has meaningful auth or access control, require a static security-boundary inventory in planning artifacts covering auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug surfaces, and tenant or user isolation rules when applicable
+ - for Android, mobile, desktop, and iOS-targeted projects, require planning for a meaningful `docker compose up --build` command plus containerized `./run_tests.sh` even when platform-specific runtime proof differs from web semantics
  - require README disclosure planning for feature flags, debug or demo surfaces, default enabled states, and mock or interception defaults whenever they exist
  - require traceability planning for build, preview, configuration, app entry points, route registration, module boundaries, and test entry points through `README.md` plus external references rather than additional in-repo docs
  - require logging and validation contracts to be planned concretely enough for static review through code, `README.md`, and external docs when needed
@@ -111,7 +129,12 @@ Examples of rejection-worthy narrowing include:
  - scope is still prompt-faithful
  - the plan has explicitly addressed prompt-fit risks and requirement drift
  - no unauthorized convenience-based narrowing or `v1` simplification has been introduced
+ - explicit out-of-scope items are documented tightly enough to prevent speculative overbuilding without shrinking the prompt
+ - actor and role coverage is explicit enough that each main persona has a real path to success
  - major user-facing flows are mapped to backend support and verification targets
+ - authoritative business rules, defaults, limits, transitions, and ownership rules are explicit enough that implementation will not invent them ad hoc later
+ - unresolved items are narrow, explicit, and few enough that they will not force broad replanning during scaffold or early implementation
+ - security, compliance, reliability, reporting, and non-functional requirements are explicit where the prompt or product shape makes them material
  - security-critical areas are planned early enough that they will not be left to accidental late cleanup
  - test sufficiency has been considered at the level of core happy path, major failure paths, security-critical paths, and obvious high-risk boundaries
  - the plan explicitly defines module-level responsibilities, flows, boundaries, and completion tests before implementation
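The new acceptance criterion that "each main persona has a real path to success" can be checked against the plan's actor and flow tables once they are captured as data. Below is a minimal sketch under that assumption. The `Flow` record, the actor names, and the surface names are hypothetical. An actor counts as served only if at least one of its flows lands entirely on surfaces the plan actually commits to building.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    actor: str
    name: str
    steps: tuple[str, ...]  # surfaces the actor must use, in order


def actors_without_path(
    actors: set[str], flows: list[Flow], planned_surfaces: set[str]
) -> set[str]:
    """Return actors that have no flow fully backed by planned surfaces."""
    served = {
        f.actor
        for f in flows
        if f.steps and all(step in planned_surfaces for step in f.steps)
    }
    return actors - served
```

For example, an `auditor` whose only flow needs an `audit log view` that no plan section delivers is reported, which is the "accidentally dropped persona" case the gate is meant to catch.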
@@ -122,7 +145,10 @@ Examples of rejection-worthy narrowing include:
  - backend or fullstack plans explicitly cover 401, 403, 404, conflict or duplicate submission when relevant, object-level authorization, tenant or user isolation, and sensitive-log exposure in the coverage plan
  - frontend-bearing plans explicitly cover the required state model for major flows, including loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
  - frontend-bearing plans explicitly include component, page or route integration, and E2E coverage where applicable; non-trivial frontend plans explicitly include component, page, route, or state-focused test coverage where UI state complexity is meaningful rather than relying only on E2E or runtime confidence
- - the coverage plan is strong enough to reach at least 90 percent meaningful coverage of the relevant behavior surface
+ - the coverage plan names a concrete measurement path and is strong enough to achieve and prove a minimum 90 percent test coverage threshold for the delivered behavior surface
+ - when backend or fullstack APIs exist, the coverage plan includes a resolved endpoint inventory, API test mapping strategy, and explicit mock-classification strategy rather than only high-level risk prose
+ - the README plan is specific enough to satisfy the strict README audit without contradicting the real canonical runtime and test contract
+ - the plan includes a cleanup step that removes local-iteration dependency traces before development closes and before hardening judges the final Docker-contained delivery
  - major engineering quality has been addressed through maintainable boundaries, clear decomposition, and shared contracts
  - frontend route, page, component, and state boundaries are planned when the UI is material
  - configurable behaviors are concretely planned where the prompt requires configurability
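The "resolved endpoint inventory, API test mapping strategy, and explicit mock-classification strategy" can be kept as data instead of prose. Below is a minimal sketch, assuming each test is recorded with the endpoint it exercises and one of the three evidence classes this planning gate distinguishes (true no-mock HTTP, HTTP with mocking, unit-only or indirect). The record format and function names are hypothetical. Each endpoint is mapped to its strongest evidence, and the sketch lists endpoints that still lack real HTTP coverage, the gaps `../docs/test-coverage.md` is meant to make visible.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Coverage classes, ordered from weakest to strongest."""
    NONE = 0
    UNIT_OR_INDIRECT = 1
    HTTP_WITH_MOCKS = 2
    HTTP_NO_MOCKS = 3


def summarize(
    inventory: list[str], tests: list[tuple[str, Evidence]]
) -> dict[str, Evidence]:
    """Map each inventoried endpoint to the strongest evidence recorded for it."""
    best = {endpoint: Evidence.NONE for endpoint in inventory}
    for endpoint, evidence in tests:
        if endpoint not in best:
            # A test for an unknown endpoint means the inventory is not resolved.
            raise KeyError(f"test targets endpoint missing from inventory: {endpoint}")
        best[endpoint] = max(best[endpoint], evidence)
    return best


def gaps(summary: dict[str, Evidence]) -> list[str]:
    """Endpoints without true no-mock HTTP coverage, in inventory order."""
    return [e for e, ev in summary.items() if ev < Evidence.HTTP_NO_MOCKS]
```

Raising on tests that target unknown endpoints is a deliberate choice in this sketch. It forces the inventory to stay complete, instead of letting coverage claims drift away from the real route table.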
@@ -133,9 +159,12 @@ Examples of rejection-worthy narrowing include:
  - static review readiness is explicitly planned, including how a fresh reviewer can trace entry points, routes, config, test commands, and any mock or local-data boundaries from `README.md` plus the code inside the repo, while the owner maintains fuller external references under `../docs/`
  - static security-boundary readiness is explicitly planned in docs or code structure where applicable
  - the repo remains self-sufficient with `README.md` as its only documentation file; external docs under `../docs/` may exist as references, but the repo must not depend on them
- - web projects default to Docker-first runtime planning unless a prompt-faithful exception is clearly justified
+ - web projects require Docker runtime planning
  - relevant cross-cutting system contracts are explicitly defined rather than left to per-module invention
+ - implementation checkpoints and a hard definition of done are explicit enough to block fake-complete or scaffold-only acceptance
+ - the plan has done a real look-ahead sweep across scaffold, implementation, integrated verification, hardening, evaluation, and packaging concerns instead of treating those as future rediscovery work
  - each major module has a clear integration contract with existing modules and shared patterns
+ - when the project is large enough to benefit from it, the plan makes dependency order and safe parallel work packages explicit enough that execution can parallelize independent items without colliding on unstable shared foundations
  - verification plans include cross-module seam checks, not just isolated feature tests
  - visible mismatches are corrected or explicitly dispositioned
  - planning comments and artifacts reflect current policy truth
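Making "dependency order and safe parallel work packages" explicit amounts to writing the plan's work packages down as a dependency graph. Below is a minimal sketch, assuming the packages and their prerequisites are recorded as a simple mapping (the package names in the example are hypothetical). It groups them into waves with Kahn's topological sort. Everything in one wave depends only on earlier waves, so items inside a wave are candidates for parallel execution, and each wave boundary is a natural merge point.

```python
def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group work packages into waves using Kahn's topological sort.

    deps maps each package to the set of packages it depends on.
    Packages in the same wave have no dependency on each other.
    Raises ValueError when the dependency graph contains a cycle.
    """
    # Every referenced prerequisite is also a node, even if it has no entry.
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: len(deps.get(n, set())) for n in nodes}
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for pkg, ds in deps.items():
        for d in ds:
            dependents[d].append(pkg)

    current = sorted(n for n, count in remaining.items() if count == 0)
    waves: list[list[str]] = []
    placed = 0
    while current:
        waves.append(current)
        placed += len(current)
        nxt = []
        for n in current:
            for m in dependents[n]:
                remaining[m] -= 1
                if remaining[m] == 0:
                    nxt.append(m)
        current = sorted(nxt)
    if placed != len(nodes):
        raise ValueError("cycle detected; no safe serial order exists")
    return waves
```

For example, a shared auth-and-schema package followed by three independent feature packages and a final integration pass produces three waves: the shared prerequisite alone, then the three features in parallel, then the integration pass as the merge point.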
@@ -34,6 +34,7 @@ The goal is to reduce late audit failures by designing for these concerns up fro
  - planning should make the delivered repo statically reviewable by a fresh reviewer through `README.md`, entry points, config shape, tests, and visible module boundaries rather than depending on runtime tribal knowledge
  - keep `README.md` as the only normal documentation file inside the repo
  - do not create or rely on additional documentation files inside `repo/` beyond `README.md` unless the user explicitly asks for them
+ - if explicit assumptions or dispositions must be recorded, keep them in the owner-maintained external planning docs rather than creating a repo-local `ASSUMPTIONS.md` by default
  - keep the owner-maintained external doc set under `../docs/` current when relevant, especially:
    - `../docs/design.md`
    - `../docs/api-spec.md`
@@ -61,6 +62,8 @@ Selected-stack defaults:
  - start from the actual project prompt and build the plan from there
  - carry the settled project requirements forward consistently as you plan
  - make the accepted plan durable enough to serve as the primary execution contract for later scaffold and development prompts instead of forcing the owner to restate the same implementation context repeatedly
+ - prefer over-specifying important implementation details in planning rather than deferring them to later invention during coding
+ - treat the planning package as an execution spec, not a sketch; almost every later-critical decision should be made now unless there is a strong reason it truly cannot be
  - identify the hard non-negotiable requirements early and do not quietly trade them away for implementation convenience
  - explicitly check that the plan still fits the business goal, main flows, and implicit constraints from the prompt
  - when planning technical items that depend on a library, framework, API, or tool, check Context7 documentation first for authoritative usage details
@@ -71,7 +74,34 @@ Selected-stack defaults:
  - make the planning explicit enough that the owner can maintain external design notes and API/spec docs accurately when relevant
  - keep the spec focused on required behavior rather than turning it into a progress or completion narrative
  - make the plan include system overview, architecture choice and reasoning, major modules or chunks, domain model, data model where relevant, interface contracts, failure paths, state transitions, logging strategy, testing strategy, README implications, and Docker execution assumptions when those dimensions apply
+ - make the plan explicitly account for the final post-bugfix coverage and README audit contract so hardening is not surprised later
+ - identify shared prerequisites and the 2 or 3 biggest work packages that could later proceed in parallel once those prerequisites are settled when the project is large enough for that distinction to matter
+ - define which planned work must stay serial because of shared contracts or overlapping files, and which work can safely branch in parallel with a clear merge point
+ - make parallel work packages explicit enough that later owner prompts can ask for parallel execution without re-inventing the branch boundaries
+ - make the accepted planning package explicitly section-addressable and execution-grade, with clear headings for at least:
+   - authoritative tech stack summary
+   - product overview
+   - in-scope domains or modules
+   - explicit out-of-scope items
+   - actors and roles
+   - actor-specific path-to-success summaries for the main workflows
+   - authoritative business rules
+   - state machines or lifecycle rules when workflow state matters
+   - permissions and authorization model
+   - validation rules
+   - security, compliance, and data-governance requirements
+   - offline, queueing, reliability, and background-job behavior when relevant
+   - reporting, analytics, search, indexing, import, or export behavior when relevant
+   - non-functional requirements
+   - implementation phases and checkpoints
+   - definition of done
+   - concrete deliverables
+   - explicit assumptions or dispositions when safe defaults had to be locked
+ - keep unresolved items rare; if something really cannot be decided yet, isolate it in a small explicit unresolved-items section with the reason it is still open and what evidence or decision is needed to close it
+ - do not leave major module boundaries, API shapes, business rules, state transitions, security boundaries, or verification criteria as vague future implementation work
+ - use tables, bullet lists, and explicit subsections so the plan is dense, skimmable, and hard to misread
  - keep the primary planning package concentrated in parent-root `../docs/design.md`
+ - make `../docs/design.md` the authoritative detailed plan, not a high-level narrative summary
  - organize the accepted plan so later slices can reference concrete sections cleanly instead of requiring the owner to rewrite the plan in follow-up prompts
  - put the risk-to-test matrix in parent-root `../docs/test-coverage.md`
  - when prompt-critical API/interface details need a dedicated document, keep them in parent-root `../docs/api-spec.md`
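The required-headings list makes `../docs/design.md` checkable for missing sections before the owner reviews its depth. Below is a minimal sketch, assuming the plan uses Markdown ATX headings (`#` to `######`) whose text matches the section names exactly, ignoring case. That naming convention is an assumption for illustration; numbered headings such as `## 3. Product overview` would need a looser match. Only a subset of the listed sections is included.

```python
import re

# Subset of the section names from the planning-guidance heading list.
REQUIRED_SECTIONS = (
    "authoritative tech stack summary",
    "product overview",
    "explicit out-of-scope items",
    "actors and roles",
    "authoritative business rules",
    "permissions and authorization model",
    "validation rules",
    "non-functional requirements",
    "implementation phases and checkpoints",
    "definition of done",
    "concrete deliverables",
)

# ATX heading: 1-6 '#' characters, text, optional closing '#' run.
_HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*#*\s*$")


def missing_sections(design_md: str) -> list[str]:
    """Return required section names with no matching Markdown heading."""
    headings = set()
    for line in design_md.splitlines():
        match = _HEADING.match(line)
        if match:
            headings.add(match.group(1).strip().lower())
    return [s for s in REQUIRED_SECTIONS if s not in headings]
```

A check like this only proves that headings exist. Whether each section is execution-grade, rather than a restated prompt, still needs the owner's review described above.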
@@ -92,18 +122,24 @@ Selected-stack defaults:
  - plan disclosure of feature flags, debug or demo surfaces, default enabled states, and mock or interception defaults in `README.md` and owner-maintained external docs whenever they exist
  - do not plan fake-success paths that hide missing failure handling
  - define failure paths, permissions, validation, logging, runtime assumptions, and test strategy before coding
+ - define authoritative business rules before coding, including defaults, limits, conflicts, uniqueness, reversal or cancellation behavior, retry rules, and ownership rules when they matter
  - for frontend-bearing work, plan each prompt-critical flow with an explicit state model covering loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where relevant
  - define logging contracts early, including categories, levels, redaction expectations, and what must never be logged
+ - for backend or fullstack work, define a central config module or equivalent single source of truth for runtime configuration instead of scattering direct environment reads through business logic
+ - for backend or fullstack work, define centralized logging expectations strongly enough that route or request outcomes, exceptions, and background failures can be understood without leaking sensitive data
  - define validation contracts early, including request validation, form validation, boundary validation, and normalized user-facing error behavior when relevant
  - for complex security, offline, sync, authorization, or data-governance features, define what `done` means across all prompt-promised dimensions rather than stopping at a partial foundation or hook layer
  - define shared lifecycle and state models when the product has meaningful workflow state, and keep those models aligned across design notes and API/spec notes
  - require cross-document consistency so design, API/spec, and test-planning artifacts do not drift on lifecycle/state models, flow coverage, permissions, or operational behavior
+ - define implementation dependency and parallelism expectations early enough that scaffold and development do not accidentally serialize independent work or parallelize unstable shared foundations
  - define logging and observability expectations for both frontend and backend
  - define operator visibility and operator workflow expectations when the prompt implies admin, operational, audit, backup, or support responsibilities
  - when the system has meaningful cross-cutting behavior, define shared implementation contracts early rather than leaving each module to invent its own pattern
  - define error-handling contracts when relevant, including normalization patterns for user-visible errors and backend error-shape expectations
  - define audit contracts when relevant, including centralized helper or service expectations and redaction rules
+ - when third-party services are mentioned but real live integration is not clearly required for delivery proof, define explicit adaptor or stub boundaries rather than leaving the integration strategy ambiguous
  - define permission contracts when relevant so navigation visibility, route guards, and API enforcement stay aligned
+ - define actor and role contracts explicitly, including which personas exist, which ones need real surfaces, and what successful end-to-end completion looks like for each main actor path
  - define state-lifecycle contracts when relevant, including context-switch or tenant-switch cleanup expectations
  - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
  - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
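The "central config module or equivalent single source of truth" rule pairs naturally with the redaction side of the logging contract. Below is a minimal Python sketch of that pattern. The setting names and the `AppConfig`, `ConfigError`, and `load_config` names are hypothetical. Configuration is read and validated once at startup into a frozen dataclass, business logic receives that object instead of calling `os.environ` directly, and secret-bearing fields are excluded from `repr()` so an accidental log of the config object does not leak them.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AppConfig:
    """Single source of truth for runtime configuration (hypothetical fields)."""
    database_url: str = field(repr=False)  # may embed credentials, so never repr'd
    session_secret: str = field(repr=False)
    log_level: str = "INFO"
    max_upload_mb: int = 10


class ConfigError(ValueError):
    """Raised at startup when configuration is missing or invalid."""


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read and validate configuration once, at startup."""
    missing = [k for k in ("DATABASE_URL", "SESSION_SECRET") if not env.get(k)]
    if missing:
        raise ConfigError(f"missing required settings: {', '.join(missing)}")
    try:
        max_upload = int(env.get("MAX_UPLOAD_MB", "10"))
    except ValueError as exc:
        raise ConfigError("MAX_UPLOAD_MB must be an integer") from exc
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
        raise ConfigError(f"unsupported LOG_LEVEL: {level}")
    return AppConfig(
        database_url=env["DATABASE_URL"],
        session_secret=env["SESSION_SECRET"],
        log_level=level,
        max_upload_mb=max_upload,
    )
```

Accepting `env` as a parameter keeps tests from mutating the real process environment and makes the single read point explicit.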
@@ -114,35 +150,37 @@ Selected-stack defaults:
  - start `./init_db.sh` during scaffold with the real database setup already known, then keep expanding it as migrations, schema setup, bootstrap data, and other database dependencies become real through implementation
  - when the project has database dependencies, plan to inject database setup through initialization scripts rather than packaging local database dependency artifacts or environment-specific database state
  - define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
- - for web projects, default to a Docker-first runtime contract unless the prompt or existing repository clearly dictates another model
- - for web projects, default the primary runtime command to `docker compose up --build`
+ - for web projects, require `docker compose up --build` as the runtime contract
  - for Dockerized web projects, plan a dev-only runtime bootstrap script that is invoked by the Docker startup path so `docker compose up --build` works without user-side exports or `.env` files
  - for Dockerized web projects, plan runtime value generation or injection through that dev-only bootstrap path instead of hardcoded repo values
  - for Dockerized web projects, require `./run_tests.sh` to use the same bootstrap path or an equivalent path with the same generated-value rules rather than a separate pre-seeded secret model
  - for Dockerized web projects, do not allow pre-seeded secret literals in Compose files, config files, Dockerfiles, or startup scripts even if comments describe them as local-only, test-only, or non-production
  - for Dockerized web projects, if runtime values must persist across restarts, plan Docker-managed runtime state rather than committed repo files
  - for Dockerized web projects, plan README disclosure that the bootstrap path is local-development-only behavior and not the production secret-management path
- - when `docker compose up --build` is not the runtime contract, require `./run_app.sh` as the single primary runtime wrapper for the project
- - for mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow instead of assuming host tooling conventions
+ - for Android, mobile, desktop, and iOS-targeted projects, also require a meaningful `docker compose up --build` command that starts a containerized build, artifact, preview, or support environment even when native runtime proof differs from web semantics
+ - for Android, mobile, desktop, and iOS-targeted projects, keep `./run_tests.sh` containerized as the broad verification path
+ - for non-web projects, `./run_app.sh` may still exist as a platform helper, but it does not replace the required Docker contract
  - `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
  - `./run_tests.sh` must be able to run on a clean Linux VM that only has Docker and curl available by default
  - do not require host package managers, host language runtimes, or host test tooling for the broad test path unless the stack absolutely forces it and the exception is explicitly justified
  - `./run_tests.sh` must prepare or install anything required inside its own controlled execution path when that setup is needed for a clean environment
- - for web projects using the default Docker-first runtime model, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
+ - for web projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
  - when host-level setup would otherwise be required, prefer a Dockerized `./run_tests.sh` path even outside traditional web stacks so the broad verification remains portable
- - for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
+ - for non-web projects, `./run_tests.sh` must call the selected stack's Dockerized or platform-equivalent full test path while keeping the same single-command interface
  - local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
  - for Electron or other Linux-targetable desktop projects, plan a Dockerized broad path that covers build, tests, packaging smoke checks, and headless UI/runtime verification through Xvfb or an equivalent Linux-capable desktop harness
  - for Android projects, plan a Dockerized broad path that covers Gradle build, lint, unit tests, and local Android JVM-side tests such as Robolectric without depending on an emulator
- - for iOS-targeted projects, keep `./run_tests.sh` as the portable Linux verification wrapper for lint, typecheck, shared logic tests, JS/UI-level tests when applicable, and static config/build-shape validation, but do not pretend it is native iOS runtime proof
+ - for iOS-targeted projects, keep `./run_tests.sh` as the containerized portable Linux verification wrapper for lint, typecheck, shared logic tests, JS/UI-level tests when applicable, and static config/build-shape validation, but do not pretend it is native iOS runtime proof
  - if true native iOS build or runtime evidence is prompt-critical, call out that it requires a separate macOS/Xcode owner checkpoint rather than trying to fake equivalence on Linux
- - for web projects using the default Docker-first runtime model, plan collision-resistant Compose defaults from the start: unique `COMPOSE_PROJECT_NAME`, no unnecessary `container_name`, only the app-facing port exposed to host by default, and internal services kept off host ports unless required
- - for web projects using the default Docker-first runtime model, prefer random host-port binding on `127.0.0.1` for the default runtime so parallel projects can start cleanly; if a fixed host port is genuinely required, plan an override plus a free-port fallback in the runtime or test wrapper
+ - for web projects, plan collision-resistant Compose defaults from the start: unique `COMPOSE_PROJECT_NAME`, no unnecessary `container_name`, only the app-facing port exposed to host by default, and internal services kept off host ports unless required
+ - for web projects, prefer random host-port binding on `127.0.0.1` for the default runtime so parallel projects can start cleanly; if a fixed host port is genuinely required, plan an override plus a free-port fallback in the runtime or test wrapper
  - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
  - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
  - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
- - for web frontend work, unless the prompt, existing repository, or established stack clearly dictates otherwise, default to Tailwind CSS for styling and `shadcn/ui` for component primitives
- - if the existing project already uses a different UI system, preserve and extend that system instead of forcing Tailwind CSS or `shadcn/ui` into it
+ - for web frontend work, unless the prompt, existing repository, or established stack clearly dictates otherwise, default to Tailwind CSS for styling
+ - when the selected frontend ecosystem supports `shadcn/ui` or an equivalent well-documented port cleanly, prefer that for component primitives
+ - otherwise use a mainstream documented component system appropriate to the chosen stack, such as Material UI, Ant Design, Ant Design Vue, or Angular Material
+ - if the existing project already uses a different UI system, preserve and extend that system instead of forcing the default CSS/component choices into it
  - when the prompt leaves the stack or starter open, explicitly choose the default stack, starter, and bootstrap command during planning instead of leaving scaffold to improvise them ad hoc
  - prefer official or clearly de facto standard starters and bootstrap commands when they fit the prompt, because they usually reduce setup waste and improve baseline quality
  - when multiple credible defaults exist, prefer the one with the strongest ecosystem support, best current maintenance posture, easiest Docker/test/E2E integration, and least friction for prompt-faithful delivery
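The Compose bullets ask for three things: a unique `COMPOSE_PROJECT_NAME`, random host ports bound to `127.0.0.1`, and an override plus free-port fallback when a fixed port is genuinely required. The Python sketch below shows one way a runtime or test wrapper could compute those values. The `APP_HOST_PORT` variable name and the name-derivation scheme are hypothetical; the skill does not specify either.

```python
import os
import re
import socket
import uuid

LOOPBACK = "127.0.0.1"


def free_loopback_port() -> int:
    """Let the kernel pick an unused TCP port on the loopback interface (bind to port 0)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((LOOPBACK, 0))
        return sock.getsockname()[1]


def port_is_free(port: int) -> bool:
    """Best-effort probe: True if the port can currently be bound on loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((LOOPBACK, port))
        except OSError:
            return False
        return True


def resolve_host_port(env_var: str = "APP_HOST_PORT") -> int:
    """Honor a fixed-port override (hypothetical env var) when free, else use a random free port."""
    requested = os.environ.get(env_var)
    if requested and requested.isdigit() and port_is_free(int(requested)):
        return int(requested)
    return free_loopback_port()


def compose_project_name(repo_dir: str) -> str:
    """Build a Compose-safe project name: lowercase [a-z0-9_-] plus a short random suffix."""
    base = os.path.basename(os.path.abspath(repo_dir)).lower()
    base = re.sub(r"[^a-z0-9_-]+", "-", base).strip("-_") or "app"
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

The port probe is advisory only. Another process can claim the port between the check and container startup, so the wrapper should still report a clear error if Compose fails to bind. Computing these values inside a wrapper such as `./run_tests.sh` keeps them scoped to that invocation rather than requiring user-side exports.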
@@ -151,24 +189,45 @@ Selected-stack defaults:
  - for mobile work, unless the prompt or existing repository clearly dictates otherwise, default to Expo + React Native + TypeScript
  - for desktop work, unless the prompt or existing repository clearly dictates otherwise, default to Electron + Vite + TypeScript
  - define end-to-end coverage for major user flows before coding
+ - define phase checkpoints and definition-of-done gates strongly enough that a coding model cannot confuse partial infrastructure with completed product behavior
+ - do an explicit look-ahead sweep across scaffold, implementation, integrated verification, hardening, evaluation, and packaging so later-phase needs are not rediscovered too late
  - define enough test coverage up front to catch major issues later, especially core happy path, important failure paths, security-critical paths, and obvious high-risk boundaries
- - enforce a plan to reach at least 90 percent meaningful coverage of the relevant behavior surface, not decorative line coverage
+ - enforce a concrete plan to achieve a minimum 90 percent test coverage threshold, including the exact measurement path, reporting command, and failing threshold for the selected stack when practical
+ - do not leave coverage as a qualitative aspiration; planning must state how the project will prove and maintain the minimum 90 percent threshold
+ - for backend or fullstack projects, plan an endpoint-by-endpoint API audit story: resolved `METHOD + PATH` inventory, expected HTTP coverage, true no-mock HTTP coverage, and which tests are only mocked or indirect
+ - for backend or fullstack projects, plan core API tests so the important endpoints are exercised through the real HTTP layer rather than controller or service bypasses
+ - when mocked HTTP tests or unit-only coverage still exist, plan to classify them explicitly instead of overstating them as equivalent to true no-mock API coverage
+ - plan audit-readable API test evidence: the test suite and `../docs/test-coverage.md` should make the endpoint, request input, and response assertions easy to trace statically
+ - plan a module-family test summary that can call out important modules not yet tested, especially controllers, services, repositories, auth, guards, and middleware when they exist
  - require API tests to exercise real API endpoints and real call flows rather than bypassing the endpoint layer with internal helper-only checks
  - when API tests are material, plan for them to print simple useful response evidence such as status codes and message/body summaries so verification output is easy to inspect
  - plan endpoint coverage so prompt-required functions and dependent multi-step API flows are actually exercised, not just isolated happy-path fragments
  - plan `../docs/test-coverage.md` in evaluator-facing shape rather than loose prose: requirement or risk point, mapped test file(s), key assertion(s) or fixtures, coverage status, major gap, and minimum test addition
  - do not satisfy `../docs/test-coverage.md` with generic test categories alone; make the matrix concrete enough that the owner can review prompt-critical risks without reconstructing the test story manually
+ - when backend or fullstack APIs exist, make `../docs/test-coverage.md` carry both the requirement/risk matrix and an endpoint inventory plus API test mapping table
+ - when backend or fullstack APIs exist, make `../docs/test-coverage.md` distinguish true no-mock HTTP tests from HTTP-with-mocking and unit-only or indirect coverage
  - when multiple prompt-critical domains exist, group the matrix by domain or risk cluster so each section names the requirement, planned test location, key assertions, current status, and remaining gap explicitly
  - for backend or fullstack projects, explicitly plan coverage for 401, 403, 404, conflicts or duplicate submission when relevant, object-level authorization, tenant or user isolation, sensitive-log exposure, and pagination/filter/sort when those behaviors exist
  - for frontend-bearing projects, explicitly plan a layered frontend test story when UI state or routing is material: unit, component, page or route integration, and E2E where applicable
  - for non-trivial frontend projects, explicitly plan a frontend test layer beyond runtime-only confidence: component, page, route, or state-focused tests when UI state complexity is meaningful
- - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable
+ - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable, but treat Playwright as a real verified dependency rather than a decorative default
  - for mobile work, plan Jest plus React Native Testing Library as the local default test layer and add a platform-appropriate mobile UI/E2E tool when real device-flow proof is needed
  - for desktop work, plan a local desktop test runner plus Playwright Electron support or another platform-appropriate desktop UI/E2E tool when real window-flow proof is needed
  - for Android work, do not rely on an emulator as the default broad verification contract
  - for iOS work on Linux, plan code and portable-test evaluation honestly and treat native simulator/runtime proof as out-of-band unless a macOS checkpoint is explicitly available
  - when UI-bearing flows are material, explicitly plan screenshot review or equivalent platform artifacts as part of UI verification so correctness is checked, not just command success
  - define verification strategy, selected-stack runtime expectations, and documentation implications before coding
+ - plan the final README contract explicitly enough to satisfy both the normal delivery docs and the strict post-bugfix README audit:
+ - project type declared near the top using one of `backend`, `fullstack`, `web`, `android`, `ios`, or `desktop`
+ - startup instructions
+ - access method
+ - verification method
+ - demo credentials for every role when auth exists, or the exact statement `No authentication required`
+ - tech stack clarity and architecture explanation
+ - workflow and security or role notes when relevant
+ - for backend, fullstack, and web projects, plan README startup instructions so they include the canonical `docker compose up --build` contract and also the exact legacy compatibility string `docker-compose up` to satisfy the strict README audit without weakening the real runtime contract
+ - for Android, iOS, and desktop projects, plan README platform-specific host-side build or launch guidance in addition to the required Docker-contained runtime and test contract so the strict README audit has the expected section shape
+ - plan a deliberate cleanup step between fast local iteration and final hardening so local-only setup traces, host-only dependency assumptions, and misleading README instructions are removed before integrated verification and hardening close
  - define the static review story before coding: a fresh reviewer should be able to trace startup, test entry points, main routes or entry modules, core data flow, and any mock or local-data boundaries from repo artifacts without rewriting the project
  - define the static audit story for security and tests before coding: a fresh reviewer should be able to trace security boundaries and requirement-to-test coverage from repository artifacts and docs without reconstructing the design mentally
  - define repo traceability before coding through `README.md` plus the code structure inside the repo; keep fuller external references in `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md`
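The coverage bullets require an exact measurement path, a reporting command, and a failing threshold rather than a qualitative aspiration. Where the stack's own tooling has a native gate, such as coverage.py's `--fail-under` option or Jest's `coverageThreshold` setting, that is usually the simplest route. The Python below is a stack-neutral sketch that rests on two assumptions: the test run emits a Cobertura-format XML report (root `<coverage>` element with a `line-rate` attribute), and line coverage is the chosen metric. The skill itself leaves the measurement path to planning.

```python
import sys
import xml.etree.ElementTree as ET

MIN_COVERAGE_PERCENT = 90.0  # the minimum threshold named in the planning skill


def line_coverage_percent(cobertura_xml: str) -> float:
    """Return overall line coverage from a Cobertura report's root `line-rate` (0.0 to 1.0)."""
    root = ET.fromstring(cobertura_xml)
    return float(root.attrib["line-rate"]) * 100.0


def coverage_gate(percent: float, minimum: float = MIN_COVERAGE_PERCENT) -> int:
    """Print a one-line verdict and return a shell-style exit code (0 pass, 1 fail)."""
    if percent < minimum:
        print(f"FAIL: coverage {percent:.1f}% is below the {minimum:.0f}% minimum", file=sys.stderr)
        return 1
    print(f"PASS: coverage {percent:.1f}% meets the {minimum:.0f}% minimum")
    return 0
```

A wrapper such as `./run_tests.sh` could run a check like this inside its container after the suite finishes and propagate the exit code. A drop below the threshold would then fail the broad test path instead of only appearing in a report.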
@@ -49,7 +49,7 @@ Prefer existing workflow artifacts first:
  - developer-session handoffs
  - review and rejection history
  - verification gate notes
- - `../self_test_reports/`
+ - `../.tmp/` audit and fix-check reports
  - packaging checks
 
  Do not reread the entire codebase unless a real inconsistency requires it.
@@ -66,7 +66,7 @@ Do not rerun broad Docker or full-suite verification just for retrospective anal
  - owner shell
  - developer prompt
  - skills
- - `AGENTS.md`
+ - repo-local rulebook file such as `AGENTS.md` or `CLAUDE.md`
  7. actionable improvements
 
  ## Audit buckets