theslopmachine 0.6.2 → 0.7.1

Files changed (77)
  1. package/MANUAL.md +21 -6
  2. package/README.md +55 -7
  3. package/RELEASE.md +16 -1
  4. package/assets/agents/developer.md +41 -1
  5. package/assets/agents/slopmachine-claude.md +101 -60
  6. package/assets/agents/slopmachine.md +40 -17
  7. package/assets/claude/agents/developer.md +42 -5
  8. package/assets/skills/clarification-gate/SKILL.md +25 -5
  9. package/assets/skills/claude-worker-management/SKILL.md +290 -57
  10. package/assets/skills/developer-session-lifecycle/SKILL.md +83 -38
  11. package/assets/skills/development-guidance/SKILL.md +21 -1
  12. package/assets/skills/evaluation-triage/SKILL.md +34 -23
  13. package/assets/skills/final-evaluation-orchestration/SKILL.md +88 -50
  14. package/assets/skills/hardening-gate/SKILL.md +17 -3
  15. package/assets/skills/integrated-verification/SKILL.md +3 -3
  16. package/assets/skills/planning-gate/SKILL.md +32 -3
  17. package/assets/skills/planning-guidance/SKILL.md +72 -13
  18. package/assets/skills/retrospective-analysis/SKILL.md +2 -2
  19. package/assets/skills/scaffold-guidance/SKILL.md +129 -124
  20. package/assets/skills/submission-packaging/SKILL.md +33 -27
  21. package/assets/skills/verification-gates/SKILL.md +44 -14
  22. package/assets/slopmachine/backend-evaluation-prompt.md +1 -1
  23. package/assets/slopmachine/frontend-evaluation-prompt.md +5 -5
  24. package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +81 -0
  25. package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +191 -0
  26. package/assets/slopmachine/scaffold-playbooks/android-native-java.md +203 -0
  27. package/assets/slopmachine/scaffold-playbooks/angular-default.md +181 -0
  28. package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +142 -0
  29. package/assets/slopmachine/scaffold-playbooks/backend-family-matrix.md +80 -0
  30. package/assets/slopmachine/scaffold-playbooks/database-module-matrix.md +80 -0
  31. package/assets/slopmachine/scaffold-playbooks/django-default.md +166 -0
  32. package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +189 -0
  33. package/assets/slopmachine/scaffold-playbooks/docker-shared-contract.md +334 -0
  34. package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +124 -0
  35. package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +73 -0
  36. package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +134 -0
  37. package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +160 -0
  38. package/assets/slopmachine/scaffold-playbooks/frontend-family-matrix.md +134 -0
  39. package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +136 -0
  40. package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +160 -0
  41. package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +93 -0
  42. package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +151 -0
  43. package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +188 -0
  44. package/assets/slopmachine/scaffold-playbooks/laravel-default.md +216 -0
  45. package/assets/slopmachine/scaffold-playbooks/livewire-default.md +265 -0
  46. package/assets/slopmachine/scaffold-playbooks/overlay-module-matrix.md +130 -0
  47. package/assets/slopmachine/scaffold-playbooks/platform-family-matrix.md +79 -0
  48. package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +72 -0
  49. package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +182 -0
  50. package/assets/slopmachine/scaffold-playbooks/tauri-default.md +80 -0
  51. package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +162 -0
  52. package/assets/slopmachine/scaffold-playbooks/web-default.md +96 -0
  53. package/assets/slopmachine/templates/AGENTS.md +41 -3
  54. package/assets/slopmachine/templates/CLAUDE.md +111 -0
  55. package/assets/slopmachine/test-coverage-prompt.md +561 -0
  56. package/assets/slopmachine/utils/claude_create_session.mjs +3 -2
  57. package/assets/slopmachine/utils/claude_live_channel.mjs +188 -0
  58. package/assets/slopmachine/utils/claude_live_common.mjs +411 -0
  59. package/assets/slopmachine/utils/claude_live_hook.py +47 -0
  60. package/assets/slopmachine/utils/claude_live_launch.mjs +187 -0
  61. package/assets/slopmachine/utils/claude_live_status.mjs +25 -0
  62. package/assets/slopmachine/utils/claude_live_stop.mjs +46 -0
  63. package/assets/slopmachine/utils/claude_live_turn.mjs +277 -0
  64. package/assets/slopmachine/utils/claude_resume_session.mjs +3 -2
  65. package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.mjs +23 -0
  66. package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.sh +5 -0
  67. package/assets/slopmachine/utils/claude_worker_common.mjs +361 -4
  68. package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +4 -0
  69. package/assets/slopmachine/utils/export_ai_session.mjs +1 -1
  70. package/assets/slopmachine/utils/normalize_claude_session.py +153 -0
  71. package/assets/slopmachine/utils/package_claude_session.mjs +123 -0
  72. package/assets/slopmachine/utils/prepare_strict_audit_workspace.mjs +65 -0
  73. package/package.json +1 -1
  74. package/src/constants.js +42 -3
  75. package/src/init.js +173 -28
  76. package/src/install.js +156 -8
  77. package/src/send-data.js +56 -57
@@ -25,6 +25,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - require the README to explain what the project does, how to run it, how to test it, the main repo contents, and any important new-developer information
  - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
  - do not require the README to carry a full API catalog
+ - require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
  - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
  - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
  - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
@@ -33,7 +34,8 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - require parent-root `../docs/test-coverage.md` to be evaluator-shaped rather than generic: requirement or risk point, mapped test evidence, coverage status, major gap, and minimum test addition
  - when auth or access-control behavior is relevant, require static security-boundary evidence that a fresh reviewer can trace for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug surfaces, and tenant or user isolation when applicable
  - require logging structure and validation or error-handling structure to be statically traceable from repo artifacts and, when needed, owner-maintained external docs
- - for web projects, default the runtime command to `docker compose up --build` unless the prompt or existing repository clearly dictates another model
+ - for web projects, require the runtime command to be `docker compose up --build`
+ - for backend, fullstack, and web projects, allow and expect an additional README compatibility note containing the exact string `docker-compose up` for the strict README audit, but do not treat that as a replacement for the canonical `docker compose up --build` contract
  - for Dockerized web projects, require a dev-only runtime bootstrap script or equivalent startup path so `docker compose up --build` works without user exports or `.env`
  - do not accept Dockerized web startup that depends on manual export steps before the runtime command
  - do not accept Dockerized web startup that relies on checked-in `.env` files or hardcoded runtime values to satisfy local startup
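The runtime-command bullets combine three checkable conditions: the canonical `docker compose up --build` command, the additional `docker-compose up` compatibility string for backend, fullstack, and web projects, and no checked-in `.env`. A sketch of those checks, assuming the README text and the list of tracked repo-relative paths are already available; the function name and the project-kind labels are hypothetical:

```python
from pathlib import PurePosixPath

CANONICAL = "docker compose up --build"
COMPAT = "docker-compose up"  # extra README string for the strict audit, never a replacement
COMPAT_KINDS = {"backend", "fullstack", "web"}  # hypothetical project-kind labels


def runtime_contract_issues(readme: str, tracked_paths: list[str], project_kind: str) -> list[str]:
    """Return README/runtime-contract problems; an empty list means none were found."""
    issues = []
    # The canonical command is required even when the compatibility string is present.
    if CANONICAL not in readme:
        issues.append("README lacks the canonical `docker compose up --build` command")
    if project_kind in COMPAT_KINDS and COMPAT not in readme:
        issues.append("README lacks the `docker-compose up` compatibility note")
    # Startup must not rely on a checked-in .env file anywhere in the tree.
    if any(PurePosixPath(path).name == ".env" for path in tracked_paths):
        issues.append("a `.env` file is checked in")
    return issues
```

Only a file named exactly `.env` is flagged here; whether templates such as `.env.example` are acceptable is left to the reviewer.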
@@ -41,12 +43,13 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - require `./run_tests.sh` to use the same runtime bootstrap model or an equivalent model with the same generated-value rules as `docker compose up --build`
  - if runtime values persist across restarts, require them to live in Docker-managed runtime state rather than committed repo files
  - require README disclosure that the bootstrap path is local-development-only behavior rather than the production secret-management path
- - when `docker compose up --build` is not the runtime contract, require `./run_app.sh` to be the documented primary runtime wrapper
+ - for Android, mobile, desktop, and iOS-targeted projects, require a meaningful `docker compose up --build` command even when platform-specific runtime proof differs from web semantics
+ - for Android, mobile, desktop, and iOS-targeted projects, allow `./run_app.sh` as an additional platform helper but not as a replacement for the required Docker command
  - require `./run_tests.sh` to be self-sufficient enough to run from a clean Linux VM that only has Docker and curl available by default
  - do not accept a broad test path that depends on host package managers or preinstalled host language runtimes when Docker can provide the execution environment instead
- - for web projects using the default Docker-first runtime model, require `./run_tests.sh` to be the Dockerized broad test path used only for the limited broad verification moments rather than as the ordinary development verification path
+ - for web projects, require `./run_tests.sh` to be the Dockerized broad test path used only for the limited broad verification moments rather than as the ordinary development verification path
  - when host-level setup would otherwise be required, prefer a Dockerized `./run_tests.sh` path even outside traditional web stacks so the broad verification remains portable
- - for non-web or non-Docker projects, require `./run_tests.sh` to be the platform-equivalent broad test path used for final broad verification
+ - for non-web projects, require `./run_tests.sh` to remain containerized and usable as the platform-equivalent broad test path used for final broad verification
 
  ## Review standard
 
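The clean-VM rule for `./run_tests.sh` (only Docker and curl assumed, no host package managers or preinstalled language runtimes) can be partly checked without running anything. A rough heuristic sketch, assuming that a package-manager call on a line that does not invoke `docker` runs on the host; the command list and the function name are illustrative only:

```python
import re

# Illustrative host-only setup commands; not an exhaustive or official list.
HOST_SETUP = re.compile(r"\b(npm (install|ci)|pip3? install|apt-get|brew install|yarn install)\b")


def host_setup_lines(script: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines that appear to need host package managers."""
    hits = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and the shebang.
        if not line or line.startswith("#"):
            continue
        # Heuristic: commands passed to docker run inside a container, so they are not flagged.
        if line.startswith("docker ") or "docker compose" in line or "docker run" in line:
            continue
        if HOST_SETUP.search(line):
            hits.append((number, line))
    return hits
```

A line-based scan cannot follow helper scripts or shell functions, so it complements rather than replaces actually running the script on a clean VM.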
@@ -67,7 +70,11 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - do not accept fake-success paths that materially hide missing failure handling
  - do not accept frontend/backend drift in fullstack work
  - do not accept missing end-to-end coverage for major fullstack flows
- - do not accept coverage posture that clearly falls short of roughly 90 percent meaningful coverage of the relevant behavior surface without a prompt-faithful reason
+ - do not accept coverage posture that falls short of the minimum 90 percent coverage threshold for the relevant behavior surface without an explicit prompt-faithful exception
+ - when backend or fullstack APIs exist, do not accept missing endpoint inventory or missing API-test mapping for the important `METHOD + PATH` surfaces
+ - when backend or fullstack APIs exist, do not accept mocked or indirect tests being presented as equivalent to true no-mock HTTP endpoint coverage
+ - do not accept a README that is missing project type, startup instructions, access method, verification method, or auth disclosure when the strict README audit would expect them
+ - do not accept final delivered docs or wrapper flows that still depend on `npm install`, `pip install`, `apt-get`, manual DB setup, or other host-only setup assumptions after development is complete
  - do not accept a repo that only becomes understandable by reading parent-root docs or sibling workflow artifacts
  - do not accept frontend-bearing work that lacks repo-local build/preview/config guidance when those commands or surfaces are material to the product
  - do not accept frontend-bearing work that lacks a credible state model for prompt-critical flows
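The new API bullets ask for an endpoint inventory of the important `METHOD + PATH` surfaces, a test mapping for each, and a clear line between true no-mock HTTP tests and mocked or indirect ones. A sketch of that bookkeeping, assuming a hypothetical in-memory inventory format and applying the 90 percent threshold to the share of endpoints with at least one no-mock HTTP test (the skill applies the threshold to the relevant behavior surface, so endpoint count is only one possible proxy):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedTest:
    """Hypothetical record linking a test to an inventoried endpoint."""
    name: str
    no_mock_http: bool  # True only if the test sends a real HTTP request to the running app


def endpoint_coverage(
    inventory: list[str],
    mapping: dict[str, list[MappedTest]],
    threshold: float = 0.90,
) -> tuple[float, list[str], bool]:
    """Return (ratio, endpoints lacking no-mock HTTP tests, meets threshold)."""
    # Mocked or indirect tests stay in the mapping but do not count toward coverage.
    uncovered = [
        endpoint
        for endpoint in inventory
        if not any(test.no_mock_http for test in mapping.get(endpoint, []))
    ]
    # An empty inventory is treated as vacuously covered (a choice made for this sketch).
    ratio = (len(inventory) - len(uncovered)) / len(inventory) if inventory else 1.0
    return ratio, uncovered, ratio >= threshold
```

Returning the uncovered list alongside the ratio keeps the "major gap" and "minimum test addition" columns of `../docs/test-coverage.md` easy to fill in.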
@@ -84,6 +91,16 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - do not accept module completion that ignores integration seams or cross-cutting consistency with the existing system
  - do not accept end-to-end evidence that bypasses a required user-facing or admin-facing surface with direct API shortcuts
 
+ ## Gate-demand rule
+
+ - when setting a planning, scaffold, development, integrated-verification, hardening, or evaluation gate, reference the relevant accepted plan sections and then give an explicit stage-exclusive checklist for that gate
+ - the gate checklist should name:
+ - the exact outcomes that must now be true
+ - the exact evidence that must now exist
+ - the important shortcuts, omissions, or future-work excuses that are not acceptable for this gate
+ - do not re-dump the whole plan; isolate the exact subset of plan-backed expectations that must now be closed
+ - at gate moments, prefer more explicit owner messages over ultra-short prompts so the developer cannot plausibly misread what acceptance depends on
+
  ## Cadence rule
 
  - use targeted local verification as the default during scaffold corrections, development, hardening, and evaluation fix loops
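The Gate-demand rule asks each gate message to cite the accepted plan sections and then spell out outcomes, evidence, and unacceptable shortcuts. One way to keep those messages consistent is a small structure that refuses to render without the required parts; the field names and output layout below are a hypothetical sketch, not a format the skill defines:

```python
from dataclasses import dataclass, field

# Gate names taken from the Gate-demand rule.
GATES = {"planning", "scaffold", "development", "integrated-verification", "hardening", "evaluation"}


@dataclass
class GateChecklist:
    gate: str
    plan_sections: list[str]
    outcomes: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    rejected_shortcuts: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Build the owner message; raise if a required part is missing."""
        if self.gate not in GATES:
            raise ValueError(f"unknown gate: {self.gate}")
        if not (self.plan_sections and self.outcomes and self.evidence):
            raise ValueError("a gate message needs plan sections, outcomes, and evidence")
        lines = [f"{self.gate} gate (accepted plan sections: {', '.join(self.plan_sections)})"]
        for title, items in (
            ("Outcomes that must now be true", self.outcomes),
            ("Evidence that must now exist", self.evidence),
            ("Not acceptable for this gate", self.rejected_shortcuts),
        ):
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Citing plan sections by reference, rather than copying them, matches the rule's instruction not to re-dump the whole plan.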
@@ -91,7 +108,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - do not turn ordinary acceptance into repeated integrated-style gate runs
  - do not run `./run_tests.sh` casually on the owner side
  - do not run `docker compose up --build` casually on the owner side
- - for web projects using the default Docker-first runtime model, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
+ - for web projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
  - after that scaffold confirmation, the next Docker-based run should be at development completion or integrated-verification entry unless a real blocker forces earlier escalation
  - in between those two broad checks, ordinary development should rely on local fast verification only
  - ordinary in-phase verification should not invoke `docker compose up --build` or `./run_tests.sh` unless the workflow is explicitly at one of those broad gate moments or a blocker justifies an earlier escalation
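Taken together, the cadence bullets describe a small guard for owner-side broad runs: one `docker compose up --build` plus `./run_tests.sh` after scaffold completion, the next at development completion or integrated-verification entry, and anything in between only for a real blocker. A sketch of such a guard with hypothetical moment labels; it also counts runs against the target of at most 3 broad owner-run moments stated elsewhere in this skill, reporting an overrun rather than refusing it because the text calls that number a target:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels for the two scheduled broad moments in the cadence rule;
# "development-complete" stands for development completion / integrated-verification entry.
SCHEDULED_MOMENTS = ("scaffold-complete", "development-complete")
BROAD_RUN_TARGET = 3  # soft target: at most 3 broad owner-run moments per cycle


@dataclass
class BroadRunLog:
    runs: list[str] = field(default_factory=list)

    def request(self, moment: str, blocker: Optional[str] = None) -> bool:
        """Record a broad Docker run if the cadence rule allows it; return whether it was allowed."""
        if moment in SCHEDULED_MOMENTS and moment not in self.runs:
            self.runs.append(moment)
            return True
        if blocker:
            # Earlier escalation is allowed only for a real, named blocker.
            self.runs.append(f"escalation: {blocker}")
            return True
        return False  # ordinary in-phase work stays on local fast verification

    @property
    def over_target(self) -> bool:
        return len(self.runs) > BROAD_RUN_TARGET
```

Recording escalations with their blocker text leaves a trail explaining why the budget was spent early.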
@@ -101,8 +118,10 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - inspect the result and evidence, not just the developer claim
  - review technical quality, prompt alignment, architecture impact, and verification depth of the current work
  - after planning is accepted, treat the accepted plan and its relevant section as the default slice baseline instead of restating the full slice contract in every owner prompt
- - for ordinary slice work after planning, keep the owner prompt to one short paragraph plus a small checklist of slice-specific guardrails, review concerns, or deltas that are not already clear from the accepted plan
+ - for ordinary slice work after planning, keep the owner prompt anchored to the relevant accepted plan sections and use an explicit checklist of slice-specific required outcomes, verification expectations, and review concerns that are not already clear from the accepted plan
+ - when the current step is a real gate or phase-exit decision, be more explicit than ordinary slice prompts and enumerate the full stage-exclusive acceptance checklist
  - during normal implementation iteration, always prefer fast local language-native or framework-native verification for the changed area instead of the selected stack's broad gate path
+ - during normal implementation iteration, fast local tooling setup is allowed when it helps iteration speed, but treat it as temporary engineering scaffolding rather than part of the final delivered runtime or test contract
  - require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
  - require the developer to report the exact verification commands that were run and the concrete results they produced
  - when API tests are used as evidence, require them to hit real endpoints and expose simple useful response evidence such as status codes and message/body summaries
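The evidence rule for API tests (real endpoints, status codes, short message or body summaries) needs very little code. A standard-library sketch, assuming the app is already running at a known base URL; the helper name `probe` and the 200-character summary limit are arbitrary illustrative choices:

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def probe(method: str, url: str, payload: Optional[dict] = None, limit: int = 200) -> dict:
    """Send a real HTTP request and return the status plus a short body summary."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status, body = response.status, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are still evidence: keep their status and body.
        status, body = error.code, error.read()
    summary = body.decode("utf-8", errors="replace")[:limit]
    # Print a one-line evidence record the developer can paste into the report.
    print(f"{method} {url} -> {status} {summary!r}")
    return {"status": status, "summary": summary}
```

Non-2xx responses are returned rather than raised because a rejected request, such as a 401 or 403 on a protected route, is often exactly the evidence being asked for.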
@@ -126,11 +145,11 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - the evaluator-session cycles required inside `P7` are not part of the ordinary owner-run broad-gate budget; they are the formal final evaluation model for that phase
  - for Electron or other Linux-targetable desktop projects, the broad gate should use the Dockerized desktop build/test path plus headless UI/runtime verification rather than pretending web-style Docker runtime semantics apply
  - for Android projects, the broad gate should use the Dockerized Android build/test path without depending on an emulator
- - for iOS-targeted projects on Linux, the broad gate should rely on `./run_tests.sh` plus static/code review evidence and should not claim native iOS runtime proof unless a real macOS/Xcode checkpoint exists
+ - for iOS-targeted projects on Linux, the broad gate should include `docker compose up --build` plus `./run_tests.sh` and static/code review evidence, and should not claim native iOS runtime proof unless a real macOS/Xcode checkpoint exists
  - the workflow target is at most 3 broad owner-run verification moments across the whole cycle
  - ordinary planning, ordinary slice acceptance, and routine in-phase verification are not broad gates by default and should rely on targeted local verification unless the risk profile says otherwise
 
- For web projects using the default Docker-first runtime model, the default Docker cadence is:
+ For web projects, the default Docker cadence is:
 
  1. one owner-run `docker compose up --build` plus one owner-run `./run_tests.sh` after scaffold completion
  2. no more Docker-based runs during ordinary development work
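The broad-gate bullets in this hunk differ by platform, which maps naturally onto a lookup table. A sketch with hypothetical family keys; the evidence strings paraphrase the bullets, and the iOS entry adds native runtime proof only when a real macOS/Xcode checkpoint exists:

```python
# Hypothetical family keys; values paraphrase the platform broad-gate bullets.
BROAD_GATE_EVIDENCE = {
    "web": ["docker compose up --build", "./run_tests.sh"],
    "desktop": ["Dockerized desktop build/test path", "headless UI/runtime verification"],
    "android": ["Dockerized Android build/test path (no emulator)"],
    "ios-on-linux": ["docker compose up --build", "./run_tests.sh", "static/code review evidence"],
}


def broad_gate_plan(family: str, macos_checkpoint: bool = False) -> list[str]:
    """Return the evidence list for a family's broad gate (a copy, so callers can extend it)."""
    plan = list(BROAD_GATE_EVIDENCE[family])
    # Native iOS runtime proof may only be claimed when a real macOS/Xcode checkpoint exists.
    if family == "ios-on-linux" and macos_checkpoint:
        plan.append("native iOS runtime proof from the macOS/Xcode checkpoint")
    return plan
```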
@@ -144,24 +163,34 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 
  - clarification requires the `clarification-gate` conditions plus explicit approval record
  - planning requires the `developer-session-lifecycle` and planning-gate conditions plus a fresh planning-oriented start and the required documentation and repo hygiene state when relevant
+ - planning exit also requires explicit owner review that the accepted planning artifacts cover the section-addressable contract deeply enough for later implementation: in-scope and out-of-scope, actors and success paths, modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+ - planning exit does not pass if those sections exist only nominally or remain too vague to drive implementation without broad reinvention
+ - planning exit also requires that the accepted plan covers the final README hard-gate shape and, when backend or fullstack APIs exist, the endpoint-inventory and API-test mapping strategy needed for the strict coverage audit
  - scaffold requires evidence for the bounded scaffold gate, baseline logging/config, and when relevant the chosen frontend stack and UI approach being set intentionally
  - scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, a usable local test environment in the current working directory, and the correct primary runtime command plus `./run_tests.sh` documented and working when practical
- - for web projects, scaffold normally requires Docker-first runtime foundations unless the prompt or existing repository clearly dictates another model
+ - for web projects, scaffold requires Docker runtime foundations
+ - for Android, mobile, desktop, and iOS-targeted projects, scaffold also requires a meaningful `docker compose up --build` path plus containerized `./run_tests.sh`
  - for Dockerized web projects, scaffold also requires the dev-only runtime bootstrap path to be wired so `docker compose up --build` works without manual exports or `.env`
  - for Dockerized web projects, scaffold also requires owner review of Compose files, runtime bootstrap scripts, entrypoints or wrappers, and `./run_tests.sh` to confirm the no-export, no-`.env`, no-pre-seeded-secret-literals model is actually implemented
  - when the project has database dependencies, scaffold also requires a real `./init_db.sh` created during scaffold, wired into the runtime/test flow when needed, and populated with the database setup already known at that stage
  - scaffold also requires `./run_tests.sh` to handle its own required setup from a clean Linux VM that only has Docker and curl available by default
  - local tests should still exist for ordinary development work even when the primary broad test command is Dockerized
+ - scaffold also requires `README.md` to have the baseline section shape needed for the final README audit, even when many sections are still scaffold-level placeholders
  - when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
  - for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
  - scaffold acceptance also requires clean startup and teardown behavior in the selected runtime model; for Dockerized web projects this includes self-contained Compose namespacing and no unnecessary fragile `container_name` usage
  - for Dockerized web projects, scaffold acceptance also requires collision-resistant shared-machine defaults: only the primary app-facing port exposed to host by default, internal services not bound to host without prompt need, default host binding on `127.0.0.1`, and either random host-port assignment or a real free-port fallback when fixed ports are required
- - for web projects using the default Docker-first runtime model, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
+ - for web projects, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
+ - for Android, mobile, desktop, and iOS-targeted projects, scaffold acceptance is not complete until the owner has also run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
  - module implementation requires targeted local verification only; browser E2E and other broad gate evidence belong to owner-run major checkpoints rather than ordinary slice acceptance
+ - module implementation acceptance requires explicit checking against the relevant accepted plan sections and the current stage-exclusive checklist, not just a loose sense that the feature exists
  - module implementation acceptance should challenge tenant isolation, path confinement, sanitized error behavior, prototype residue, integration seams, and cross-cutting consistency when those concerns are in scope
  - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
+ - when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
  - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
- - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the planned 90 percent meaningful coverage target instead of accumulating test debt
+ - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
+ - before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
+ - before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
  - integrated verification completion requires explicit full-system evidence before the phase can close
  - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
  - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
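The collision-resistant defaults bullet asks for host binding on `127.0.0.1` and either random host-port assignment or a real free-port fallback when fixed ports are required. A sketch of the fallback half, assuming a hypothetical bootstrap step picks the port before Compose starts; a port found free here can still be taken before the container binds it, so this narrows the race rather than removing it:

```python
import socket


def pick_host_port(preferred: int, host: str = "127.0.0.1") -> int:
    """Return `preferred` if it can be bound on `host`, else an OS-assigned free port."""
    for candidate in (preferred, 0):  # port 0 asks the OS for any free ephemeral port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue  # preferred port is busy: fall back to an ephemeral one
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")
```

A bootstrap wrapper could pass the result to Compose through variable interpolation in a `127.0.0.1:${APP_PORT}:<container-port>` port mapping; `APP_PORT` is a hypothetical variable name, set by the wrapper for the child process rather than exported by the user.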
@@ -174,14 +203,15 @@ Use evidence such as internal metadata files, structured Beads comments, verific
  - hardening must explicitly re-check secret handling, redaction, and frontend/backend observability hygiene
  - hardening must explicitly satisfy the documentation and repo hygiene policy in this file before final evaluation can begin
  - hardening must leave the repo statically reviewable enough that the final static evaluator can trace startup, tests, entry points, routes, config, and mock/local-data boundaries without rewriting core code
- - hardening must explicitly challenge any remaining gaps against the intended 90 percent meaningful coverage target and require justification or fixes before `P7`
+ - hardening must explicitly challenge any remaining gaps against the minimum 90 percent coverage threshold and require proof, fixes, or an explicit prompt-faithful exception before `P7`
  - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
  - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
  - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
  - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
  - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
- - final evaluation readiness requires the cycle-based `P7` self-test model under `../self_test_reports/`; failed initial audits trigger non-counted remediation, counted cycles begin only from a `pass` or `partial pass` initial audit, cycle fix loops stay scoped to that cycle's initial issue list, and 2 successful fresh-session counted cycles are required before final human decision
+ - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
  - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
+ - before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
 
  ## Acceptance rule
 
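The new `P7` bullets pack several routing rules into single sentences. A sketch of that routing as plain functions, assuming hypothetical function names; the verdict handling, the file-name patterns, and the cap of 3 remediation attempts come from the bullets, and everything else is illustrative:

```python
from typing import Callable, Optional


def route_fresh_audit(verdict: str, n: int) -> tuple[str, Optional[str]]:
    """Map a fresh evaluation verdict to (next action, persisted report path or None)."""
    if verdict == "partial pass":
        return "open scoped bugfix-N session", f"../.tmp/audit_report-{n}.md"
    if verdict == "fail":
        return "route back to latest develop-N session; discard report after triage", None
    if verdict == "pass":
        return "discard working report; rerun fresh evaluation", None
    raise ValueError(f"unexpected verdict: {verdict!r}")


def fix_check_path(n: int, m: int) -> str:
    """Path for the M-th fix check of persisted audit N."""
    return f"../.tmp/audit_report-{n}-fix_check-{m}.md"


def final_coverage_readme_audit(
    audit: Callable[[], list[str]],
    remediate: Callable[[list[str]], None],
    max_attempts: int = 3,
) -> tuple[list[str], int]:
    """Run the last P7 audit, remediating at most `max_attempts` times; return the latest issues."""
    issues = audit()
    attempts = 0
    while issues and attempts < max_attempts:
        remediate(issues)  # in the workflow, fixes go to the active recoverable developer session
        attempts += 1
        # In the workflow each rerun replaces the report file; this sketch only tracks issues.
        issues = audit()
    return issues, attempts
```

Whatever the loop ends with is the report carried forward, even when issues remain after the third attempt.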
@@ -200,7 +200,7 @@ Hard Rules (must follow)
  ====================
  Output Requirements
 
- Produce the final audit in a concise but complete report and write the consolidated report to `./.tmp/**.md`.
+ Produce the final audit in a concise but complete report and write the consolidated report to `../.tmp/**.md`.
 
  The final report must be organized by the six major acceptance sections in order, even if your scan order was different.
 
@@ -118,8 +118,8 @@ Based on static evidence only, determine whether the delivery is a credible, Pro
  - marked Cannot Confirm with a clear boundary explanation
 
  5) Exclude Temporary Output
- - Exclude `./.tmp/` and all its subdirectories.
- - `./.tmp/` must not be used as evidence, search scope, reference, summary source, or factual basis.
+ - Exclude `./.tmp/`, `../.tmp/`, and all their subdirectories.
+ - `./.tmp/` and `../.tmp/` must not be used as evidence, search scope, reference, summary source, or factual basis.
 
  [Pure Frontend-Specific Rules]
 
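The evaluation prompts now exclude both `./.tmp/` and the parent-root `../.tmp/` from evidence, while the final report itself is written to `../.tmp/`. A sketch of an evidence-scope filter, assuming the evaluator works from the repo root; resolving paths first stops `..` segments and symlinks from routing excluded files back into scope (the function name is hypothetical):

```python
from pathlib import Path


def is_excluded_evidence(path: str, repo_root: str = ".") -> bool:
    """True if `path` lies under ./.tmp/ or ../.tmp/ relative to `repo_root`."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." and follows existing symlinks before the comparison.
    target = (root / path).resolve()
    for excluded in (root / ".tmp", root.parent / ".tmp"):
        if target == excluded or excluded in target.parents:
            return True
    return False
```

Comparing resolved path components, rather than string prefixes, keeps look-alike names such as `docs/.tmp-notes.md` in scope.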
@@ -277,7 +277,7 @@ Your output must strictly follow this structure:
 
  2. Scope and Verification Boundary
  - what was reviewed
- - which input sources were excluded, including `./.tmp/`
+ - which input sources were excluded, including `./.tmp/` and `../.tmp/`
  - what was not executed
  - what cannot be statically confirmed
  - which conclusions require manual verification
@@ -388,10 +388,10 @@ Before finalizing the report, check each of the following:
  3. Did you wrongly assign backend responsibility to the frontend?
  4. Did you misclassify reasonable mock / local data / storage usage as a defect?
  5. Did you state visual or interaction guesses as strong conclusions?
- 6. Does any conclusion directly or indirectly rely on `./.tmp/`?
+ 6. Does any conclusion directly or indirectly rely on `./.tmp/` or `../.tmp/`?
  7. Have all required Blocker / High dimensions been closed?
  8. Have repeated findings been merged by root cause?
  9. If unsupported observations were removed, would the final Verdict still hold?
 
- If writing files is supported, save the final report to `./.tmp/`.
+ If writing files is supported, save the final report to `../.tmp/`.
  Otherwise, return the report directly in the conversation.
@@ -0,0 +1,81 @@
+ # Android Kotlin Compose Scaffold Playbook
+
+ Use this playbook when the prompt explicitly requires native Android with Kotlin + Compose.
+
+ ## Current status
+
+ This family is now **experimentally verified** for a reasonable Linux Docker baseline.
+
+ Verified lab:
+
+ - `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-kotlin-compose-baseline`
+
+ ## What was achieved in the verified lab
+
+ The verified lab now demonstrates all of the following:
+
+ - native Kotlin Android project exists
+ - Compose is enabled in Gradle
+ - pinned Android toolchain Dockerfile exists
+ - `docker-compose.yml`, `run_tests.sh`, and artifact-serving scripts exist
+ - `artifacts/app-debug.apk` and checksum were produced in the lab tree
+ - JVM-side test files exist
+ - `docker compose up -d --wait` reached a stable healthy artifact-serving state
+ - `./run_tests.sh` passed with containerized lint plus `:app:testDebugUnitTest`
+ - the Compose build no longer fails on the broken theme/import issues found during investigation
+
+ ## Safe default stack
+
+ - AGP `8.5.x`
+ - Gradle `8.7`
+ - Java `17`
+ - Kotlin `1.9.x`
+ - `compileSdk = 34`
+ - `targetSdk = 34`
+ - `minSdk = 29`
+ - Compose BOM pinned explicitly
+ - Material 3 default Compose surface
+
+ ## Runtime contract
+
+ - required Docker command: `docker compose up --build`
+ - required broad test command: `./run_tests.sh`
+ - both are now real and working in the verified lab
+ - `./run_app.sh` may exist as a helper, but it does not replace the Docker baseline
+
+ ## Intended Docker strategy
+
+ This family should follow the same proven Android pattern as the Java/Kotlin-Views baseline:
+
+ 1. pre-bake the Android SDK/toolchain layers into the image
+ 2. bind-mount the workspace for source changes
+ 3. avoid default `clean` tasks
+ 4. use one long-running artifact-serving/support container for the Compose healthy state
+ 5. reuse that same running container for lint and JVM-side test commands via `docker compose exec`
+
+ ## Honest Linux proof boundary
+
+ For the verified Linux baseline, Docker honestly proves only:
+
+ - Compose code compiles
+ - debug APK assembles
+ - lint passes
+ - JVM-side tests pass
+ - artifact-serving healthy state works
+
+ Linux should **not** claim emulator or device runtime proof.
+
+ ## Verified rerun evidence
+
+ The final rerun established these concrete facts:
+
+ - direct `:app:assembleDebug --stacktrace` passed after fixing the broken app theme and missing `rememberSaveable` import
+ - `./run_tests.sh` passed with the container reaching `Healthy`
+ - containerized Gradle verification completed with `BUILD SUCCESSFUL in 1m 32s`
+ - the generated APK size was non-zero and published from the artifact server
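The artifact facts above (a non-zero APK plus a checksum in `artifacts/`) can be re-checked with a short Python sketch. It assumes the checksum is stored next to the APK as `app-debug.apk.sha256` in the `<hex>  <file>` layout that `shasum -a 256` prints. The playbook does not name the checksum file, so that name and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks so large APKs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(apk: Path, checksum_file: Path) -> str:
    """Write '<hex>  <name>', the same two-space layout shasum uses in text mode."""
    hexdigest = sha256_of(apk)
    checksum_file.write_text(f"{hexdigest}  {apk.name}\n")
    return hexdigest


def verify_artifact(apk: Path, checksum_file: Path) -> None:
    """Fail unless the APK exists, is non-empty, and matches its recorded checksum."""
    if not apk.is_file() or apk.stat().st_size == 0:
        raise RuntimeError(f"{apk} is missing or empty")
    recorded = checksum_file.read_text().split()[0]
    if recorded != sha256_of(apk):
        raise RuntimeError(f"checksum mismatch for {apk}")
```

Checking the digest as well as the size catches the case where a stale checksum survives from an earlier build while the APK itself was replaced.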
+
+ ## Guidance
+
+ - use this family only when the prompt explicitly requires Compose
+ - keep it non-default for open-ended Android work because the Java Views baseline is still the lighter generic default
+ - it is now safe to treat this family as experimentally verified rather than only partially prepared
@@ -0,0 +1,191 @@
+ # Android Kotlin Views Scaffold Playbook
+
+ Use this playbook when the prompt explicitly wants native Android with Kotlin and XML/Views rather than Compose.
+
+ This concrete playbook follows the shared Docker contract in `docker-shared-contract.md` and is grounded in the experimentally verified lab at `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline`.
+
+ ## Goal
+
+ Create a simple Android Kotlin Views baseline that:
+
+ - is baseline-only, not feature-complete
+ - uses Kotlin plus Android Views, not Compose
+ - stays honest about Linux-first Docker verification boundaries
+ - keeps `docker compose up --build` as the required runtime/support contract
+ - keeps `./run_tests.sh` as the required broad containerized verification path
+ - requires no emulator
+
+ ## Runtime contract
+
+ - required Docker command: `docker compose up --build`
+ - required broad test command: `./run_tests.sh`
+ - both commands must be real, containerized, and working
+ - `./run_app.sh` may exist as a host convenience helper, but it does not replace the required Docker contract
+
+ ## Verified baseline notes
+
+ From a real lab verification on 2026-04-15:
+
+ - the verified lab is `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline`
+ - the lab uses Kotlin source plus XML Views and ViewBinding
+ - `docker compose up --build -d --wait android-baseline` reached a stable healthy state
+ - that healthy state is a loopback-only artifact server serving `artifacts/app-debug.apk` on the mapped host port reported by `docker compose port android-baseline 8080`
+ - `./run_tests.sh` reused the running Compose container with `docker compose exec` for `:app:lintDebug` and `:app:testDebugUnitTest`, then smoke-checked the served APK
+ - the containerized Gradle verification completed successfully without an emulator
+ - the truthful proof boundary is pinned toolchain + APK assembly + lint + JVM unit tests + artifact serving; it does **not** claim emulator boot, adb deployment, or on-device runtime proof
+
+ ## Safe pinned defaults used in the verified lab
+
+ - Android Gradle Plugin: `8.5.2`
+ - Gradle wrapper: `8.7`
+ - Kotlin Android plugin: `1.9.24`
+ - Java: `17`
+ - `compileSdk = 34`
+ - `targetSdk = 34`
+ - `minSdk = 29`
+ - view system: XML layouts + Android Views + ViewBinding
+
+ ## Safe default libraries
+
+ - AppCompat `1.7.0`
+ - Material `1.12.0`
+ - ConstraintLayout `2.1.4`
+ - Lifecycle Runtime `2.8.4`
+ - JUnit `4.13.2`
+
+ Add Room, security, networking, or media libraries only when the prompt actually needs them.
+
+ ## Preferred repo shape
+
+ - `app/`
+ - `container-build-and-serve.sh`
+ - `container-gradle.sh`
+ - `docker-compose.yml`
+ - `Dockerfile`
+ - `run_tests.sh`
+ - `run_app.sh`
+ - `artifacts/` for the built APK and checksum
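A quick shape check for the preferred layout could look like the Python sketch below. Treating `run_app.sh` and `artifacts/` as optional is an assumption: the README floor calls the host helper optional ("when present"), and `artifacts/` is produced by the build rather than committed. `missing_entries` and `has_env_file` are hypothetical names.

```python
from pathlib import Path
from typing import List

# Entries from the "Preferred repo shape" list; a trailing "/" marks a directory.
PREFERRED_SHAPE = [
    "app/",
    "container-build-and-serve.sh",
    "container-gradle.sh",
    "docker-compose.yml",
    "Dockerfile",
    "run_tests.sh",
    "run_app.sh",
    "artifacts/",
]
# Assumption: the host helper is optional and artifacts/ appears after the first build.
OPTIONAL = {"run_app.sh", "artifacts/"}


def missing_entries(repo_root: Path) -> List[str]:
    """Required entries from PREFERRED_SHAPE that are absent under repo_root."""
    missing = []
    for entry in PREFERRED_SHAPE:
        if entry in OPTIONAL:
            continue
        path = repo_root / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


def has_env_file(repo_root: Path) -> bool:
    """The baseline needs no secrets, so any top-level .env is a finding."""
    return (repo_root / ".env").exists()
```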
+
+ ## Docker strategy that was experimentally verified
+
+ For Android-on-Linux Kotlin Views scaffolds, prefer a pinned toolchain image plus one long-running support container instead of pretending Docker proves native Android runtime.
+
+ Verified pattern:
+
+ 1. build a pinned Android toolchain image from source in the repo
+ 2. pre-bake Java 17, Android command-line tools, platform `android-34`, build-tools `34.0.0`, and a seeded Gradle wrapper/plugin cache into the image
+ 3. bind-mount the workspace at runtime so source edits do not invalidate the heavy SDK layers
+ 4. start one long-running container that runs `:app:assembleDebug`, copies the APK to `artifacts/`, writes a checksum, and serves that directory over HTTP
+ 5. expose only one loopback-only host port with an automatic high host-port mapping: `127.0.0.1::8080`
+ 6. declare health only after the APK exists and the in-container HTTP server returns the APK successfully
+ 7. reuse that same running container in `./run_tests.sh` with `docker compose exec` so lint/test verification does not rebuild the entire toolchain path again
+
+ This strategy satisfied the shared contract because `docker compose up --build` reached a meaningful healthy state and reruns avoided repeated SDK bootstrap work.
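Step 6 of the verified pattern (health only after the APK exists and the server returns it) could be implemented as a small Python probe run inside the container. This is a sketch under assumptions: the APK path and the in-container URL below are hypothetical defaults, and the lab's real health check may be written differently.

```python
from pathlib import Path
from urllib.request import urlopen

# Hypothetical defaults: adjust to the real container layout and server port.
DEFAULT_APK = Path("artifacts/app-debug.apk")
DEFAULT_URL = "http://127.0.0.1:8080/app-debug.apk"


def is_healthy(apk: Path = DEFAULT_APK, url: str = DEFAULT_URL, timeout: float = 5.0) -> bool:
    """Healthy only when the APK exists and the HTTP server returns all of its bytes."""
    if not apk.is_file() or apk.stat().st_size == 0:
        return False  # assembleDebug has not produced (or copied) the APK yet
    try:
        with urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read()
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
    return status == 200 and len(body) == apk.stat().st_size


def main() -> int:
    """Exit code for a container health check: 0 healthy, 1 not yet."""
    return 0 if is_healthy() else 1
```

A wrapper script would end with `sys.exit(main())`, and a Compose `healthcheck` could run that script so that `--wait` only returns once the APK is actually being served.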
+
+ ## `./run_tests.sh`
+
+ `./run_tests.sh` should remain containerized and should prove the portable Android baseline without an emulator:
+
+ - start the Compose baseline with `docker compose up --build -d --wait`
+ - reuse the running container for Gradle verification with `docker compose exec`
+ - run at least `:app:lintDebug` and `:app:testDebugUnitTest`
+ - smoke-check the same APK artifact surface that `docker compose up --build` claims to provide
+ - tear the stack down after verification
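The `./run_tests.sh` steps above are sketched here in Python for readability; the real script is shell. The sketch assumes the container's working directory is the bind-mounted workspace with a Gradle wrapper at `./gradlew` (the lab also ships `container-gradle.sh`, whose interface is not shown here) and reuses the `android-baseline` service name from the verified lab. `verification_plan` and `run_tests` are hypothetical names.

```python
import subprocess
from typing import List
from urllib.request import urlopen

SERVICE = "android-baseline"  # service name used in the verified lab
GRADLE = ["./gradlew"]        # assumption: wrapper at the container's working directory


def verification_plan(service: str = SERVICE) -> List[List[str]]:
    """Commands in the order the script runs them (teardown handled separately)."""
    return [
        ["docker", "compose", "up", "--build", "-d", "--wait"],
        ["docker", "compose", "exec", "-T", service, *GRADLE,
         ":app:lintDebug", ":app:testDebugUnitTest"],
        ["docker", "compose", "port", service, "8080"],
    ]


def run_tests(service: str = SERVICE) -> None:
    up, gradle, port = verification_plan(service)
    try:
        subprocess.run(up, check=True)      # build, start, and wait for the healthy state
        subprocess.run(gradle, check=True)  # lint + JVM unit tests in the running container
        host_port = subprocess.check_output(port, text=True).strip()
        # Smoke-check the same APK surface that `docker compose up --build` serves.
        with urlopen(f"http://{host_port}/app-debug.apk", timeout=10) as response:
            if response.status != 200 or not response.read():
                raise RuntimeError("served APK is missing or empty")
    finally:
        # Always tear down, even when a step above failed.
        subprocess.run(["docker", "compose", "down", "--remove-orphans"], check=False)
```

`-T` disables TTY allocation for `docker compose exec`, which keeps the Gradle call usable from a non-interactive script.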
+
+ ## Minimal real test floor
+
+ At scaffold time, include at least:
+
+ - one real Kotlin helper/rule test
+ - one real state/helper test exercised by the Android entrypoint flow
+ - real `lint` proof
+ - real `assembleDebug` proof
+
+ Do not leave the baseline test path mostly `NO-SOURCE`.
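One way to enforce the `NO-SOURCE` rule is to scan Gradle's plain console output (`--console=plain`), which prints one `> Task :path` line per task with an outcome label such as `NO-SOURCE` when one applies. The sketch treats tasks whose names contain `UnitTest` as the test path and flags the run when more than half of them report `NO-SOURCE`. Both the name filter and the one-half threshold are assumptions standing in for "mostly".

```python
import re
from typing import Dict, List

# Plain console lines look like "> Task :app:testDebugUnitTest" or
# "> Task :app:compileDebugUnitTestJavaWithJavac NO-SOURCE".
TASK_LINE = re.compile(r"^> Task (\S+)(?: (\S+))?$")


def task_outcomes(gradle_output: str) -> Dict[str, str]:
    """Map task path to outcome label ("EXECUTED" is our label when Gradle prints none)."""
    outcomes: Dict[str, str] = {}
    for line in gradle_output.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            outcomes[match.group(1)] = match.group(2) or "EXECUTED"
    return outcomes


def no_source_test_tasks(gradle_output: str) -> List[str]:
    """Test-path tasks (name contains 'UnitTest') that reported NO-SOURCE."""
    return sorted(
        task for task, outcome in task_outcomes(gradle_output).items()
        if "UnitTest" in task and outcome == "NO-SOURCE"
    )


def is_test_path_mostly_no_source(gradle_output: str) -> bool:
    test_tasks = [t for t in task_outcomes(gradle_output) if "UnitTest" in t]
    if not test_tasks:
        return True  # no test tasks ran at all, which is at least as bad
    return len(no_source_test_tasks(gradle_output)) * 2 > len(test_tasks)
```

A Kotlin-only test source set commonly leaves the Java test-compile task as `NO-SOURCE` while the Kotlin compile and test tasks run, which this check tolerates.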
+
+ ## README floor
+
+ `README.md` in the scaffold must already state:
+
+ - that this is a baseline scaffold only
+ - Kotlin + Android Views scope
+ - required Docker command: `docker compose up --build`
+ - required broad test command: `./run_tests.sh`
+ - host helper command: `./run_app.sh` when present
+ - what healthy state means for the artifact-serving support surface
+ - what the Docker path does **not** prove on Linux
+ - no `.env` / no hidden secret bootstrap policy
+ - any known heavier first-run expectations
+
+ ## Exact commands actually run in the verified lab
+
+ ```bash
+ docker compose build --no-cache
+ docker compose up --build -d --wait android-baseline
+ docker compose ps
+ curl -fsS "http://$(docker compose port android-baseline 8080)/app-debug.apk" -o /tmp/android-kotlin-views-app-debug.apk
+ shasum -a 256 artifacts/app-debug.apk
+ docker compose down --remove-orphans
+ ./run_tests.sh
+ python3 - <<'PY'
+ import signal
+ import subprocess
+ import time
+ from urllib.request import urlopen
+
+ cwd = "/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline"
+ proc = subprocess.Popen(["docker", "compose", "up", "--build"], cwd=cwd)
+ error = None
+ try:
+     deadline = time.time() + 600
+     while time.time() < deadline:
+         try:
+             port = subprocess.check_output(["docker", "compose", "port", "android-baseline", "8080"], cwd=cwd, text=True).strip()
+             with urlopen(f"http://{port}/app-debug.apk", timeout=5) as response:
+                 if response.status == 200:
+                     print("Android Kotlin Views baseline reached artifact-serving healthy state during docker compose up --build")
+                     break
+         except Exception as exc:
+             error = exc
+         time.sleep(5)
+     else:
+         raise RuntimeError(f"Android Kotlin Views baseline never became ready: {error}")
+ finally:
+     proc.send_signal(signal.SIGINT)
+     try:
+         proc.wait(timeout=30)
+     except subprocess.TimeoutExpired:
+         proc.kill()
+         proc.wait(timeout=30)
+     subprocess.run(["docker", "compose", "down", "--remove-orphans"], cwd=cwd, check=True)
+ PY
+ ```
+
+ ## Observed verification results in the verified lab
+
+ - `docker compose up --build -d --wait android-baseline`: passed and reported the service healthy
+ - `curl -fsS "http://$(docker compose port android-baseline 8080)/app-debug.apk"`: passed and downloaded a non-empty APK
+ - `./run_tests.sh`: passed after running containerized lint, JVM unit tests, and the APK smoke check
+ - foreground `docker compose up --build`: reached the documented artifact-serving state before controlled shutdown, so it converged honestly instead of hanging indefinitely with no proof
+
+ ## Common pitfalls
+
+ - defaulting to Compose or emulator requirements when the prompt asks for Views-only baseline work
+ - requiring Robolectric or device runtime proof when the truthful Linux Docker baseline does not need it
+ - making `docker compose up --build` a one-shot build that exits without a stable healthy state
+ - rebuilding the Android SDK on ordinary reruns because the Dockerfile copies the whole source tree before caching the heavy toolchain layers
+ - sharing mutable Gradle cache state across multiple concurrent containers when one running container is enough
+ - publishing more host ports than the artifact-serving support surface actually needs
+ - checking in `.env` or any plaintext secrets even though this baseline does not need them
+
+ ## Acceptance checklist
+
+ Scaffold is acceptable when:
+
+ - `docker compose up --build` works and reaches the documented healthy state
+ - `./run_tests.sh` works and stays containerized
+ - minimal real Kotlin tests exist
+ - the Docker path is honest about stopping at APK build/test/artifact proof on Linux
+ - README is honest and traceable
+ - no `.env` is required or committed
+ - the result is experimentally verified, not just theoretically described