npm - theslopmachine - Versions diffs - 0.6.2 → 0.7.1 - Mend

theslopmachine 0.6.2 → 0.7.1


Files changed (77)

package/assets/skills/verification-gates/SKILL.md CHANGED

@@ -25,6 +25,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require the README to explain what the project does, how to run it, how to test it, the main repo contents, and any important new-developer information
 - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
 - do not require the README to carry a full API catalog
+- require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
 - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
 - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
@@ -33,7 +34,8 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require parent-root `../docs/test-coverage.md` to be evaluator-shaped rather than generic: requirement or risk point, mapped test evidence, coverage status, major gap, and minimum test addition
 - when auth or access-control behavior is relevant, require static security-boundary evidence that a fresh reviewer can trace for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug surfaces, and tenant or user isolation when applicable
 - require logging structure and validation or error-handling structure to be statically traceable from repo artifacts and, when needed, owner-maintained external docs
-- for web projects, default the runtime command to `docker compose up --build` unless the prompt or existing repository clearly dictates another model
+- for web projects, require the runtime command to be `docker compose up --build`
+- for backend, fullstack, and web projects, allow and expect an additional README compatibility note containing the exact string `docker-compose up` for the strict README audit, but do not treat that as a replacement for the canonical `docker compose up --build` contract
 - for Dockerized web projects, require a dev-only runtime bootstrap script or equivalent startup path so `docker compose up --build` works without user exports or `.env`
 - do not accept Dockerized web startup that depends on manual export steps before the runtime command
 - do not accept Dockerized web startup that relies on checked-in `.env` files or hardcoded runtime values to satisfy local startup
@@ -41,12 +43,13 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require `./run_tests.sh` to use the same runtime bootstrap model or an equivalent model with the same generated-value rules as `docker compose up --build`
 - if runtime values persist across restarts, require them to live in Docker-managed runtime state rather than committed repo files
 - require README disclosure that the bootstrap path is local-development-only behavior rather than the production secret-management path
-- when `docker compose up --build` is not the runtime contract, require `./run_app.sh` to be the documented primary runtime wrapper
+- for Android, mobile, desktop, and iOS-targeted projects, require a meaningful `docker compose up --build` command even when platform-specific runtime proof differs from web semantics
+- for Android, mobile, desktop, and iOS-targeted projects, allow `./run_app.sh` as an additional platform helper but not as a replacement for the required Docker command
 - require `./run_tests.sh` to be self-sufficient enough to run from a clean Linux VM that only has Docker and curl available by default
 - do not accept a broad test path that depends on host package managers or preinstalled host language runtimes when Docker can provide the execution environment instead
-- for web projects using the default Docker-first runtime model, require `./run_tests.sh` to be the Dockerized broad test path used only for the limited broad verification moments rather than as the ordinary development verification path
+- for web projects, require `./run_tests.sh` to be the Dockerized broad test path used only for the limited broad verification moments rather than as the ordinary development verification path
 - when host-level setup would otherwise be required, prefer a Dockerized `./run_tests.sh` path even outside traditional web stacks so the broad verification remains portable
-- for non-web or non-Docker projects, require `./run_tests.sh` to be the platform-equivalent broad test path used for final broad verification
+- for non-web projects, require `./run_tests.sh` to remain containerized and usable as the platform-equivalent broad test path used for final broad verification
 ## Review standard
@@ -67,7 +70,11 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - do not accept fake-success paths that materially hide missing failure handling
 - do not accept frontend/backend drift in fullstack work
 - do not accept missing end-to-end coverage for major fullstack flows
-- do not accept coverage posture that clearly falls short of roughly 90 percent meaningful coverage of the relevant behavior surface without a prompt-faithful reason
+- do not accept coverage posture that falls short of the minimum 90 percent coverage threshold for the relevant behavior surface without an explicit prompt-faithful exception
+- when backend or fullstack APIs exist, do not accept missing endpoint inventory or missing API-test mapping for the important `METHOD + PATH` surfaces
+- when backend or fullstack APIs exist, do not accept mocked or indirect tests being presented as equivalent to true no-mock HTTP endpoint coverage
+- do not accept a README that is missing project type, startup instructions, access method, verification method, or auth disclosure when the strict README audit would expect them
+- do not accept final delivered docs or wrapper flows that still depend on `npm install`, `pip install`, `apt-get`, manual DB setup, or other host-only setup assumptions after development is complete
 - do not accept a repo that only becomes understandable by reading parent-root docs or sibling workflow artifacts
 - do not accept frontend-bearing work that lacks repo-local build/preview/config guidance when those commands or surfaces are material to the product
 - do not accept frontend-bearing work that lacks a credible state model for prompt-critical flows
@@ -84,6 +91,16 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - do not accept module completion that ignores integration seams or cross-cutting consistency with the existing system
 - do not accept end-to-end evidence that bypasses a required user-facing or admin-facing surface with direct API shortcuts
+## Gate-demand rule
+- when setting a planning, scaffold, development, integrated-verification, hardening, or evaluation gate, reference the relevant accepted plan sections and then give an explicit stage-exclusive checklist for that gate
+- the gate checklist should name:
+  - the exact outcomes that must now be true
+  - the exact evidence that must now exist
+  - the important shortcuts, omissions, or future-work excuses that are not acceptable for this gate
+- do not re-dump the whole plan; isolate the exact subset of plan-backed expectations that must now be closed
+- at gate moments, prefer more explicit owner messages over ultra-short prompts so the developer cannot plausibly misread what acceptance depends on
 ## Cadence rule
 - use targeted local verification as the default during scaffold corrections, development, hardening, and evaluation fix loops
@@ -91,7 +108,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - do not turn ordinary acceptance into repeated integrated-style gate runs
 - do not run `./run_tests.sh` casually on the owner side
 - do not run `docker compose up --build` casually on the owner side
-- for web projects using the default Docker-first runtime model, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
+- for web projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
 - after that scaffold confirmation, the next Docker-based run should be at development completion or integrated-verification entry unless a real blocker forces earlier escalation
 - in between those two broad checks, ordinary development should rely on local fast verification only
 - ordinary in-phase verification should not invoke `docker compose up --build` or `./run_tests.sh` unless the workflow is explicitly at one of those broad gate moments or a blocker justifies an earlier escalation
@@ -101,8 +118,10 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - inspect the result and evidence, not just the developer claim
 - review technical quality, prompt alignment, architecture impact, and verification depth of the current work
 - after planning is accepted, treat the accepted plan and its relevant section as the default slice baseline instead of restating the full slice contract in every owner prompt
-- for ordinary slice work after planning, keep the owner prompt to one short paragraph plus a small checklist of slice-specific guardrails, review concerns, or deltas that are not already clear from the accepted plan
+- for ordinary slice work after planning, keep the owner prompt anchored to the relevant accepted plan sections and use an explicit checklist of slice-specific required outcomes, verification expectations, and review concerns that are not already clear from the accepted plan
+- when the current step is a real gate or phase-exit decision, be more explicit than ordinary slice prompts and enumerate the full stage-exclusive acceptance checklist
 - during normal implementation iteration, always prefer fast local language-native or framework-native verification for the changed area instead of the selected stack's broad gate path
+- during normal implementation iteration, fast local tooling setup is allowed when it helps iteration speed, but treat it as temporary engineering scaffolding rather than part of the final delivered runtime or test contract
 - require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
 - require the developer to report the exact verification commands that were run and the concrete results they produced
 - when API tests are used as evidence, require them to hit real endpoints and expose simple useful response evidence such as status codes and message/body summaries
@@ -126,11 +145,11 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - the evaluator-session cycles required inside `P7` are not part of the ordinary owner-run broad-gate budget; they are the formal final evaluation model for that phase
 - for Electron or other Linux-targetable desktop projects, the broad gate should use the Dockerized desktop build/test path plus headless UI/runtime verification rather than pretending web-style Docker runtime semantics apply
 - for Android projects, the broad gate should use the Dockerized Android build/test path without depending on an emulator
-- for iOS-targeted projects on Linux, the broad gate should rely on `./run_tests.sh` plus static/code review evidence and should not claim native iOS runtime proof unless a real macOS/Xcode checkpoint exists
+- for iOS-targeted projects on Linux, the broad gate should include `docker compose up --build` plus `./run_tests.sh` and static/code review evidence, and should not claim native iOS runtime proof unless a real macOS/Xcode checkpoint exists
 - the workflow target is at most 3 broad owner-run verification moments across the whole cycle
 - ordinary planning, ordinary slice acceptance, and routine in-phase verification are not broad gates by default and should rely on targeted local verification unless the risk profile says otherwise
-For web projects using the default Docker-first runtime model, the default Docker cadence is:
+For web projects, the default Docker cadence is:
 1. one owner-run `docker compose up --build` plus one owner-run `./run_tests.sh` after scaffold completion
 2. no more Docker-based runs during ordinary development work
@@ -144,24 +163,34 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - clarification requires the `clarification-gate` conditions plus explicit approval record
 - planning requires the `developer-session-lifecycle` and planning-gate conditions plus a fresh planning-oriented start and the required documentation and repo hygiene state when relevant
+- planning exit also requires explicit owner review that the accepted planning artifacts cover the section-addressable contract deeply enough for later implementation: in-scope and out-of-scope, actors and success paths, modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+- planning exit does not pass if those sections exist only nominally or remain too vague to drive implementation without broad reinvention
+- planning exit also requires that the accepted plan covers the final README hard-gate shape and, when backend or fullstack APIs exist, the endpoint-inventory and API-test mapping strategy needed for the strict coverage audit
 - scaffold requires evidence for the bounded scaffold gate, baseline logging/config, and when relevant the chosen frontend stack and UI approach being set intentionally
 - scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, a usable local test environment in the current working directory, and the correct primary runtime command plus `./run_tests.sh` documented and working when practical
-- for web projects, scaffold normally requires Docker-first runtime foundations unless the prompt or existing repository clearly dictates another model
+- for web projects, scaffold requires Docker runtime foundations
+- for Android, mobile, desktop, and iOS-targeted projects, scaffold also requires a meaningful `docker compose up --build` path plus containerized `./run_tests.sh`
 - for Dockerized web projects, scaffold also requires the dev-only runtime bootstrap path to be wired so `docker compose up --build` works without manual exports or `.env`
 - for Dockerized web projects, scaffold also requires owner review of Compose files, runtime bootstrap scripts, entrypoints or wrappers, and `./run_tests.sh` to confirm the no-export, no-`.env`, no-pre-seeded-secret-literals model is actually implemented
 - when the project has database dependencies, scaffold also requires a real `./init_db.sh` created during scaffold, wired into the runtime/test flow when needed, and populated with the database setup already known at that stage
 - scaffold also requires `./run_tests.sh` to handle its own required setup from a clean Linux VM that only has Docker and curl available by default
 - local tests should still exist for ordinary development work even when the primary broad test command is Dockerized
+- scaffold also requires `README.md` to have the baseline section shape needed for the final README audit, even when many sections are still scaffold-level placeholders
 - when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
 - for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
 - scaffold acceptance also requires clean startup and teardown behavior in the selected runtime model; for Dockerized web projects this includes self-contained Compose namespacing and no unnecessary fragile `container_name` usage
 - for Dockerized web projects, scaffold acceptance also requires collision-resistant shared-machine defaults: only the primary app-facing port exposed to host by default, internal services not bound to host without prompt need, default host binding on `127.0.0.1`, and either random host-port assignment or a real free-port fallback when fixed ports are required
-- for web projects using the default Docker-first runtime model, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
+- for web projects, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
+- for Android, mobile, desktop, and iOS-targeted projects, scaffold acceptance is not complete until the owner has also run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
 - module implementation requires targeted local verification only; browser E2E and other broad gate evidence belong to owner-run major checkpoints rather than ordinary slice acceptance
+- module implementation acceptance requires explicit checking against the relevant accepted plan sections and the current stage-exclusive checklist, not just a loose sense that the feature exists
 - module implementation acceptance should challenge tenant isolation, path confinement, sanitized error behavior, prototype residue, integration seams, and cross-cutting consistency when those concerns are in scope
 - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
+- when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
 - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
-- module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the planned 90 percent meaningful coverage target instead of accumulating test debt
+- module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
+- before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
+- before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
 - integrated verification completion requires explicit full-system evidence before the phase can close
 - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
 - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
@@ -174,14 +203,15 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - hardening must explicitly re-check secret handling, redaction, and frontend/backend observability hygiene
 - hardening must explicitly satisfy the documentation and repo hygiene policy in this file before final evaluation can begin
 - hardening must leave the repo statically reviewable enough that the final static evaluator can trace startup, tests, entry points, routes, config, and mock/local-data boundaries without rewriting core code
-- hardening must explicitly challenge any remaining gaps against the intended 90 percent meaningful coverage target and require justification or fixes before `P7`
+- hardening must explicitly challenge any remaining gaps against the minimum 90 percent coverage threshold and require proof, fixes, or an explicit prompt-faithful exception before `P7`
 - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
 - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
 - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
 - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
 - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
-- final evaluation readiness requires the cycle-based `P7` self-test model under `../self_test_reports/`; failed initial audits trigger non-counted remediation, counted cycles begin only from a `pass` or `partial pass` initial audit, cycle fix loops stay scoped to that cycle's initial issue list, and 2 successful fresh-session counted cycles are required before final human decision
+- final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
 - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
+- before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
 ## Acceptance rule

package/assets/slopmachine/backend-evaluation-prompt.md CHANGED

@@ -200,7 +200,7 @@ Hard Rules (must follow)
 ====================
 Output Requirements
-Produce the final audit in a concise but complete report and write the consolidated report to `./.tmp/**.md`.
+Produce the final audit in a concise but complete report and write the consolidated report to `../.tmp/**.md`.
 The final report must be organized by the six major acceptance sections in order, even if your scan order was different.

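The SKILL.md hunk earlier in this diff names the report files expected under `../.tmp/`: `audit_report-<N>.md`, `audit_report-<N>-fix_check-<M>.md`, and `test_coverage_and_readme_audit_report.md`. As a hedged illustration of that naming scheme, the sketch below classifies a filename. The function name `classify_report` and the tuple shapes it returns are hypothetical and are not part of the package.

```python
import re

# Patterns taken from the filenames quoted in the SKILL.md diff.
_AUDIT = re.compile(r"^audit_report-(\d+)\.md$")
_FIX_CHECK = re.compile(r"^audit_report-(\d+)-fix_check-(\d+)\.md$")
_FINAL = "test_coverage_and_readme_audit_report.md"


def classify_report(filename: str):
    """Map a report filename to a (kind, numbers...) tuple, or None if unknown."""
    if filename == _FINAL:
        return ("coverage_and_readme_audit",)
    match = _FIX_CHECK.match(filename)
    if match:
        return ("fix_check", int(match.group(1)), int(match.group(2)))
    match = _AUDIT.match(filename)
    if match:
        return ("audit", int(match.group(1)))
    return None
```

For example, `classify_report("audit_report-2-fix_check-1.md")` identifies the first fix check for audit 2.
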
package/assets/slopmachine/frontend-evaluation-prompt.md CHANGED

@@ -118,8 +118,8 @@ Based on static evidence only, determine whether the delivery is a credible, Pro
   - marked Cannot Confirm with a clear boundary explanation
 5) Exclude Temporary Output
-- Exclude `./.tmp/` and all its subdirectories.
-- `./.tmp/` must not be used as evidence, search scope, reference, summary source, or factual basis.
+- Exclude `./.tmp/`, `../.tmp/`, and all their subdirectories.
+- `./.tmp/` and `../.tmp/` must not be used as evidence, search scope, reference, summary source, or factual basis.
 [Pure Frontend-Specific Rules]
@@ -277,7 +277,7 @@ Your output must strictly follow this structure:
 2. Scope and Verification Boundary
 - what was reviewed
-- which input sources were excluded, including `./.tmp/`
+- which input sources were excluded, including `./.tmp/` and `../.tmp/`
 - what was not executed
 - what cannot be statically confirmed
 - which conclusions require manual verification
@@ -388,10 +388,10 @@ Before finalizing the report, check each of the following:
 3. Did you wrongly assign backend responsibility to the frontend?
 4. Did you misclassify reasonable mock / local data / storage usage as a defect?
 5. Did you state visual or interaction guesses as strong conclusions?
-6. Does any conclusion directly or indirectly rely on `./.tmp/`?
+6. Does any conclusion directly or indirectly rely on `./.tmp/` or `../.tmp/`?
 7. Have all required Blocker / High dimensions been closed?
 8. Have repeated findings been merged by root cause?
 9. If unsupported observations were removed, would the final Verdict still hold?
-If writing files is supported, save the final report to `./.tmp/`.
+If writing files is supported, save the final report to `../.tmp/`.
 Otherwise, return the report directly in the conversation.

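The change above widens the exclusion from `./.tmp/` to both `./.tmp/` and `../.tmp/`, including subdirectories. A minimal sketch of that rule as a path filter follows. The function name `is_excluded` and the default repo root `/work/repo` are assumptions for illustration, and the sketch reads the rule literally: only the two named directories are excluded, not every nested directory that happens to be called `.tmp`.

```python
import posixpath


def is_excluded(path: str, repo_root: str = "/work/repo") -> bool:
    """True if path lies in <repo_root>/.tmp or <repo_root>/../.tmp."""
    root = posixpath.normpath(repo_root)
    excluded_dirs = [
        posixpath.join(root, ".tmp"),                          # ./.tmp/
        posixpath.normpath(posixpath.join(root, "..", ".tmp")),  # ../.tmp/
    ]
    # Resolve relative paths against the repo root and collapse "." and "..".
    full = posixpath.normpath(posixpath.join(root, path))
    return any(
        full == directory or full.startswith(directory + "/")
        for directory in excluded_dirs
    )
```

Normalizing before comparing means a path such as `src/../.tmp/report.md` is still caught, while a sibling directory such as `.tmpfoo/` is not.
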
package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md ADDED

@@ -0,0 +1,81 @@
+# Android Kotlin Compose Scaffold Playbook
+Use this playbook when the prompt explicitly requires native Android with Kotlin + Compose.
+## Current status
+This family is now **experimentally verified** for a reasonable Linux Docker baseline.
+Verified lab:
+- `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-kotlin-compose-baseline`
+## What was achieved in the verified lab
+The verified lab now demonstrates all of the following:
+- native Kotlin Android project exists
+- Compose is enabled in Gradle
+- pinned Android toolchain Dockerfile exists
+- `docker-compose.yml`, `run_tests.sh`, and artifact-serving scripts exist
+- `artifacts/app-debug.apk` and checksum were produced in the lab tree
+- JVM-side test files exist
+- `docker compose up -d --wait` reached a stable healthy artifact-serving state
+- `./run_tests.sh` passed with containerized lint plus `:app:testDebugUnitTest`
+- the Compose build no longer fails on the broken theme/import issues found during investigation
+## Safe default stack
+- AGP `8.5.x`
+- Gradle `8.7`
+- Java `17`
+- Kotlin `1.9.x`
+- `compileSdk = 34`
+- `targetSdk = 34`
+- `minSdk = 29`
+- Compose BOM pinned explicitly
+- Material 3 default Compose surface
+## Runtime contract
+- required Docker command: `docker compose up --build`
+- required broad test command: `./run_tests.sh`
+- both are now real and working in the verified lab
+- `./run_app.sh` may exist as a helper, but it does not replace the Docker baseline
+## Intended Docker strategy
+This family should follow the same proven Android pattern as the Java/Kotlin-Views baseline:
+1. pre-bake the Android SDK/toolchain layers into the image
+2. bind-mount the workspace for source changes
+3. avoid default `clean` tasks
+4. use one long-running artifact-serving/support container for the Compose healthy state
+5. reuse that same running container for lint and JVM-side test commands via `docker compose exec`
+## Honest Linux proof boundary
+For the verified Linux baseline, Docker honestly proves only:
+- Compose code compiles
+- debug APK assembles
+- lint passes
+- JVM-side tests pass
+- artifact-serving healthy state works
+Linux should **not** claim emulator or device runtime proof.
+## Verified rerun evidence
+The final rerun established these concrete facts:
+- direct `:app:assembleDebug --stacktrace` passed after fixing the broken app theme and missing `rememberSaveable` import
+- `./run_tests.sh` passed with the container reaching `Healthy`
+- containerized Gradle verification completed with `BUILD SUCCESSFUL in 1m 32s`
+- the generated APK size was non-zero and published from the artifact server
+## Guidance
+- use this family only when the prompt explicitly requires Compose
+- keep it non-default for open-ended Android work because the Java Views baseline is still the lighter generic default
+- it is now safe to treat this family as experimentally verified rather than only partially prepared

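The Docker strategy in this playbook (one long-running support container, reused through `docker compose exec` for lint and JVM-side tests, with no default `clean` task) can be sketched as a command plan. This is an illustration under stated assumptions, not the playbook's actual `run_tests.sh`: the service name `android`, the `./gradlew` entry point inside the container, and the lint task name `:app:lintDebug` are assumed; only `:app:testDebugUnitTest` and the `docker compose` subcommands are quoted in the playbook text.

```python
import shlex


def plan_broad_gate(
    service: str = "android",  # hypothetical Compose service name
    gradle_tasks=(":app:lintDebug", ":app:testDebugUnitTest"),
):
    """Return the ordered argv lists for the broad gate, without running them."""
    # The playbook avoids default `clean` tasks, so refuse them outright.
    if any(task.split(":")[-1] == "clean" for task in gradle_tasks):
        raise ValueError("clean tasks are not part of the default plan")

    # Start (or reuse) the single long-running container and wait for health.
    plan = [["docker", "compose", "up", "--build", "-d", "--wait", service]]
    # Reuse that same container for each Gradle verification task.
    for task in gradle_tasks:
        plan.append(["docker", "compose", "exec", "-T", service, "./gradlew", task])
    return plan


if __name__ == "__main__":
    for argv in plan_broad_gate():
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review what would run before handing each entry to `subprocess.run`.
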
package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md ADDED

@@ -0,0 +1,191 @@
+# Android Kotlin Views Scaffold Playbook
+Use this playbook when the prompt explicitly wants native Android with Kotlin and XML/Views rather than Compose.
+This concrete playbook follows the shared Docker contract in `docker-shared-contract.md` and is grounded in the experimentally verified lab at `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline`.
+## Goal
+Create a simple Android Kotlin Views baseline that:
+- is baseline-only, not feature-complete
+- uses Kotlin plus Android Views, not Compose
+- stays honest about Linux-first Docker verification boundaries
+- keeps `docker compose up --build` as the required runtime/support contract
+- keeps `./run_tests.sh` as the required broad containerized verification path
+- requires no emulator
+## Runtime contract
+- required Docker command: `docker compose up --build`
+- required broad test command: `./run_tests.sh`
+- both commands must be real, containerized, and working
+- `./run_app.sh` may exist as a host convenience helper, but it does not replace the required Docker contract
+## Verified baseline notes
+From a real lab verification on 2026-04-15:
+- the verified lab is `/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline`
+- the lab uses Kotlin source plus XML Views and ViewBinding
+- `docker compose up --build -d --wait android-baseline` reached a stable healthy state
+- that healthy state is a loopback-only artifact server serving `artifacts/app-debug.apk` on the mapped host port reported by `docker compose port android-baseline 8080`
+- `./run_tests.sh` reused the running Compose container with `docker compose exec` for `:app:lintDebug` and `:app:testDebugUnitTest`, then smoke-checked the served APK
+- the containerized Gradle verification completed successfully without an emulator
+- the truthful proof boundary is pinned toolchain + APK assembly + lint + JVM unit tests + artifact serving; it does **not** claim emulator boot, adb deployment, or on-device runtime proof
+## Safe pinned defaults used in the verified lab
+- Android Gradle Plugin: `8.5.2`
+- Gradle wrapper: `8.7`
+- Kotlin Android plugin: `1.9.24`
+- Java: `17`
+- `compileSdk = 34`
+- `targetSdk = 34`
+- `minSdk = 29`
+- view system: XML layouts + Android Views + ViewBinding
+## Safe default libraries
+- AppCompat `1.7.0`
+- Material `1.12.0`
+- ConstraintLayout `2.1.4`
+- Lifecycle Runtime `2.8.4`
+- JUnit `4.13.2`
+Add Room, security, networking, or media libraries only when the prompt actually needs them.
+## Preferred repo shape
+- `app/`
+- `container-build-and-serve.sh`
+- `container-gradle.sh`
+- `docker-compose.yml`
+- `Dockerfile`
+- `run_tests.sh`
+- `run_app.sh`
+- `artifacts/` for the built APK and checksum
+## Docker strategy that was experimentally verified
+For Android-on-Linux Kotlin Views scaffolds, prefer a pinned toolchain image plus one long-running support container instead of pretending that Docker proves native Android runtime behavior.
+Verified pattern:
+1. build a pinned Android toolchain image from source in the repo
+2. pre-bake Java 17, Android command-line tools, platform `android-34`, build-tools `34.0.0`, and a seeded Gradle wrapper/plugin cache into the image
+3. bind-mount the workspace at runtime so source edits do not invalidate the heavy SDK layers
+4. start one long-running container that runs `:app:assembleDebug`, copies the APK to `artifacts/`, writes a checksum, and serves that directory over HTTP
+5. expose only one host port, bound to loopback with an automatically assigned high host port: `127.0.0.1::8080`
+6. declare health only after the APK exists and the in-container HTTP server returns the APK successfully
+7. reuse that same running container in `./run_tests.sh` with `docker compose exec` so lint/test verification does not rebuild the entire toolchain path again
+This strategy satisfied the shared contract because `docker compose up --build` reached a meaningful healthy state and reruns avoided repeated SDK bootstrap work.
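Steps 4 and 6 above (write a checksum, then declare health only once the APK exists and is actually served) can be sketched roughly as follows. This is a minimal illustration, not the lab's real healthcheck: the function names `write_checksum` and `is_healthy` and the `.sha256` file suffix are hypothetical, and the size comparison is an extra assumption layered on the "returns the APK successfully" rule.

```python
import hashlib
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen


def write_checksum(apk: Path) -> Path:
    """Write a SHA-256 checksum file next to the APK (hypothetical naming)."""
    digest = hashlib.sha256(apk.read_bytes()).hexdigest()
    out = apk.with_name(apk.name + ".sha256")
    # Two spaces between digest and name mirror the layout `shasum -a 256` prints.
    out.write_text(f"{digest}  {apk.name}\n")
    return out


def is_healthy(apk: Path, url: str, timeout: float = 5.0) -> bool:
    """Healthy only if the APK exists, is non-empty, and the server returns it."""
    if not apk.is_file() or apk.stat().st_size == 0:
        return False
    try:
        with urlopen(url, timeout=timeout) as response:
            # Assumption: a served body of the same size counts as "the APK".
            return response.status == 200 and len(response.read()) == apk.stat().st_size
    except (URLError, OSError):
        return False
```

A container healthcheck built on a predicate like this stays unhealthy while Gradle is still assembling, which is what lets `docker compose up --build -d --wait` block until the artifact is really available.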
+## `./run_tests.sh`
+`./run_tests.sh` should remain containerized and should prove the portable Android baseline without an emulator:
+- start the Compose baseline with `docker compose up --build -d --wait`
+- reuse the running container for Gradle verification with `docker compose exec`
+- run at least `:app:lintDebug` and `:app:testDebugUnitTest`
+- smoke-check the same APK artifact surface that `docker compose up --build` claims to provide
+- tear the stack down after verification
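The flow above can be sketched as an ordered plan with a guaranteed teardown. The real `./run_tests.sh` is a shell script whose contents are not shown here, so treat this as an illustration only: `run_verification`, `real_runner`, the injected `smoke_check` callback, and the `./gradlew` invocation inside the container are assumptions, while the service name `android-baseline` and the two Gradle tasks come from the verified lab notes.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]

SERVICE = "android-baseline"  # service name from the verified lab


def real_runner(cmd: Sequence[str]) -> None:
    """Run one command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def run_verification(run: Runner, smoke_check: Callable[[], None]) -> None:
    """Bring the stack up, verify inside the running container, always tear down."""
    try:
        run(["docker", "compose", "up", "--build", "-d", "--wait", SERVICE])
        # -T disables pseudo-TTY allocation so exec also works in non-interactive shells.
        # Assumption: Gradle is reachable as ./gradlew inside the container.
        run(["docker", "compose", "exec", "-T", SERVICE,
             "./gradlew", ":app:lintDebug", ":app:testDebugUnitTest"])
        smoke_check()
    finally:
        # Teardown runs even when lint, tests, or the smoke check fail.
        run(["docker", "compose", "down", "--remove-orphans"])
```

Passing `real_runner` executes the commands for real; passing a recording function instead makes the ordering and the teardown guarantee checkable without Docker.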
+## Minimal real test floor
+At scaffold time, include at least:
+- one real Kotlin helper/rule test
+- one real state/helper test exercised by the Android entrypoint flow
+- real `lint` proof
+- real `assembleDebug` proof
+Do not leave the baseline test path mostly `NO-SOURCE`, that is, with Gradle test tasks that find no sources to run.
+## README floor
+`README.md` in the scaffold must already state:
+- that this is a baseline scaffold only
+- Kotlin + Android Views scope
+- required Docker command: `docker compose up --build`
+- required broad test command: `./run_tests.sh`
+- host helper command: `./run_app.sh` when present
+- what healthy state means for the artifact-serving support surface
+- what the Docker path does **not** prove on Linux
+- the policy of no `.env` file and no hidden secret bootstrap
+- any known expectations of a heavier first run
+## Exact commands actually run in the verified lab
+```bash
+docker compose build --no-cache
+docker compose up --build -d --wait android-baseline
+docker compose ps
+curl -fsS "http://$(docker compose port android-baseline 8080)/app-debug.apk" -o /tmp/android-kotlin-views-app-debug.apk
+shasum -a 256 artifacts/app-debug.apk
+docker compose down --remove-orphans
+./run_tests.sh
+python3 - <<'PY'
+import signal
+import subprocess
+import time
+from urllib.request import urlopen
+cwd = "/Users/yohannesakd/code/eaglepoint/demonstration/scaffold-lab/android-baseline"
+proc = subprocess.Popen(["docker", "compose", "up", "--build"], cwd=cwd)
+error = None
+try:
+    deadline = time.time() + 600
+    while time.time() < deadline:
+        try:
+            port = subprocess.check_output(["docker", "compose", "port", "android-baseline", "8080"], cwd=cwd, text=True).strip()
+            with urlopen(f"http://{port}/app-debug.apk", timeout=5) as response:
+                if response.status == 200:
+                    print("Android Kotlin Views baseline reached artifact-serving healthy state during docker compose up --build")
+                    break
+        except Exception as exc:
+            error = exc
+        time.sleep(5)
+    else:
+        raise RuntimeError(f"Android Kotlin Views baseline never became ready: {error}")
+finally:
+    proc.send_signal(signal.SIGINT)
+    try:
+        proc.wait(timeout=30)
+    except subprocess.TimeoutExpired:
+        proc.kill()
+        proc.wait(timeout=30)
+    subprocess.run(["docker", "compose", "down", "--remove-orphans"], cwd=cwd, check=True)
+PY
+```
+## Observed verification results in the verified lab
+- `docker compose up --build -d --wait android-baseline`: passed and reported the service healthy
+- `curl -fsS "http://$(docker compose port android-baseline 8080)/app-debug.apk"`: passed and downloaded a non-empty APK
+- `./run_tests.sh`: passed after running containerized lint, JVM unit tests, and the APK smoke check
+- foreground `docker compose up --build`: reached the documented artifact-serving state before controlled shutdown, so it converged honestly instead of hanging indefinitely with no proof
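The "non-empty APK" smoke check reported above could be tightened slightly, as in the sketch below. It is an assumption-laden illustration rather than what the lab ran: `base_url` and `looks_like_apk` are hypothetical helpers, and the ZIP-signature test goes beyond the documented non-empty check (it relies on APKs being ZIP archives that typically begin with a local file header).

```python
from pathlib import Path

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature at the start of a ZIP local file header


def base_url(port_output: str) -> str:
    """Turn `docker compose port` output such as '127.0.0.1:49153' into a URL."""
    return f"http://{port_output.strip()}"


def looks_like_apk(path: Path) -> bool:
    """Cheap smoke check: the file exists and starts like a ZIP archive."""
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        # An empty file yields b"" here and therefore fails the comparison.
        return handle.read(4) == ZIP_LOCAL_HEADER
```

A check like this still proves nothing about on-device behavior; it only catches an empty file or an error page saved under the APK name.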
+## Common pitfalls
+- defaulting to Compose or emulator requirements when the prompt asks for Views-only baseline work
+- requiring Robolectric or device runtime proof when the truthful Linux Docker baseline does not need it
+- making `docker compose up --build` a one-shot build that exits without a stable healthy state
+- rebuilding the Android SDK on ordinary reruns because the Dockerfile copies the whole source tree before caching the heavy toolchain layers
+- sharing mutable Gradle cache state across multiple concurrent containers when one running container is enough
+- publishing more host ports than the artifact-serving support surface actually needs
+- checking in `.env` or any plaintext secrets even though this baseline does not need them
+## Acceptance checklist
+The scaffold is acceptable when:
+- `docker compose up --build` works and reaches the documented healthy state
+- `./run_tests.sh` works and stays containerized
+- minimal real Kotlin tests exist
+- the Docker path is honest about stopping at APK build/test/artifact proof on Linux
+- README is honest and traceable
+- no `.env` is required or committed
+- the result is experimentally verified, not just theoretically described