npm - theslopmachine - Versions diffs - 0.6.2 → 0.7.0 - Mend

theslopmachine 0.6.2 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (76)

package/assets/skills/developer-session-lifecycle/SKILL.md CHANGED

@@ -52,6 +52,8 @@ At the beginning of owner work:
 Do not use this preflight to perform clarification, planning, scaffold, or implementation work.
+If bootstrap seeded a later `current_phase` from `requested_start_phase`, verify that the adopted repo and evidence really support starting there; if not, repair the phase conservatively before continuing.
 ## Prompt and project metadata boundary
 - keep the real project prompt in `../metadata.json` under `prompt`
@@ -60,15 +62,27 @@ Do not use this preflight to perform clarification, planning, scaffold, or imple
 - map appended stack/context lines into structured project metadata fields when defensible and keep the raw remainder in `../.ai/startup-context.md`
 - if the separation is unclear, resolve it before clarification proceeds
+## Developer rulebook decision
+- during `P1`, choose the repo-local developer rulebook file that matches the active backend
+- for the normal OpenCode developer backend, use `AGENTS.md`
+- for `slopmachine-claude`, use `CLAUDE.md`
+- record that decision in `../.ai/metadata.json` as `developer_rulebook_file`
+- before launching the developer, ensure the chosen rulebook file exists in `repo/` and reflects the current packaged template or an explicitly approved equivalent
+- for `slopmachine-claude`, if `repo/CLAUDE.md` is missing but `repo/AGENTS.md` exists, rename `repo/AGENTS.md` to `repo/CLAUDE.md` during `P1` before the first Claude developer launch
 ## Required startup outputs
 - `../.ai/metadata.json` exists with the current workflow/session schema
 - `../metadata.json` exists with project-fact fields
 - `../.ai/startup-context.md` exists
 - seeded parent-root docs exist, including `../docs/questions.md`, `../docs/design.md`, `../docs/test-coverage.md`, and `../docs/api-spec.md`
+- parent-root `../.tmp/` exists as the evaluation artifact directory
 - seeded repo `README.md` exists
+- seeded repo `.claude/settings.json` exists with the repo-local Claude default-agent configuration
 - root workflow Beads exist for `P1` through `P10`
 - developer-session tracking is initialized
+- the backend-appropriate repo-local developer rulebook file has been chosen or is ready to be chosen in `P1`
 ## Workflow metadata fields
@@ -82,19 +96,24 @@ Track at least these fields in `../.ai/metadata.json`:
 - `bootstrap_mode`
 - `requested_start_phase`
 - `packaging_completed`
-- `claude_trace_root`
+- `claude_live_root`
+- `developer_rulebook_file`
 - `current_developer_lane`
 - `active_developer_session_id`
+- `primary_develop_session_id`
+- `latest_develop_session_id`
 - `next_develop_session_number`
 - `next_bugfix_session_number`
 - `developer_sessions`
 - `evaluation_prompt_kind`
 - `active_evaluator_session_id`
-- `self_test_reports_root`
-- `self_test_successful_cycle_count`
-- `self_test_cycles`
-- `failed_audit_count`
-- `failed_audits`
+- `evaluation_reports_root`
+- `evaluation_audit_count`
+- `evaluation_runs`
+- `completed_bugfix_session_count`
+- `required_bugfix_session_count`
+- `coverage_readme_audit_completed`
+- `coverage_readme_audit_report_path`
 Each `developer_sessions[]` record should include enough to recover and export it later, such as:
@@ -106,30 +125,30 @@ Each `developer_sessions[]` record should include enough to recover and export i
 - `created_phase`
 - `session_id`
 - `status`
-- `trace_dir`
+- `runtime_dir`
+- `tmux_session`
+- `transcript_path`
+- `opened_from_audit_number`
 - `orientation_completed`
 - `last_result_summary`
-- `last_resumed_at`
+- `last_turn_at`
-Each `self_test_cycles[]` record should include enough to recover the counted `P7` flow later, such as:
+If legacy metadata still contains `claude_trace_root`, normalize it to `claude_live_root` when repairing workflow state.
-- `cycle`
-- `session_id`
-- `status`
-- `initial_audit_result`
-- `cycle_dir`
-- `audit_report_path`
-- `fix_check_paths`
-- `open_issues_summary`
-- `completed_at`
+Each `evaluation_runs[]` record should include enough to recover deterministic `P7` routing later, such as:
-Each `failed_audits[]` record should include enough to recover non-counted failed-initial-audit remediation history, such as:
-- `attempt`
+- `audit_number`
 - `session_id`
+- `verdict`
 - `audit_report_path`
-- `archived_to`
+- `route_target`
+- `routed_developer_session_id`
+- `routed_developer_label`
+- `started_bugfix_session_id`
+- `started_bugfix_label`
+- `fix_check_paths`
 - `status`
+- `completed_at`
 ## Project metadata fields
@@ -152,11 +171,16 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
 - keep exactly one active developer session at a time
 - record every developer session in `developer_sessions`
-- from `P2` through `P6`, the main implementation lane is `develop-N`
-- if multiple pre-`P7` developer sessions are needed, they still stay in the `develop-N` lane
-- when `P7` begins, switch to a separate remediation lane `bugfix-N`
-- if multiple `P7` or post-`P7` remediation developer sessions are needed, they stay in the `bugfix-N` lane
+- from `P2` through `P6`, default to one long-lived `develop-1` lane
+- if a new `develop-N` session is created, it should happen only for controlled replacement or explicit user direction, not because `P7` found more issues
+- keep `primary_develop_session_id` pointing at the original long-lived develop session when that distinction matters
+- keep `latest_develop_session_id` pointing at the most recent recoverable `develop-N` session so `fail` audits can route back deterministically
+- when a fresh `P7` evaluation returns `partial pass`, create the next `bugfix-N` session tied to that audit number
+- when a fresh `P7` evaluation returns `fail`, route the issue list back to the latest `develop-N` session instead of opening `bugfix-N`
+- require 2 completed `bugfix-N` sessions before the final post-bugfix coverage/README audit can run
+- after the second bugfix session completes, run the separate coverage/README audit and keep its remediation on the most recently used recoverable developer session until the report is clean
 - keep the currently active lane mirrored in `current_developer_lane`
+- each tracked Claude-backed session should point at its live runtime directory so the lane can be recovered deterministically
 ## `P2` planning-entry rule
@@ -166,19 +190,24 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
 ## `P7` lane-transition rule
-- when `P7` starts, do not continue evaluator-driven fixes in the existing `develop-N` lane
-- create a new `bugfix-N` developer session for `P7` remediation work
-- run the repo-orientation prompt for that new `bugfix-N` session before sending evaluator issues
-- after the orientation completes, mark that `bugfix-N` session as the active developer session
-- preserve the earlier `develop-N` session records for continuity and export; do not overwrite them
+- when `P7` starts, keep the latest `develop-N` session recoverable and ready; do not automatically switch to `bugfix-N`
+- after each fresh evaluation report, branch deterministically by verdict:
+  - `fail` -> hand the issue list to the latest `develop-N` session and keep that develop session as the remediation lane
+  - `partial pass` -> create the next `bugfix-N` developer session, tie it to that audit number, and keep its loop scoped to that audit's issue list until the evaluator confirms the scoped issues are fixed
+  - `pass` -> record the audit, but do not treat it as `P7` completion if fewer than 2 bugfix sessions have been completed
+- run the repo-orientation prompt for a new `bugfix-N` session before sending its issue list
+- after the orientation completes, mark that `bugfix-N` session as the active developer session for that scoped remediation window
+- after a `bugfix-N` session is fully closed, preserve it in metadata and keep the latest `develop-N` session available for any later `fail` audit routing
 ## Recovery rule
 - if session or phase records disagree, stop and repair the inconsistency before proceeding
 - if the current phase already has an active developer session, recover it instead of silently creating a replacement
 - if an evaluator session is marked active, recover it before continuing the current `P7` cycle
-- treat resume as deterministic recovery, not guesswork
-- if the active Claude developer session is marked `rate_limited`, do not replace it with owner-side coding; preserve it, record the pause, and wait for the user to resume later
+- treat live-lane recovery as deterministic recovery, not guesswork
+- if the active Claude developer session is marked `rate_limited`, do not replace it with owner-side coding; preserve it, record the blocked state, and auto-wait for reset or resume from the same session when the wait helper completes
+- if an open evaluation run is tied to a `fail` audit, recover the latest `develop-N` session for remediation rather than starting `bugfix-N`
+- if an open evaluation run is tied to a `partial pass` audit, recover the linked `bugfix-N` session and its scoped fix-check loop instead of broadening the work
 On recovery, inspect at least:
@@ -193,9 +222,22 @@ On recovery, inspect at least:
 - store the active developer session id in Beads comments using `SESSION:`
 - mirror the active developer session id in `../.ai/metadata.json`
 - mirror the active developer session id in `../metadata.json` as `session_id`
-- for Claude-backed sessions, include backend and trace directory in the recorded session state so recovery and export remain deterministic
+- for Claude-backed sessions, include backend, runtime directory, tmux session name, and transcript path in the recorded session state so recovery and export remain deterministic
 - if these records disagree, repair them before continuing
-- do not silently create a replacement developer session if the intended existing one can still be resumed
+- do not silently create a replacement developer session if the intended existing one can still be recovered
+For live Claude lanes specifically:
+- treat bridge `state.json` as the authoritative transport truth
+- when `../.ai/metadata.json` disagrees with the bridge on the active Claude session id, runtime directory, tmux session, transcript path, or lane status, repair metadata from the bridge state before continuing
+- preserve the same tracked session id when the bridge reports `blocked`; do not replace it just to clear a temporary capacity issue
+- for final Claude session packaging, resolve the stored `session_id` within `~/.claude/projects` using the project `cwd`; tracked runtime or transcript pointers remain useful for recovery/debugging but are not the final package lookup key
+## Managed-lane guardrail
+- bridge-managed Claude TUI lanes are owner-controlled assets during ordinary workflow execution
+- do not manually type into a managed lane during normal operation
+- if a managed lane is touched manually for debugging or recovery, record that fact and then resync workflow metadata and Beads comments from bridge evidence before using the lane again
 ## Boundary-summary rule
@@ -208,7 +250,9 @@ On recovery, inspect at least:
 ## Initial structure rule
 - parent-root `../docs/` is the owner-maintained external documentation directory
-- parent-root `../sessions/` is the converted session-artifact directory for exported conversation traces
-- parent-root `../self_test_reports/` is the counted `P7` evaluation-cycle artifact directory
+- parent-root `../sessions/` is the cleaned raw session-export directory for non-Claude developer sessions
+- Claude-backed developer sessions are packaged once as parent-root `claude-sessions.zip` instead of per-session `../sessions/` entries
+- parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check-<M>.md`, and `test_coverage_and_readme_audit_report.md`
+- parent-root `../.ai/claude-live/` is the live Claude bridge runtime directory root
 - `../docs/questions.md` is the mandatory clarification record artifact
 - do not treat repo-local `docs/` as the active external documentation location
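
Reader note: the revised `P7` lane-transition rule in this file replaces the automatic switch to `bugfix-N` with routing by verdict. The sketch below is one way to read that branching, not code from the package. The `Routing` type and the `route_audit` function are hypothetical names; only the verdict strings and the `develop-N` / `bugfix-N` labels come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Routing:
    """Hypothetical record of where an audit's issue list goes."""
    route_target: str              # "develop", "bugfix", or "none"
    session_label: Optional[str]   # e.g. "develop-1" or "bugfix-2"


def route_audit(verdict: str, latest_develop_label: str, next_bugfix_number: int) -> Routing:
    """Branch on a fresh P7 audit verdict as the diff describes it."""
    if verdict == "fail":
        # `fail` goes back to the latest develop-N session; no bugfix-N is opened.
        return Routing("develop", latest_develop_label)
    if verdict == "partial pass":
        # `partial pass` opens the next bugfix-N session, scoped to this audit.
        return Routing("bugfix", f"bugfix-{next_bugfix_number}")
    if verdict == "pass":
        # `pass` is recorded, but no issue list is handed to a developer.
        return Routing("none", None)
    raise ValueError(f"unknown verdict: {verdict!r}")
```

A caller would store the result in the matching `evaluation_runs[]` record, for example in `route_target` and `routed_developer_label`; how the package itself populates those fields is not shown in this diff.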

package/assets/skills/development-guidance/SKILL.md CHANGED

@@ -13,6 +13,9 @@ Use this skill during `P4 Development` before prompting the developer.
 - complete the real user-facing and admin-facing surface for the slice
 - keep slice-local planning, implementation, verification, and doc sync together
 - after planning is accepted, use the relevant accepted plan section as the slice baseline instead of expecting the owner to restate the full slice contract
+- when the owner provides a stage-exclusive checklist for the current slice or gate, treat that checklist as a hard acceptance contract and respond against it explicitly rather than answering loosely
+- before deeper implementation, do a quick serial-versus-parallel check for the current slice instead of defaulting to one long serial branch
+- when the slice contains 2 or 3 independent units with stable interfaces and low shared-file overlap, use parallel task fan-out for those units and then merge back into one reviewed result
 ## Module implementation guidance
@@ -25,9 +28,15 @@ Use this skill during `P4 Development` before prompting the developer.
 - handle failure paths and boundary conditions
 - add or update tests as part of the module work
 - prefer TDD when the behavior is well defined and the module is practical to drive test-first; otherwise define the expected tests before implementation and keep them tied to the module plan
+- when backend or fullstack API endpoints are added or changed, add or update real HTTP tests for the exact `METHOD + PATH` where practical instead of relying only on controller/service-level tests
+- when mocked HTTP coverage or unit-only coverage still exists, keep it explicit in the coverage notes instead of overstating it as equivalent to true no-mock endpoint coverage
+- when backend or fullstack API tests are material, keep the test names, fixtures, or assertions audit-readable enough that a reviewer can trace the endpoint, request input, and expected response behavior statically
+- keep track of important modules that still lack meaningful tests so hardening does not have to rediscover them from scratch
+- define the branch contract before parallelizing: expected outcome, boundaries, shared constraints, merge condition, and required verification
 - keep parent-root `../docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
 - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
 - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
+- for backend or fullstack work, keep configuration reads on the shared config path instead of introducing new scattered direct environment access in feature code
 - keep frontend and backend contracts synchronized when the module spans both sides
 - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
 - before closing the slice, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this slice lands?
@@ -50,11 +59,17 @@ Use this skill during `P4 Development` before prompting the developer.
 - do not hardcode secrets or persist local sensitive values in the repo while implementing
 - explain behavior changes clearly enough that the README and owner-maintained external documentation can be kept accurate
 - update `README.md` when runtime, build/preview, configuration, routes, tests, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
+- keep `README.md` aligned with the strict audit contract as the implementation matures: project type near the top, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
+- for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance for the strict README audit
+- for Android, iOS, and desktop work, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections the strict README audit expects
 - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
 - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
+- before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
 - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
+- do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
+- after parallel fan-in, run final targeted verification on the integrated result rather than trusting the branch-local checks alone
 ## Verification model
@@ -63,6 +78,8 @@ Use this skill during `P4 Development` before prompting the developer.
 - prefer fast local language-native or framework-native test commands for the changed area during normal iteration
 - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
 - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
+- fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
+- do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
 - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
 - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
 - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
@@ -70,9 +87,12 @@ Use this skill during `P4 Development` before prompting the developer.
 - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
 - for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
 - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
+- when backend logging matters, keep request or route outcomes, exceptions, and background failure logging on the shared structured logging path with redaction intact
 - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
-- keep the test surface moving toward at least 90 percent meaningful coverage of the relevant behavior area as slices are completed
+- keep the test surface moving toward the hard minimum 90 percent coverage threshold as slices are completed, and do not defer obvious coverage debt to hardening
+- for backend or fullstack APIs, keep `../docs/test-coverage.md` moving toward an endpoint inventory plus API test mapping table, not just a generic risk matrix
 - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
+- when the owner names specific expected outcomes for the slice or gate, tie the reported verification and changed files back to those expected outcomes explicitly
 - keep ordinary slice-complete replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
 ## Quality rules
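
Reader note: the README lines added in this file name three exact strings that the strict README audit expects. A minimal sketch of a pre-check for those strings follows. The function name `readme_contract_gaps`, its parameters, and the gap messages are hypothetical; the actual audit prompt is not part of this diff and may check more than this.

```python
from typing import List

# Project types for which the diff requires both compose strings.
DOCKER_PROJECT_TYPES = {"backend", "fullstack", "web"}


def readme_contract_gaps(
    readme_text: str,
    project_type: str,
    documents_demo_credentials: bool = False,
) -> List[str]:
    """Return the exact-string requirements that the README text is missing."""
    gaps = []
    if project_type in DOCKER_PROJECT_TYPES:
        # Canonical command plus the legacy compatibility string.
        for required in ("docker compose up --build", "docker-compose up"):
            if required not in readme_text:
                gaps.append(f"missing startup string: {required}")
    # Either demo credentials are documented (caller's judgment, an assumption
    # of this sketch) or the exact no-auth statement must appear.
    if not documents_demo_credentials and "No authentication required" not in readme_text:
        gaps.append("missing demo credentials or the statement: No authentication required")
    return gaps
```

The check is a plain substring match, so `docker compose up --build` does not satisfy the legacy `docker-compose up` requirement; both must be present, which matches the wording of the added rule.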

package/assets/skills/evaluation-triage/SKILL.md CHANGED

@@ -5,39 +5,46 @@ description: Owner-side evaluation issue handoff and scoped fix-verification rul
 # Evaluation Issue Handoff
-Use this skill during `P7 Evaluation and Fix Verification` after an initial audit report exists.
+Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit report exists.
 ## Core rules
-- treat the current initial audit report as the authoritative issue source for the current cycle
+- treat the current fresh audit report as the authoritative issue source for that audit number
 - keep the issue set concrete and exact
-- use the active `bugfix-N` developer session for evaluator-driven remediation
+- route `fail` audits back to the latest `develop-N` session
+- use `bugfix-N` only for `partial pass` audits that explicitly opened a bugfix session
 - do not split the issue set into backend/frontend tracks
-- do not silently drop, merge away, or wave through issues from the current initial audit report
+- do not silently drop, merge away, or wave through issues from the current audit report
 - the owner must read the current audit report and extract the issues before talking to the developer
-- after the developer claims the fixes are complete, return to the same evaluator session that produced that cycle's initial audit report
+- after the developer claims the fixes are complete for a `partial pass` audit, return to the same evaluator session that produced that audit report
 - keep ordinary post-hardening evaluation remediation inside `P7`
-## Initial-audit result handling
+## Fresh-audit result handling
 ### `fail`
-- treat the audit as a non-counted remediation trigger
-- extract and hand off all issues to the active `bugfix-N` developer session
+- treat the audit as a remediation trigger that routes back to develop
+- extract and hand off all issues to the latest `develop-N` developer session
 - fix them
-- archive the failed initial audit report under `../.ai/`
-- do not count that audit as a successful cycle
-- run a fresh new evaluator session for the next initial audit
+- keep the audit report at its normalized `../.tmp/audit_report-<N>.md` path
+- do not open `bugfix-N` for this audit
+- run a fresh new evaluator session for the next audit
-### `partial pass` or `pass`
+### `partial pass`
-- treat the audit as the start of a counted cycle
-- use its exact issue list as the scope of the cycle
+- treat the audit as the start of a scoped bugfix session
+- use its exact issue list as the scope of that bugfix session
 - send that exact issue list to the developer in explicit but compact detail
+### `pass`
+- record the audit as a discarded clean audit and do not hand off an issue list
+- do not treat it as `P7` completion
+- immediately rerun a fresh evaluation until a `partial pass` opens the next scoped bugfix session
 ## Issue handoff standard
-- send the developer the exact issues from the current cycle's initial audit in explicit but trimmed detail
+- send the developer the exact issues from the current audit in explicit but trimmed detail
 - do not tell the developer to read the audit report directly
 - require the developer to address the full scoped issue list or its explicitly unresolved subset on later loop passes
 - require the developer to report the exact verification commands that were run and the concrete results they produced
@@ -47,21 +54,23 @@ Use this skill during `P7 Evaluation and Fix Verification` after an initial audi
 ## Scoped fix-check standard
-- the follow-up verification must happen in the same evaluator session that produced the current cycle's initial audit report
-- that same evaluator session should receive only the exact cycle-scoped issue list or the current unresolved subset
+- the follow-up verification must happen only for `partial pass` audits and must use the same evaluator session that produced that audit report
+- that same evaluator session should receive only the exact audit-scoped issue list or the current unresolved subset
 - that same evaluator session should only confirm whether those exact earlier items are fixed; it should not perform a broader new review
 - the follow-up report should describe what is resolved, what remains open, and any important verification caveats
-- store follow-up reports in the current cycle directory as `audit_fix_check_1.md`, `audit_fix_check_2.md`, and so on
+- store follow-up reports as `../.tmp/audit_report-<N>-fix_check-<M>.md`
 - do not rewrite the report text after generation except for file moves and filename normalization
 ## Scope discipline
-- counted-cycle remediation is strictly scoped to the issues from that cycle's initial audit report
+- `partial pass` remediation is strictly scoped to the issues from that audit report
 - do not let the fix-check loop expand into a fresh issue hunt
-- if a broader new review is needed, finish or abandon the current cycle appropriately and start a fresh evaluator session
+- if a broader new review is needed, finish or abandon the current scoped bugfix loop appropriately and start a fresh evaluator session
 ## Exit standard
-- do not move to `P8` until 2 successful counted cycles exist under `../self_test_reports/`
-- failed initial audits may exist under `../.ai/`, but they never count toward the required successful cycles
-- each successful cycle must have its initial audit report and any fix-check reports stored together in its cycle directory
+- after the second bugfix session completes, run the separate coverage/README audit and treat every issue in that report as blocking work for the most recently used recoverable developer session until the report is clean
+- keep the coverage/README report path fixed at `../.tmp/test_coverage_and_readme_audit_report.md` and replace the prior copy on each rerun instead of numbering it
+- do not move to `P8` until 2 bugfix sessions have been completed and the coverage/README audit report is clean
+- keep every fresh audit report under `../.tmp/audit_report-<N>.md`
+- for each bugfix session, keep its starting partial-pass audit report and any fix-check reports together by shared audit number in `../.tmp/`
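
Reader note: this file moves report storage from per-cycle directories to flat, numbered files under `../.tmp/` and changes the `P8` exit condition. The helpers below sketch the naming scheme and the exit check. All function names are hypothetical, and treating `coverage_readme_audit_completed` as "the report is clean" is an assumption of this sketch.

```python
TMP_ROOT = "../.tmp"
COVERAGE_REPORT = f"{TMP_ROOT}/test_coverage_and_readme_audit_report.md"  # fixed path, never numbered
REQUIRED_BUGFIX_SESSIONS = 2


def audit_report_path(audit_number: int) -> str:
    """Path for fresh audit number N."""
    return f"{TMP_ROOT}/audit_report-{audit_number}.md"


def fix_check_path(audit_number: int, check_number: int) -> str:
    """Path for fix-check M of the partial-pass audit number N."""
    return f"{TMP_ROOT}/audit_report-{audit_number}-fix_check-{check_number}.md"


def can_enter_p8(completed_bugfix_session_count: int, coverage_readme_audit_completed: bool) -> bool:
    """Exit check: 2 completed bugfix sessions and a clean coverage/README audit."""
    return (
        completed_bugfix_session_count >= REQUIRED_BUGFIX_SESSIONS
        and coverage_readme_audit_completed
    )
```

Because a fix-check shares its audit number with the report that opened the bugfix session, sorting the directory by filename keeps each audit and its fix-checks adjacent.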

package/assets/skills/final-evaluation-orchestration/SKILL.md CHANGED

@@ -22,36 +22,56 @@ The canonical evaluation prompt files are:
   - `assets/slopmachine/backend-evaluation-prompt.md`
   - `assets/slopmachine/frontend-evaluation-prompt.md`
 - installed runtime copies used during ordinary evaluation runs:
-- `~/slopmachine/backend-evaluation-prompt.md`
-- `~/slopmachine/frontend-evaluation-prompt.md`
+  - `~/slopmachine/backend-evaluation-prompt.md`
+  - `~/slopmachine/frontend-evaluation-prompt.md`
+  - `~/slopmachine/test-coverage-prompt.md`
 The installed runtime copies under `~/slopmachine/` are the ordinary evaluation prompt sources at runtime.
 ## Evaluation selection rule
-- choose one evaluation prompt kind for the whole `P7` cycle set
+- choose one fresh-audit evaluation prompt kind for the whole `P7` cycle set
 - if the project is frontend-only, use `~/slopmachine/frontend-evaluation-prompt.md`
 - if the project is backend-only, fullstack, or any other project type, use `~/slopmachine/backend-evaluation-prompt.md`
 - do not run both prompts in the same ordinary workflow cycle
+- the post-bugfix coverage/README audit is additional and always uses `~/slopmachine/test-coverage-prompt.md`
-## Report root
+## Report root and naming
-- counted evaluation-cycle reports live under parent-root `../self_test_reports/`
-- require 2 successful counted cycles before `P7` can finish
-- use zero-based cycle directories:
-  - `../self_test_reports/cycle-0/`
-  - `../self_test_reports/cycle-1/`
+- all `P7` audit and fix-check reports live under parent-root `../.tmp/`
+- do not use the older cycle-directory report-root model
+- number every fresh evaluation audit sequentially across the whole run:
+  - `../.tmp/audit_report-1.md`
+  - `../.tmp/audit_report-2.md`
+  - and so on
+- for a `partial pass` audit that opens a bugfix session, store each scoped fix-check under that audit number:
+  - `../.tmp/audit_report-<N>-fix_check-1.md`
+  - `../.tmp/audit_report-<N>-fix_check-2.md`
+  - and so on
 ## Evaluator-session model
-- every initial audit must start from a fresh `General` evaluator session
-- track the active evaluator session id in workflow metadata
-- keep using that same evaluator session for the counted cycle's scoped fix-check loop
-- do not reuse a failed-initial-audit evaluator session for the next fresh audit
+- every fresh audit must start from a fresh `General` evaluator session
+- track the active evaluator session id and current audit number in workflow metadata
+- if a fresh audit returns `partial pass`, keep using that same evaluator session only for the scoped fix-check loop tied to that audit's issue list
+- do not reuse a `fail` audit evaluator session for the next fresh audit
+- do not reuse an evaluator session from one audit number for another audit number
+## Developer-routing model
+- `P7` does not automatically switch to `bugfix-N`
+- keep the latest `develop-N` session recoverable throughout `P7`
+- branch fresh audit results this way:
+  - `fail` -> send the issue list back to the latest `develop-N` session
+  - `partial pass` -> start the next `bugfix-N` session tied to that audit number
+  - `pass` -> discard it as a non-counting clean audit and immediately run another fresh evaluation until a `partial pass` opens the next bugfix session
+- require 2 completed bugfix sessions before the final post-bugfix coverage/README audit can run
+- after `bugfix-1` completes, run a fresh new evaluation
+- after `bugfix-2` completes through its scoped fix-check loop, run the separate coverage/README audit before `P7` can close
 ## Audit launch rule
-For each fresh initial audit:
+For each fresh audit:
 - compose the chosen evaluation prompt yourself; do not tell the evaluator to read prompt files on its own
 - use the original project prompt from metadata
@@ -62,64 +82,80 @@ For each fresh initial audit:
 - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
 - send that fully composed text block directly to one fresh `General` evaluator session
 - require that session to produce a detailed file-backed audit report plus an issue summary
-- record the evaluator session id, prompt kind, and current audit/cycle state in metadata
+- assign the next audit number and normalize the report path to `../.tmp/audit_report-<N>.md`
+- record the evaluator session id, prompt kind, audit number, verdict, report path, and routing decision in metadata
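
As a rough sketch of the launch rule above: the prompt is composed by substituting only the `{prompt}` placeholder, the report path is derived from the audit number, and a metadata record collects the listed fields. The function names and dictionary keys are assumptions; the actual metadata schema is not shown in this diff.

```python
def compose_evaluation_prompt(template: str, project_prompt: str) -> str:
    """Inject the project prompt into `{prompt}` without touching the rest."""
    if "{prompt}" not in template:
        raise ValueError("template has no {prompt} placeholder")
    # str.replace is used instead of str.format so that any other braces
    # in the template body are left exactly as written.
    return template.replace("{prompt}", project_prompt)


def audit_report_path(audit_number: int) -> str:
    """Normalized report path for fresh audit number N."""
    if audit_number < 1:
        raise ValueError("audit numbers start at 1")
    return f"../.tmp/audit_report-{audit_number}.md"


def build_audit_record(session_id: str, prompt_kind: str, audit_number: int,
                       verdict: str, routing_decision: str) -> dict:
    """Collect the fields the launch rule asks to record (key names assumed)."""
    return {
        "evaluator_session_id": session_id,
        "prompt_kind": prompt_kind,
        "audit_number": audit_number,
        "verdict": verdict,
        "report_path": audit_report_path(audit_number),
        "routing_decision": routing_decision,
    }
```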
-## Initial-audit branching rule
+## Fresh-audit branching rule
-After the initial audit report is produced, branch by audit result:
+After each fresh audit report is produced, branch by verdict:
 ### `fail`
-- this does not count as a successful cycle
-- extract all reported issues and send them to the active developer remediation session
-- fix the issues outside the counted-cycle flow
-- move the failed audit report out of the counted report set into `../.ai/`
-- record that failed audit in metadata under `failed_audits`
-- then start a brand new evaluator session and run a fresh initial audit again
+- record the audit as a `fail` under its `audit_report-<N>.md` path
+- extract all reported issues and send them to the latest `develop-N` session
+- do not open `bugfix-N` for a `fail` audit
+- fix the issues in that develop session
+- after remediation, start a brand new evaluator session and run the next fresh audit as `audit_report-<N+1>.md`
+### `partial pass`
-### `partial pass` or `pass`
+- record the audit as a `partial pass` under its `audit_report-<N>.md` path
+- start the next `bugfix-N` session and tie that session to audit number `<N>`
+- treat the exact issue list from `audit_report-<N>.md` as the full scope of that bugfix session
+- keep using the same evaluator session only for the scoped fix-check loop for that audit number
-- this begins a counted evaluation cycle
-- assign the next counted cycle number
-- create the corresponding cycle directory under `../self_test_reports/`
-- store the initial audit report in that cycle directory as `audit_report_1.md`
-- record the counted cycle in metadata under `self_test_cycles`
+### `pass`
-## Counted-cycle fix loop
+- record the audit as a discarded clean audit under its `audit_report-<N>.md` path
+- do not open `bugfix-N`
+- do not count it toward `P7` completion
+- immediately start another fresh evaluator session and continue `P7` until a `partial pass` opens the next bugfix session
-Inside a counted cycle:
+## Partial-pass fix-check loop
-- treat the exact issue list from that cycle's initial audit report as the scope of the cycle
+Inside a `partial pass` audit's bugfix loop:
+- treat the exact issue list from `audit_report-<N>.md` as the scope of the loop
 - send that exact issue list to the active `bugfix-N` developer session
 - do not tell the developer to read the audit report file directly
 - require the developer to fix the issues and report the exact verification commands and concrete results
 - after the developer claims the fixes are done, run a rough targeted owner-side verification pass on the affected behavior before asking for evaluator confirmation
-- then return to the same evaluator session and send only the exact issue list for scoped fix confirmation
+- then return to the same evaluator session and send only the exact issue list or current unresolved subset for scoped fix confirmation
 - require a file-backed fix-check report for that scoped verification pass
-- store each fix-check report inside the current cycle directory as `audit_fix_check_1.md`, `audit_fix_check_2.md`, and so on
-- if unresolved issues remain, take only that unresolved subset back to the developer and repeat the same-session fix-check loop
-- continue until all issues from that cycle's initial audit are resolved
+- store each fix-check report as `../.tmp/audit_report-<N>-fix_check-<M>.md`
+- if unresolved issues remain, take only that unresolved subset back to the same bugfix session and repeat the same-session fix-check loop
+- once all issues from `audit_report-<N>.md` are resolved, mark that bugfix session completed in metadata
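
The fix-check loop above can be sketched as follows. The two callables stand in for the developer session and the evaluator session, and `max_rounds` is a safety cap added for this illustration only; none of these names come from the package.

```python
from typing import Callable, List


def fix_check_report_path(audit_number: int, check_number: int) -> str:
    """Report path for fix-check M of audit N."""
    return f"../.tmp/audit_report-{audit_number}-fix_check-{check_number}.md"


def run_fix_check_loop(audit_number: int, issues: List[str],
                       fix: Callable[[List[str]], None],
                       confirm: Callable[[List[str]], List[str]],
                       max_rounds: int = 10) -> List[str]:
    """Loop until every scoped issue is resolved; return the report paths.

    `fix` models the bugfix session; `confirm` models the evaluator and
    returns the issues it still considers unresolved.
    """
    unresolved = list(issues)
    reports: List[str] = []
    check_number = 0
    while unresolved:
        check_number += 1
        if check_number > max_rounds:  # hypothetical guard, not in the source
            raise RuntimeError("fix-check loop did not converge")
        fix(unresolved)
        still_open = confirm(unresolved)
        # Keep the loop scoped: anything the evaluator mentions that was
        # not already in the unresolved subset is ignored here.
        unresolved = [issue for issue in unresolved if issue in still_open]
        reports.append(fix_check_report_path(audit_number, check_number))
    return reports
```

Filtering against the previous unresolved subset is what enforces the scope rule: new findings wait for the next fresh audit.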
+## Post-bugfix coverage and README audit
+- after 2 bugfix sessions have been completed, do not leave `P7` yet
+- read `~/slopmachine/test-coverage-prompt.md` yourself before launching the audit
+- launch a fresh `General` evaluator session for this audit
+- prepare the audit workspace with `node ~/slopmachine/utils/prepare_strict_audit_workspace.mjs --workspace-root .. --name test-coverage-readme-audit` and use the returned `run_dir` as the evaluator working directory so `repo/README.md` and `../.tmp/` both resolve correctly
+- compose the request yourself and make clear that the reviewer is working in the current project directory and must write the report to `../.tmp/test_coverage_and_readme_audit_report.md`
+- before each rerun, remove or replace the previous `../.tmp/test_coverage_and_readme_audit_report.md`; do not keep numbered variants for this report
+- if the report finds any issue, treat that as blocking `P7` completion
+- route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
+- require fixes plus concrete verification evidence from that developer session
+- after the fixes land, run a fresh coverage/README audit and replace the old report
+- keep looping until `../.tmp/test_coverage_and_readme_audit_report.md` is clean and the report confirms the minimum 90 percent coverage threshold is satisfied
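
A minimal sketch of the two mechanical parts of this loop: clearing the single unnumbered report before a rerun, and deciding whether a finished audit is clean. How issues and the coverage figure are extracted from the report is not specified in the diff, so they are passed in as plain values here; the function names are hypothetical.

```python
from pathlib import Path
from typing import List

COVERAGE_REPORT = "../.tmp/test_coverage_and_readme_audit_report.md"
MIN_COVERAGE_PERCENT = 90.0


def prepare_rerun(report_path: str = COVERAGE_REPORT) -> None:
    """Remove the previous report so no stale or numbered variant survives."""
    Path(report_path).unlink(missing_ok=True)


def coverage_audit_is_clean(issues: List[str], coverage_percent: float) -> bool:
    """Clean means no reported issues and the 90 percent threshold is met."""
    return not issues and coverage_percent >= MIN_COVERAGE_PERCENT
```

Any non-empty issue list or a coverage figure below 90 keeps `P7` open and sends the issues back to the active developer session.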
 ## Scope rule
-- the counted-cycle fix loop is strictly scoped to the issues reported by that cycle's initial audit report
-- no new issue hunt belongs inside a counted cycle's fix-check loop
-- if a broader new review is needed, that belongs to the next fresh initial audit in a new evaluator session
-## Success target
-- require 2 successful counted cycles
-- each successful counted cycle must start from a fresh evaluator session
-- each successful counted cycle must fully resolve the scoped issues from its own initial audit before the next counted cycle begins
+- a bugfix session opened from `audit_report-<N>.md` is strictly scoped to that audit's issue list
+- no new issue hunt belongs inside that audit's fix-check loop
+- if a broader new review is needed, that belongs to the next fresh audit in a new evaluator session
+- if a later fresh audit fails after a bugfix session completes, route that new fail issue list back to the latest `develop-N` session instead of reopening the completed bugfix session
 ## Exit target
-- `P7` is complete only after 2 successful counted cycles exist under `../self_test_reports/`
-- failed initial audits may exist under `../.ai/`, but they do not count toward the 2 successful cycles
-- after the second successful counted cycle completes, move to `P8 Final Human Decision`
+- `P7` is complete only after 2 bugfix sessions have been completed and the post-bugfix coverage/README audit report is clean
+- the second bugfix session must be completed by resolving its scoped issue list through the same-audit fix-check loop
+- fresh `pass` audits before that point are discarded clean audits and do not replace the 2-bugfix-session requirement
+- after the second bugfix session completes, run the coverage/README audit; move to `P8 Final Human Decision` only after that audit passes cleanly
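
The exit target above reduces to a two-part condition, sketched here under the assumption that each bugfix session is tracked as a dictionary with a `completed` flag (a hypothetical shape, not the package's schema).

```python
from typing import Dict, List


def p7_can_close(bugfix_sessions: List[Dict], coverage_audit_clean: bool) -> bool:
    """P7 closes only with 2 completed bugfix sessions and a clean audit."""
    completed = [s for s in bugfix_sessions if s.get("completed")]
    # Discarded `pass` audits never create bugfix sessions, so they cannot
    # contribute to this count.
    return len(completed) >= 2 and coverage_audit_clean
```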
 ## Boundaries
 - this phase is owner-side evaluation orchestration, not the final human decision gate
-- keep the active `bugfix-N` developer lane for evaluator-driven fixes during `P7`
-- do not reopen the old numbered `self-test-run-N.md` / `self-test-fixes.md` model
+- keep audit numbering deterministic and monotonic across the whole run
+- do not reopen the old counted-cycle report-root model

package/assets/skills/hardening-gate/SKILL.md CHANGED

@@ -33,8 +33,11 @@ Hardening should treat these as the main review buckets before final evaluation
 - audit security boundaries, validation, ownership, and secret handling
 - prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
 - audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
-- audit whether parent-root `../docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, and gaps in a way the owner can follow quickly
-- audit whether the project is actually approaching or achieving at least 90 percent meaningful coverage of the relevant behavior surface rather than relying on a thin happy-path suite
+- audit whether parent-root `../docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, gaps, and the intended minimum 90 percent threshold in a way the owner can follow quickly
+- audit whether the project actually meets the minimum 90 percent coverage threshold for the relevant behavior surface rather than relying on a thin happy-path suite
+- require concrete coverage evidence during hardening, such as a stack-native coverage report, configured threshold, or equally explicit proof; do not accept approximate claims here
+- when backend or fullstack APIs exist, audit whether `../docs/test-coverage.md` includes a resolved endpoint inventory, API test mapping, mock classification, and the important modules that still lack meaningful tests
+- when backend or fullstack APIs exist, audit whether core endpoint coverage is truly no-mock HTTP where it matters, and whether mocked or indirect tests are being overstated as stronger evidence than they are
 - audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
 - inspect architecture, coupling, file size, and maintainability risks
 - focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
@@ -55,6 +58,15 @@ Hardening should treat these as the main review buckets before final evaluation
 - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
 - enforce env-file discipline during hardening
 - run documentation verification against the real codebase and runtime behavior, not just document existence
+- audit README compliance against the strict post-bugfix README review shape:
+  - project type near the top
+  - startup instructions
+  - access method
+  - verification method
+  - demo credentials for every known role or the exact statement `No authentication required`
+  - architecture and workflow clarity
+- for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
+- verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
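
Some of the README checks above are plain string checks and could be pre-screened with a small script before a human review. The sketch below is an assumption-laden illustration: `check_readme` is a hypothetical helper, and a substring hit for an install command only flags the line for review, since it cannot tell whether the final delivered flow really depends on it.

```python
from typing import List

REQUIRED_STRINGS = ["docker compose up --build", "docker-compose up"]
LOCAL_ITERATION_TRACES = ["npm install", "pip install", "apt-get"]


def check_readme(text: str) -> List[str]:
    """Return a list of human-readable findings; empty means nothing flagged."""
    findings: List[str] = []
    for required in REQUIRED_STRINGS:
        if required not in text:
            findings.append(f"missing required string: {required}")
    for trace in LOCAL_ITERATION_TRACES:
        if trace in text:
            findings.append(f"possible local-iteration trace: {trace}")
    return findings
```

Note that `docker compose up --build` does not contain the legacy string `docker-compose up` (space versus hyphen), so both must appear explicitly for this check to pass.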
 - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
 - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
 - make sure the system is genuinely reviewable and reproducible
@@ -68,11 +80,13 @@ Before `P6` can close, the owner should have a clear answer for each of these:
 - prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
 - security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
 - test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
-- coverage depth: does the current evidence support roughly 90 percent meaningful coverage of the relevant behavior surface, and if not, what remains weak?
+- coverage depth: does the current evidence prove the minimum 90 percent coverage threshold for the relevant behavior surface, and if not, what remains weak?
+- endpoint coverage readiness: if backend or fullstack APIs exist, could a strict static reviewer map the important `METHOD + PATH` surfaces to true no-mock HTTP tests, mocked HTTP tests, or unit-only coverage without guessing?
 - major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
 - static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
 - security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
 - coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
+- README hard-gate readiness: would a fresh static reviewer find the required project type, startup, access, verification, and auth-disclosure sections in `README.md` without reconstructing them from code?
 - frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
 - repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?