theslopmachine 0.6.0 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -54,11 +54,18 @@ Do not introduce convenience-based simplifications, `v1` reductions, future-phas

  If a simplification would make implementation easier but is not explicitly authorized, keep the full prompt scope and plan the real complexity instead.

+ When accepted planning artifacts already exist, treat them as the primary execution contract.
+
+ - read the relevant accepted plan section before implementing the next slice
+ - do not wait for the owner to restate what is already in the plan
+ - treat owner follow-up prompts mainly as narrow deltas, guardrails, or correction signals
+
  ## Execution Model

  - implement real behavior, not placeholders
  - keep user-facing and admin-facing flows complete through their real surfaces
  - verify the changed area locally and realistically before reporting completion
+ - when closing a slice, think briefly about what adjacent flows, runtime paths, or doc/spec claims this slice could have affected before claiming readiness
  - keep `README.md` as the only documentation file inside the repo unless the user explicitly asks for something else
  - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
  - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
@@ -153,12 +160,19 @@ If the owner asks you to help shape test-coverage evidence, make it acceptance-g
  - if you ran no verification command for part of the work, say that explicitly instead of implying broader proof than you have
  - if a problem needs a real fix, fix it instead of explaining around it

- Use this reply shape for substantive work:
+ Default reply shape for ordinary slice completion, hardening, and fix responses:
+
+ 1. short summary
+ 2. exact changed files
+ 3. exact verification commands and results
+ 4. real unresolved issues only
+
+ Keep the reply compact. Point to the exact changed files and the narrow supporting files the owner should read next.
+
+ Use the larger reply shape only when the owner explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:

  1. `Changed files` — exact files changed
  2. `What changed` — the concrete behavior/contract updates in those files
  3. `Why this should pass review` — prompt-fit, no unauthorized narrowing, and consistency check in 2-5 bullets
  4. `Verification` — exact commands run and exact results
  5. `Remaining risks` — only the real unresolved weaknesses, if any
-
- Keep the reply compact. Point to the exact changed files and the narrow supporting files the owner should read next.
@@ -50,6 +50,12 @@ The only allowed human-stop moments are:

  If you are not at one of those two gates, continue working.

+ Claude-capacity exception:
+
+ - if the active Claude developer session becomes rate-limited or capacity-blocked, do not take over implementation work yourself
+ - preserve the current developer session record, mark it blocked by rate limit, and pause gracefully for the user to resume later
+ - this is the only non-gate pause allowed in `slopmachine-claude`, and it exists only to wait for developer-session capacity recovery
+
  ## Core Role

  - own lifecycle state, review pressure, and final readiness decisions
@@ -79,7 +85,7 @@ Agent-integrity rule:
  - the only in-process agents you may ever use are `General` and `Explore`
  - do not use the OpenCode `developer` subagent for implementation work in this backend
  - use the Claude CLI `developer` worker session for codebase implementation work
- - if the work does not fit those paths, do it yourself with your own tools
+ - if the Claude developer worker is unavailable because of rate limits or capacity exhaustion, do not replace it by coding yourself; pause and wait for resume

  ## Optimization Goal

@@ -164,6 +170,12 @@ Outside those two moments, do not stop just to report status, summarize progress
  If the work is outside those two gates, continue execution and make the best prompt-faithful decision from the available evidence.
  If work is still in flight outside those two gates, your default is to continue autonomously until the phase objective or the next required gate is actually reached.

+ Claude-capacity exception:
+
+ - if the active Claude developer session becomes rate-limited or otherwise capacity-blocked, pause gracefully and wait for the user to resume the run later
+ - before pausing, update metadata and Beads comments to record that the active developer session is blocked by rate limit
+ - do not reinterpret a rate-limited developer session as permission for owner-side implementation takeover
+
  ## Lifecycle Model

  Use these exact root phases:
@@ -198,6 +210,7 @@ Maintain exactly one active developer session at a time.
  - if multiple sessions are needed before `P7`, keep them in the `develop-N` lane
  - if multiple sessions are needed during `P7` remediation, keep them in the `bugfix-N` lane
  - track the active evaluator session separately in metadata during `P7`
+ - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and pause for resume instead of replacing it with owner implementation

  Do not launch the developer before clarification is complete and the workflow is ready to enter `P2`.

@@ -369,8 +382,8 @@ Operation map:
  - `node ~/slopmachine/utils/claude_resume_session.mjs`
  - export worker session for packaging:
  - `node ~/slopmachine/utils/export_ai_session.mjs --backend claude`
- - prepare exported session for conversion:
- - `python3 ~/slopmachine/utils/strip_session_parent.py`
+ - convert exported worker session directly for trajectory packaging:
+ - `node ~/slopmachine/utils/convert_exported_ai_session.mjs --converter-script ~/slopmachine/utils/convert_ai_session.py`

  Timeout rule:

@@ -381,6 +394,7 @@ Use wrapper outputs as the owner-facing contract:

  - success: compact parsed fields such as `sid` and `res`
  - failure: compact parsed fields such as `code` and `msg`
+ - for long-running or flaky calls, inspect the wrapper `state-file` and `result-file` rather than treating Bash process lifetime alone as the source of truth

  Do not paste raw Claude JSON payloads into owner prompts, Beads comments, or metadata fields.

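A hypothetical shape for these parsed wrapper outputs, one success line and one failure line (field names are taken from the contract above; the values are purely illustrative assumptions):

```json
{"sid": "ses_01", "res": "Implemented the profile slice; tests pass."}
{"code": 1, "msg": "usage limit reached"}
```

Keeping the owner-facing contract this small is what makes it safe to forward verbatim into prompts and tracker comments.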
@@ -274,7 +274,10 @@ Between those moments, rely on:
  - targeted unit tests
  - targeted integration tests
  - targeted module or route-family reruns
- - the selected stack's local UI or E2E tool when UI is material
+ - targeted local non-E2E UI-adjacent checks when UI is material; keep browser E2E and Playwright for the owner-run broad gate moments unless a concrete blocker justifies earlier escalation
+
+ The `P7` evaluator-cycle model is separate from the ordinary owner-run broad-verification budget above.
+ Do not count the required evaluator sessions or counted cycles inside `P7` as ordinary broad owner-run verification moments.

  If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.

@@ -313,10 +316,15 @@ When talking to the developer:
  - use direct coworker-like language
  - lead with the engineering point, not process framing
  - keep prompts natural, sharp, and compact unless the moment really needs more context
+ - after planning is accepted, treat the accepted plan as the primary persistent implementation contract
+ - after planning is accepted, do not restate large sections of the plan back to the developer unless the plan is wrong or incomplete
+ - for normal slice work after planning, prefer one short paragraph plus a small checklist of the slice-specific guardrails or reminder items that are not already obvious from the accepted plan
+ - when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
  - translate workflow intent into normal software-project language
  - do not mention session names, slot labels, phase labels, or workflow state to the developer
  - do not describe the interaction as a workflow handoff, session restart, or phase transition
  - express boundaries as plain engineering instructions such as `plan this but do not start implementation yet` rather than workflow labels like `planning only` or `stop before scaffold`
+ - for slice-close or hardening-close requests, require compact replies by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues
  - for each development slice or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
  - require the developer to point to the exact changed files and the narrow supporting files worth review
  - require the developer to self-check prompt-fit, consistency, and likely review defects before claiming readiness
@@ -340,6 +348,7 @@ Do not speak as a relay for a third party.
  - prefer one strong correction request over many tiny nudges
  - keep work moving without low-information continuation chatter
  - read only what is needed to answer the current decision
+ - after planning is accepted, prefer plan-section references plus narrow checklists over repeated prompt dumps
  - keep comments and metadata auditable and specific
  - keep external docs owner-maintained under parent-root `../docs/` as reference copies, and keep `README.md` as the only normal documentation file inside the repo
  - default review scope to the changed files and the specific supporting files named by the developer
@@ -373,6 +382,7 @@ After each substantive developer reply, do one of four things:
  Treat packaging as a first-class delivery contract from the start, not as late cleanup.

  - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
+ - the packaged source copies of those prompts live under `assets/slopmachine/`, and the installed runtime copies live under `~/slopmachine/`; ordinary evaluation runs should use the installed runtime copies
  - load `submission-packaging` before any packaging action
  - follow its exact artifact, export, cleanup, and output contract
  - do not invent extra artifact structures during ordinary packaging
@@ -40,7 +40,7 @@ For a new bounded developer session slot:
  Preferred creation pattern:

  ```bash
- node ~/slopmachine/utils/claude_create_session.mjs --cwd "$PWD" --prompt-file <file> --raw-output <file> --raw-error <file>
+ node ~/slopmachine/utils/claude_create_session.mjs --cwd "$PWD" --prompt-file <file> --raw-output <file> --raw-error <file> --state-file <file> --result-file <file>
  ```

  When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
@@ -58,7 +58,7 @@ The default pattern is to let Claude create the session and then persist the ret
  For all later turns in the same bounded developer slot:

  ```bash
- node ~/slopmachine/utils/claude_resume_session.mjs --cwd "$PWD" --session-id <session_id> --prompt-file <file> --raw-output <file> --raw-error <file>
+ node ~/slopmachine/utils/claude_resume_session.mjs --cwd "$PWD" --session-id <session_id> --prompt-file <file> --raw-output <file> --raw-error <file> --state-file <file> --result-file <file>
  ```

  - use `--resume` inside the wrapper implementation, not `-r`
@@ -68,16 +68,22 @@ node ~/slopmachine/utils/claude_resume_session.mjs --cwd "$PWD" --session-id <se

  ## Result capture rule

- The wrapper scripts should reduce the raw Claude result to a tiny machine-parseable object.
+ The wrapper scripts should reduce the raw Claude result to a tiny machine-parseable object and also persist state/result files for monitoring.

  Use these fields only:

  - `sid`
  - `res`

+ Monitoring files should include at least:
+
+ - a live `state-file` showing running/completed/failed state, pid, byte counts, timestamps, and exit code
+ - a final `result-file` containing the normalized success or failure object
+
  Treat `res` as the worker's answer.
  Do not feed raw Claude JSON into the owner session.
  Do not rely on transcript scraping for normal turn-to-turn orchestration.
+ Do not rely on Bash stdout alone when the wrapper state or result files provide a clearer source of truth.

  ## Developer-slot continuity

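The monitoring rule above can be sketched as a small shell loop. The JSON field names (`state`, `pid`, `exit_code`, `sid`, `res`) are assumptions based on the fields this rule names, and the files are simulated locally so the sketch is self-contained; in real use they would be written by the wrapper via `--state-file`/`--result-file`:

```shell
# Sketch: trust the wrapper's state/result files, not Bash process lifetime.
STATE_FILE=$(mktemp)
RESULT_FILE=$(mktemp)

# Simulated wrapper output for illustration only; the real files are written
# by claude_create_session.mjs / claude_resume_session.mjs.
printf '{"state":"completed","pid":12345,"exit_code":0}\n' > "$STATE_FILE"
printf '{"sid":"ses_01","res":"slice complete"}\n' > "$RESULT_FILE"

# Poll the live state-file until the wrapper reports a terminal state.
while grep -q '"state":"running"' "$STATE_FILE"; do
  sleep 5
done

# Read the normalized result object, not raw stdout.
cat "$RESULT_FILE"
```

This pattern survives flaky Bash tool timeouts: even if the foreground call is cut off, the state-file still records whether the worker actually finished.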
@@ -161,6 +167,15 @@ Recommended additional fields when useful:
  - if a replacement session is required, record the handoff clearly in metadata and tracker comments
  - write raw stdout and stderr to trace files for debugging, but do not surface those raw files back into normal owner prompts unless debugging is explicitly needed

+ ## Rate-limit handling
+
+ - if Claude returns a usage-limit or capacity-exhaustion result for the active developer session, do not take over implementation work in the owner session
+ - mark the active developer session status as `rate_limited`
+ - preserve the same Claude session id as the active tracked developer session
+ - update `../.ai/metadata.json` and Beads `SESSION:` or `HANDOFF:` comments to record the rate-limit pause clearly
+ - set workflow state to await user resume rather than creating owner-side implementation fallback work
+ - when the user later resumes the run, continue from the same Claude developer session if it is resumable
+
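A minimal sketch of the metadata side of this pause, assuming `jq` is available. The metadata shape and field names below are illustrative assumptions, not the real `../.ai/metadata.json` schema; the point is only that the status flips to `rate_limited` while the session id is preserved:

```shell
# Hypothetical metadata update for a rate-limited developer session.
META=$(mktemp)
printf '{"developer_session":{"sid":"ses_01","status":"active"}}\n' > "$META"

# Mark the session rate_limited; keep the same sid so the run can later
# resume from the same Claude developer session.
jq -c '.developer_session.status = "rate_limited"' "$META" > "$META.tmp" \
  && mv "$META.tmp" "$META"

cat "$META"
```

Because the sid is untouched, a later resume can pass it straight back to `claude_resume_session.mjs`.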
  ## Worker prompt discipline

  - rely on the installed Claude `developer` agent definition for the worker persona
@@ -178,6 +178,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata,
  - if the current phase already has an active developer session, recover it instead of silently creating a replacement
  - if an evaluator session is marked active, recover it before continuing the current `P7` cycle
  - treat resume as deterministic recovery, not guesswork
+ - if the active Claude developer session is marked `rate_limited`, do not replace it with owner-side coding; preserve it, record the pause, and wait for the user to resume later

  On recovery, inspect at least:

@@ -196,6 +197,14 @@ On recovery, inspect at least:
  - if these records disagree, repair them before continuing
  - do not silently create a replacement developer session if the intended existing one can still be resumed

+ ## Boundary-summary rule
+
+ - at meaningful accepted boundaries inside a long developer lane, refresh `last_result_summary` with a compact current-state snapshot instead of relying on the full prior conversation history
+ - the boundary summary should capture only the current accepted contract, the current major guardrails, the most relevant changed areas, and the real unresolved issues that still matter
+ - prefer boundary summaries at least at: accepted planning, scaffold acceptance, development-complete, integrated-verification completion, hardening completion, and bugfix-lane entry
+ - when resuming a long-lived developer lane, use the boundary summary plus the relevant accepted plan section before replaying or re-describing broader history
+ - keep these summaries short and decision-oriented so they reduce future context drag instead of becoming another source of prompt bloat
+
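A hypothetical shape for such a boundary snapshot. The field name `last_result_summary` comes from the rule above; the surrounding metadata schema and the example content are illustrative assumptions only:

```json
{
  "last_result_summary": "P4 boundary: auth + profile slices accepted. Contract: accepted plan sections 3-4. Guardrails: no schema changes without migration; keep route-level authz checks. Changed areas: src/auth/, src/routes/profile. Unresolved: flaky e2e login test."
}
```

A snapshot this size can replace pages of replayed conversation when resuming the lane.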
  ## Initial structure rule

  - parent-root `../docs/` is the owner-maintained external documentation directory
@@ -12,6 +12,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - work in bounded vertical slices
  - complete the real user-facing and admin-facing surface for the slice
  - keep slice-local planning, implementation, verification, and doc sync together
+ - after planning is accepted, use the relevant accepted plan section as the slice baseline instead of expecting the owner to restate the full slice contract

  ## Module implementation guidance

@@ -19,6 +20,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - define the module purpose, constraints, and edge cases before coding
  - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
  - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
+ - when working inside a slice, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims this slice could affect before reporting readiness
  - implement real behavior, not partial scattered logic
  - handle failure paths and boundary conditions
  - add or update tests as part of the module work
@@ -28,6 +30,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
  - keep frontend and backend contracts synchronized when the module spans both sides
  - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
+ - before closing the slice, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this slice lands?
  - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
  - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
  - verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
@@ -70,6 +73,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
  - keep the test surface moving toward at least 90 percent meaningful coverage of the relevant behavior area as slices are completed
  - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
+ - keep ordinary slice-complete replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping

  ## Quality rules

@@ -33,15 +33,17 @@ Use this skill during `P7 Evaluation and Fix Verification` after an initial audi

  - treat the audit as the start of a counted cycle
  - use its exact issue list as the scope of the cycle
- - send that exact issue list to the developer in explicit detail
+ - send that exact issue list to the developer in explicit but compact detail

  ## Issue handoff standard

- - send the developer the exact issues from the current cycle's initial audit in explicit detail
+ - send the developer the exact issues from the current cycle's initial audit in explicit but trimmed detail
  - do not tell the developer to read the audit report directly
  - require the developer to address the full scoped issue list or its explicitly unresolved subset on later loop passes
  - require the developer to report the exact verification commands that were run and the concrete results they produced
  - if the developer claims an issue is invalid or already fixed, require a concrete justification against the audit report instead of silently omitting it
+ - keep the handoff complete, but avoid replaying large narrative chunks from the audit when a tighter issue bundle is enough
+ - prefer a compact issue-bundle format such as: issue id or short label, exact finding, narrow evidence reference, required fix, and exact verification target

  ## Scoped fix-check standard

@@ -18,10 +18,14 @@ Use this skill only during `P7 Evaluation and Fix Verification`.

  The canonical evaluation prompt files are:

+ - packaged source copies:
+ - `assets/slopmachine/backend-evaluation-prompt.md`
+ - `assets/slopmachine/frontend-evaluation-prompt.md`
+ - installed runtime copies used during ordinary evaluation runs:
  - `~/slopmachine/backend-evaluation-prompt.md`
  - `~/slopmachine/frontend-evaluation-prompt.md`

- These two files are the only evaluation prompt sources for ordinary evaluation runs.
+ The installed runtime copies under `~/slopmachine/` are the ordinary evaluation prompt sources at runtime.

  ## Evaluation selection rule

@@ -58,6 +58,8 @@ Hardening should treat these as the main review buckets before final evaluation
  - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
  - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
  - make sure the system is genuinely reviewable and reproducible
+ - keep hardening narrow: do not turn this phase into a hidden extra development slice or a broad rediscovery pass
+ - prefer final honesty, consistency, static-review, and release-readiness cleanup over new implementation work

  ## Required hardening output

@@ -11,6 +11,11 @@ Use this skill only during `P5 Integrated Verification`.

  Treat the first broad integrated run as a discovery pass.

+ Integrated verification is expected to find some cross-slice issues.
+
+ The optimization goal is not to pretend those issues should never exist.
+ The optimization goal is to reduce avoidable hard failures and reduce how much debt survives into this phase.
+
  Once a failure class is known:

  - classify it
@@ -29,6 +34,7 @@ Once a failure class is known:
  - verify requirement closure, not just feature existence
  - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
  - verify end-to-end flow behavior where the change affects real workflows
+ - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
  - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
  - for mobile and desktop work, run the selected stack's platform-appropriate UI/E2E coverage for major flows and review screenshots or equivalent artifacts for real UI behavior and regressions
  - for Electron or other Linux-targetable desktop work, use the Dockerized desktop build/test path plus headless UI/runtime verification through Xvfb or an equivalent Linux-capable harness
@@ -43,6 +49,8 @@ Once a failure class is known:
  - verify secrets are not committed, hardcoded, or leaking through logs/config/docs
  - verify error surfaces and auth-related failures are sanitized for users and operators appropriately
  - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
+ - when integrated verification repeatedly finds the same avoidable failure class, treat that as evidence that earlier slice execution or slice-close acceptance must become more system-aware in future runs
+ - before closing the phase, verify the delivered startup path is genuinely runnable, the documented tests really execute, frontend behavior is usable when applicable, UI quality is acceptable, core running logic is complete, and Docker startup works when Docker is the runtime contract
  - tighten parent-root `../docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
  - when security-bearing behavior changes, tighten parent-root `../docs/design.md` and `../docs/api-spec.md` as needed so enforcement points and mapped tests stay accurate
  - when frontend-bearing behavior changes, tighten `README.md` plus parent-root `../docs/design.md` as needed so key pages, interactions, and required UI states stay accurate
@@ -36,6 +36,7 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
  ## Core planning gate

  - the developer should produce the first in-depth technical plan
+ - once accepted, the plan should be detailed and section-addressable enough that later owner prompts can stay short and point the developer back to the relevant accepted section instead of re-dumping the implementation contract
  - do not create deep execution sub-items before the technical plan is accepted
  - do not accept planning that reduces, weakens, narrows, or silently reinterprets the original prompt
  - do not accept convenience-based narrowing, including unauthorized `v1` simplifications, deferred workflows, reduced actor/role models, weaker enforcement, or omitted operator/admin surfaces
@@ -60,6 +60,7 @@ Selected-stack defaults:
  - require implementation-grade planning, not brainstorming
  - start from the actual project prompt and build the plan from there
  - carry the settled project requirements forward consistently as you plan
+ - make the accepted plan durable enough to serve as the primary execution contract for later scaffold and development prompts instead of forcing the owner to restate the same implementation context repeatedly
  - identify the hard non-negotiable requirements early and do not quietly trade them away for implementation convenience
  - explicitly check that the plan still fits the business goal, main flows, and implicit constraints from the prompt
  - when planning technical items that depend on a library, framework, API, or tool, check Context7 documentation first for authoritative usage details
@@ -71,6 +72,7 @@ Selected-stack defaults:
  - keep the spec focused on required behavior rather than turning it into a progress or completion narrative
  - make the plan include system overview, architecture choice and reasoning, major modules or chunks, domain model, data model where relevant, interface contracts, failure paths, state transitions, logging strategy, testing strategy, README implications, and Docker execution assumptions when those dimensions apply
  - keep the primary planning package concentrated in parent-root `../docs/design.md`
+ - organize the accepted plan so later slices can reference concrete sections cleanly instead of requiring the owner to rewrite the plan in follow-up prompts
  - put the risk-to-test matrix in parent-root `../docs/test-coverage.md`
  - when prompt-critical API/interface details need a dedicated document, keep them in parent-root `../docs/api-spec.md`
  - when additional prompt-critical boundaries must be statically reviewed, prefer adding narrow sections to `../docs/design.md` instead of creating extra in-repo docs
@@ -110,6 +112,7 @@ Selected-stack defaults:
  - if the project has database dependencies, plan for runtime and test entrypoints to call `./init_db.sh` whenever database preparation is required
  - do not hardcode database connection values or database bootstrap values anywhere in the repo; require the database setup flow to be driven by `./init_db.sh`
  - start `./init_db.sh` during scaffold with the real database setup already known, then keep expanding it as migrations, schema setup, bootstrap data, and other database dependencies become real through implementation
+ - when the project has database dependencies, plan to inject database setup through initialization scripts rather than packaging local database dependency artifacts or environment-specific database state
  - define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
  - for web projects, default to a Docker-first runtime contract unless the prompt or existing repository clearly dictates another model
  - for web projects, default the primary runtime command to `docker compose up --build`
@@ -150,6 +153,9 @@ Selected-stack defaults:
  - define end-to-end coverage for major user flows before coding
  - define enough test coverage up front to catch major issues later, especially core happy path, important failure paths, security-critical paths, and obvious high-risk boundaries
  - enforce a plan to reach at least 90 percent meaningful coverage of the relevant behavior surface, not decorative line coverage
+ - require API tests to exercise real API endpoints and real call flows rather than bypassing the endpoint layer with internal helper-only checks
+ - when API tests are material, plan for them to print simple useful response evidence such as status codes and message/body summaries so verification output is easy to inspect
+ - plan endpoint coverage so prompt-required functions and dependent multi-step API flows are actually exercised, not just isolated happy-path fragments
  - plan `../docs/test-coverage.md` in evaluator-facing shape rather than loose prose: requirement or risk point, mapped test file(s), key assertion(s) or fixtures, coverage status, major gap, and minimum test addition
  - do not satisfy `../docs/test-coverage.md` with generic test categories alone; make the matrix concrete enough that the owner can review prompt-critical risks without reconstructing the test story manually
  - when multiple prompt-critical domains exist, group the matrix by domain or risk cluster so each section names the requirement, planned test location, key assertions, current status, and remaining gap explicitly
@@ -15,6 +15,7 @@ Use this skill during `P3 Scaffold` before prompting the developer.
  - make prompt-critical baseline behavior real where required
  - keep repo-local `README.md` honest from the start
  - make the selected-stack primary runtime command and the universal `./run_tests.sh` broad test command real from the scaffold stage
+ - make the first scaffold pass strong enough that owner scaffold acceptance can rely on a narrow checklist rather than rereading the whole scaffold broadly

  For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:

@@ -60,6 +61,7 @@ For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:
  - put baseline config, logging, validation, and error-normalization structure in place
  - install and configure the local test tooling needed for ordinary iteration during scaffold rather than deferring local testing setup to later phases
  - create baseline test structure intentionally during scaffold so the project can grow toward at least 90 percent meaningful coverage instead of retrofitting tests late
+ - when API tests are material, scaffold them so they hit real endpoints and print simple useful response evidence such as status codes and message/body summaries instead of hiding the real API behavior behind helper-only checks
  - for frontend-bearing web projects, install the local browser E2E tooling plus the component/page-or-route frontend test layer during scaffold when the project will need them
  - for mobile projects, install the local mobile testing layer during scaffold, defaulting to Jest plus React Native Testing Library for Expo/React Native work
  - for desktop projects, install the local desktop testing layer during scaffold, defaulting to the selected project test runner and Playwright Electron support or an equivalent desktop UI/E2E tool when UI verification is required
@@ -71,6 +73,7 @@ For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:
  - if the project has database dependencies, wire the runtime and test entrypoints to call `./init_db.sh` whenever database preparation is required
  - if the project has database dependencies, treat `./init_db.sh` as a living project artifact that must be expanded as migrations, schema setup, bootstrap data, and other database dependencies become real through implementation
  - do not hardcode database connection values or database bootstrap values in the repo; drive database setup through `./init_db.sh`
+ - when the project has database dependencies, do not package local database dependency files or local database state as part of delivery; the delivery should rely on the initialization-script path instead
  - treat prompt-critical security controls as real baseline runtime behavior, not placeholder checks or visual wiring
  - if a requirement implies enforcement, persistence, statefulness, or rejection behavior, make that behavior real in the scaffold unless the prompt clearly scopes it down
  - do not accept shape-only security implementations such as header presence checks, passive constants, or partially wired middleware when the requirement implies real protection
@@ -91,6 +94,7 @@ For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:
  - establish README structure early instead of leaving it until the end
  - ensure `README.md` clearly documents the primary runtime command and the broad `./run_tests.sh` contract for the selected stack
  - ensure `README.md` focuses on what the project does, how to run it, how to test it, the main repo contents, and any important new-developer information rather than trying to replace the full API catalog
+ - ensure `README.md` also explains the delivered architecture and major implementation structure clearly enough for code review and handoff
  - ensure `README.md` stands on its own and does not tell users or reviewers to rely on parent-root docs for core repo understanding
  - for Dockerized web projects, ensure `README.md` explains that local runtime values are bootstrapped automatically by the development startup path and that this is local-development behavior rather than production secret management
  - maintain the seeded parent-root `../docs/design.md` as the owner-maintained planning/design contract from the start
@@ -103,6 +107,7 @@ For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:
  - establish a shared validation path during scaffold so forms, requests, boundary checks, and normalized error behavior do not get invented ad hoc later
  - prove the scaffold in a clean state before deeper feature work
  - verify clean startup and teardown behavior under the selected stack's runtime contract
+ - make the scaffold handoff compact and checklist-driven: the developer should be able to state runtime proof, test proof, docs honesty, and required repo-surface proof without a long narrative dump
  - for Dockerized web projects, verify clean startup and teardown behavior under the chosen project namespace
  - when the architecture materially depends on infrastructure capabilities such as rate limiting, encryption, offline support, or browser-storage policy, put the baseline framework and policy in place during scaffold rather than deferring it to late implementation
  - for backend integration paths, prefer production-equivalent test infrastructure when practical rather than silently substituting a weaker database or runtime model that can hide real defects
@@ -125,6 +130,17 @@ For web projects using the default runtime model, scaffold must make these commands real and working before scaffold can pass:

  Scaffold should make later slices easier, not force them to retrofit missing fundamentals.

+ Before scaffold is handed back for owner acceptance, the developer should already have a compact answer for these scaffold checklist items:
+
+ - runtime bootstrap works
+ - database/bootstrap path works when relevant
+ - `./run_tests.sh` works at the broad-scaffold level
+ - frontend/backend wiring shape is real
+ - config/env/bootstrap path is honest
+ - `README.md` and scaffold docs are honest about what is and is not implemented
+ - required scaffold files and directories exist
+ - prohibited shortcuts or residue are not present
+
  ## Verification cadence

  - use local and narrow checks while correcting scaffold work
@@ -16,6 +16,8 @@ Use this skill only during `P9 Submission Packaging`.
  - do not create `submission/` or other packaging-only directories for ordinary final delivery
  - packaging is incomplete until every required final artifact path has been verified to exist
  - do not stop packaging for approval, status confirmation, or handoff once this phase has begun; continue until the package is complete
+ - when a task or platform question id exists such as `TASK-123`, use that exact id as the final deliverable/archive name without adding an extra `ID-` prefix
+ - normalize project-type metadata and packaging labels to the expected engineering categories such as `full_stack` or `fullstack`, `pure_backend`, `pure_frontend`, `cross_platform_app`, or `mobile_app`

  ## Required final structure

@@ -54,38 +56,42 @@ No screenshots are required as packaging artifacts.
  - verify parent-root `../docs/test-coverage.md` exists and reflects the final delivered verification coverage
  - verify parent-root `../docs/questions.md` exists from the accepted clarification/question record
  - ensure `README.md` matches the delivered codebase, functionality, runtime steps, test steps, main repo contents, and important new-developer information, and stays friendly to a junior developer
+ - ensure `README.md` also describes the delivered architecture at an implementation-review level rather than only listing commands
  - ensure `README.md` remains the primary in-repo documentation surface
  - verify no repo-local file depends on parent-root docs or sibling workflow artifacts for startup, build/preview, configuration, static review, or basic project understanding
  - if the project uses mock, stub, fake, interception, or local-data behavior, ensure `README.md` discloses that scope accurately and does not imply undisclosed real integration
  - if mock or interception behavior is enabled by default, ensure `README.md` says so clearly
  - include `./run_tests.sh` and any supporting runner logic it needs to execute the project's broad test path from a clean environment
  - when the project has database dependencies, include `./init_db.sh` and ensure it reflects the final delivered database setup rather than an earlier scaffold placeholder
+ - when the project has database dependencies, package the initialization-script path rather than raw environment-specific database dependency artifacts or local database state
  - verify parent-root `../self_test_reports/` exists and contains the required counted cycle directories
  - export all tracked developer sessions before closing packaging
  - when packaging succeeds, update workflow metadata to mark `packaging_completed` as true

  ## Session export sequence

- Export every tracked developer session from metadata using its tracked label.
+ Export every tracked developer session from metadata, keep a numbered cleaned root export, and convert each session into its lane-aware trajectory file.

- Use the tracked label for each developer session, for example:
+ Use the tracked lane labels for converted developer sessions, for example:

  - `develop-1`
  - `bugfix-1`

  For each tracked developer session:

- 1. `node ~/slopmachine/utils/export_ai_session.mjs --backend <backend> --cwd "$PWD" --session-id <session-id> --output ../session-export-<label>.raw`
- 2. `python3 ~/slopmachine/utils/strip_session_parent.py ../session-export-<label>.raw --output ../session-<label>.json`
- 3. `node ~/slopmachine/utils/convert_exported_ai_session.mjs --converter-script ~/slopmachine/utils/convert_ai_session.py --input ../session-<label>.json --output ../sessions/<label>.json`
+ 1. if `<backend>` is `claude`, run `node ~/slopmachine/utils/export_ai_session.mjs --backend claude --cwd "$PWD" --session-id <session-id> --output ../session-<N>.json`
+ 2. if `<backend>` is not `claude`, run `opencode export <session-id> > ../session-export-<label>.raw`
+ 3. if `<backend>` is not `claude`, run `python3 ~/slopmachine/utils/strip_session_parent.py ../session-export-<label>.raw --output ../session-<N>.json`
+ 4. `node ~/slopmachine/utils/convert_exported_ai_session.mjs --converter-script ~/slopmachine/utils/convert_ai_session.py --input ../session-<N>.json --output ../sessions/<label>.json`

  Where `<backend>` comes from the tracked developer session record in metadata.
  Use `opencode` when no explicit backend field exists.
+ Use the tracked developer-session order to assign `<N>`.

  After those steps:

  - verify every tracked developer session has been exported and converted into `../sessions/` before continuing
- - keep `../session-<label>.json` in the parent root as the cleaned session export
+ - keep `../session-<N>.json` in the parent root as the cleaned or direct exported session artifact
  - treat only the raw `../session-export-<label>.raw` files as temporary packaging intermediates
  - remove the raw `../session-export-<label>.raw` files before closing packaging
  - if the required utilities, metadata session ids, or output files are missing, packaging is not ready to continue
@@ -95,23 +101,30 @@ After those steps:
  - run `python3 ~/slopmachine/utils/cleanup_delivery_artifacts.py .` once near the end of packaging to remove known recursive cleanup targets from the delivered repo tree
  - remove runtime, editor, cache, tooling noise, generated artifacts, and environment junk recursively anywhere in the delivered repo tree
  - do not remove required delivery artifacts just because they look noisy
- - remove `.opencode/`, `.codex/`, `.vscode/`, env-file variants, caches, `node_modules/`, build outputs not part of delivery, raw test artifact directories, `__pycache__/`, `.pytest_cache/`, repo-local `AGENTS.md`, and accidental in-repo docs directories or extra documentation files beyond `README.md`
+ - remove `.opencode/`, `.codex/`, `.vscode/`, env-file variants, caches, `node_modules/`, `.venv/`, `.net/`, build outputs not part of delivery, raw test artifact directories, `__pycache__/`, `.pytest_cache/`, repo-local `AGENTS.md`, and accidental in-repo docs directories or extra documentation files beyond `README.md`
+ - remove environment-dependent content, local dependency trees, editor state, package-manager caches, and runtime caches anywhere in the delivery tree
+ - do not package database dependency files or local database state when the delivered database setup is supposed to be injected through initialization scripts
+ - do not package AI session conversion scripts or similar workflow utility scripts inside the delivered product attachment
  - remove repo-local `.tmp/` or parent-root `../.tmp/` if they exist; they are not part of the final delivery contract
  - the cleanup is recursive; do not leave forbidden directories or generated junk buried deeper in the repo hierarchy after cleanup

  ## Validation checklist

  - confirm the final package contains only the required delivery structure and necessary repo contents
+ - confirm the final archive/deliverable naming uses the task/question id directly when one exists and does not invent an extra `ID-` prefix
  - confirm docs describe delivered behavior, not planned or aspirational behavior
  - confirm the delivered repo is statically reviewable enough that startup, test commands, core entry points, and any mock/local-data boundaries can be traced from repo artifacts alone
  - confirm `README.md` covers build/preview/runtime guidance, test commands, main repo contents, feature flags, debug/demo surfaces, mock defaults, and important new-developer information when those dimensions are material
+ - confirm `README.md` also explains the delivered architecture and major implementation structure clearly enough for review
  - when the project has database dependencies, confirm `./init_db.sh` exists in the delivered repo and matches the final schema/bootstrap requirements
+ - when the project has database dependencies, confirm database setup is injected through initialization scripts rather than packaged local database dependency artifacts
  - confirm the cleanup helper has been run and that no known recursive cleanup targets remain in the delivered repo tree
+ - confirm no environment-dependent dependency directories, editor-state folders, runtime caches, or workflow utility scripts are packaged into the delivered product
  - confirm parent-root `../self_test_reports/` exists and contains the required counted cycle directories
  - confirm each counted cycle directory contains the initial audit report plus any fix-check reports generated for that cycle
  - confirm parent-root `../docs/test-coverage.md` explains the tested flows, mapped tests, and coverage boundaries
  - confirm exported developer sessions exist under parent-root `../sessions/` using the tracked `<label>.json` names
- - confirm cleaned session exports exist in the parent root as `../session-<label>.json`
+ - confirm cleaned session exports exist in the parent root as numbered `../session-<N>.json` files
  - confirm parent-root `../docs/` remains consistent as an external reference set when workflow policy still requires it, but the delivered repo does not depend on it
  - confirm parent-root metadata fields are populated correctly
  - confirm workflow metadata marks `packaging_completed` as true
@@ -121,6 +134,7 @@ After those steps:

  - do one final package review before declaring packaging complete
  - confirm the package is coherent as a delivered project, not just a working repo snapshot
+ - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
  - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
  - if packaging reveals a real defect or missing artifact, fix it before closing the phase
  - do not close packaging until all required docs, session exports, self-test files, cleanup conditions, and final structure checks are satisfied
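The branching export sequence in the packaging hunk above can be sketched as a dry-run shell helper that only prints the commands it would run; the utility paths come from the diff itself, while the session ids, labels, and numbering shown are illustrative assumptions:

```shell
# Dry-run sketch of the per-session export pipeline: prints each command
# instead of executing it. Session ids, labels, and <N> values are examples.
export_commands() {
  backend="$1"; session_id="$2"; label="$3"; n="$4"
  if [ "$backend" = "claude" ]; then
    # claude backend exports directly to the numbered cleaned file
    echo "node ~/slopmachine/utils/export_ai_session.mjs --backend claude --cwd \"\$PWD\" --session-id $session_id --output ../session-$n.json"
  else
    # other backends export raw first, then strip to the numbered cleaned file
    echo "opencode export $session_id > ../session-export-$label.raw"
    echo "python3 ~/slopmachine/utils/strip_session_parent.py ../session-export-$label.raw --output ../session-$n.json"
  fi
  # every backend then converts the cleaned export into the lane-aware trajectory file
  echo "node ~/slopmachine/utils/convert_exported_ai_session.mjs --converter-script ~/slopmachine/utils/convert_ai_session.py --input ../session-$n.json --output ../sessions/$label.json"
}

export_commands claude sid-aaa develop-1 1
export_commands opencode sid-bbb bugfix-1 2
```

Because the helper only echoes commands, it is safe to run anywhere to sanity-check the numbering and naming before doing the real export.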
@@ -100,14 +100,20 @@ Use this skill after development begins whenever you are reviewing work, decidin

  - inspect the result and evidence, not just the developer claim
  - review technical quality, prompt alignment, architecture impact, and verification depth of the current work
+ - after planning is accepted, treat the accepted plan and its relevant section as the default slice baseline instead of restating the full slice contract in every owner prompt
+ - for ordinary slice work after planning, keep the owner prompt to one short paragraph plus a small checklist of slice-specific guardrails, review concerns, or deltas that are not already clear from the accepted plan
  - during normal implementation iteration, always prefer fast local language-native or framework-native verification for the changed area instead of the selected stack's broad gate path
  - require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
  - require the developer to report the exact verification commands that were run and the concrete results they produced
+ - when API tests are used as evidence, require them to hit real endpoints and expose simple useful response evidence such as status codes and message/body summaries
  - require local runtime proof when relevant by starting the app or service through the selected stack's local run path and exercising the changed behavior directly rather than jumping to the broad gate path
  - if the local toolchain is missing, require the developer to install or enable it first; do not jump to Docker, `./run_tests.sh`, Playwright, or another broad gate path during ordinary iteration just because local setup is inconvenient
  - do not accept hand-wavy claims that local verification is unavailable without a real setup attempt and clear explanation
  - do not ask the developer to run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary implementation slices
  - if the developer already ran the relevant targeted local test command and reported it clearly, do not rerun the same command on the owner side unless the evidence is weak, contradictory, flaky, high-risk, or needed to answer a new question
+ - for ordinary slice acceptance, default review scope to the changed files and the narrow supporting files named by the developer; expand only when a concrete inconsistency, missing dependency, or suspicious claim forces wider review
+ - for ordinary slice acceptance, prefer a narrow acceptance checklist over broad exploratory rereads
+ - require compact ordinary slice-close replies by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless a deeper mapping is explicitly needed
  - if verification is weak, missing, or failing, require fixes and reruns before acceptance
  - if documentation or repo hygiene drifts, secrets leak, contracts drift, or frontend integrity is compromised, require cleanup before acceptance
  - keep looping until the current work is genuinely acceptable
@@ -117,6 +123,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
  - a broad gate is an owner-run integrated verification boundary, not every ordinary phase change
  - a phase change alone does not automatically require a broad gate unless that phase exit explicitly calls for one
  - a broad gate normally means some combination of full clean runtime proof, `./run_tests.sh`, and platform-appropriate UI/E2E evidence when UI-bearing flows exist
+ - the evaluator-session cycles required inside `P7` are not part of the ordinary owner-run broad-gate budget; they are the formal final evaluation model for that phase
  - for Electron or other Linux-targetable desktop projects, the broad gate should use the Dockerized desktop build/test path plus headless UI/runtime verification rather than pretending web-style Docker runtime semantics apply
  - for Android projects, the broad gate should use the Dockerized Android build/test path without depending on an emulator
  - for iOS-targeted projects on Linux, the broad gate should rely on `./run_tests.sh` plus static/code review evidence and should not claim native iOS runtime proof unless a real macOS/Xcode checkpoint exists
@@ -152,9 +159,11 @@ Use evidence such as internal metadata files, structured Beads comments, verific
  - for web projects using the default Docker-first runtime model, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
  - module implementation requires targeted local verification only; browser E2E and other broad gate evidence belong to owner-run major checkpoints rather than ordinary slice acceptance
  - module implementation acceptance should challenge tenant isolation, path confinement, sanitized error behavior, prototype residue, integration seams, and cross-cutting consistency when those concerns are in scope
+ - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
  - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
  - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the planned 90 percent meaningful coverage target instead of accumulating test debt
  - integrated verification completion requires explicit full-system evidence before the phase can close
+ - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
  - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
  - mobile and desktop integrated verification must include the selected stack's platform-appropriate UI/E2E coverage for every major user flow when UI-bearing flows are material
  - for Electron or other Linux-targetable desktop projects, integrated verification should use the Dockerized desktop build/test path plus headless UI/runtime verification artifacts
@@ -25,6 +25,7 @@ This file is the repo-local engineering rulebook for `slopmachine` projects.
  - Do not rerun broad runtime/test commands on every small change.
  - During ordinary development slices, do not run Docker runtime commands, browser E2E, Playwright, full test suites, or `./run_tests.sh`.
  - Use targeted local tests during ordinary development slices and leave browser E2E plus broad-gate commands for later comprehensive verification.
+ - When API tests are material, make them hit real endpoints and print simple useful response evidence such as status codes and message/body summaries.
  - For web projects, default the runtime contract to `docker compose up --build` unless the prompt or existing repository clearly dictates another model.
  - When `docker compose up --build` is not the runtime contract, provide `./run_app.sh` as the single primary runtime wrapper.
  - If the project has database dependencies, keep `./init_db.sh` as the only project-standard database initialization path.
@@ -33,6 +34,7 @@ This file is the repo-local engineering rulebook for `slopmachine` projects.

  - Keep `README.md` accurate.
  - The README must explain what the project is, what it does, how to run it, how to test it, the main repo contents, and any important information a new developer needs immediately.
+ - The README must also explain the delivered architecture and major implementation structure clearly enough for review and handoff.
  - The README must clearly document whether the primary runtime command is `docker compose up --build` or `./run_app.sh`.
  - The README must clearly document `./run_tests.sh` as the broad test command.
  - The README must stand on its own for basic codebase use.
@@ -1,6 +1,6 @@
  #!/usr/bin/env node

- import { parseArgs, readPrompt, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry } from './claude_worker_common.mjs'
+ import { parseArgs, readPrompt, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'

  const argv = parseArgs(process.argv.slice(2))

@@ -11,18 +11,32 @@ try {
      cwd: argv.cwd,
      rawOutputPath: argv['raw-output'],
      rawErrorPath: argv['raw-error'],
+     statePath: argv['state-file'],
      args: buildCreateArgs(argv.agent || 'developer', prompt),
      retryOnLimit: argv['retry-on-limit'] !== '0',
+     maxAttempts: Number.parseInt(argv['max-attempts'] || '2', 10),
    })

    if (failure || !parsed || parsed.is_error === true) {
+     await writeJsonIfNeeded(argv['result-file'], {
+       ok: false,
+       code: failure?.code || 'claude_create_failed',
+       msg: failure?.msg || 'claude_create_failed',
+       sid: failure?.sid || null,
+     })
      emitFailure(failure?.code || 'claude_create_failed', failure?.msg || 'claude_create_failed', failure?.sid ? { sid: failure.sid } : {})
      process.exit(1)
    }

    const compact = compactClaudeResult(parsed)
+   await writeJsonIfNeeded(argv['result-file'], { ok: true, sid: compact.sid, res: compact.res })
    emitSuccess(compact.sid, compact.res)
  } catch (error) {
+   await writeJsonIfNeeded(argv['result-file'], {
+     ok: false,
+     code: 'claude_create_exception',
+     msg: error instanceof Error ? error.message : String(error),
+   })
    emitFailure('claude_create_exception', error instanceof Error ? error.message : String(error))
    process.exit(1)
  }
@@ -1,6 +1,6 @@
  #!/usr/bin/env node

- import { parseArgs, readPrompt, buildResumeArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry } from './claude_worker_common.mjs'
+ import { parseArgs, readPrompt, buildResumeArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'

  const argv = parseArgs(process.argv.slice(2))

@@ -11,18 +11,32 @@ try {
      cwd: argv.cwd,
      rawOutputPath: argv['raw-output'],
      rawErrorPath: argv['raw-error'],
+     statePath: argv['state-file'],
      args: buildResumeArgs(argv.agent || 'developer', argv['session-id'], prompt),
      retryOnLimit: argv['retry-on-limit'] !== '0',
+     maxAttempts: Number.parseInt(argv['max-attempts'] || '2', 10),
    })

    if (failure || !parsed || parsed.is_error === true) {
+     await writeJsonIfNeeded(argv['result-file'], {
+       ok: false,
+       code: failure?.code || 'claude_resume_failed',
+       msg: failure?.msg || 'claude_resume_failed',
+       sid: failure?.sid || null,
+     })
      emitFailure(failure?.code || 'claude_resume_failed', failure?.msg || 'claude_resume_failed', failure?.sid ? { sid: failure.sid } : {})
      process.exit(1)
    }

    const compact = compactClaudeResult(parsed)
+   await writeJsonIfNeeded(argv['result-file'], { ok: true, sid: compact.sid, res: compact.res })
    emitSuccess(compact.sid, compact.res)
  } catch (error) {
+   await writeJsonIfNeeded(argv['result-file'], {
+     ok: false,
+     code: 'claude_resume_exception',
+     msg: error instanceof Error ? error.message : String(error),
+   })
    emitFailure('claude_resume_exception', error instanceof Error ? error.message : String(error))
    process.exit(1)
  }
@@ -1,6 +1,7 @@
  #!/usr/bin/env node

  import fs from 'node:fs/promises'
+ import { createWriteStream } from 'node:fs'
  import os from 'node:os'
  import path from 'node:path'
  import { spawn } from 'node:child_process'
@@ -33,6 +34,11 @@ export async function writeFileIfNeeded(filePath, content) {
    await fs.writeFile(filePath, content, 'utf8')
  }

+ export async function writeJsonIfNeeded(filePath, value) {
+   if (!filePath) return
+   await writeFileIfNeeded(filePath, `${JSON.stringify(value, null, 2)}\n`)
+ }
+
  export async function readPrompt(promptFile) {
    const content = await fs.readFile(promptFile, 'utf8')
    return content.trim()
@@ -72,10 +78,63 @@ function buildBaseArgs(agentName) {
    ]
  }

- export async function runClaude({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath }) {
-   const fullArgs = [...args]
+ function buildStateWriter(statePath, baseState = {}) {
+   let state = { ...baseState }
+   let chain = Promise.resolve()
+
+   function queueWrite() {
+     if (!statePath) {
+       return chain
+     }
+
+     const snapshot = { ...state, updated_at: new Date().toISOString() }
+     chain = chain
+       .catch(() => {})
+       .then(() => writeJsonIfNeeded(statePath, snapshot))
+     return chain
+   }
+
+   return {
+     update(patch) {
+       state = { ...state, ...patch }
+       return queueWrite()
+     },
+     async flush() {
+       await queueWrite()
+       await chain.catch(() => {})
+     },
+   }
+ }
+
+ function buildInitialState({ args, cwd, attempt }) {
+   return {
+     status: 'starting',
+     started_at: new Date().toISOString(),
+     cwd,
+     args,
+     attempt,
+     pid: null,
+     exit_code: null,
+     stdout_bytes: 0,
+     stderr_bytes: 0,
+     last_stdout_at: null,
+     last_stderr_at: null,
+   }
+ }
+
+ function isRetryableTransportFailure(message) {
+   return /network|econnreset|timed? out|timeout|temporar|socket|transport|unavailable|rate limit/i.test(String(message || ''))
+ }
+
+ export async function runClaude({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath, statePath, attempt = 1 }) {
+   const stateWriter = buildStateWriter(statePath, buildInitialState({ args, cwd, attempt }))
+   await stateWriter.flush()
+
+   const stdoutWriter = rawOutputPath ? createWriteStream(rawOutputPath, { flags: 'w' }) : null
+   const stderrWriter = rawErrorPath ? createWriteStream(rawErrorPath, { flags: 'w' }) : null
+
    const result = await new Promise((resolve, reject) => {
-     const child = spawn(claudeCommand, fullArgs, {
+     const child = spawn(claudeCommand, [...args], {
        cwd,
        env: process.env,
        stdio: ['ignore', 'pipe', 'pipe'],
@@ -84,20 +143,42 @@ export async function runClaude({ claudeCommand, args, cwd, rawOutputPath, rawEr
     let stdout = ''
     let stderr = ''
 
+    void stateWriter.update({ status: 'running', pid: child.pid ?? null })
+
     child.stdout.on('data', (chunk) => {
-      stdout += chunk.toString()
+      const text = chunk.toString()
+      stdout += text
+      stdoutWriter?.write(text)
+      void stateWriter.update({
+        stdout_bytes: Buffer.byteLength(stdout, 'utf8'),
+        last_stdout_at: new Date().toISOString(),
+      })
     })
 
     child.stderr.on('data', (chunk) => {
-      stderr += chunk.toString()
+      const text = chunk.toString()
+      stderr += text
+      stderrWriter?.write(text)
+      void stateWriter.update({
+        stderr_bytes: Buffer.byteLength(stderr, 'utf8'),
+        last_stderr_at: new Date().toISOString(),
+      })
     })
 
     child.on('error', reject)
-    child.on('close', (code) => resolve({ code: code ?? 1, stdout, stderr }))
+    child.on('close', async (code) => {
+      stdoutWriter?.end()
+      stderrWriter?.end()
+      await stateWriter.update({
+        status: (code ?? 1) === 0 ? 'completed' : 'failed',
+        finished_at: new Date().toISOString(),
+        exit_code: code ?? 1,
+      })
+      resolve({ code: code ?? 1, stdout, stderr })
+    })
   })
 
-  await writeFileIfNeeded(rawOutputPath, result.stdout)
-  await writeFileIfNeeded(rawErrorPath, result.stderr)
+  await stateWriter.flush()
   return result
 }
 
@@ -148,6 +229,15 @@ export function classifyClaudeFailure(parsed, fallbackMessage = '') {
     }
   }
 
+  if (isRetryableTransportFailure(rawMessage)) {
+    return {
+      code: 'claude_transport_failed',
+      msg: rawMessage || 'claude_transport_failed',
+      retryable: true,
+      sid: sessionId,
+    }
+  }
+
   return {
     code: 'claude_call_failed',
     msg: rawMessage || 'claude_call_failed',
@@ -189,37 +279,38 @@ export function msUntilNextQuotaReset(now = new Date(), { hour = DEFAULT_LIMIT_H
   return waitMs + 60 * 1000
 }
 
-export async function runClaudeWithRetry({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath, retryOnLimit = true }) {
-  const result = await runClaude({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath })
-  let parsed = null
-  try {
-    parsed = parseClaudeJson(result.stdout)
-  } catch {}
+export async function runClaudeWithRetry({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath, statePath, retryOnLimit = true, maxAttempts = 2 }) {
+  let attempt = 1
 
-  const hasClaudeError = parsed && (parsed.is_error === true || result.code !== 0)
-  if (!hasClaudeError) {
-    return { result, parsed }
-  }
+  while (attempt <= maxAttempts) {
+    const result = await runClaude({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath, statePath, attempt })
+    let parsed = null
+    try {
+      parsed = parseClaudeJson(result.stdout)
+    } catch {}
 
-  const failure = classifyClaudeFailure(parsed, (result.stderr || result.stdout).trim())
-  if (!retryOnLimit || !failure.retryable) {
-    return { result, parsed, failure }
-  }
+    if (parsed && parsed.is_error !== true && result.code === 0) {
+      return { result, parsed, attempts: attempt }
+    }
 
-  const waitMs = msUntilNextQuotaReset()
-  await sleep(waitMs)
-  const retried = await runClaude({ claudeCommand, args, cwd, rawOutputPath, rawErrorPath })
-  let retriedParsed = null
-  try {
-    retriedParsed = parseClaudeJson(retried.stdout)
-  } catch {}
-  if (retriedParsed && retriedParsed.is_error !== true && retried.code === 0) {
-    return { result: retried, parsed: retriedParsed, retried: true }
+    const failure = classifyClaudeFailure(parsed, (result.stderr || result.stdout).trim())
+    const canRetry = attempt < maxAttempts && (failure.retryable || (!parsed && result.code !== 0))
+    if (!canRetry) {
+      return { result, parsed, failure, attempts: attempt }
+    }
+
+    if (failure.code === 'claude_usage_limit' && retryOnLimit) {
+      const waitMs = msUntilNextQuotaReset()
+      await sleep(waitMs)
+    }
+
+    attempt += 1
   }
+
   return {
-    result: retried,
-    parsed: retriedParsed,
-    failure: classifyClaudeFailure(retriedParsed, (retried.stderr || retried.stdout).trim()),
-    retried: true,
+    result: { code: 1, stdout: '', stderr: 'claude_retry_exhausted' },
+    parsed: null,
+    failure: { code: 'claude_retry_exhausted', msg: 'claude_retry_exhausted', retryable: false, sid: null },
+    attempts: maxAttempts,
   }
 }
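The transport-failure heuristic added in this file can be illustrated in isolation. In this sketch, only the regex inside `isRetryableTransportFailure` comes from the diff above; the sample messages are hypothetical inputs chosen to show which branch of `classifyClaudeFailure` they would reach.

```javascript
// Sketch of the retryable-transport heuristic from the diff above.
// Messages that look like network/timeout/rate-limit conditions are treated
// as retryable; everything else falls through to the generic, non-retryable
// claude_call_failed classification.
function isRetryableTransportFailure(message) {
  return /network|econnreset|timed? out|timeout|temporar|socket|transport|unavailable|rate limit/i.test(String(message || ''))
}

console.log(isRetryableTransportFailure('ECONNRESET while streaming')) // → true
console.log(isRetryableTransportFailure('request timed out'))          // → true
console.log(isRetryableTransportFailure('invalid api key'))            // → false
console.log(isRetryableTransportFailure(null))                         // → false
```

Note that `String(message || '')` makes the check safe for `null`/`undefined` inputs, which matters because the caller passes whatever raw message the failed invocation produced.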
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "theslopmachine",
-  "version": "0.6.0",
+  "version": "0.6.1",
   "description": "SlopMachine installer and project bootstrap CLI",
   "license": "MIT",
   "type": "module",
package/src/install.js CHANGED
@@ -13,7 +13,12 @@ import {
   REQUIRED_SKILL_DIRS,
   REQUIRED_SLOPMACHINE_FILES,
 } from './constants.js'
-import { ensureUploadEndpoint, getSlopmachineConfigPath, hasStoredUploadToken, promptAndStoreUploadToken } from './config.js'
+import {
+  ensureUploadEndpoint,
+  getSlopmachineConfigPath,
+  hasStoredUploadToken,
+  promptAndStoreUploadToken,
+} from './config.js'
 import {
   backupFile,
   commandExists,
package/src/send-data.js CHANGED
@@ -140,8 +140,10 @@ function getTrackedDeveloperSessions(metadata) {
   return sessions
     .filter((entry) => entry && typeof entry.session_id === 'string' && entry.session_id.trim())
     .map((entry, index) => ({
+      order: index,
       sequence: Number.isFinite(Number(entry.sequence)) ? Number(entry.sequence) : index + 1,
       label: entry.label || `${entry.lane || 'develop'}-${index + 1}`,
+      backend: typeof entry.backend === 'string' && entry.backend.trim() ? entry.backend.trim() : 'opencode',
       sessionId: entry.session_id.trim(),
     }))
 }
@@ -316,8 +318,29 @@ async function generateBeadsExport(workspaceRoot, projectId, runId, stagingDir)
   return exportPath
 }
 
+async function listFilesRecursive(rootDir, relativePrefix = '') {
+  const entries = await fs.readdir(rootDir, { withFileTypes: true })
+  const files = []
+
+  for (const entry of entries) {
+    const relativePath = path.join(relativePrefix, entry.name)
+    const absolutePath = path.join(rootDir, entry.name)
+
+    if (entry.isDirectory()) {
+      files.push(...await listFilesRecursive(absolutePath, relativePath))
+      continue
+    }
+
+    if (entry.isFile()) {
+      files.push(relativePath)
+    }
+  }
+
+  return files
+}
+
 async function buildManifest(stagingDir, projectId, runId, label, workspaceRoot) {
-  const fileNames = (await fs.readdir(stagingDir))
+  const fileNames = (await listFilesRecursive(stagingDir))
     .filter((name) => name !== 'manifest.json')
     .sort((left, right) => left.localeCompare(right))
 
@@ -362,6 +385,75 @@ async function buildManifest(stagingDir, projectId, runId, label, workspaceRoot)
   return { manifest, manifestPath }
 }
 
+async function exportDeveloperSessionArtifacts(session, workspaceRoot, stagingDir) {
+  const utilsDir = path.join(buildPaths().slopmachineDir, 'utils')
+  const exportScript = path.join(utilsDir, 'export_ai_session.mjs')
+  const stripScript = path.join(utilsDir, 'strip_session_parent.py')
+  const convertScript = path.join(utilsDir, 'convert_exported_ai_session.mjs')
+  const converterPythonScript = path.join(utilsDir, 'convert_ai_session.py')
+
+  const exportNumber = session.order + 1
+  const rootSessionFile = path.join(stagingDir, `session-${exportNumber}.json`)
+  const convertedSessionFile = path.join(stagingDir, 'sessions', `${session.label}.json`)
+  const rawSessionFile = path.join(stagingDir, `.session-export-${session.label}.raw`)
+
+  await ensureDir(path.dirname(convertedSessionFile))
+
+  if (session.backend === 'claude') {
+    const exportResult = await runCommand(process.execPath, [
+      exportScript,
+      '--backend',
+      'claude',
+      '--cwd',
+      workspaceRoot,
+      '--session-id',
+      session.sessionId,
+      '--output',
+      rootSessionFile,
+    ])
+
+    if (exportResult.code !== 0) {
+      throw new Error(`Failed to export session ${session.label}: ${(exportResult.stderr || exportResult.stdout).trim()}`)
+    }
+  } else {
+    const opencodeCommand = await resolveCommand('opencode')
+    if (!opencodeCommand) {
+      throw new Error('Unable to find `opencode` for developer session export')
+    }
+
+    await exportOpenCodeSession(opencodeCommand, session.sessionId, rawSessionFile, workspaceRoot)
+
+    const stripResult = await runCommand('python3', [
+      stripScript,
+      rawSessionFile,
+      '--output',
+      rootSessionFile,
+    ])
+
+    if (stripResult.code !== 0) {
+      throw new Error(`Failed to clean exported session ${session.label}: ${(stripResult.stderr || stripResult.stdout).trim()}`)
+    }
+  }
+
+  const convertResult = await runCommand(process.execPath, [
+    convertScript,
+    '--converter-script',
+    converterPythonScript,
+    '--input',
+    rootSessionFile,
+    '--output',
+    convertedSessionFile,
+  ])
+
+  if (convertResult.code !== 0) {
+    throw new Error(`Failed to convert exported session ${session.label}: ${(convertResult.stderr || convertResult.stdout).trim()}`)
+  }
+
+  if (await pathExists(rawSessionFile)) {
+    await fs.rm(rawSessionFile, { force: true })
+  }
+}
+
 async function createZipArchive(stagingDir, outputPath) {
   await ensureDir(path.dirname(outputPath))
 
@@ -458,13 +550,8 @@ async function stageSendDataBundle({ workspaceRoot, repoPath, ownerSessionId, de
 
   await exportOpenCodeSession(opencodeCommand, ownerSessionId, path.join(stagingDir, 'owner-session.json'), workspaceRoot)
 
-  for (const session of developerSessions.sort((left, right) => left.sequence - right.sequence)) {
-    await exportOpenCodeSession(
-      opencodeCommand,
-      session.sessionId,
-      path.join(stagingDir, `${session.label}.json`),
-      workspaceRoot,
-    )
+  for (const session of developerSessions.sort((left, right) => left.order - right.order)) {
+    await exportDeveloperSessionArtifacts(session, workspaceRoot, stagingDir)
   }
 
   await copyOptionalPath(