npm - theslopmachine - Versions diffs - 0.6.2 → 0.7.0 - Mend

theslopmachine 0.6.2 → 0.7.0

Files changed (76)

package/assets/skills/claude-worker-management/SKILL.md CHANGED

@@ -1,91 +1,236 @@
 ---
 name: claude-worker-management
-description: Launch, resume, and persist the Claude CLI developer worker session used by slopmachine-claude.
+description: Launch, persist, and message the live Claude developer lane used by slopmachine-claude.
 ---
 # Claude Worker Management
-Use this skill whenever `slopmachine-claude` needs to create, resume, or message the persistent Claude developer worker.
+Use this skill whenever `slopmachine-claude` needs to launch, inspect, or message the persistent live Claude developer lane.
 ## Purpose
 - keep the Claude developer worker as a large complete conversation per bounded developer slot
 - avoid losing worker context by accidentally creating fresh sessions for ordinary follow-up turns
 - make session persistence and response capture deterministic
+- make OpenCode talk to a live Claude TUI through a bridge instead of non-interactive resume calls
 ## Core rules
 - the Claude worker must be invoked by the installed Claude agent name `developer`
 - do not use the OpenCode `developer` subagent for implementation work in the `slopmachine-claude` path
 - do not read Claude transcript files as the normal communication channel
-- communicate with the Claude worker through the packaged wrapper scripts in `~/slopmachine/utils/`
-- treat raw Claude stdout and stderr as trace artifacts written to files, not as owner-session context
-- treat the wrapper `result-file` as the semantic source of truth in normal owner flow
-- treat terminal stdout from the wrapper as only a tiny pointer or status channel
-- always capture the session id and normalized result from the `result-file`
-- always re-pass `--agent developer` on every call, even when resuming an existing session
-- always constrain Claude to a single-session developer lane by limiting tools to `Read Write Edit Bash Glob Grep`
-- do not allow Claude internal agent fan-out in the normal developer path
-- use `--dangerously-skip-permissions` in the wrapper path so the worker does not stall on routine file-edit permission prompts inside the bounded repo
-## Session creation rule
+- communicate with the Claude worker through the packaged live bridge scripts in `~/slopmachine/utils/`
+- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each owner message into that lane
+- set the Claude live runtime settings default `agent` to `developer` so the lane stays on the intended system prompt even if the session is resumed or inspected through Claude-native controls
+- treat bridge `state.json` as the durable control-plane truth for lane status, routing, and Claude session identity
+- treat bridge `result.json` as the semantic source of truth after each completed turn
+- treat terminal stdout from bridge scripts as only a tiny pointer or status channel
+- always capture the session id from the launched bridge state and the normalized turn result from bridge `result.json`
+- always constrain Claude to a single-session developer lane even when it uses internal Claude task fan-out
+- allow Claude internal task fan-out inside that one continuous live session when it reduces serial churn cleanly
+- encourage Claude to parallelize independent search, reading, verification, and bounded implementation subtasks through internal task fan-out when that reduces serial churn cleanly
+- launch the live lane with `--dangerously-skip-permissions` so the worker does not stall on routine file-edit permission prompts inside the bounded repo
+- when Claude uses internal task fan-out and the environment allows explicit agent selection, prefer the installed `developer` agent for implementation-capable branches so the same engineering standard applies across those branches
+- there is no repo-controlled guarantee that every Claude helper subagent globally reuses the `developer` prompt, so keep critical implementation in the main developer lane or in explicitly developer-scoped helper branches rather than relying on unspecified built-in helper behavior
+- make every owner-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
+- do not send vague owner prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
+- each substantive owner message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
+- default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns
+## Lane launch rule
 For a new bounded developer session slot:
-1. run Claude in print mode with the installed `developer` agent
-2. capture the returned `session_id`
+1. launch one live Claude TUI lane inside `tmux`
+2. wait for bridge registration to capture the Claude `session_id`
 3. store it in `../.ai/metadata.json`
 4. mirror it in tracker comments using `SESSION:`
-5. keep using that same session for all later turns in the same bounded slot
+5. keep using that same live lane for all later turns in the same bounded slot
-Preferred creation pattern:
+Preferred launch pattern:
 ```bash
-node ~/slopmachine/utils/claude_create_session.mjs --cwd "$PWD" --prompt-file <file> --raw-output <file> --raw-error <file> --state-file <file> --result-file <file>
+node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --runtime-dir <dir>
 ```
+The launch implementation must pass Claude `--dangerously-skip-permissions` in the live TUI command path.
 When the owner invokes this through the OpenCode Bash tool, use a long-running timeout suitable for real developer work.
 Default:
-- Claude create and resume worker turns should use a Bash timeout of at least `3600000` ms (1 hour)
-- do not use ordinary short Bash timeouts for Claude worker turns
+- Claude launch and turn bridge operations should not use ordinary short Bash timeouts
+- when automatic rate-limit waiting is enabled, prefer no outer timeout at all for live Claude worker turns; if the host wrapper forces a timeout value, it must exceed the possible reset wait plus buffer rather than using a generic 1 hour cap
 Do not pre-generate a UUID unless there is a strong reason to do so.
-The default pattern is to let Claude create the session and then persist the returned `session_id`.
+The default pattern is to let the live lane start normally and then persist the `session_id` captured by bridge registration.
-## Resume rule
+## Turn rule
 For all later turns in the same bounded developer slot:
 ```bash
-node ~/slopmachine/utils/claude_resume_session.mjs --cwd "$PWD" --session-id <session_id> --prompt-file <file> --raw-output <file> --raw-error <file> --state-file <file> --result-file <file>
+node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --prompt-file <file> --timeout-ms <turn-timeout>
 ```
-- use `--resume` inside the wrapper implementation, not `-r`
-- when calling the resume wrapper from the owner session, treat it as a long-running operation and keep the Bash timeout at or above `3600000` ms
-- do not reuse `--session-id` after creation
-- if resume fails, stop and recover explicitly instead of silently creating a new worker
+- inject exactly one owner message at a time into the idle live lane
+- wait for `Stop` or `StopFailure` before sending the next message
+- do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
+- if turn execution fails, stop and recover explicitly instead of silently creating a new worker
+## Turn-preflight checklist
+Before sending any owner message into the live lane:
+1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
+2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
+3. decide the prompt kind explicitly, such as `planning-start`, `planning-revision`, `scaffold-start`, `scaffold-review`, `development-slice`, `development-correction`, `bugfix-orientation`, `bugfix-fix`, `resume`, or `recovery`
+4. gather only the minimum accepted-plan sections, clarified requirements, boundary summary, and fresh deltas needed for this turn
+5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
+If the stop boundary is fuzzy, the turn is too broad.
+If the owner prompt would span multiple major boundaries, split it.
+Do not send the next turn until the prior turn has been reviewed and either accepted, corrected, or explicitly rerouted.
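The preflight conditions added in this hunk could be checked mechanically before a turn is sent. The following is a minimal sketch, not the package's implementation: `preflightCheck` is a hypothetical helper, and the `lane` and `status` field names are assumptions about the shape of bridge `state.json`, which the diff does not spell out.

```javascript
// Hypothetical helper: decides whether the owner may send the next turn.
// Assumes state.json parses to an object with `lane` and `status` fields.
function preflightCheck(state, expectedLane, priorTurnReviewed) {
  const problems = [];
  if (state === null || typeof state !== "object") {
    problems.push("state.json is missing or unreadable");
  } else {
    if (state.lane !== expectedLane) {
      problems.push(`lane mismatch: expected ${expectedLane}, got ${state.lane}`);
    }
    if (state.status !== "idle") {
      problems.push(`lane is not idle (status: ${state.status})`);
    }
  }
  if (!priorTurnReviewed) {
    problems.push("prior turn result has not been reviewed");
  }
  // The turn may be sent only when no problem was recorded.
  return { ok: problems.length === 0, problems };
}

// Usage: a running lane must not receive a new owner message.
const check = preflightCheck({ lane: "develop", status: "running" }, "develop", true);
console.log(check.ok, check.problems);
```

Returning the list of problems rather than a bare boolean keeps the refusal reason available for metadata or tracker comments.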
+## Canonical owner-message contract
+For substantive live-lane turns, write the owner message in natural engineering language but make sure it includes all of these ingredients:
+- `Context snapshot`: the current accepted state and only the fresh deltas that matter now
+- `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
+- `This turn only`: the bounded deliverable for this turn and whether this is planning-only, scaffold-only, coding allowed, or correction-only
+- `Expected outcomes now`: the exact behaviors, artifacts, or fixes that must exist before this turn can be considered successful
+- `Evidence required now`: the exact verification, file updates, or summaries Claude must return for owner review
+- `Disallowed shortcuts now`: future-work deferrals, placeholder implementations, bypassed auth/validation, fake verification, mixed-boundary drift, or other shortcuts that would make the result misleading
+- `Stop boundary`: what Claude should stop after producing, and what it must not start yet
+- `Reply contract`: request the exact changed files, exact verification commands and results, and only the real remaining risks or blockers
+When the turn intentionally uses internal parallel fan-out, also include:
+- `Branch map`: the 2 or 3 independent branches, their boundaries, and their expected outputs
+- `Shared constraints`: the contracts or files that must stay aligned across branches
+- `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
+Keep the wording natural. Do not turn every prompt into a rigid template dump.
+But do make the contract mechanically obvious enough that Claude cannot plausibly misunderstand what acceptance depends on.
+## Canonical prompt shapes
+### Planning-start shape
+For the second owner message in the first `develop` lane and for other explicit planning-entry turns:
+- inline the approved clarification content and requirements-ambiguity resolutions directly in the message
+- include the owner's initial planning view so Claude refines a direction instead of inventing one from zero
+- restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
+- say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
+- require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+- require a concise changed-files summary with the planning response
+### Planning-revision shape
+When a planning draft is not good enough:
+- point to the exact plan sections or requirement areas that are weak or incomplete
+- state the exact missing detail or unacceptable vagueness that must be corrected now
+- keep the turn planning-only; do not let the worker start coding as a compensation move
+- require the revised planning artifacts plus a short summary of what changed and what is still explicitly unresolved
+### Scaffold-start shape
+When entering scaffold work:
+- cite the relevant accepted design sections and the intended baseline runtime/test/config contract
+- state that the turn is scaffold-only and name the exact baseline surfaces expected now, such as app shell, routing skeleton, persistence skeleton, config wiring, logging path, validation path, auth foundation, test harness, or README baseline when they apply
+- state explicitly which feature work must not begin yet
+- require exact local verification evidence for the scaffold baseline and exact changed files
+- say to stop after the scaffold baseline is complete and verified
+### Development-slice shape
+For ordinary implementation turns:
+- anchor the request to the relevant accepted plan sections and current boundary summary
+- name the exact slice, user/admin actor path, modules, or surfaces to complete now
+- itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
+- require targeted local verification tied back to those expected outcomes
+- explicitly prohibit owner-only broad verification commands and unrelated follow-on work
+- when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
+- say to stop after this slice and report the exact changed files plus exact verification results
+### Development-correction shape
+When the worker partially missed the slice or crossed boundaries:
+- quote the exact missing outcome, regression risk, or evidence gap
+- ask for a correction-only turn focused on those gaps
+- require fresh verification evidence for the corrected surface
+- do not mix new feature asks into the correction turn
+### Resume shape
+When resuming a long-lived lane:
+- start from the stored boundary summary and the relevant accepted plan sections instead of replaying broad history
+- include only the new delta since the last accepted state
+- restate the current bounded task, evidence required, and stop boundary
+- do not re-dump the entire project or workflow unless continuity is genuinely broken
+### Bugfix issue-turn shape
+For evaluator-driven remediation inside a `bugfix-N` session opened by a `partial pass` audit:
+- lead with the concrete evaluator finding or owner-reviewed issue statement
+- state the expected fix and the affected non-regression surfaces
+- require proof for the issue path plus the nearby happy path and security/ownership boundary when relevant
+- say to stop after the named issue set rather than reopening unrelated refactors
+## Turn anti-patterns
+Do not do these:
+- send `continue`, `next`, or `keep going` as a substantive owner prompt
+- ask for planning and implementation in the same turn unless that mixed boundary is intentional and explicitly stated
+- ask for multiple gate exits in one turn
+- let Claude decide its own stopping point implicitly
+- pass parent-directory file paths as hidden instructions instead of restating the needed content directly
+- paste raw bridge state, raw transcript payloads, or workflow bookkeeping into normal developer prompts
+- respond to a weak result by broadening the next prompt instead of correcting the specific gap
+## Status rule
+When owner logic needs to inspect the lane without sending a new message:
+```bash
+node ~/slopmachine/utils/claude_live_status.mjs --runtime-dir <dir>
+```
+Use `state.json` plus `claude_live_status.mjs` to determine whether the lane is:
+- `idle`
+- `running`
+- `blocked`
+- `failed`
 ## Result capture rule
-The wrapper scripts should pipe the raw Claude JSON output to file, parse it after process exit, and persist a normalized `result-file` plus a live `state-file`.
+The live bridge should persist a normalized turn `result.json` plus a durable lane `state.json`.
-Use the `result-file` fields only:
+Use the turn result fields only:
 - `sid`
 - `res`
 Monitoring files should include at least:
-- a live `state-file` showing running/completed/failed state, pid, byte counts, timestamps, and exit code
-- a final `result-file` containing the normalized success or failure object
+- a live `state.json` showing lane status, Claude session id, tmux session id, transcript pointer, and current turn state
+- a final `result.json` containing the normalized success or failure object for the latest completed turn
+- `hook-events.jsonl` as the live outward event feed
 Treat `res` as the worker's answer.
-Do not feed raw Claude JSON into the owner session.
 Do not rely on transcript scraping for normal turn-to-turn orchestration.
-Do not rely on Bash stdout alone when the wrapper state or result files provide a clearer source of truth.
-Read `result-file` after process completion before deciding the next owner turn.
+Do not rely on Bash stdout alone when bridge state or result files provide a clearer source of truth.
+Read bridge `result.json` after turn completion before deciding the next owner turn.
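As an illustration of the result capture rule in its 0.7.0 form, the sketch below extracts only `sid` and `res` from the text of a bridge `result.json`. `parseTurnResult` is a hypothetical name, and treating both fields as strings is an assumption; the diff names the fields but not their types.

```javascript
// Hypothetical helper: pulls only `sid` and `res` out of result.json text.
// Assumes both fields are strings; any other shape is reported as an error.
function parseTurnResult(jsonText) {
  let parsed;
  try {
    parsed = JSON.parse(jsonText);
  } catch (err) {
    return { ok: false, error: `result.json is not valid JSON: ${err.message}` };
  }
  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, error: "result.json is not a JSON object" };
  }
  const { sid, res } = parsed;
  if (typeof sid !== "string" || sid.length === 0) {
    return { ok: false, error: "missing or empty `sid`" };
  }
  if (typeof res !== "string") {
    return { ok: false, error: "missing `res`" };
  }
  // Extra fields in the file are deliberately ignored.
  return { ok: true, sid, res };
}

// Usage with an illustrative payload (values are made up).
console.log(parseTurnResult('{"sid":"example-session","res":"Plan drafted.","extra":1}'));
```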
 ## Developer-slot continuity
@@ -95,7 +240,8 @@ The purpose of this backend is to preserve one large complete conversation per b
 - the `bugfix` slot should stay one continuous Claude session unless irrecoverable failure forces replacement
 - do not start a fresh Claude worker for every slice, clarification, or review loop
 - do not roll sessions casually just because the conversation is long
-- do not let the Claude worker create its own internal sub-agents for routine planning, scaffold, or implementation work
+- internal Claude task sub-agents are allowed inside the same developer session when they help parallelize independent bounded work cleanly
+- prefer task fan-out for parallel discovery, repo reading, comparison, or verification passes when those branches can be merged back without ambiguity
 ## First-session handshakes
@@ -103,12 +249,12 @@ The purpose of this backend is to preserve one large complete conversation per b
 When the first `develop` slot begins in planning:
-1. create the Claude developer session with:
-   - the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
-2. wait for the first response and store the returned Claude session id from wrapper field `sid`
-3. form an initial owner planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-4. resume the same session and send a compact second owner message that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial owner planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
-5. continue the planning conversation in that same Claude session
+1. launch the live `develop` lane if it is not already running
+2. send the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction through the bridge
+3. store the Claude session id from bridge `state.json`
+4. form an initial owner planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
+5. send a compact second owner message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial owner planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
+6. continue the planning conversation in that same Claude session
 Do not merge those two first messages.
 Do not ask for a plan in the first message.
@@ -118,8 +264,10 @@ Preferred second owner message shape:
 - inline the approved clarification content and the requirements-ambiguity resolutions directly in the owner message
 - include the owner's initial planning view so planning is refined collaboratively rather than invented from zero
 - add any short delta notes that are not already captured in that inlined summary
-- express the current boundary in plain engineering language and then ask for the implementation plan plus major risks or assumptions
+- express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
+- require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
 - ask for repo-local planning artifacts plus a concise changed-files summary
+- say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
 Do not tell the developer worker to read files outside `repo/`.
 If owner-side artifacts outside `repo/` matter, restate their content directly in the owner message instead of passing file paths.
@@ -127,13 +275,13 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 ### `bugfix-N` orientation handshake
-When `P7` begins and the workflow opens the remediation lane:
+When a fresh `partial pass` evaluation result opens the next remediation lane:
-1. create a fresh Claude developer session for the next `bugfix-N` label
+1. launch a fresh live Claude developer lane for the next `bugfix-N` label
 2. use the first owner message only to orient that session to the repo and the current delivered state
 3. make clear in plain engineering language that follow-up work will be focused remediation against evaluator findings
-4. wait for the first response and store the returned Claude session id from wrapper field `sid`
-5. only after that orientation exchange, resume the same `bugfix-N` session with the first evaluator-driven issue list
+4. wait for the first response and store the Claude session id from bridge `state.json`
+5. only after that orientation exchange, continue the same `bugfix-N` live lane with the first evaluator-driven issue list
 The orientation message should:
@@ -142,12 +290,24 @@ The orientation message should:
 - state that incoming work will be a sequence of concrete issue-fix requests against evaluator findings
 - avoid mentioning workflow internals, phase labels, or session-lane labels
+## Between-turn owner review rule
+After each meaningful Claude response and before the next owner turn:
+1. review the normalized bridge `result.json`
+2. decide whether the result was accepted, needs correction, or crossed a boundary that must be rolled back in the next prompt
+3. update metadata and boundary summary only after that review decision
+4. compose the next turn as a deliberate correction, continuation, or new bounded objective rather than a vague nudge
+If Claude starts coding during a planning-only turn, treat that as a boundary violation and correct it explicitly.
+If Claude continues into extra work beyond the requested stop boundary, do not silently accept the spillover; review the requested boundary first and then decide whether any spillover is acceptable.
 ## Metadata expectations
 The active developer session record should include at least:
 - `lane`
-- `backend: "claude"`
+- `backend: "claude-live"`
 - `session_id`
 - `label`
 - `status`
@@ -157,26 +317,89 @@ Recommended additional fields when useful:
 - `agent_name: "developer"`
 - `created_phase`
-- `trace_dir`
+- `runtime_dir`
+- `tmux_session`
+- `transcript_path`
+- `opened_from_audit_number`
 - `last_result_summary`
-- `last_resumed_at`
+- `last_turn_at`
+## Owner state-sync rule
+Bridge lane state is the authoritative transport state for Claude-backed developer work.
+After each meaningful bridge action, immediately read bridge `state.json` and mirror the important fields into `../.ai/metadata.json`, `../metadata.json`, and Beads comments before advancing workflow state.
+### After lane launch
+- read bridge `state.json`
+- set or confirm:
+  - `current_developer_lane`
+  - `active_developer_session_id`
+- create or update the active `developer_sessions[]` record with:
+  - `lane`
+  - `sequence`
+  - `label`
+  - `backend: "claude-live"`
+  - `agent_name: "developer"`
+  - `created_phase`
+  - `session_id`
+  - `status`
+  - `runtime_dir`
+  - `tmux_session`
+  - `transcript_path`
+  - `opened_from_audit_number` when the session was opened from a `partial pass` audit
+  - `orientation_completed: false`
+- mirror `session_id` into `../metadata.json` as `session_id`
+- record the session in Beads using `SESSION:`
+### After each successful turn
+- read bridge `state.json` and bridge `result.json`
+- update the active `developer_sessions[]` record with:
+  - `status: "idle"`
+  - `session_id`
+  - `transcript_path`
+  - `last_result_summary`
+  - `last_turn_at`
+- if the first orientation or first planning handshake completed, set `orientation_completed: true`
+- keep `active_developer_session_id` and `current_developer_lane` aligned with that same active session
+### After a blocked or failed turn
+- read bridge `state.json` and bridge `result.json`
+- preserve the same tracked Claude session id and runtime pointers
+- update the active `developer_sessions[]` record status to match the real workflow meaning, such as:
+  - `rate_limited` for bridge `blocked` / `claude_usage_limit`
+  - `failed` for bridge `failed`
+- update `last_result_summary` and `last_turn_at` when there is meaningful result text
+- update Beads comments so the pause or failure is auditable without reading bridge artifacts directly
+Do not advance the workflow based only on Bash success if bridge files and metadata are not yet aligned.
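The status mapping in the state-sync rule above can be written as a small pure function. This is a sketch under assumptions: `toSessionStatus` is a hypothetical name, and the handling of a `blocked` lane with any reason other than `claude_usage_limit` is not stated in the diff, so passing `blocked` through unchanged is a guess.

```javascript
// Hypothetical helper: maps a bridge lane status to the tracked session status.
// Only the `blocked`/`claude_usage_limit` and `failed` mappings come from the
// diff; the other branches are assumptions.
function toSessionStatus(bridgeStatus, blockReason) {
  switch (bridgeStatus) {
    case "idle":
      return "idle";
    case "running":
      return "running";
    case "blocked":
      // Usage-limit blocks are recorded as rate_limited.
      return blockReason === "claude_usage_limit" ? "rate_limited" : "blocked";
    case "failed":
      return "failed";
    default:
      throw new Error(`unknown bridge status: ${bridgeStatus}`);
  }
}

// Usage: a usage-limit block is tracked as rate_limited.
console.log(toSessionStatus("blocked", "claude_usage_limit"));
```

Throwing on an unknown status, rather than defaulting, keeps the metadata from silently drifting away from the bridge state.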
+## Owner-controlled lane rule
+- treat a bridge-managed Claude lane as owner-controlled during ordinary operation
+- do not manually type into the managed Claude TUI or send ad hoc prompts outside the bridge during the workflow
+- if manual recovery or debugging ever happens in that TUI, record it clearly and resync metadata from bridge state and hook evidence before continuing normal workflow
 ## Failure handling
-- if Claude CLI returns a parseable result with a session id, persist it immediately
-- if Claude CLI returns malformed output, treat that as a worker communication failure and stop to recover it cleanly
-- if the saved session id cannot be resumed, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
+- if bridge launch captures a Claude session id, persist it immediately
+- if the bridge reports `failed`, treat that as a worker communication failure and recover it cleanly
+- if the bridge reports `blocked` because of `claude_usage_limit`, treat that as an automatic wait-and-resume path rather than a handoff-stop condition unless the wait or resume path itself fails
+- if the saved live lane cannot continue, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
 - if a replacement session is required, record the handoff clearly in metadata and tracker comments
-- write raw stdout and stderr to trace files for debugging, but do not surface those raw files back into normal owner prompts unless debugging is explicitly needed
+- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal owner prompts unless debugging is explicitly needed
 ## Rate-limit handling
-- if Claude returns a usage-limit or capacity-exhaustion result for the active developer session, do not take over implementation work in the owner session
+- if the bridge returns `claude_usage_limit` or the live lane becomes capacity-blocked, do not take over implementation work in the owner session
 - mark the active developer session status as `rate_limited`
 - preserve the same Claude session id as the active tracked developer session
-- update `../.ai/metadata.json` and Beads `SESSION:` or `HANDOFF:` comments to record the rate-limit pause clearly
-- set workflow state to await user resume rather than creating owner-side implementation fallback work
-- when the user later resumes the run, continue from the same Claude developer session if it is resumable
+- use the packaged `~/slopmachine/utils/claude_wait_for_rate_limit_reset.sh` helper or the built-in turn retry path to wait until the reset time specified by Claude, then continue from the same live lane
+- update `../.ai/metadata.json` and Beads `SESSION:` or `HANDOFF:` comments to record the blocked state, wait window, and resumed continuity clearly
+- only surface the situation to the user if the reset time cannot be determined or the wait or resume path itself fails
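When the host wrapper forces an outer timeout, the launch section requires it to exceed the possible reset wait plus a buffer. A minimal arithmetic sketch follows; `outerTimeoutMs` and its five-minute default buffer are hypothetical, and it assumes the reset time is already available as epoch milliseconds.

```javascript
// Hypothetical helper: computes an outer timeout that covers the wait until
// the rate-limit reset, the turn itself, and a safety buffer.
// All arguments are in milliseconds; nowMs and resetAtMs are epoch times.
function outerTimeoutMs(nowMs, resetAtMs, turnTimeoutMs, bufferMs = 5 * 60 * 1000) {
  // A reset time in the past contributes no wait.
  const waitMs = Math.max(0, resetAtMs - nowMs);
  return waitMs + turnTimeoutMs + bufferMs;
}

// Usage: reset in 2 hours, 1 hour turn budget, 1 minute buffer.
console.log(outerTimeoutMs(0, 2 * 3600 * 1000, 3600 * 1000, 60 * 1000)); // 10860000
```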
 ## Worker prompt discipline