theslopmachine 0.7.6 → 0.7.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. package/MANUAL.md +7 -8
  2. package/README.md +7 -3
  3. package/assets/agents/developer.md +30 -11
  4. package/assets/agents/slopmachine-claude.md +32 -22
  5. package/assets/agents/slopmachine.md +40 -26
  6. package/assets/claude/agents/developer.md +28 -5
  7. package/assets/skills/claude-worker-management/SKILL.md +34 -23
  8. package/assets/skills/developer-session-lifecycle/SKILL.md +6 -4
  9. package/assets/skills/development-guidance/SKILL.md +35 -28
  10. package/assets/skills/evaluation-triage/SKILL.md +1 -1
  11. package/assets/skills/hardening-gate/SKILL.md +4 -94
  12. package/assets/skills/integrated-verification/SKILL.md +42 -41
  13. package/assets/skills/planning-gate/SKILL.md +32 -6
  14. package/assets/skills/planning-guidance/SKILL.md +37 -10
  15. package/assets/skills/scaffold-guidance/SKILL.md +19 -5
  16. package/assets/skills/submission-packaging/SKILL.md +3 -1
  17. package/assets/skills/verification-gates/SKILL.md +36 -32
  18. package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +1 -1
  19. package/assets/slopmachine/templates/AGENTS.md +25 -6
  20. package/assets/slopmachine/templates/CLAUDE.md +25 -6
  21. package/assets/slopmachine/templates/plan.md +49 -0
  22. package/assets/slopmachine/utils/claude_live_common.mjs +45 -0
  23. package/assets/slopmachine/utils/claude_live_turn.mjs +2 -0
  24. package/assets/slopmachine/utils/claude_wait_for_rate_limit_reset.mjs +9 -2
  25. package/assets/slopmachine/workflow-init.js +6 -7
  26. package/package.json +1 -1
  27. package/src/constants.js +1 -0
  28. package/src/init.js +41 -28
  29. package/assets/slopmachine/utils/__pycache__/normalize_claude_session.cpython-311.pyc +0 -0
package/assets/claude/agents/developer.md
@@ -11,12 +11,12 @@ You are a senior software engineer working inside a bounded execution session.
 
  Treat the current working directory as the project. Ignore files outside it unless explicitly asked to use them. Do not treat parent-directory workflow notes, session exports, or research folders as hidden implementation instructions.
 
- Read and follow `CLAUDE.md` before implementing.
+ Read and follow `CLAUDE.md` before implementing. If `plan.md` exists and has been populated, treat it as the definitive execution checklist.
 
  ## Core Standard
 
  - think before coding
- - build in coherent vertical slices
+ - build in coherent end-to-end workstreams
  - keep architecture intentional and reviewable
  - do real verification, not confidence theater
  - keep moving until the assigned work is materially complete or concretely blocked
@@ -56,15 +56,27 @@ Do not narrow scope for convenience.
  - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
  - make unresolved items rare, narrow, and explicit
  - when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
- - planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+ - planning-only deliverables inside the repo should normally stay minimal, but `plan.md` is the explicit allowed execution-plan artifact
+ - when planning is accepted, treat the execution file tree and file-ownership map in `plan.md` as real execution boundaries rather than decorative notes
+ - if the current work is scaffold, treat the accepted scaffold playbook contract in `plan.md` as binding; do not re-choose the playbook, starter, or bootstrap path unless the project lead explicitly reopens planning
+ - if scaffold instructions are still vague about the playbook or bootstrap command, raise that as a planning gap instead of improvising a new scaffold contract
+ - for adopted projects, inspect the current repo tree first and use the accepted `plan.md` delta tree rather than assuming a greenfield layout
  - when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
  - do not continue into extra follow-on work that the project lead did not ask for
  - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+ - keep repo-root `./run_tests.sh` as the primary broad test entrypoint; do not relocate it into subdirectories or replace it with a different primary script path
  - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
  - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
  - before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
+ - keep `README.md` and other shared integration-heavy files main-session-owned by default during parallel work unless the accepted plan explicitly delegates them
  - stay in this one developer session as the primary execution lane, but use internal Claude task sub-agents when they can parallelize independent search, reading, verification, or bounded implementation subtasks usefully
  - prefer internal Claude sub-agents when the work naturally decomposes into independent chunks that can be explored or verified in parallel and merged back cleanly
+ - when `plan.md` marks independent implementation sections as parallelizable, use internal Claude task fan-out to execute those bounded sections in parallel when that will materially reduce elapsed time
+ - keep `plan.md` main-session-owned during parallel work; branch tasks should report completion and let the main developer session update `plan.md` after merge
+ - when `plan.md` marks mutually exclusive file ownership, default to separate branches or worktrees for those sections when the environment supports it cleanly
+ - when worktree support is unavailable, still default to parallel fan-out with the same owned-file boundaries rather than falling back to serial work by habit
+ - keep shared files and final integration work in the main developer session unless the accepted plan explicitly delegates them
+ - after any parallel fan-out, reconcile the work in the main developer session, verify the integrated result yourself, and only then mark the relevant `plan.md` items complete
  - when explicit agent selection is available for internal task fan-out, prefer the installed `developer` agent for implementation-capable branches so helper work stays aligned with the same engineering standard
  - use built-in helper agents only for narrow read-only discovery, comparison, or planning assistance when they are the better fit than another `developer` branch
  - avoid pointless fan-out for trivial single-file or single-command work
@@ -72,12 +84,23 @@ Do not narrow scope for convenience.
  ## Parallel Execution Model
 
  - before deeper implementation, do a quick serial-versus-parallel check instead of defaulting to one long serial branch
- - when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, use internal Claude task fan-out instead of serializing by habit
+ - before broad fan-out, establish the small shared-file contract from `plan.md` in the main session so parallel branches start from the same stabilized shared files and interfaces
+ - when 2 or 3 independent work items can proceed with stable contracts and minimal shared-file churn, default to worktree-backed or branch-backed internal Claude task fan-out instead of serializing by habit
  - good parallel candidates include independent repo reading, verification passes, separate test additions, and implementation branches that touch different modules or well-separated files
  - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
- - before fan-out, define the branch contract clearly: expected outcome, boundaries, important shared constraints, and merge condition
+ - before fan-out, define the branch contract clearly: expected outcome, owned files, boundaries, important shared constraints, support check, and merge condition
+ - before fan-out, respect the owned-files map from the accepted plan and do not casually cross into another branch's files
  - after fan-in, reconcile the branches yourself, resolve any overlap cleanly, and run final targeted verification on the integrated result before reporting completion
  - prefer a small number of meaningful branches over spawning many tiny sub-tasks; 2 or 3 good parallel branches are usually enough
+ - use the main developer session as the final integration authority; helper branches may accelerate bounded sections, but coherence, correctness, and final merge discipline stay with the main session
+
+ ## Git Discipline
+
+ - keep the implementation git-backed as work progresses in both the main session and any parallel branches or worktrees
+ - after each feature-complete or otherwise meaningful completed workstream, stage and create a small descriptive progress commit before moving on
+ - when parallel branches or worktrees are used, each one should commit meaningful progress as it goes instead of leaving all history to the final merge
+ - after fan-in, create a main-session integration commit for the merged result once the integrated verification for that merge point passes
+ - do not commit broken work, secrets, local-only junk, or unrelated noise
 
  ## Verification Cadence
 
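The owned-files rules added in the hunks above (mutually exclusive file ownership per branch, no casual crossing into another branch's files) could in principle be checked mechanically before fan-out. The following is only a minimal sketch under assumptions: the function name, the shape of the ownership map, and the example paths are all hypothetical, and nothing in the package is known to implement this check.

```javascript
// Hypothetical sketch: none of these names come from the package.
// Input: an object mapping branch name -> array of file paths that
// the branch claims to own. Output: every file claimed by more than
// one branch, together with the branches that claim it.
function findOwnershipConflicts(ownedFiles) {
  const ownersByFile = new Map();
  for (const [branch, files] of Object.entries(ownedFiles)) {
    for (const file of files) {
      if (!ownersByFile.has(file)) ownersByFile.set(file, new Set());
      // A Set keeps a branch from conflicting with itself when it
      // lists the same file twice.
      ownersByFile.get(file).add(branch);
    }
  }
  const conflicts = [];
  for (const [file, branches] of ownersByFile) {
    if (branches.size > 1) conflicts.push({ file, branches: [...branches] });
  }
  return conflicts;
}

// Illustrative ownership map with one deliberately shared file.
const exampleOwnership = {
  "branch-api": ["src/api/routes.js", "src/api/handlers.js"],
  "branch-ui": ["src/ui/App.js", "src/api/routes.js"],
};
console.log(findOwnershipConflicts(exampleOwnership));
```

An empty result would suggest the branches can proceed in separate worktrees; any conflict is a sign that the file belongs in the main-session shared-file contract instead.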
package/assets/skills/claude-worker-management/SKILL.md
@@ -35,7 +35,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
  - make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
  - do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
  - each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
- - default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns
+ - default to one bounded engineering objective per owner turn, except for the intentional broad `plan.md` execution run after scaffold where the worker is expected to complete the whole accepted implementation checklist end to end
 
  ## Session-presence rule
 
@@ -50,7 +50,7 @@ Before any Claude-backed developer work continues:
  Choose the first-launch action by boundary:
 
  - `P2` planning entry with no Claude session yet -> launch `develop-1` and perform the planning handshake
- - `P3` through `P6` entry with no recoverable `develop-N` session yet -> launch the appropriate `develop-N` lane, orient it to the current repo state, then continue with the bounded scaffold, development, verification, or hardening turn
+ - `P3` through `P5` entry with no recoverable `develop-N` session yet -> launch the appropriate `develop-N` lane, orient it to the current repo state, then continue with the bounded scaffold, end-to-end development, or release-alignment turn
  - `P7` remediation routed to `develop-N` after a `fail` audit with no recoverable develop session yet -> launch the intended `develop-N` lane, orient it to the current delivered repo state and upcoming evaluator-driven remediation, then continue with the issue list
  - `P7` remediation routed to `bugfix-N` after a `partial pass` audit -> launch the fresh `bugfix-N` lane and use the bugfix orientation handshake below
 
@@ -74,7 +74,7 @@ node ~/slopmachine/utils/claude_live_launch.mjs --cwd "$PWD" --lane <lane> --run
 
  - choose the live-lane model at launch time; do not rely on an implicit Claude default when the owner can decide intentionally
  - default to `--model opus --effort medium` for ordinary planning, scaffold, development, and routine bugfix work
- - escalate to `--model opus --effort xhigh` for genuinely difficult planning, security-critical hardening, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
+ - escalate to `--model opus --effort xhigh` for genuinely difficult planning, security-critical fused `P5` review/fix work, architecturally tangled debugging, or repeated stubborn failures where the extra reasoning depth is justified
  - use `--model sonnet --effort medium` for documentation-heavy, lightweight, or otherwise materially simpler work where the lower-cost lane is sufficient
  - keep `--subagent-model sonnet` by default unless there is a concrete reason to raise helper-branch cost as well
  - pass an explicit `--effort <level>` at launch time instead of relying on the CLI default; `medium` is the normal baseline and `xhigh` is the difficult-task override
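The model and effort guidance in this hunk amounts to a small lookup from task difficulty to launch flags. A rough sketch follows, assuming three invented difficulty labels (`ordinary`, `difficult`, `lightweight`); only the flag names and values (`--model`, `--effort`, `--subagent-model`, `opus`, `sonnet`, `medium`, `xhigh`) appear in the diffed text, and the helper itself is hypothetical.

```javascript
// Hypothetical helper: the difficulty labels are invented for this
// sketch; only the flag names and values come from the guidance text.
const LAUNCH_PROFILES = new Map([
  ["ordinary", { model: "opus", effort: "medium" }],
  ["difficult", { model: "opus", effort: "xhigh" }],
  ["lightweight", { model: "sonnet", effort: "medium" }],
]);

function chooseLaunchFlags(difficulty) {
  const profile = LAUNCH_PROFILES.get(difficulty);
  if (!profile) throw new Error(`unknown difficulty: ${difficulty}`);
  // Always pass the effort explicitly instead of relying on a CLI
  // default, and keep the helper-branch model at sonnet by default.
  return [
    "--model", profile.model,
    "--effort", profile.effort,
    "--subagent-model", "sonnet",
  ];
}

console.log(chooseLaunchFlags("ordinary").join(" "));
```

Keeping the mapping in one table makes the escalation decision explicit and reviewable rather than buried in an ad hoc command line.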
@@ -113,7 +113,7 @@ Before sending any message into the live lane:
 
  1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
  2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
- 3. decide the prompt kind explicitly, such as `planning-start`, `planning-revision`, `scaffold-start`, `scaffold-review`, `development-slice`, `development-correction`, `bugfix-orientation`, `bugfix-fix`, `resume`, or `recovery`
+ 3. decide the prompt kind explicitly, such as `planning-start`, `planning-revision`, `scaffold-start`, `scaffold-review`, `development-run`, `development-correction`, `release-alignment-fix`, `bugfix-orientation`, `bugfix-fix`, `resume`, or `recovery`
  4. gather only the minimum accepted-plan sections, clarified requirements, boundary summary, and fresh deltas needed for this turn
  5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop
 
@@ -127,7 +127,7 @@ For substantive live-lane turns, write the message in natural engineering langua
 
  - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
  - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
- - `This turn only`: the bounded deliverable for this turn and whether this is planning-only, scaffold-only, coding allowed, or correction-only
+ - `This turn only`: the bounded deliverable for this turn and whether this is planning-only, scaffold-only, the broad `plan.md` execution run, release-alignment correction work, or issue-fix work
  - `Expected outcomes now`: the exact behaviors, artifacts, or fixes that must exist before this turn can be considered successful
  - `Evidence required now`: the exact verification, file updates, or summaries Claude must return for owner review
  - `Disallowed shortcuts now`: future-work deferrals, placeholder implementations, bypassed auth/validation, fake verification, mixed-boundary drift, or other shortcuts that would make the result misleading
@@ -137,6 +137,7 @@ For substantive live-lane turns, write the message in natural engineering langua
  When the turn intentionally uses internal parallel fan-out, also include:
 
  - `Branch map`: the 2 or 3 independent branches, their boundaries, and their expected outputs
+ - `Owned files`: the files or file families each branch owns exclusively
  - `Shared constraints`: the contracts or files that must stay aligned across branches
  - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping
 
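One way to read the turn-contract sections in the two hunks above is as a checklist that a draft message must satisfy before it is sent. The sketch below is illustrative only: the section labels are copied from the lines visible in this diff, the list may be incomplete because the hunks skip some lines of the original file, and the function and object shape are assumptions rather than anything the package provides.

```javascript
// Section labels visible in the diff; the skill file may define more
// that fall outside the shown hunks.
const CORE_SECTIONS = [
  "Context snapshot",
  "Contract anchor",
  "This turn only",
  "Expected outcomes now",
  "Evidence required now",
  "Disallowed shortcuts now",
];
const FAN_OUT_SECTIONS = [
  "Branch map",
  "Owned files",
  "Shared constraints",
  "Fan-in rule",
];

// Hypothetical check: `turn` maps section label -> drafted text.
// Returns the labels that are absent or blank.
function missingSections(turn, usesFanOut) {
  const required = usesFanOut
    ? [...CORE_SECTIONS, ...FAN_OUT_SECTIONS]
    : CORE_SECTIONS;
  return required.filter(
    (label) => typeof turn[label] !== "string" || turn[label].trim() === ""
  );
}

const draft = {
  "Context snapshot": "Scaffold accepted; plan.md populated.",
  "Contract anchor": "plan.md sections 1 through 4.",
  "This turn only": "The broad plan.md execution run.",
};
console.log(missingSections(draft, false));
```

The skill asks for natural engineering language, so a check like this would only confirm that each concern is covered, not dictate how the message is worded.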
@@ -155,7 +156,9 @@ For the second planning-direction message in the first `develop` lane and for ot
  - include the initial planning view so Claude refines a direction instead of inventing one from zero
  - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
  - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
- - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+ - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of the system architecture, modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
+ - require `plan.md` to carry the accepted scaffold playbook contract, the execution file tree, file ownership, pre-fan-out shared-file contract, and branch or worktree-ready execution checklist when appropriate
+ - require the accepted scaffold playbook contract to name the exact selected playbook or explicit fallback path, exact bootstrap command, required baseline surfaces, scaffold stop boundary, and scaffold acceptance evidence
  - require a concise changed-files summary with the planning response
 
  ### Planning-revision shape
@@ -171,27 +174,33 @@ When a planning draft is not good enough:
 
  When entering scaffold work:
 
+ - anchor the request to the accepted scaffold playbook contract in `plan.md` rather than asking Claude to infer the scaffold again from general stack intent
  - cite the relevant accepted design sections and the intended baseline runtime/test/config contract
+ - name the exact accepted playbook or fallback path and the exact bootstrap command that scaffold must follow now
  - state that the turn is scaffold-only and name the exact baseline surfaces expected now, such as app shell, routing skeleton, persistence skeleton, config wiring, logging path, validation path, auth foundation, test harness, or README baseline when they apply
+ - restate the accepted scaffold stop boundary and acceptance evidence directly from `plan.md`
  - state explicitly which feature work must not begin yet
  - require exact local verification evidence for the scaffold baseline and exact changed files
  - say to stop after the scaffold baseline is complete and verified
+ - if the accepted scaffold playbook contract is missing or still vague, do not start scaffold; correct planning first
 
- ### Development-slice shape
+ ### Development-run shape
 
- For ordinary implementation turns:
+ For the primary post-scaffold implementation run:
 
- - anchor the request to the relevant accepted plan sections and current boundary summary
- - name the exact slice, user/admin actor path, modules, or surfaces to complete now
- - itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
- - require targeted local verification tied back to those expected outcomes
- - explicitly prohibit broad verification commands that are reserved for later gate checks and unrelated follow-on work
- - when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
- - say to stop after this slice and report the exact changed files plus exact verification results
+ - anchor the request to the accepted design and to `plan.md` as the definitive execution checklist
+ - say explicitly that `plan.md` should be worked end to end and updated in place from the main lane as items are completed
+ - make clear that the worker should finish the whole application rather than stopping after one narrow workstream unless a true blocker prevents completion
+ - require targeted local verification during the run and an honest final verification summary when the full plan has been completed
+ - explicitly prohibit broad verification commands that are reserved for later owner-run gate checks
+ - allow internal parallel fan-out where `plan.md` items can be executed independently with stable boundaries
+ - first establish the shared-file contract in the main lane, then default to separate branches or worktrees for mutually exclusive file sets when support is available, and otherwise default to parallel fan-out with the same ownership boundaries
+ - keep `plan.md`, `README.md`, and other shared-file integration in the main lane unless the accepted plan explicitly delegates them
+ - say to stop only after the whole implementation plan is complete or a real blocker prevents further progress
 
  ### Development-correction shape
 
- When the worker partially missed the slice or crossed boundaries:
+ When the worker partially missed the current workstream or crossed boundaries:
 
  - quote the exact missing outcome, regression risk, or evidence gap
  - ask for a correction-only turn focused on those gaps
@@ -202,9 +211,9 @@ When the worker partially missed the slice or crossed boundaries:
 
  When resuming a long-lived lane:
 
- - start from the stored boundary summary and the relevant accepted plan sections instead of replaying broad history
+ - start from the stored boundary summary, the relevant accepted design sections, and the current state of `plan.md` instead of replaying broad history
  - include only the new delta since the last accepted state
- - restate the current bounded task, evidence required, and stop boundary
+ - restate the current work as continuing from where the worker stopped in `plan.md`
  - do not re-dump the entire project or workflow unless continuity is genuinely broken
 
  ### Bugfix issue-turn shape
@@ -225,6 +234,7 @@ Do not do these:
  - ask for multiple gate exits in one turn
  - let Claude decide its own stopping point implicitly
  - pass parent-directory file paths as hidden instructions instead of restating the needed content directly
+ - send a scaffold prompt that tells Claude to choose or rediscover the playbook at runtime instead of using the accepted `plan.md` scaffold contract
  - paste raw bridge state, raw transcript payloads, or workflow bookkeeping into normal developer prompts
  - respond to a weak result by broadening the next prompt instead of correcting the specific gap
 
@@ -269,7 +279,7 @@ The purpose of this backend is to preserve one large complete conversation per b
 
  - the `develop` slot should stay one continuous Claude session unless irrecoverable failure forces replacement
  - the `bugfix` slot should stay one continuous Claude session unless irrecoverable failure forces replacement
- - do not start a fresh Claude worker for every slice, clarification, or review loop
+ - do not start a fresh Claude worker for every workstream, clarification, or review loop
  - do not roll sessions casually just because the conversation is long
  - internal Claude task sub-agents are allowed inside the same developer session when they help parallelize independent bounded work cleanly
  - prefer task fan-out for parallel discovery, repo reading, comparison, or verification passes when those branches can be merged back without ambiguity
@@ -296,8 +306,8 @@ Preferred second planning-direction message shape:
  - include the initial planning view so planning is refined collaboratively rather than invented from zero
  - add any short delta notes that are not already captured in that inlined summary
  - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
- - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
- - ask for repo-local planning artifacts plus a concise changed-files summary
+ - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, and require a repo-local `plan.md` that turns the accepted design into an ordered execution checklist
+ - ask for the planning artifacts plus a concise changed-files summary
  - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete
 
  Do not tell the developer worker to read files outside `repo/`.
@@ -306,12 +316,12 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 
  ### Adopted or repaired `develop-N` orientation handshake
 
- When work enters scaffold, development, integrated verification, hardening, or `fail`-routed remediation without a recoverable `develop-N` Claude session yet:
+ When work enters scaffold, end-to-end development, the fused release-alignment phase, or `fail`-routed remediation without a recoverable `develop-N` Claude session yet:
 
  1. launch the live `develop-N` lane needed for that boundary
  2. use the first message only to orient that session to the current repo and delivered state
  3. make clear in plain engineering language that the codebase already exists and work is continuing from the current state rather than starting from zero
- 4. say what kind of bounded follow-up work will come next, such as scaffold completion, a development slice, verification corrections, hardening, or evaluator-driven remediation
+ 4. say what kind of follow-up work will come next, such as scaffold completion, the broad `plan.md` execution run, release-alignment corrections, or evaluator-driven remediation
  5. wait for the first response and store the Claude session id from bridge `state.json`
  6. only after that orientation exchange, continue the same live lane with the first bounded work request
 
@@ -447,6 +457,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
  - mark the active developer session status as `rate_limited`
  - preserve the same Claude session id as the active tracked developer session
  - use the packaged `~/slopmachine/utils/claude_wait_for_rate_limit_reset.sh` helper or the built-in turn retry path to wait until the reset time specified by Claude, then continue from the same live lane
+ - do not require manual tmux interaction for the standard rate-limit path; the helper and built-in retry path should dismiss the Claude rate-limit popup automatically before waiting
  - update `../.ai/metadata.json` and Beads `SESSION:` or `HANDOFF:` comments to record the blocked state, wait window, and resumed continuity clearly
  - only surface the situation to the user if the reset time cannot be determined or the wait or resume path itself fails
 
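The rate-limit path in this hunk waits until the reset time reported by Claude and escalates to the user only when that time cannot be determined. The core arithmetic might look like the sketch below. The function name and the ISO timestamp input are assumptions made for illustration; the packaged wait helper may obtain and represent the reset time quite differently.

```javascript
// Hypothetical sketch: assumes the reset time is available as a
// string that Date.parse understands (for example an ISO 8601
// timestamp). Returns the number of milliseconds to wait, or null
// when the reset time cannot be determined.
function msUntilReset(resetAt, now = new Date()) {
  const resetMs = Date.parse(resetAt);
  if (Number.isNaN(resetMs)) return null; // surface to the user
  // A reset time already in the past means no wait is needed.
  return Math.max(0, resetMs - now.getTime());
}

const now = new Date("2025-01-01T00:00:00Z");
console.log(msUntilReset("2025-01-01T00:10:00Z", now));
console.log(msUntilReset("not a timestamp", now));
```

Returning `null` instead of throwing keeps the "cannot determine the reset time" case distinct from a zero-length wait, which matches the guidance that only the former should reach the user.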
package/assets/skills/developer-session-lifecycle/SKILL.md
@@ -77,10 +77,11 @@ If bootstrap seeded a later `current_phase` from `requested_start_phase`, verify
  - `../metadata.json` exists with project-fact fields
  - `../.ai/startup-context.md` exists
  - seeded parent-root docs exist, including `../docs/questions.md`, `../docs/design.md`, `../docs/test-coverage.md`, and `../docs/api-spec.md`
+ - seeded repo `plan.md` exists as the execution-plan file
  - parent-root `../.tmp/` exists as the evaluation artifact directory
  - seeded repo `README.md` exists
  - seeded repo `.claude/settings.json` exists with the repo-local Claude default-agent configuration
- - root workflow Beads exist for `P1` through `P10`
+ - root workflow Beads exist for the redesigned root lifecycle, including `P1`, `P2`, `P3`, `P4`, `P5`, `P7`, `P8`, `P9`, and `P10`
  - developer-session tracking is initialized
  - the backend-appropriate repo-local developer rulebook file has been chosen or is ready to be chosen in `P1`
 
@@ -171,7 +172,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata w
 
  - keep exactly one active developer session at a time
  - record every developer session in `developer_sessions`
- - from `P2` through `P6`, default to one long-lived `develop-1` lane
+ - from `P2` through `P5`, default to one long-lived `develop-1` lane
  - default the launch model for that long-lived lane to `opus` with `medium` effort
  - raise that lane to `opus` with `xhigh` effort only when the work is genuinely difficult enough to justify it
  - when launching a documentation-heavy or otherwise materially simpler lane, prefer `sonnet` with `medium` effort
@@ -248,14 +249,15 @@ For live Claude lanes specifically:
 
  - at meaningful accepted boundaries inside a long developer lane, refresh `last_result_summary` with a compact current-state snapshot instead of relying on the full prior conversation history
  - the boundary summary should capture only the current accepted contract, the current major guardrails, the most relevant changed areas, and the real unresolved issues that still matter
- - prefer boundary summaries at least at: accepted planning, scaffold acceptance, development-complete, integrated-verification completion, hardening completion, and bugfix-lane entry
+ - prefer boundary summaries at least at: accepted planning, scaffold acceptance, development-complete, `P5` release-alignment completion, and bugfix-lane entry
  - when resuming a long-lived developer lane, use the boundary summary plus the relevant accepted plan section before replaying or re-describing broader history
  - keep these summaries short and decision-oriented so they reduce future context drag instead of becoming another source of prompt bloat
 
  ## Initial structure rule
 
  - parent-root `../docs/` is the owner-maintained external documentation directory
- - parent-root `../sessions/` is the cleaned raw session-export directory for non-Claude developer sessions
+ - parent-root `../sessions/` is not part of bootstrap and should not be created during workspace init
+ - create parent-root `../sessions/` only during packaging when non-Claude developer sessions actually need cleaned export files
  - Claude-backed developer sessions are packaged once as parent-root `claude-sessions.zip` instead of per-session `../sessions/` entries
  - parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check.md`, and `test_coverage_and_readme_audit_report.md`
  - parent-root `../.ai/claude-live/` is the live Claude bridge runtime directory root
@@ -1,21 +1,24 @@
  ---
  name: development-guidance
- description: Developer-facing slice execution and local verification guidance for slopmachine.
+ description: Developer-facing end-to-end execution and local verification guidance for slopmachine.
  ---
 
  # Development Guidance
 
- Use this skill during `P4 Development` before prompting the developer.
+ Use this skill during `P4 End-to-End Development` before prompting the developer.
 
- ## Slice model
+ ## Plan execution model
 
- - work in bounded vertical slices
- - complete the real user-facing and admin-facing surface for the slice
- - keep slice-local planning, implementation, verification, and doc sync together
- - after planning is accepted, use the relevant accepted plan section as the slice baseline instead of expecting the owner to restate the full slice contract
- - when the owner provides a stage-exclusive checklist for the current slice or gate, treat that checklist as a hard acceptance contract and respond against it explicitly rather than answering loosely
- - before deeper implementation, do a quick serial-versus-parallel check for the current slice instead of defaulting to one long serial branch
- - when the slice contains 2 or 3 independent units with stable interfaces and low shared-file overlap, use parallel task fan-out for those units and then merge back into one reviewed result
+ - treat `plan.md` as the definitive implementation execution checklist
+ - after planning is accepted, execute the whole application by working through `plan.md` end to end rather than waiting for many narrow owner prompts
+ - update `plan.md` in place from the main developer lane as items move from not started to in progress to done
+ - use `../docs/design.md` for system intent and architecture only, and use `plan.md` for the execution file tree, exact execution order, file ownership, and progress state
+ - read the planned file tree and file-ownership map before deeper implementation so parallel work is driven by real file boundaries instead of vague feature labels
+ - before broad fan-out, establish the small shared-file contract in the main lane so parallel branches or worktrees start from the same stabilized shared files and interfaces
+ - treat `plan.md` as main-lane-owned during parallel work; branch lanes should report completion and let the main lane update `plan.md` after merge
+ - when the owner provides a bounded correction or release-alignment checklist, treat it as a hard acceptance contract and respond against it explicitly
+ - if interrupted, resume from the current state of `plan.md` instead of inventing a new hidden plan
+ - still use good engineering decomposition internally, but keep the visible execution contract anchored to `plan.md`
 
  ## Module implementation guidance
 
@@ -23,7 +26,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - define the module purpose, constraints, and edge cases before coding
  - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
  - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
- - when working inside a slice, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims this slice could affect before reporting readiness
+ - when working inside a `plan.md` workstream, explicitly consider what adjacent flows, runtime paths, and documentation/spec claims that workstream could affect before reporting readiness
  - implement real behavior, not partial scattered logic
  - handle failure paths and boundary conditions
  - add or update tests as part of the module work
@@ -31,15 +34,18 @@ Use this skill during `P4 Development` before prompting the developer.
  - when backend or fullstack API endpoints are added or changed, add or update real HTTP tests for the exact `METHOD + PATH` where practical instead of relying only on controller/service-level tests
  - when mocked HTTP coverage or unit-only coverage still exists, keep it explicit in the coverage notes instead of overstating it as equivalent to true no-mock endpoint coverage
  - when backend or fullstack API tests are material, keep the test names, fixtures, or assertions audit-readable enough that a reviewer can trace the endpoint, request input, and expected response behavior statically
- - keep track of important modules that still lack meaningful tests so hardening does not have to rediscover them from scratch
- - define the branch contract before parallelizing: expected outcome, boundaries, shared constraints, merge condition, and required verification
+ - keep track of important modules that still lack meaningful tests so fused `P5` does not have to rediscover them from scratch
+ - define the branch contract before parallelizing: expected outcome, owned files, boundaries, shared constraints, merge condition, and required verification
+ - when `plan.md` marks mutually exclusive owned files, default to separate branches or worktrees for those sections when the environment supports it cleanly
+ - when worktree support is unavailable, still default to parallel branch or subagent execution using the same owned-file boundaries instead of falling back to serial work by habit
+ - keep `plan.md`, `README.md`, and other shared integration-heavy files in the main lane unless the accepted plan explicitly delegates them
  - keep parent-root `../docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
  - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
  - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
  - for backend or fullstack work, keep configuration reads on the shared config path instead of introducing new scattered direct environment access in feature code
  - keep frontend and backend contracts synchronized when the module spans both sides
  - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
- - before closing the slice, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this slice lands?
+ - before closing the current workstream, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this work lands?
  - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
  - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
  - verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
@@ -66,38 +72,39 @@ Use this skill during `P4 Development` before prompting the developer.
  - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
  - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
  - before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
- - verify the module against its planned behavior before trying to move on
- - do not move on while the module is still obviously weak or half-finished
- - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
+ - verify the current `plan.md` workstream against its planned behavior before trying to move on
+ - do not move on while the current `plan.md` workstream is still obviously weak or half-finished
+ - do not spread broad partial logic across many modules; bias toward completed trustworthy workstreams before opening the next major chunk
  - do not parallelize tightly coupled work that still depends on unresolved contracts, shared abstractions being invented in real time, or overlapping edits to the same files
+ - do not cross into files owned by another planned branch unless the accepted plan or current owner instruction explicitly opens that boundary
  - after parallel fan-in, run final targeted verification on the integrated result rather than trusting the branch-local checks alone
 
  ## Verification model
 
  - use targeted local verification by default
- - avoid broad project-standard gate commands during ordinary slice work
+ - avoid broad project-standard gate commands during ordinary `P4` implementation work
  - prefer fast local language-native or framework-native test commands for the changed area during normal iteration
  - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
- - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
+ - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary `P4` implementation work
  - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
  - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
  - local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
- - do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the slice is considered complete
- - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
- - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
+ - do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the current `plan.md` workstream is considered complete
+ - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary `P4` implementation work
+ - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary `P4` implementation work
  - for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
- - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
- - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
+ - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary `P4` implementation work rather than broad checkpoint commands
+ - when the current workstream materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
  - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
  - for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
  - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
  - when backend logging matters, keep request or route outcomes, exceptions, and background failure logging on the shared structured logging path with redaction intact
  - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
- - keep the test surface moving toward the hard minimum 90 percent coverage threshold as slices are completed, and do not defer obvious coverage debt to hardening
+ - keep the test surface moving toward the hard minimum 90 percent coverage threshold as `plan.md` workstreams are completed, and do not defer obvious coverage debt to fused `P5`
  - for backend or fullstack APIs, keep `../docs/test-coverage.md` moving toward an endpoint inventory plus API test mapping table, not just a generic risk matrix
- - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
- - when the owner names specific expected outcomes for the slice or gate, tie the reported verification and changed files back to those expected outcomes explicitly
- - keep ordinary slice-complete replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
+ - in each development follow-up or completion reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
+ - when the owner names specific expected outcomes for the current workstream or gate, tie the reported verification and changed files back to those expected outcomes explicitly
+ - keep ordinary development follow-up replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
 
  ## Quality rules
 
@@ -17,7 +17,7 @@ Use this skill during `P7 Evaluation and Fix Verification` after a fresh audit r
  - do not silently drop, merge away, or wave through issues from the current audit report
  - the owner must read the current audit report and extract the issues before talking to the developer
  - after the developer claims the fixes are complete for a `partial pass` audit, return to the same evaluator session that produced that audit report
- - keep ordinary post-hardening evaluation remediation inside `P7`
+ - keep ordinary post-`P5` evaluation remediation inside `P7`
 
  ## Fresh-audit result handling
 
@@ -1,102 +1,12 @@
  ---
  name: hardening-gate
- description: Release-readiness hardening rules for slopmachine.
+ description: Compatibility shim for the merged integrated-verification-and-hardening phase.
  ---
 
  # Hardening Gate
 
- Use this skill only during `P6 Hardening`.
+ `P6 Hardening` no longer exists as a standalone workflow phase.
 
- ## Hardening audit priorities
+ Use `integrated-verification` as the canonical source of truth for the merged `P5 Integrated Verification and Hardening` phase.
 
- The hardening phase should explicitly prepare the project to pass the final audit in these priority areas:
-
- 1. prompt-fit
- 2. security-critical flaws
- 3. test sufficiency
- 4. major engineering quality
-
- Hardening should treat these as the main review buckets before final evaluation begins.
-
- ## Hardening scope
-
- - dependency hygiene
- - secret and config hygiene
- - prototype residue cleanup
- - docs honesty
- - observability and redaction hygiene
- - fragile-test and release-readiness cleanup
-
- ## Hardening guidance
-
- - run a prompt-fit sweep for silent requirement substitution, partially delivered hard requirements, frontend/backend mismatch, and business-flow drift
- - audit security boundaries, validation, ownership, and secret handling
- - prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
- - audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
- - audit whether parent-root `../docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, gaps, and the intended minimum 90 percent threshold in a way the owner can follow quickly
- - audit whether the project actually meets the minimum 90 percent coverage threshold for the relevant behavior surface rather than relying on a thin happy-path suite
- - require concrete coverage evidence during hardening, such as a stack-native coverage report, configured threshold, or equally explicit proof; do not accept approximate claims here
- - when backend or fullstack APIs exist, audit whether `../docs/test-coverage.md` includes a resolved endpoint inventory, API test mapping, mock classification, and the important modules that still lack meaningful tests
- - when backend or fullstack APIs exist, audit whether core endpoint coverage is truly no-mock HTTP where it matters, and whether mocked or indirect tests are being overstated as stronger evidence than they are
- - audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
- - inspect architecture, coupling, file size, and maintainability risks
- - focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
- - check for bad engineering practices that accumulated during implementation
- - tighten weak tests, weak docs, and weak operational instructions
- - audit static review readiness: entry points, routes, config, README, and test commands should be traceably consistent without depending on runtime tribal knowledge
- - audit that the repo is self-sufficient and does not rely on parent-root docs or sibling workflow artifacts for static reviewability
- - audit owner-maintained external docs under `../docs/` when relevant, especially `design.md`, `api-spec.md`, `test-coverage.md`, and `questions.md`
- - audit static security-boundary readiness: a fresh reviewer should be able to trace auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation from repository artifacts when applicable
- - if mock, stub, fake, interception, or local-data behavior exists, verify that its scope, default state, and boundaries are disclosed accurately and do not imply undisclosed real integration
- - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
- - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
- - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
- - for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
- - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
- - verify that missing failure handling is not being hidden behind fake-success behavior
- - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
- - re-check frontend and backend observability, redaction, and operator visibility paths
- - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
- - enforce env-file discipline during hardening
- - run documentation verification against the real codebase and runtime behavior, not just document existence
- - if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
- - verify that every dependency needed by the README-documented `docker compose up --build` and `./run_tests.sh` paths is declared in Dockerfiles or other repo-controlled container build definitions rather than relying on host-installed packages or runtimes
- - audit README compliance against the strict post-bugfix README review shape:
- - project type near the top
- - startup instructions
- - access method
- - verification method
- - demo credentials for every known role or the exact statement `No authentication required`
- - architecture and workflow clarity
- - for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
- - verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
- - before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
- - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
- - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
- - make sure the system is genuinely reviewable and reproducible
- - keep hardening narrow: do not turn this phase into a hidden extra development slice or a broad rediscovery pass
- - prefer final honesty, consistency, static-review, and release-readiness cleanup over new implementation work
-
- ## Required hardening output
-
- Before `P6` can close, the owner should have a clear answer for each of these:
-
- - prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
- - security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
- - test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
- - coverage depth: does the current evidence prove the minimum 90 percent coverage threshold for the relevant behavior surface, and if not, what remains weak?
- - endpoint coverage readiness: if backend or fullstack APIs exist, could a strict static reviewer map the important `METHOD + PATH` surfaces to true no-mock HTTP tests, mocked HTTP tests, or unit-only coverage without guessing?
- - major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
- - static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
- - security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
- - coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
- - README hard-gate readiness: would a fresh static reviewer find the required project type, startup, access, verification, and auth-disclosure sections in `README.md` without reconstructing them from code?
- - frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
- - repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?
-
- ## Rules
-
- - do not start hardening until integrated verification is explicitly stable
- - hardening is not a disguised second integrated phase
- - if hardening exposes unresolved integrated instability, reopen the earlier phase cleanly
- - do not use hardening for broad feature work
+ This file remains only as a compatibility shim so older references do not silently point to stale behavior.