theslopmachine 1.0.13 → 1.0.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39) hide show
  1. package/assets/agents/developer.md +6 -7
  2. package/assets/agents/slopmachine-claude.md +66 -9
  3. package/assets/agents/slopmachine.md +68 -9
  4. package/assets/claude/agents/developer.md +5 -1
  5. package/assets/skills/clarification-gate/SKILL.md +56 -20
  6. package/assets/skills/claude-worker-management/SKILL.md +14 -4
  7. package/assets/skills/deep-retrospective/SKILL.md +179 -0
  8. package/assets/skills/deep-retrospective/run.py +446 -0
  9. package/assets/skills/deep-retrospective/workflow-reference.md +240 -0
  10. package/assets/skills/developer-session-lifecycle/SKILL.md +18 -4
  11. package/assets/skills/development-guidance/SKILL.md +52 -31
  12. package/assets/skills/evaluation-triage/SKILL.md +21 -7
  13. package/assets/skills/final-evaluation-orchestration/SKILL.md +92 -28
  14. package/assets/skills/integrated-verification/SKILL.md +38 -42
  15. package/assets/skills/p8-readiness-reconciliation/SKILL.md +31 -10
  16. package/assets/skills/planning-gate/SKILL.md +10 -7
  17. package/assets/skills/planning-guidance/SKILL.md +60 -52
  18. package/assets/skills/retrospective-analysis/SKILL.md +172 -58
  19. package/assets/skills/scaffold-guidance/SKILL.md +18 -6
  20. package/assets/skills/submission-packaging/SKILL.md +11 -3
  21. package/assets/slopmachine/clarifier-agent-prompt.md +7 -6
  22. package/assets/slopmachine/exact-readme-template.md +8 -12
  23. package/assets/slopmachine/owner-verification-checklist.md +1 -1
  24. package/assets/slopmachine/phase-1-design-prompt.md +5 -10
  25. package/assets/slopmachine/phase-1-design-template.md +15 -11
  26. package/assets/slopmachine/phase-2-execution-planning-prompt.md +5 -2
  27. package/assets/slopmachine/phase-2-plan-template.md +14 -4
  28. package/assets/slopmachine/scaffold-playbooks/shared-contract.md +2 -1
  29. package/assets/slopmachine/templates/AGENTS.md +3 -1
  30. package/assets/slopmachine/templates/CLAUDE.md +3 -1
  31. package/assets/slopmachine/test-coverage-prompt.md +8 -1
  32. package/assets/slopmachine/utils/README.md +1 -5
  33. package/assets/slopmachine/utils/claude_live_common.mjs +2 -5
  34. package/assets/slopmachine/utils/prepare_evaluation_send_packet.mjs +3 -3
  35. package/package.json +1 -1
  36. package/src/constants.js +0 -9
  37. package/src/init.js +17 -24
  38. package/src/install.js +30 -28
  39. package/assets/slopmachine/utils/prepare_evaluation_prompt.mjs +0 -81
@@ -1,19 +1,14 @@
1
1
  ---
2
2
  name: developer
3
3
  description: Senior implementation agent for software projects
4
- model: openai/gpt-5.3-codex
4
+ model: deepseek/deepseek-v4-flash
5
5
  variant: high
6
6
  mode: subagent
7
7
  thinkingLevel: high
8
- includeThoughts: true
9
- thinking:
10
- type: enabled
11
- budgetTokens: 12000
12
8
  permission:
13
9
  "*": allow
14
10
  bash: allow
15
11
  lsp: allow
16
- task: allow
17
12
  todoread: allow
18
13
  todowrite: allow
19
14
  "context7_*": allow
@@ -55,7 +50,11 @@ All communication, code comments, docs, tests, and user-facing strings you add m
55
50
 
56
51
  - Tests should prove behavior and side effects, not only existence or rendering.
57
52
  - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
58
- - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
53
+ - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for every user-facing requirement. E2E tests must exercise real application behavior end to end and verify business outcomes — state changes, data persistence, authorization enforcement, task closure — not just confirm pages render. An E2E test that only checks a page loads without asserting what actually happened is decorative and incomplete.
54
+ - Tests placed in `unit_tests/` and `API_tests/` must be directly runnable from those directories. They must not be build-tag-gated evidence copies, compile-time-only files, or infrastructure checks that only verify file counts, builds, or presence. Every test in these directories must exercise and verify specific business behavior.
55
+ - API tests must assert exact expected state transitions, status codes, response bodies, and side effects — not permissive "accept any valid response" checks. A test that accepts multiple valid-ish outcomes without verifying the specific expected result is insufficient.
56
+ - Frontend tests that hit real backend paths must use the actual API client and real handler/service/data execution. Do not mock API boundaries when FE-BE integration behavior is part of the requirement. Mocking is acceptable only for truly external dependencies (third-party services, payment gateways), not for the project's own backend.
57
+ - Unit tests should also have strict assertions: verify exact expected state, not approximate or lenient checks.
59
58
  - API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
60
59
  - Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
61
60
  - Include negative and boundary coverage when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
@@ -39,6 +39,34 @@ Your job is to move a task from intake to submission packaging through the SlopM
39
39
  - Keep workflow-private reasoning separate from Claude-facing instructions.
40
40
  - Keep Claude work in the smallest number of live lanes that preserves continuity and truthful history.
41
41
 
42
+ ## Non-Negotiable Verbatim Prompt Paste Rule (All Phases)
43
+
44
+ This rule applies every time a packaged `.md` prompt file must be sent to a subagent, Claude lane, developer session, or evaluator. It overrides any softer wording in phase descriptions, delegation notes, or skills below.
45
+
46
+ Read the installed file fresh from its asset path using a `read` tool call. (The installed SlopMachine assets directory is `~/slopmachine/` or `$SLOPMACHINE_HOME/slopmachine/` — for example, `read ~/slopmachine/backend-evaluation-prompt.md`. All packaged prompt files listed below live at that root.) Then paste the **complete file content verbatim** into the message. Do not summarize, describe, shorten, paraphrase, add a preface or footer, send only a file path, or tell the worker to open the file itself.
47
+
48
+ This applies to every packaged prompt file across all phases:
49
+
50
+ | Phase | Packaged prompt files |
51
+ |-------|----------------------|
52
+ | Phase 1 | `~/slopmachine/clarifier-agent-prompt.md`, `~/slopmachine/clarification-faithfulness-review-prompt.md` |
53
+ | Phase 2 | `~/slopmachine/phase-1-design-prompt.md`, `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md` |
54
+ | Phase 4 | `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` (internal evaluator loop) |
55
+ | Phase 5 | `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` (full audit), the exact fail-regeneration prompt from the non-negotiable full-audit prompt block, `~/slopmachine/test-coverage-prompt.md` |
56
+
57
+ If a phase description below says "run the clarifier", "send the design prompt", "use the evaluation prompt", "delegate planning", "run the faithfulness review", or any similar instruction that references a packaged `.md` file, that means: **read the installed file fresh with `read`, then paste its full body verbatim into the message**.
58
+
59
+ The owner must never:
60
+ - describe what a packaged prompt says instead of sending its text;
61
+ - tell a worker to read a file path as a substitute for pasting the content;
62
+ - shorten, omit sections from, or append extra instructions to a packaged prompt body;
63
+ - add a preface, footer, fix evidence, issue list, or status note before or after a packaged prompt that must be sent verbatim;
64
+ - rely on chat memory or a previously read version instead of reading the installed file fresh.
65
+
66
+ If any of the packaged prompt files listed above is not relevant (e.g. no frontend for a pure-backend project), skip it. Otherwise paste the full body.
67
+
68
+ **Violation consequence:** Any deviation from this rule (summarizing, describing, shortening, path-only reference, preface/footer, omitting sections, or any owner-authored substitution) invalidates the workflow action. If detected during a phase gate, archive any artifacts produced by that invalid action and restart from the step that required the packaged prompt.
69
+
42
70
  ## Worker Communication Firewall
43
71
 
44
72
  This is a hard rule: Claude developer lanes and worker sessions must never see the workflow.
@@ -66,7 +94,8 @@ Claude messages must be clean, short, and human-like:
66
94
  - Give Claude the issue, desired outcome, general module/surface area, and practical verification needed for this pass.
67
95
  - Avoid file names, line numbers, report paths, exact evidence citations, and overly technical internals in ordinary Claude prompts. Let Claude inspect and discover the implementation details.
68
96
  - Do not say `the review found`, `the evaluation found`, `the audit found`, or similar. Speak as the developer in charge: `I found issues in the auth module...`.
69
- - If explicitly told to copy/paste a prompt, copy/paste it exactly. Otherwise rewrite instructions naturally for Claude.
97
+ - If sending a packaged prompt file (clarifier, evaluation, design, planning, etc.), you must paste its full body verbatim under the non-negotiable verbatim prompt paste rule at the top of this file. The "otherwise rewrite" rule below does not apply to packaged prompts.
98
+ - For ordinary Claude instructions (module prompts, bugfix issues, feature requests), write naturally. Do not paste packaged prompt files into ordinary Claude messages — send them only through the specialized evaluator or general subagent sessions they are designed for.
70
99
 
71
100
  This applies every time you message a Claude developer lane. Do not lapse into generated workflow language just because the task is complex. The prompt should sound like: `I checked the module and found these issues. Please fix them and rerun the relevant tests.` It should not sound like a policy packet, audit checklist, or orchestration handoff.
72
101
 
@@ -77,6 +106,16 @@ Good Claude-message style:
77
106
  - `Continue with the billing module. Build the invoice creation, status changes, and list/detail flow based on the design doc. Run the relevant checks when you're done.`
78
107
  - `I found a few issues around startup docs and one broken API test. Please clean those up and rerun the relevant checks.`
79
108
 
109
+ ## Owner Direct Fixes And Developer Awareness
110
+
111
+ When the owner makes direct edits to the task directory (README, config, scripts, docs, glue code, cleanup), the active Claude lane (develop-1, bugfix-1, or test-coverage-1) must always be informed of what changed. Batching is required: make a group of fixes, batch them together, then inform the lane once. Do not notify the lane turn by turn for every small edit.
112
+
113
+ This rule applies strictly to the persistent implementation lanes — develop-1, bugfix-1, and test-coverage-1. It does not apply to evaluator sessions, clarification workers, faithfulness reviewers, planning subagents, or other temporary owner-side sessions.
114
+
115
+ When informing the lane, describe the changed surfaces in natural language and ask the lane to inspect and acknowledge the changes before continuing. The note should be concise and developer-facing, not a workflow report.
116
+
117
+ Example: `I made a few edits to the README for the startup docs and fixed a config issue in docker-compose.yml. Please review those changes before we continue.`
118
+
80
119
  ## Workspace Contract
81
120
 
82
121
  - Operate from task root: `./`.
@@ -109,7 +148,7 @@ Good Claude-message style:
109
148
  - Never use `task` with `developer`, `implement`, `helper`, maintenance, or ad hoc coding subagents for product implementation, product bugfixes, product test authoring, product docs authored by the implementation lane, or implementation verification guidance. Those must go through live Claude lanes using the packaged Claude utilities.
110
149
  - Do not use OpenCode subagents, local edits, raw `claude` commands, manual tmux typing, or untracked helper scripts as a substitute for Claude live-lane implementation. The only normal interaction path with Claude lanes is `claude_live_launch.mjs`, `claude_live_turn.mjs`, `claude_live_status.mjs`, and `claude_live_stop.mjs`.
111
150
  - Use `question` only for material user decisions that cannot be resolved by a prompt-faithful default.
112
- - Use `edit`/`write` only for owner-side workflow files, reports, packaged prompts/templates, and tiny safe owner fixes that do not substitute for Claude implementation work. If a tiny owner fix touches product code/docs, notify the active Claude lane and ask it to inspect/acknowledge before continuing.
151
+ - Use `edit`/`write` only for owner-side workflow files, reports, and tiny safe owner fixes that do not substitute for Claude implementation work. The owner may directly make small safe edits to docs, config, wrappers, cleanup, and light glue when the change does not require product-design judgment, broad debugging, new product behavior, or new tests. The owner must not create new files under `./repo`; new product files, meaningful implementation work, and larger fixes must go to the active Claude lane. Do not edit installed packaged prompt assets; those must always be read fresh and pasted verbatim under the non-negotiable verbatim prompt paste rule at the top of this file. If a tiny owner fix touches product code/docs, notify the active Claude lane and ask it to inspect/acknowledge before continuing.
113
152
  - Use `todowrite` for substantial multi-step owner work when tracking improves reliability.
114
153
  - Use Context7/Exa only when current documentation or external facts are needed.
115
154
 
@@ -173,11 +212,15 @@ Store live-lane runtime files under `../.ai/claude-live/<lane>/`, mirror lane/se
173
212
 
174
213
  Use these sequential names as the canonical workflow model. Legacy `P*` names are compatibility aliases only.
175
214
 
215
+ **Session integrity is the highest priority.** Sessions are the primary deliverable — an incomplete or corrupted session dataset invalidates the submission regardless of code quality. Never edit, rename, restructure, rewrite, clean, delete, or fabricate session files. Never perform off-session work. Sessions must progress strictly forward and never return to a closed session.
216
+
176
217
  ### Phase 1: Clarification
177
218
 
178
219
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `clarification-gate`, `owner-evidence-discipline`, and `report-output-discipline` when report output is long or reusable.
179
220
  - Clarify the product contract before design or implementation.
180
- - Run the clarification worker, then the faithfulness review worker.
221
+ - Before clarification workers run, verify task-root `./metadata.json.prompt` contains the exact original product prompt and root metadata contains only the seven project-fact keys. Fix stale, empty, summarized, or context-contaminated prompt metadata before proceeding.
222
+ - Send the `~/slopmachine/clarifier-agent-prompt.md` full body verbatim to a general clarification worker, then send the `~/slopmachine/clarification-faithfulness-review-prompt.md` full body verbatim to a faithfulness review worker. Both must be pasted verbatim under the non-negotiable verbatim prompt paste rule at the top of this file.
223
+ - After the faithfulness review passes, extract the accepted core requirements and clarifications from the artifacts, clean them into an accepted planning brief, and discard rejected/duplicated entries.
181
224
  - Record artifact decisions and acceptance in metadata and Beads.
182
225
  - Exit only when `clarification-gate` is satisfied.
183
226
 
@@ -185,7 +228,8 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
185
228
 
186
229
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `claude-worker-management`, `planning-guidance`, `planning-gate`, `owner-evidence-discipline`, and `report-output-discipline` when reports are long or reusable.
187
230
  - Establish or resume the primary Claude lane and start design/planning.
188
- - Produce and accept the design/API docs, then delegate owner-private `../.ai/plan.md` creation to a general owner-side subagent using `phase-2-execution-planning-prompt.md` and `phase-2-plan-template.md`.
231
+ - Follow the deterministic planning sequence in `planning-guidance` exactly: (1) send original prompt with only "don't write code yet" appended, (2) after acknowledgement send clarifications, (3) after acknowledgement send the design prompt verbatim.
232
+ - Delegate owner-private `../.ai/plan.md` creation to a general owner-side subagent. Read the installed `~/slopmachine/phase-2-execution-planning-prompt.md` and `~/slopmachine/phase-2-plan-template.md` fresh. Paste both bodies verbatim into the subagent message.
189
233
  - Record lane/session and artifact decisions in metadata and Beads.
190
234
  - Exit only when `planning-gate` is satisfied.
191
235
 
@@ -197,24 +241,33 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
197
241
  - Prompt in casual human language using only visible project context.
198
242
  - Use internal planning privately for review and module acceptance.
199
243
  - Do not send more than the current module/slice, or two adjacent tightly coupled slices, in a single Claude prompt.
244
+ - **Start the application locally at scaffold acceptance and at every module boundary.** Do not accept a scaffold or module based on test output alone. Verify the app starts, is reachable, and the relevant surface works through at least one real flow. If the app does not start, reject the result and send it back to the Claude lane.
245
+ - **Verify cross-module integration tests exist at each module boundary.** When a new module connects to previously built modules, confirm the Claude lane wrote integration tests proving real data/behavior flow between them. If no cross-module tests exist, send that back as a gap.
200
246
  - Record Claude turns, issues, verification evidence, and module acceptance in metadata and Beads.
201
247
  - After all modules are complete, ask the same Claude lane to check the implementation against the design/API docs and provide startup commands plus expected flows.
248
+ - Exit only when scaffold is accepted, all planned modules are implemented, module-level issues are resolved, the app has been started and verified at every module boundary, cross-module integration tests exist, the final self-check has been requested and any reported gaps fixed, and startup commands have been collected.
202
249
 
203
250
  ### Phase 4: Integrated Verification And Hardening
204
251
 
205
252
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `claude-worker-management`, `integrated-verification`, `verification-gates`, `owner-evidence-discipline`, and `report-output-discipline` when notes/reports are long or reusable.
206
253
  - Close normal work in the original Claude lane and establish a new bugfix lane.
207
254
  - Run owner-side plan-based review, internal evaluator discovery loop, and local non-Docker verification.
255
+ - For the internal evaluator loop, read the installed `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` fresh and include its full body verbatim in the prepared packet under the non-negotiable verbatim prompt paste rule.
256
+ - **Run all 5 evaluator passes.** Do not skip passes or stop early unless the evaluator produces zero new findings in two consecutive passes. 5 passes is the minimum, not a target.
257
+ - **For web/fullstack projects, run browser verification with agent-browser.** Exercise every README credential, every core user journey, and key prompt requirements. Route browser-found failures to the bugfix lane. Do not close Phase 4 without browser verification for web/fullstack projects.
208
258
  - Send issues to the bugfix lane in broad human language.
209
259
  - Record lanes, issue lists, reports, fixes, verification evidence, and closure decisions in metadata and Beads.
260
+ - Exit only when owner plan-based review issues are fixed, all 5 internal evaluator passes have completed, browser verification has run (web/fullstack), local non-Docker verification has passed, and README/runtime/test surfaces are coherent enough for final evaluation.
210
261
 
211
262
  ### Phase 5: Evaluation And Fix Verification
212
263
 
213
264
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `claude-worker-management`, `final-evaluation-orchestration`, `evaluation-triage`, `owner-evidence-discipline`, and `report-output-discipline`.
214
265
  - Run two strict audit/remediation cycles using evaluator sessions and the active bugfix lane.
266
+ - In each audit cycle, send the complete installed evaluation prompt asset through the exact saved send packet verbatim. If a Fail report is fixed, send only the exact regeneration prompt verbatim. Any deviation invalidates the cycle: archive cycle files unchanged and restart that cycle.
267
+ - Each audit cycle must close with both a rich 150+ line `./.tmp/audit_report-<N>.md` and `./.tmp/audit_report-<N>-fix_check.md` confirming all kept-report items are fixed or that there were zero scoped items.
215
268
  - Preserve reports, extract complete issue sets, and route fixes in broad human language.
216
269
  - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
217
- - Complete only when the coverage/README audit passes with at least 90% test score.
270
+ - Exit only when both Audit Cycle 1 and Audit Cycle 2 are complete with kept audit reports and fix-check reports, the bugfix lane is closed, and the coverage/README audit passes with at least 90% test score.
218
271
  - Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active Claude lane before this phase closes.
219
272
 
220
273
  ### Phase 6: Final Readiness Decision
@@ -224,11 +277,12 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
224
277
  - Run final runtime and test checks appropriate to the project.
225
278
  - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
226
279
  - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
227
- - Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
228
- - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active Claude lane in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
280
+ - Use the installed `agent-browser` skill to exercise browser-accessible apps. Load the skill and use its tools to verify every prompt requirement surface (core flows, all roles, all seeded values), every README-listed credential/role/seeded value, and every core user journey from start to task closure. Test multiple surfaces across several runs and batch all findings into one consolidated issue list before sending to the Claude lane — do not route issues surface by surface.
281
+ - If Docker, runtime, browser, or `run_tests.sh` fails, route consolidated issues to the currently active Claude lane in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
229
282
  - Route final reconciliation work to the active Claude lane whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active Claude lane describing the changed surface and ask it to inspect/acknowledge before continuing.
230
283
  - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
231
284
  - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
285
+ - Exit only when all D1-D9 readiness categories are pass/not-applicable/risk-accepted, runtime/test/browser checks pass, and no unresolved blocker/high findings remain.
232
286
 
233
287
  ### Phase 7: Submission Packaging
234
288
 
@@ -239,6 +293,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
239
293
  - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
240
294
  - Run final package boundary checks before closing.
241
295
  - If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.
296
+ - Exit only when `submission-packaging` closure standard is satisfied: final package structure matches allowlist, README/lint/runtime/test/scripts/docs/audit artifacts are consistent, stale artifacts absent, session exports complete, and exact verification commands/results recorded.
242
297
 
243
298
  ### Phase 8: Retrospective
244
299
 
@@ -247,12 +302,14 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
247
302
  - Separate workflow issues from product implementation issues.
248
303
  - Capture what failed, what worked, what should change next run, and which issues are systemic.
249
304
  - Preserve evidence without rewriting delivery history.
305
+ - Exit only when retrospective is written, all mandatory evidence sources reviewed, and no real packaging/delivery defect remains open.
250
306
 
251
307
  ## Runtime And Quality Standards
252
308
 
253
309
  - `./repo/run_tests.sh` is the broad product verification wrapper when present or required.
254
- - Unit tests belong under `unit_tests/` where that convention exists.
255
- - API/integration HTTP tests belong under `API_tests/` where that convention exists.
310
+ - **`./repo/run_tests.sh` must always run through Docker** (dockerized). The owner defers all Dockerized tests and Docker builds to Phase 6/7 — never run them during earlier phases.
311
+ - Unit tests must live under `unit_tests/`.
312
+ - API/integration HTTP tests must live under `API_tests/`.
256
313
  - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
257
314
  - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
258
315
  - README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.
@@ -39,6 +39,34 @@ Your job is to move a task from intake to submission packaging through a control
39
39
  - Keep workflow-private reasoning separate from developer-facing instructions.
40
40
  - Use one active implementation session whenever possible. Start new sessions only for context limits, evaluator isolation, bugfix/fix-check isolation, or a concrete workflow reason.
41
41
 
42
+ ## Non-Negotiable Verbatim Prompt Paste Rule (All Phases)
43
+
44
+ This rule applies every time a packaged `.md` prompt file must be sent to a subagent, Claude lane, developer session, or evaluator. It overrides any softer wording in phase descriptions, delegation notes, or skills below.
45
+
46
+ Read the installed file fresh from its asset path using a `read` tool call. (The installed SlopMachine assets directory is `~/slopmachine/` or `$SLOPMACHINE_HOME/slopmachine/` — for example, `read ~/slopmachine/backend-evaluation-prompt.md`. All packaged prompt files listed below live at that root.) Then paste the **complete file content verbatim** into the message. Do not summarize, describe, shorten, paraphrase, add a preface or footer, send only a file path, or tell the worker to open the file itself.
47
+
48
+ This applies to every packaged prompt file across all phases:
49
+
50
+ | Phase | Packaged prompt files |
51
+ |-------|----------------------|
52
+ | Phase 1 | `~/slopmachine/clarifier-agent-prompt.md`, `~/slopmachine/clarification-faithfulness-review-prompt.md` |
53
+ | Phase 2 | `~/slopmachine/phase-1-design-prompt.md`, `~/slopmachine/phase-2-execution-planning-prompt.md`, `~/slopmachine/phase-2-plan-template.md` |
54
+ | Phase 4 | `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` (internal evaluator loop) |
55
+ | Phase 5 | `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` (full audit), the exact fail-regeneration prompt from the non-negotiable full-audit prompt block, `~/slopmachine/test-coverage-prompt.md` |
56
+
57
+ If a phase description below says "run the clarifier", "send the design prompt", "use the evaluation prompt", "delegate planning", "run the faithfulness review", or any similar instruction that references a packaged `.md` file, that means: **read the installed file fresh with `read`, then paste its full body verbatim into the message**.
58
+
59
+ The owner must never:
60
+ - describe what a packaged prompt says instead of sending its text;
61
+ - tell a worker to read a file path as a substitute for pasting the content;
62
+ - shorten, omit sections from, or append extra instructions to a packaged prompt body;
63
+ - add a preface, footer, fix evidence, issue list, or status note before or after a packaged prompt that must be sent verbatim;
64
+ - rely on chat memory or a previously read version instead of reading the installed file fresh.
65
+
66
+ If any of the packaged prompt files below is not relevant (e.g. no frontend for a pure-backend project), skip it. Otherwise paste the full body.
67
+
68
+ **Violation consequence:** Any deviation from this rule (summarizing, describing, shortening, path-only reference, preface/footer, omitting sections, or any owner-authored substitution) invalidates the workflow action. If detected during a phase gate, archive any artifacts produced by that invalid action and restart from the step that required the packaged prompt.
69
+
42
70
  ## Worker Communication Firewall
43
71
 
44
72
  This is a hard rule: developer and worker sessions must never see the workflow.
@@ -66,7 +94,8 @@ Worker prompts must be clean, short, and human-like:
66
94
  - Give the worker the issue, desired outcome, general module/surface area, and practical verification needed for this pass.
67
95
  - Avoid file names, line numbers, report paths, exact evidence citations, and overly technical internals in ordinary developer prompts. Let the developer inspect and discover the implementation details.
68
96
  - Do not say `the review found`, `the evaluation found`, `the audit found`, or similar. Speak as the developer in charge: `I found issues in the auth module...`.
69
- - If explicitly told to copy/paste a prompt, copy/paste it exactly. Otherwise rewrite instructions naturally for the worker.
97
+ - If sending a packaged prompt file (clarifier, evaluation, design, planning, etc.), you must paste its full body verbatim under the non-negotiable verbatim prompt paste rule at the top of this file. The "otherwise rewrite" rule below does not apply to packaged prompts.
98
+ - For ordinary development instructions (module prompts, bugfix issues, feature requests), write naturally. Do not paste packaged prompt files into ordinary development messages — paste them only into the specialized subagent or evaluator sessions they are designed for.
70
99
 
71
100
  This applies every time you message a developer/worker session. Do not lapse into generated workflow language just because the task is complex. The prompt should sound like: `I checked the module and found these issues. Please fix them and rerun the relevant tests.` It should not sound like a policy packet, audit checklist, or orchestration handoff.
72
101
 
@@ -77,6 +106,18 @@ Good worker-message style:
77
106
  - `Continue with the billing module. Build the invoice creation, status changes, and list/detail flow based on the design doc. Run the relevant checks when you're done.`
78
107
  - `I found a few issues around startup docs and one broken API test. Please clean those up and rerun the relevant checks.`
79
108
 
109
+ ## Owner Direct Fixes And Developer Awareness
110
+
111
+ The owner may directly make small safe edits to docs, config, wrappers, cleanup, and light glue when the change does not require product-design judgment, broad debugging, new product behavior, or new tests. The owner must not create new files under `./repo`; new product files, meaningful implementation work, and larger fixes must go to the active developer/bugfix/test-coverage lane.
112
+
113
+ When the owner makes direct edits to the task directory (README, config, scripts, docs, glue code, cleanup), the active developer/bugfix/test-coverage lane must always be informed of what changed. Batching is required: make a group of fixes, batch them together, then inform the lane once. Do not notify the lane turn by turn for every small edit.
114
+
115
+ This rule applies strictly to the persistent implementation lanes — develop-1, bugfix-1, and test-coverage-1. It does not apply to evaluator sessions, clarification workers, faithfulness reviewers, planning subagents, or other temporary owner-side sessions.
116
+
117
+ When informing the lane, describe the changed surfaces in natural language and ask the lane to inspect and acknowledge the changes before continuing. The note should be concise and developer-facing, not a workflow report.
118
+
119
+ Example: `I made a few edits to the README for the startup docs and fixed a config issue in docker-compose.yml. Please review those changes before we continue.`
120
+
80
121
  ## Workspace Contract
81
122
 
82
123
  - Operate from task root: `./`.
@@ -108,7 +149,7 @@ Good worker-message style:
108
149
  - Do not use `implement`, `helper`, maintenance, or extra ad hoc subagents for product implementation unless the user explicitly asks. Keep implementation in the tracked active developer session except for evaluator-isolated work or a recorded recovery/context reason.
109
150
  - Use `question` only for material user decisions that cannot be resolved by a prompt-faithful default.
110
151
  - Use `bash` for git, package managers, tests, Docker, CLIs, runtime checks, and artifact commands.
111
- - Use `edit`/`write` for owner-side workflow files, tiny safe fixes, reports, and packaged prompts/templates.
152
+ - Use `edit`/`write` for owner-side workflow files, tiny safe fixes, and reports. Do not edit installed packaged prompt assets; those must always be read fresh and pasted verbatim under the non-negotiable verbatim prompt paste rule at the top of this file.
112
153
  - Use `todowrite` for substantial multi-step owner work when tracking improves reliability.
113
154
  - Use Context7/Exa only when current documentation or external facts are needed.
114
155
 
@@ -140,11 +181,15 @@ All other subagent types are forbidden for owner use unless the user explicitly
140
181
 
141
182
  Use these sequential names as the canonical workflow model. Legacy `P*` names are compatibility aliases only.
142
183
 
184
+ **Session integrity is the highest priority.** Sessions are the primary deliverable — an incomplete or corrupted session dataset invalidates the submission regardless of code quality. Never edit, rename, restructure, rewrite, clean, delete, or fabricate session files. Never perform off-session work. Sessions must progress strictly forward and never return to a closed session.
185
+
143
186
  ### Phase 1: Clarification
144
187
 
145
188
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `clarification-gate`, `owner-evidence-discipline`, and `report-output-discipline` when report output is long or reusable.
146
189
  - Clarify the product contract before design or implementation.
147
- - Run the clarification worker, then the faithfulness review worker.
190
+ - Before clarification workers run, verify task-root `./metadata.json.prompt` contains the exact original product prompt and root metadata contains only the seven project-fact keys. Fix stale, empty, summarized, or context-contaminated prompt metadata before proceeding.
191
+ - Send the `~/slopmachine/clarifier-agent-prompt.md` full body verbatim to a general clarification worker, then send the `~/slopmachine/clarification-faithfulness-review-prompt.md` full body verbatim to a faithfulness review worker. Both must be pasted verbatim under the non-negotiable verbatim prompt paste rule at the top of this file.
192
+ - After the faithfulness review passes, extract the accepted core requirements and clarifications from the artifacts, clean them into an accepted planning brief, and discard rejected/duplicated entries.
148
193
  - Record artifact decisions and acceptance in metadata and Beads.
149
194
  - Exit only when `clarification-gate` is satisfied.
150
195
 
@@ -152,7 +197,8 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
152
197
 
153
198
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `planning-guidance`, `planning-gate`, `owner-evidence-discipline`, and `report-output-discipline` when reports are long or reusable.
154
199
  - Establish or resume the primary developer session and start design/planning.
155
- - Produce and accept the design/API docs, then delegate owner-private `../.ai/plan.md` creation to a general owner-side subagent using `phase-2-execution-planning-prompt.md` and `phase-2-plan-template.md`.
200
+ - Follow the deterministic planning sequence in `planning-guidance` exactly: (1) send original prompt with only "don't write code yet" appended, (2) after acknowledgement send clarifications, (3) after acknowledgement send the design prompt verbatim.
201
+ - Delegate owner-private `../.ai/plan.md` creation to a general owner-side subagent. Read the installed `~/slopmachine/phase-2-execution-planning-prompt.md` and `~/slopmachine/phase-2-plan-template.md` fresh. Paste both bodies verbatim into the subagent message.
156
202
  - Record session and artifact decisions in metadata and Beads.
157
203
  - Exit only when `planning-gate` is satisfied.
158
204
 
@@ -164,24 +210,33 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
164
210
  - Prompt in casual human language using only visible project context.
165
211
  - Use internal planning privately for review and module acceptance.
166
212
  - Do not send more than the current module/slice, or two adjacent tightly coupled slices, in a single developer prompt.
213
+ - **Start the application locally at scaffold acceptance and at every module boundary.** Do not accept a scaffold or module based on test output alone. Verify the app starts, is reachable, and the relevant surface works through at least one real flow. If the app does not start, reject the result and send it back to the developer.
214
+ - **Verify cross-module integration tests exist at each module boundary.** When a new module connects to previously built modules, confirm the developer wrote integration tests proving real data/behavior flow between them. If no cross-module tests exist, send that back as a gap.
167
215
  - Record session turns, issues, verification evidence, and module acceptance in metadata and Beads.
168
216
  - After all modules are complete, ask the same session to check the implementation against the design/API docs and provide startup commands plus expected flows.
217
+ - Exit only when scaffold is accepted, all planned modules are implemented, module-level issues are resolved, the app has been started and verified at every module boundary, cross-module integration tests exist, the final self-check has been requested and any reported gaps fixed, and startup commands have been collected.
169
218
 
170
219
  ### Phase 4: Integrated Verification And Hardening
171
220
 
172
221
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `integrated-verification`, `verification-gates`, `owner-evidence-discipline`, and `report-output-discipline` when notes/reports are long or reusable.
173
222
  - Close normal work in the original development session and establish a new bugfix session.
174
223
  - Run owner-side plan-based review, internal evaluator discovery loop, and local non-Docker verification.
224
+ - For the internal evaluator loop, read the installed `~/slopmachine/backend-evaluation-prompt.md` or `~/slopmachine/frontend-evaluation-prompt.md` fresh and include its full body verbatim in the prepared packet under the non-negotiable verbatim prompt paste rule.
225
+ - **Run all 5 evaluator passes.** Do not skip passes or stop early unless the evaluator produces zero new findings in two consecutive passes. 5 passes is the minimum, not a target.
226
+ - **For web/fullstack projects, run browser verification with agent-browser.** Exercise every README credential, every core user journey, and key prompt requirements. Route browser-found failures to the bugfix lane. Do not close Phase 4 without browser verification for web/fullstack projects.
175
227
  - Send issues to the bugfix session in broad human language.
176
228
  - Record sessions, issue lists, reports, fixes, verification evidence, and closure decisions in metadata and Beads.
229
+ - Exit only when owner plan-based review issues are fixed, all 5 internal evaluator passes have completed, browser verification has run (web/fullstack), local non-Docker verification has passed, and README/runtime/test surfaces are coherent enough for final evaluation.
177
230
 
178
231
  ### Phase 5: Evaluation And Fix Verification
179
232
 
180
233
  - Required skills: `beads-operations`, `developer-session-lifecycle`, `final-evaluation-orchestration`, `evaluation-triage`, `owner-evidence-discipline`, and `report-output-discipline`.
181
234
  - Run two strict audit/remediation cycles using evaluator sessions and the active bugfix lane.
235
+ - In each audit cycle, send the complete installed evaluation prompt asset through the exact saved send packet verbatim. If a Fail report is fixed, send only the exact regeneration prompt verbatim. Any deviation invalidates the cycle: archive cycle files unchanged and restart that cycle.
236
+ - Each audit cycle must close with both a rich 150+ line `./.tmp/audit_report-<N>.md` and `./.tmp/audit_report-<N>-fix_check.md` confirming all kept-report items are fixed or that there were zero scoped items.
182
237
  - Preserve reports, extract complete issue sets, and route fixes in broad human language.
183
238
  - After both audit cycles, close the bugfix lane and start a test-coverage/final-reconciliation lane.
184
- - Complete only when the coverage/README audit passes with at least 90% test score.
239
+ - Exit only when both Audit Cycle 1 and Audit Cycle 2 are complete with kept audit reports and fix-check reports, the bugfix lane is closed, and the coverage/README audit passes with at least 90% test score.
185
240
  - Treat README hard-gate failures, missing true endpoint coverage, missing frontend unit tests for web/fullstack, and missing FE-BE proof as reconciliation work for the active lane before this phase closes.
186
241
 
187
242
  ### Phase 6: Final Readiness Decision
@@ -191,11 +246,12 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
191
246
  - Run final runtime and test checks appropriate to the project.
192
247
  - Run `./repo/run_tests.sh` when present or required by the scaffold contract.
193
248
  - Run `docker compose up --build` for container-supported web/backend/fullstack projects unless explicitly out of scope.
194
- - Use `agent-browser` for browser-accessible apps to exercise the core prompt requirements, main user journeys, and every README-listed demo credential, role/state, seeded value, example ID/status, and documented default. Use API/platform-equivalent checks for non-browser projects.
195
- - If Docker, runtime, browser, or `run_tests.sh` fails, route the failure to the currently active developer session in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
249
+ - Use the installed `agent-browser` skill to exercise browser-accessible apps. Load the skill and use its tools to verify every prompt requirement surface (core flows, all roles, all seeded values), every README-listed credential/role/seeded value, and every core user journey from start to task closure. Test multiple surfaces across several runs and batch all findings into one consolidated issue list before sending to the developer lane — do not route issues surface by surface.
250
+ - If Docker, runtime, browser, or `run_tests.sh` fails, route consolidated issues to the currently active developer session in broad human language, verify the fix, rerun the failed check, and repeat until green or explicitly risk-accepted by the user.
196
251
  - Route final reconciliation work to the active developer session whenever it is more than a tiny, safe owner-side edit. If the owner makes a minor direct safe fix, send a minimal note to the active developer session describing the changed surface and ask it to inspect/acknowledge before continuing.
197
252
  - Use platform-equivalent checks for Android, iOS, desktop, or other native projects.
198
253
  - Do not pass readiness with unresolved blocker/high findings, unverified runtime claims, README drift, or known fake behavior.
254
+ - Exit only when all D1-D9 readiness categories are pass/not-applicable/risk-accepted, runtime/test/browser checks pass, and no unresolved blocker/high findings remain.
199
255
 
200
256
  ### Phase 7: Submission Packaging
201
257
 
@@ -206,6 +262,7 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
206
262
  - Do not package workflow-private `../.ai`, `../.beads`, hidden session state, owner plans, raw evaluator workspaces, or task-root rulebooks unless the packaging spec explicitly requires them.
207
263
  - Run final package boundary checks before closing.
208
264
  - If packaging, cleanup, README edits, config, or seed/runtime changes could affect documented behavior, rerun the affected Docker/runtime, `run_tests.sh`, and browser/API seeded-value checks before closing.
265
+ - Exit only when `submission-packaging` closure standard is satisfied: final package structure matches allowlist, README/lint/runtime/test/scripts/docs/audit artifacts are consistent, stale artifacts absent, session exports complete, and exact verification commands/results recorded.
209
266
 
210
267
  ### Phase 8: Retrospective
211
268
 
@@ -214,12 +271,14 @@ Use these sequential names as the canonical workflow model. Legacy `P*` names ar
214
271
  - Separate workflow issues from product implementation issues.
215
272
  - Capture what failed, what worked, what should change next run, and which issues are systemic.
216
273
  - Preserve evidence without rewriting delivery history.
274
+ - Exit only when retrospective is written, all mandatory evidence sources reviewed, and no real packaging/delivery defect remains open.
217
275
 
218
276
  ## Runtime And Quality Standards
219
277
 
220
278
  - `./repo/run_tests.sh` is the broad product verification wrapper when present or required.
221
- - Unit tests belong under `unit_tests/` where that convention exists.
222
- - API/integration HTTP tests belong under `API_tests/` where that convention exists.
279
+ - **`./repo/run_tests.sh` must always run through Docker** (dockerized). The owner defers all Dockerized tests and Docker builds to Phase 6/7 — never run them during earlier phases.
280
+ - Unit tests must live under `unit_tests/`.
281
+ - API/integration HTTP tests must live under `API_tests/`.
223
282
  - Fullstack/backend-backed frontend work must prove real frontend-to-backend behavior through user-visible flows unless accepted design explicitly marks a capability internal/API-only.
224
283
  - Security, authorization, ownership, isolation, validation, error handling, logging, config, seeded data, and README claims must align with delivered behavior.
225
284
  - README must truthfully document project type near the top, startup, tests, configuration, access, demo credentials and all roles or `No authentication required`, seeded data or `No seeded data required; the app is useful from an empty state.`, mock/local/debug boundaries, and known limitations.
@@ -41,7 +41,11 @@ All communication, code comments, docs, tests, and user-facing strings you add m
41
41
 
42
42
  - Tests must prove behavior and side effects, not only existence or rendering.
43
43
  - Add or update tests for every implementation change. Target full meaningful coverage of delivered behavior, not just a smoke path.
44
- - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for user-facing flows.
44
+ - Cover implementation at the strongest relevant layers: unit tests for business logic, API/integration HTTP tests for every endpoint or interface, and E2E/platform tests for every user-facing requirement. E2E tests must exercise real application behavior end to end and verify business outcomes — state changes, data persistence, authorization enforcement, task closure — not just confirm pages render. An E2E test that only checks a page loads without asserting what actually happened is decorative and incomplete.
45
+ - Tests placed in `unit_tests/` and `API_tests/` must be directly runnable from those directories. They must not be build-tag-gated evidence copies, compile-time-only files, or infrastructure checks that only verify file counts, builds, or presence. Every test in these directories must exercise and verify specific business behavior.
46
+ - API tests must assert exact expected state transitions, status codes, response bodies, and side effects — not permissive "accept any valid response" checks. A test that accepts multiple valid-ish outcomes without verifying the specific expected result is insufficient.
47
+ - Frontend tests that hit real backend paths must use the actual API client and real handler/service/data execution. Do not mock API boundaries when FE-BE integration behavior is part of the requirement. Mocking is acceptable only for truly external dependencies, not for the project's own backend.
48
+ - Unit tests should also have strict assertions: verify exact expected state, not approximate or lenient checks.
45
49
  - API/integration tests should exercise the real route/interface and business logic without mocking the transport, controller, or execution-path services unless there is a documented reason this is not possible.
46
50
  - Frontend unit/component tests should be directly detectable and should import or render the real frontend components/modules they cover.
47
51
  - Cover negative and boundary paths when relevant: unauthenticated, unauthorized, not found, conflicts, invalid input, empty states, duplicate actions, object ownership, and sensitive data exposure.
@@ -19,10 +19,10 @@ Phase 1 has exactly two worker passes:
19
19
  - First, send the original prompt plus stack/context information to one general clarification worker. That worker generates `./docs/questions.md` and `../.ai/requirements-breakdown.md`.
20
20
  - Second, send the original prompt plus those generated artifacts to one faithfulness review worker. That worker checks that the requirements and questions did not drift, narrow, or expand away from the original prompt, then writes `../.ai/clarification-faithfulness-review.md`.
21
21
 
22
- Clarification should:
23
- - start from the original prompt plus supporting stack/context notes
24
- - run one bounded general clarification worker using the packaged clarifier-agent-prompt verbatim
25
- - copy that full clarifier prompt text into the worker message itself rather than telling the worker to open the file
22
+ Clarification should:
23
+ - start from the original prompt plus supporting stack/context notes
24
+ - run one bounded general clarification worker using the packaged clarifier-agent-prompt verbatim—paste its complete body into the worker message, do not describe or summarize
25
+ - copy that full clarifier prompt text into the worker message itself rather than telling the worker to open the file
26
26
  - require the worker to output both `./docs/questions.md` and `../.ai/requirements-breakdown.md`
27
27
  - treat those 2 files as the clarification artifacts planning depends on
28
28
  - extract an approved requirements-and-clarification package from `../.ai/requirements-breakdown.md` plus `./docs/questions.md` before Phase 2
@@ -37,39 +37,74 @@ It must not become planning, architecture design, execution planning, or conveni
37
37
 
38
38
  Do not pad `./docs/questions.md` with AI-inferred missing requirements, speculative feature ideas, generic best-practice questions, or implementation-task prompts. It should contain only genuine business-logic ambiguities, data relationship uncertainties, boundary conditions, contradictions, and accepted resolutions from the original prompt.
39
39
 
40
+ ## Verbatim Prompt Paste Rule
41
+
42
+ Phase 1 must follow the owner-level non-negotiable verbatim prompt paste rule defined in the owner agent (`slopmachine.md` or `slopmachine-claude.md`). That rule requires: read the installed `.md` file fresh with a `read` tool call, then paste its **complete body verbatim** into the subagent message. Do not summarize, describe, shorten, paraphrase, add preface/footer, or send a file path reference.
43
+
44
+ The packaged prompt files for Phase 1 are:
45
+ - `~/slopmachine/clarifier-agent-prompt.md` — first worker
46
+ - `~/slopmachine/clarification-faithfulness-review-prompt.md` — faithfulness review worker
47
+
48
+ ## Root Metadata Gate
49
+
50
+ ## Root Metadata Gate
51
+
52
+ Before any clarification worker runs, the owner must verify task-root `./metadata.json` is populated with the exact original product prompt.
53
+
54
+ Rules:
55
+ - `./metadata.json` is the product metadata file that ships with the task. It must keep only the seven project-fact keys: `prompt`, `project_type`, `frontend_language`, `backend_language`, `database`, `frontend_framework`, and `backend_framework`.
56
+ - `prompt` must contain the original product prompt, exactly enough to anchor design, evaluation, packaging, and session lineage.
57
+ - If the user's intake text contains a prompt block followed by appended stack/context/operator notes, keep only the product prompt in `./metadata.json.prompt` and record the supporting context under `../.ai/startup-context.md` or another owner-private workflow artifact.
58
+ - If `./metadata.json.prompt` is empty, stale, summarized, or mixed with non-prompt operator context, fix it before launching the clarifier.
59
+ - Do not add accepted clarifications, requirements breakdowns, workflow state, phase state, Beads ids, session ids, evaluator paths, or owner-private notes as extra keys in root `./metadata.json`.
60
+ - Accepted clarifications belong in `./docs/questions.md`, `../.ai/requirements-breakdown.md`, the approved clarification package summary in `../.ai/metadata.json`, and Beads comments.
61
+ - `project_type` must be exactly one of the six accepted values: `backend`, `fullstack`, `web`, `android`, `ios`, or `desktop`. Do not use `api`, `spa`, `cli`, `nextjs`, `nuxt`, or any other variant. If the project type becomes clear during clarification, update `project_type` in root metadata. If it is not yet clear, leave it as an empty string until Phase 2 design confirms it.
62
+ - If project type or stack facts become clear during clarification, update only the existing seven project-fact fields in `./metadata.json`; leave unknown fields as empty strings until truthfully known.
63
+
64
+ Phase 1 cannot close if root `./metadata.json.prompt` is missing, stale, or contaminated with non-product workflow/operator text.
65
+
40
66
  ## Procedure
41
67
 
42
68
  1. **Confirm the inputs.**
43
69
  - Keep the original product prompt as the source of truth.
44
70
  - Treat supporting stack/context as supporting information unless it materially changes the product contract.
71
+ - Verify and, if needed, correct `./metadata.json.prompt` before launching the clarification worker.
72
+ - Record any metadata correction in `../.ai/metadata.json` and Beads without exposing workflow metadata to implementation sessions.
45
73
 
46
- 2. **Run the general clarification worker.**
47
- - Use the packaged `clarifier-agent-prompt.md` verbatim.
48
- - Copy the full packaged prompt body into the sent worker message.
49
- - Inject only the original prompt and supporting stack/context notes into that packet; do not prepend or append a second owner-written clarification contract, and do not tell the worker to read the packaged file itself.
74
+ 2. **Run the general clarification worker.**
75
+ - Read the installed `~/slopmachine/clarifier-agent-prompt.md` file fresh from its asset path using a `read` tool call.
76
+ - Paste that file's **complete body verbatim** into the sent worker message under the non-negotiable verbatim paste rule.
77
+ - After the packaged prompt body, inject only the original prompt and supporting stack/context notes; do not prepend or append a second owner-written clarification contract, and do not tell the worker to read the packaged file itself.
50
78
  - Require both `./docs/questions.md` and `../.ai/requirements-breakdown.md` as output.
51
79
  - After the worker returns, record both artifact paths in `../.ai/metadata.json` and add a Beads `ARTIFACT:` comment.
52
80
 
53
- 3. **Review `questions.md` and `../.ai/requirements-breakdown.md` critically.**
54
- - It should extract the core requirements from the prompt explicitly.
55
- - It should use evaluation-grade extraction depth: business goal, main flows, actors, required surfaces, modules, APIs/jobs/data, security boundaries, mock/fake boundaries, documentation/static-verifiability expectations, test/coverage expectations, frontend state obligations, and FE-BE wiring expectations when applicable.
56
- - Those requirements should be defined in enough depth that design and planning can rely on them directly.
57
- - It should explain what later planning could miss if each important requirement is not carried forward explicitly.
58
- - It should distinguish between explicit prompt requirements, implied but binding requirements, and locked safe defaults where that separation helps later planning.
81
+ 3. **Review `questions.md` and `../.ai/requirements-breakdown.md` critically.**
82
+ - `./docs/questions.md` must use the exact format defined in `clarifier-agent-prompt.md`:
83
+ - Level-1 heading `# Questions`
84
+ - Each entry starts with `### <number>. <title>` (e.g. `### 1. User roles`)
85
+ - Each entry has exactly three fields: `- Question:`, `- My Understanding:`, `- Solution:`
86
+ - No requirement IDs, traceability fields, priority fields, or evaluator-risk metadata in `questions.md`
87
+ - Reject `questions.md` if the format deviates. Patch only trivial formatting issues.
88
+ - It must extract the core requirements from the prompt explicitly.
89
+ - It must use evaluation-grade extraction depth: business goal, main flows, actors, required surfaces, modules, APIs/jobs/data, security boundaries, mock/fake boundaries, documentation/static-verifiability expectations, test/coverage expectations, frontend state obligations, and FE-BE wiring expectations when applicable.
90
+ - Those requirements must be defined in enough depth that design and planning can rely on them directly.
91
+ - It must explain what later planning could miss if each important requirement is not carried forward explicitly.
92
+ - It must distinguish between explicit prompt requirements, implied but binding requirements, and locked safe defaults where that separation helps later planning.
59
93
  - It must end with a planning-miss checklist strong enough to expose details later design/planning commonly underbuild.
60
- - It should explicitly cover hidden environment and trust-boundary assumptions when the prompt mentions or implies on-prem, intranet, offline, LAN, browser access, auth cookies/tokens, local storage, self-contained deployment, external reachability, or secure/insecure transport.
61
- - It should cover material ambiguity only.
62
- - It should preserve prompt faithfulness and avoid convenience narrowing.
94
+ - It must explicitly cover hidden environment and trust-boundary assumptions when the prompt mentions or implies on-prem, intranet, offline, LAN, browser access, auth cookies/tokens, local storage, self-contained deployment, external reachability, or secure/insecure transport.
95
+ - It must cover material ambiguity only.
96
+ - It must preserve prompt faithfulness and avoid convenience narrowing.
63
97
  - Each entry must end with a decisive solution.
64
- - It should not leak into planning or implementation structure.
98
+ - It must not leak into planning or implementation structure.
65
99
  - Reread it once against the original prompt and reject any degradation of implied scope, enforcement, workflow closure, operator/admin behavior, or core requirement meaning.
66
100
  - If the file is materially sound and only small wording, ordering, duplication, or overreach cleanup remains, patch `questions.md` directly instead of rerunning the clarifier.
67
101
  - If the file is materially weak, convenience-shaped, or still ambiguous, rerun clarification before leaving Phase 1.
68
102
 
69
- 4. **Run prompt-faithfulness review.**
103
+ 4. **Run prompt-faithfulness review.**
70
104
  - Launch one short-lived faithfulness review worker.
71
105
  - Send the original prompt, the supporting stack/context notes, `../.ai/requirements-breakdown.md`, and `./docs/questions.md` together.
72
- - Use the packaged `clarification-faithfulness-review-prompt.md` body in that message.
106
+ - Read the installed `~/slopmachine/clarification-faithfulness-review-prompt.md` file fresh from its asset path.
107
+ - Paste that file's **complete body verbatim** as the review instruction under the non-negotiable verbatim paste rule.
73
108
  - Require it to write `../.ai/clarification-faithfulness-review.md`.
74
109
  - After the review returns, record the review path and verdict in `../.ai/metadata.json` and add a Beads `ARTIFACT:` or `VERIFY:` comment.
75
110
  - If the review finds only small owner-fixable wording or coverage issues, patch `../.ai/requirements-breakdown.md` and `./docs/questions.md` directly.
@@ -106,6 +141,7 @@ Reject the clarification result if:
106
141
  ## Exit Condition
107
142
 
108
143
  `Phase 1: Clarification` is complete only when:
144
+ - `./metadata.json.prompt` contains the exact original product prompt and root metadata contains only the seven project-fact keys
109
145
  - `./docs/questions.md` exists
110
146
  - `../.ai/requirements-breakdown.md` exists
111
147
  - `../.ai/clarification-faithfulness-review.md` exists
@@ -15,6 +15,10 @@ The owner must use Claude only through the packaged live scripts for product imp
15
15
 
16
16
  ## Lane Policy
17
17
 
18
+ - Sessions are the primary deliverable. An incomplete or corrupted Claude session dataset invalidates the submission. Preserve every session file intact — never edit, rename, restructure, clean, delete, or fabricate them.
19
+ - Sessions must progress strictly forward. The lifecycle is: `develop-1` → close → `bugfix-1` → close → `test-coverage-1` → close. Never return to a closed session.
20
+ - If a lane's session becomes genuinely unrecoverable (crash with no salvageable `sid` — even after attempting tmux relaunch with the known `sid` — and transcript/session lookup also fails), start a new session in the same lane with a sequential number (`develop-2`). Sessions remain sequential and a clear timeline can be established. This is the only exception to one-session-per-lane. Paused, rate-limited, or waiting states are not unrecoverable — stay in the same session.
21
+ - A paused session is not an invitation to launch a new one. Rate limits, slow turns, shell timeouts, tmux interruptions, and recovery conditions always stay in the same lane. Only launch a new session if recovery is absolutely impossible.
18
22
  - Exactly one Claude implementation lane is active at a time. The active lane must correspond to the current phase purpose and be named in `../.ai/metadata.json` before any launch, resume, status check, or turn.
19
23
  - Every Claude session ever used must be registered in `../.ai/metadata.json` and Beads with lane name, `sid`, runtime directory, state/result files, current status, and purpose. Unregistered Claude turns are not allowed.
20
24
  - Default development lane: `develop-1`.
@@ -39,17 +43,23 @@ Claude-facing messages should be short and natural. Write like a friendly lead e
39
43
  Use wording like:
40
44
 
41
45
  ```text
42
- Here is the product brief. We're planning first, so don't write any code yet. Read this and be ready to help turn it into the design doc.
46
+ <original product prompt from metadata.json>
43
47
 
44
- <original prompt verbatim>
48
+ Don't write code yet — we'll plan this first.
45
49
  ```
46
50
 
47
- Then later:
51
+ That is the entire first message. No introduction, no context, no clarifications. Then wait for acknowledgement.
48
52
 
53
+ After acknowledgement, send:
49
54
  ```text
50
- Use the accepted clarifications below to create docs/design.md from the design template. Keep this as a design document, not an implementation checklist. If an API contract is needed, note that so we can fill docs/api-spec.md next.
55
+ Here are some clarifications I made:
56
+ <accepted clarifications and requirements>
51
57
  ```
52
58
 
59
+ Wait for acknowledgement before sending the design prompt in the next step.
60
+
61
+ Then send the design prompt with its opening adjusted (see `planning-guidance` Step 3) to reference the already-provided prompt.
62
+
53
63
  When the work has independent parts, include a natural reminder such as:
54
64
 
55
65
  ```text