theslopmachine 0.7.1 → 0.7.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -1
- package/assets/agents/developer.md +13 -13
- package/assets/agents/slopmachine-claude.md +2 -1
- package/assets/agents/slopmachine.md +2 -1
- package/assets/claude/agents/developer.md +6 -6
- package/assets/skills/clarification-gate/SKILL.md +9 -18
- package/assets/skills/claude-worker-management/SKILL.md +24 -22
- package/assets/skills/development-guidance/SKILL.md +4 -0
- package/assets/skills/final-evaluation-orchestration/SKILL.md +1 -0
- package/assets/skills/hardening-gate/SKILL.md +4 -0
- package/assets/skills/integrated-verification/SKILL.md +2 -0
- package/assets/skills/planning-guidance/SKILL.md +2 -0
- package/assets/skills/scaffold-guidance/SKILL.md +1 -0
- package/assets/skills/submission-packaging/SKILL.md +2 -0
- package/assets/skills/verification-gates/SKILL.md +6 -0
- package/assets/slopmachine/scaffold-playbooks/docker-shared-contract.md +4 -0
- package/assets/slopmachine/templates/AGENTS.md +1 -0
- package/assets/slopmachine/templates/CLAUDE.md +1 -0
- package/package.json +1 -1
- package/src/init.js +7 -1
package/README.md
CHANGED
|
@@ -57,23 +57,23 @@ Do not introduce convenience-based simplifications, `v1` reductions, future-phas

 - the original prompt explicitly allows it
 - the approved clarification explicitly allows it
-- the
+- the project lead explicitly instructs it in the current session

 If a simplification would make implementation easier but is not explicitly authorized, keep the full prompt scope and plan the real complexity instead.

 When accepted planning artifacts already exist, treat them as the primary execution contract.

 - read the relevant accepted plan section before implementing the next slice
-- do not wait for the
-- treat
+- do not wait for the project lead to restate what is already in the plan
+- treat project-lead follow-up prompts mainly as narrow deltas, guardrails, or correction signals

-When the
+When the project lead asks for planning without coding yet:

 - produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- if the
-- when the
+- if the project lead asks you to write planning artifacts, fill them densely enough that later implementation can mostly execute by following the plan rather than inventing new structure
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat

 ## Execution Model

@@ -91,7 +91,7 @@ When the owner asks for planning without coding yet:
 - keep the repo self-sufficient and statically reviewable through code plus `README.md`; do not rely on runtime success alone to make the project understandable
 - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
-- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the
+- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming the project lead will catch inconsistencies later
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
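The literal strings named in the README contract above can be checked mechanically. The following is a minimal Python sketch and not part of the package: the function name `missing_readme_strings`, the `has_auth` flag, and the lowercase project-type labels are assumptions made for illustration. It looks only for the exact strings the contract quotes and does not attempt to verify section layout or demo credentials.

```python
# Hypothetical helper: checks only the literal strings quoted in the contract.
DOCKER_TYPES = {"backend", "fullstack", "web"}  # project types the contract names


def missing_readme_strings(readme_text: str, project_type: str, has_auth: bool) -> list[str]:
    """Return the required literal strings that are absent from the README text."""
    missing = []
    if project_type in DOCKER_TYPES:
        # Canonical startup command plus the exact legacy compatibility string.
        for required in ("docker compose up --build", "docker-compose up"):
            if required not in readme_text:
                missing.append(required)
    if not has_auth and "No authentication required" not in readme_text:
        # Projects without authentication must carry this exact statement.
        # Demo credentials for authenticated projects are not checked here.
        missing.append("No authentication required")
    return missing
```

A caller would pass the README contents and act on any returned strings before reporting work as ready.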
@@ -170,15 +170,15 @@ Before reporting work as ready, run this preflight yourself:
 - consistency: do code, docs, route contracts, security notes, and runtime/test commands agree?
 - flow completeness: are the user-facing and operator-facing flows touched by this work actually covered end to end?
 - security and permissions: are auth, RBAC, object-level checks, sensitive actions, and audit implications handled where relevant?
-- verification: did you run the strongest targeted checks that are appropriate without using
-- reviewability: can the
-- test-coverage specificity: if the
+- verification: did you run the strongest targeted checks that are appropriate without using lead-only broad gates?
+- reviewability: can the project lead review this work by reading the changed files and a small number of directly related files?
+- test-coverage specificity: if the project lead asked you to help shape coverage evidence, does it map concrete requirement/risk points to planned test files, key assertions, coverage status, and real remaining gaps rather than generic categories?

 If any answer is no, fix it before replying or call out the blocker explicitly.

 When you make an assumption, keep it prompt-preserving by default. If an assumption would reduce scope, mark it as unresolved instead of silently locking it in.

-If the
+If the project lead asks you to help shape test-coverage evidence, make it acceptance-grade on first pass:

 - one explicit row or subsection per requirement/risk cluster
 - planned test file or test layer named concretely
@@ -207,9 +207,9 @@ Default reply shape for ordinary slice completion, hardening, and fix responses:
 3. exact verification commands and results
 4. real unresolved issues only

-Keep the reply compact. Point to the exact changed files and the narrow supporting files the
+Keep the reply compact. Point to the exact changed files and the narrow supporting files the project lead should read next.

-Use the larger reply shape only when the
+Use the larger reply shape only when the project lead explicitly asks for a deeper mapping or when you are delivering a first-pass planning/scaffold artifact that genuinely needs it:

 1. `Changed files` — exact files changed
 2. `What changed` — the concrete behavior/contract updates in those files
@@ -237,7 +237,7 @@ When the first develop developer session begins in `P2`, start it in this exact
 2. send the original prompt and a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction
 3. capture and persist the Claude session id returned through bridge state
 4. form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-5. send a compact second
+5. send a compact second planning-direction message through that same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, your initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there in that same Claude session

 Do not reorder that sequence.
@@ -347,6 +347,7 @@ When talking to the Claude developer worker:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - use the canonical prompt-shape discipline from `claude-worker-management`: every substantive turn should make the current boundary, expected outcomes, required evidence, disallowed shortcuts, and stop boundary unmistakable
 - default to one bounded engineering objective per Claude turn; split cross-boundary work into separate turns instead of hoping Claude infers the boundary correctly
 - never use bare continuation prompts such as `continue`, `next`, `keep going`, or `fix it` when the turn materially changes what acceptance depends on
@@ -222,7 +222,7 @@ When the first develop developer session begins in `P2`, use this planning hands
 1. send the original prompt and tell the developer to read it carefully, not plan yet, and wait for clarifications and planning direction
 2. wait for the developer's first reply
 3. before the second message, form your own initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
-4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second
+4. send the approved clarification content, your initial planning view, and the explicit plain-language planning brief as the second planning-direction message in that same session; that brief should summarize the prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky areas that planning must resolve
 5. only then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions, with the planning artifacts filled densely enough that later implementation mostly follows the accepted plan instead of inventing new structure
 6. continue with planning from there

@@ -338,6 +338,7 @@ When talking to the developer:
 - when backend or fullstack APIs are relevant, explicitly require progress on endpoint inventory, true no-mock HTTP coverage for important `METHOD + PATH` surfaces, and honest classification of mocked or indirect tests
 - when README compliance is relevant, explicitly require the strict audit sections: project type, startup instructions, access method, verification method, and demo credentials or the exact statement `No authentication required`
 - during ordinary development you may allow fast local iteration, but before development closes and before hardening closes require cleanup of local-only setup traces so the delivered runtime and broad test contract is Docker-contained and reviewable
+- speak to the developer like a human project manager or technical lead who cares about the project outcome; do not sound like workflow software or an orchestration relay
 - do not re-dump the entire plan, but do enumerate the exact subset of plan-backed outcomes that must now be delivered
 - when the next slice is already described in the accepted plan, tell the developer to use the relevant accepted plan section and only add the narrow delta, guardrail, or review concern for that slice
 - when 2 or 3 independent items can move at once, explicitly authorize parallel execution and name the separate branch contracts instead of serializing them into one vague request
@@ -50,14 +50,14 @@ Do not narrow scope for convenience.
 - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
 - update `README.md` when behavior or run/test instructions change
 - do not touch workflow or rulebook files such as `CLAUDE.md` unless explicitly asked
-- when the
+- when the project lead says to plan without coding yet, produce planning artifacts and stop
 - when planning, produce an exhaustive, section-addressable implementation plan rather than a high-level summary
 - prefer writing almost all important implementation decisions down now instead of deferring them to coding time
 - make unresolved items rare, narrow, and explicit
-- when the
-- planning-only deliverables inside the repo should be limited to `README.md` unless the
-- when the
-- do not continue into extra follow-on work that the
+- when the project lead asks for planning artifacts, prefer putting the real planning depth into the requested planning files rather than leaving the important detail only in chat
+- planning-only deliverables inside the repo should be limited to `README.md` unless the project lead explicitly asks for another in-repo artifact
+- when the project lead says to finish the scaffold and not start feature implementation yet, stop before starting development work
+- do not continue into extra follow-on work that the project lead did not ask for
 - keep `README.md` compatible with the strict audit contract as the project matures: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
 - for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
 - for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
@@ -121,7 +121,7 @@ Selected-stack defaults:
 - be direct and technically clear
 - report what changed, what was verified, and what still looks weak
 - if a problem needs a real fix, fix it instead of explaining around it
-- when the
+- when the project lead asks for a bounded deliverable, end with a concise summary of what was completed and what remains
 - when you write or update files, end with:
 - `FILES_CHANGED:` followed by the exact repo-local file paths changed
 - `NEXT_STEP:` followed by the next concrete engineering step or remaining blocker when useful
@@ -133,12 +133,12 @@ Its primary target is requirements ambiguity from the original prompt.

 Prefer questions about missing or unclear product behavior, actor expectations, workflow requirements, business rules, scope boundaries, output expectations, and other prompt-level ambiguities.

-Each entry should
+Each entry should use this exact structure:

-1.
-2.
-3.
-4.
+1. a numbered clarification heading
+2. `Question:`
+3. `My Understanding:`
+4. `Solution:`

 Keep the file narrow and explicit.

@@ -156,19 +156,10 @@ Do not use `questions.md` for:
 Preferred entry shape:

 ```md
-
-
-
-<the
-
-### Interpretation
-<how it was interpreted>
-
-### Decision
-<the chosen resolution or safe default>
-
-### Why this is reasonable
-<brief justification tied to prompt faithfulness>
+### 1. <short clarification title>
+- Question: <the exact ambiguity or missing detail>
+- My Understanding: <how it was interpreted and why this needed to be locked>
+- Solution: <the chosen resolution or safe default>
 ```

 If nothing material was unclear, still create `questions.md` and keep it minimal rather than inventing content.
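The new entry shape is regular enough to lint. Below is a minimal Python sketch, assuming entries follow the `### N. <title>` heading and the three `- Field:` bullets shown above; the function name `check_entries` and the parsing approach are illustrative assumptions, not something the package ships.

```python
import re

# Field labels come from the entry shape above; the parsing itself is an assumption.
REQUIRED_FIELDS = ("Question:", "My Understanding:", "Solution:")
HEADING = re.compile(r"^### (\d+)\. .+$")


def check_entries(text: str) -> list[str]:
    """Return a list of problems found in questions.md-style text."""
    problems = []
    entries = []  # each item: (heading number, list of stripped body lines)
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            entries.append((int(match.group(1)), []))
        elif entries:
            entries[-1][1].append(line.strip())
    for number, body in entries:
        for field in REQUIRED_FIELDS:
            # Every entry needs one bullet per required field label.
            if not any(item.startswith(f"- {field}") for item in body):
                problems.append(f"entry {number}: missing {field}")
    return problems
```

An empty or minimal file yields no problems, which matches the instruction to keep `questions.md` minimal when nothing material was unclear.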
@@ -20,7 +20,7 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - do not use the OpenCode `developer` subagent for implementation work in the `slopmachine-claude` path
 - do not read Claude transcript files as the normal communication channel
 - communicate with the Claude worker through the packaged live bridge scripts in `~/slopmachine/utils/`
-- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each
+- use `claude_live_launch.mjs` once per lane and `claude_live_turn.mjs` for each message into that lane
 - set the Claude live runtime settings default `agent` to `developer` so the lane stays on the intended system prompt even if the session is resumed or inspected through Claude-native controls
 - treat bridge `state.json` as the durable control-plane truth for lane status, routing, and Claude session identity
 - treat bridge `result.json` as the semantic source of truth after each completed turn
@@ -32,9 +32,9 @@ Use this skill whenever `slopmachine-claude` needs to launch, inspect, or messag
 - launch the live lane with `--dangerously-skip-permissions` so the worker does not stall on routine file-edit permission prompts inside the bounded repo
 - when Claude uses internal task fan-out and the environment allows explicit agent selection, prefer the installed `developer` agent for implementation-capable branches so the same engineering standard applies across those branches
 - there is no repo-controlled guarantee that every Claude helper subagent globally reuses the `developer` prompt, so keep critical implementation in the main developer lane or in explicitly developer-scoped helper branches rather than relying on unspecified built-in helper behavior
-- make every
-- do not send vague
-- each substantive
+- make every project-manager-to-Claude turn boundary-controlled, reviewable, and explicit about what must happen now versus later
+- do not send vague prompts such as `continue`, `keep going`, `handle the rest`, or `fix it` without a precise bounded contract
+- each substantive message should state the current engineering boundary, exact expected outcomes for that turn, the evidence required back, the important shortcuts that are not acceptable, and the stopping point
 - default to one bounded engineering objective per owner turn; if a request would naturally cross planning, scaffold, development, or gate-review boundaries, split it into separate turns

 ## Lane launch rule
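The rule against vague prompts can be approximated with a simple guard before a message is sent. This is a minimal Python sketch under stated assumptions: the phrase list is taken from the phrases quoted in this diff, while the name `is_bare_continuation` and the choice to treat only whole-message matches as vague are illustrative and not part of the package.

```python
# Phrases quoted in the skill text; matching the whole prompt is this sketch's assumption.
VAGUE_PROMPTS = {"continue", "next", "keep going", "handle the rest", "fix it"}


def is_bare_continuation(prompt: str) -> bool:
    """True when the entire prompt is just one of the vague phrases."""
    # Ignore surrounding whitespace, trailing punctuation, and letter case.
    normalized = prompt.strip().strip(".!").lower()
    return normalized in VAGUE_PROMPTS
```

A prompt that starts with one of these phrases but goes on to state a bounded contract is not flagged, since the rule only forbids them "without a precise bounded contract".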
@@ -82,7 +82,7 @@ For all later turns in the same bounded developer slot:
 printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-dir <dir> --timeout-ms <turn-timeout>
 ```

-- inject exactly one
+- inject exactly one message at a time into the idle live lane
 - pass the prompt directly to the wrapper through stdin as the primary input path instead of requiring an owner-side prompt file
 - wait for `Stop` or `StopFailure` before sending the next message
 - do not bypass the bridge by calling the channel HTTP endpoint directly from owner logic
@@ -90,7 +90,7 @@ printf '%s' "$PROMPT" | node ~/slopmachine/utils/claude_live_turn.mjs --runtime-

 ## Turn-preflight checklist

-Before sending any
+Before sending any message into the live lane:

 1. read bridge `state.json` and confirm the lane is the intended lane and currently `idle`
 2. read the latest bridge `result.json` when it exists and review the last normalized Claude answer before composing the next turn
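The idle check and the stdin-based turn command can be combined in a small wrapper. The following Python sketch is illustrative only: it assumes `state.json` sits directly in the runtime directory and exposes the lane status under a `status` key, neither of which this diff confirms. The `node` command line mirrors the documented `claude_live_turn.mjs` invocation; the function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def lane_is_idle(runtime_dir: str) -> bool:
    """Read bridge state.json; its location and the `status` key are assumptions."""
    state = json.loads((Path(runtime_dir) / "state.json").read_text())
    return state.get("status") == "idle"


def turn_command(runtime_dir: str, timeout_ms: int) -> list[str]:
    """Build the documented claude_live_turn.mjs invocation as an argument list."""
    script = str(Path.home() / "slopmachine" / "utils" / "claude_live_turn.mjs")
    return ["node", script, "--runtime-dir", runtime_dir, "--timeout-ms", str(timeout_ms)]


def send_turn(runtime_dir: str, prompt: str, timeout_ms: int) -> subprocess.CompletedProcess:
    """Send exactly one message into the lane, refusing unless the lane is idle."""
    if not lane_is_idle(runtime_dir):
        raise RuntimeError("lane is not idle; wait for the current turn to stop")
    # The prompt goes to the wrapper on stdin, mirroring the printf pipe.
    return subprocess.run(turn_command(runtime_dir, timeout_ms), input=prompt,
                          text=True, capture_output=True)
```

Passing the prompt through `input=` keeps stdin as the primary input path, so no prompt file is needed on the sending side.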
@@ -99,12 +99,12 @@ Before sending any owner message into the live lane:
 5. define the turn contract before writing the prompt: what Claude must produce now, what evidence it must return now, and exactly where it must stop

 If the stop boundary is fuzzy, the turn is too broad.
-If the
+If the message would span multiple major boundaries, split it.
 Do not send the next turn until the prior turn has been reviewed and either accepted, corrected, or explicitly rerouted.

-## Canonical
+## Canonical lead-message contract

-For substantive live-lane turns, write the
+For substantive live-lane turns, write the message in natural engineering language but make sure it includes all of these ingredients:

 - `Context snapshot`: the current accepted state and only the fresh deltas that matter now
 - `Contract anchor`: the relevant accepted plan sections, clarified decisions, or concrete evaluator findings that define the work
@@ -122,16 +122,18 @@ When the turn intentionally uses internal parallel fan-out, also include:
 - `Fan-in rule`: how Claude should merge the branch results and what integrated verification must run before stopping

 Keep the wording natural. Do not turn every prompt into a rigid template dump.
+The actual message should read like it came from a human project manager or technical lead who is invested in the project, not from workflow software.
+Do not use obvious automation phrasing such as `owner`, `workflow`, `phase`, `session slot`, `contract anchor`, or `reply contract` in the message sent to Claude unless the user explicitly wants that style.
 But do make the contract mechanically obvious enough that Claude cannot plausibly misunderstand what acceptance depends on.

 ## Canonical prompt shapes

 ### Planning-start shape

-For the second
+For the second planning-direction message in the first `develop` lane and for other explicit planning-entry turns:

 - inline the approved clarification content and requirements-ambiguity resolutions directly in the message
-- include the
+- include the initial planning view so Claude refines a direction instead of inventing one from zero
 - restate prompt-critical requirements, actors, required surfaces, locked defaults, explicit non-goals, and risky areas in plain engineering language
 - say clearly that the worker should produce an exhaustive, section-addressable implementation plan and must not start coding yet
 - require dense planning artifacts, especially `../docs/design.md`, with explicit treatment of modules, business rules, state machines, permissions, validation, verification strategy, checkpoints, and definition of done when applicable
@@ -164,7 +166,7 @@ For ordinary implementation turns:
 - name the exact slice, user/admin actor path, modules, or surfaces to complete now
 - itemize the expected outcomes for happy path, failure path, and auth/ownership/validation behavior when those dimensions matter
 - require targeted local verification tied back to those expected outcomes
-- explicitly prohibit
+- explicitly prohibit broad verification commands that are reserved for later gate checks and unrelated follow-on work
 - when the slice can truly be parallelized, name the separate branch contracts explicitly instead of asking Claude to infer them
 - say to stop after this slice and report the exact changed files plus exact verification results

@@ -190,7 +192,7 @@ When resuming a long-lived lane:

 For evaluator-driven remediation inside a `bugfix-N` session opened by a `partial pass` audit:

-- lead with the concrete evaluator finding or
+- lead with the concrete evaluator finding or reviewed issue statement
 - state the expected fix and the affected non-regression surfaces
 - require proof for the issue path plus the nearby happy path and security/ownership boundary when relevant
 - say to stop after the named issue set rather than reopening unrelated refactors
@@ -199,7 +201,7 @@ For evaluator-driven remediation inside a `bugfix-N` session opened by a `partia

 Do not do these:

-- send `continue`, `next`, or `keep going` as a substantive
+- send `continue`, `next`, or `keep going` as a substantive prompt
 - ask for planning and implementation in the same turn unless that mixed boundary is intentional and explicitly stated
 - ask for multiple gate exits in one turn
 - let Claude decide its own stopping point implicitly
@@ -262,17 +264,17 @@ When the first `develop` slot begins in planning:
 1. launch the live `develop` lane if it is not already running
 2. send the original prompt plus a plain instruction to read it carefully, not plan yet, and wait for clarifications and planning direction through the bridge
 3. store the Claude session id from bridge `state.json`
-4. form an initial
-5. send a compact second
+4. form an initial planning view covering the likely architecture shape, obvious risks, and the major design questions that still need resolution
+5. send a compact second message through the same live lane that directly includes the approved clarification content, the requirements-ambiguity resolutions, that initial planning view, the explicit plain-language planning brief summarizing prompt-critical requirements, actors, required surfaces, constraints, explicit non-goals, locked defaults, and risky planning areas, and a direct request for the implementation plan plus major risks or assumptions
 6. continue the planning conversation in that same Claude session

 Do not merge those two first messages.
 Do not ask for a plan in the first message.

-Preferred second
+Preferred second planning-direction message shape:

-- inline the approved clarification content and the requirements-ambiguity resolutions directly in the
-- include the
+- inline the approved clarification content and the requirements-ambiguity resolutions directly in the message
+- include the initial planning view so planning is refined collaboratively rather than invented from zero
 - add any short delta notes that are not already captured in that inlined summary
 - express the current boundary in plain engineering language and then ask for an exhaustive, section-addressable implementation plan plus major risks or assumptions
 - require the plan to fill the planning artifacts densely, especially `../docs/design.md`, with explicit sections for actors, success paths, modules, business rules, state machines, permissions, validation, test strategy, checkpoints, and definition of done when those dimensions matter
@@ -280,7 +282,7 @@ Preferred second owner message shape:
 - say explicitly that coding must not start yet and that the response should stop after the planning artifacts and summary are complete

 Do not tell the developer worker to read files outside `repo/`.
-If
+If project-lead artifacts outside `repo/` matter, restate their content directly in the message instead of passing file paths.
 Do not mention session names, slot labels, or workflow phase labels to the developer worker.

 ### `bugfix-N` orientation handshake
@@ -288,7 +290,7 @@ Do not mention session names, slot labels, or workflow phase labels to the devel
 When a fresh `partial pass` evaluation result opens the next remediation lane:

 1. launch a fresh live Claude developer lane for the next `bugfix-N` label
-2. use the first
+2. use the first message only to orient that session to the repo and the current delivered state
 3. make clear in plain engineering language that follow-up work will be focused remediation against evaluator findings
 4. wait for the first response and store the Claude session id from bridge `state.json`
 5. only after that orientation exchange, continue the same `bugfix-N` live lane with the first evaluator-driven issue list
@@ -400,7 +402,7 @@ Do not advance the workflow based only on Bash success if bridge files and metad
 - if the bridge reports `blocked` because of `claude_usage_limit`, treat that as an automatic wait-and-resume path rather than a handoff-stop condition unless the wait or resume path itself fails
 - if the saved live lane cannot continue, do not silently create a replacement session unless the workflow explicitly chooses a controlled replacement
 - if a replacement session is required, record the handoff clearly in metadata and tracker comments
-- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal
+- keep hook logs and transcript pointers for debugging, but do not surface raw bridge artifacts back into normal developer-facing prompts unless debugging is explicitly needed

 ## Rate-limit handling

@@ -65,6 +65,7 @@ Use this skill during `P4 Development` before prompting the developer.
 - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
 - explain behavior changes clearly enough that the owner can keep parent-root `../docs/design.md`, `../docs/api-spec.md`, and `../docs/test-coverage.md` accurate when they apply
 - before reporting development complete, remove or correct local-only setup instructions, host-only dependency assumptions, and other fast-iteration traces that should not survive into the final Docker-contained delivery
+- before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
 - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
@@ -80,8 +81,11 @@ Use this skill during `P4 Development` before prompting the developer.
 - if the local toolchain is missing, install or enable the local targeted test tooling; do not fall back to Docker, `./run_tests.sh`, Playwright, or other broad-gate tooling during ordinary slice work
 - fast local iteration is allowed during development even when the final delivered runtime and broad verification contract must be Docker-contained
 - do not let temporary local tooling or host-only setup assumptions leak into the final README, wrapper scripts, or declared delivery contract
+- local verification is for speed during development; the README-documented runtime and broad test commands are the final contract that must pass at the later gate when they are part of the README promise
+- do not add runtime/test scripts, Compose services, or Docker entrypoints that shell out to host package managers or assume host-installed toolchains for the final delivered path; move those dependencies into Dockerfiles or container build definitions before the slice is considered complete
 - do not run browser E2E, Playwright, full test suites, `./run_tests.sh`, or Docker runtime commands during ordinary development slices
 - for frontend-bearing projects, rely on targeted local tests such as unit, component, route, page, or state-focused tests instead of browser E2E during ordinary slice work
+- for `fullstack` and `web` projects, treat frontend unit tests as a real expected deliverable rather than optional polish; do not rely on package manifests or tooling presence as a substitute for real test files
 - for mobile and desktop projects, rely on targeted local non-E2E verification during ordinary slice work rather than broad checkpoint commands
 - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
 - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
@@ -138,6 +138,7 @@ Inside a `partial pass` audit's bugfix loop:
 - if the report finds any issue, treat that as blocking `P7` completion
 - route those issues to the currently active recoverable developer session; prefer the most recently used developer session, which will usually be `bugfix-2`
 - require fixes plus concrete verification evidence from that developer session
+- after the fixes land, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands before the next static coverage/README rerun and treat failures as unresolved issues
 - after the fixes land, run a fresh new coverage/README audit again and replace the old report
 - allow at most 3 remediation attempts for this final coverage/README audit
 - if the report is still not clean after the third remediation attempt, stop the retry loop, preserve the latest `../.tmp/test_coverage_and_readme_audit_report.md`, and treat that as the final evidence carried forward
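
The bounded retry loop described in the hunk above can be sketched roughly as follows. This is only an illustration under stated assumptions: `runFinalAuditLoop`, `runAudit`, `remediate`, and `runContractCommands` are hypothetical names that do not appear in the package, and folding command failures into the next report is one possible reading of "treat failures as unresolved issues".

```javascript
// Hypothetical sketch of the final coverage/README audit loop.
// All callbacks are caller-supplied; none of them come from the package.
function runFinalAuditLoop({ runAudit, remediate, runContractCommands, maxAttempts = 3 }) {
  // Each fresh audit result replaces the previous report.
  let report = runAudit();
  let attempts = 0;

  while (report.issues.length > 0 && attempts < maxAttempts) {
    // Route the open issues to the active developer session.
    remediate(report.issues);
    attempts += 1;

    // Run the README-documented commands before the static rerun.
    // Assumption: failures are merged into the next report as open issues.
    const commandFailures = runContractCommands();
    const rerun = runAudit();
    report = { issues: [...commandFailures, ...rerun.issues] };
  }

  // After the last allowed attempt, the latest report is carried forward as-is.
  return { report, attempts, clean: report.issues.length === 0 };
}
```

With the default `maxAttempts` of 3 the loop stops after the third remediation even if issues remain, which mirrors the "stop the retry loop and keep the latest report" rule.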
@@ -51,6 +51,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in `README.md` and reflected in external docs when they exist
 - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
 - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
+- for `fullstack` and `web` projects, explicitly determine whether frontend unit tests are PRESENT or MISSING under the strict audit criteria, and treat missing or insufficient frontend unit tests as a critical gap before `P7`
 - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
 - verify that missing failure handling is not being hidden behind fake-success behavior
 - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
@@ -58,6 +59,8 @@ Hardening should treat these as the main review buckets before final evaluation
 - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
 - enforce env-file discipline during hardening
 - run documentation verification against the real codebase and runtime behavior, not just document existence
+- if `README.md` declares containerized runtime or broad test commands, verify that the final delivered output really supports those exact commands and that the docs do not overpromise beyond what the repo actually does
+- verify that every dependency needed by the README-documented `docker compose up --build` and `./run_tests.sh` paths is declared in Dockerfiles or other repo-controlled container build definitions rather than relying on host-installed packages or runtimes
 - audit README compliance against the strict post-bugfix README review shape:
 - project type near the top
 - startup instructions
@@ -67,6 +70,7 @@ Hardening should treat these as the main review buckets before final evaluation
 - architecture and workflow clarity
 - for backend, fullstack, and web projects, verify the README still documents the canonical `docker compose up --build` contract while also containing the exact legacy compatibility string `docker-compose up` for the strict README audit
 - verify that fast local-iteration traces have been cleaned up before hardening closes: no lingering README dependence on `npm install`, `pip install`, `apt-get`, host-only runtime setup, or manual DB setup for the final delivered flow
+- before hardening closes, if the README-documented final contract includes `docker compose up --build` and/or `./run_tests.sh`, require those exact commands to pass or explicitly fail the phase
 - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
 - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
 - make sure the system is genuinely reviewable and reproducible
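
As a rough illustration of the local-iteration cleanup check in the hunk above, a README could be scanned for the host-setup commands the rule names. This is a heuristic sketch, not part of the package: `findHostSetupTraces` and `HOST_SETUP_PATTERNS` are hypothetical names, and a match only flags a line for human review (a README may legitimately mention `npm install` when describing what a Dockerfile does).

```javascript
// Hypothetical helper: flag README lines that mention host-side setup commands.
// The pattern list covers only the three commands named in the rule above.
const HOST_SETUP_PATTERNS = [/\bnpm install\b/, /\bpip install\b/, /\bapt-get\b/];

function findHostSetupTraces(readmeText) {
  return readmeText
    .split('\n')
    // Keep 1-based line numbers so findings can be traced back to the README.
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => HOST_SETUP_PATTERNS.some((pattern) => pattern.test(text)));
}
```

An empty result does not prove the README is clean; host-only runtime setup or manual DB setup, which the rule also names, would need separate checks.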
@@ -33,6 +33,7 @@ Once a failure class is known:
 - for applicable UI-bearing work, this owner-run phase may use the selected stack's platform-appropriate UI/E2E tool for the affected flows, capture screenshots or equivalent artifacts, and verify the UI behavior and quality directly
 - verify requirement closure, not just feature existence
 - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
+- verify the delivered runtime and broad-test behavior against `README.md`; if the README says a command is how the project should be run or verified, treat that command as part of the real external contract
 - verify end-to-end flow behavior where the change affects real workflows
 - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
 - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
@@ -51,6 +52,7 @@ Once a failure class is known:
 - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
 - when integrated verification repeatedly finds the same avoidable failure class, treat that as evidence that earlier slice execution or slice-close acceptance must become more system-aware in future runs
 - before closing the phase, verify the delivered startup path is genuinely runnable, the documented tests really execute, frontend behavior is usable when applicable, UI quality is acceptable, core running logic is complete, and Docker startup works when Docker is the runtime contract
+- before closing the phase, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands here as part of the final integrated proof for the phase
 - tighten parent-root `../docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
 - when security-bearing behavior changes, tighten parent-root `../docs/design.md` and `../docs/api-spec.md` as needed so enforcement points and mapped tests stay accurate
 - when frontend-bearing behavior changes, tighten `README.md` plus parent-root `../docs/design.md` as needed so key pages, interactions, and required UI states stay accurate
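
The "run those exact commands if the README documents them" condition in the hunk above could be decided with a small lookup like the one below. It is a minimal sketch under assumptions: `documentedContractCommands` and `CONTRACT_COMMANDS` are hypothetical names, and plain substring matching stands in for whatever detection the workflow owner actually performs.

```javascript
// Hypothetical helper: report which contract commands a README documents.
// Only the two commands named in the rule above are considered.
const CONTRACT_COMMANDS = ['docker compose up --build', './run_tests.sh'];

function documentedContractCommands(readmeText) {
  // Exact substring match, so a substitute path does not count as the contract.
  return CONTRACT_COMMANDS.filter((command) => readmeText.includes(command));
}
```

The returned list is the set of commands the phase would then need to run verbatim; an empty list means the README makes no such promise.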
@@ -163,6 +163,7 @@ Selected-stack defaults:
 - `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
 - `./run_tests.sh` must be able to run on a clean Linux VM that only has Docker and curl available by default
 - do not require host package managers, host language runtimes, or host test tooling for the broad test path unless the stack absolutely forces it and the exception is explicitly justified
+- define all runtime and broad-test dependencies in Dockerfiles, image build stages, or other repo-controlled container build definitions; do not rely on hidden host packages even when the developer machine happens to have them
 - `./run_tests.sh` must prepare or install anything required inside its own controlled execution path when that setup is needed for a clean environment
 - for web projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
 - when host-level setup would otherwise be required, prefer a Dockerized `./run_tests.sh` path even outside traditional web stacks so the broad verification remains portable
@@ -210,6 +211,7 @@ Selected-stack defaults:
 - for backend or fullstack projects, explicitly plan coverage for 401, 403, 404, conflicts or duplicate submission when relevant, object-level authorization, tenant or user isolation, sensitive-log exposure, and pagination/filter/sort when those behaviors exist
 - for frontend-bearing projects, explicitly plan a layered frontend test story when UI state or routing is material: unit, component, page or route integration, and E2E where applicable
 - for non-trivial frontend projects, explicitly plan a frontend test layer beyond runtime-only confidence: component, page, route, or state-focused tests when UI state complexity is meaningful
+- for `fullstack` and `web` projects, explicitly plan real frontend unit tests and make it possible for later audit output to state `Frontend unit tests: PRESENT` with direct file-level evidence rather than inference
 - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable, but treat Playwright as a real verified dependency rather than a decorative default
 - for mobile work, plan Jest plus React Native Testing Library as the local default test layer and add a platform-appropriate mobile UI/E2E tool when real device-flow proof is needed
 - for desktop work, plan a local desktop test runner plus Playwright Electron support or another platform-appropriate desktop UI/E2E tool when real window-flow proof is needed
@@ -90,6 +90,7 @@ At scaffold time, do not require:
 - create `./run_tests.sh` during scaffold for every project
 - create `./run_app.sh` during scaffold for non-web platforms when it helps expose the host-side or platform-specific local flow, but keep `docker compose up --build` and containerized `./run_tests.sh` as required baseline commands
 - if the project has database dependencies, create `./init_db.sh` during scaffold as the only project-standard database initialization path
+- define the runtime and broad-test dependency set in Dockerfiles or other repo-controlled container build definitions during scaffold; do not assume host package managers, host SDKs, or host language runtimes beyond Docker and documented baseline prerequisites
 - make the scaffold handoff compact and checklist-driven rather than a long narrative dump

 ## Minimal real test floor
@@ -64,6 +64,7 @@ No screenshots are required as packaging artifacts.
 - ensure `README.md` matches the delivered codebase, functionality, runtime steps, test steps, main repo contents, and important new-developer information, and stays friendly to a junior developer
 - ensure `README.md` also describes the delivered architecture at an implementation-review level rather than only listing commands
 - ensure `README.md` remains the primary in-repo documentation surface
+- treat `README.md` as the final public output format for runtime and broad test expectations: the packaged repo must comply exactly with the commands and constraints it documents
 - verify no repo-local file depends on parent-root docs or sibling workflow artifacts for startup, build/preview, configuration, static review, or basic project understanding
 - if the project uses mock, stub, fake, interception, or local-data behavior, ensure `README.md` discloses that scope accurately and does not imply undisclosed real integration
 - if mock or interception behavior is enabled by default, ensure `README.md` says so clearly
@@ -141,6 +142,7 @@ After those steps:
 - do one final package review before declaring packaging complete
 - confirm the package is coherent as a delivered project, not just a working repo snapshot
 - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, make sure the final package review uses those exact commands rather than a substitute path
 - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
 - if packaging reveals a real defect or missing artifact, fix it before closing the phase
 - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied
@@ -26,6 +26,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
 - do not require the README to carry a full API catalog
 - require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+- treat the README as the final public contract for runtime and broad-test behavior: if it documents a runtime command or a broad test command, the delivered output must satisfy that exact contract
 - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
 - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
 - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
@@ -47,6 +48,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - for Android, mobile, desktop, and iOS-targeted projects, allow `./run_app.sh` as an additional platform helper but not as a replacement for the required Docker command
 - require `./run_tests.sh` to be self-sufficient enough to run from a clean Linux VM that only has Docker and curl available by default
 - do not accept a broad test path that depends on host package managers or preinstalled host language runtimes when Docker can provide the execution environment instead
+- do not accept Docker runtime/test paths that rely on undeclared host packages, SDKs, compilers, CLIs, or language runtimes; all such dependencies must be defined in Dockerfiles or other repo-controlled container build definitions
 - for web projects, require `./run_tests.sh` to be the Dockerized broad test path used only for the limited broad verification moments rather than as the ordinary development verification path
 - when host-level setup would otherwise be required, prefer a Dockerized `./run_tests.sh` path even outside traditional web stacks so the broad verification remains portable
 - for non-web projects, require `./run_tests.sh` to remain containerized and usable as the platform-equivalent broad test path used for final broad verification
@@ -188,11 +190,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
 - when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
 - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
+- integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; when `README.md` documents `docker compose up --build` and/or `./run_tests.sh`, those exact commands are expected here as part of the final external-contract proof
 - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
 - before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
 - before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
 - integrated verification completion requires explicit full-system evidence before the phase can close
 - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- before leaving development, hardening, or packaging, if `README.md` documents a containerized final runtime or broad test command, require those exact commands to be run at the appropriate final gate and verify that the README still matches the real output
 - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
 - mobile and desktop integrated verification must include the selected stack's platform-appropriate UI/E2E coverage for every major user flow when UI-bearing flows are material
 - for Electron or other Linux-targetable desktop projects, integrated verification should use the Dockerized desktop build/test path plus headless UI/runtime verification artifacts
@@ -207,9 +211,11 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
 - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
 - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
+- before `P7`, for `fullstack` and `web` projects, require an explicit frontend unit-test verdict backed by direct file-level evidence; if frontend unit tests are missing or insufficient, treat that as a critical gap
 - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
 - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
 - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
+- before leaving `P7`, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered external contract, run those exact commands on the final state and require them to pass before moving to `P8`
 - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
 - before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence

@@ -49,6 +49,7 @@ That proof differs by family, but it must always be **meaningful**:
 At minimum, `docker compose up --build` must prove all of the following:

 - images build from source in the repo
+- all system packages, language runtimes, CLIs, compilers, and SDKs required by the runtime path are declared in Dockerfiles or image build stages rather than assumed from the host
 - declared runtime or support services start without hidden host setup
 - readiness can be observed through healthchecks, predictable logs, or explicit successful artifact production
 - the stack reaches a meaningful healthy state without manual container surgery
@@ -71,6 +72,9 @@ It must prove that the scaffold can be verified from a clean-enough developer or
 - run smoke checks against the actual baseline contract
 - clean up one-off test containers and temporary runtime state

+`docker compose up --build` and `./run_tests.sh` must never depend on host-installed language runtimes, package managers, CLIs, compilers, SDKs, or other local system packages beyond Docker and the documented baseline host prerequisites.
+If a dependency is needed for runtime or broad verification, define it in Dockerfiles, image build stages, or other repo-controlled container build definitions.
+
 `./run_tests.sh` must not be a fake wrapper that only prints TODOs, only runs lint when runtime proof is expected, or silently skips the meaningful part of verification.

 Minimum proof by family:
@@ -88,6 +88,7 @@ This file is the repo-local engineering rulebook for `slopmachine` projects.
 - `./run_tests.sh` should use the same startup-value model as `docker compose up --build` rather than a separate pre-seeded test-secret path.
 - If local-development runtime values must persist across restarts, keep them only in Docker-managed runtime state rather than committed repo files.
 - If such a bootstrap script exists, document in the script and in `README.md` that it is for local development bootstrap only and is not the production secret-management path.
+- Do not let `docker compose up --build` or `./run_tests.sh` depend on host-installed packages, SDKs, language runtimes, CLIs, or toolchains beyond Docker and the documented baseline host prerequisites; define those dependencies in Dockerfiles or other repo-controlled container build definitions.

 ## Product Integrity Rules

@@ -88,6 +88,7 @@ This file is the repo-local engineering rulebook for `slopmachine-claude` projec
 - `./run_tests.sh` should use the same startup-value model as `docker compose up --build` rather than a separate pre-seeded test-secret path.
 - If local-development runtime values must persist across restarts, keep them only in Docker-managed runtime state rather than committed repo files.
 - If such a bootstrap script exists, document in the script and in `README.md` that it is for local development bootstrap only and is not the production secret-management path.
+- Do not let `docker compose up --build` or `./run_tests.sh` depend on host-installed packages, SDKs, language runtimes, CLIs, or toolchains beyond Docker and the documented baseline host prerequisites; define those dependencies in Dockerfiles or other repo-controlled container build definitions.

 ## Product Integrity Rules

package/package.json
CHANGED
package/src/init.js
CHANGED
@@ -288,7 +288,13 @@ async function createInitialPhaseArtifacts(targetPath, options) {
 `## Bootstrap Status\n\n` +
 `- Workspace initialized by slopmachine.\n` +
 `${options.adoptExisting ? '- Existing project adoption mode is active.\n' : ''}` +
-`${options.requestedStartPhase ? `- Requested start phase: ${options.requestedStartPhase}.\n` : ''}`
+`${options.requestedStartPhase ? `- Requested start phase: ${options.requestedStartPhase}.\n` : ''}` +
+`\n## Entry Template\n\n` +
+`Copy this exact structure for each clarification item:\n\n` +
+`### 1. Clarification Defaults for Planning\n` +
+`- Question: Can the drafted clarification defaults be used for planning?\n` +
+`- My Understanding: The prompt was large enough that planning needed explicit confirmation that the clarification package was acceptable. We needed to lock this in rather than carrying uncertainty forward into the planning phase.\n` +
+`- Solution: Yes. Proceed with the drafted defaults, allowing planning to start from the approved clarification brief instead of an uncertain baseline.\n`

 const prePlanningBriefContent = `# Pre-Planning Brief\n\n` +
 `Capture the planning-critical project shape here before real planning begins.\n\n` +