theslopmachine 0.4.7 → 0.4.8
- package/MANUAL.md +13 -12
- package/README.md +18 -22
- package/RELEASE.md +6 -3
- package/assets/agents/developer.md +10 -1
- package/assets/agents/slopmachine.md +15 -96
- package/assets/skills/developer-session-lifecycle/SKILL.md +4 -13
- package/assets/skills/development-guidance/SKILL.md +17 -1
- package/assets/skills/final-evaluation-orchestration/SKILL.md +1 -0
- package/assets/skills/hardening-gate/SKILL.md +18 -0
- package/assets/skills/integrated-verification/SKILL.md +5 -0
- package/assets/skills/planning-gate/SKILL.md +24 -2
- package/assets/skills/planning-guidance/SKILL.md +38 -9
- package/assets/skills/scaffold-guidance/SKILL.md +23 -6
- package/assets/skills/submission-packaging/SKILL.md +17 -3
- package/assets/skills/verification-gates/SKILL.md +39 -6
- package/assets/slopmachine/templates/AGENTS.md +33 -3
- package/assets/slopmachine/utils/cleanup_delivery_artifacts.py +124 -0
- package/package.json +1 -1
- package/src/constants.js +5 -7
- package/src/init.js +35 -11
- package/assets/skills/session-rollover/SKILL.md +0 -47
- package/assets/slopmachine/document-completeness.md +0 -59
- package/assets/slopmachine/engineering-results.md +0 -63
- package/assets/slopmachine/implementation-comparison.md +0 -43
- package/assets/slopmachine/quality-document.md +0 -67
package/MANUAL.md
CHANGED
@@ -23,7 +23,7 @@ The installed agent set includes the current `slopmachine` and `developer` agent
 
 ## Start a project
 
-Inside
+Inside a new or empty project directory, run:
 
 ```bash
 slopmachine init
@@ -43,25 +43,26 @@ slopmachine init -o
 - bootstraps beads_rust (`br`)
 - creates `repo/`
 - copies the packaged repo rulebook into `repo/AGENTS.md`
-- creates the initial git
+- creates the initial git commit so the workspace starts with a clean tree
 - optionally opens `opencode` in `repo/`
 
 ## Rough workflow
 
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
+1. Intake and setup
+2. Clarification
+3. Planning
+4. Scaffold/foundation
+5. Development
+6. Integrated verification
+7. Hardening
+8. Evaluation and fix verification
+9. Final human decision
 10. Submission packaging
+11. Retrospective
 
 ## Important notes
 
 - theslopmachine depends on OpenCode, beads_rust (`br`), git, python3, and Docker being available.
 - The workflow-owner agents use mandatory skills for specific phases; skipping them is considered a workflow failure.
 - `slopmachine` is the lighter current engine: it keeps the owner prompt smaller, uses more specialized skills, and keeps one active developer session at a time while preserving rollover history when new sessions are intentionally started.
-- Submission packaging collects the final docs, accepted evaluation reports,
+- Submission packaging collects the final docs, accepted evaluation reports, cleaned session exports, converted session traces, and the cleaned repo into the required final structure.
package/README.md
CHANGED
@@ -7,7 +7,6 @@
 - installs packaged OpenCode agents into `~/.config/opencode/agents/`
 - installs packaged skills into `~/.agents/skills/`
 - installs packaged workflow support files into `~/slopmachine/`
-- installs Claude worker runtime assets under `~/.claude/`
 - bootstraps a new project workspace with `repo/`, `docs/`, `sessions/`, `metadata.json`, `AGENTS.md`, and initialized `br` state
 - configures required OpenCode plugins and MCP entries without overwriting existing `context7` or `exa` configuration
 
@@ -26,9 +25,11 @@ Build and install the package:
 npm install
 npm run check
 npm pack
-npm install -g ./theslopmachine
+npm install -g ./theslopmachine-<version>.tgz
 ```
 
+`package.json` is the package-version source of truth. The packed tarball name and CLI version banner both derive from that version.
+
 For local package development instead of global install:
 
 ```bash
@@ -52,7 +53,6 @@ slopmachine setup
 - installs or refreshes packaged agents
 - installs or refreshes packaged skills
 - installs or refreshes packaged workflow files into `~/slopmachine/`
-- installs or refreshes Claude runtime assets under `~/.claude/`
 - updates `~/.config/opencode/opencode.json`
 - prompts for missing MCP API keys when needed
 
@@ -77,7 +77,7 @@ opencode auth list
 
 ## Startup
 
-Create and initialize a new project workspace:
+Create and initialize a new project workspace in a new or empty directory:
 
 ```bash
 mkdir my-project
@@ -110,6 +110,8 @@ Bootstrapped workspace layout:
 - `metadata.json` for project workflow metadata
 - `repo/AGENTS.md` for the repo-local agent instructions
 
+`slopmachine init` creates the initial git commit so the workspace starts from a clean tree.
+
 ## Testing
 
 Package-level checks:
@@ -142,17 +144,17 @@ Operating model:
 
 High-level lifecycle:
 
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
-11.
+1. `P0 Intake and Setup`
+2. `P1 Clarification`
+3. `P2 Planning`
+4. `P3 Scaffold`
+5. `P4 Development`
+6. `P5 Integrated Verification`
+7. `P6 Hardening`
+8. `P7 Evaluation and Fix Verification`
+9. `P8 Final Human Decision`
+10. `P9 Submission Packaging`
+11. `P10 Retrospective`
 
 Design constraints:
 
@@ -177,7 +179,6 @@ Main locations:
 - skills: `~/.agents/skills/`
 - OpenCode config: `~/.config/opencode/opencode.json`
 - packaged workflow files: `~/slopmachine/`
-- Claude runtime assets: `~/.claude/`
 
 Installed agents:
 
@@ -188,7 +189,6 @@ Installed skills:
 
 - `~/.agents/skills/clarification-gate/`
 - `~/.agents/skills/developer-session-lifecycle/`
-- `~/.agents/skills/session-rollover/`
 - `~/.agents/skills/final-evaluation-orchestration/`
 - `~/.agents/skills/beads-operations/`
 - `~/.agents/skills/planning-guidance/`
@@ -199,7 +199,6 @@ Installed skills:
 - `~/.agents/skills/integrated-verification/`
 - `~/.agents/skills/hardening-gate/`
 - `~/.agents/skills/evaluation-triage/`
-- `~/.agents/skills/remediation-guidance/`
 - `~/.agents/skills/submission-packaging/`
 - `~/.agents/skills/retrospective-analysis/`
 - `~/.agents/skills/owner-evidence-discipline/`
@@ -210,14 +209,11 @@ Installed workflow files under `~/slopmachine/`:
 
 - `backend-evaluation-prompt.md`
 - `frontend-evaluation-prompt.md`
-- `document-completeness.md`
-- `engineering-results.md`
-- `implementation-comparison.md`
-- `quality-document.md`
 - `templates/AGENTS.md`
 - `workflow-init.js`
 - `utils/strip_session_parent.py`
 - `utils/convert_ai_session.py`
+- `utils/cleanup_delivery_artifacts.py`
 
 OpenCode config entries ensured by `setup`:
 
package/RELEASE.md
CHANGED
@@ -39,16 +39,18 @@ Note:
 npm pack
 ```
 
-This should produce a tarball
+This should produce a tarball named like:
 
 ```bash
-theslopmachine
+theslopmachine-<version>.tgz
 ```
 
+`<version>` comes from `package.json`, which is the single package-version source of truth.
+
 ## Inspect package contents
 
 ```bash
-tar -tzf theslopmachine
+tar -tzf theslopmachine-<version>.tgz
 ```
 
 Check that the tarball includes:
@@ -87,6 +89,7 @@ npm publish --dry-run
 
 ## Versioning
 
+- `package.json` is the single package-version source of truth for the tarball name and CLI version banner
 - bump `package.json` version before each release
 - keep the CLI command as `slopmachine`
 - keep the npm package name as `theslopmachine`
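The versioning rule above can be illustrated with a small sketch. It is not part of the package: `expected_tarball_name` is a hypothetical helper, and it assumes npm's usual `<name>-<version>.tgz` naming for packed tarballs, with scoped names flattened.

```python
import json


def expected_tarball_name(package_json_text: str) -> str:
    """Return the tarball file name that `npm pack` is expected to produce.

    Hypothetical helper for illustration; only `name` and `version` are read.
    """
    manifest = json.loads(package_json_text)
    name = manifest["name"]
    version = manifest["version"]
    # Scoped names such as "@scope/pkg" are flattened to "scope-pkg".
    flat_name = name.lstrip("@").replace("/", "-")
    return f"{flat_name}-{version}.tgz"


# Example manifest using the package name and the newer version from this diff.
example = json.dumps({"name": "theslopmachine", "version": "0.4.8"})
print(expected_tarball_name(example))  # theslopmachine-0.4.8.tgz
```

A release check could compare this expected name with the file that `npm pack` actually wrote before running `npm install -g` on it.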
package/assets/agents/developer.md
CHANGED
@@ -49,7 +49,9 @@ Do not narrow scope for convenience.
 - implement real behavior, not placeholders
 - keep user-facing and admin-facing flows complete through their real surfaces
 - verify the changed area locally and realistically before reporting completion
-- update repo-local docs such as `README.md` when behavior or run/test instructions change
+- update repo-local docs such as `README.md` and `./docs/*` when behavior or run/test instructions change
+- keep repo-local docs and code structure statically reviewable; do not rely on runtime success alone to make the project understandable
+- keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
 
 ## Verification Cadence
@@ -87,6 +89,13 @@ Selected-stack defaults:
 - do not hardcode secrets or leave prototype residue behind
 - when the project has database dependencies, keep database setup in `./init_db.sh` rather than scattered repo logic
 - do not hardcode database connection values or database bootstrap values anywhere in the repo
+- if the project uses mock, stub, fake, or local-data behavior, disclose that scope accurately in the repo-local documentation instead of implying real backend or production behavior
+- if mock or interception behavior is enabled by default, document that clearly
+- disclose feature flags, debug/demo surfaces, and default enabled states clearly in repo-local docs when they exist
+- keep frontend state requirements explicit in code and repo-local docs for prompt-critical flows
+- use a shared logging path and avoid random print-style debugging as the durable implementation pattern
+- use a shared validation/error-handling path when validation materially affects the flow
+- do not hide missing failure handling behind fake-success paths
 
 ## Skills
 
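One way to read the "shared logging path" rule above is a single configured logger that every module borrows from. The sketch below uses Python's standard `logging` module; the logger name `app` and the helper names are assumptions for illustration, not something the package prescribes.

```python
import logging
import sys

APP_LOGGER_NAME = "app"  # hypothetical shared root name


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure the shared application logger once and return it."""
    root = logging.getLogger(APP_LOGGER_NAME)
    if not root.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        root.addHandler(handler)
    root.setLevel(level)
    return root


def get_logger(module: str) -> logging.Logger:
    """Return a child logger that inherits the shared handler and level."""
    return logging.getLogger(f"{APP_LOGGER_NAME}.{module}")


configure_logging()
log = get_logger("orders")
log.info("order accepted: id=%s", 42)  # goes through the shared handler
```

Modules call `get_logger` instead of `print`, so format, level, and destination are changed in one place.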
package/assets/agents/slopmachine.md
CHANGED
@@ -176,36 +176,23 @@ Phase rules:
 
 Maintain exactly one active developer session at a time.
 
-
-
-
-
-
-
-There may be multiple `develop` sessions over the life of one project.
-
-During the first full run from planning through initial packaging, keep all work in the `develop-N` sequence, including integrated verification, hardening, evaluation issue fixing inside `P7`, and packaging follow-through.
-
-If the project is reopened after packaging because of later reported issues, continue with the existing developer session unless you explicitly request a new one.
-
-Fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule.
-
-If you explicitly request a new session while one is active, ask the current developer exactly `give me a summary of all the work that has been done`, then use that handoff to seed the next session.
-
-Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
-Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request.
+- track developer sessions in metadata using the `develop-N` line
+- keep the same active developer session through planning, development, verification, hardening, evaluation fixes, and packaging follow-through unless you explicitly request a new one
+- if the project is reopened later, recover and continue the active developer session unless you explicitly request a replacement
+- fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule
+- use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery
 
 Do not launch the developer during `P0` or `P1`.
 
-When the first develop developer session begins in `P2`,
+When the first develop developer session begins in `P2`, use this planning handshake:
 
-1. send
+1. send the original prompt and ask for an initial plan plus major risks or assumptions
 2. wait for the developer's first reply
-3. send the approved clarification prompt
+3. send the approved clarification prompt as the second owner message in that same session
 4. continue with planning from there
 
-Do not reorder that sequence.
 Do not merge those messages.
+Do not send the clarification prompt first.
 
 ## Verification Budget
 
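The planning handshake above is an ordering rule, so it can be modelled as a tiny state check. The following is only a sketch under assumed names (`PlanningHandshake`, `ORIGINAL_PROMPT`, `CLARIFICATION`); the package does not ship such a class.

```python
from dataclasses import dataclass, field
from typing import List

ORIGINAL_PROMPT = "original_prompt"
CLARIFICATION = "clarification"
DEVELOPER_REPLY = "developer_reply"


@dataclass
class PlanningHandshake:
    """Tracks the order of the first messages in a developer session."""

    events: List[str] = field(default_factory=list)

    def owner_sends(self, kind: str) -> None:
        if kind == ORIGINAL_PROMPT:
            if self.events:
                raise ValueError("the original prompt must be the first owner message")
        elif kind == CLARIFICATION:
            # Valid only as the second owner message, after the first reply.
            if self.events != [ORIGINAL_PROMPT, DEVELOPER_REPLY]:
                raise ValueError("send the clarification prompt only after the first reply")
        else:
            raise ValueError(f"unknown message kind: {kind}")
        self.events.append(kind)

    def developer_replies(self) -> None:
        if self.events != [ORIGINAL_PROMPT]:
            raise ValueError("no opening owner message is waiting for a reply")
        self.events.append(DEVELOPER_REPLY)

    @property
    def planning_may_continue(self) -> bool:
        return self.events == [ORIGINAL_PROMPT, DEVELOPER_REPLY, CLARIFICATION]


handshake = PlanningHandshake()
handshake.owner_sends(ORIGINAL_PROMPT)
handshake.developer_replies()
handshake.owner_sends(CLARIFICATION)
print(handshake.planning_may_continue)  # True
```

Because each call stands for exactly one message, sending the clarification first raises an error and a merged message cannot be expressed at all.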
@@ -218,50 +205,7 @@ Owner-side discipline:
 - do not rerun expensive local test or E2E commands just because the developer already ran them
 - when the developer reports the exact verification command and its result clearly, use that evidence unless there is a concrete reason to challenge it
 - rerun expensive verification only when the developer evidence is weak, contradictory, flaky, high-risk, needed for a true broad gate, or needed to answer a new question
-
-Target budget for the whole workflow:
-
-- at most 3 broad owner-run verification moments using the selected stack's full verification path
-
-Selected-stack rule:
-
-- follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
-- for backend and fullstack web projects, the broad path is usually Docker/runtime plus the full test command
-- for pure frontend web projects, the broad path is the documented production build plus the full test command and browser E2E when applicable
-- for mobile projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI/device verification when applicable
-- for desktop projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI verification when applicable
-
-Every project must end up with:
-
-- one primary documented runtime command
-- one primary documented full-test command: `./run_tests.sh`
-
-Runtime command rule:
-
-- for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
-- when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
-
-Default moments:
-
-1. scaffold acceptance
-2. development complete -> integrated verification entry
-3. final qualified state before packaging
-
-For Dockerized web backend/fullstack projects, enforce this cadence:
-
-- after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
-- after that, do not run Docker again during ordinary development work
-- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
-
-Between those moments, rely on:
-
-- local runtime checks
-- targeted unit tests
-- targeted integration tests
-- targeted module or route-family reruns
-- the selected stack's local UI or E2E tool when UI is material
-
-If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.
+- use phase skills and `verification-gates` for stack-specific runtime and broad-gate cadence details
 
 ## Mandatory Skill Discipline
 
@@ -272,10 +216,6 @@ Named skills are mandatory, not optional.
 - if the required skill is not loaded, stop immediately and load it before continuing
 - do not prompt the developer first and load the skill later
 
-## Mandatory Skill Usage
-
-Load the required skill before the corresponding phase or activity work begins.
-
 Core map:
 
 - `P0` -> `developer-session-lifecycle`
@@ -292,7 +232,6 @@ Core map:
 - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
 - state mutations -> `beads-operations`
 - evidence-heavy review -> `owner-evidence-discipline`
-- intentional new developer session -> `session-rollover`
 
 Do not improvise a phase from memory when a phase skill exists.
 
@@ -308,21 +247,6 @@ When talking to the developer:
 
 Do not leak workflow internals such as:
 
-- Beads
-- phases
-- overlays
-- `.ai/` files
-- approval-state machinery
-- session-slot bookkeeping
-- packaging-stage orchestration details
-
-Do not sound like workflow software talking to a worker.
-Do not speak as a relay for a third party.
-
-## Developer Isolation
-
-The developer must not be told about:
-
 - Beads workflow mechanics
 - `.ai/` orchestration files
 - approval-state machinery
@@ -330,6 +254,8 @@ The developer must not be told about:
 - packaging-stage orchestration details
 
 To the developer, this should feel like a normal engineering conversation with a strong technical lead.
+Do not sound like workflow software talking to a worker.
+Do not speak as a relay for a third party.
 
 ## Operating Discipline
 
@@ -338,7 +264,7 @@ To the developer, this should feel like a normal engineering conversation with a
 - keep work moving without low-information continuation chatter
 - read only what is needed to answer the current decision
 - keep comments and metadata auditable and specific
-- keep external docs owner-maintained and repo-local
+- keep external docs owner-maintained as reference copies and repo-local docs developer-maintained for the repo's self-sufficient source of truth
 
 ## Review Posture
 
@@ -357,19 +283,14 @@ After each substantive developer reply, do one of four things:
 3. request clarification or justification
 4. require verification before deciding
 
-## Packaging
+## Packaging
 
 Treat packaging as a first-class delivery contract from the start, not as late cleanup.
 
 - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
-- `../self-test-run.md`, `../self-test-fixes.md`, `../sessions/`, `../metadata.json`, `../docs/`, and the delivered `repo/` are the mandatory late-stage artifacts
-- do not invent `submission/`, packaging-only report files, screenshots, or other extra artifact structures during ordinary packaging
-
-When `P9 Submission Packaging` begins:
-
 - load `submission-packaging` before any packaging action
 - follow its exact artifact, export, cleanup, and output contract
-- do not
+- do not invent extra artifact structures during ordinary packaging
 
 ## Retrospective
 
@@ -377,8 +298,6 @@ After `P9 Submission Packaging` closes successfully:
 
 - automatically enter `P10 Retrospective`
 - load `retrospective-analysis`
-- write `run_id`-scoped retrospective output under `~/slopmachine/retrospectives/`
-- keep it owner-only and non-blocking by default
 - reopen packaging only if the retrospective finds a real packaged-result defect
 
 ## Completion Standard
package/assets/skills/developer-session-lifecycle/SKILL.md
CHANGED
@@ -60,22 +60,21 @@ Optional startup inputs may include:
 7. wait only for the initial clarification approval before development starts
 8. initialize developer-session tracking for the run
 9. start the develop developer session only after `P2` is ready to begin
-10. send
+10. send the original prompt and ask for an initial plan plus major risks or assumptions as the first owner message in that session
 11. wait for the developer's first exchange
 12. send the approved clarification prompt as the second owner message in that same session
 13. only after that second message, continue with the normal planning conversation
 
 ## First developer-session handshake
 
-The first developer session of the run
+The first developer session of the run should begin in this order:
 
 1. owner starts the develop developer session
-2. owner sends
+2. owner sends the original prompt and asks for an initial plan plus major risks or assumptions
 3. developer responds
 4. owner sends the approved clarification prompt
 5. planning proceeds from there
 
-Do not skip the initial planning opener.
 Do not send the clarification prompt first.
 Do not merge those two messages into one.
 
@@ -119,8 +118,6 @@ Each developer session record should include enough to recover and export it lat
 - `created_phase`
 - `session_id`
 - `status`
-- `handoff_in`
-- `handoff_out`
 
 Required project metadata fields in `../metadata.json` when relevant:
 
@@ -143,13 +140,7 @@ Required project metadata fields in `../metadata.json` when relevant:
 - record every developer session in `developer_sessions`
 - label every developer session using `develop-N`
 - create a new developer session only when the user explicitly requests a new session
-
-If the user explicitly requests a new session while one is active:
-
-1. ask the current developer exactly: `give me a summary of all the work that has been done`
-2. treat that reply as the handoff summary
-3. start the new developer session with that summary as the handoff-in context
-4. assign the next `develop-N` label in sequence
+- when a replacement session is explicitly requested, treat it as a manual restart rather than a formal handoff workflow and assign the next `develop-N` label in sequence
 
 ## Initial structure rule
 
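Assigning "the next `develop-N` label in sequence" can be sketched as a scan over the recorded sessions. The record field name `label` and the function name are assumptions for illustration; the diff only confirms `developer_sessions`, `created_phase`, `session_id`, and `status`.

```python
import re
from typing import Iterable, Mapping

LABEL_PATTERN = re.compile(r"^develop-(\d+)$")


def next_develop_label(sessions: Iterable[Mapping[str, str]]) -> str:
    """Return the next `develop-N` label after the highest one recorded.

    `label` is a hypothetical field name on each session record.
    """
    highest = 0
    for record in sessions:
        match = LABEL_PATTERN.match(record.get("label", ""))
        if match:
            highest = max(highest, int(match.group(1)))
    return f"develop-{highest + 1}"


# Hypothetical metadata shape; session ids and statuses are made up.
metadata = {
    "developer_sessions": [
        {"label": "develop-1", "session_id": "s-001", "status": "closed"},
        {"label": "develop-2", "session_id": "s-002", "status": "active"},
    ]
}
print(next_develop_label(metadata["developer_sessions"]))  # develop-3
```

Taking the maximum rather than the record count keeps labels increasing even if an earlier record is missing or malformed.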
package/assets/skills/development-guidance/SKILL.md
CHANGED
@@ -17,16 +17,20 @@ Use this skill during `P4 Development` before prompting the developer.
 
 - define lightweight planning notes for the module before coding
 - define the module purpose, constraints, and edge cases before coding
+- define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
 - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
 - implement real behavior, not partial scattered logic
 - handle failure paths and boundary conditions
 - add or update tests as part of the module work
+- prefer TDD when the behavior is well defined and the module is practical to drive test-first; otherwise define the expected tests before implementation and keep them tied to the module plan
+- keep `./docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
 - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
 - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
 - keep frontend and backend contracts synchronized when the module spans both sides
 - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
 - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
 - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
+- verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
 - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
 - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
 - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
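The rule about confining file and export paths to allowed roots could look roughly like the check below. This is a sketch with a hypothetical helper name (`resolve_within`) and example paths; it uses only `pathlib` from the standard library.

```python
from pathlib import Path


def resolve_within(root: Path, user_path: str) -> Path:
    """Resolve `user_path` under `root`, rejecting anything that escapes it."""
    root_resolved = root.resolve()
    # Joining an absolute `user_path` replaces the root, so it is caught below.
    candidate = (root_resolved / user_path).resolve()
    if candidate != root_resolved and root_resolved not in candidate.parents:
        raise ValueError(f"path escapes the allowed root: {user_path!r}")
    return candidate


allowed_root = Path("exports")  # hypothetical export directory
print(resolve_within(allowed_root, "reports/summary.csv"))
try:
    resolve_within(allowed_root, "../secrets.env")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Resolving before comparing means `..` segments and existing symlinks are normalised first, so the containment check is applied to the real target rather than to the raw string.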
@@ -41,9 +45,15 @@ Use this skill during `P4 Development` before prompting the developer.
 - use the `frontend-design` skill for frontend component or page work
 - use the `frontend-design` skill during web or desktop UI verification when reviewing screenshots and tightening the interface
 - do not hardcode secrets or persist local sensitive values in the repo while implementing
-- explain behavior changes clearly enough that the documentation discipline can be satisfied accurately
+- explain behavior changes clearly enough that the repo-local documentation discipline can be satisfied accurately
+- update repo-local docs such as `README.md` and `./docs/*` when runtime, build/preview, configuration, routes, tests, security boundaries, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
+- do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
+- keep `./docs/reviewer-guide.md` aligned when app entry points, route registration, build/preview commands, configuration surfaces, feature flags, debug/demo surfaces, mock defaults, logging structure, validation structure, or major module boundaries change
+- keep `./docs/security-boundaries.md` aligned when auth, authorization, admin/debug, or isolation logic changes
+- keep `./docs/frontend-flow-matrix.md` aligned when frontend pages, interactions, state transitions, or required UI states change
 - verify the module against its planned behavior before trying to move on
 - do not move on while the module is still obviously weak or half-finished
+- do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
 
 ## Verification model
 
@@ -54,9 +64,15 @@ Use this skill during `P4 Development` before prompting the developer.
|
|
|
54
64
|
- set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
|
|
55
65
|
- if the local toolchain is missing, try to install or enable it before falling back to the broad gate path
|
|
56
66
|
- for web UI projects, default local UI/E2E verification to Playwright when that stack is in use
|
|
67
|
+
- for frontend-bearing projects, use the component/page-or-route/E2E layers intentionally instead of relying on only one frontend test layer for every kind of proof
|
|
57
68
|
- for mobile projects, default local UI testing to the selected mobile test stack and use a platform-appropriate mobile UI/E2E tool when device-flow proof matters
|
|
58
69
|
- for desktop projects, default local UI verification to Playwright's Electron support or another platform-appropriate desktop UI/E2E tool when window-flow proof matters
|
|
59
70
|
- when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
|
|
71
|
+
- for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
|
|
72
|
+
- for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
|
|
73
|
+
- use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
|
|
74
|
+
- use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
|
|
75
|
+
- keep the test surface moving toward at least 90 percent meaningful coverage of the relevant behavior area as slices are completed
|
|
60
76
|
- in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
|
|
61
77
|
|
|
62
78
|
## Quality rules
|
|
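The frontend state bullets added in this hunk (loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection) can be pictured as a small state model that a state-focused test exercises without a browser. The following is a minimal sketch under stated assumptions: `FlowState`, `SubmitFlow`, and the extra `READY` state are hypothetical names for illustration and do not come from the package.

```python
from enum import Enum


class FlowState(Enum):
    LOADING = "loading"
    EMPTY = "empty"
    READY = "ready"          # assumed extra state: data loaded, form usable
    SUBMITTING = "submitting"
    SUCCESS = "success"
    ERROR = "error"


class SubmitFlow:
    """Hypothetical form flow used only to illustrate the state list."""

    def __init__(self):
        self.state = FlowState.LOADING
        self.submit_calls = 0  # how many times the real action was started

    def loaded(self, items):
        # An empty result is a distinct state, not a silent READY.
        self.state = FlowState.READY if items else FlowState.EMPTY

    @property
    def submit_disabled(self):
        # Submit is enabled only when ready, or when retrying after an error.
        return self.state not in (FlowState.READY, FlowState.ERROR)

    def submit(self, action):
        """Run action once; return False if the call was rejected."""
        if self.submit_disabled:
            return False  # duplicate-action / re-entry protection
        self.state = FlowState.SUBMITTING
        self.submit_calls += 1
        try:
            action()
        except Exception:
            self.state = FlowState.ERROR  # surface the failure, no fake success
            return True
        self.state = FlowState.SUCCESS
        return True
```

A state-focused test can then assert each transition directly, for example that a second `submit` issued while the first is still in flight is rejected and does not increment `submit_calls`.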
@@ -36,6 +36,7 @@ These two files are the only evaluation prompt sources for evaluation runs.
|
|
|
36
36
|
- read the chosen evaluation prompt file contents yourself before launching evaluation
|
|
37
37
|
- compose one large final prompt block
|
|
38
38
|
- prefix the request with a clear instruction that the reviewer must work in the current project directory and evaluate the delivered project
|
|
39
|
+
- make sure the repo-local docs and code inside the current project directory are sufficient for evaluation; do not assume the evaluator will rely on parent-root docs or sibling workflow artifacts
|
|
39
40
|
- inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
|
|
40
41
|
- send that fully composed text block directly to one fresh `General` evaluator session
|
|
41
42
|
- require that session to produce a detailed file-backed report plus an issue summary
|
|
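The composition steps in this hunk (prefix instruction, then the chosen template with the original project prompt injected into `{prompt}`, template body otherwise untouched) could be sketched as follows. This is an illustrative assumption, not the package's implementation: `compose_evaluation_prompt` and its parameters are hypothetical names. Plain string replacement is used instead of `str.format` so that any other braces in the template body are left exactly as written.

```python
PLACEHOLDER = "{prompt}"


def compose_evaluation_prompt(template: str, project_prompt: str, prefix: str) -> str:
    """Build one final prompt block without rewriting the template body."""
    if PLACEHOLDER not in template:
        raise ValueError("evaluation template has no {prompt} placeholder")
    # str.replace touches only the placeholder; other braces in the
    # template (JSON samples, code snippets) are preserved verbatim.
    body = template.replace(PLACEHOLDER, project_prompt)
    return f"{prefix}\n\n{body}"
```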
@@ -33,11 +33,23 @@ Hardening should treat these as the main review buckets before final evaluation
|
|
|
33
33
|
- audit security boundaries, validation, ownership, and secret handling
|
|
34
34
|
- prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
|
|
35
35
|
- audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
|
|
36
|
+
- audit whether `./docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, and gaps in a way a static evaluator can follow quickly
|
|
37
|
+
- audit whether the project is actually approaching or achieving at least 90 percent meaningful coverage of the relevant behavior surface rather than relying on a thin happy-path suite
|
|
36
38
|
- audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
|
|
37
39
|
- inspect architecture, coupling, file size, and maintainability risks
|
|
38
40
|
- focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
|
|
39
41
|
- check for bad engineering practices that accumulated during implementation
|
|
40
42
|
- tighten weak tests, weak docs, and weak operational instructions
|
|
43
|
+
- audit static review readiness: entry points, routes, config, README, and test commands should be traceably consistent without depending on runtime tribal knowledge
|
|
44
|
+
- audit that the repo is self-sufficient and does not rely on parent-root docs or sibling workflow artifacts for static reviewability
|
|
45
|
+
- audit repo-local evaluator docs: `./docs/reviewer-guide.md`, `./docs/test-coverage.md`, `./docs/security-boundaries.md`, `./docs/frontend-flow-matrix.md`, and `./docs/api-spec.md` when relevant
|
|
46
|
+
- audit static security-boundary readiness: a fresh reviewer should be able to trace auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation from repository artifacts when applicable
|
|
47
|
+
- if mock, stub, fake, interception, or local-data behavior exists, verify that its scope, default state, and boundaries are disclosed accurately and do not imply undisclosed real integration
|
|
48
|
+
- audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in repo-local docs when they exist
|
|
49
|
+
- audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
|
|
50
|
+
- audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
|
|
51
|
+
- audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
|
|
52
|
+
- verify that missing failure handling is not being hidden behind fake-success behavior
|
|
41
53
|
- run exploratory testing around awkward states, repeated actions, and realistic edge behavior
|
|
42
54
|
- re-check frontend and backend observability, redaction, and operator visibility paths
|
|
43
55
|
- run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
|
|
@@ -54,7 +66,13 @@ Before `P6` can close, the owner should have a clear answer for each of these:
|
|
|
54
66
|
- prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
|
|
55
67
|
- security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
|
|
56
68
|
- test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
|
|
69
|
+
- coverage depth: does the current evidence support roughly 90 percent meaningful coverage of the relevant behavior surface, and if not, what remains weak?
|
|
57
70
|
- major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
|
|
71
|
+
- static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
|
|
72
|
+
- security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
|
|
73
|
+
- coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
|
|
74
|
+
- frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
|
|
75
|
+
- repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?
|
|
58
76
|
|
|
59
77
|
## Rules
|
|
60
78
|
|
|
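The repo-local evaluator docs named in the hardening bullets above lend themselves to a simple presence check. The sketch below is a hypothetical helper, not part of the package: `missing_evaluator_docs` is an invented name, and treating a whitespace-only file as missing is an assumption. Since the skill text says these docs apply "when relevant", the caller is expected to pass only the paths that apply to the project.

```python
from pathlib import Path

# Paths taken from the hardening-gate bullets, relative to the repo root.
EVALUATOR_DOCS = [
    "docs/reviewer-guide.md",
    "docs/test-coverage.md",
    "docs/security-boundaries.md",
    "docs/frontend-flow-matrix.md",
    "docs/api-spec.md",
]


def missing_evaluator_docs(repo_root, relevant=EVALUATOR_DOCS):
    """Return the relevant doc paths that are absent or effectively empty."""
    root = Path(repo_root)
    missing = []
    for rel in relevant:
        path = root / rel
        # Assumption: a file containing only whitespace does not count.
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(rel)
    return missing
```

A check like this only proves the files exist; whether they actually map requirements to tests or describe real boundaries still needs the manual audit the bullets describe.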
@@ -33,12 +33,17 @@ Once a failure class is known:
|
|
|
33
33
|
- for mobile and desktop work, run the selected stack's platform-appropriate UI/E2E coverage for major flows and review screenshots or equivalent artifacts for real UI behavior and regressions
|
|
34
34
|
- end-to-end coverage must use the real intended user-facing or admin-facing surfaces for the flow; if the flow cannot be exercised that way, treat the missing surface as incomplete work
|
|
35
35
|
- verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
|
|
36
|
+
- verify 401, 403, 404, conflict or duplicate-submission, object-authorization, tenant or user-isolation, and sensitive-log-exposure paths where those risks exist
|
|
36
37
|
- verify security-sensitive behavior where applicable
|
|
37
38
|
- verify multi-tenant and cross-user isolation where applicable, including negative checks rather than single-actor happy paths only
|
|
38
39
|
- verify file/path safety for file-bearing flows where applicable, including traversal-style negative cases
|
|
39
40
|
- verify secrets are not committed, hardcoded, or leaking through logs/config/docs
|
|
40
41
|
- verify error surfaces and auth-related failures are sanitized for users and operators appropriately
|
|
41
42
|
- trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
|
|
43
|
+
- tighten `./docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
|
|
44
|
+
- when security-bearing behavior changes, tighten `./docs/security-boundaries.md` so enforcement points and mapped tests stay accurate
|
|
45
|
+
- when frontend-bearing behavior changes, tighten `./docs/frontend-flow-matrix.md` so key pages, interactions, and required UI states stay accurate
|
|
46
|
+
- when routes, entry points, build/preview/config, feature flags, debug/demo surfaces, or mock defaults change, tighten `./docs/reviewer-guide.md` so static traceability stays current
|
|
42
47
|
- challenge integration seams and adjacent-module behavior, not just the changed module's local path
|
|
43
48
|
|
|
44
49
|
## Rules
|
|
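The negative paths added to this hunk (401, 403, 404, conflict or duplicate submission, object authorization, tenant or user isolation) can be illustrated with a tiny in-memory handler and the checks a verification pass would run against it. Everything here is hypothetical: `User`, `ORDERS`, `get_order`, and `submit_order` are invented for the sketch, and answering cross-tenant reads with 404 rather than 403 is one possible design choice, not something the source prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    tenant: str


# Hypothetical data store: one order owned by alice in tenant t1.
ORDERS = {"o1": {"owner": "alice", "tenant": "t1"}}


def get_order(user: Optional[User], order_id: str) -> int:
    """Return the HTTP status a read of order_id would produce."""
    if user is None:
        return 401  # unauthenticated
    order = ORDERS.get(order_id)
    # Assumption: cross-tenant objects are reported as not found so that
    # their existence does not leak across tenants.
    if order is None or order["tenant"] != user.tenant:
        return 404
    if order["owner"] != user.id:
        return 403  # same tenant, but not the object owner
    return 200


def submit_order(user: Optional[User], idempotency_key: str, seen: set) -> int:
    """Return 201 on first submission and 409 on a duplicate key."""
    if user is None:
        return 401
    if idempotency_key in seen:
        return 409  # duplicate submission is a conflict, not a second create
    seen.add(idempotency_key)
    return 201
```

The point of the sketch is the shape of the checks: each risk gets its own negative assertion with a second actor, rather than a single-actor happy path.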
@@ -42,9 +42,12 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
|
|
|
42
42
|
|
|
43
43
|
## Cross-document discipline
|
|
44
44
|
|
|
45
|
-
- require owner-maintained planning docs under parent-root `../docs/` when relevant,
|
|
46
|
-
- require
|
|
45
|
+
- require owner-maintained planning docs under parent-root `../docs/` when relevant, but do not let the repo depend on them for normal use or evaluation readiness
|
|
46
|
+
- require repo-local evaluator-facing docs under `./docs/` when relevant, especially `./docs/reviewer-guide.md`, `./docs/test-coverage.md`, `./docs/security-boundaries.md`, `./docs/frontend-flow-matrix.md`, and `./docs/api-spec.md`
|
|
47
|
+
- require cross-document consistency so external references, repo-local docs, API/spec notes, and test-planning artifacts do not drift on lifecycle/state models, permissions, flow coverage, or operational behavior
|
|
47
48
|
- if planning docs disagree on core system behavior, planning is still in progress
|
|
49
|
+
- when `./docs/test-coverage.md` is relevant, require it to be structured as explicit requirement or risk mappings rather than generic narrative
|
|
50
|
+
- require the accepted plan to cover system overview, architecture reasoning, major modules or chunks, domain model, data model where relevant, interface contracts, failure paths, state transitions, logging strategy, testing strategy, README implications, and Docker execution assumptions when those dimensions apply
|
|
48
51
|
|
|
49
52
|
## Cross-cutting planning requirements
|
|
50
53
|
|
|
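The requirement that `./docs/test-coverage.md` be "structured as explicit requirement or risk mappings rather than generic narrative" suggests a table with one row per point. The sketch below shows one possible shape, under loud assumptions: `CoveragePoint`, `render_matrix`, `covered_ratio`, and the column names are hypothetical, and the source does not define how "90 percent meaningful coverage" is measured, so the ratio here is only a rough proxy over mapped points.

```python
from dataclasses import dataclass, field


@dataclass
class CoveragePoint:
    point: str                                   # requirement or risk point
    tests: list = field(default_factory=list)    # mapped test identifiers
    gap: str = ""                                # remaining gap, if any


def _status(p: CoveragePoint) -> str:
    if not p.tests:
        return "missing"
    return "partial" if p.gap else "covered"


def render_matrix(points) -> str:
    """Render the mapping as a markdown table for test-coverage.md."""
    lines = [
        "| Requirement or risk point | Mapped tests | Status | Remaining gap |",
        "| --- | --- | --- | --- |",
    ]
    for p in points:
        tests = ", ".join(p.tests) or "-"
        lines.append(f"| {p.point} | {tests} | {_status(p)} | {p.gap or '-'} |")
    return "\n".join(lines)


def covered_ratio(points) -> float:
    """Share of points that are fully covered (an assumed proxy metric)."""
    if not points:
        return 0.0
    return sum(1 for p in points if _status(p) == "covered") / len(points)
```

Keeping the gap column explicit means a reviewer sees what is still weak without having to reconstruct the matrix.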
@@ -57,9 +60,17 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
|
|
|
57
60
|
- auth/session edge cases such as expiry, refresh, or clock skew tolerance
|
|
58
61
|
- when the prompt says behavior is configurable, require the real configuration surface, permissions, operator flow, and backend support to be planned explicitly
|
|
59
62
|
- when a feature must be admin-manageable or operator-manageable, require a real usable UI surface for that management flow, not just API endpoints or data-model notes
|
|
63
|
+
- for web projects, require Docker-first runtime planning unless the prompt or existing repository clearly dictates otherwise
|
|
60
64
|
- when the project has database dependencies, require a dedicated `./init_db.sh` plan as the only project-standard database initialization path
|
|
61
65
|
- when the project has database dependencies, require runtime and test entrypoints to rely on `./init_db.sh` for database preparation rather than scattered manual setup
|
|
62
66
|
- do not accept planning that leaves database connection values or database bootstrap values hardcoded in repo logic instead of driven through `./init_db.sh`
|
|
67
|
+
- when the project uses mock, stub, fake, interception, or local-data behavior, require the plan to state how that scope will be disclosed accurately in repo-local docs and visible adaptor/config boundaries
|
|
68
|
+
- do not accept planning that lets a mock-only or local-data-only project look like undisclosed real integration delivery
|
|
69
|
+
- do not accept planning that hides missing failure handling behind fake-success branches
|
|
70
|
+
- when the project has meaningful auth or access control, require a static security-boundary inventory in planning artifacts covering auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug surfaces, and tenant or user isolation rules when applicable
|
|
71
|
+
- require repo-local disclosure planning for feature flags, debug or demo surfaces, default enabled states, and mock or interception defaults whenever they exist
|
|
72
|
+
- require traceability planning for build, preview, configuration, app entry points, route registration, module boundaries, and test entry points through repo-local docs rather than parent-root references
|
|
73
|
+
- require logging and validation contracts to be planned concretely enough for repo-local static review
|
|
63
74
|
|
|
64
75
|
## Architecture-depth requirements
|
|
65
76
|
|
|
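The bullets on logging contracts and redaction expectations can be made concrete with a shared logger factory that every module uses instead of ad hoc print-style output. This is a minimal sketch using Python's standard `logging` module; `get_logger`, `RedactingFilter`, the `app.` prefix, and the list of sensitive keys are assumptions for illustration and are not taken from the package.

```python
import logging
import re

# Assumed sensitive keys; a real project would define its own list.
_SECRET = re.compile(r"(?i)\b(password|token|secret)=\S+")


class RedactingFilter(logging.Filter):
    """Mask key=value secrets before any handler sees the record."""

    def filter(self, record):
        # Format first so secrets passed as arguments are also caught.
        message = record.getMessage()
        record.msg = _SECRET.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)
        record.args = ()
        return True


def get_logger(category: str) -> logging.Logger:
    """Return the shared logger for a category such as 'auth' or 'billing'."""
    logger = logging.getLogger(f"app.{category}")
    # Attach the filter once per logger, even if called repeatedly.
    if not any(isinstance(f, RedactingFilter) for f in logger.filters):
        logger.addFilter(RedactingFilter())
    return logger
```

Because logger-level filters apply only to records logged on that logger, routing every module through `get_logger` is what makes the redaction rule statically traceable.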
@@ -75,6 +86,13 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
|
|
|
75
86
|
- major user-facing flows are mapped to backend support and verification targets
|
|
76
87
|
- security-critical areas are planned early enough that they will not be left to accidental late cleanup
|
|
77
88
|
- test sufficiency has been considered at the level of core happy path, major failure paths, security-critical paths, and obvious high-risk boundaries
|
|
89
|
+
- the plan explicitly defines module-level responsibilities, flows, boundaries, and completion tests before implementation
|
|
90
|
+
- TDD is planned where the behavior is well defined and practical, and where TDD is not practical the expected tests are still defined before implementation
|
|
91
|
+
- test sufficiency is mapped explicitly enough that a fresh reviewer can trace requirement or risk point to test evidence and remaining gaps without guesswork
|
|
92
|
+
- backend or fullstack plans explicitly cover 401, 403, 404, conflict or duplicate submission when relevant, object-level authorization, tenant or user isolation, and sensitive-log exposure in the coverage plan
|
|
93
|
+
- frontend-bearing plans explicitly cover the required state model for major flows, including loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
|
|
94
|
+
- frontend-bearing plans explicitly include component, page or route integration, and E2E coverage where applicable; non-trivial frontend plans explicitly include component, page, route, or state-focused test coverage where UI state complexity is meaningful rather than relying only on E2E or runtime confidence
|
|
95
|
+
- the coverage plan is strong enough to reach at least 90 percent meaningful coverage of the relevant behavior surface
|
|
78
96
|
- major engineering quality has been addressed through maintainable boundaries, clear decomposition, and shared contracts
|
|
79
97
|
- frontend route, page, component, and state boundaries are planned when the UI is material
|
|
80
98
|
- configurable behaviors are concretely planned where the prompt requires configurability
|
|
@@ -82,6 +100,10 @@ If the owner notices a concrete role, contract, or scope mismatch, planning does
|
|
|
82
100
|
- prompt-critical operational obligations and operator visibility paths are concretely planned
|
|
83
101
|
- prompt-literal storage, partitioning, indexing, retention, or performance requirements are explicitly represented
|
|
84
102
|
- database-bearing projects explicitly plan `./init_db.sh`, its runtime/test integration points, and how it grows with real schema/bootstrap needs
|
|
103
|
+
- static review readiness is explicitly planned, including how a fresh reviewer can trace entry points, routes, config, test commands, and any mock or local-data boundaries from repository artifacts alone
|
|
104
|
+
- static security-boundary readiness is explicitly planned in docs or code structure where applicable
|
|
105
|
+
- repo-local docs are sufficient for a new reviewer without requiring parent-root docs for startup, build/preview, config, feature flags, security boundaries, or coverage mapping
|
|
106
|
+
- web projects default to Docker-first runtime planning unless a prompt-faithful exception is clearly justified
|
|
85
107
|
- relevant cross-cutting system contracts are explicitly defined rather than left to per-module invention
|
|
86
108
|
- each major module has a clear integration contract with existing modules and shared patterns
|
|
87
109
|
- verification plans include cross-module seam checks, not just isolated feature tests
|