theslopmachine 0.4.7 → 0.4.9

package/MANUAL.md CHANGED
@@ -23,7 +23,7 @@ The installed agent set includes the current `slopmachine` and `developer` agent
23
23
 
24
24
  ## Start a project
25
25
 
26
- Inside the project root, run:
26
+ Inside a new or empty project directory, run:
27
27
 
28
28
  ```bash
29
29
  slopmachine init
@@ -43,25 +43,26 @@ slopmachine init -o
43
43
  - bootstraps beads_rust (`br`)
44
44
  - creates `repo/`
45
45
  - copies the packaged repo rulebook into `repo/AGENTS.md`
46
- - creates the initial git checkpoint
46
+ - creates the initial git commit so the workspace starts with a clean tree
47
47
  - optionally opens `opencode` in `repo/`
48
48
 
49
49
  ## Rough workflow
50
50
 
51
- 1. Clarification
52
- 2. Planning
53
- 3. Scaffold/foundation
54
- 4. Development
55
- 5. Integrated verification
56
- 6. Hardening
57
- 7. Evaluation and triage
58
- 8. Final human decision
59
- 9. Remediation when needed
51
+ 1. Intake and setup
52
+ 2. Clarification
53
+ 3. Planning
54
+ 4. Scaffold/foundation
55
+ 5. Development
56
+ 6. Integrated verification
57
+ 7. Hardening
58
+ 8. Evaluation and fix verification
59
+ 9. Final human decision
60
60
  10. Submission packaging
61
+ 11. Retrospective
61
62
 
62
63
  ## Important notes
63
64
 
64
65
  - theslopmachine depends on OpenCode, beads_rust (`br`), git, python3, and Docker being available.
65
66
  - The workflow-owner agents use mandatory skills for specific phases; skipping them is considered a workflow failure.
66
67
  - `slopmachine` is the lighter current engine: it keeps the owner prompt smaller, uses more specialized skills, and keeps one active developer session at a time while preserving rollover history when new sessions are intentionally started.
67
- - Submission packaging collects the final docs, accepted evaluation reports, screenshots, cleaned session exports, converted session traces, and cleaned repo into the required final structure.
68
+ - Submission packaging collects the final docs, accepted evaluation reports, cleaned session exports, converted session traces, and the cleaned repo into the required final structure.
package/README.md CHANGED
@@ -7,7 +7,6 @@
7
7
  - installs packaged OpenCode agents into `~/.config/opencode/agents/`
8
8
  - installs packaged skills into `~/.agents/skills/`
9
9
  - installs packaged workflow support files into `~/slopmachine/`
10
- - installs Claude worker runtime assets under `~/.claude/`
11
10
  - bootstraps a new project workspace with `repo/`, `docs/`, `sessions/`, `metadata.json`, `AGENTS.md`, and initialized `br` state
12
11
  - configures required OpenCode plugins and MCP entries without overwriting existing `context7` or `exa` configuration
13
12
 
@@ -26,9 +25,11 @@ Build and install the package:
26
25
  npm install
27
26
  npm run check
28
27
  npm pack
29
- npm install -g ./theslopmachine-0.4.4.tgz
28
+ npm install -g ./theslopmachine-<version>.tgz
30
29
  ```
31
30
 
31
+ `package.json` is the package-version source of truth. The packed tarball name and CLI version banner both derive from that version.
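As a rough illustration of that rule, the expected tarball name can be derived from `package.json` alone. This is a minimal Python sketch, not code from the package: the helper name `expected_tarball_name` is hypothetical, and it assumes an unscoped package name, for which `npm pack` names the tarball `<name>-<version>.tgz`.

```python
import json


def expected_tarball_name(package_json_text: str) -> str:
    """Return the tarball name `npm pack` is expected to produce.

    Assumes an unscoped package name, so the result is `<name>-<version>.tgz`.
    """
    manifest = json.loads(package_json_text)
    return f"{manifest['name']}-{manifest['version']}.tgz"


# In-memory example manifest; a real check would read ./package.json instead.
example_manifest = '{"name": "theslopmachine", "version": "0.4.9"}'
print(expected_tarball_name(example_manifest))
```

Reading the version from one place keeps install instructions such as `npm install -g ./theslopmachine-<version>.tgz` from drifting when the version is bumped.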
32
+
32
33
  For local package development instead of global install:
33
34
 
34
35
  ```bash
@@ -47,15 +48,22 @@ slopmachine setup
47
48
 
48
49
  `setup` does the following:
49
50
 
50
- - installs or verifies `opencode`
51
+ - installs `opencode-ai@latest` when missing and refreshes it to `@latest` when already present
51
52
  - installs or verifies `br` (`beads_rust`)
52
53
  - installs or refreshes packaged agents
53
54
  - installs or refreshes packaged skills
54
55
  - installs or refreshes packaged workflow files into `~/slopmachine/`
55
- - installs or refreshes Claude runtime assets under `~/.claude/`
56
56
  - updates `~/.config/opencode/opencode.json`
57
57
  - prompts for missing MCP API keys when needed
58
58
 
59
+ To refresh an existing machine install to the latest published package in one step:
60
+
61
+ ```bash
62
+ slopmachine upgrade
63
+ ```
64
+
65
+ `slopmachine upgrade` installs `theslopmachine@latest` globally and then runs `slopmachine setup` automatically.
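The two-step order described above (global install first, then `setup`) can be sketched as follows. This is an illustration under stated assumptions, not the CLI's actual implementation: the `npm install -g` invocation is an assumption based on the package being distributed through npm, and `UPGRADE_STEPS` and `run_upgrade` are hypothetical names.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed commands: the README only states that the latest package is
# installed globally and that `slopmachine setup` runs afterwards.
UPGRADE_STEPS: List[List[str]] = [
    ["npm", "install", "-g", "theslopmachine@latest"],
    ["slopmachine", "setup"],
]


def _run_checked(argv: Sequence[str]) -> None:
    """Run one command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def run_upgrade(run: Callable[[Sequence[str]], None] = _run_checked) -> List[List[str]]:
    """Run the upgrade steps in order and return the commands that completed.

    Because `run` raises on failure, `setup` never runs after a failed install.
    """
    completed: List[List[str]] = []
    for argv in UPGRADE_STEPS:
        run(argv)
        completed.append(list(argv))
    return completed
```

Passing a custom `run` callable makes it possible to dry-run the sequence (for example, by collecting the commands in a list) without touching the machine.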
66
+
59
67
  If `opencode` was newly installed, open a fresh terminal before running OpenCode commands.
60
68
 
61
69
  MCP API keys:
@@ -77,7 +85,7 @@ opencode auth list
77
85
 
78
86
  ## Startup
79
87
 
80
- Create and initialize a new project workspace:
88
+ Create and initialize a new project workspace in a new or empty directory:
81
89
 
82
90
  ```bash
83
91
  mkdir my-project
@@ -110,6 +118,8 @@ Bootstrapped workspace layout:
110
118
  - `metadata.json` for project workflow metadata
111
119
  - `repo/AGENTS.md` for the repo-local agent instructions
112
120
 
121
+ `slopmachine init` creates the initial git commit so the workspace starts from a clean tree.
122
+
113
123
  ## Testing
114
124
 
115
125
  Package-level checks:
@@ -142,17 +152,17 @@ Operating model:
142
152
 
143
153
  High-level lifecycle:
144
154
 
145
- 1. clarification
146
- 2. planning
147
- 3. scaffold
148
- 4. development
149
- 5. integrated verification
150
- 6. hardening
151
- 7. evaluation and triage
152
- 8. final human decision
153
- 9. remediation when needed
154
- 10. submission packaging
155
- 11. retrospective
155
+ 1. `P0 Intake and Setup`
156
+ 2. `P1 Clarification`
157
+ 3. `P2 Planning`
158
+ 4. `P3 Scaffold`
159
+ 5. `P4 Development`
160
+ 6. `P5 Integrated Verification`
161
+ 7. `P6 Hardening`
162
+ 8. `P7 Evaluation and Fix Verification`
163
+ 9. `P8 Final Human Decision`
164
+ 10. `P9 Submission Packaging`
165
+ 11. `P10 Retrospective`
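The new phase labels are zero-based, so list item `n` carries the label `P(n-1)`. A small Python sketch of that numbering follows; the names `PHASES` and `phase_label` are hypothetical and only mirror the list above.

```python
# Phase names in lifecycle order, copied from the list above.
PHASES = [
    "Intake and Setup",
    "Clarification",
    "Planning",
    "Scaffold",
    "Development",
    "Integrated Verification",
    "Hardening",
    "Evaluation and Fix Verification",
    "Final Human Decision",
    "Submission Packaging",
    "Retrospective",
]


def phase_label(index: int) -> str:
    """Return the `P<index> <name>` label for a zero-based phase index."""
    if not 0 <= index < len(PHASES):
        raise ValueError(f"unknown phase index: {index}")
    return f"P{index} {PHASES[index]}"


# List item 8 in the README corresponds to index 7.
print(phase_label(7))
```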
156
166
 
157
167
  Design constraints:
158
168
 
@@ -177,7 +187,6 @@ Main locations:
177
187
  - skills: `~/.agents/skills/`
178
188
  - OpenCode config: `~/.config/opencode/opencode.json`
179
189
  - packaged workflow files: `~/slopmachine/`
180
- - Claude runtime assets: `~/.claude/`
181
190
 
182
191
  Installed agents:
183
192
 
@@ -188,7 +197,6 @@ Installed skills:
188
197
 
189
198
  - `~/.agents/skills/clarification-gate/`
190
199
  - `~/.agents/skills/developer-session-lifecycle/`
191
- - `~/.agents/skills/session-rollover/`
192
200
  - `~/.agents/skills/final-evaluation-orchestration/`
193
201
  - `~/.agents/skills/beads-operations/`
194
202
  - `~/.agents/skills/planning-guidance/`
@@ -199,7 +207,6 @@ Installed skills:
199
207
  - `~/.agents/skills/integrated-verification/`
200
208
  - `~/.agents/skills/hardening-gate/`
201
209
  - `~/.agents/skills/evaluation-triage/`
202
- - `~/.agents/skills/remediation-guidance/`
203
210
  - `~/.agents/skills/submission-packaging/`
204
211
  - `~/.agents/skills/retrospective-analysis/`
205
212
  - `~/.agents/skills/owner-evidence-discipline/`
@@ -210,14 +217,11 @@ Installed workflow files under `~/slopmachine/`:
210
217
 
211
218
  - `backend-evaluation-prompt.md`
212
219
  - `frontend-evaluation-prompt.md`
213
- - `document-completeness.md`
214
- - `engineering-results.md`
215
- - `implementation-comparison.md`
216
- - `quality-document.md`
217
220
  - `templates/AGENTS.md`
218
221
  - `workflow-init.js`
219
222
  - `utils/strip_session_parent.py`
220
223
  - `utils/convert_ai_session.py`
224
+ - `utils/cleanup_delivery_artifacts.py`
221
225
 
222
226
  OpenCode config entries ensured by `setup`:
223
227
 
package/RELEASE.md CHANGED
@@ -14,6 +14,14 @@ node ./bin/slopmachine.js --help
14
14
  SLOPMACHINE_HOME="$(pwd)/.tmp-home" SLOPMACHINE_NONINTERACTIVE=1 SLOPMACHINE_PLUGIN_BOOTSTRAP=0 node ./bin/slopmachine.js setup
15
15
  ```
16
16
 
17
+ That setup path should install `opencode-ai@latest` when OpenCode is missing and refresh it to `@latest` when it already exists.
18
+
19
+ Users can later refresh to the newest published package with:
20
+
21
+ ```bash
22
+ slopmachine upgrade
23
+ ```
24
+
17
25
  3. Test init into an isolated temp project:
18
26
 
19
27
  ```bash
@@ -39,16 +47,18 @@ Note:
39
47
  npm pack
40
48
  ```
41
49
 
42
- This should produce a tarball such as:
50
+ This should produce a tarball named like:
43
51
 
44
52
  ```bash
45
- theslopmachine-0.4.7.tgz
53
+ theslopmachine-<version>.tgz
46
54
  ```
47
55
 
56
+ `<version>` comes from `package.json`, which is the single package-version source of truth.
57
+
48
58
  ## Inspect package contents
49
59
 
50
60
  ```bash
51
- tar -tzf theslopmachine-0.4.7.tgz
61
+ tar -tzf theslopmachine-<version>.tgz
52
62
  ```
53
63
 
54
64
  Check that the tarball includes:
@@ -87,6 +97,7 @@ npm publish --dry-run
87
97
 
88
98
  ## Versioning
89
99
 
100
+ - `package.json` is the single package-version source of truth for the tarball name and CLI version banner
90
101
  - bump `package.json` version before each release
91
102
  - keep the CLI command as `slopmachine`
92
103
  - keep the npm package name as `theslopmachine`
@@ -49,7 +49,9 @@ Do not narrow scope for convenience.
49
49
  - implement real behavior, not placeholders
50
50
  - keep user-facing and admin-facing flows complete through their real surfaces
51
51
  - verify the changed area locally and realistically before reporting completion
52
- - update repo-local docs such as `README.md` when behavior or run/test instructions change
52
+ - update repo-local docs such as `README.md` and `./docs/*` when behavior or run/test instructions change
53
+ - keep repo-local docs and code structure statically reviewable; do not rely on runtime success alone to make the project understandable
54
+ - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
53
55
  - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
54
56
 
55
57
  ## Verification Cadence
@@ -87,6 +89,13 @@ Selected-stack defaults:
87
89
  - do not hardcode secrets or leave prototype residue behind
88
90
  - when the project has database dependencies, keep database setup in `./init_db.sh` rather than scattered repo logic
89
91
  - do not hardcode database connection values or database bootstrap values anywhere in the repo
92
+ - if the project uses mock, stub, fake, or local-data behavior, disclose that scope accurately in the repo-local documentation instead of implying real backend or production behavior
93
+ - if mock or interception behavior is enabled by default, document that clearly
94
+ - disclose feature flags, debug/demo surfaces, and default enabled states clearly in repo-local docs when they exist
95
+ - keep frontend state requirements explicit in code and repo-local docs for prompt-critical flows
96
+ - use a shared logging path and avoid random print-style debugging as the durable implementation pattern
97
+ - use a shared validation/error-handling path when validation materially affects the flow
98
+ - do not hide missing failure handling behind fake-success paths
90
99
 
91
100
  ## Skills
92
101
 
@@ -176,36 +176,23 @@ Phase rules:
176
176
 
177
177
  Maintain exactly one active developer session at a time.
178
178
 
179
- Track every developer session in metadata, but create a new one only in these cases:
180
-
181
- 1. you explicitly request a new session
182
-
183
- All tracked developer sessions use the `develop-N` naming line.
184
-
185
- There may be multiple `develop` sessions over the life of one project.
186
-
187
- During the first full run from planning through initial packaging, keep all work in the `develop-N` sequence, including integrated verification, hardening, evaluation issue fixing inside `P7`, and packaging follow-through.
188
-
189
- If the project is reopened after packaging because of later reported issues, continue with the existing developer session unless you explicitly request a new one.
190
-
191
- Fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule.
192
-
193
- If you explicitly request a new session while one is active, ask the current developer exactly `give me a summary of all the work that has been done`, then use that handoff to seed the next session.
194
-
195
- Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
196
- Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request.
179
+ - track developer sessions in metadata using the `develop-N` line
180
+ - keep the same active developer session through planning, development, verification, hardening, evaluation fixes, and packaging follow-through unless you explicitly request a new one
181
+ - if the project is reopened later, recover and continue the active developer session unless you explicitly request a replacement
182
+ - fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule
183
+ - use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery
197
184
 
198
185
  Do not launch the developer during `P0` or `P1`.
199
186
 
200
- When the first develop developer session begins in `P2`, start it in this exact order:
187
+ When the first develop developer session begins in `P2`, use this planning handshake:
201
188
 
202
- 1. send `lets plan this <original-prompt>`
189
+ 1. send the original prompt and ask for an initial plan plus major risks or assumptions
203
190
  2. wait for the developer's first reply
204
- 3. send the approved clarification prompt
191
+ 3. send the approved clarification prompt as the second owner message in that same session
205
192
  4. continue with planning from there
206
193
 
207
- Do not reorder that sequence.
208
194
  Do not merge those messages.
195
+ Do not send the clarification prompt first.
209
196
 
210
197
  ## Verification Budget
211
198
 
@@ -218,50 +205,7 @@ Owner-side discipline:
218
205
  - do not rerun expensive local test or E2E commands just because the developer already ran them
219
206
  - when the developer reports the exact verification command and its result clearly, use that evidence unless there is a concrete reason to challenge it
220
207
  - rerun expensive verification only when the developer evidence is weak, contradictory, flaky, high-risk, needed for a true broad gate, or needed to answer a new question
221
-
222
- Target budget for the whole workflow:
223
-
224
- - at most 3 broad owner-run verification moments using the selected stack's full verification path
225
-
226
- Selected-stack rule:
227
-
228
- - follow the original prompt and existing repository first; only use package defaults when they do not already specify the platform or stack
229
- - for backend and fullstack web projects, the broad path is usually Docker/runtime plus the full test command
230
- - for pure frontend web projects, the broad path is the documented production build plus the full test command and browser E2E when applicable
231
- - for mobile projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI/device verification when applicable
232
- - for desktop projects, the broad path is the platform-standard app launch path plus the full test command and platform-appropriate UI verification when applicable
233
-
234
- Every project must end up with:
235
-
236
- - one primary documented runtime command
237
- - one primary documented full-test command: `./run_tests.sh`
238
-
239
- Runtime command rule:
240
-
241
- - for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
242
- - when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
243
-
244
- Default moments:
245
-
246
- 1. scaffold acceptance
247
- 2. development complete -> integrated verification entry
248
- 3. final qualified state before packaging
249
-
250
- For Dockerized web backend/fullstack projects, enforce this cadence:
251
-
252
- - after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
253
- - after that, do not run Docker again during ordinary development work
254
- - the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
255
-
256
- Between those moments, rely on:
257
-
258
- - local runtime checks
259
- - targeted unit tests
260
- - targeted integration tests
261
- - targeted module or route-family reruns
262
- - the selected stack's local UI or E2E tool when UI is material
263
-
264
- If you run a Docker-based verification command sequence, end it with `docker compose down` unless the task explicitly requires containers to remain up.
208
+ - use phase skills and `verification-gates` for stack-specific runtime and broad-gate cadence details
265
209
 
266
210
  ## Mandatory Skill Discipline
267
211
 
@@ -272,10 +216,6 @@ Named skills are mandatory, not optional.
272
216
  - if the required skill is not loaded, stop immediately and load it before continuing
273
217
  - do not prompt the developer first and load the skill later
274
218
 
275
- ## Mandatory Skill Usage
276
-
277
- Load the required skill before the corresponding phase or activity work begins.
278
-
279
219
  Core map:
280
220
 
281
221
  - `P0` -> `developer-session-lifecycle`
@@ -292,7 +232,6 @@ Core map:
292
232
  - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
293
233
  - state mutations -> `beads-operations`
294
234
  - evidence-heavy review -> `owner-evidence-discipline`
295
- - intentional new developer session -> `session-rollover`
296
235
 
297
236
  Do not improvise a phase from memory when a phase skill exists.
298
237
 
@@ -308,21 +247,6 @@ When talking to the developer:
308
247
 
309
248
  Do not leak workflow internals such as:
310
249
 
311
- - Beads
312
- - phases
313
- - overlays
314
- - `.ai/` files
315
- - approval-state machinery
316
- - session-slot bookkeeping
317
- - packaging-stage orchestration details
318
-
319
- Do not sound like workflow software talking to a worker.
320
- Do not speak as a relay for a third party.
321
-
322
- ## Developer Isolation
323
-
324
- The developer must not be told about:
325
-
326
250
  - Beads workflow mechanics
327
251
  - `.ai/` orchestration files
328
252
  - approval-state machinery
@@ -330,6 +254,8 @@ The developer must not be told about:
330
254
  - packaging-stage orchestration details
331
255
 
332
256
  To the developer, this should feel like a normal engineering conversation with a strong technical lead.
257
+ Do not sound like workflow software talking to a worker.
258
+ Do not speak as a relay for a third party.
333
259
 
334
260
  ## Operating Discipline
335
261
 
@@ -338,7 +264,7 @@ To the developer, this should feel like a normal engineering conversation with a
338
264
  - keep work moving without low-information continuation chatter
339
265
  - read only what is needed to answer the current decision
340
266
  - keep comments and metadata auditable and specific
341
- - keep external docs owner-maintained and repo-local README developer-maintained
267
+ - keep external docs owner-maintained as reference copies, and keep repo-local docs developer-maintained as the repo's self-sufficient source of truth
342
268
 
343
269
  ## Review Posture
344
270
 
@@ -357,19 +283,14 @@ After each substantive developer reply, do one of four things:
357
283
  3. request clarification or justification
358
284
  4. require verification before deciding
359
285
 
360
- ## Packaging Explicitness
286
+ ## Packaging
361
287
 
362
288
  Treat packaging as a first-class delivery contract from the start, not as late cleanup.
363
289
 
364
290
  - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
365
- - `../self-test-run.md`, `../self-test-fixes.md`, `../sessions/`, `../metadata.json`, `../docs/`, and the delivered `repo/` are the mandatory late-stage artifacts
366
- - do not invent `submission/`, packaging-only report files, screenshots, or other extra artifact structures during ordinary packaging
367
-
368
- When `P9 Submission Packaging` begins:
369
-
370
291
  - load `submission-packaging` before any packaging action
371
292
  - follow its exact artifact, export, cleanup, and output contract
372
- - do not close packaging until every required final artifact path has been verified
293
+ - do not invent extra artifact structures during ordinary packaging
373
294
 
374
295
  ## Retrospective
375
296
 
@@ -377,8 +298,6 @@ After `P9 Submission Packaging` closes successfully:
377
298
 
378
299
  - automatically enter `P10 Retrospective`
379
300
  - load `retrospective-analysis`
380
- - write `run_id`-scoped retrospective output under `~/slopmachine/retrospectives/`
381
- - keep it owner-only and non-blocking by default
382
301
  - reopen packaging only if the retrospective finds a real packaged-result defect
383
302
 
384
303
  ## Completion Standard
@@ -60,22 +60,21 @@ Optional startup inputs may include:
60
60
  7. wait only for the initial clarification approval before development starts
61
61
  8. initialize developer-session tracking for the run
62
62
  9. start the develop developer session only after `P2` is ready to begin
63
- 10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
63
+ 10. send the original prompt and ask for an initial plan plus major risks or assumptions as the first owner message in that session
64
64
  11. wait for the developer's first exchange
65
65
  12. send the approved clarification prompt as the second owner message in that same session
66
66
  13. only after that second message, continue with the normal planning conversation
67
67
 
68
68
  ## First developer-session handshake
69
69
 
70
- The first developer session of the run must begin in this exact order:
70
+ The first developer session of the run should begin in this order:
71
71
 
72
72
  1. owner starts the develop developer session
73
- 2. owner sends: `lets plan this <original-prompt>`
73
+ 2. owner sends the original prompt and asks for an initial plan plus major risks or assumptions
74
74
  3. developer responds
75
75
  4. owner sends the approved clarification prompt
76
76
  5. planning proceeds from there
77
77
 
78
- Do not skip the initial planning opener.
79
78
  Do not send the clarification prompt first.
80
79
  Do not merge those two messages into one.
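The handshake order above can be checked mechanically. The sketch below is illustrative only: the `(sender, kind)` tuples and the kind labels `plan_request`, `reply`, and `clarification` are hypothetical stand-ins for however a session transcript is actually recorded.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (sender, kind); both labels are hypothetical

# Expected opening of the first developer session, in order.
EXPECTED_OPENING: List[Message] = [
    ("owner", "plan_request"),    # original prompt plus request for plan and risks
    ("developer", "reply"),       # developer's first reply
    ("owner", "clarification"),   # approved clarification prompt, sent second
]


def first_handshake_violation(messages: List[Message]) -> Optional[str]:
    """Return a description of the first ordering problem, or None if valid."""
    for position, expected in enumerate(EXPECTED_OPENING):
        if position >= len(messages):
            return f"missing message {position + 1}: expected {expected}"
        actual = tuple(messages[position])
        if actual != expected:
            return f"message {position + 1} is {actual}, expected {expected}"
    return None
```

Sending the clarification prompt first, or merging it into the opener, both surface as a mismatch at message 1 or a missing message 3.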
81
80
 
@@ -119,8 +118,6 @@ Each developer session record should include enough to recover and export it lat
119
118
  - `created_phase`
120
119
  - `session_id`
121
120
  - `status`
122
- - `handoff_in`
123
- - `handoff_out`
124
121
 
125
122
  Required project metadata fields in `../metadata.json` when relevant:
126
123
 
@@ -143,13 +140,7 @@ Required project metadata fields in `../metadata.json` when relevant:
143
140
  - record every developer session in `developer_sessions`
144
141
  - label every developer session using `develop-N`
145
142
  - create a new developer session only when the user explicitly requests a new session
146
-
147
- If the user explicitly requests a new session while one is active:
148
-
149
- 1. ask the current developer exactly: `give me a summary of all the work that has been done`
150
- 2. treat that reply as the handoff summary
151
- 3. start the new developer session with that summary as the handoff-in context
152
- 4. assign the next `develop-N` label in sequence
143
+ - when a replacement session is explicitly requested, treat it as a manual restart rather than a formal handoff workflow and assign the next `develop-N` label in sequence
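Assigning the next `develop-N` label in sequence can be sketched as below. This is a minimal example under assumptions: the record field name `label` is hypothetical (the skill lists fields such as `created_phase`, `session_id`, and `status`, but the exact key holding the label is not shown here), and numbering is assumed to start at `develop-1`.

```python
import re
from typing import Dict, Iterable

_LABEL_PATTERN = re.compile(r"^develop-(\d+)$")


def next_developer_label(sessions: Iterable[Dict[str, str]]) -> str:
    """Return the next `develop-N` label after the highest one already recorded.

    Records without a well-formed `develop-N` label are ignored.
    """
    highest = 0
    for record in sessions:
        match = _LABEL_PATTERN.match(record.get("label", ""))
        if match:
            highest = max(highest, int(match.group(1)))
    return f"develop-{highest + 1}"


# Hypothetical `developer_sessions` entries from ../metadata.json.
recorded = [
    {"label": "develop-1", "status": "closed"},
    {"label": "develop-2", "status": "active"},
]
print(next_developer_label(recorded))
```

Taking the maximum rather than the record count keeps labels unique even if an earlier record was removed or malformed.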
153
144
 
154
145
  ## Initial structure rule
155
146
 
@@ -17,16 +17,20 @@ Use this skill during `P4 Development` before prompting the developer.
17
17
 
18
18
  - define lightweight planning notes for the module before coding
19
19
  - define the module purpose, constraints, and edge cases before coding
20
+ - define module responsibilities, required flows, inputs and outputs, important failure behavior, permissions or boundaries when relevant, and the tests expected at completion before deeper implementation begins
20
21
  - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
21
22
  - implement real behavior, not partial scattered logic
22
23
  - handle failure paths and boundary conditions
23
24
  - add or update tests as part of the module work
25
+ - prefer TDD when the behavior is well defined and the module is practical to drive test-first; otherwise define the expected tests before implementation and keep them tied to the module plan
26
+ - keep `./docs/test-coverage.md` maintainable by making new tests traceable to concrete requirement or risk points instead of vague “more coverage” additions
24
27
  - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
25
28
  - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
26
29
  - keep frontend and backend contracts synchronized when the module spans both sides
27
30
  - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
28
31
  - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
29
32
  - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
33
+ - verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
30
34
  - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
31
35
  - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
32
36
  - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
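One of the checks in this list, confining file and export paths to allowed roots, can be sketched in a few lines. This is a generic illustration and not code from the package; the function name `is_within_root` is hypothetical.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def is_within_root(candidate: PathLike, allowed_root: PathLike) -> bool:
    """Return True if `candidate` resolves to a location inside `allowed_root`.

    Relative candidates are interpreted relative to the root. Resolving both
    paths first normalizes `..` segments and follows existing symlinks, so a
    traversal such as `../secret` is rejected.
    """
    root = Path(allowed_root).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(is_within_root("exports/report.csv", "/tmp/app"))
print(is_within_root("../etc/passwd", "/tmp/app"))
```

Comparing resolved paths instead of raw strings avoids prefix tricks such as `/tmp/app-evil` passing a naive `startswith("/tmp/app")` check.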
@@ -41,9 +45,15 @@ Use this skill during `P4 Development` before prompting the developer.
41
45
  - use the `frontend-design` skill for frontend component or page work
42
46
  - use the `frontend-design` skill during web or desktop UI verification when reviewing screenshots and tightening the interface
43
47
  - do not hardcode secrets or persist local sensitive values in the repo while implementing
44
- - explain behavior changes clearly enough that the documentation discipline can be satisfied accurately
48
+ - explain behavior changes clearly enough that the repo-local documentation discipline can be satisfied accurately
49
+ - update repo-local docs such as `README.md` and `./docs/*` when runtime, build/preview, configuration, routes, tests, security boundaries, feature flags, debug/demo surfaces, mock defaults, logging, validation, or state models change
50
+ - do not let implementation depend on parent-root docs or sibling artifacts for normal repo understanding
51
+ - keep `./docs/reviewer-guide.md` aligned when app entry points, route registration, build/preview commands, configuration surfaces, feature flags, debug/demo surfaces, mock defaults, logging structure, validation structure, or major module boundaries change
52
+ - keep `./docs/security-boundaries.md` aligned when auth, authorization, admin/debug, or isolation logic changes
53
+ - keep `./docs/frontend-flow-matrix.md` aligned when frontend pages, interactions, state transitions, or required UI states change
45
54
  - verify the module against its planned behavior before trying to move on
46
55
  - do not move on while the module is still obviously weak or half-finished
56
+ - do not spread broad partial logic across many modules; bias toward completed trustworthy slices before opening the next major chunk
47
57
 
48
58
  ## Verification model
49
59
 
@@ -54,9 +64,15 @@ Use this skill during `P4 Development` before prompting the developer.
54
64
  - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
55
65
  - if the local toolchain is missing, try to install or enable it before falling back to the broad gate path
56
66
  - for web UI projects, default local UI/E2E verification to Playwright when that stack is in use
67
+ - for frontend-bearing projects, use the component/page-or-route/E2E layers intentionally instead of relying on only one frontend test layer for every kind of proof
57
68
  - for mobile projects, default local UI testing to the selected mobile test stack and use a platform-appropriate mobile UI/E2E tool when device-flow proof matters
58
69
  - for desktop projects, default local UI verification to Playwright's Electron support or another platform-appropriate desktop UI/E2E tool when window-flow proof matters
59
70
  - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
71
+ - for non-trivial frontend stateful work, do not rely only on runtime or E2E checks; add component, page, route, or state-focused tests when that is the credible way to prove the behavior statically
72
+ - for frontend-bearing flows, explicitly verify loading, empty, submitting, disabled, success, error, and duplicate-action or re-entry protection states where those states are required by the prompt or core flow
73
+ - use the shared logging path rather than random `console.log` or print-style debugging as the durable implementation pattern
74
+ - use the shared validation and normalized error-handling path rather than per-component or per-route improvisation where a common contract exists
75
+ - keep the test surface moving toward at least 90 percent meaningful coverage of the relevant behavior area as slices are completed
60
76
  - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
61
77
 
62
78
  ## Quality rules
@@ -36,6 +36,7 @@ These two files are the only evaluation prompt sources for evaluation runs.
36
36
  - read the chosen evaluation prompt file contents yourself before launching evaluation
37
37
  - compose one large final prompt block
38
38
  - prefix the request with a clear instruction that the reviewer must work in the current project directory and evaluate the delivered project
39
+ - make sure the repo-local docs and code inside the current project directory are sufficient for evaluation; do not assume the evaluator will rely on parent-root docs or sibling workflow artifacts
39
40
  - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content, but otherwise do not rewrite or replace the template body
40
41
  - send that fully composed text block directly to one fresh `General` evaluator session
41
42
  - require that session to produce a detailed file-backed report plus an issue summary
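The composition steps above can be sketched as follows. This is an illustrative Python helper, not the skill's implementation: `compose_evaluation_prompt` is a hypothetical name, and the wording of the prefix instruction is left to the caller.

```python
def compose_evaluation_prompt(template: str, original_prompt: str, prefix: str) -> str:
    """Build one prompt block: prefix instruction, then the filled template.

    Uses plain string replacement rather than `str.format`, so any other
    braces in the template body are left untouched.
    """
    placeholder = "{prompt}"
    if placeholder not in template:
        raise ValueError("template has no {prompt} placeholder")
    body = template.replace(placeholder, original_prompt)
    return f"{prefix}\n\n{body}"


# Hypothetical inputs; the real template comes from the evaluation prompt file.
example = compose_evaluation_prompt(
    template="Evaluate the project against this request:\n{prompt}\nReport as {json}.",
    original_prompt="Build a task tracker.",
    prefix="Work in the current project directory and evaluate the delivered project.",
)
print(example)
```

Only the placeholder is substituted, which matches the rule that the template body must not otherwise be rewritten or replaced.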
@@ -33,11 +33,23 @@ Hardening should treat these as the main review buckets before final evaluation
33
33
  - audit security boundaries, validation, ownership, and secret handling
34
34
  - prioritize authentication, authorization, object ownership, tenant isolation, admin/debug exposure, and secret leakage risk over style issues
  - audit whether the current tests are sufficient to catch major issues in the core business flow, major failure paths, security-critical areas, and obvious high-risk boundaries
+ - audit whether `./docs/test-coverage.md` actually maps major requirement and risk points to concrete tests, assertions, and gaps in a way a static evaluator can follow quickly
+ - audit whether the project is actually approaching or achieving at least 90 percent meaningful coverage of the relevant behavior surface rather than relying on a thin happy-path suite
  - audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
  - inspect architecture, coupling, file size, and maintainability risks
  - focus engineering review on the major maintainability and architecture concerns that materially affect delivery confidence
  - check for bad engineering practices that accumulated during implementation
  - tighten weak tests, weak docs, and weak operational instructions
+ - audit static review readiness: entry points, routes, config, README, and test commands should be traceably consistent without depending on runtime tribal knowledge
+ - audit that the repo is self-sufficient and does not rely on parent-root docs or sibling workflow artifacts for static reviewability
+ - audit repo-local evaluator docs: `./docs/reviewer-guide.md`, `./docs/test-coverage.md`, `./docs/security-boundaries.md`, `./docs/frontend-flow-matrix.md`, and `./docs/api-spec.md` when relevant
+ - audit static security-boundary readiness: a fresh reviewer should be able to trace auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation from repository artifacts when applicable
+ - if mock, stub, fake, interception, or local-data behavior exists, verify that its scope, default state, and boundaries are disclosed accurately and do not imply undisclosed real integration
+ - audit whether feature flags, debug/demo surfaces, default-enabled config states, and mock/interception defaults are disclosed accurately in repo-local docs when they exist
+ - audit frontend flow readiness: major pages and interactions should have a traceable state model covering loading, empty, submitting, disabled, success, error, and duplicate-action protection where relevant
+ - audit whether frontend-bearing projects have the right mix of component, page/route, and E2E evidence for their complexity rather than only one thin layer
+ - audit whether logging categories, redaction expectations, and validation/error-normalization paths are concrete enough for static review
+ - verify that missing failure handling is not being hidden behind fake-success behavior
  - run exploratory testing around awkward states, repeated actions, and realistic edge behavior
  - re-check frontend and backend observability, redaction, and operator visibility paths
  - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
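The repo-local evaluator docs named in the hardening list above lend themselves to a mechanical presence check. The following is a minimal sketch under stated assumptions: the helper name `missing_docs` is hypothetical, an empty file is treated as missing (a choice made here, not a rule from the package), and the caller is expected to pass only the docs that are relevant to the project.

```python
from pathlib import Path

# Doc paths as named in the hardening list; which ones apply depends on the project.
EVALUATOR_DOCS = (
    "docs/reviewer-guide.md",
    "docs/test-coverage.md",
    "docs/security-boundaries.md",
    "docs/frontend-flow-matrix.md",
    "docs/api-spec.md",
)


def missing_docs(repo_root, relevant=EVALUATOR_DOCS):
    """Return the relevant doc paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in relevant:
        path = root / rel
        # Assumption: an empty file counts as missing for static reviewability.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

A check like this only proves the files exist. Whether they accurately map requirements, boundaries, and flows still needs the manual audit the list describes.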
@@ -54,7 +66,13 @@ Before `P6` can close, the owner should have a clear answer for each of these:
  - prompt-fit: does the delivered project still match the business goal, core flows, and implicit constraints?
  - security-critical flaws: are there any unresolved auth, authorization, isolation, exposure, or secret-handling defects?
  - test sufficiency: are the current tests strong enough to rule out most major issues, and if not, what was added or strengthened?
+ - coverage depth: does the current evidence support roughly 90 percent meaningful coverage of the relevant behavior surface, and if not, what remains weak?
  - major engineering quality: is the project structurally credible and maintainable, rather than piled-up or demo-grade?
+ - static audit readiness: would a fresh static reviewer be able to trace the startup path, test path, core module boundaries, and any mock/local-data scope from repository artifacts alone?
+ - security-boundary readiness: would a fresh static reviewer be able to explain the real auth, authorization, admin/debug, and isolation boundaries with file-backed evidence?
+ - coverage-mapping readiness: would a fresh static reviewer be able to map the major requirement and risk points to concrete tests and remaining gaps without inventing the matrix themselves?
+ - frontend-state readiness: would a fresh static reviewer be able to trace the required frontend state model and key interaction transitions from repo artifacts alone?
+ - repo-self-sufficiency: can the repo be reviewed and used without depending on parent-root docs or sibling workflow artifacts?
 
  ## Rules
 
@@ -33,12 +33,17 @@ Once a failure class is known:
  - for mobile and desktop work, run the selected stack's platform-appropriate UI/E2E coverage for major flows and review screenshots or equivalent artifacts for real UI behavior and regressions
  - end-to-end coverage must use the real intended user-facing or admin-facing surfaces for the flow; if the flow cannot be exercised that way, treat the missing surface as incomplete work
  - verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
+ - verify 401, 403, 404, conflict or duplicate-submission, object-authorization, tenant or user-isolation, and sensitive-log-exposure paths where those risks exist
  - verify security-sensitive behavior where applicable
  - verify multi-tenant and cross-user isolation where applicable, including negative checks rather than single-actor happy paths only
  - verify file/path safety for file-bearing flows where applicable, including traversal-style negative cases
  - verify secrets are not committed, hardcoded, or leaking through logs/config/docs
  - verify error surfaces and auth-related failures are sanitized for users and operators appropriately
  - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
+ - tighten `./docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
+ - when security-bearing behavior changes, tighten `./docs/security-boundaries.md` so enforcement points and mapped tests stay accurate
+ - when frontend-bearing behavior changes, tighten `./docs/frontend-flow-matrix.md` so key pages, interactions, and required UI states stay accurate
+ - when routes, entry points, build/preview/config, feature flags, debug/demo surfaces, or mock defaults change, tighten `./docs/reviewer-guide.md` so static traceability stays current
  - challenge integration seams and adjacent-module behavior, not just the changed module local path
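The negative-path checks in the list above (401, 403, 404, object authorization, tenant or user isolation) can be sketched as a small decision function plus the cases a test would assert. This is an illustrative model only: the `User` and `Document` types and the `read_status` helper are hypothetical, and answering cross-tenant reads with 404 rather than 403 is one common design choice, not something the package prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    tenant_id: str


@dataclass(frozen=True)
class Document:
    id: str
    tenant_id: str
    owner_id: str


def read_status(user: Optional[User], doc: Optional[Document]) -> int:
    """Return the HTTP status a read request should produce in this toy model."""
    if user is None:
        return 401  # unauthenticated caller
    if doc is None or doc.tenant_id != user.tenant_id:
        return 404  # assumption: hide existence of other tenants' objects
    if doc.owner_id != user.id:
        return 403  # same tenant, but not the owner
    return 200


# Two actors in one tenant and one in another, so the checks are not single-actor only.
alice = User(id="u1", tenant_id="t1")
bob = User(id="u2", tenant_id="t1")
mallory = User(id="u3", tenant_id="t2")
alice_doc = Document(id="d1", tenant_id="t1", owner_id="u1")
```

The point of the sketch is the shape of the test matrix: each risk in the list gets at least one actor who should be refused, alongside the happy path.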
 
  ## Rules