theslopmachine 0.4.5 → 0.4.7

package/README.md CHANGED
@@ -1,62 +1,45 @@
1
1
  # theslopmachine
2
2
 
3
- `theslopmachine` installs the SlopMachine owner/developer workflow into OpenCode, sets up the required support files on your machine, and bootstraps new project workspaces.
3
+ `theslopmachine` is an installer and bootstrap CLI for the SlopMachine OpenCode workflow. It installs the packaged owner/developer agents, required skills, workflow support files, and project bootstrap logic needed to start a new SlopMachine-managed repository.
4
4
 
5
- **Quickstart**
5
+ ## Features
6
6
 
7
- This is the full machine-to-project flow:
7
+ - installs packaged OpenCode agents into `~/.config/opencode/agents/`
8
+ - installs packaged skills into `~/.agents/skills/`
9
+ - installs packaged workflow support files into `~/slopmachine/`
10
+ - installs Claude worker runtime assets under `~/.claude/`
11
+ - bootstraps a new project workspace with `repo/`, `docs/`, `sessions/`, `metadata.json`, `AGENTS.md`, and initialized `br` state
12
+ - configures required OpenCode plugins and MCP entries without overwriting existing `context7` or `exa` configuration
8
13
 
9
- 1. install the package
10
- 2. run `slopmachine setup`
11
- 3. add MCP API keys if prompted
12
- 4. log into Codex with OpenCode
13
- 5. initialize a project workspace
14
- 6. enter `repo/`
15
- 7. start OpenCode and choose the `slopmachine` agent
14
+ ## Installation
16
15
 
17
- ## Requirements
16
+ Requirements:
18
17
 
19
18
  - Node.js 18+
20
19
  - `git`
21
20
  - Docker running on the machine
22
- - `curl` on Unix-like systems for automatic `br` install
21
+ - `curl` on Unix-like systems for automatic `br` installation
23
22
 
24
- `slopmachine setup` can install or verify:
25
-
26
- - `opencode`
27
- - `br` (`beads_rust`)
28
-
29
- ## 1. Install The Package
30
-
31
- From this package directory:
23
+ Build and install the package:
32
24
 
33
25
  ```bash
34
26
  npm install
35
27
  npm run check
36
28
  npm pack
29
+ npm install -g ./theslopmachine-0.4.4.tgz
37
30
  ```
38
31
 
39
- That produces a tarball such as:
40
-
41
- ```bash
42
- theslopmachine-0.4.5.tgz
43
- ```
44
-
45
- Install it globally:
46
-
47
- ```bash
48
- npm install -g ./theslopmachine-0.4.5.tgz
49
- ```
50
-
51
- For local package development instead:
32
+ For local package development instead of global install:
52
33
 
53
34
  ```bash
54
35
  npm link
55
36
  ```
56
37
 
57
- ## 2. Run Setup
38
+ The published package is intentionally source-only. It packages only `bin/`, `src/`, `assets/`, `README.md`, `RELEASE.md`, and `MANUAL.md`.
58
39
 
59
- Run this once per machine, or rerun it any time you want to refresh packaged assets:
40
+ ## Setup
41
+
42
+ Run machine setup:
60
43
 
61
44
  ```bash
62
45
  slopmachine setup
@@ -64,47 +47,37 @@ slopmachine setup
64
47
 
65
48
  `setup` does the following:
66
49
 
67
- - installs or verifies `git`, `python3`, `opencode`, `br`, and Docker availability
68
- - installs the packaged OpenCode agents into `~/.config/opencode/agents/`
69
- - installs the packaged skills into `~/.agents/skills/`
70
- - installs workflow support files into `~/slopmachine/`
50
+ - installs or verifies `opencode`
51
+ - installs or verifies `br` (`beads_rust`)
52
+ - installs or refreshes packaged agents
53
+ - installs or refreshes packaged skills
54
+ - installs or refreshes packaged workflow files into `~/slopmachine/`
55
+ - installs or refreshes Claude runtime assets under `~/.claude/`
71
56
  - updates `~/.config/opencode/opencode.json`
72
- - prompts for missing Context7 and Exa MCP API keys
73
-
74
- If `setup` installs `opencode` for the first time, open a fresh terminal before running `opencode` commands.
57
+ - prompts for missing MCP API keys when needed
75
58
 
76
- ## 3. Get MCP API Keys
59
+ If `opencode` was newly installed, open a fresh terminal before running OpenCode commands.
77
60
 
78
- During `slopmachine setup`, you may be prompted for:
61
+ MCP API keys:
79
62
 
80
63
  - Context7: `https://context7.com`
81
64
  - Exa: `https://exa.ai`
82
65
 
83
- You can leave either blank and add it later by editing:
84
-
85
- ```bash
86
- ~/.config/opencode/opencode.json
87
- ```
88
-
89
- If `context7` or `exa` is already configured in `opencode.json`, `setup` leaves the existing entries in place.
90
-
91
- ## 4. Log Into Codex With OpenCode
92
-
93
- Authenticate OpenCode against Codex:
66
+ Codex login with OpenCode:
94
67
 
95
68
  ```bash
96
69
  opencode auth login -p codex
97
70
  ```
98
71
 
99
- Optional check:
72
+ Optional verification:
100
73
 
101
74
  ```bash
102
75
  opencode auth list
103
76
  ```
104
77
 
105
- ## 5. Initialize A Project Workspace
78
+ ## Startup
106
79
 
107
- Create a new workspace directory and bootstrap it:
80
+ Create and initialize a new project workspace:
108
81
 
109
82
  ```bash
110
83
  mkdir my-project
@@ -112,54 +85,106 @@ cd my-project
112
85
  slopmachine init
113
86
  ```
114
87
 
115
- This creates:
116
-
117
- - `repo/` for the actual codebase work
118
- - parent-level workflow files such as `metadata.json` and `.ai/metadata.json`
119
- - parent-level `docs/` and `sessions/`
120
- - `repo/AGENTS.md`
121
- - initialized `br` state
122
- - an initial git commit
123
-
124
- If you want `init` to open OpenCode automatically in `repo/`, use:
88
+ Or initialize and open OpenCode immediately:
125
89
 
126
90
  ```bash
91
+ mkdir my-project
92
+ cd my-project
127
93
  slopmachine init -o
128
94
  ```
129
95
 
130
- ## 6. Enter `repo/`
131
-
132
- If you used plain `slopmachine init`, move into the working repository:
96
+ If you used plain `slopmachine init`, then continue with:
133
97
 
134
98
  ```bash
135
99
  cd repo
100
+ opencode
136
101
  ```
137
102
 
138
- ## 7. Start OpenCode
103
+ Inside OpenCode, select the `slopmachine` agent to start the workflow.
139
104
 
140
- Start OpenCode inside `repo/`:
105
+ Bootstrapped workspace layout:
106
+
107
+ - `repo/` for the working codebase
108
+ - `docs/` for workflow documentation and evidence
109
+ - `sessions/` for exported session artifacts
110
+ - `metadata.json` for project workflow metadata
111
+ - `repo/AGENTS.md` for the repo-local agent instructions
112
+
113
+ ## Testing
114
+
115
+ Package-level checks:
141
116
 
142
117
  ```bash
143
- opencode
118
+ npm run check
119
+ npm pack --dry-run
144
120
  ```
145
121
 
146
- Then select the `slopmachine` agent and begin the workflow.
122
+ Generated project conventions:
123
+
124
+ - every bootstrapped project must expose one primary runtime command
125
+ - every bootstrapped project must expose one primary broad test command: `./run_tests.sh`
126
+ - for Dockerized web backend or fullstack projects, the expected broad runtime command is `docker compose up --build`
127
+ - for non-Docker runtime cases, the expected broad runtime command is usually `./run_app.sh`
128
+
129
+ Verification policy:
130
+
131
+ - use local fast verification during normal development
132
+ - treat `./run_tests.sh` as a broad gate, not an ordinary every-step verification command
133
+ - for Dockerized web backend and fullstack projects, scaffold acceptance should establish both `docker compose up --build` and `./run_tests.sh`
147
134
 
148
- The normal operating split is:
135
+ ## Architecture
149
136
 
150
- - `slopmachine` is the owner/orchestrator
137
+ Operating model:
138
+
139
+ - `slopmachine` is the owner and orchestrator
151
140
  - `developer` is the implementation worker
141
+ - detailed workflow behavior is primarily carried by loaded skills rather than one monolithic owner prompt
142
+
143
+ High-level lifecycle:
144
+
145
+ 1. clarification
146
+ 2. planning
147
+ 3. scaffold
148
+ 4. development
149
+ 5. integrated verification
150
+ 6. hardening
151
+ 7. evaluation and triage
152
+ 8. final human decision
153
+ 9. remediation when needed
154
+ 10. submission packaging
155
+ 11. retrospective
156
+
157
+ Design constraints:
158
+
159
+ - keep the owner shell small and load phase-specific skills when needed
160
+ - prefer targeted reads and focused local verification during implementation
161
+ - keep environment-specific state out of the package
162
+ - do not package local runtime artifacts, caches, editor folders, or generated dependency environments
163
+
164
+ Database dependency rule:
165
+
166
+ - database dependencies must be provisioned by initialization scripts, migrations, container startup hooks, or equivalent runtime setup
167
+ - do not hardcode database-specific environment state into packaged assets
168
+ - do not ship database files such as `.db`, `.sqlite`, dumps, or seeded local database artifacts in the package
169
+
170
+ For this package specifically, the installer ships workflow logic and templates only. It does not ship database dependency files or packaged database state.
152
171
 
153
- ## Configured Items
172
+ ## Installed Configuration
154
173
 
155
- These are the main files and directories `setup` configures.
174
+ Main locations:
156
175
 
157
- ### OpenCode Agents
176
+ - agents: `~/.config/opencode/agents/`
177
+ - skills: `~/.agents/skills/`
178
+ - OpenCode config: `~/.config/opencode/opencode.json`
179
+ - packaged workflow files: `~/slopmachine/`
180
+ - Claude runtime assets: `~/.claude/`
181
+
182
+ Installed agents:
158
183
 
159
184
  - `~/.config/opencode/agents/slopmachine.md`
160
185
  - `~/.config/opencode/agents/developer.md`
161
186
 
162
- ### OpenCode Skills
187
+ Installed skills:
163
188
 
164
189
  - `~/.agents/skills/clarification-gate/`
165
190
  - `~/.agents/skills/developer-session-lifecycle/`
@@ -181,30 +206,20 @@ These are the main files and directories `setup` configures.
181
206
  - `~/.agents/skills/report-output-discipline/`
182
207
  - `~/.agents/skills/frontend-design/`
183
208
 
184
- ### SlopMachine Support Files
185
-
186
- Installed under `~/slopmachine/`:
209
+ Installed workflow files under `~/slopmachine/`:
187
210
 
188
211
  - `backend-evaluation-prompt.md`
189
212
  - `frontend-evaluation-prompt.md`
190
213
  - `document-completeness.md`
191
- - `quality-document.md`
192
214
  - `engineering-results.md`
193
215
  - `implementation-comparison.md`
194
- - `workflow-init.js`
216
+ - `quality-document.md`
195
217
  - `templates/AGENTS.md`
218
+ - `workflow-init.js`
196
219
  - `utils/strip_session_parent.py`
197
220
  - `utils/convert_ai_session.py`
198
221
 
199
- ### OpenCode Config
200
-
201
- Config file:
202
-
203
- ```bash
204
- ~/.config/opencode/opencode.json
205
- ```
206
-
207
- `setup` ensures these entries exist:
222
+ OpenCode config entries ensured by `setup`:
208
223
 
209
224
  - plugin: `oc-chatgpt-multi-auth`
210
225
  - MCP server: `chrome-devtools`
@@ -212,21 +227,4 @@ Config file:
212
227
  - MCP server: `exa`
213
228
  - MCP server: `shadcn` disabled by default
214
229
 
215
- If you want to customize agents, MCP settings, or plugins, these are the files to edit.
216
-
217
- ## Daily Use
218
-
219
- After the machine is set up, the common flow is:
220
-
221
- ```bash
222
- cd my-project/repo
223
- opencode
224
- ```
225
-
226
- Or for a brand new project in one shot:
227
-
228
- ```bash
229
- mkdir my-project
230
- cd my-project
231
- slopmachine init -o
232
- ```
230
+ These are the user-editable locations if you want to customize agents, skills, plugins, or MCP configuration after setup.
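As a rough illustration of the "Main locations" list in the new README, the sketch below reports which of those paths are missing after `slopmachine setup`. The function name `check_installed_locations` and its optional home-directory argument are hypothetical and not part of the package; only the paths themselves come from the README.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped by theslopmachine: report which of the
# README's "Main locations" are missing under a given home directory.
check_installed_locations() {
  local home_dir="${1:-$HOME}"
  local rel status=0
  for rel in \
    ".config/opencode/agents" \
    ".agents/skills" \
    ".config/opencode/opencode.json" \
    "slopmachine" \
    ".claude"; do
    if [ ! -e "$home_dir/$rel" ]; then
      # Print the path the way the README writes it, relative to ~.
      echo "missing: ~/$rel"
      status=1
    fi
  done
  return "$status"
}
```

A possible use is `check_installed_locations && echo "setup locations present"`. The check only confirms that the paths exist; it says nothing about whether their contents are current.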
package/RELEASE.md CHANGED
@@ -42,13 +42,13 @@ npm pack
42
42
  This should produce a tarball such as:
43
43
 
44
44
  ```bash
45
- theslopmachine-0.4.5.tgz
45
+ theslopmachine-0.4.7.tgz
46
46
  ```
47
47
 
48
48
  ## Inspect package contents
49
49
 
50
50
  ```bash
51
- tar -tzf theslopmachine-0.4.5.tgz
51
+ tar -tzf theslopmachine-0.4.7.tgz
52
52
  ```
53
53
 
54
54
  Check that the tarball includes:
@@ -85,6 +85,8 @@ Selected-stack defaults:
85
85
  - do not ship placeholder, demo, setup, or debug UI in product-facing screens
86
86
  - do not create `.env` files or similar env-file variants
87
87
  - do not hardcode secrets or leave prototype residue behind
88
+ - when the project has database dependencies, keep database setup in `./init_db.sh` rather than scattered repo logic
89
+ - do not hardcode database connection values or database bootstrap values anywhere in the repo
88
90
 
89
91
  ## Skills
90
92
 
@@ -115,7 +115,7 @@ Do not create another competing workflow-state system.
115
115
  Use git to preserve meaningful workflow checkpoints.
116
116
 
117
117
  - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
118
- - meaningful work includes accepted scaffold completion, accepted major development slices, accepted remediation passes, and other clearly reviewable milestones
118
+ - meaningful work includes accepted scaffold completion, accepted major development slices, accepted evaluation-fix rounds, and other clearly reviewable milestones
119
119
  - keep the git flow simple and checkpoint-oriented
120
120
  - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
121
121
  - keep commit messages descriptive and easy to reason about later
@@ -158,21 +158,19 @@ Use these exact root phases:
158
158
  - `P4 Development`
159
159
  - `P5 Integrated Verification`
160
160
  - `P6 Hardening`
161
- - `P7 Evaluation and Triage`
161
+ - `P7 Evaluation and Fix Verification`
162
162
  - `P8 Final Human Decision`
163
- - `P9 Remediation`
164
- - `P10 Submission Packaging`
165
- - `P11 Retrospective`
163
+ - `P9 Submission Packaging`
164
+ - `P10 Retrospective`
166
165
 
167
166
  Phase rules:
168
167
 
169
168
  - exactly one root phase should normally be active at a time
170
169
  - enter the phase before real work for that phase begins
171
170
  - do not close multiple root phases in one transition block
172
- - `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
173
171
  - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
174
- - `P11 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
175
- - post-submission external evaluation feedback may reopen `P9 Remediation`, then rerun `P10 Submission Packaging`, and then rerun `P11 Retrospective`
172
+ - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
173
+ - post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Human Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
176
174
 
177
175
  ## Developer Session Model
178
176
 
@@ -181,21 +179,21 @@ Maintain exactly one active developer session at a time.
181
179
  Track every developer session in metadata, but create a new one only in these cases:
182
180
 
183
181
  1. you explicitly request a new session
184
- 2. after successful submission, you return with external evaluation issues that require more fixes
185
182
 
186
- Session classes:
183
+ All tracked developer sessions use the `develop-N` naming line.
187
184
 
188
- 1. `develop`: every developer session created before the first successful submission packaging
189
- 2. `bugfix`: every developer session created after successful submission packaging when the project is reopened for external-evaluation follow-up
185
+ There may be multiple `develop` sessions over the life of one project.
190
186
 
191
- There may be multiple `develop` sessions and multiple `bugfix` sessions over the life of one project.
187
+ During the first full run from planning through initial packaging, keep all work in the `develop-N` sequence, including integrated verification, hardening, evaluation issue fixing inside `P7`, and packaging follow-through.
192
188
 
193
- During the first full run from planning through initial submission packaging, keep all work in the `develop` session class, including integrated verification, hardening, evaluation-driven remediation, and packaging follow-through.
189
+ If the project is reopened after packaging because of later reported issues, continue with the existing developer session unless you explicitly request a new one.
190
+
191
+ Fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule.
194
192
 
195
193
  If you explicitly request a new session while one is active, ask the current developer exactly `give me a summary of all the work that has been done`, then use that handoff to seed the next session.
196
194
 
197
195
  Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
198
- Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request or post-submission external-feedback reopen.
196
+ Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request.
199
197
 
200
198
  Do not launch the developer during `P0` or `P1`.
201
199
 
@@ -290,9 +288,8 @@ Core map:
290
288
  - `P5` -> `integrated-verification`
291
289
  - `P6` -> `hardening-gate`
292
290
  - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
293
- - `P9` -> `remediation-guidance`
294
- - `P10` -> `submission-packaging`, `report-output-discipline`
295
- - `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
291
+ - `P9` -> `submission-packaging`, `report-output-discipline`
292
+ - `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
296
293
  - state mutations -> `beads-operations`
297
294
  - evidence-heavy review -> `owner-evidence-discipline`
298
295
  - intentional new developer session -> `session-rollover`
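The renumbered phase-to-skill entries in this hunk can be read as a simple lookup. The sketch below is illustrative only: `skills_for_phase` is a hypothetical name, and it covers just the `P5` to `P10` entries visible in the new side of this hunk, not the earlier phases that the diff does not show.

```bash
# Hypothetical lookup built from the map entries visible in this hunk
# (new numbering: P9 is packaging, P10 is the retrospective).
skills_for_phase() {
  case "$1" in
    P5)  echo "integrated-verification" ;;
    P6)  echo "hardening-gate" ;;
    P7)  echo "final-evaluation-orchestration evaluation-triage report-output-discipline" ;;
    P9)  echo "submission-packaging report-output-discipline" ;;
    P10) echo "retrospective-analysis owner-evidence-discipline report-output-discipline" ;;
    *)   return 1 ;;  # phase not listed in this hunk
  esac
}
```

For example, `skills_for_phase P9` prints the two skills now attached to submission packaging, while an old identifier such as `P11` returns a non-zero status.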
@@ -307,7 +304,7 @@ When talking to the developer:
307
304
  - lead with the engineering point, not process framing
308
305
  - keep prompts natural, sharp, and compact unless the moment really needs more context
309
306
  - translate workflow intent into normal software-project language
310
- - for each development slice or bugfix request, require the reply to state the exact verification commands that were run and the concrete results they produced
307
+ - for each development slice or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
311
308
 
312
309
  Do not leak workflow internals such as:
313
310
 
@@ -364,14 +361,11 @@ After each substantive developer reply, do one of four things:
364
361
 
365
362
  Treat packaging as a first-class delivery contract from the start, not as late cleanup.
366
363
 
367
- - the canonical package documents live under `~/slopmachine/`
368
- - the two evaluation prompt files are used exactly during evaluation runs
369
- - the four non-evaluation package documents are used during submission packaging to generate the required submission outputs
370
- - exact packaging file outputs and final paragraph outputs are mandatory in `P10`
371
- - accepted evaluation reports and cleaned original session exports are mandatory submission artifacts in `P10`
372
- - do not leave packaging structure, screenshots, self-test outputs, or exports to be improvised at the end
364
+ - the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
365
+ - `../self-test-run.md`, `../self-test-fixes.md`, `../sessions/`, `../metadata.json`, `../docs/`, and the delivered `repo/` are the mandatory late-stage artifacts
366
+ - do not invent `submission/`, packaging-only report files, screenshots, or other extra artifact structures during ordinary packaging
373
367
 
374
- When `P10 Submission Packaging` begins:
368
+ When `P9 Submission Packaging` begins:
375
369
 
376
370
  - load `submission-packaging` before any packaging action
377
371
  - follow its exact artifact, export, cleanup, and output contract
@@ -379,9 +373,9 @@ When `P10 Submission Packaging` begins:
379
373
 
380
374
  ## Retrospective
381
375
 
382
- After `P10 Submission Packaging` closes successfully:
376
+ After `P9 Submission Packaging` closes successfully:
383
377
 
384
- - automatically enter `P11 Retrospective`
378
+ - automatically enter `P10 Retrospective`
385
379
  - load `retrospective-analysis`
386
380
  - write `run_id`-scoped retrospective output under `~/slopmachine/retrospectives/`
387
381
  - keep it owner-only and non-blocking by default
@@ -101,24 +101,19 @@ Track at least:
101
101
  - `current_phase`
102
102
  - `awaiting_human`
103
103
  - `clarification_approved`
104
- - `remediation_round`
105
104
  - `clarification_validator_session_id`
106
- - `evaluation_pass`
107
- - `backend_evaluation_session_id`
108
- - `frontend_evaluation_session_id`
109
- - `last_evaluation_session_id`
110
- - `backend_evaluation_report_path`
111
- - `frontend_evaluation_report_path`
112
- - `passed_evaluation_tracks`
105
+ - `evaluation_prompt_kind`
106
+ - `evaluation_session_id`
107
+ - `self_test_run_path`
108
+ - `fix_verification_session_id`
109
+ - `self_test_fixes_path`
113
110
  - `developer_sessions`
114
111
  - `active_developer_session_id`
115
112
  - `next_develop_session_number`
116
- - `next_bugfix_session_number`
117
- - `submission_completed`
113
+ - `packaging_completed`
118
114
 
119
115
  Each developer session record should include enough to recover and export it later, such as:
120
116
 
121
- - `session_class`
122
117
  - `sequence`
123
118
  - `label`
124
119
  - `created_phase`
@@ -126,7 +121,6 @@ Each developer session record should include enough to recover and export it lat
126
121
  - `status`
127
122
  - `handoff_in`
128
123
  - `handoff_out`
129
- - `reopened_after_submission`
130
124
 
131
125
  Required project metadata fields in `../metadata.json` when relevant:
132
126
 
@@ -147,19 +141,15 @@ Required project metadata fields in `../metadata.json` when relevant:
147
141
 
148
142
  - keep exactly one active developer session at a time
149
143
  - record every developer session in `developer_sessions`
150
- - classify sessions as `develop` or `bugfix`
151
- - every session created before the first successful submission packaging is `develop`
152
- - every session created after successful submission packaging to address external evaluation follow-up is `bugfix`
153
- - create a new developer session only when:
154
- - the user explicitly requests a new session
155
- - post-submission external evaluation feedback reopens the project for more fixes
144
+ - label every developer session using `develop-N`
145
+ - create a new developer session only when the user explicitly requests a new session
156
146
 
157
147
  If the user explicitly requests a new session while one is active:
158
148
 
159
149
  1. ask the current developer exactly: `give me a summary of all the work that has been done`
160
150
  2. treat that reply as the handoff summary
161
151
  3. start the new developer session with that summary as the handoff-in context
162
- 4. keep the session class as `develop` before first successful submission, otherwise keep it as `bugfix`
152
+ 4. assign the next `develop-N` label in sequence
163
153
 
164
154
  ## Initial structure rule
165
155
 
@@ -29,7 +29,10 @@ Use this skill during `P4 Development` before prompting the developer.
29
29
  - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
30
30
  - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
31
31
  - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
32
- - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts; deterministic non-secret Dockerized dev/test default credentials are allowed only when clearly labeled local-only and required for startup or test stability
32
+ - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
33
+ - when the project has database dependencies, keep `./init_db.sh` aligned with the real schema, migrations, bootstrap data, and dependency setup as implementation evolves
34
+ - do not leave `./init_db.sh` as a scaffold placeholder once real database requirements are known
35
+ - do not hardcode database connection values or database bootstrap values anywhere in the repo; database setup must stay driven by `./init_db.sh`
33
36
  - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
34
37
  - when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
35
38
  - if a required user-facing or admin-facing surface is missing, treat that gap as incomplete implementation rather than a reason to bypass the surface with direct API calls or test-only shortcuts
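One way to read the new `./init_db.sh` rules in this hunk is sketched below. It assumes, without confirmation from the package, that connection settings reach the script through the runtime environment rather than through values committed to the repo. The variable names `DB_HOST`, `DB_PORT`, and `DB_NAME` and both function names are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch of an ./init_db.sh body. Assumption: connection settings are
# injected by the runtime (for example a container definition); the
# variable names below are illustrative, not defined by the package.

require_env() {
  # Fail loudly when the named variable is unset or empty.
  local name="$1" value
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "init_db: missing required variable $name" >&2
    return 1
  fi
}

init_db() {
  require_env DB_HOST && require_env DB_PORT && require_env DB_NAME || return 1
  echo "init_db: would apply schema and migrations to $DB_NAME at $DB_HOST:$DB_PORT"
  # Real schema, migration, and bootstrap steps for the chosen stack go here.
}
```

A real script would end by calling `init_db` and would replace the `echo` with the project's actual schema and migration commands as soon as the database requirements are known.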
@@ -1,77 +1,38 @@
1
1
  ---
2
2
  name: evaluation-triage
3
- description: Owner-side evaluation report triage rules for slopmachine.
3
+ description: Owner-side evaluation issue handoff and fix-verification rules for slopmachine.
4
4
  ---
5
5
 
6
- # Evaluation Triage
6
+ # Evaluation Issue Handoff
7
7
 
8
- Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
8
+ Use this skill during `P7 Evaluation and Fix Verification` after `../self-test-run.md` exists.
9
9
 
10
10
  ## Rules
11
11
 
12
- - evaluation findings are advisory inputs, not automatic orders
13
- - accept or reject findings explicitly
14
- - keep accepted findings concrete and bounded
15
- - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
16
- - if no remediation is needed, move directly to the final human decision
12
+ - treat `../self-test-run.md` as the authoritative issue source for ordinary post-hardening completion flow
13
+ - keep the issue set concrete and exact
14
+ - use the existing active developer session; do not start a new developer session for these fixes
15
+ - do not split the issue set into backend/frontend tracks
16
+ - do not silently drop, merge away, or wave through issues from `../self-test-run.md`
17
+ - after the developer claims the fixes are complete, use one fresh `General` fix-verification session to verify the earlier issues and generate `../self-test-fixes.md`
18
+ - do not route ordinary post-hardening evaluation issues into a separate remediation phase; keep them inside `P7`
17
19
 
18
- ## Non-negotiable evaluation buckets
20
+ ## Issue handoff standard
19
21
 
20
- These areas are hard gates and should not be passed with known meaningful failures:
22
+ - send the developer the exact issues from `../self-test-run.md` in explicit detail
23
+ - require the developer to address all listed issues, not a negotiated subset
24
+ - require the developer to report the exact verification commands that were run and the concrete results they produced
25
+ - if the developer reports that some issue is invalid or already fixed, require that claim to be justified concretely against the report rather than silently omitting it
21
26
 
22
- 1. prompt compliance
23
- 2. requirement fulfillment / delivery completeness
24
- 3. security-critical flaws
27
+ ## Fix-verification standard
25
28
 
26
- If evaluation finds a real issue in one of those buckets, the default outcome is remediation, not leniency.
29
+ - the follow-up `General` session should receive the exact earlier issue list and a direct instruction to verify whether each item is now resolved
30
+ - the follow-up `General` session should only confirm whether those exact earlier items are fixed; it should not perform a broader new review
31
+ - the follow-up report should describe what is resolved, what remains open, and any important verification caveats
32
+ - save that report as `../self-test-fixes.md`
33
+ - do not rewrite the report text after generation except for the file move and filename normalization
27
34
 
28
- Do not wave through:
35
+ ## Exit standard
29
36
 
30
- - prompt drift or meaningful requirement mismatch
31
- - missing core flows or partial delivery of prompt-critical functionality
32
- - real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
33
-
34
- ## Leniency buckets
35
-
36
- These areas may pass with minor residual issues when the product is still clearly acceptable overall:
37
-
38
- 1. testing cases / test sufficiency
39
- 2. engineering architecture / engineering quality
40
- 3. aesthetics
41
-
42
- Leniency is allowed only when the issue is:
43
-
44
- - minor in impact
45
- - not hiding a likely blocker in another bucket
46
- - not undermining overall confidence in the delivered product
47
-
48
- High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
49
-
50
- If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
51
-
52
- ## Triage rules
53
-
54
- - read both reports and merge the findings into one explicit triage set before deciding what happens next
55
- - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
56
- - any finding in the non-negotiable buckets should normally be returned for remediation if it is real
57
- - findings marked `Blocker` should normally be returned for remediation
58
- - findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
59
- - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
60
- - findings marked `Low` may be passed without remediation
61
- - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
62
- - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
63
- - minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
64
- - minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
65
- - if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
66
- - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
67
- - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
68
- - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
69
- - never edit or rewrite the evaluation report itself
70
- - if you need to add context, disagreement, or justification, append it only as a clearly labeled `User comment/message` section at the bottom of the report
71
- - do not loop forever chasing every newly surfaced medium or low issue once the project is otherwise qualified
72
-
73
- ## Output standard
74
-
75
- - keep a clear accepted-finding set
76
- - keep a clear rejected or passed set when disagreement matters
77
- - keep the remediation brief focused on accepted issues only
37
+ - do not move to `P8` until both `../self-test-run.md` and `../self-test-fixes.md` exist
38
+ - if `../self-test-fixes.md` still shows meaningful unresolved issues, stay in `P7` and keep the issue-correction loop focused on those concrete remaining items
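The first exit condition above is mechanical and could be checked with a small guard like the following sketch. `p7_reports_ready` is a hypothetical name; it assumes it runs from inside `repo/`, so both reports sit one directory up, as the `../` paths in the skill suggest.

```bash
# Hypothetical gate check: succeed only when both P7 reports exist.
# Run from inside repo/ by default; pass another parent directory to override.
p7_reports_ready() {
  local parent="${1:-..}"
  [ -f "$parent/self-test-run.md" ] && [ -f "$parent/self-test-fixes.md" ]
}
```

This covers only the existence check. Whether `../self-test-fixes.md` still shows meaningful unresolved issues requires reading the report itself.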