theslopmachine 0.4.6 → 0.4.7
- package/README.md +111 -113
- package/RELEASE.md +2 -2
- package/assets/agents/developer.md +2 -0
- package/assets/agents/slopmachine.md +22 -28
- package/assets/skills/developer-session-lifecycle/SKILL.md +9 -19
- package/assets/skills/development-guidance/SKILL.md +4 -1
- package/assets/skills/evaluation-triage/SKILL.md +24 -63
- package/assets/skills/final-evaluation-orchestration/SKILL.md +52 -50
- package/assets/skills/planning-gate/SKILL.md +4 -0
- package/assets/skills/planning-guidance/SKILL.md +4 -1
- package/assets/skills/retrospective-analysis/SKILL.md +5 -5
- package/assets/skills/scaffold-guidance/SKILL.md +6 -2
- package/assets/skills/session-rollover/SKILL.md +1 -9
- package/assets/skills/submission-packaging/SKILL.md +47 -225
- package/assets/skills/verification-gates/SKILL.md +7 -5
- package/assets/slopmachine/backend-evaluation-prompt.md +257 -206
- package/assets/slopmachine/frontend-evaluation-prompt.md +368 -282
- package/assets/slopmachine/templates/AGENTS.md +4 -2
- package/package.json +1 -1
- package/src/constants.js +1 -1
- package/assets/skills/remediation-guidance/SKILL.md +0 -31
package/README.md
CHANGED
|
@@ -1,62 +1,45 @@
|
|
|
1
1
|
# theslopmachine
|
|
2
2
|
|
|
3
|
-
`theslopmachine`
|
|
3
|
+
`theslopmachine` is an installer and bootstrap CLI for the SlopMachine OpenCode workflow. It installs the packaged owner/developer agents, required skills, workflow support files, and project bootstrap logic needed to start a new SlopMachine-managed repository.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
## Features
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
- installs packaged OpenCode agents into `~/.config/opencode/agents/`
|
|
8
|
+
- installs packaged skills into `~/.agents/skills/`
|
|
9
|
+
- installs packaged workflow support files into `~/slopmachine/`
|
|
10
|
+
- installs Claude worker runtime assets under `~/.claude/`
|
|
11
|
+
- bootstraps a new project workspace with `repo/`, `docs/`, `sessions/`, `metadata.json`, `AGENTS.md`, and initialized `br` state
|
|
12
|
+
- configures required OpenCode plugins and MCP entries without overwriting existing `context7` or `exa` configuration
|
|
8
13
|
|
|
9
|
-
|
|
10
|
-
2. run `slopmachine setup`
|
|
11
|
-
3. add MCP API keys if prompted
|
|
12
|
-
4. log into Codex with OpenCode
|
|
13
|
-
5. initialize a project workspace
|
|
14
|
-
6. enter `repo/`
|
|
15
|
-
7. start OpenCode and choose the `slopmachine` agent
|
|
14
|
+
## Installation
|
|
16
15
|
|
|
17
|
-
|
|
16
|
+
Requirements:
|
|
18
17
|
|
|
19
18
|
- Node.js 18+
|
|
20
19
|
- `git`
|
|
21
20
|
- Docker running on the machine
|
|
22
|
-
- `curl` on Unix-like systems for automatic `br`
|
|
21
|
+
- `curl` on Unix-like systems for automatic `br` installation
|
|
23
22
|
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
- `opencode`
|
|
27
|
-
- `br` (`beads_rust`)
|
|
28
|
-
|
|
29
|
-
## 1. Install The Package
|
|
30
|
-
|
|
31
|
-
From this package directory:
|
|
23
|
+
Build and install the package:
|
|
32
24
|
|
|
33
25
|
```bash
|
|
34
26
|
npm install
|
|
35
27
|
npm run check
|
|
36
28
|
npm pack
|
|
29
|
+
npm install -g ./theslopmachine-0.4.4.tgz
|
|
37
30
|
```
|
|
38
31
|
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
```bash
|
|
42
|
-
theslopmachine-0.4.5.tgz
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
Install it globally:
|
|
46
|
-
|
|
47
|
-
```bash
|
|
48
|
-
npm install -g ./theslopmachine-0.4.5.tgz
|
|
49
|
-
```
|
|
50
|
-
|
|
51
|
-
For local package development instead:
|
|
32
|
+
For local package development instead of global install:
|
|
52
33
|
|
|
53
34
|
```bash
|
|
54
35
|
npm link
|
|
55
36
|
```
|
|
56
37
|
|
|
57
|
-
|
|
38
|
+
The published package is intentionally source-only. It packages only `bin/`, `src/`, `assets/`, `README.md`, `RELEASE.md`, and `MANUAL.md`.
|
|
58
39
|
|
|
59
|
-
|
|
40
|
+
## Setup
|
|
41
|
+
|
|
42
|
+
Run machine setup:
|
|
60
43
|
|
|
61
44
|
```bash
|
|
62
45
|
slopmachine setup
|
|
@@ -64,47 +47,37 @@ slopmachine setup
|
|
|
64
47
|
|
|
65
48
|
`setup` does the following:
|
|
66
49
|
|
|
67
|
-
- installs or verifies `
|
|
68
|
-
- installs
|
|
69
|
-
- installs
|
|
70
|
-
- installs
|
|
50
|
+
- installs or verifies `opencode`
|
|
51
|
+
- installs or verifies `br` (`beads_rust`)
|
|
52
|
+
- installs or refreshes packaged agents
|
|
53
|
+
- installs or refreshes packaged skills
|
|
54
|
+
- installs or refreshes packaged workflow files into `~/slopmachine/`
|
|
55
|
+
- installs or refreshes Claude runtime assets under `~/.claude/`
|
|
71
56
|
- updates `~/.config/opencode/opencode.json`
|
|
72
|
-
- prompts for missing
|
|
73
|
-
|
|
74
|
-
If `setup` installs `opencode` for the first time, open a fresh terminal before running `opencode` commands.
|
|
57
|
+
- prompts for missing MCP API keys when needed
|
|
75
58
|
|
|
76
|
-
|
|
59
|
+
If `opencode` was newly installed, open a fresh terminal before running OpenCode commands.
|
|
77
60
|
|
|
78
|
-
|
|
61
|
+
MCP API keys:
|
|
79
62
|
|
|
80
63
|
- Context7: `https://context7.com`
|
|
81
64
|
- Exa: `https://exa.ai`
|
|
82
65
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
```bash
|
|
86
|
-
~/.config/opencode/opencode.json
|
|
87
|
-
```
|
|
88
|
-
|
|
89
|
-
If `context7` or `exa` is already configured in `opencode.json`, `setup` leaves the existing entries in place.
|
|
90
|
-
|
|
91
|
-
## 4. Log Into Codex With OpenCode
|
|
92
|
-
|
|
93
|
-
Authenticate OpenCode against Codex:
|
|
66
|
+
Codex login with OpenCode:
|
|
94
67
|
|
|
95
68
|
```bash
|
|
96
69
|
opencode auth login -p codex
|
|
97
70
|
```
|
|
98
71
|
|
|
99
|
-
Optional
|
|
72
|
+
Optional verification:
|
|
100
73
|
|
|
101
74
|
```bash
|
|
102
75
|
opencode auth list
|
|
103
76
|
```
|
|
104
77
|
|
|
105
|
-
##
|
|
78
|
+
## Startup
|
|
106
79
|
|
|
107
|
-
Create a new workspace
|
|
80
|
+
Create and initialize a new project workspace:
|
|
108
81
|
|
|
109
82
|
```bash
|
|
110
83
|
mkdir my-project
|
|
@@ -112,54 +85,106 @@ cd my-project
|
|
|
112
85
|
slopmachine init
|
|
113
86
|
```
|
|
114
87
|
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
- `repo/` for the actual codebase work
|
|
118
|
-
- parent-level workflow files such as `metadata.json` and `.ai/metadata.json`
|
|
119
|
-
- parent-level `docs/` and `sessions/`
|
|
120
|
-
- `repo/AGENTS.md`
|
|
121
|
-
- initialized `br` state
|
|
122
|
-
- an initial git commit
|
|
123
|
-
|
|
124
|
-
If you want `init` to open OpenCode automatically in `repo/`, use:
|
|
88
|
+
Or initialize and open OpenCode immediately:
|
|
125
89
|
|
|
126
90
|
```bash
|
|
91
|
+
mkdir my-project
|
|
92
|
+
cd my-project
|
|
127
93
|
slopmachine init -o
|
|
128
94
|
```
|
|
129
95
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
If you used plain `slopmachine init`, move into the working repository:
|
|
96
|
+
If you used plain `slopmachine init`, then continue with:
|
|
133
97
|
|
|
134
98
|
```bash
|
|
135
99
|
cd repo
|
|
100
|
+
opencode
|
|
136
101
|
```
|
|
137
102
|
|
|
138
|
-
|
|
103
|
+
Inside OpenCode, select the `slopmachine` agent to start the workflow.
|
|
139
104
|
|
|
140
|
-
|
|
105
|
+
Bootstrapped workspace layout:
|
|
106
|
+
|
|
107
|
+
- `repo/` for the working codebase
|
|
108
|
+
- `docs/` for workflow documentation and evidence
|
|
109
|
+
- `sessions/` for exported session artifacts
|
|
110
|
+
- `metadata.json` for project workflow metadata
|
|
111
|
+
- `repo/AGENTS.md` for the repo-local agent instructions
|
|
112
|
+
|
|
113
|
+
## Testing
|
|
114
|
+
|
|
115
|
+
Package-level checks:
|
|
141
116
|
|
|
142
117
|
```bash
|
|
143
|
-
|
|
118
|
+
npm run check
|
|
119
|
+
npm pack --dry-run
|
|
144
120
|
```
|
|
145
121
|
|
|
146
|
-
|
|
122
|
+
Generated project conventions:
|
|
123
|
+
|
|
124
|
+
- every bootstrapped project must expose one primary runtime command
|
|
125
|
+
- every bootstrapped project must expose one primary broad test command: `./run_tests.sh`
|
|
126
|
+
- for Dockerized web backend or fullstack projects, the expected broad runtime command is `docker compose up --build`
|
|
127
|
+
- for non-Docker runtime cases, the expected broad runtime command is usually `./run_app.sh`
|
|
128
|
+
|
|
129
|
+
Verification policy:
|
|
130
|
+
|
|
131
|
+
- use local fast verification during normal development
|
|
132
|
+
- treat `./run_tests.sh` as a broad gate, not an ordinary every-step verification command
|
|
133
|
+
- for Dockerized web backend and fullstack projects, scaffold acceptance should establish both `docker compose up --build` and `./run_tests.sh`
|
|
147
134
|
|
|
148
|
-
|
|
135
|
+
## Architecture
|
|
149
136
|
|
|
150
|
-
|
|
137
|
+
Operating model:
|
|
138
|
+
|
|
139
|
+
- `slopmachine` is the owner and orchestrator
|
|
151
140
|
- `developer` is the implementation worker
|
|
141
|
+
- detailed workflow behavior is primarily carried by loaded skills rather than one monolithic owner prompt
|
|
142
|
+
|
|
143
|
+
High-level lifecycle:
|
|
144
|
+
|
|
145
|
+
1. clarification
|
|
146
|
+
2. planning
|
|
147
|
+
3. scaffold
|
|
148
|
+
4. development
|
|
149
|
+
5. integrated verification
|
|
150
|
+
6. hardening
|
|
151
|
+
7. evaluation and triage
|
|
152
|
+
8. final human decision
|
|
153
|
+
9. remediation when needed
|
|
154
|
+
10. submission packaging
|
|
155
|
+
11. retrospective
|
|
156
|
+
|
|
157
|
+
Design constraints:
|
|
158
|
+
|
|
159
|
+
- keep the owner shell small and load phase-specific skills when needed
|
|
160
|
+
- prefer targeted reads and focused local verification during implementation
|
|
161
|
+
- keep environment-specific state out of the package
|
|
162
|
+
- do not package local runtime artifacts, caches, editor folders, or generated dependency environments
|
|
163
|
+
|
|
164
|
+
Database dependency rule:
|
|
165
|
+
|
|
166
|
+
- database dependencies must be provisioned by initialization scripts, migrations, container startup hooks, or equivalent runtime setup
|
|
167
|
+
- do not hardcode database-specific environment state into packaged assets
|
|
168
|
+
- do not ship database files such as `.db`, `.sqlite`, dumps, or seeded local database artifacts in the package
|
|
169
|
+
|
|
170
|
+
For this package specifically, the installer ships workflow logic and templates only. It does not ship database dependency files or packaged database state.
|
|
152
171
|
|
|
153
|
-
##
|
|
172
|
+
## Installed Configuration
|
|
154
173
|
|
|
155
|
-
|
|
174
|
+
Main locations:
|
|
156
175
|
|
|
157
|
-
|
|
176
|
+
- agents: `~/.config/opencode/agents/`
|
|
177
|
+
- skills: `~/.agents/skills/`
|
|
178
|
+
- OpenCode config: `~/.config/opencode/opencode.json`
|
|
179
|
+
- packaged workflow files: `~/slopmachine/`
|
|
180
|
+
- Claude runtime assets: `~/.claude/`
|
|
181
|
+
|
|
182
|
+
Installed agents:
|
|
158
183
|
|
|
159
184
|
- `~/.config/opencode/agents/slopmachine.md`
|
|
160
185
|
- `~/.config/opencode/agents/developer.md`
|
|
161
186
|
|
|
162
|
-
|
|
187
|
+
Installed skills:
|
|
163
188
|
|
|
164
189
|
- `~/.agents/skills/clarification-gate/`
|
|
165
190
|
- `~/.agents/skills/developer-session-lifecycle/`
|
|
@@ -181,30 +206,20 @@ These are the main files and directories `setup` configures.
|
|
|
181
206
|
- `~/.agents/skills/report-output-discipline/`
|
|
182
207
|
- `~/.agents/skills/frontend-design/`
|
|
183
208
|
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
Installed under `~/slopmachine/`:
|
|
209
|
+
Installed workflow files under `~/slopmachine/`:
|
|
187
210
|
|
|
188
211
|
- `backend-evaluation-prompt.md`
|
|
189
212
|
- `frontend-evaluation-prompt.md`
|
|
190
213
|
- `document-completeness.md`
|
|
191
|
-
- `quality-document.md`
|
|
192
214
|
- `engineering-results.md`
|
|
193
215
|
- `implementation-comparison.md`
|
|
194
|
-
- `
|
|
216
|
+
- `quality-document.md`
|
|
195
217
|
- `templates/AGENTS.md`
|
|
218
|
+
- `workflow-init.js`
|
|
196
219
|
- `utils/strip_session_parent.py`
|
|
197
220
|
- `utils/convert_ai_session.py`
|
|
198
221
|
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
Config file:
|
|
202
|
-
|
|
203
|
-
```bash
|
|
204
|
-
~/.config/opencode/opencode.json
|
|
205
|
-
```
|
|
206
|
-
|
|
207
|
-
`setup` ensures these entries exist:
|
|
222
|
+
OpenCode config entries ensured by `setup`:
|
|
208
223
|
|
|
209
224
|
- plugin: `oc-chatgpt-multi-auth`
|
|
210
225
|
- MCP server: `chrome-devtools`
|
|
@@ -212,21 +227,4 @@ Config file:
|
|
|
212
227
|
- MCP server: `exa`
|
|
213
228
|
- MCP server: `shadcn` disabled by default
|
|
214
229
|
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
## Daily Use
|
|
218
|
-
|
|
219
|
-
After the machine is set up, the common flow is:
|
|
220
|
-
|
|
221
|
-
```bash
|
|
222
|
-
cd my-project/repo
|
|
223
|
-
opencode
|
|
224
|
-
```
|
|
225
|
-
|
|
226
|
-
Or for a brand new project in one shot:
|
|
227
|
-
|
|
228
|
-
```bash
|
|
229
|
-
mkdir my-project
|
|
230
|
-
cd my-project
|
|
231
|
-
slopmachine init -o
|
|
232
|
-
```
|
|
230
|
+
These are the user-editable locations if you want to customize agents, skills, plugins, or MCP configuration after setup.
|
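The non-overwriting behavior the README describes for `context7` and `exa` can be sketched as a merge that only fills in missing entries. This is a minimal illustration, not the package's implementation: the `mcp` key layout, the entry shapes, and the `ensure_mcp_entries` helper are all assumptions introduced here.

```python
import copy


def ensure_mcp_entries(config, defaults):
    """Return a copy of config with missing MCP entries added.

    Entries that already exist are left untouched, mirroring the
    documented rule that existing context7/exa settings are kept.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcp", {})  # "mcp" key is an assumed layout
    for name, entry in defaults.items():
        # setdefault only writes when the name is absent
        servers.setdefault(name, copy.deepcopy(entry))
    return merged


# Hypothetical existing config with a user-edited exa entry
existing = {"mcp": {"exa": {"enabled": True, "note": "user-edited"}}}
defaults = {
    "context7": {"enabled": True},
    "exa": {"enabled": True},
    "shadcn": {"enabled": False},  # listed as disabled by default
}
result = ensure_mcp_entries(existing, defaults)
```

Working on a deep copy keeps the caller's original config unchanged, so a failed write later in setup would not leave a half-merged object in memory.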
package/RELEASE.md
CHANGED
|
@@ -42,13 +42,13 @@ npm pack
|
|
|
42
42
|
This should produce a tarball such as:
|
|
43
43
|
|
|
44
44
|
```bash
|
|
45
|
-
theslopmachine-0.4.
|
|
45
|
+
theslopmachine-0.4.7.tgz
|
|
46
46
|
```
|
|
47
47
|
|
|
48
48
|
## Inspect package contents
|
|
49
49
|
|
|
50
50
|
```bash
|
|
51
|
-
tar -tzf theslopmachine-0.4.
|
|
51
|
+
tar -tzf theslopmachine-0.4.7.tgz
|
|
52
52
|
```
|
|
53
53
|
|
|
54
54
|
Check that the tarball includes:
|
package/assets/agents/developer.md
CHANGED
|
@@ -85,6 +85,8 @@ Selected-stack defaults:
|
|
|
85
85
|
- do not ship placeholder, demo, setup, or debug UI in product-facing screens
|
|
86
86
|
- do not create `.env` files or similar env-file variants
|
|
87
87
|
- do not hardcode secrets or leave prototype residue behind
|
|
88
|
+
- when the project has database dependencies, keep database setup in `./init_db.sh` rather than scattered repo logic
|
|
89
|
+
- do not hardcode database connection values or database bootstrap values anywhere in the repo
|
|
88
90
|
|
|
89
91
|
## Skills
|
|
90
92
|
|
package/assets/agents/slopmachine.md
CHANGED
|
@@ -115,7 +115,7 @@ Do not create another competing workflow-state system.
|
|
|
115
115
|
Use git to preserve meaningful workflow checkpoints.
|
|
116
116
|
|
|
117
117
|
- after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
|
|
118
|
-
- meaningful work includes accepted scaffold completion, accepted major development slices, accepted
|
|
118
|
+
- meaningful work includes accepted scaffold completion, accepted major development slices, accepted evaluation-fix rounds, and other clearly reviewable milestones
|
|
119
119
|
- keep the git flow simple and checkpoint-oriented
|
|
120
120
|
- commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
|
|
121
121
|
- keep commit messages descriptive and easy to reason about later
|
|
@@ -158,21 +158,19 @@ Use these exact root phases:
|
|
|
158
158
|
- `P4 Development`
|
|
159
159
|
- `P5 Integrated Verification`
|
|
160
160
|
- `P6 Hardening`
|
|
161
|
-
- `P7 Evaluation and
|
|
161
|
+
- `P7 Evaluation and Fix Verification`
|
|
162
162
|
- `P8 Final Human Decision`
|
|
163
|
-
- `P9
|
|
164
|
-
- `P10
|
|
165
|
-
- `P11 Retrospective`
|
|
163
|
+
- `P9 Submission Packaging`
|
|
164
|
+
- `P10 Retrospective`
|
|
166
165
|
|
|
167
166
|
Phase rules:
|
|
168
167
|
|
|
169
168
|
- exactly one root phase should normally be active at a time
|
|
170
169
|
- enter the phase before real work for that phase begins
|
|
171
170
|
- do not close multiple root phases in one transition block
|
|
172
|
-
- `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
|
|
173
171
|
- `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
|
|
174
|
-
- `
|
|
175
|
-
- post-
|
|
172
|
+
- `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
|
|
173
|
+
- post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Human Decision`, `P9 Submission Packaging`, and `P10 Retrospective`
|
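The reopen rule for post-packaging feedback can be read as "re-enter a phase, then rerun everything after it in order". The sketch below assumes a strictly linear rerun and lists only the root phases visible in this hunk; `phases_after_reopen` is a hypothetical helper, not part of the package.

```python
# Root phases named in this hunk; earlier phases are omitted here.
PHASES = [
    "P4 Development",
    "P5 Integrated Verification",
    "P6 Hardening",
    "P7 Evaluation and Fix Verification",
    "P8 Final Human Decision",
    "P9 Submission Packaging",
    "P10 Retrospective",
]


def phases_after_reopen(reopened):
    """List the phases to run, in order, once `reopened` is re-entered."""
    start = PHASES.index(reopened)  # raises ValueError for unknown names
    return PHASES[start:]


# External evaluation feedback arriving after packaging reopens P7
rerun = phases_after_reopen("P7 Evaluation and Fix Verification")
```

Only one phase in the returned list would be active at a time; the list describes order, not concurrency.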
|
176
174
|
|
|
177
175
|
## Developer Session Model
|
|
178
176
|
|
|
@@ -181,21 +179,21 @@ Maintain exactly one active developer session at a time.
|
|
|
181
179
|
Track every developer session in metadata, but create a new one only in these cases:
|
|
182
180
|
|
|
183
181
|
1. you explicitly request a new session
|
|
184
|
-
2. after successful submission, you return with external evaluation issues that require more fixes
|
|
185
182
|
|
|
186
|
-
|
|
183
|
+
All tracked developer sessions use the `develop-N` naming line.
|
|
187
184
|
|
|
188
|
-
|
|
189
|
-
2. `bugfix`: every developer session created after successful submission packaging when the project is reopened for external-evaluation follow-up
|
|
185
|
+
There may be multiple `develop` sessions over the life of one project.
|
|
190
186
|
|
|
191
|
-
|
|
187
|
+
During the first full run from planning through initial packaging, keep all work in the `develop-N` sequence, including integrated verification, hardening, evaluation issue fixing inside `P7`, and packaging follow-through.
|
|
192
188
|
|
|
193
|
-
|
|
189
|
+
If the project is reopened after packaging because of later reported issues, continue with the existing developer session unless you explicitly request a new one.
|
|
190
|
+
|
|
191
|
+
Fresh `General` sessions used for evaluation and fix verification do not change the single-active-developer-session rule.
|
|
194
192
|
|
|
195
193
|
If you explicitly request a new session while one is active, ask the current developer exactly `give me a summary of all the work that has been done`, then use that handoff to seed the next session.
|
|
196
194
|
|
|
197
195
|
Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
|
|
198
|
-
Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request
|
|
196
|
+
Use `session-rollover` only when intentionally starting a new developer session because of an explicit user request.
|
|
199
197
|
|
|
200
198
|
Do not launch the developer during `P0` or `P1`.
|
|
201
199
|
|
|
@@ -290,9 +288,8 @@ Core map:
|
|
|
290
288
|
- `P5` -> `integrated-verification`
|
|
291
289
|
- `P6` -> `hardening-gate`
|
|
292
290
|
- `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
|
|
293
|
-
- `P9` -> `
|
|
294
|
-
- `P10` -> `
|
|
295
|
-
- `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
|
|
291
|
+
- `P9` -> `submission-packaging`, `report-output-discipline`
|
|
292
|
+
- `P10` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
|
|
296
293
|
- state mutations -> `beads-operations`
|
|
297
294
|
- evidence-heavy review -> `owner-evidence-discipline`
|
|
298
295
|
- intentional new developer session -> `session-rollover`
|
|
@@ -307,7 +304,7 @@ When talking to the developer:
|
|
|
307
304
|
- lead with the engineering point, not process framing
|
|
308
305
|
- keep prompts natural, sharp, and compact unless the moment really needs more context
|
|
309
306
|
- translate workflow intent into normal software-project language
|
|
310
|
-
- for each development slice or
|
|
307
|
+
- for each development slice or follow-up fix request, require the reply to state the exact verification commands that were run and the concrete results they produced
|
|
311
308
|
|
|
312
309
|
Do not leak workflow internals such as:
|
|
313
310
|
|
|
@@ -364,14 +361,11 @@ After each substantive developer reply, do one of four things:
|
|
|
364
361
|
|
|
365
362
|
Treat packaging as a first-class delivery contract from the start, not as late cleanup.
|
|
366
363
|
|
|
367
|
-
- the
|
|
368
|
-
-
|
|
369
|
-
-
|
|
370
|
-
- exact packaging file outputs and final paragraph outputs are mandatory in `P10`
|
|
371
|
-
- accepted evaluation reports and cleaned original session exports are mandatory submission artifacts in `P10`
|
|
372
|
-
- do not leave packaging structure, screenshots, self-test outputs, or exports to be improvised at the end
|
|
364
|
+
- the evaluation prompt files under `~/slopmachine/` are used only during evaluation runs
|
|
365
|
+
- `../self-test-run.md`, `../self-test-fixes.md`, `../sessions/`, `../metadata.json`, `../docs/`, and the delivered `repo/` are the mandatory late-stage artifacts
|
|
366
|
+
- do not invent `submission/`, packaging-only report files, screenshots, or other extra artifact structures during ordinary packaging
|
|
373
367
|
|
|
374
|
-
When `
|
|
368
|
+
When `P9 Submission Packaging` begins:
|
|
375
369
|
|
|
376
370
|
- load `submission-packaging` before any packaging action
|
|
377
371
|
- follow its exact artifact, export, cleanup, and output contract
|
|
@@ -379,9 +373,9 @@ When `P10 Submission Packaging` begins:
|
|
|
379
373
|
|
|
380
374
|
## Retrospective
|
|
381
375
|
|
|
382
|
-
After `
|
|
376
|
+
After `P9 Submission Packaging` closes successfully:
|
|
383
377
|
|
|
384
|
-
- automatically enter `
|
|
378
|
+
- automatically enter `P10 Retrospective`
|
|
385
379
|
- load `retrospective-analysis`
|
|
386
380
|
- write `run_id`-scoped retrospective output under `~/slopmachine/retrospectives/`
|
|
387
381
|
- keep it owner-only and non-blocking by default
|
package/assets/skills/developer-session-lifecycle/SKILL.md
CHANGED
|
@@ -101,24 +101,19 @@ Track at least:
|
|
|
101
101
|
- `current_phase`
|
|
102
102
|
- `awaiting_human`
|
|
103
103
|
- `clarification_approved`
|
|
104
|
-
- `remediation_round`
|
|
105
104
|
- `clarification_validator_session_id`
|
|
106
|
-
- `
|
|
107
|
-
- `
|
|
108
|
-
- `
|
|
109
|
-
- `
|
|
110
|
-
- `
|
|
111
|
-
- `frontend_evaluation_report_path`
|
|
112
|
-
- `passed_evaluation_tracks`
|
|
105
|
+
- `evaluation_prompt_kind`
|
|
106
|
+
- `evaluation_session_id`
|
|
107
|
+
- `self_test_run_path`
|
|
108
|
+
- `fix_verification_session_id`
|
|
109
|
+
- `self_test_fixes_path`
|
|
113
110
|
- `developer_sessions`
|
|
114
111
|
- `active_developer_session_id`
|
|
115
112
|
- `next_develop_session_number`
|
|
116
|
-
- `
|
|
117
|
-
- `submission_completed`
|
|
113
|
+
- `packaging_completed`
|
|
118
114
|
|
|
119
115
|
Each developer session record should include enough to recover and export it later, such as:
|
|
120
116
|
|
|
121
|
-
- `session_class`
|
|
122
117
|
- `sequence`
|
|
123
118
|
- `label`
|
|
124
119
|
- `created_phase`
|
|
@@ -126,7 +121,6 @@ Each developer session record should include enough to recover and export it lat
|
|
|
126
121
|
- `status`
|
|
127
122
|
- `handoff_in`
|
|
128
123
|
- `handoff_out`
|
|
129
|
-
- `reopened_after_submission`
|
|
130
124
|
|
|
131
125
|
Required project metadata fields in `../metadata.json` when relevant:
|
|
132
126
|
|
|
@@ -147,19 +141,15 @@ Required project metadata fields in `../metadata.json` when relevant:
|
|
|
147
141
|
|
|
148
142
|
- keep exactly one active developer session at a time
|
|
149
143
|
- record every developer session in `developer_sessions`
|
|
150
|
-
-
|
|
151
|
-
-
|
|
152
|
-
- every session created after successful submission packaging to address external evaluation follow-up is `bugfix`
|
|
153
|
-
- create a new developer session only when:
|
|
154
|
-
- the user explicitly requests a new session
|
|
155
|
-
- post-submission external evaluation feedback reopens the project for more fixes
|
|
144
|
+
- label every developer session using `develop-N`
|
|
145
|
+
- create a new developer session only when the user explicitly requests a new session
|
|
156
146
|
|
|
157
147
|
If the user explicitly requests a new session while one is active:
|
|
158
148
|
|
|
159
149
|
1. ask the current developer exactly: `give me a summary of all the work that has been done`
|
|
160
150
|
2. treat that reply as the handoff summary
|
|
161
151
|
3. start the new developer session with that summary as the handoff-in context
|
|
162
|
-
4.
|
|
152
|
+
4. assign the next `develop-N` label in sequence
|
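The four rollover steps above can be sketched against the tracked metadata fields. The field names come from the skill text; the `"active"` and `"closed"` status values, the reuse of the label as the session id, and the `start_new_developer_session` helper are assumptions made for illustration.

```python
def start_new_developer_session(metadata, handoff_summary, current_phase):
    """Close the active session record and open the next develop-N session."""
    n = metadata["next_develop_session_number"]
    label = f"develop-{n}"
    for record in metadata["developer_sessions"]:
        if record["status"] == "active":  # assumed status vocabulary
            record["status"] = "closed"
            record["handoff_out"] = handoff_summary
    metadata["developer_sessions"].append({
        "sequence": n,
        "label": label,
        "created_phase": current_phase,
        "status": "active",
        "handoff_in": handoff_summary,
        "handoff_out": None,
    })
    metadata["active_developer_session_id"] = label  # assumed: label doubles as id
    metadata["next_develop_session_number"] = n + 1
    return label


# Hypothetical metadata with one active session
metadata = {
    "developer_sessions": [{
        "sequence": 1, "label": "develop-1", "created_phase": "P2",
        "status": "active", "handoff_in": None, "handoff_out": None,
    }],
    "active_developer_session_id": "develop-1",
    "next_develop_session_number": 2,
}
new_label = start_new_developer_session(metadata, "summary of all work so far", "P4")
active = [r for r in metadata["developer_sessions"] if r["status"] == "active"]
```

Closing the previous record before appending the new one keeps the single-active-session rule true at every step.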
|
163
153
|
|
|
164
154
|
## Initial structure rule
|
|
165
155
|
|
package/assets/skills/development-guidance/SKILL.md
CHANGED
|
@@ -29,7 +29,10 @@ Use this skill during `P4 Development` before prompting the developer.
|
|
|
29
29
|
- verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
|
|
30
30
|
- verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
|
|
31
31
|
- verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
|
|
32
|
-
- perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
|
|
32
|
+
- perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts
|
|
33
|
+
- when the project has database dependencies, keep `./init_db.sh` aligned with the real schema, migrations, bootstrap data, and dependency setup as implementation evolves
|
|
34
|
+
- do not leave `./init_db.sh` as a scaffold placeholder once real database requirements are known
|
|
35
|
+
- do not hardcode database connection values or database bootstrap values anywhere in the repo; database setup must stay driven by `./init_db.sh`
|
|
33
36
|
- do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
|
|
34
37
|
- when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
|
|
35
38
|
- if a required user-facing or admin-facing surface is missing, treat that gap as incomplete implementation rather than a reason to bypass the surface with direct API calls or test-only shortcuts
|
package/assets/skills/evaluation-triage/SKILL.md
CHANGED
|
@@ -1,77 +1,38 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: evaluation-triage
|
|
3
|
-
description: Owner-side evaluation
|
|
3
|
+
description: Owner-side evaluation issue handoff and fix-verification rules for slopmachine.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Evaluation
|
|
6
|
+
# Evaluation Issue Handoff
|
|
7
7
|
|
|
8
|
-
Use this skill during `P7 Evaluation and
|
|
8
|
+
Use this skill during `P7 Evaluation and Fix Verification` after `../self-test-run.md` exists.
|
|
9
9
|
|
|
10
10
|
## Rules
|
|
11
11
|
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
- do not
|
|
16
|
-
-
|
|
12
|
+
- treat `../self-test-run.md` as the authoritative issue source for ordinary post-hardening completion flow
|
|
13
|
+
- keep the issue set concrete and exact
|
|
14
|
+
- use the existing active developer session; do not start a new developer session for these fixes
|
|
15
|
+
- do not split the issue set into backend/frontend tracks
|
|
16
|
+
- do not silently drop, merge away, or wave through issues from `../self-test-run.md`
|
|
17
|
+
- after the developer claims the fixes are complete, use one fresh `General` fix-verification session to verify the earlier issues and generate `../self-test-fixes.md`
|
|
18
|
+
- do not route ordinary post-hardening evaluation issues into a separate remediation phase; keep them inside `P7`
|
|
17
19
|
|
|
18
|
-
##
|
|
20
|
+
## Issue handoff standard
|
|
19
21
|
|
|
20
|
-
|
|
22
|
+
- send the developer the exact issues from `../self-test-run.md` in explicit detail
|
|
23
|
+
- require the developer to address all listed issues, not a negotiated subset
|
|
24
|
+
- require the developer to report the exact verification commands that were run and the concrete results they produced
|
|
25
|
+
- if the developer reports that some issue is invalid or already fixed, require that claim to be justified concretely against the report rather than silently omitting it
|
|
21
26
|
|
|
22
|
-
|
|
23
|
-
2. requirement fulfillment / delivery completeness
|
|
24
|
-
3. security-critical flaws
|
|
27
|
+
## Fix-verification standard
|
|
25
28
|
|
|
26
|
-
|
|
29
|
+
- the follow-up `General` session should receive the exact earlier issue list and a direct instruction to verify whether each item is now resolved
|
|
30
|
+
- the follow-up `General` session should only confirm whether those exact earlier items are fixed; it should not perform a broader new review
|
|
31
|
+
- the follow-up report should describe what is resolved, what remains open, and any important verification caveats
|
|
32
|
+
- save that report as `../self-test-fixes.md`
|
|
33
|
+
- do not rewrite the report text after generation except for the file move and filename normalization
|
|
27
34
|
|
|
28
|
-
|
|
35
|
+
## Exit standard
|
|
29
36
|
|
|
30
|
-
-
|
|
31
|
-
-
|
|
32
|
-
- real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
|
|
33
|
-
|
|
34
|
-
## Leniency buckets
|
|
35
|
-
|
|
36
|
-
These areas may pass with minor residual issues when the product is still clearly acceptable overall:
|
|
37
|
-
|
|
38
|
-
1. testing cases / test sufficiency
|
|
39
|
-
2. engineering architecture / engineering quality
|
|
40
|
-
3. aesthetics
|
|
41
|
-
|
|
42
|
-
Leniency is allowed only when the issue is:
|
|
43
|
-
|
|
44
|
-
- minor in impact
|
|
45
|
-
- not hiding a likely blocker in another bucket
|
|
46
|
-
- not undermining overall confidence in the delivered product
|
|
47
|
-
|
|
48
|
-
High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
|
|
49
|
-
|
|
50
|
-
If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
|
|
51
|
-
|
|
52
|
-
## Triage rules
|
|
53
|
-
|
|
54
|
-
- read both reports and merge the findings into one explicit triage set before deciding what happens next
|
|
55
|
-
- use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
|
|
56
|
-
- any finding in the non-negotiable buckets should normally be returned for remediation if it is real
|
|
57
|
-
- findings marked `Blocker` should normally be returned for remediation
|
|
58
|
-
- findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
|
|
59
|
-
- findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
|
|
60
|
-
- findings marked `Low` may be passed without remediation
|
|
61
|
-
- do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
|
|
62
|
-
- if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
|
|
63
|
-
- minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
|
|
64
|
-
- minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
|
|
65
|
-
- if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
|
|
66
|
-
- if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
|
|
67
|
-
- if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
|
|
68
|
-
- challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
|
|
69
|
-
- never edit or rewrite the evaluation report itself
|
|
70
|
-
- if you need to add context, disagreement, or justification, append it only as a clearly labeled `User comment/message` section at the bottom of the report
|
|
71
|
-
- do not loop forever chasing every newly surfaced medium or low issue once the project is otherwise qualified
|
|
72
|
-
|
|
73
|
-
## Output standard
|
|
74
|
-
|
|
75
|
-
- keep a clear accepted-finding set
|
|
76
|
-
- keep a clear rejected or passed set when disagreement matters
|
|
77
|
-
- keep the remediation brief focused on accepted issues only
|
|
37
|
+
- do not move to `P8` until both `../self-test-run.md` and `../self-test-fixes.md` exist
|
|
38
|
+
- if `../self-test-fixes.md` still shows meaningful unresolved issues, stay in `P7` and keep the issue-correction loop focused on those concrete remaining items
|
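The exit standard amounts to a two-part gate: both report files must exist, and the fix-verification report must not leave meaningful issues open. A minimal sketch follows, assuming the workspace root is the parent of `repo/` and that the number of unresolved issues is supplied by the caller; how that number is derived from `self-test-fixes.md` is not specified in the skill, and `may_enter_p8` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

REQUIRED_REPORTS = ("self-test-run.md", "self-test-fixes.md")


def may_enter_p8(workspace_root, unresolved_count):
    """Return True only when both reports exist and nothing remains open."""
    root = Path(workspace_root)
    reports_exist = all((root / name).is_file() for name in REQUIRED_REPORTS)
    return reports_exist and unresolved_count == 0


# Walk through the gate in a throwaway workspace
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "self-test-run.md").write_text("issue list\n")
    before = may_enter_p8(root, 0)   # fixes report is still missing
    (root / "self-test-fixes.md").write_text("verification result\n")
    after = may_enter_p8(root, 0)    # both reports present, nothing open
    blocked = may_enter_p8(root, 2)  # open issues keep the workflow in P7
```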