theslopmachine 0.4.1 → 0.4.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +72 -8
- package/RELEASE.md +2 -2
- package/assets/agents/slopmachine.md +44 -8
- package/assets/skills/clarification-gate/SKILL.md +53 -4
- package/assets/skills/developer-session-lifecycle/SKILL.md +11 -6
- package/assets/skills/evaluation-triage/SKILL.md +40 -1
- package/assets/skills/final-evaluation-orchestration/SKILL.md +15 -1
- package/assets/skills/planning-guidance/SKILL.md +10 -4
- package/assets/skills/retrospective-analysis/SKILL.md +91 -0
- package/assets/skills/scaffold-guidance/SKILL.md +16 -3
- package/assets/skills/session-rollover/SKILL.md +1 -2
- package/assets/skills/submission-packaging/SKILL.md +39 -22
- package/assets/skills/verification-gates/SKILL.md +22 -4
- package/assets/slopmachine/document-completeness.md +46 -32
- package/assets/slopmachine/engineering-results.md +43 -39
- package/assets/slopmachine/implementation-comparison.md +40 -33
- package/assets/slopmachine/quality-document.md +45 -86
- package/assets/slopmachine/templates/AGENTS.md +16 -5
- package/package.json +23 -23
- package/src/constants.js +61 -57
- package/src/init.js +23 -9
- package/src/install.js +13 -2
package/README.md
CHANGED

````diff
@@ -37,7 +37,7 @@ The current engine is the lighter workflow line:
 - smaller always-loaded owner shell
 - smaller developer rulebook
 - richer phase-specific skills loaded when needed
-- bounded 2
+- bounded 2-session developer-session model
 - `beads_rust` bootstrap path

 ## Requirements
@@ -66,13 +66,13 @@ npm pack
 This produces a tarball such as:

 ```bash
-theslopmachine-0.4.
+theslopmachine-0.4.3.tgz
 ```

 You can then install it globally with:

 ```bash
-npm install -g ./theslopmachine-0.4.
+npm install -g ./theslopmachine-0.4.3.tgz
 ```

 For local development instead of global install:
@@ -143,6 +143,7 @@ The expected high-level lifecycle is:
 8. final human decision
 9. remediation when needed
 10. submission packaging
+11. retrospective

 ## How It Is Intended To Operate

@@ -154,7 +155,7 @@ That means:
 - planning, scaffold, development, verification, hardening, remediation, and packaging load detailed skills only when needed
 - early and late phases do not carry each other's full instruction payloads all the time

-The
+The current workflow also expects:

 - targeted reads over broad rereads
 - local and narrow verification during ordinary iteration
@@ -164,16 +165,79 @@ The v2 workflow also expects:

 Every bootstrapped project should expose:

-- one primary documented
-- one primary documented
+- one primary documented runtime command
+- one primary documented broad test command: `./run_tests.sh`

 Follow the original prompt and the existing repository first. Use the examples below only when they do not already specify the platform or stack.

 Examples:

 - web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
--
-
+- mobile or desktop when Docker runtime is not the direct run path: `./run_app.sh` and `./run_tests.sh`
+
+## What It Does Well
+
+- keeps the owner shell strict without carrying a giant monolith prompt
+- loads detailed phase and activity skills only when they are actually needed
+- uses a bounded 2-session model to reduce long-run context drag
+- pushes prompt-fit, security, testing, and engineering-quality concerns earlier into planning and hardening
+- standardizes runtime and broad-test expectations with `docker compose up --build` or `./run_app.sh` plus `./run_tests.sh`
+- preserves strong packaging/report discipline with canonical files in `~/slopmachine/`
+
+## Installed Assets
+
+The package installs:
+
+- owner and developer agents
+- phase and activity skills
+- canonical evaluation and report templates in `~/slopmachine/`
+- workflow bootstrap helper
+- repo rulebook template
+- session export utilities
+
+Canonical files in `~/slopmachine/`:
+
+- `backend-evaluation-prompt.md`
+- `frontend-evaluation-prompt.md`
+- `document-completeness.md`
+- `engineering-results.md`
+- `implementation-comparison.md`
+- `quality-document.md`
+- `retrospectives/`
+
+## Dependencies And Assumptions
+
+- Node.js 18+ is required for the package CLI itself
+- OpenCode must already be available on the machine
+- git must be available
+- `beads_rust` / `br` is installed or verified by `slopmachine setup`
+
+Generated projects follow the original prompt and the existing repository first.
+
+Default runtime/test wrapper expectations:
+
+- Dockerized web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
+- non-web or non-Docker runtime cases: `./run_app.sh` and `./run_tests.sh`
+
+`./run_tests.sh` is always the broad test wrapper.
+
+## Command Summary
+
+Package CLI:
+
+- `slopmachine setup`
+- `slopmachine init`
+- `slopmachine init -o`
+
+Package validation:
+
+- `npm run check`
+- `npm pack`
+
+Generated project conventions:
+
+- `docker compose up --build` or `./run_app.sh`
+- `./run_tests.sh`

 ## Files And Locations

````
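The 0.4.3 README now lists its machine-level dependencies explicitly, and a preflight check for them could look like the sketch below. It assumes the OpenCode CLI is installed as `opencode` and that `node --version` prints a `vMAJOR.MINOR.PATCH` string. `node_major_ok`, `have_cmd`, and `preflight` are hypothetical helpers, not commands shipped by `slopmachine`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for the dependencies listed in the 0.4.3 README.
# Not part of slopmachine; `slopmachine setup` remains the documented path for `br`.

# Succeeds when a "vMAJOR.MINOR.PATCH" version string has MAJOR >= 18.
node_major_ok() {
  local version="${1#v}"        # strip a leading "v"
  local major="${version%%.*}"  # keep the text before the first "."
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= 18 ))
}

# Succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one status line per dependency; returns 1 if anything is missing.
preflight() {
  local missing=0 cmd
  if have_cmd node && node_major_ok "$(node --version)"; then
    echo "ok: node"
  else
    echo "missing: Node.js 18+"
    missing=1
  fi
  # Assumption: the OpenCode CLI binary is named `opencode`.
  for cmd in opencode git br; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

Source the file and call `preflight`. It returns non-zero if any dependency is absent, which makes it usable as a guard in front of `slopmachine init`.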
package/RELEASE.md
CHANGED

````diff
@@ -41,13 +41,13 @@ npm pack
 This should produce a tarball such as:

 ```bash
-theslopmachine-0.4.
+theslopmachine-0.4.3.tgz
 ```

 ## Inspect package contents

 ```bash
-tar -tzf theslopmachine-0.4.
+tar -tzf theslopmachine-0.4.3.tgz
 ```

 Check that the tarball includes:
````
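RELEASE.md asks the maintainer to inspect `theslopmachine-0.4.3.tgz` with `tar -tzf` and confirm that it contains the expected files. That check can be scripted. The helper name `tarball_has` is hypothetical, and the example paths assume npm's usual `package/` prefix inside the tarball. They are not RELEASE.md's own checklist, which continues past this excerpt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that a packed tarball lists every expected path.
# Returns 0 when all paths are present, 1 when any is missing, 2 if tar fails.
tarball_has() {
  local tarball="$1"
  shift
  local listing missing=0 path
  listing="$(tar -tzf "$tarball")" || return 2
  for path in "$@"; do
    # -x: whole-line match, -F: treat the path as a fixed string
    if ! grep -qxF "$path" <<<"$listing"; then
      echo "missing from $tarball: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `tarball_has theslopmachine-0.4.3.tgz package/package.json package/assets/skills/retrospective-analysis/SKILL.md` prints each missing path on stderr and fails if any is absent.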
package/assets/agents/slopmachine.md
CHANGED

````diff
@@ -67,7 +67,7 @@ Agent-integrity rule:

 ## Optimization Goal

-The main
+The main target is:

 - less token waste
 - less elapsed time
@@ -109,6 +109,18 @@ State split:

 Do not create another competing workflow-state system.

+## Git Traceability
+
+Use git to preserve meaningful workflow checkpoints.
+
+- after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
+- meaningful work includes accepted scaffold completion, accepted major development slices, accepted remediation passes, and other clearly reviewable milestones
+- keep the git flow simple and checkpoint-oriented
+- commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
+- keep commit messages descriptive and easy to reason about later
+- do not push unless explicitly requested
+- do not commit secrets, local-only junk, or accidental noise
+
 ## Mandatory Operating Order

 Operate in this order:
@@ -149,6 +161,7 @@ Use these exact root phases:
 - `P8 Final Human Decision`
 - `P9 Remediation`
 - `P10 Submission Packaging`
+- `P11 Retrospective`

 Phase rules:

@@ -157,21 +170,21 @@ Phase rules:
 - do not close multiple root phases in one transition block
 - `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
 - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
+- `P11 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect

 ## Developer Session Model

-Use up to
+Use up to two bounded developer sessions:

-1.
-2.
-3. remediation session: evaluation-response remediation, only if needed
+1. develop session: planning, scaffold, development
+2. bugfix session: integrated verification, hardening, and remediation, only if needed

 Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
 Use `session-rollover` only for planned transitions between those bounded developer sessions.

 Do not launch the developer during `P0` or `P1`.

-When the first
+When the first develop developer session begins in `P2`, start it in this exact order:

 1. send `lets plan this <original-prompt>`
 2. wait for the developer's first reply
@@ -199,8 +212,13 @@ Selected-stack rule:

 Every project must end up with:

-- one primary documented
-- one primary documented full-test command
+- one primary documented runtime command
+- one primary documented full-test command: `./run_tests.sh`
+
+Runtime command rule:
+
+- for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
+- when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper

 Default moments:

@@ -208,6 +226,12 @@ Default moments:
 2. development complete -> integrated verification entry
 3. final qualified state before packaging

+For Dockerized web backend/fullstack projects, enforce this cadence:
+
+- after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
+- after that, do not run Docker again during ordinary development work
+- the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
+
 Between those moments, rely on:

 - local runtime checks
@@ -245,6 +269,7 @@ Core map:
 - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
 - `P9` -> `remediation-guidance`
 - `P10` -> `submission-packaging`, `report-output-discipline`
+- `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
 - state mutations -> `beads-operations`
 - evidence-heavy review -> `owner-evidence-discipline`
 - planned developer-session switch -> `session-rollover`
@@ -327,6 +352,16 @@ When `P10 Submission Packaging` begins:
 - follow its exact artifact, export, cleanup, and output contract
 - do not close packaging until every required final artifact path has been verified

+## Retrospective
+
+After `P10 Submission Packaging` closes successfully:
+
+- automatically enter `P11 Retrospective`
+- load `retrospective-analysis`
+- write dated retrospective output under `~/slopmachine/retrospectives/`
+- keep it owner-only and non-blocking by default
+- reopen packaging only if the retrospective finds a real packaged-result defect
+
 ## Completion Standard

 The workflow is not done until:
@@ -335,6 +370,7 @@ The workflow is not done until:
 - the current root phase closed cleanly
 - the workflow ledger closed cleanly
 - the final package is assembled and verified in its final structure
+- the retrospective phase has either documented improvements or reopened and resolved any real packaging defect it found

 Success means:

````
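The new Git Traceability rules reduce to a small checkpoint routine: stage, commit with a descriptive message, and never push. A minimal sketch follows. `checkpoint` is a hypothetical helper, not part of the package, and it assumes `.gitignore` already excludes secrets and local-only files, because `git add .` stages everything else.

```bash
#!/usr/bin/env bash
# Hypothetical checkpoint helper following the "Git Traceability" rules:
# stage the accepted work, commit with a descriptive message, never push.
checkpoint() {
  local message="$1"
  if [[ -z "$message" ]]; then
    echo 'usage: checkpoint "<descriptive message>"' >&2
    return 2
  fi
  # Relies on .gitignore to keep secrets and local-only junk out of the index.
  git add .
  # Skip empty commits so the history only records real checkpoints.
  if git diff --cached --quiet; then
    echo "nothing to commit"
    return 0
  fi
  git commit -q -m "$message"
  # Deliberately no `git push`: pushing happens only when explicitly requested.
}
```

A typical call after an accepted scaffold would be `checkpoint "scaffold accepted: baseline runtime and ./run_tests.sh pass"`.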
package/assets/skills/clarification-gate/SKILL.md
CHANGED

````diff
@@ -45,6 +45,8 @@ Use this skill only during `P1 Clarification`.
 - never use defaults that drift from the original prompt
 - do not use quick, loose, or simplifying assumptions that shrink what the prompt asked for
 - do not guess through material ambiguity
+- do not expand the clarification artifact just to exhaust every minor edge case when the scope is already clear enough to plan correctly
+- once the core scope is understood, prefer a compact clarification record plus explicit safe defaults over a giant exhaustive rewrite

 ## Required outputs

@@ -52,16 +54,63 @@ Use this skill only during `P1 Clarification`.
 - developer-facing clarification prompt in `../.ai/clarification-prompt.md`
 - explicit list of safe defaults and resolved ambiguities

+## `questions.md` contract
+
+`../docs/questions.md` is not a general project summary.
+
+It exists only for prompt items that needed interpretation because they were unclear, incomplete, or materially ambiguous.
+
+Each entry should answer this structure:
+
+1. what was unclear from the original prompt
+2. how you interpreted it
+3. what decision or solution you chose for it
+4. why that choice is prompt-faithful and reasonable
+
+Keep the file narrow and explicit.
+
+Do not use `questions.md` for:
+
+- a full restatement of the entire prompt
+- broad planning notes
+- general project requirements that were already clear
+- implementation details that belong in planning or design docs
+
+Preferred entry shape:
+
+```md
+## Item N: <short ambiguity title>
+
+### What was unclear
+<the exact ambiguity or missing detail>
+
+### Interpretation
+<how it was interpreted>
+
+### Decision
+<the chosen resolution or safe default>
+
+### Why this is reasonable
+<brief justification tied to prompt faithfulness>
+```
+
+If nothing material was unclear, keep `questions.md` minimal rather than inventing content.
+
 ## Clarification-prompt validation loop

-- compare the original prompt and the prepared clarification prompt using
--
--
+- compare the original prompt and the prepared clarification prompt using one dedicated `General` validation session, never the developer session
+- do not create a new validation session for every retry unless the session became unusable or a fundamental misunderstanding requires a clean restart
+- on the first validation pass, build one self-contained validation prompt block for that `General` session
+- on that first pass, include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md`
 - do not use placeholders such as `same as previous`, `from context`, `see above`, or `latest artifact`
 - ask that `General` session whether the clarification prompt deviates from, weakens, narrows, or violates the original prompt in any way
 - require it to judge whether the clarification prompt is a genuine improvement in execution quality while remaining faithful to the original intent
-- if
+- if the validator suggests real fixes, patch the existing questions record and clarification prompt directly; do not restart the clarification phase from scratch unless the validator found a fundamental scope misunderstanding
+- treat validator output as a correction list, not as a reason to regenerate giant clarification blocks repeatedly
+- when rerunning validation in the same validator session, send only the improved clarification payload and the concrete fixes you made; do not resend the original prompt block if the session already has that context
+- rerun validation only after applying the concrete fixes that matter
 - keep the validation loop bounded and intentional; prefer one strong pass plus a small number of revision cycles over repeated loose churn
+- once prompt-faithfulness is satisfied and the remaining notes are minor or cosmetic, stop iterating and proceed
 - only treat the clarification prompt as approved for developer use after this validation loop passes and your own review agrees
 - requesting human approval before this validation loop passes is illegal

````
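The preferred `questions.md` entry shape is regular enough to generate mechanically. The sketch below uses a hypothetical helper, `append_question_entry`, which is not part of the package. It appends one entry with the four required sections and inserts a blank line when the file already has content. The text of each section still has to come from the owner's actual clarification work.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one entry in the clarification-gate
# "Preferred entry shape" to a questions file.
# Args: file, item number, title, unclear, interpretation, decision, why
append_question_entry() {
  local file="$1" n="$2" title="$3" unclear="$4" interpretation="$5" decision="$6" why="$7"
  {
    # Separate from a previous entry with a blank line when the file is non-empty.
    if [[ -s "$file" ]]; then echo; fi
    printf '## Item %s: %s\n\n' "$n" "$title"
    printf '### What was unclear\n%s\n\n' "$unclear"
    printf '### Interpretation\n%s\n\n' "$interpretation"
    printf '### Decision\n%s\n\n' "$decision"
    printf '### Why this is reasonable\n%s\n' "$why"
  } >>"$file"
}
```

Usage looks like `append_question_entry ../docs/questions.md 1 "<short ambiguity title>" "<what was unclear>" "<interpretation>" "<decision>" "<why>"`. Per the skill, it should only be called for prompt items that genuinely needed interpretation.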
package/assets/skills/developer-session-lifecycle/SKILL.md
CHANGED

````diff
@@ -59,7 +59,7 @@ Optional startup inputs may include:
 6. wait only for the initial clarification approval before development starts
 7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
 8. initialize the bounded developer-session slots
-9. start the
+9. start the develop developer session only after `P2` is ready to begin
 10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
 11. wait for the developer's first exchange
 12. send the approved clarification prompt as the second owner message in that same session
@@ -69,7 +69,7 @@ Optional startup inputs may include:

 The first bounded developer session must begin in this exact order:

-1. owner starts the
+1. owner starts the develop developer session
 2. owner sends: `lets plan this <original-prompt>`
 3. developer responds
 4. owner sends the approved clarification prompt
@@ -102,6 +102,12 @@ Track at least:
 - `awaiting_human`
 - `clarification_approved`
 - `remediation_round`
+- `clarification_validator_session_id`
+- `evaluation_pass`
+- `backend_evaluation_session_id`
+- `frontend_evaluation_session_id`
+- `last_evaluation_session_id`
+- `passed_evaluation_tracks`
 - `developer_sessions`
 - `active_developer_session_index`

@@ -132,11 +138,10 @@ Required project metadata fields in `../metadata.json` when relevant:

 ## Bounded session model

-Track up to
+Track up to two planned developer sessions:

-1.
-2.
-3. remediation
+1. develop
+2. bugfix

 Later session slots may remain unused if the workflow never needs them.

````
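With the remediation slot gone, every developer-facing activity maps onto one of the two bounded sessions. A minimal sketch of that mapping follows. The lowercase activity labels are chosen here for illustration, and `session_slot_for` is a hypothetical helper. Clarification deliberately has no slot, matching the rule that the developer is not launched during `P0` or `P1`.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from workflow activity to the 0.4.3 bounded
# developer-session slots: "develop" (slot 1) and "bugfix" (slot 2).
session_slot_for() {
  case "$1" in
    planning|scaffold|development)
      echo "develop"
      ;;
    integrated-verification|hardening|remediation)
      echo "bugfix"
      ;;
    *)
      # Anything else (e.g. clarification) runs without a developer session.
      echo "no developer session for: $1" >&2
      return 1
      ;;
  esac
}
```

For instance, `session_slot_for remediation` prints `bugfix`. Remediation now shares the second session with verification and hardening rather than opening a third.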
package/assets/skills/evaluation-triage/SKILL.md
CHANGED

````diff
@@ -15,15 +15,54 @@ Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
 - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
 - if no remediation is needed, move directly to the final human decision

+## Non-negotiable evaluation buckets
+
+These areas are hard gates and should not be passed with known meaningful failures:
+
+1. prompt compliance
+2. requirement fulfillment / delivery completeness
+3. security-critical flaws
+
+If evaluation finds a real issue in one of those buckets, the default outcome is remediation, not leniency.
+
+Do not wave through:
+
+- prompt drift or meaningful requirement mismatch
+- missing core flows or partial delivery of prompt-critical functionality
+- real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
+
+## Leniency buckets
+
+These areas may pass with minor residual issues when the product is still clearly acceptable overall:
+
+1. testing cases / test sufficiency
+2. engineering architecture / engineering quality
+3. aesthetics
+
+Leniency is allowed only when the issue is:
+
+- minor in impact
+- not hiding a likely blocker in another bucket
+- not undermining overall confidence in the delivered product
+
+High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
+
+If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
+
 ## Triage rules

 - read both reports and merge the findings into one explicit triage set before deciding what happens next
 - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
-- any finding
+- any finding in the non-negotiable buckets should normally be returned for remediation if it is real
+- findings marked `Blocker` should normally be returned for remediation
+- findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
 - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
 - findings marked `Low` may be passed without remediation
 - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
 - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
+- minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
+- minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
+- if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
 - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
 - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
 - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
````
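The new bucket rules combine with the severity rules into a default decision table. The sketch below encodes those defaults in a hypothetical `triage_default` function. The bucket identifiers are shorthand invented here, and `review` marks the `Medium` cases that the skill leaves to owner judgment. The output is only a starting point: the skill still expects the owner to challenge weak findings with direct evidence.

```bash
#!/usr/bin/env bash
# Hypothetical encoding of the 0.4.3 evaluation-triage defaults.
#   $1 bucket:   prompt-compliance | requirement-fulfillment | security |
#                testing | engineering | aesthetics
#   $2 severity: Blocker | High | Medium | Low
#   $3 optional: "immaterial" when direct evidence shows a leniency-bucket
#                High finding is not materially relevant to acceptance
# Prints: remediate | pass | review   ("review" = owner judgment call)
triage_default() {
  local bucket="$1" severity="$2" evidence="${3:-}"
  case "$bucket" in
    prompt-compliance|requirement-fulfillment|security)
      # Hard gates: a real finding defaults to remediation at any severity.
      echo "remediate"
      return 0
      ;;
    testing|engineering|aesthetics)
      # Leniency buckets: continue to the severity rules below.
      ;;
    *)
      echo "unknown bucket: $bucket" >&2
      return 1
      ;;
  esac
  case "$severity" in
    Blocker)
      echo "remediate"
      ;;
    High)
      # Passing a High finding needs direct evidence that it is immaterial.
      if [[ "$evidence" == "immaterial" ]]; then echo "pass"; else echo "remediate"; fi
      ;;
    Medium)
      # Usually fixed when it materially improves confidence or correctness.
      echo "review"
      ;;
    Low)
      echo "pass"
      ;;
    *)
      echo "unknown severity: $severity" >&2
      return 1
      ;;
  esac
}
```

Run over the merged triage set, this separates the clear cases from the ones that actually need owner reasoning.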
package/assets/skills/final-evaluation-orchestration/SKILL.md
CHANGED

````diff
@@ -46,6 +46,19 @@ These two files are the only evaluation prompt sources for evaluation runs.
 - keep reports file-backed and bring only short summaries into chat
 - rerun only the evaluation track that still needs re-evaluation after remediation

+## Evaluation pass strategy
+
+- use a maximum of 3 full evaluation passes
+- after each evaluation pass, extract a detailed concrete issue list from the failing report(s)
+- send that list back to the active developer session with a direct instruction like: `fix these issues found in evaluation, verify affected flows dont regress after your fixes`
+- if one evaluation track passes, mark it as passed and do not rerun that track in later passes unless a later fix clearly reopens it
+- do not rerun both backend and frontend evaluation tracks when only one still needs re-evaluation
+- after pass 1 and pass 2, use the detailed issue list from the latest failing report(s) to drive the next remediation pass
+- after pass 3, do not create a new evaluation session for the still-failing track
+- after pass 3, send the final fix list back to the developer, then return to the last evaluation session used for that still-failing track and ask whether the last reported issues are now fixed
+- if they are fixed, have that same evaluation session update the report to reflect the current state cleanly, without mentioning recheck, retest, previous issues, or iterative review history
+- the final report should read like a normal current-state evaluation report, not like a patch log
+
 ## Remediation loop

 - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
@@ -55,7 +68,8 @@ These two files are the only evaluation prompt sources for evaluation runs.
 - the selected stack's platform-appropriate UI/E2E verification where applicable, with fresh screenshots or equivalent artifacts
 - if remediation materially reopens an owner-run broad milestone boundary, route the project back to that boundary before re-evaluation instead of treating every remediation pass as an automatic broad rerun moment
 - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
--
+- store backend, frontend, and last-used evaluation session ids in metadata so later passes and packaging can safely reuse the correct session when needed
+- remember the evaluation flow allows a maximum of 3 full evaluation passes before the final issue-verification update path must be used

 ## Boundaries

````
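The evaluation pass strategy behaves like a small state machine over the backend and frontend tracks. The sketch below uses a hypothetical `next_evaluation_actions`, which is not a package function. Given the upcoming pass number and the tracks already marked passed (the `passed_evaluation_tracks` metadata field), it prints what to do next. A pass number above 3 stands for the post-pass-3 path, which reuses the last evaluation session for the track instead of opening a new one.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the 0.4.3 evaluation pass strategy.
#   $1 = pass number about to start (1-based; 4+ means "after pass 3")
#   $2 = space-separated tracks already marked passed, e.g. "backend"
# Prints one line per track that still needs work:
#   "evaluate <track>" -> run a full evaluation pass (passes 1-3 only)
#   "recheck <track>"  -> reuse the last evaluation session for that track
#                         and ask whether the last reported issues are fixed
next_evaluation_actions() {
  local pass="$1" passed=" ${2:-} " track
  for track in backend frontend; do
    # Tracks already marked passed are not rerun.
    if [[ "$passed" == *" $track "* ]]; then
      continue
    fi
    if (( pass <= 3 )); then
      echo "evaluate $track"
    else
      echo "recheck $track"
    fi
  done
}
```

If a later fix reopens a track, the caller drops it from the passed list before the next call. That keeps the "rerun only what still needs re-evaluation" rule in one place.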
package/assets/skills/planning-guidance/SKILL.md
CHANGED

````diff
@@ -82,9 +82,15 @@ Selected-stack defaults:
 - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
 - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
 - define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
-- define
-- for web backend/fullstack projects,
--
+- define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
+- for Dockerized web backend/fullstack projects, the runtime contract may be `docker compose up --build` directly when the prompt or existing repo does not already dictate another stack-compatible contract
+- when `docker compose up --build` is not the runtime contract, require `./run_app.sh` as the single primary runtime wrapper for the project
+- for mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow instead of assuming host tooling conventions
+- `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
+- `./run_tests.sh` must prepare or install anything required before running the tests when that setup is needed for a clean environment
+- for Dockerized web backend/fullstack projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
+- for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
+- local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
 - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
 - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
 - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
@@ -104,7 +110,7 @@ Selected-stack defaults:
 - for each major module, define how it integrates with existing modules and which shared contracts it must follow consistently
 - define verification plans that include cross-module scenarios and seam checks, not just isolated feature checks
 - surface real unresolved risks honestly
-- keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the
+- keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the current verification cadence

 ## Exit target

````
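The planning rules make `./run_tests.sh` the single broad test entrypoint. It goes through Docker for Dockerized web projects and through the stack's own full suite otherwise. The skeleton below works under stated assumptions: detecting a Compose file stands in for "Dockerized", while the Compose service name `tests` and the npm fallback are placeholders for whatever the selected stack actually uses. `detect_test_mode` and `run_tests_main` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for a project's ./run_tests.sh.
# Assumptions (not from slopmachine): a Compose file means "Dockerized",
# the Compose service "tests" runs the full suite, and the local
# fallback is an npm project. Replace both with the selected stack's tools.
set -euo pipefail

# Prints "docker" when the directory has a Compose file, otherwise "local".
detect_test_mode() {
  local dir="${1:-.}" f
  for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
    if [[ -f "$dir/$f" ]]; then
      echo "docker"
      return 0
    fi
  done
  echo "local"
}

run_tests_main() {
  case "$(detect_test_mode .)" in
    docker)
      # Build first so a clean machine needs nothing besides Docker.
      docker compose build
      docker compose run --rm tests
      ;;
    local)
      # Prepare dependencies, then run the stack's full test path.
      npm ci
      npm test
      ;;
  esac
}
```

Saved as `./run_tests.sh`, the script would end with `run_tests_main "$@"` and be made executable with `chmod +x run_tests.sh`. Doing the build or `npm ci` inside the wrapper is what satisfies the rule that it prepares whatever a clean environment needs.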
package/assets/skills/retrospective-analysis/SKILL.md

````diff
@@ -0,0 +1,91 @@
+---
+name: retrospective-analysis
+description: Owner-only final retrospective rules for slopmachine.
+---
+
+# Retrospective Analysis
+
+Use this skill only after `P10 Submission Packaging` has materially and formally succeeded.
+
+## Purpose
+
+- inspect what happened across the whole workflow run
+- identify what caused churn, waste, late defects, or preventable corrections
+- capture lessons that should improve future runs
+- write package-specific retrospective files under `~/slopmachine/retrospectives/`
+
+## Phase role
+
+- this is an automatic owner-only phase
+- it is quiet and non-blocking by default
+- it does not create a new human stop
+- it does not rerun broad verification by default
+- it should not reopen development unless it finds a real defect in the already-packaged result
+
+## Output location
+
+Write dated retrospective files under:
+
+- `~/slopmachine/retrospectives/`
+
+Preferred filenames:
+
+- `retrospective-YYYY-MM-DD.md`
+- `improvement-actions-YYYY-MM-DD.md`
+
+If only one file is needed, the retrospective file is sufficient.
+
````
38
|
+
## Evidence sources
|
|
39
|
+
|
|
40
|
+
Prefer existing workflow artifacts first:
|
|
41
|
+
|
|
42
|
+
- root metadata
|
|
43
|
+
- questions/clarification record
|
|
44
|
+
- clarification prompt
|
|
45
|
+
- planning artifacts
|
|
46
|
+
- Beads comments and transitions
|
|
47
|
+
- developer-session handoffs
|
|
48
|
+
- review and rejection history
|
|
49
|
+
- verification gate notes
|
|
50
|
+
- evaluation reports
|
|
51
|
+
- remediation records
|
|
52
|
+
- packaging outputs
|
|
53
|
+
|
|
54
|
+
Do not reread the entire codebase unless a real inconsistency requires it.
|
|
55
|
+
Do not rerun broad Docker or full-suite verification just for retrospective analysis.
|
|
56
|
+
|
|
57
|
+
## Required retrospective sections
|
|
58
|
+
|
|
59
|
+
1. outcome summary
|
|
60
|
+
2. what worked well
|
|
61
|
+
3. what caused waste or looping
|
|
62
|
+
4. what was caught too late
|
|
63
|
+
5. findings by phase
|
|
64
|
+
6. findings by instruction plane:
|
|
65
|
+
- owner shell
|
|
66
|
+
- developer prompt
|
|
67
|
+
- skills
|
|
68
|
+
- `AGENTS.md`
|
|
69
|
+
7. actionable improvements
|
|
70
|
+
|
|
71
|
+
## Audit buckets
|
|
72
|
+
|
|
73
|
+
Evaluate at least these buckets in hindsight:
|
|
74
|
+
|
|
75
|
+
1. prompt-fit
|
|
76
|
+
2. security-critical flaws
|
|
77
|
+
3. test sufficiency
|
|
78
|
+
4. major engineering quality
|
|
79
|
+
5. token/time waste
|
|
80
|
+
|
|
81
|
+
For each meaningful finding, prefer:
|
|
82
|
+
|
|
83
|
+
- what happened
|
|
84
|
+
- why it happened
|
|
85
|
+
- where the fix belongs
|
|
86
|
+
- how it should change future runs
|
|
87
|
+
|
|
88
|
+
## Rule for reopening work
|
|
89
|
+
|
|
90
|
+
- if retrospective finds a real packaging or delivery defect, reopen `P10` and fix it
|
|
91
|
+
- if it finds only improvements, document them and close the retrospective phase
|
|
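The new retrospective skill fixes an output directory, dated filenames, an ordered set of required sections, and a reopening rule. The following is a minimal sketch of how an owner tool might apply those rules. It is an illustration, not slopmachine's implementation. The function names, the Markdown heading layout, and the `kind` field on findings are all hypothetical. Dates use the local calendar day.

```javascript
const os = require("node:os");
const path = require("node:path");

// Required sections, in the order the skill lists them.
const REQUIRED_SECTIONS = [
  "outcome summary",
  "what worked well",
  "what caused waste or looping",
  "what was caught too late",
  "findings by phase",
  "findings by instruction plane",
  "actionable improvements",
];

// Instruction planes listed under section 6.
const INSTRUCTION_PLANES = ["owner shell", "developer prompt", "skills", "`AGENTS.md`"];

// Format a Date as YYYY-MM-DD using its local calendar date.
function isoDay(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

// Dated output paths under ~/slopmachine/retrospectives/.
function retrospectivePaths(date, home = os.homedir()) {
  const dir = path.join(home, "slopmachine", "retrospectives");
  const day = isoDay(date);
  return {
    dir,
    retrospective: path.join(dir, `retrospective-${day}.md`),
    improvements: path.join(dir, `improvement-actions-${day}.md`),
  };
}

// Empty skeleton containing every required section (heading layout is an assumption).
function retrospectiveSkeleton(date) {
  const lines = [`# Retrospective ${isoDay(date)}`, ""];
  REQUIRED_SECTIONS.forEach((title, i) => {
    lines.push(`## ${i + 1}. ${title}`, "");
    if (title === "findings by instruction plane") {
      for (const plane of INSTRUCTION_PLANES) lines.push(`### ${plane}`, "");
    }
  });
  return lines.join("\n");
}

// Reopening rule: only a real packaging/delivery defect reopens P10.
// `kind` is a hypothetical field: "defect" or "improvement".
function nextStep(findings) {
  return findings.some((f) => f.kind === "defect") ? "reopen P10" : "close retrospective";
}
```

The improvement-actions file is optional, so a caller would write only `retrospective` unless there are actions worth splitting out.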
@@ -14,14 +14,24 @@ Use this skill during `P3 Scaffold` before prompting the developer.
 - establish the local verification path and the standardized gate path
 - make prompt-critical baseline behavior real where required
 - keep repo-local `README.md` honest from the start
-- make the selected-stack primary
+- make the selected-stack primary runtime command and the universal `./run_tests.sh` broad test command real from the scaffold stage
+
+For Dockerized web backend/fullstack projects, scaffold must make these commands real and working before scaffold can pass:
+
+- `docker compose up --build`
+- `./run_tests.sh`
 
 ## Scaffold and foundation guidance
 
 - create the initial project structure intentionally
 - follow the original prompt and existing repository first; only use the package defaults below when they do not already specify the platform or stack
-- create
--
+- create `./run_tests.sh` during scaffold for every project as the single broad test entrypoint
+- for Dockerized web backend/fullstack projects, make `docker compose up --build` real as the primary runtime command during scaffold
+- when `docker compose up --build` is not the runtime contract, create `./run_app.sh` during scaffold as the single primary runtime wrapper
+- make `./run_tests.sh` self-sufficient from a clean environment by preparing or installing anything it needs before executing the tests
+- for Dockerized web backend/fullstack projects, `./run_tests.sh` must execute the broad test path through Docker and should own that Dockerized test flow directly instead of requiring separate manual pre-setup
+- for non-web or non-Docker projects, `./run_tests.sh` must execute the selected stack's platform-equivalent broad test flow while preserving the same single-command interface
+- local non-Docker test commands should still be installed and working for normal development iteration
 - create required testing directories and baseline docs structure
 - put baseline config and logging structure in place
 - install and configure the local test tooling needed for ordinary iteration during scaffold rather than deferring local testing setup to later phases
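The scaffold rules require one entrypoint per role: `./run_tests.sh` for every project, plus either `docker compose up --build` or `./run_app.sh` as the primary runtime command. The sketch below is a hypothetical static pre-check for those entrypoints. It only confirms that the files exist and are executable. It does not replace actually running the commands at the scaffold gate. The `composeRuntime` flag, meaning "`docker compose up --build` is the runtime contract", is an assumption. The compose file names are the defaults Docker Compose looks for.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Default file names `docker compose` searches for in the project directory.
const COMPOSE_FILES = ["compose.yaml", "compose.yml", "docker-compose.yaml", "docker-compose.yml"];

// True when the current user may execute `file` (on Windows this is an existence check).
function isExecutable(file) {
  try {
    fs.accessSync(file, fs.constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// Report a missing or non-executable shell wrapper at `root/name`.
function checkWrapper(root, name, problems) {
  const file = path.join(root, name);
  if (!fs.existsSync(file)) problems.push(`missing ./${name}`);
  else if (!isExecutable(file)) problems.push(`./${name} is not executable`);
}

// Hypothetical pre-check; an empty array means the entrypoints are present.
function checkScaffoldEntrypoints(root, { composeRuntime }) {
  const problems = [];
  checkWrapper(root, "run_tests.sh", problems);
  if (composeRuntime) {
    if (!COMPOSE_FILES.some((f) => fs.existsSync(path.join(root, f)))) {
      problems.push("no compose file for `docker compose up --build`");
    }
  } else {
    checkWrapper(root, "run_app.sh", problems);
  }
  return problems;
}
```

Keeping the check static means it can run cheaply during ordinary iteration without triggering the Docker runs the skill reserves for gates.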
@@ -42,6 +52,7 @@ Use this skill during `P3 Scaffold` before prompting the developer.
 - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
 - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
 - establish README structure early instead of leaving it until the end
+- ensure `README.md` clearly documents the primary runtime command and the broad `./run_tests.sh` contract for the selected stack
 - prove the scaffold in a clean state before deeper feature work
 - verify clean startup and teardown behavior under the selected stack's runtime contract
 - for Dockerized web projects, verify clean startup and teardown behavior under the chosen project namespace
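The new README rule asks the scaffold to document both the primary runtime command and the `./run_tests.sh` contract. A minimal, hypothetical check of that rule might look like the sketch below. It only proves that each command string appears somewhere in `README.md`, not that the surrounding explanation is clear for a junior reader. It reuses the assumed `composeRuntime` flag to pick between `docker compose up --build` and `./run_app.sh`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Commands the README is expected to mention for the selected runtime contract.
function requiredReadmeCommands({ composeRuntime }) {
  return [composeRuntime ? "docker compose up --build" : "./run_app.sh", "./run_tests.sh"];
}

// Return the required commands that never appear in the README text.
function missingReadmeCommands(readmeText, opts) {
  return requiredReadmeCommands(opts).filter((cmd) => !readmeText.includes(cmd));
}

// File-based wrapper around the pure check (hypothetical helper).
function checkReadme(root, opts) {
  const file = path.join(root, "README.md");
  if (!fs.existsSync(file)) return ["README.md is missing"];
  return missingReadmeCommands(fs.readFileSync(file, "utf8"), opts);
}
```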
@@ -66,3 +77,5 @@ Scaffold should make later slices easier, not force them to retrofit missing fun
 - use local and narrow checks while correcting scaffold work
 - reserve one broad owner-run scaffold gate for actual scaffold acceptance
 - do not spend extra broad reruns once the acceptance question is already answered
+- for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the baseline actually works
+- after that scaffold confirmation, do not run Docker again during ordinary development work; the next Docker-based run should be at development completion when integrated behavior is checked
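The last two added rules limit broad Docker runs to two checkpoints: once at scaffold acceptance and again at development completion. The sketch below models that cadence as a small budget. The phase labels are hypothetical, and so is the limit of one run per checkpoint, which is inferred from "do not spend extra broad reruns once the acceptance question is already answered". Only the two commands come from the skill text.

```javascript
// Commands the owner runs at a Docker checkpoint, as named in the skill.
const CONFIRMATION_COMMANDS = ["docker compose up --build", "./run_tests.sh"];

// Hypothetical phase labels for the only two points where Docker runs are allowed.
const DOCKER_CHECKPOINTS = new Set(["scaffold-acceptance", "development-complete"]);

class DockerRunBudget {
  constructor() {
    this.used = new Set(); // checkpoints that already spent their run
  }

  // Return the commands to run now, or an empty array if Docker should not run.
  request(phase) {
    if (!DOCKER_CHECKPOINTS.has(phase) || this.used.has(phase)) return [];
    this.used.add(phase);
    return [...CONFIRMATION_COMMANDS];
  }
}
```

In practice, `docker compose up --build` runs in the foreground until stopped. A real gate would therefore pair it with a readiness check and a teardown, which this sketch leaves out.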