theslopmachine 0.4.1 → 0.4.3

package/README.md CHANGED
@@ -37,7 +37,7 @@ The current engine is the lighter workflow line:
  - smaller always-loaded owner shell
  - smaller developer rulebook
  - richer phase-specific skills loaded when needed
- - bounded 2/3 developer-session model
+ - bounded 2-session developer-session model
  - `beads_rust` bootstrap path
 
  ## Requirements
@@ -66,13 +66,13 @@ npm pack
  This produces a tarball such as:
 
  ```bash
- theslopmachine-0.4.0.tgz
+ theslopmachine-0.4.3.tgz
  ```
 
  You can then install it globally with:
 
  ```bash
- npm install -g ./theslopmachine-0.4.0.tgz
+ npm install -g ./theslopmachine-0.4.3.tgz
  ```
 
  For local development instead of global install:
@@ -143,6 +143,7 @@ The expected high-level lifecycle is:
  8. final human decision
  9. remediation when needed
  10. submission packaging
+ 11. retrospective
 
  ## How It Is Intended To Operate
 
@@ -154,7 +155,7 @@ That means:
  - planning, scaffold, development, verification, hardening, remediation, and packaging load detailed skills only when needed
  - early and late phases do not carry each other's full instruction payloads all the time
 
- The v2 workflow also expects:
+ The current workflow also expects:
 
  - targeted reads over broad rereads
  - local and narrow verification during ordinary iteration
@@ -164,16 +165,79 @@ The v2 workflow also expects:
 
  Every bootstrapped project should expose:
 
- - one primary documented launch/run command for its selected stack
- - one primary documented full-test command for its selected stack
+ - one primary documented runtime command
+ - one primary documented broad test command: `./run_tests.sh`
 
  Follow the original prompt and the existing repository first. Use the examples below only when they do not already specify the platform or stack.
 
  Examples:
 
  - web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
- - Expo mobile: `npx expo start` and the project's single full-test command
- - Electron desktop: `npm run dev` and the project's single full-test command
+ - mobile or desktop when Docker runtime is not the direct run path: `./run_app.sh` and `./run_tests.sh`
+
+ ## What It Does Well
+
+ - keeps the owner shell strict without carrying a giant monolith prompt
+ - loads detailed phase and activity skills only when they are actually needed
+ - uses a bounded 2-session model to reduce long-run context drag
+ - pushes prompt-fit, security, testing, and engineering-quality concerns earlier into planning and hardening
+ - standardizes runtime and broad-test expectations with `docker compose up --build` or `./run_app.sh` plus `./run_tests.sh`
+ - preserves strong packaging/report discipline with canonical files in `~/slopmachine/`
+
+ ## Installed Assets
+
+ The package installs:
+
+ - owner and developer agents
+ - phase and activity skills
+ - canonical evaluation and report templates in `~/slopmachine/`
+ - workflow bootstrap helper
+ - repo rulebook template
+ - session export utilities
+
+ Canonical files in `~/slopmachine/`:
+
+ - `backend-evaluation-prompt.md`
+ - `frontend-evaluation-prompt.md`
+ - `document-completeness.md`
+ - `engineering-results.md`
+ - `implementation-comparison.md`
+ - `quality-document.md`
+ - `retrospectives/`
+
+ ## Dependencies And Assumptions
+
+ - Node.js 18+ is required for the package CLI itself
+ - OpenCode must already be available on the machine
+ - git must be available
+ - `beads_rust` / `br` is installed or verified by `slopmachine setup`
+
+ Generated projects follow the original prompt and the existing repository first.
+
+ Default runtime/test wrapper expectations:
+
+ - Dockerized web backend/fullstack: `docker compose up --build` and `./run_tests.sh`
+ - non-web or non-Docker runtime cases: `./run_app.sh` and `./run_tests.sh`
+
+ `./run_tests.sh` is always the broad test wrapper.
+
+ ## Command Summary
+
+ Package CLI:
+
+ - `slopmachine setup`
+ - `slopmachine init`
+ - `slopmachine init -o`
+
+ Package validation:
+
+ - `npm run check`
+ - `npm pack`
+
+ Generated project conventions:
+
+ - `docker compose up --build` or `./run_app.sh`
+ - `./run_tests.sh`
 
  ## Files And Locations
 
package/RELEASE.md CHANGED
@@ -41,13 +41,13 @@ npm pack
  This should produce a tarball such as:
 
  ```bash
- theslopmachine-0.4.0.tgz
+ theslopmachine-0.4.3.tgz
  ```
 
  ## Inspect package contents
 
  ```bash
- tar -tzf theslopmachine-0.4.0.tgz
+ tar -tzf theslopmachine-0.4.3.tgz
  ```
 
  Check that the tarball includes:
@@ -67,7 +67,7 @@ Agent-integrity rule:
 
  ## Optimization Goal
 
- The main v2 target is:
+ The main target is:
 
  - less token waste
  - less elapsed time
@@ -109,6 +109,18 @@ State split:
 
  Do not create another competing workflow-state system.
 
+ ## Git Traceability
+
+ Use git to preserve meaningful workflow checkpoints.
+
+ - after each meaningful accepted work unit, run `git add .` and `git commit -m "<message>"`
+ - meaningful work includes accepted scaffold completion, accepted major development slices, accepted remediation passes, and other clearly reviewable milestones
+ - keep the git flow simple and checkpoint-oriented
+ - commit only after the relevant work and verification for that checkpoint are complete enough to preserve useful history
+ - keep commit messages descriptive and easy to reason about later
+ - do not push unless explicitly requested
+ - do not commit secrets, local-only junk, or accidental noise
+
  ## Mandatory Operating Order
 
  Operate in this order:
@@ -149,6 +161,7 @@ Use these exact root phases:
  - `P8 Final Human Decision`
  - `P9 Remediation`
  - `P10 Submission Packaging`
+ - `P11 Retrospective`
 
  Phase rules:
 
@@ -157,21 +170,21 @@ Phase rules:
  - do not close multiple root phases in one transition block
  - `P9 Remediation` stays its own root phase once evaluation has accepted follow-up work
  - `P6 Hardening` may reopen `P5` if hardening exposes unresolved integrated instability
+ - `P11 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 
  ## Developer Session Model
 
- Use up to three bounded developer sessions:
+ Use up to two bounded developer sessions:
 
- 1. build session: planning, scaffold, development
- 2. stabilization session: integrated verification and hardening, only if needed
- 3. remediation session: evaluation-response remediation, only if needed
+ 1. develop session: planning, scaffold, development
+ 2. bugfix session: integrated verification, hardening, and remediation, only if needed
 
  Use `developer-session-lifecycle` for startup, resume detection, session consistency checks, and recovery.
  Use `session-rollover` only for planned transitions between those bounded developer sessions.
 
  Do not launch the developer during `P0` or `P1`.
 
- When the first build developer session begins in `P2`, start it in this exact order:
+ When the first develop developer session begins in `P2`, start it in this exact order:
 
  1. send `lets plan this <original-prompt>`
  2. wait for the developer's first reply
@@ -199,8 +212,13 @@ Selected-stack rule:
 
  Every project must end up with:
 
- - one primary documented launch/run command for the selected stack
- - one primary documented full-test command for the selected stack
+ - one primary documented runtime command
+ - one primary documented full-test command: `./run_tests.sh`
+
+ Runtime command rule:
+
+ - for Dockerized web backend/fullstack projects, `docker compose up --build` may be the primary runtime command directly
+ - when `docker compose up --build` is not the runtime contract, the project must provide `./run_app.sh` as the single primary runtime wrapper
 
  Default moments:
 
@@ -208,6 +226,12 @@ Default moments:
  2. development complete -> integrated verification entry
  3. final qualified state before packaging
 
+ For Dockerized web backend/fullstack projects, enforce this cadence:
+
+ - after scaffold completion, the owner runs `docker compose up --build` and `./run_tests.sh` once to confirm the scaffold baseline really works
+ - after that, do not run Docker again during ordinary development work
+ - the next Docker-based run is at development completion or integrated-verification entry unless a real blocker forces earlier escalation
+
  Between those moments, rely on:
 
  - local runtime checks
@@ -245,6 +269,7 @@ Core map:
  - `P7` -> `final-evaluation-orchestration`, `evaluation-triage`, `report-output-discipline`
  - `P9` -> `remediation-guidance`
  - `P10` -> `submission-packaging`, `report-output-discipline`
+ - `P11` -> `retrospective-analysis`, `owner-evidence-discipline`, `report-output-discipline`
  - state mutations -> `beads-operations`
  - evidence-heavy review -> `owner-evidence-discipline`
  - planned developer-session switch -> `session-rollover`
@@ -327,6 +352,16 @@ When `P10 Submission Packaging` begins:
  - follow its exact artifact, export, cleanup, and output contract
  - do not close packaging until every required final artifact path has been verified
 
+ ## Retrospective
+
+ After `P10 Submission Packaging` closes successfully:
+
+ - automatically enter `P11 Retrospective`
+ - load `retrospective-analysis`
+ - write dated retrospective output under `~/slopmachine/retrospectives/`
+ - keep it owner-only and non-blocking by default
+ - reopen packaging only if the retrospective finds a real packaged-result defect
+
  ## Completion Standard
 
  The workflow is not done until:
@@ -335,6 +370,7 @@ The workflow is not done until:
  - the current root phase closed cleanly
  - the workflow ledger closed cleanly
  - the final package is assembled and verified in its final structure
+ - the retrospective phase has either documented improvements or reopened and resolved any real packaging defect it found
 
  Success means:
 
@@ -45,6 +45,8 @@ Use this skill only during `P1 Clarification`.
  - never use defaults that drift from the original prompt
  - do not use quick, loose, or simplifying assumptions that shrink what the prompt asked for
  - do not guess through material ambiguity
+ - do not expand the clarification artifact just to exhaust every minor edge case when the scope is already clear enough to plan correctly
+ - once the core scope is understood, prefer a compact clarification record plus explicit safe defaults over a giant exhaustive rewrite
 
  ## Required outputs
 
@@ -52,16 +54,63 @@ Use this skill only during `P1 Clarification`.
  - developer-facing clarification prompt in `../.ai/clarification-prompt.md`
  - explicit list of safe defaults and resolved ambiguities
 
+ ## `questions.md` contract
+
+ `../docs/questions.md` is not a general project summary.
+
+ It exists only for prompt items that needed interpretation because they were unclear, incomplete, or materially ambiguous.
+
+ Each entry should answer this structure:
+
+ 1. what was unclear from the original prompt
+ 2. how you interpreted it
+ 3. what decision or solution you chose for it
+ 4. why that choice is prompt-faithful and reasonable
+
+ Keep the file narrow and explicit.
+
+ Do not use `questions.md` for:
+
+ - a full restatement of the entire prompt
+ - broad planning notes
+ - general project requirements that were already clear
+ - implementation details that belong in planning or design docs
+
+ Preferred entry shape:
+
+ ```md
+ ## Item N: <short ambiguity title>
+
+ ### What was unclear
+ <the exact ambiguity or missing detail>
+
+ ### Interpretation
+ <how it was interpreted>
+
+ ### Decision
+ <the chosen resolution or safe default>
+
+ ### Why this is reasonable
+ <brief justification tied to prompt faithfulness>
+ ```
+
+ If nothing material was unclear, keep `questions.md` minimal rather than inventing content.
+
  ## Clarification-prompt validation loop
 
- - compare the original prompt and the prepared clarification prompt using a fresh ephemeral `General` session, never the developer session
- - build one self-contained validation prompt block for that `General` session every time
- - include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md` in that block
+ - compare the original prompt and the prepared clarification prompt using one dedicated `General` validation session, never the developer session
+ - do not create a new validation session for every retry unless the session became unusable or a fundamental misunderstanding requires a clean restart
+ - on the first validation pass, build one self-contained validation prompt block for that `General` session
+ - on that first pass, include the full original prompt text, the full current questions or clarification record, and the full current `../.ai/clarification-prompt.md`
  - do not use placeholders such as `same as previous`, `from context`, `see above`, or `latest artifact`
  - ask that `General` session whether the clarification prompt deviates from, weakens, narrows, or violates the original prompt in any way
  - require it to judge whether the clarification prompt is a genuine improvement in execution quality while remaining faithful to the original intent
- - if mismatches or prompt drift are found, revise the questions record and clarification prompt, then build a newly composed full validation block and run the check again
+ - if the validator suggests real fixes, patch the existing questions record and clarification prompt directly; do not restart the clarification phase from scratch unless the validator found a fundamental scope misunderstanding
+ - treat validator output as a correction list, not as a reason to regenerate giant clarification blocks repeatedly
+ - when rerunning validation in the same validator session, send only the improved clarification payload and the concrete fixes you made; do not resend the original prompt block if the session already has that context
+ - rerun validation only after applying the concrete fixes that matter
  - keep the validation loop bounded and intentional; prefer one strong pass plus a small number of revision cycles over repeated loose churn
+ - once prompt-faithfulness is satisfied and the remaining notes are minor or cosmetic, stop iterating and proceed
  - only treat the clarification prompt as approved for developer use after this validation loop passes and your own review agrees
  - requesting human approval before this validation loop passes is illegal
 
@@ -59,7 +59,7 @@ Optional startup inputs may include:
  6. wait only for the initial clarification approval before development starts
  7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
  8. initialize the bounded developer-session slots
- 9. start the build developer session only after `P2` is ready to begin
+ 9. start the develop developer session only after `P2` is ready to begin
  10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
  11. wait for the developer's first exchange
  12. send the approved clarification prompt as the second owner message in that same session
@@ -69,7 +69,7 @@ Optional startup inputs may include:
 
  The first bounded developer session must begin in this exact order:
 
- 1. owner starts the build developer session
+ 1. owner starts the develop developer session
  2. owner sends: `lets plan this <original-prompt>`
  3. developer responds
  4. owner sends the approved clarification prompt
@@ -102,6 +102,12 @@ Track at least:
  - `awaiting_human`
  - `clarification_approved`
  - `remediation_round`
+ - `clarification_validator_session_id`
+ - `evaluation_pass`
+ - `backend_evaluation_session_id`
+ - `frontend_evaluation_session_id`
+ - `last_evaluation_session_id`
+ - `passed_evaluation_tracks`
  - `developer_sessions`
  - `active_developer_session_index`
 
@@ -132,11 +138,10 @@ Required project metadata fields in `../metadata.json` when relevant:
 
  ## Bounded session model
 
- Track up to three planned developer sessions:
+ Track up to two planned developer sessions:
 
- 1. build
- 2. stabilization
- 3. remediation
+ 1. develop
+ 2. bugfix
 
  Later session slots may remain unused if the workflow never needs them.
 
@@ -15,15 +15,54 @@ Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
  - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
  - if no remediation is needed, move directly to the final human decision
 
+ ## Non-negotiable evaluation buckets
+
+ These areas are hard gates and should not be passed with known meaningful failures:
+
+ 1. prompt compliance
+ 2. requirement fulfillment / delivery completeness
+ 3. security-critical flaws
+
+ If evaluation finds a real issue in one of those buckets, the default outcome is remediation, not leniency.
+
+ Do not wave through:
+
+ - prompt drift or meaningful requirement mismatch
+ - missing core flows or partial delivery of prompt-critical functionality
+ - real security defects involving auth, authorization, ownership, isolation, exposure, or secret handling
+
+ ## Leniency buckets
+
+ These areas may pass with minor residual issues when the product is still clearly acceptable overall:
+
+ 1. testing cases / test sufficiency
+ 2. engineering architecture / engineering quality
+ 3. aesthetics
+
+ Leniency is allowed only when the issue is:
+
+ - minor in impact
+ - not hiding a likely blocker in another bucket
+ - not undermining overall confidence in the delivered product
+
+ High-severity findings in these leniency buckets may still be passed when they are not materially relevant to actual acceptance readiness, but that should be a deliberate exception backed by direct evidence.
+
+ If the hard gates pass cleanly, the leniency buckets should usually not force remediation unless the issue is a true `Blocker` or a materially relevant `High` finding.
+
  ## Triage rules
 
  - read both reports and merge the findings into one explicit triage set before deciding what happens next
  - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
- - any finding marked `Blocker` or `High` should normally be returned for remediation
+ - any finding in the non-negotiable buckets should normally be returned for remediation if it is real
+ - findings marked `Blocker` should normally be returned for remediation
+ - findings marked `High` should normally be returned for remediation unless they fall in a leniency bucket and your direct evidence shows they are not materially relevant to acceptance
  - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
  - findings marked `Low` may be passed without remediation
  - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
  - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
+ - minor engineering-architecture quality issues may pass if the system is still structurally credible and maintainable overall
+ - minor aesthetics issues may pass if the UI is still clearly usable and credible for the actual use case
+ - if prompt compliance, requirement fulfillment, and security all pass, testing/engineering/aesthetics findings should generally be treated more leniently unless they are blocking or materially high-risk
  - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
  - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
  - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
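
Reviewer note: the bucket and severity rules in this hunk can be read as a small decision table. The sketch below is only an illustration under stated assumptions. The function name `triage_decision`, the bucket keywords, the outcome labels, and the `materially_relevant` flag are hypothetical and do not appear in the package, and the `Medium` case is collapsed into a single "fix-if-material" outcome.

```shell
# Hypothetical helper, not part of slopmachine: maps one real finding to a triage outcome.
# usage: triage_decision <bucket> <severity> [materially_relevant: yes|no]
triage_decision() {
  bucket="$1"
  severity="$2"
  relevant="${3:-yes}"

  # Hard gates: prompt compliance, requirement fulfillment, security.
  case "$bucket" in
    prompt|requirements|security)
      echo "remediate"
      return 0
      ;;
  esac

  # Everything else is treated as a leniency bucket (testing, engineering, aesthetics).
  case "$severity" in
    Blocker) echo "remediate" ;;
    High)
      if [ "$relevant" = "no" ]; then
        echo "pass-with-evidence"   # deliberate exception backed by direct evidence
      else
        echo "remediate"
      fi
      ;;
    Medium) echo "fix-if-material" ;;
    Low) echo "pass" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, `triage_decision testing High no` prints `pass-with-evidence`, while any finding in a hard-gate bucket prints `remediate` regardless of severity.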
@@ -46,6 +46,19 @@ These two files are the only evaluation prompt sources for evaluation runs.
  - keep reports file-backed and bring only short summaries into chat
  - rerun only the evaluation track that still needs re-evaluation after remediation
 
+ ## Evaluation pass strategy
+
+ - use a maximum of 3 full evaluation passes
+ - after each evaluation pass, extract a detailed concrete issue list from the failing report(s)
+ - send that list back to the active developer session with a direct instruction like: `fix these issues found in evaluation, verify affected flows dont regress after your fixes`
+ - if one evaluation track passes, mark it as passed and do not rerun that track in later passes unless a later fix clearly reopens it
+ - do not rerun both backend and frontend evaluation tracks when only one still needs re-evaluation
+ - after pass 1 and pass 2, use the detailed issue list from the latest failing report(s) to drive the next remediation pass
+ - after pass 3, do not create a new evaluation session for the still-failing track
+ - after pass 3, send the final fix list back to the developer, then return to the last evaluation session used for that still-failing track and ask whether the last reported issues are now fixed
+ - if they are fixed, have that same evaluation session update the report to reflect the current state cleanly, without mentioning recheck, retest, previous issues, or iterative review history
+ - the final report should read like a normal current-state evaluation report, not like a patch log
+
  ## Remediation loop
 
  - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
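
Reviewer note: the pass limit and per-track rerun rule added in this hunk amount to two small decisions. The sketch below is a rough illustration, not the package's implementation. The names `MAX_FULL_PASSES`, `next_evaluation_step`, and `tracks_to_rerun`, and the `passed`/`failed` status strings, are assumptions made for the example.

```shell
# Hypothetical helpers, not part of slopmachine.
MAX_FULL_PASSES=3

# usage: next_evaluation_step <completed_full_passes>
# Before the limit another full pass is allowed; at the limit the owner
# returns to the last evaluation session instead of opening a new one.
next_evaluation_step() {
  if [ "$1" -lt "$MAX_FULL_PASSES" ]; then
    echo "run-full-pass"
  else
    echo "recheck-in-last-session"
  fi
}

# usage: tracks_to_rerun <backend_status> <frontend_status>   (each: passed|failed)
# A track that already passed is not rerun.
tracks_to_rerun() {
  out=""
  [ "$1" = "passed" ] || out="backend"
  [ "$2" = "passed" ] || out="${out:+$out }frontend"
  echo "$out"
}
```

With two full passes completed and only the backend track passed, `next_evaluation_step 2` prints `run-full-pass` and `tracks_to_rerun passed failed` prints `frontend`.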
@@ -55,7 +68,8 @@ These two files are the only evaluation prompt sources for evaluation runs.
  - the selected stack's platform-appropriate UI/E2E verification where applicable, with fresh screenshots or equivalent artifacts
  - if remediation materially reopens an owner-run broad milestone boundary, route the project back to that boundary before re-evaluation instead of treating every remediation pass as an automatic broad rerun moment
  - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
- - remember the external process allows a maximum of 3 repair rounds
+ - store backend, frontend, and last-used evaluation session ids in metadata so later passes and packaging can safely reuse the correct session when needed
+ - remember the evaluation flow allows a maximum of 3 full evaluation passes before the final issue-verification update path must be used
 
  ## Boundaries
 
@@ -82,9 +82,15 @@ Selected-stack defaults:
  - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
  - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
  - define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
- - define a project-standard launch command and a project-standard full-test command early, and keep both compatible with the selected stack
- - for web backend/fullstack projects, default those to `docker compose up --build` and `./run_tests.sh` only when the prompt or existing repo does not already dictate another stack-compatible contract
- - for mobile, desktop, CLI, library, or other non-web projects, define the selected stack's appropriate single documented launch command and single documented full-test command instead of forcing Docker conventions
+ - define the project-standard runtime contract and the universal broad test entrypoint `./run_tests.sh` early, and keep both compatible with the selected stack
+ - for Dockerized web backend/fullstack projects, the runtime contract may be `docker compose up --build` directly when the prompt or existing repo does not already dictate another stack-compatible contract
+ - when `docker compose up --build` is not the runtime contract, require `./run_app.sh` as the single primary runtime wrapper for the project
+ - for mobile, desktop, CLI, library, or other non-web projects, `./run_app.sh` should own the selected stack's runtime flow instead of assuming host tooling conventions
+ - `./run_tests.sh` must exist for every project as the platform-independent broad test wrapper
+ - `./run_tests.sh` must prepare or install anything required before running the tests when that setup is needed for a clean environment
+ - for Dockerized web backend/fullstack projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
+ - for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
+ - local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
  - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
  - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
  - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
@@ -104,7 +110,7 @@ Selected-stack defaults:
  - for each major module, define how it integrates with existing modules and which shared contracts it must follow consistently
  - define verification plans that include cross-module scenarios and seam checks, not just isolated feature checks
  - surface real unresolved risks honestly
- - keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the v2 verification cadence
+ - keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the current verification cadence
 
  ## Exit target
 
@@ -0,0 +1,91 @@
+ ---
+ name: retrospective-analysis
+ description: Owner-only final retrospective rules for slopmachine.
+ ---
+
+ # Retrospective Analysis
+
+ Use this skill only after `P10 Submission Packaging` has materially and formally succeeded.
+
+ ## Purpose
+
+ - inspect what happened across the whole workflow run
+ - identify what caused churn, waste, late defects, or preventable corrections
+ - capture lessons that should improve future runs
+ - write package-specific retrospective files under `~/slopmachine/retrospectives/`
+
+ ## Phase role
+
+ - this is an automatic owner-only phase
+ - it is quiet and non-blocking by default
+ - it does not create a new human stop
+ - it does not rerun broad verification by default
+ - it should not reopen development unless it finds a real defect in the already-packaged result
+
+ ## Output location
+
+ Write dated retrospective files under:
+
+ - `~/slopmachine/retrospectives/`
+
+ Preferred filenames:
+
+ - `retrospective-YYYY-MM-DD.md`
+ - `improvement-actions-YYYY-MM-DD.md`
+
+ If only one file is needed, the retrospective file is sufficient.
+
+ ## Evidence sources
+
+ Prefer existing workflow artifacts first:
+
+ - root metadata
+ - questions/clarification record
+ - clarification prompt
+ - planning artifacts
+ - Beads comments and transitions
+ - developer-session handoffs
+ - review and rejection history
+ - verification gate notes
+ - evaluation reports
+ - remediation records
+ - packaging outputs
+
+ Do not reread the entire codebase unless a real inconsistency requires it.
+ Do not rerun broad Docker or full-suite verification just for retrospective analysis.
+
+ ## Required retrospective sections
+
+ 1. outcome summary
+ 2. what worked well
+ 3. what caused waste or looping
+ 4. what was caught too late
+ 5. findings by phase
+ 6. findings by instruction plane:
+ - owner shell
+ - developer prompt
+ - skills
+ - `AGENTS.md`
+ 7. actionable improvements
+
+ ## Audit buckets
+
+ Evaluate at least these buckets in hindsight:
+
+ 1. prompt-fit
+ 2. security-critical flaws
+ 3. test sufficiency
+ 4. major engineering quality
+ 5. token/time waste
+
+ For each meaningful finding, prefer:
+
+ - what happened
+ - why it happened
+ - where the fix belongs
+ - how it should change future runs
+
+ ## Rule for reopening work
+
+ - if retrospective finds a real packaging or delivery defect, reopen `P10` and fix it
+ - if it finds only improvements, document them and close the retrospective phase
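
Reviewer note: the dated filename convention in this new skill file (`retrospective-YYYY-MM-DD.md` under `~/slopmachine/retrospectives/`) could be produced with a few lines of shell. This is a minimal sketch, assuming a hypothetical helper name `retrospective_path`; nothing in the diff shows how the package itself builds the path.

```shell
# Hypothetical helper, not part of slopmachine: prints a dated retrospective path.
# usage: retrospective_path [base_dir] [kind]
retrospective_path() {
  base="${1:-$HOME/slopmachine/retrospectives}"
  kind="${2:-retrospective}"   # or "improvement-actions"
  echo "$base/$kind-$(date +%Y-%m-%d).md"
}
```

A caller would typically run `mkdir -p` on the base directory first and then write to `"$(retrospective_path)"`.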
@@ -14,14 +14,24 @@ Use this skill during `P3 Scaffold` before prompting the developer.
  - establish the local verification path and the standardized gate path
  - make prompt-critical baseline behavior real where required
  - keep repo-local `README.md` honest from the start
- - make the selected-stack primary launch command and primary full-test command real from the scaffold stage
+ - make the selected-stack primary runtime command and the universal `./run_tests.sh` broad test command real from the scaffold stage
+
+ For Dockerized web backend/fullstack projects, scaffold must make these commands real and working before scaffold can pass:
+
+ - `docker compose up --build`
+ - `./run_tests.sh`
 
  ## Scaffold and foundation guidance
 
  - create the initial project structure intentionally
  - follow the original prompt and existing repository first; only use the package defaults below when they do not already specify the platform or stack
- - create the selected-stack primary full-test command during scaffold; for web backend/fullstack projects this is usually `./run_tests.sh`, while non-web projects should expose their own single documented full-test command
- - create the selected-stack primary launch command during scaffold; for web backend/fullstack projects this is usually `docker compose up --build`, while non-web projects should expose their own single documented launch command
+ - create `./run_tests.sh` during scaffold for every project as the single broad test entrypoint
+ - for Dockerized web backend/fullstack projects, make `docker compose up --build` real as the primary runtime command during scaffold
+ - when `docker compose up --build` is not the runtime contract, create `./run_app.sh` during scaffold as the single primary runtime wrapper
+ - make `./run_tests.sh` self-sufficient from a clean environment by preparing or installing anything it needs before executing the tests
+ - for Dockerized web backend/fullstack projects, `./run_tests.sh` must execute the broad test path through Docker and should own that Dockerized test flow directly instead of requiring separate manual pre-setup
+ - for non-web or non-Docker projects, `./run_tests.sh` must execute the selected stack's platform-equivalent broad test flow while preserving the same single-command interface
+ - local non-Docker test commands should still be installed and working for normal development iteration
  - create required testing directories and baseline docs structure
  - put baseline config and logging structure in place
  - install and configure the local test tooling needed for ordinary iteration during scaffold rather than deferring local testing setup to later phases
@@ -42,6 +52,7 @@ Use this skill during `P3 Scaffold` before prompting the developer.
  - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
  - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
  - establish README structure early instead of leaving it until the end
+ - ensure `README.md` clearly documents the primary runtime command and the broad `./run_tests.sh` contract for the selected stack
  - prove the scaffold in a clean state before deeper feature work
  - verify clean startup and teardown behavior under the selected stack's runtime contract
  - for Dockerized web projects, verify clean startup and teardown behavior under the chosen project namespace
@@ -66,3 +77,5 @@ Scaffold should make later slices easier, not force them to retrofit missing fun
  - use local and narrow checks while correcting scaffold work
  - reserve one broad owner-run scaffold gate for actual scaffold acceptance
  - do not spend extra broad reruns once the acceptance question is already answered
+ - for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the baseline actually works
+ - after that scaffold confirmation, do not run Docker again during ordinary development work; the next Docker-based run should be at development completion when integrated behavior is checked
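
The runtime and test command contract introduced in this skill can be sketched as a small shell check. This is a hedged illustration only: the function names `primary_runtime_command` and `has_broad_test_entrypoint` are hypothetical, and detecting a Dockerized project by the presence of a Compose file in the project root is an assumption, not something the package specifies.

```shell
#!/bin/sh
# Hypothetical helper: pick the primary runtime command for a project directory.
# Assumption: a project counts as Dockerized when a Compose file with one of
# Docker Compose's default filenames sits in the project root.
primary_runtime_command() {
  dir="${1:-.}"
  for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
    if [ -f "$dir/$f" ]; then
      echo "docker compose up --build"
      return 0
    fi
  done
  # Fallback named in the skill text when Docker is not the runtime contract.
  echo "./run_app.sh"
}

# Hypothetical check: every project should expose an executable ./run_tests.sh.
has_broad_test_entrypoint() {
  [ -x "${1:-.}/run_tests.sh" ]
}
```

A scaffold gate could call both helpers before the single owner-run acceptance pass, so a missing entrypoint is caught before the Docker-based run rather than during it.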
@@ -9,8 +9,7 @@ Use this skill only when intentionally moving from one planned developer session
 
  ## Typical uses
 
- - build session -> stabilization session
- - stabilization session -> remediation session
+ - develop session -> bugfix session
 
  ## Rules