theslopmachine 0.4.2 → 0.4.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -11,7 +11,7 @@ Use this skill during `P0 Intake and Setup` and whenever startup or recovery sta

  - detect whether the run is new or resumed
  - initialize or recover workflow metadata consistently
- - initialize the planned bounded developer-session slots
+ - initialize developer-session tracking for the run
  - recover the current active developer session when one already exists

  ## Usage rules
@@ -55,11 +55,11 @@ Optional startup inputs may include:
  2. create `../.ai/metadata.json` for internal workflow state
  3. initialize parent-root `../metadata.json` with the required schema and store the full prompt text in `prompt`
  4. initialize root workflow state and top-level phase Beads items
- 5. complete clarification using the clarification skill
- 6. wait only for the initial clarification approval before development starts
- 7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
- 8. initialize the bounded developer-session slots
- 9. start the build developer session only after `P2` is ready to begin
+ 5. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
+ 6. complete clarification using the clarification skill
+ 7. wait only for the initial clarification approval before development starts
+ 8. initialize developer-session tracking for the run
+ 9. start the develop developer session only after `P2` is ready to begin
  10. send this exact first planning opener as the first message in that session: `lets plan this <original-prompt>`
  11. wait for the developer's first exchange
  12. send the approved clarification prompt as the second owner message in that same session
@@ -67,9 +67,9 @@ Optional startup inputs may include:

  ## First developer-session handshake

- The first bounded developer session must begin in this exact order:
+ The first developer session of the run must begin in this exact order:

- 1. owner starts the build developer session
+ 1. owner starts the develop developer session
  2. owner sends: `lets plan this <original-prompt>`
  3. developer responds
  4. owner sends the approved clarification prompt
@@ -84,7 +84,7 @@ Do not merge those two messages into one.
  - root workflow state exists
  - `../.ai/metadata.json` exists
  - `../metadata.json` exists
- - planned developer session slots are initialized
+ - developer-session tracking is initialized
  - required parent-root directories exist

  ## Metadata and workflow files
@@ -107,19 +107,26 @@ Track at least:
  - `backend_evaluation_session_id`
  - `frontend_evaluation_session_id`
  - `last_evaluation_session_id`
+ - `backend_evaluation_report_path`
+ - `frontend_evaluation_report_path`
  - `passed_evaluation_tracks`
  - `developer_sessions`
- - `active_developer_session_index`
+ - `active_developer_session_id`
+ - `next_develop_session_number`
+ - `next_bugfix_session_number`
+ - `submission_completed`

- Each planned developer session record should include enough to recover it later, such as:
+ Each developer session record should include enough to recover and export it later, such as:

- - `index`
+ - `session_class`
+ - `sequence`
  - `label`
- - `phase_group`
+ - `created_phase`
  - `session_id`
  - `status`
  - `handoff_in`
  - `handoff_out`
+ - `reopened_after_submission`

  Required project metadata fields in `../metadata.json` when relevant:

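The tracked fields in the hunk above can be pictured as a small Python model of `../.ai/metadata.json`. This is a sketch, not the package's implementation: the `DeveloperSession` class, the `empty_internal_metadata` helper, the placeholder id `ses_example`, and the `"active"` status value are assumptions. Only the field names come from the hunk, and the real file tracks further fields that fall outside it.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DeveloperSession:
    """One entry in `developer_sessions`, using the 0.4.5 field names."""
    session_class: str          # "develop" or "bugfix"
    sequence: int               # 1-based counter within its class
    label: str                  # e.g. "develop-1"
    created_phase: str          # phase in which the session was opened
    session_id: Optional[str] = None
    status: str = "active"      # status vocabulary is an assumption
    handoff_in: Optional[str] = None
    handoff_out: Optional[str] = None
    reopened_after_submission: bool = False


def empty_internal_metadata() -> dict:
    """Hypothetical initial values for the tracked fields shown in this hunk."""
    return {
        "backend_evaluation_session_id": None,
        "frontend_evaluation_session_id": None,
        "last_evaluation_session_id": None,
        "backend_evaluation_report_path": None,
        "frontend_evaluation_report_path": None,
        "passed_evaluation_tracks": [],
        "developer_sessions": [],
        "active_developer_session_id": None,
        "next_develop_session_number": 1,
        "next_bugfix_session_number": 1,
        "submission_completed": False,
    }


# Record the first develop session; "ses_example" is a placeholder id.
meta = empty_internal_metadata()
first = DeveloperSession("develop", 1, "develop-1", "P2", session_id="ses_example")
meta["developer_sessions"].append(asdict(first))
meta["active_developer_session_id"] = first.session_id
meta["next_develop_session_number"] = 2
serialized = json.dumps(meta, indent=2)  # text that would be written to the metadata file
```

Pointing at the active session by `session_id` instead of a list index means the pointer stays meaningful no matter how many sessions are appended later.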
@@ -136,20 +143,29 @@ Required project metadata fields in `../metadata.json` when relevant:
  - fill known values immediately and keep the file current as the project becomes clearer
  - prefer explicit values; use `null` only when a field is genuinely unknown or not applicable

- ## Bounded session model
+ ## Session model

- Track up to three planned developer sessions:
+ - keep exactly one active developer session at a time
+ - record every developer session in `developer_sessions`
+ - classify sessions as `develop` or `bugfix`
+ - every session created before the first successful submission packaging is `develop`
+ - every session created after successful submission packaging to address external evaluation follow-up is `bugfix`
+ - create a new developer session only when:
+ - the user explicitly requests a new session
+ - post-submission external evaluation feedback reopens the project for more fixes

- 1. build
- 2. stabilization
- 3. remediation
+ If the user explicitly requests a new session while one is active:

- Later session slots may remain unused if the workflow never needs them.
+ 1. ask the current developer exactly: `give me a summary of all the work that has been done`
+ 2. treat that reply as the handoff summary
+ 3. start the new developer session with that summary as the handoff-in context
+ 4. keep the session class as `develop` before first successful submission, otherwise keep it as `bugfix`

  ## Initial structure rule

  - parent-root `../docs/` is the owner-maintained external documentation directory
  - parent-root `../sessions/` is the session artifact directory for exported conversation traces
+ - `../docs/questions.md` is a mandatory project artifact produced during clarification and preserved through packaging
  - do not treat repo-local `docs/` as the active external documentation location

  ## Recovery rule
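The new session-model rules in the hunk above (one active session, `develop` before the first successful submission packaging, `bugfix` after it, numbering per class) can be sketched as a single helper over the metadata dict. The function name `open_developer_session`, the `"completed"` status given to the superseded session, and the phase strings are assumptions for illustration; the record fields follow the tracked-metadata list.

```python
def open_developer_session(meta: dict, session_id: str, phase: str) -> dict:
    """Record a new developer session and make it the only active one."""
    # Class rule: develop before the first successful submission packaging,
    # bugfix after it (including user-requested sessions after submission).
    session_class = "bugfix" if meta.get("submission_completed") else "develop"
    counter = f"next_{session_class}_session_number"
    sequence = meta.get(counter, 1)

    # Keep exactly one active developer session at a time.
    for existing in meta.setdefault("developer_sessions", []):
        if existing["status"] == "active":
            existing["status"] = "completed"

    record = {
        "session_class": session_class,
        "sequence": sequence,
        "label": f"{session_class}-{sequence}",
        "created_phase": phase,
        "session_id": session_id,
        "status": "active",
        "handoff_in": None,   # filled from the handoff summary during rollover
        "handoff_out": None,
        "reopened_after_submission": session_class == "bugfix",
    }
    meta["developer_sessions"].append(record)
    meta["active_developer_session_id"] = session_id
    meta[counter] = sequence + 1
    return record


# Illustrative run; session ids and phase labels are placeholders.
meta = {"submission_completed": False, "developer_sessions": []}
open_developer_session(meta, "ses_a", "P2")
open_developer_session(meta, "ses_b", "P4")    # user explicitly asked for a new session
meta["submission_completed"] = True            # packaging succeeded
open_developer_session(meta, "ses_c", "P10")   # external feedback reopened the project
```

The resulting labels are `develop-1`, `develop-2`, and `bugfix-1`, which line up with the `develop-N.json` / `bugfix-N.json` export names used at packaging time.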
@@ -29,7 +29,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
  - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
  - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
- - perform a clean-slate sweep before reporting module completion: remove seeded credentials, weak demo defaults, test-account hints, prototype residue, and other production-inappropriate artifacts
+ - perform a clean-slate sweep before reporting module completion: remove weak demo defaults, stray test-account hints, prototype residue, and other production-inappropriate artifacts; deterministic non-secret Dockerized dev/test default credentials are allowed only when clearly labeled local-only and required for startup or test stability
  - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
  - when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
  - if a required user-facing or admin-facing surface is missing, treat that gap as incomplete implementation rather than a reason to bypass the surface with direct API calls or test-only shortcuts
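The path-confinement rule in this hunk can be sketched with `pathlib`. The helper `confine` and the `PathOutsideRoot` exception are hypothetical names; the sketch assumes Python 3.9+ for `Path.is_relative_to`. Its error message stays generic so it does not echo the rejected path back to the user, in line with the user-safe error rule next to it.

```python
from pathlib import Path


class PathOutsideRoot(ValueError):
    """Raised when a requested file path escapes the allowed root."""


def confine(root: str, user_path: str) -> Path:
    """Resolve `user_path` under `root` and reject anything that escapes it."""
    base = Path(root).resolve()
    # An absolute user_path replaces `base` entirely when joined, and resolve()
    # collapses ".." segments and follows existing symlinks, so the check
    # below sees the real target in both cases.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise PathOutsideRoot("requested path is not allowed")
    return candidate
```

A module's import or export handler would call `confine(allowed_root, requested_name)` before opening anything, and negative tests would feed it `../`-style and absolute paths.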
@@ -54,6 +54,7 @@ Use this skill during `P4 Development` before prompting the developer.
  - for mobile projects, default local UI testing to the selected mobile test stack and use a platform-appropriate mobile UI/E2E tool when device-flow proof matters
  - for desktop projects, default local UI verification to Playwright's Electron support or another platform-appropriate desktop UI/E2E tool when window-flow proof matters
  - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
+ - in each slice reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands

  ## Quality rules

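The new slice-reply rule asks for the exact commands plus their concrete results. One minimal way to capture both is to record each command line, its exit code, and the tail of its output. The `VerificationEvidence` type and both helpers are illustrative only; the workflow does not prescribe a format.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationEvidence:
    command: str       # the exact command line that was run
    exit_code: int
    output_tail: str   # last lines of stdout followed by stderr (not interleaved)


def run_and_record(argv: List[str], tail_lines: int = 20) -> VerificationEvidence:
    """Run one verification command and keep what the owner needs to review it."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return VerificationEvidence(
        command=shlex.join(argv),
        exit_code=proc.returncode,
        output_tail="\n".join(lines[-tail_lines:]),
    )


def format_for_reply(evidence: List[VerificationEvidence]) -> str:
    """Render evidence as plain text for a slice reply."""
    out = []
    for item in evidence:
        status = "PASS" if item.exit_code == 0 else f"FAIL (exit {item.exit_code})"
        out.append(f"$ {item.command} -> {status}")
        if item.output_tail:
            out.append(item.output_tail)
    return "\n".join(out)
```

Because the reply carries the command and its result together, the owner can judge sufficiency from the text instead of rerunning the same suite.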
@@ -36,12 +36,13 @@ These two files are the only evaluation prompt sources for evaluation runs.
  - send that fully composed text block directly to the fresh `General` evaluator session
  - never tell the evaluator to go read prompt files, metadata files, or evaluation template paths on its own
  - never send only a path, filename, or shorthand reference and expect the evaluator to assemble the prompt itself
- - never reuse, resume, or continue a prior evaluation session
+ - do not reuse, resume, or continue a prior evaluation session except for the explicit pass-3 fallback on the last still-failing track
  - run the two evaluations sequentially, not in parallel, so shared runtime state, ports, databases, and artifacts do not conflict
  - track backend and frontend evaluation status separately
  - once backend evaluation passes, do not run backend evaluation again in later remediation rounds
  - once frontend evaluation passes, do not run frontend evaluation again in later remediation rounds
  - require each evaluation session to produce its own detailed evaluation report artifact
+ - record the current accepted backend and frontend evaluation report paths in metadata for later packaging
  - always compare both evaluations against the original prompt for alignment, not just the delivered implementation
  - keep reports file-backed and bring only short summaries into chat
  - rerun only the evaluation track that still needs re-evaluation after remediation
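Several evaluation rules in this hunk (sequential runs, separate per-track status, no re-evaluation of a passed track, and recording report paths for packaging) fit into one loop. In this sketch `evaluate` is a hypothetical hook that starts a fresh evaluator session and returns `(passed, report_path)`. The pass-3 fallback that may resume a prior session is not modeled, and treating the latest report as the accepted one is an assumption.

```python
from typing import Callable, List, Set, Tuple

TRACK_ORDER = ("backend", "frontend")


def run_pending_evaluations(
    meta: dict,
    applicable: Set[str],
    evaluate: Callable[[str], Tuple[bool, str]],
) -> List[str]:
    """Evaluate each applicable, not-yet-passed track once, one after another."""
    passed = meta.setdefault("passed_evaluation_tracks", [])
    ran = []
    for track in TRACK_ORDER:               # plain loop: sequential, never parallel
        if track not in applicable or track in passed:
            continue                        # a passed track is never re-evaluated
        ok, report_path = evaluate(track)
        ran.append(track)
        meta[f"{track}_evaluation_report_path"] = report_path  # latest report for packaging
        if ok:
            passed.append(track)
    return ran
```

After a remediation round, calling the same function again reruns only the track that is still failing.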
@@ -61,7 +62,7 @@ These two files are the only evaluation prompt sources for evaluation runs.

  ## Remediation loop

- - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
+ - route accepted blocking issues back into the active developer session rather than inventing an untracked side path
  - after remediation, rerun strong local verification before any re-evaluation:
  - relevant local test commands
  - local runtime checks when affected behavior needs runtime proof
@@ -23,6 +23,8 @@ Once a failure class is known:
  - run the relevant tests for the changed behavior
  - during in-phase verification, prefer the fastest meaningful local test commands for the known failure class
  - use local verification to prepare for the next owner-run broad gate rather than duplicating it casually
+ - when sending a developer fix request for integrated-verification failures, require the reply to name the exact rerun commands and the concrete results they produced
+ - do not reflexively rerun the same expensive local test or E2E command on the owner side when the developer's reported evidence is already clear and sufficient
  - for applicable UI-bearing work, run the selected stack's platform-appropriate UI/E2E tool for the affected flows in-phase, capture screenshots or equivalent artifacts, and verify the UI behavior and quality directly
  - verify requirement closure, not just feature existence
  - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
@@ -91,6 +91,9 @@ Selected-stack defaults:
  - for Dockerized web backend/fullstack projects, `./run_tests.sh` must run the full test path through Docker rather than a purely local test invocation
  - for non-web or non-Docker projects, `./run_tests.sh` must call the selected stack's equivalent full test path while keeping the same single-command interface
  - local tests should still exist for ordinary developer iteration, but `./run_tests.sh` is the broad final test path for the project
+ - for Dockerized web backend/fullstack projects, plan collision-resistant Compose defaults from the start: unique `COMPOSE_PROJECT_NAME`, no unnecessary `container_name`, only the app-facing port exposed to host by default, and internal services kept off host ports unless required
+ - for Dockerized web backend/fullstack projects, prefer random host-port binding on `127.0.0.1` for the default runtime so parallel projects can start cleanly; if a fixed host port is genuinely required, plan an override plus a free-port fallback in the runtime or test wrapper
+ - when Dockerized dev/test runtime or tests require credentials or bootstrap accounts, plan deterministic non-secret default runtime env values through Compose or wrapper configuration so local startup does not fail on missing credentials, while keeping those defaults out of `.env` files and out of Dockerfile image layers
  - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
  - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
  - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
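For the fixed-host-port case in this hunk, the free-port fallback in a runtime or test wrapper can be as small as a bind probe on `127.0.0.1`. This sketch assumes the wrapper then hands the chosen port to Compose through an environment variable (for example a hypothetical `APP_HOST_PORT` used in a `127.0.0.1:${APP_HOST_PORT}:<container-port>` mapping). The probe is only a point-in-time check, so another process could still claim the port before Docker binds it.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Point-in-time check: can we bind host:port right now?"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def pick_host_port(preferred: Optional[int], host: str = "127.0.0.1") -> int:
    """Keep the preferred (override) port when it is free, else ask the OS for one."""
    if preferred is not None and port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 lets the kernel choose an unused port
        return probe.getsockname()[1]
```

When no fixed port is required, the plan above prefers letting Docker assign the host port directly, which avoids this race entirely.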
@@ -110,7 +113,7 @@ Selected-stack defaults:
  - for each major module, define how it integrates with existing modules and which shared contracts it must follow consistently
  - define verification plans that include cross-module scenarios and seam checks, not just isolated feature checks
  - surface real unresolved risks honestly
- - keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the v2 verification cadence
+ - keep the plan aligned with current policy: owner-managed external docs, no `.env` files, junior-friendly repo-local README, and the current verification cadence

  ## Exit target

@@ -21,7 +21,7 @@ Use this skill only during `P9 Remediation`.
  - rerun the relevant verification after each fix
  - if the issue exposed drift, docs overclaim, or missing acceptance coverage, repair that too before closing the issue
  - update docs if behavior or instructions changed
- - report exactly what was fixed, what was rerun, and what still looks risky if anything remains
+ - report exactly what was fixed, the exact verification commands that were rerun, the concrete results they produced, and what still looks risky if anything remains

  ## Rules

@@ -24,17 +24,19 @@ Use this skill only after `P10 Submission Packaging` has materially and formally

  ## Output location

- Write dated retrospective files under:
+ Write run-scoped retrospective files under:

  - `~/slopmachine/retrospectives/`

  Preferred filenames:

- - `retrospective-YYYY-MM-DD.md`
- - `improvement-actions-YYYY-MM-DD.md`
+ - `retrospective-<run_id>.md`
+ - `improvement-actions-<run_id>.md`

  If only one file is needed, the retrospective file is sufficient.

+ The `run_id` must come from the current project's `../.ai/metadata.json` so the retrospective can be matched back to one exact workflow run.
+
  ## Evidence sources

  Prefer existing workflow artifacts first:
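Deriving the run-scoped retrospective filenames is a direct metadata lookup. This sketch assumes the identifier is stored under a `run_id` key in `../.ai/metadata.json` (the hunk names the value but not the exact key). The helper name `retrospective_paths` is hypothetical.

```python
import json
from pathlib import Path
from typing import Tuple


def retrospective_paths(project_parent: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Build run-scoped retrospective filenames from the run's internal metadata."""
    meta = json.loads((project_parent / ".ai" / "metadata.json").read_text())
    run_id = meta.get("run_id")  # assumed key name
    if not run_id:
        raise RuntimeError("run_id missing from .ai/metadata.json; cannot scope the retrospective")
    return (
        out_dir / f"retrospective-{run_id}.md",
        out_dir / f"improvement-actions-{run_id}.md",
    )
```

From inside the repo, a call might look like `retrospective_paths(Path(".."), Path.home() / "slopmachine" / "retrospectives")`. Failing loudly on a missing id keeps two runs from silently sharing one filename.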
@@ -49,6 +49,13 @@ For Dockerized web backend/fullstack projects, scaffold must make these commands
  - remove prototype residue from runtime foundations: no placeholder titles, hidden setup, fake defaults, or seeded live-path assumptions
  - make prompt-critical runtime behavior visible in the scaffold instead of hand-waving it for later, especially offline, worker, backup, or HTTPS requirements
  - for Dockerized web projects, keep runtime isolation clean in shared environments: use self-contained Compose namespacing, avoid fragile generic project names, and prefer Compose-managed service naming over unnecessary hardcoded `container_name` values
+ - for Dockerized web projects, derive a unique `COMPOSE_PROJECT_NAME` from the repo or worktree identity for runtime wrappers, and use a separate unique test namespace for `./run_tests.sh` so parallel local projects do not collide
+ - for Dockerized web projects, expose only the primary app-facing port to the host by default, keep databases/cache/internal services off host ports unless the prompt truly requires exposure, and bind exposed ports to `127.0.0.1`
+ - for Dockerized web projects, prefer Docker-assigned random host ports for the default host binding so plain `docker compose up --build` can run without host-port collisions; if the prompt requires a fixed host port, support an overrideable host-port variable and make the runtime or test wrapper fall back to a free port automatically when needed
+ - for Dockerized web projects, keep image, network, and volume naming under Compose project scoping; if explicit image names are needed, namespace them with the Compose project name instead of using generic shared names
+ - for Dockerized web projects, add healthchecks and make runtime or test wrappers wait for service readiness before proceeding so startup is reliable on slower machines
+ - when Dockerized dev/test startup or tests require credentials or bootstrap accounts, provide deterministic non-secret default values through Compose/runtime environment configuration so `docker compose up --build` and `./run_tests.sh` do not fail due to missing credentials
+ - keep those Dockerized dev/test default credentials clearly marked as local-only test credentials, do not store them in `.env` files, and do not bake them into the Dockerfile image layers
  - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
  - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
  - establish README structure early instead of leaving it until the end
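One way to derive the unique `COMPOSE_PROJECT_NAME` and the separate test namespace from the worktree identity is a readable slug plus a short hash of the absolute path. The function name and the `-test` suffix are assumptions. The character rules in the docstring follow Docker Compose's documented constraints on project names.

```python
import hashlib
import re
from pathlib import Path
from typing import Tuple


def compose_project_names(worktree: str) -> Tuple[str, str]:
    """Return (runtime_name, test_name) Compose project names for a worktree.

    Compose project names may only contain lowercase letters, digits, dashes,
    and underscores, and must start with a lowercase letter or a digit.
    """
    path = Path(worktree).resolve()
    # Readable prefix: the directory name squeezed into the allowed charset.
    slug = re.sub(r"[^a-z0-9_-]+", "-", path.name.lower()).strip("-_") or "app"
    # Short hash of the absolute path keeps same-named checkouts apart.
    digest = hashlib.sha256(str(path).encode("utf-8")).hexdigest()[:8]
    runtime_name = f"{slug}-{digest}"
    return runtime_name, f"{runtime_name}-test"
```

A runtime wrapper could export the first value as `COMPOSE_PROJECT_NAME`, while `./run_tests.sh` passes the second through `docker compose -p <name>`, so the test stack never shares containers, networks, or volumes with the dev stack.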
@@ -1,23 +1,36 @@
  ---
  name: session-rollover
- description: Planned developer-session handoff and rollover rules for bounded slopmachine developer sessions.
+ description: Developer-session handoff and rollover rules for intentional slopmachine session switches.
  ---

  # Session Rollover

- Use this skill only when intentionally moving from one planned developer session slot to the next.
+ Use this skill only when intentionally starting a new developer session while preserving the old one as history.

  ## Typical uses

- - build session -> stabilization session
- - stabilization session -> remediation session
+ - the user explicitly asks for a new developer session
+ - post-submission external evaluation feedback reopens the project for more fixes

  ## Rules

- - rollover is planned, not a recovery event
+ - rollover is intentional, not a recovery event
+ - keep exactly one active developer session at a time
  - do not open the next developer session until the current session has a clear handoff out
  - record the new session id and status immediately after the new session is created

+ If the user explicitly requests a new session while one is active:
+
+ 1. ask the current developer exactly: `give me a summary of all the work that has been done`
+ 2. store that reply as the handoff artifact for the closing session
+ 3. start the new developer session with that summary as the handoff-in context
+
+ If the project is reopened after successful submission because of external evaluation feedback:
+
+ 1. record the external issue list and any accepted evaluation report references in the handoff
+ 2. start a new `bugfix` developer session
+ 3. use that external feedback plus the handoff summary as the starting context
+
  ## Required handoff contents

  - current phase and why rollover is happening
@@ -33,7 +46,8 @@ When rollover succeeds, update metadata so it is obvious:
  - which prior session is now completed or inactive
  - which new session is active
  - where the handoff artifact lives
- - which phase group the new session now owns
+ - which session class the new session belongs to
+ - whether the new session was opened after successful submission

  ## Avoid

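The rollover rules and the metadata checklist in the two hunks above can be combined into one guarded update. This sketch assumes the closing developer's reply to the exact handoff question has already been saved to `handoff_file`, and that the caller has built `new_record` with its class, sequence, label, phase, and session id. The function name `roll_over` and the `"completed"` status are illustrative.

```python
from pathlib import Path

HANDOFF_QUESTION = "give me a summary of all the work that has been done"


def roll_over(meta: dict, handoff_file: Path, new_record: dict) -> None:
    """Close the single active developer session and activate `new_record`.

    `new_record` must already carry session_class, sequence, label,
    created_phase, and session_id.
    """
    # No new session until the current one has a clear handoff out.
    if not handoff_file.is_file() or not handoff_file.read_text().strip():
        raise RuntimeError(f"no handoff yet; ask the current developer: {HANDOFF_QUESTION!r}")

    actives = [s for s in meta["developer_sessions"] if s["status"] == "active"]
    if len(actives) != 1 or actives[0]["session_id"] != meta["active_developer_session_id"]:
        raise RuntimeError("metadata does not show exactly one active developer session")

    closing = actives[0]
    closing["status"] = "completed"
    closing["handoff_out"] = str(handoff_file)

    # One handoff file links both sides of the switch.
    new_record.update(
        status="active",
        handoff_in=str(handoff_file),
        handoff_out=None,
        reopened_after_submission=bool(meta.get("submission_completed")),
    )
    meta["developer_sessions"].append(new_record)
    meta["active_developer_session_id"] = new_record["session_id"]
```

Both guards run before any field is written, so a refused rollover leaves the metadata exactly as it was.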
@@ -73,14 +73,16 @@ The final submission layout in the parent project root must be:
  - `questions.md`
  - `submission/`
  - generated submission documents based on the reference files in `~/`
- - evaluation reports
+ - `backend-evaluation-report.md` when applicable
+ - `frontend-evaluation-report.md` when applicable
+ - cleaned original session exports for every tracked developer session
  - repo file structure screenshot
  - working app screenshots
  - relocated screenshots and proof materials needed for submission review
  - current working directory delivered as parent-root `repo/`
  - `../sessions/`
- - `develop-N.json`
- - `bugfix-N.json`
+ - converted `develop-N.json` session exports for all pre-submission developer sessions
+ - converted `bugfix-N.json` session exports for all post-submission external-follow-up developer sessions
  - `../metadata.json`
  - parent-root `../.tmp/` directory moved out of current `.tmp/` when it exists

@@ -91,7 +93,7 @@ The final submission layout in the parent project root must be:
  - verify parent-root `../docs/design.md` exists and reflects the final delivered design when applicable
  - verify parent-root `../docs/api-spec.md` exists and reflects the final delivered interfaces when applicable
  - verify parent-root `../docs/test-coverage.md` exists and reflects the final delivered verification coverage
- - verify parent-root `../docs/questions.md` exists from the accepted clarification/question record when applicable
+ - verify parent-root `../docs/questions.md` exists from the accepted clarification/question record
  - create parent-root `../submission/` for final generated submission artifacts and reviewer-facing proof
  - use these exact `~/slopmachine/` document templates as the packaging reference sources:
  - `~/slopmachine/document-completeness.md`
@@ -100,11 +102,13 @@ The final submission layout in the parent project root must be:
  - `~/slopmachine/quality-document.md`
  - ensure `README.md` matches the delivered codebase, functionality, runtime steps, and test steps, stays friendly to a junior developer, and does not reference the external docs set in `../docs/`
  - include `./run_tests.sh` and any supporting runner logic it needs to execute the project's broad test path from a clean environment
- - relocate evaluation artifacts into parent-root `../submission/`
+ - relocate accepted backend and frontend evaluation reports into parent-root `../submission/` as `backend-evaluation-report.md` and `frontend-evaluation-report.md` when those tracks apply
  - relocate screenshots and proof materials relevant to runtime behavior and major flows into parent-root `../submission/`
  - preserve parent-root `../sessions/` as the session artifact directory for converted workflow session exports
  - export all tracked workflow sessions before generating the final submission documents
- - after the session exports are complete, use the last evaluation session recorded in metadata when generating the final submission report content so the report answers can come from cached evaluation context instead of rebuilding that context from scratch
+ - preserve all cleaned original session exports under parent-root `../submission/`
+ - when packaging succeeds, mark `submission_completed` as true in internal metadata so later reopen handling can classify post-submission sessions correctly
+ - after the session exports are complete, the owner should answer the final submission-document questions based on the recent review responses gathered across evaluation reports, remediation results, verification notes, and packaging checks

  ## Session export sequence

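The relocation rule for accepted evaluation reports can read the paths recorded in internal metadata and move each file into `../submission/` under its fixed name. The helper below is a sketch: it assumes the recorded paths are absolute or relative to the current working directory, and it rewrites the metadata entry so it no longer points at the old location. The names `CANONICAL_NAMES` and `relocate_evaluation_reports` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

CANONICAL_NAMES = {
    "backend": "backend-evaluation-report.md",
    "frontend": "frontend-evaluation-report.md",
}


def relocate_evaluation_reports(meta: dict, tracks: List[str], submission_dir: Path) -> List[Path]:
    """Move each applicable track's accepted report into submission/ under its fixed name."""
    submission_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for track in tracks:
        key = f"{track}_evaluation_report_path"
        source = meta.get(key)
        if not source or not Path(source).is_file():
            # Packaging should stop here rather than ship without the report.
            raise FileNotFoundError(f"accepted {track} evaluation report is missing")
        target = submission_dir / CANONICAL_NAMES[track]
        shutil.move(str(source), str(target))
        meta[key] = str(target)  # keep metadata pointing at the relocated file
        moved.append(target)
    return moved
```

Passing only the tracks that apply (for example `["backend"]` for a backend-only project) matches the "when those tracks apply" wording.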
@@ -112,22 +116,34 @@ This export sequence must happen first in packaging, before final submission doc

  Export every tracked workflow session from metadata.

+ Use a class-and-sequence label for each tracked session, for example:
+
+ - `develop-1`
+ - `develop-2`
+ - `bugfix-1`
+
  For each tracked session:

  1. `opencode export <session-id> > ../session-export-<label>.json`
  2. `python3 ~/utils/strip_session_parent.py ../session-export-<label>.json --output ../session-clean-<label>.json`
  3. `python3 ~/utils/convert_ai_session.py -i ../session-clean-<label>.json -o ../sessions/<final-name>.json`
+ 4. copy `../session-clean-<label>.json` to `../submission/session-clean-<label>.json`

  Naming rule for converted files under `../sessions/`:

- - development-phase sessions become `develop-N.json`
- - hardening or remediation sessions become `bugfix-N.json`
+ - every pre-submission developer session becomes `develop-N.json`
+ - every post-submission external-follow-up developer session becomes `bugfix-N.json`
+
+ Naming rule for cleaned original session exports under `../submission/`:
+
+ - `session-clean-develop-N.json`
+ - `session-clean-bugfix-N.json`

  After those steps:

- - verify every planned developer session has been exported and converted before continuing packaging
+ - verify every tracked developer session has been exported, cleaned, converted, and copied into `../submission/` before continuing packaging
  - keep the converted session outputs in `../sessions/` using the naming rules above
- - treat the `../session-export-*.json` and `../session-clean-*.json` files as temporary packaging intermediates unless the package contract later says otherwise
+ - treat only the raw `../session-export-*.json` files as temporary packaging intermediates unless the package contract later says otherwise
  - if the required utilities, metadata session ids, or output files are missing, packaging is not ready to continue
  - only after these exports are complete may you generate the final submission documents

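The per-session export steps can be generated from metadata as a dry-run plan before anything is executed. This sketch only builds the documented command strings, rendering the new copy step as `cp`. It assumes `<final-name>` equals the class-and-sequence label and that session ids contain no shell metacharacters. The function name `export_commands` is hypothetical.

```python
from typing import List


def export_commands(developer_sessions: List[dict]) -> List[str]:
    """Plan the export, clean, convert, and copy steps for every tracked session.

    Returns shell command strings (a dry run); nothing is executed here.
    """
    commands = []
    # develop sessions first, then bugfix, each in sequence order
    ordered = sorted(developer_sessions, key=lambda r: (r["session_class"] != "develop", r["sequence"]))
    for record in ordered:
        label = f"{record['session_class']}-{record['sequence']}"
        sid = record.get("session_id")
        if not sid:
            raise ValueError(f"{label} has no session_id; packaging is not ready to continue")
        commands += [
            f"opencode export {sid} > ../session-export-{label}.json",
            f"python3 ~/utils/strip_session_parent.py ../session-export-{label}.json --output ../session-clean-{label}.json",
            f"python3 ~/utils/convert_ai_session.py -i ../session-clean-{label}.json -o ../sessions/{label}.json",
            f"cp ../session-clean-{label}.json ../submission/session-clean-{label}.json",
        ]
    return commands
```

Reviewing the printed plan first makes a missing session id visible before any partial export lands in `../sessions/`.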
@@ -135,19 +151,19 @@ After those steps:

  After all session exports are complete:

- 1. recover the last evaluation session id from metadata
- 2. use that last evaluation session to answer the final submission-document questions from its cached context
- 3. generate the required final submission documents from that evaluation context plus the canonical `~/slopmachine/` reference files
+ 1. gather the recent review responses from evaluation reports, accepted triage notes, remediation outcomes, verification notes, and packaging checks
+ 2. answer the final submission-document questions from that gathered review evidence
+ 3. generate the required final submission documents from that gathered review evidence plus the canonical `~/slopmachine/` reference files

  Do not start generating the final submission documents before the session exports are complete.
- Do not create a new evaluation session for final report generation if the last evaluation session is still available.
- If the last evaluation session id is missing or unusable, stop and repair metadata/session recovery before continuing packaging.
+ Do not rely on one single cached evaluation session as the only source for final submission-document answers.
+ If the gathered review evidence is incomplete, stop and repair the missing evidence before continuing packaging.

  ## Required file moves

  - if repo-local `docs/` exists, treat it as accidental residue, reconcile any missing content into parent-root `../docs/`, and remove it from the delivered `repo/` tree
  - if current `.tmp/` exists, move the whole directory to parent-root `../.tmp/` before harvesting any reviewer-facing artifacts from it
- - after preserving parent-root `../.tmp/`, collect the relevant evaluation reports and proof artifacts from it into parent-root `../submission/`
+ - after preserving parent-root `../.tmp/`, collect the accepted backend and frontend evaluation reports plus other relevant proof artifacts into parent-root `../submission/`
  - collect screenshots and other required proof materials from repo-local runtime/output directories into parent-root `../submission/`
  - after relocation, the final submission should not require digging through repo-local output directories to find evidence
  - keep screenshot filenames clear enough that the referenced runtime page, flow, or evidence purpose is understandable
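The new packaging gate (exports first, then complete review evidence before any submission document is written) can be expressed as a precondition check. The category keys mirror the five evidence sources named in step 1 of this hunk; representing each category as a list of file paths, and treating every category as mandatory, are assumptions of this sketch.

```python
from typing import Dict, List

REQUIRED_EVIDENCE = (
    "evaluation_reports",
    "accepted_triage_notes",
    "remediation_outcomes",
    "verification_notes",
    "packaging_checks",
)


class PackagingNotReady(RuntimeError):
    """Raised when final submission documents must not be generated yet."""


def check_document_inputs(evidence: Dict[str, List[str]], exports_complete: bool) -> None:
    """Raise unless exports are done and every evidence category is non-empty."""
    if not exports_complete:
        raise PackagingNotReady("finish the session export sequence first")
    missing = [name for name in REQUIRED_EVIDENCE if not evidence.get(name)]
    if missing:
        raise PackagingNotReady("repair missing review evidence: " + ", ".join(missing))
```

Naming the missing categories in the error tells the owner exactly which evidence to repair before packaging resumes.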
@@ -166,13 +182,15 @@ If the last evaluation session id is missing or unusable, stop and repair metada
  - confirm docs describe delivered behavior, not planned or aspirational behavior
  - confirm parent-root `../docs/test-coverage.md` explains the tested flows, coverage boundaries, and how the evaluator should interpret the coverage evidence
  - confirm generated submission documents exist under parent-root `../submission/` and correspond to the final qualified state
- - confirm evaluation reports and screenshots have been relocated into parent-root `../submission/`
+ - confirm accepted backend and frontend evaluation reports have been relocated into parent-root `../submission/` when those tracks apply
+ - confirm cleaned original session exports for every tracked developer session exist under parent-root `../submission/`
  - confirm shared project docs live in parent-root `../docs/` and any accidental repo-local `docs/` copy has been removed from the delivered tree
  - confirm required screenshots have been relocated into parent-root `../submission/`
  - confirm parent-root metadata fields are populated correctly
+ - confirm internal metadata marks `submission_completed` as true for a successful packaged state
  - confirm session export naming rules are followed under `../sessions/`:
- - `develop-N.json` for development-phase sessions
- - `bugfix-N.json` for hardening/remediation sessions
+ - `develop-N.json` for every pre-submission developer session
+ - `bugfix-N.json` for every post-submission external-follow-up developer session

  ## Submission artifact and response contract

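Several of the final confirmations in this hunk are mechanical and could be scripted against the parent project root. This sketch checks `submission_completed`, the `develop-N.json` / `bugfix-N.json` naming under `../sessions/`, the converted and cleaned export for every tracked session, and the per-track evaluation reports. Treating a track as applicable when its `<track>_evaluation_session_id` is set is an assumption. The docs and screenshot confirmations still need human review.

```python
import json
import re
from pathlib import Path
from typing import List

SESSION_FILE = re.compile(r"(develop|bugfix)-[1-9][0-9]*\.json")


def packaging_problems(parent: Path) -> List[str]:
    """Return problems found under the parent project root; [] means these checks pass."""
    problems = []
    meta = json.loads((parent / ".ai" / "metadata.json").read_text())

    if meta.get("submission_completed") is not True:
        problems.append("internal metadata does not mark submission_completed as true")

    # Converted exports must follow the develop-N / bugfix-N naming rule.
    for path in sorted((parent / "sessions").glob("*.json")):
        if not SESSION_FILE.fullmatch(path.name):
            problems.append(f"unexpected session export name: sessions/{path.name}")

    # Every tracked developer session needs a converted and a cleaned export.
    for record in meta.get("developer_sessions", []):
        label = record["label"]
        if not (parent / "sessions" / f"{label}.json").is_file():
            problems.append(f"missing sessions/{label}.json")
        if not (parent / "submission" / f"session-clean-{label}.json").is_file():
            problems.append(f"missing submission/session-clean-{label}.json")

    # Assumption: a track applies when an evaluation session id was recorded for it.
    for track in ("backend", "frontend"):
        report = parent / "submission" / f"{track}-evaluation-report.md"
        if meta.get(f"{track}_evaluation_session_id") and not report.is_file():
            problems.append(f"missing submission/{track}-evaluation-report.md")

    return problems
```

Run from inside the repo as `packaging_problems(Path(".."))`; an empty list covers only these mechanical checks, not the full confirmation list.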
@@ -42,7 +42,8 @@ Use this skill after development begins whenever you are reviewing work, decidin
42
42
  - do not accept frontend/backend drift in fullstack work
43
43
  - do not accept missing end-to-end coverage for major fullstack flows
44
44
  - do not accept UI claims without screenshot-backed or platform-equivalent visual evidence when the change affects real UI behavior
45
- - do not accept prototype residue such as seeded credentials, weak demo defaults, login hints, or unsanitized user-facing error behavior
45
+ - do not accept production-inappropriate residue such as weak demo defaults, stray login hints, or unsanitized user-facing error behavior
46
+ - deterministic non-secret Dockerized dev/test default credentials are acceptable only when they are clearly labeled local-only, supplied through runtime configuration rather than `.env` files or Dockerfile image layers, and required so local startup or tests do not fail on missing credentials
46
47
  - do not accept multi-tenant or cross-user security claims without negative isolation evidence when that boundary matters
47
48
  - do not accept file-bearing flows without path confinement and traversal-style validation when that boundary matters
48
49
  - do not accept partial foundation work for complex features when the prompt implies broader usable scope, infrastructure depth, or security depth than what was actually delivered
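A file scan can partly screen for violations of the credential rule. This is a heuristic Python sketch, not a description of how the workflow enforces the rule. It flags `.env`-style files, plus Dockerfile `ENV`/`ARG` lines whose text mentions a secret-like name. Several parts are assumptions: the keyword list, the `scan_tree` and `dockerfile_findings` helpers, and the treatment of `ARG` as part of the image layers.

```python
import re
from pathlib import Path

# Hypothetical heuristic scan for the two places the acceptance rule says
# local-only default credentials must NOT live: `.env` files and Dockerfile
# image layers (ENV/ARG instructions). Runtime configuration such as Compose
# `environment:` entries is intentionally not flagged.
SECRET_NAME = re.compile(r"PASSWORD|PASSWD|SECRET|TOKEN|API_KEY", re.IGNORECASE)
ENV_OR_ARG = re.compile(r"^\s*(ENV|ARG)\s+(.*)$", re.IGNORECASE)


def dockerfile_findings(text):
    """Return (line_number, line) pairs for ENV/ARG lines naming a secret-like key."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = ENV_OR_ARG.match(line)
        # Searches the whole rest of the instruction, so multi-pair ENV lines are covered.
        if match and SECRET_NAME.search(match.group(2)):
            findings.append((number, line.strip()))
    return findings


def scan_tree(root="."):
    """Collect `.env`-style files and suspicious Dockerfile lines under `root`."""
    env_files, docker_hits = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.name == ".env" or path.name.startswith(".env."):
            env_files.append(str(path))
        elif path.name == "Dockerfile" or path.name.endswith(".Dockerfile"):
            text = path.read_text(errors="replace")
            for number, line in dockerfile_findings(text):
                docker_hits.append(f"{path}:{number}: {line}")
    return env_files, docker_hits


if __name__ == "__main__":
    sample = "FROM python:3.12-slim\nENV DB_PASSWORD=devpass\nENV PORT=8000\n"
    print(dockerfile_findings(sample))  # [(2, 'ENV DB_PASSWORD=devpass')]
```

Expect false positives, such as a `.env.example` template or a value that merely contains `token`. Expect false negatives too, such as backslash-continued instructions or unusual key names. Treat hits as prompts for review rather than verdicts.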
@@ -55,6 +56,8 @@ Use this skill after development begins whenever you are reviewing work, decidin
55
56
  - use targeted local verification as the default during scaffold corrections, development, hardening, and remediation
56
57
  - reserve the selected stack's broad verification path for the limited owner-run gate moments in the workflow budget
57
58
  - do not turn ordinary acceptance into repeated integrated-style gate runs
59
+ - do not run `./run_tests.sh` casually on the owner side
60
+ - do not run `docker compose up --build` casually on the owner side
58
61
  - for Dockerized web backend/fullstack projects, the owner must run `docker compose up --build` and `./run_tests.sh` once after scaffold completion to confirm the scaffold baseline
59
62
  - after that scaffold confirmation, the next Docker-based run should be at development completion or integrated-verification entry unless a real blocker forces earlier escalation
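A small ledger can track the Docker cadence above, together with the workflow target of at most 3 broad owner-run verification moments. Below is a minimal Python sketch. Several parts are assumptions made for illustration: `BroadGateLedger`, the moment labels (`scaffold`, `development_completion`, `integrated_verification`), and the choice to warn rather than block.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Workflow target: at most 3 broad owner-run verification moments per cycle.
BROAD_GATE_TARGET = 3
# Assumed labels for the Docker run that follows the scaffold confirmation.
EXPECTED_SECOND_RUN = ("development_completion", "integrated_verification")


@dataclass
class BroadGateLedger:
    """Hypothetical record of owner-run broad gates and the reasons for them."""
    dockerized_web: bool = True
    moments: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, label: str, reason: str = "") -> List[str]:
        """Record one broad gate and return any cadence warnings it triggers."""
        warnings = []
        if self.dockerized_web and not self.moments and label != "scaffold":
            warnings.append("first Docker run should confirm the scaffold baseline")
        if (
            self.dockerized_web
            and len(self.moments) == 1
            and label not in EXPECTED_SECOND_RUN
            and not reason
        ):
            warnings.append("early Docker run needs a stated blocker")
        self.moments.append((label, reason))
        if len(self.moments) > BROAD_GATE_TARGET:
            warnings.append(
                f"{len(self.moments)} broad gates exceed the target of {BROAD_GATE_TARGET}"
            )
        return warnings


if __name__ == "__main__":
    ledger = BroadGateLedger()
    print(ledger.record("scaffold"))                # []
    print(ledger.record("development_completion"))  # []
```

Returning warnings rather than raising matches the policy wording. The 3-moment figure is a target, and an earlier Docker run is allowed when a real blocker forces escalation; the `reason` argument stands in for that blocker.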
60
63
 
@@ -64,10 +67,12 @@ Use this skill after development begins whenever you are reviewing work, decidin
64
67
  - review technical quality, prompt alignment, architecture impact, and verification depth of the current work
65
68
  - during normal implementation iteration, always prefer fast local language-native or framework-native verification for the changed area instead of the selected stack's broad gate path
66
69
  - require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
70
+ - require the developer to report the exact verification commands that were run and the concrete results they produced
67
71
  - require local runtime proof when relevant by starting the app or service through the selected stack's local run path and exercising the changed behavior directly rather than jumping to the broad gate path
68
72
  - if the local toolchain is missing, require the developer to install or enable it first; do not jump to the broad gate path during ordinary iteration just because local setup is inconvenient
69
73
  - do not accept hand-wavy claims that local verification is unavailable without a real setup attempt and clear explanation
70
74
  - for applicable UI-bearing work, require the selected stack's local UI/E2E tool on affected flows plus screenshot review or equivalent platform artifacts and explicit UI validation
75
+ - if the developer already ran the relevant targeted test or E2E command and reported it clearly, do not rerun the same command on the owner side unless the evidence is weak, contradictory, flaky, high-risk, or needed to answer a new question
71
76
  - if verification is weak, missing, or failing, require fixes and reruns before acceptance
72
77
  - if documentation or repo hygiene drifts, secrets leak, contracts drift, or frontend integrity is compromised, require cleanup before acceptance
73
78
  - keep looping until the current work is genuinely acceptable
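The reporting rule and the no-rerun rule fit together. The developer reports the exact command and its concrete result, and the owner reruns only for a stated reason. Below is a minimal Python sketch under those assumptions. `VerificationRecord`, `run_and_record`, `owner_action`, and the trigger labels are hypothetical names. The captured output keeps stdout followed by stderr, not their original interleaving.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerificationRecord:
    """One verification command exactly as run, plus its concrete result."""
    command: list
    returncode: int
    output_tail: str


def run_and_record(command, tail_lines=20):
    """Run `command` and keep its exit code and the last lines of output."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # stdout followed by stderr; original interleaving is not preserved.
    lines = (completed.stdout + completed.stderr).splitlines()
    return VerificationRecord(list(command), completed.returncode, "\n".join(lines[-tail_lines:]))


# Conditions under which the policy allows an owner-side rerun of reported evidence.
RERUN_TRIGGERS = {"weak", "contradictory", "flaky", "high_risk", "new_question"}


def owner_action(record: Optional[VerificationRecord], concerns=()):
    """Decide what the owner does with a developer-reported verification result."""
    if record is None or record.returncode != 0:
        return "require developer fix and rerun"  # missing or failing evidence
    if RERUN_TRIGGERS.intersection(concerns):
        return "owner rerun justified"
    return "accept reported evidence"


if __name__ == "__main__":
    record = run_and_record([sys.executable, "-c", "print('2 passed')"])
    print(record.returncode, record.output_tail)  # 0 2 passed
    print(owner_action(record))                   # accept reported evidence
```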
@@ -77,7 +82,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
77
82
  - a broad gate is an owner-run integrated verification boundary, not every ordinary phase change
78
83
  - a phase change alone does not automatically require a broad gate unless that phase exit explicitly calls for one
79
84
  - a broad gate normally means some combination of full clean runtime proof, `./run_tests.sh`, and platform-appropriate UI/E2E evidence when UI-bearing flows exist
80
- - in v2, the workflow target is at most 3 broad owner-run verification moments across the whole cycle
85
+ - the workflow target is at most 3 broad owner-run verification moments across the whole cycle
81
86
  - ordinary planning, ordinary slice acceptance, and routine in-phase verification are not broad gates by default and should rely on targeted local verification unless the risk profile says otherwise
82
87
 
83
88
  For Dockerized web backend/fullstack projects, the default Docker cadence is:
@@ -101,6 +106,8 @@ Use evidence such as internal metadata files, structured Beads comments, verific
101
106
  - when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
102
107
  - for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
103
108
  - scaffold acceptance also requires clean startup and teardown behavior in the selected runtime model; for Dockerized web projects this includes self-contained Compose namespacing and no unnecessary fragile `container_name` usage
109
+ - for Dockerized web projects, scaffold acceptance also requires collision-resistant shared-machine defaults: only the primary app-facing port exposed to host by default, internal services not bound to host without prompt need, default host binding on `127.0.0.1`, and either random host-port assignment or a real free-port fallback when fixed ports are required
110
+ - for Dockerized web projects, scaffold acceptance also requires deterministic non-secret local-only default credentials in runtime configuration when startup or tests depend on credentials, and those defaults must not live in `.env` files or Dockerfile image layers
104
111
  - for Dockerized web backend/fullstack projects, scaffold acceptance is not complete until the owner has actually run `docker compose up --build` and `./run_tests.sh` once successfully after scaffold completion
105
112
  - module implementation requires platform-appropriate local verification and selected-stack UI/E2E evidence when UI-bearing flows are material
106
113
  - module implementation acceptance should challenge tenant isolation, path confinement, sanitized error behavior, prototype residue, integration seams, and cross-cutting consistency when those concerns are in scope
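Under the shared-machine port defaults, a fixed-port project needs a real free-port fallback rather than a hard failure. Below is a minimal Python sketch, assuming the launcher picks the host port before starting Compose. It tries the preferred port on `127.0.0.1` and otherwise binds port `0` so the OS assigns a free one. `pick_host_port` and the `APP_HOST_PORT` variable are hypothetical.

```python
import socket


def pick_host_port(preferred=None, host="127.0.0.1"):
    """Return a currently free TCP port on `host`.

    Tries `preferred` first when given; otherwise, or when it is taken,
    binds to port 0 so the OS assigns a free port.
    """
    candidates = ([preferred] if preferred else []) + [0]
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # preferred port already in use; fall back
            return sock.getsockname()[1]
    raise RuntimeError(f"could not bind any port on {host}")


if __name__ == "__main__":
    # Hypothetical usage: export the result for Compose interpolation, e.g.
    #   ports: ["127.0.0.1:${APP_HOST_PORT}:8080"]
    print(f"APP_HOST_PORT={pick_host_port(preferred=8080)}")
```

The probe is inherently racy, because another process can take the port between this check and Compose binding it. A launcher built on it should still report a clear error if startup fails on a port conflict.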
@@ -1,45 +1,59 @@
1
- ## 5.1 Document Completeness
1
+ ## Delivery Completeness Template
2
+
3
+ Use this file as the structure reference for the delivery-completeness and hard-threshold submission outputs.
4
+
5
+ Do not copy sample-project content into the final document.
6
+ Replace every placeholder with evidence from the current delivered project.
7
+
8
+ ## Document completeness
2
9
 
3
10
  | Document Type | File Path | Completeness | Description |
4
11
  | :---- | :---- | :---- | :---- |
5
- | **User Instructions** | README.md | Complete | Includes functional features, installation steps, and usage guides |
6
- | **Testing Instructions** | TESTING.md | Complete | Test environment isolation and execution methods |
7
- | **Data Persistence Instructions** | DATA\_PERSISTENCE.md | Complete | Docker volume mounting and data protection |
12
+ | **User Instructions** | `<path>` | `<Complete/Partial/Missing>` | `<what is present and whether it is enough>` |
13
+ | **Testing Instructions** | `<path>` | `<Complete/Partial/Missing>` | `<how local and broad test paths are documented>` |
14
+ | **Runtime/Deployment Instructions** | `<path>` | `<Complete/Partial/Missing>` | `<how runtime startup is documented>` |
15
+ | **Other Required Project Docs** | `<path>` | `<Complete/Partial/Missing>` | `<other prompt-required or delivery-required docs>` |
8
16
 
9
- 5.2 Code Completeness
17
+ ## Code completeness
10
18
 
11
- | Module | Implementation Status | Description |
19
+ | Module / Area | Implementation Status | Description |
12
20
  | :---- | :---- | :---- |
13
- | **Configuration Management** | Complete | complaint\_system/settings.py |
14
- | **URL Routing** | Complete | complaint\_system/urls.py, main/urls.py |
15
- | **Data Models** | Complete | main/models.py (FoodVote, Suggestion) |
16
- | **View Functions** | Complete | main/views.py (5 view functions) |
17
- | **Backend Management** | Complete | main/admin.py |
18
- | **Template Files** | Complete | main/templates/ (4 templates) |
19
- | **Test Suite** | ✅ Complete | tests/ (41 test cases) |
20
- | **Dependency Config** | ✅ Complete | requirements.txt |
21
- | **Docker Config** | ✅ Complete | Dockerfile, docker-compose.yml |
22
- | **Startup Script** | Complete | docker-entrypoint.sh |
23
- | **Test Data** | ✅ Complete | add\_test\_data.py |
24
-
25
- 5.3 Deployment Completeness
26
-
27
- | Deployment Method | Implementation Status | Description |
21
+ | **Core runtime** | `<Complete/Partial/Missing>` | `<what exists>` |
22
+ | **Primary user-facing flows** | `<Complete/Partial/Missing>` | `<what exists>` |
23
+ | **Admin/operator flows** | `<Complete/Partial/Missing/Not Applicable>` | `<what exists>` |
24
+ | **Persistence / state / data model** | `<Complete/Partial/Missing/Not Applicable>` | `<what exists>` |
25
+ | **Tests** | `<Complete/Partial/Missing>` | `<what exists>` |
26
+ | **Build / packaging / runtime config** | `<Complete/Partial/Missing>` | `<what exists>` |
27
+
28
+ ## Runtime and deployment completeness
29
+
30
+ | Runtime Method | Implementation Status | Description |
28
31
  | :---- | :---- | :---- |
29
- | **Local Execution** | Complete | requirements.txt \+ python manage.py runserver |
30
- | **Docker Deployment** | Complete | docker compose up one-click startup |
31
- | **Data Persistence** | Complete | Volume mounting for media\_data and db\_data |
32
- | **Auto-Initialization** | Complete | Automatic database migration, admin creation, and test data addition |
32
+ | **Primary runtime command** | `<Complete/Partial/Missing>` | `<docker compose up --build or ./run_app.sh, etc.>` |
33
+ | **Broad test command** | `<Complete/Partial/Missing>` | `<./run_tests.sh behavior>` |
34
+ | **Local development verification** | `<Complete/Partial/Missing>` | `<local tests/tools present for normal iteration>` |
35
+ | **Persistence / initialization / automation** | `<Complete/Partial/Missing/Not Applicable>` | `<what happens automatically>` |
33
36
 
34
- 5.4 Delivery Completeness Rating
37
+ ## Delivery completeness rating
35
38
 
36
- **Rating: 10/10**
39
+ **Rating:** `<score or qualitative result>`
37
40
 
38
41
  **Strengths:**
39
42
 
40
- * **Complete Documentation**: README and other documents are comprehensive
41
- * **Complete Code**: All functional modules are implemented with nothing missing
42
- * **Runnability**: Supports both local execution and Docker one-click deployment
43
- * **Test Coverage**: 41 test cases with a completely isolated test environment
44
- * **Automation**: Docker startup automatically completes all initialization tasks
43
+ - `<strength 1>`
44
+ - `<strength 2>`
45
+ - `<strength 3>`
46
+
47
+ **Gaps or limits:**
48
+
49
+ - `<gap 1 or 'None material'>`
50
+ - `<gap 2 if needed>`
51
+
52
+ ## Hard-threshold summary inputs
53
+
54
+ Explicitly answer:
45
55
 
56
+ 1. Can the delivered product actually run through its primary runtime command?
57
+ 2. Can the delivered product actually be verified through `./run_tests.sh`?
58
+ 3. Does the delivered output fully cover the core prompt requirements?
59
+ 4. Is the delivered product a real 0-to-1 delivery rather than a partial or schematic implementation?
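The four hard-threshold questions take yes/no answers, so they can be captured as a small record before the summary is written. Below is a minimal Python sketch. The field names are hypothetical. Treating any single "no" as a failed summary is an assumption about how hard thresholds are read; the template does not state it.

```python
from dataclasses import dataclass, fields


@dataclass
class HardThresholds:
    """Yes/no answers to the four hard-threshold questions, in template order."""
    runs_via_primary_command: bool   # 1. primary runtime command works
    verifiable_via_run_tests: bool   # 2. ./run_tests.sh verifies the delivery
    covers_core_prompt: bool         # 3. core prompt requirements fully covered
    real_zero_to_one_delivery: bool  # 4. not a partial or schematic implementation

    def failed(self):
        """Names of the thresholds answered 'no', in template order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passes(self):
        # Assumption: a hard threshold is all-or-nothing, so any "no" fails.
        return not self.failed()

    def summary_lines(self):
        """Render 'N. name: Yes/No' lines for the submission summary."""
        return [
            f"{i}. {f.name}: {'Yes' if getattr(self, f.name) else 'No'}"
            for i, f in enumerate(fields(self), start=1)
        ]
```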