theslopmachine 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/MANUAL.md +63 -0
- package/README.md +23 -0
- package/RELEASE.md +81 -0
- package/assets/agents/developer.md +294 -0
- package/assets/agents/slopmachine.md +510 -0
- package/assets/skills/beads-operations/SKILL.md +75 -0
- package/assets/skills/clarification-gate/SKILL.md +51 -0
- package/assets/skills/developer-session-lifecycle/SKILL.md +75 -0
- package/assets/skills/final-evaluation-orchestration/SKILL.md +75 -0
- package/assets/skills/frontend-design/SKILL.md +41 -0
- package/assets/skills/get-overlays/SKILL.md +157 -0
- package/assets/skills/planning-gate/SKILL.md +68 -0
- package/assets/skills/submission-packaging/SKILL.md +268 -0
- package/assets/skills/verification-gates/SKILL.md +106 -0
- package/assets/slopmachine/backend-evaluation-prompt.md +275 -0
- package/assets/slopmachine/beads-init.js +428 -0
- package/assets/slopmachine/document-completeness.md +45 -0
- package/assets/slopmachine/engineering-results.md +59 -0
- package/assets/slopmachine/frontend-evaluation-prompt.md +304 -0
- package/assets/slopmachine/implementation-comparison.md +36 -0
- package/assets/slopmachine/quality-document.md +108 -0
- package/assets/slopmachine/templates/AGENTS.md +114 -0
- package/assets/slopmachine/utils/convert_ai_session.py +1837 -0
- package/assets/slopmachine/utils/strip_session_parent.py +66 -0
- package/bin/slopmachine.js +9 -0
- package/package.json +25 -0
- package/src/cli.js +32 -0
- package/src/constants.js +77 -0
- package/src/init.js +179 -0
- package/src/install.js +330 -0
- package/src/utils.js +162 -0
@@ -0,0 +1,268 @@ package/assets/skills/submission-packaging/SKILL.md
---
name: submission-packaging
description: Final delivery packaging checklist and artifact assembly rules for repo-cwd blueprint-driven projects. Use only during the submission packaging phase.
---

# Submission Packaging

Use this skill only when the project has already passed integrated verification, hardening, and the final evaluation/remediation loop and therefore qualifies for packaging.

## Usage rules

- Load this skill only in the submission packaging phase.
- Treat it as internal packaging guidance, not developer-visible workflow text.
- Do not use it to justify skipping verification or evaluation gates.
- Packaging is not cleanup theater. Preserve required evidence and remove only local junk, caches, and accidental noise.
- Do not declare packaging complete early.
- Packaging is incomplete until every required step in this skill has been executed and every required artifact path has been verified to exist.

## Completion discipline

- execute packaging in order; do not jump to the end and assume missing pieces can be filled later
- after each major packaging block, verify the expected outputs before continuing
- before closing the packaging phase, explicitly check that every required file and directory exists in its final location
- if any required artifact is missing, packaging is still in progress
- if any cleanup, move, export, or reporting step is incomplete, packaging is still in progress

## Packaging goals

- produce the exact required delivery structure
- make the package reviewable and reproducible
- include the required docs, metadata, evidence, and session artifacts
- exclude local-only junk, transient noise, and secret material

## Required final structure

The final submission layout in the parent project root must be:

- `docs/`
  - `design.md`
  - `api-spec.md` when applicable
  - `test-coverage.md`
  - `questions.md`
  - `submission-self-test-index.md`
  - `hard-threshold-explanation.md`
  - `document-completeness-report.md`
  - `engineering-architecture-quality-report.md`
  - `engineering-architecture-quality-statement.md`
  - `engineering-results-report.md`
  - `engineering-details-professionalism.md`
  - `implementation-comparison-report.md`
  - `prompt-understanding-adaptability.md`
  - `aesthetics.md` when applicable
  - `aesthetics-assessment.md` when applicable
  - evaluation reports
  - screenshots and proof materials
    - relocated screenshots and proof images needed for submission review
- current working directory delivered as parent-root `repo/`
- `../sessions/`
  - `trajectory.json`
  - `trajectory-N.json` when multiple session trajectories exist
- `../metadata.json`
- `../session.json`
- `../session-N.json` when multiple exported sessions exist
- parent-root `../.tmp/` directory moved out of current `.tmp/` when it exists

Do not treat `.ai/` as part of the final submission structure unless the user explicitly changes that requirement later.
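The layout above can be checked mechanically before closing the phase. A minimal sketch, assuming it runs from inside the delivered repo so the parent project root is `..`; the path list is an illustrative subset, not the full package contract:

```python
from pathlib import Path

# Illustrative subset of the required parent-root layout; extend to match the blueprint.
REQUIRED = [
    "docs/design.md",
    "docs/test-coverage.md",
    "docs/questions.md",
    "docs/submission-self-test-index.md",
    "sessions/trajectory.json",
    "metadata.json",
    "session.json",
]

def missing_paths(root: Path, required: list[str]) -> list[str]:
    """Return every required path that does not yet exist under root."""
    return [rel for rel in required if not (root / rel).exists()]

# Usage from inside the delivered repo/ (parent project root is ".."):
# gaps = missing_paths(Path(".."), REQUIRED)
# an empty list means the checked subset of the structure is present
```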

## Required packaging actions

- verify the root package structure matches the blueprint exactly
- make sure parent-root `../metadata.json` is complete and reflects the delivered project truthfully
- create or finalize parent-root `../docs/design.md` from the working `docs/design.md`
- create or finalize parent-root `../docs/api-spec.md` from the working `docs/api-spec.md` when applicable
- create or finalize parent-root `../docs/test-coverage.md` from the working `docs/test-coverage.md`
- create or finalize parent-root `../docs/questions.md` from the accepted clarification/question record
- move the final docs set out of current `docs/` into parent-root `../docs/` after development is complete
- ensure `README.md` matches the delivered runtime, verification, and feature behavior
- include `run_tests.sh`
- relocate evaluation artifacts into parent-root `../docs/`, including:
  - backend evaluation report
  - frontend evaluation report
- relocate screenshots and proof materials relevant to runtime behavior and major flows into parent-root `../docs/`
- gather required screenshots from test-output or artifact directories and relocate them into parent-root `../docs/` during packaging
- include exported session artifacts at the parent project root using the naming rules:
  - `../session.json` for a single exported session
  - `../session-N.json` when multiple exported sessions exist
- include trajectory artifacts in `../sessions/` using the naming rules:
  - `../sessions/trajectory.json` for a single trajectory
  - `../sessions/trajectory-N.json` when multiple trajectories exist
- include any required package/tree proof or delivery evidence

## Session export sequence

For the developer session, run these exact steps:

- this sequence is mandatory
- do not skip, reorder, or replace these commands with approximations
- if one of these commands fails, stop and fix the export pipeline before continuing packaging
- do not proceed to later packaging steps until the export, strip, and trajectory conversion steps have all succeeded

1. `opencode export <developer-session-id> > ../session-export.json`
2. `python3 ~/slopmachine/utils/strip_session_parent.py ../session-export.json --output ../session.json`
3. `python3 ~/slopmachine/utils/convert_ai_session.py -i ../session.json -o ../trajectory.json`

After those steps:

- keep the cleaned final exported session as parent-root `../session.json` unless multiple exports require `../session-N.json`
- move or copy the generated trajectory into `../sessions/trajectory.json` unless multiple trajectories require `../sessions/trajectory-N.json`
- treat `../session-export.json` and the intermediate parent-root `../session.json` as temporary packaging intermediates unless the package contract later says otherwise
- pause immediately after the export/clean/convert sequence and verify that all expected directories and required files exist before running any later packaging scripts
- if the required utilities or output files are missing, packaging is not ready to continue
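The stop-on-failure rule for the export sequence can be wrapped in a small runner. A sketch, assuming `opencode` and the two utilities are available at the paths shown; `<developer-session-id>` stays a placeholder:

```python
import subprocess
import sys

# The mandatory export/strip/convert sequence, verbatim from the steps above.
EXPORT_STEPS = [
    "opencode export <developer-session-id> > ../session-export.json",
    "python3 ~/slopmachine/utils/strip_session_parent.py ../session-export.json --output ../session.json",
    "python3 ~/slopmachine/utils/convert_ai_session.py -i ../session.json -o ../trajectory.json",
]

def run_sequence(steps: list[str]) -> None:
    """Run each step in order; stop at the first failure instead of continuing packaging."""
    for cmd in steps:
        if subprocess.run(cmd, shell=True).returncode != 0:
            sys.exit(f"export step failed, fix the pipeline before continuing: {cmd}")
```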

## Required file moves

- move the final docs set out of current `docs/` into parent-root `../docs/`
- after packaging, current `docs/` must not remain in the delivered `repo/` tree
- if current `.tmp/` exists, collect the relevant evaluation reports and proof artifacts from it into parent-root `../docs/`
- collect screenshots and other required proof materials from repo-local runtime/output directories into parent-root `../docs/`
- after relocation, the final submission should not require digging through repo-local output directories to find evidence
- find required screenshots in their generated locations and move or copy them into parent-root `../docs/` so reviewers do not have to inspect e2e artifact directories manually
- keep screenshot filenames clear enough that the referenced runtime page, flow, or evidence purpose is understandable from the file name or nearby document reference
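The moves above can be sketched with the standard library. The artifact-directory names (`test-results`, `playwright-report`) are assumptions for illustration; substitute the project's real output directories:

```python
import shutil
from pathlib import Path

def move_final_docs(repo: Path, docs_out: Path) -> None:
    """Move the working docs/ set into parent-root docs/ so it does not remain in repo/."""
    docs_out.mkdir(parents=True, exist_ok=True)
    working = repo / "docs"
    if working.is_dir():
        for item in working.iterdir():
            shutil.move(str(item), str(docs_out / item.name))
        working.rmdir()  # repo/docs must not survive packaging

def harvest_screenshots(repo: Path, docs_out: Path,
                        source_dirs=("test-results", "playwright-report")) -> None:
    """Copy screenshots out of repo-local artifact directories into parent-root docs/."""
    docs_out.mkdir(parents=True, exist_ok=True)
    for name in source_dirs:
        base = repo / name
        if base.is_dir():
            for shot in base.rglob("*.png"):
                shutil.copy2(shot, docs_out / shot.name)
```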

## Repo cleanup rules

Clean the current working directory that will be delivered as parent-root `repo/` before final submission.

Remove runtime, editor, cache, and tooling noise such as:

- `.tmp/`
- `.git/`
- `.opencode/`
- `.codex/`
- `.vscode/`
- `__pycache__/`
- `.pytest_cache/`
- `node_modules/`
- `.venv/`
- `.net/`
- `AGENTS.md`
- similar environment-dependent or local-only directories

Product attachment packaging is not allowed.

Do not include environment-dependent local runtime content in the delivered package.
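The removal list can be applied mechanically; `AGENTS.md` is a file, the rest are directories. A sketch (destructive, so run it only on the copy being packaged):

```python
import shutil
from pathlib import Path

# Junk entries named by this skill; extend with other local-only noise as found.
JUNK = [".tmp", ".git", ".opencode", ".codex", ".vscode", "__pycache__",
        ".pytest_cache", "node_modules", ".venv", ".net", "AGENTS.md"]

def clean_repo(repo: Path, junk=JUNK) -> list[str]:
    """Remove local-only noise anywhere in the repo tree; return what was removed."""
    removed = []
    for name in junk:
        for path in list(repo.rglob(name)):
            if not path.exists():  # an enclosing junk dir may already be gone
                continue
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
            removed.append(str(path.relative_to(repo)))
    return removed
```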

## Validation checklist

- confirm the final package contains what reviewers actually need to inspect the project
- confirm docs describe delivered behavior, not planned or aspirational behavior
- confirm `docs/test-coverage.md` explains the tested flows, coverage boundaries, and how the evaluator should interpret the coverage evidence
- confirm evaluation reports and screenshots have been relocated into parent-root `../docs/` and correspond to the final qualified state, not an older revision
- confirm current `docs/` has been handled correctly so final delivery docs live in parent-root `../docs/`
- confirm repo-local evidence directories have been harvested so required reports and screenshots now live in parent-root `../docs/`
- confirm required screenshots have been relocated into parent-root `../docs/`
- confirm parent-root metadata fields are populated correctly:
  - `prompt`
  - `project_type`
  - `frontend_language`
  - `backend_language`
  - `database`
  - `session_id`
  - `frontend_framework`
  - `backend_framework`
- confirm session export naming rules are followed:
  - parent-root `../session.json`
  - parent-root `../session-N.json` when multiple exported sessions exist
  - `../sessions/trajectory.json`
  - `../sessions/trajectory-N.json` when multiple trajectories exist
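The metadata field check can be automated. A sketch using the field names from this checklist; the full schema belongs to the package contract:

```python
import json
from pathlib import Path

# Field names from the validation checklist above.
REQUIRED_FIELDS = [
    "prompt", "project_type", "frontend_language", "backend_language",
    "database", "session_id", "frontend_framework", "backend_framework",
]

def unpopulated_fields(metadata_path: Path, required=REQUIRED_FIELDS) -> list[str]:
    """Return required metadata fields that are missing or empty in metadata.json."""
    data = json.loads(metadata_path.read_text())
    return [field for field in required if not data.get(field)]

# Usage: unpopulated_fields(Path("../metadata.json")) should return [] before closing packaging.
```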

## Submission reporting documents

Create a root document at:

- `docs/submission-self-test-index.md`

Use it as the master index of packaging/self-test outputs.

Rules:

- answer the direct questions inline in this index when a standalone document is not required
- when a standalone document is required, create the file and include its path in the index
- keep the answers tied to the real delivered project, not generic template text
- use real screenshots from the delivered product and real project evidence

Create or populate these deliverables:

1. `docs/submission-self-test-index.md`
   - answer whether the delivered product can actually run and be verified
   - link screenshots proving successful runtime pages and, when relevant, error pages
2. `docs/hard-threshold-explanation.md`
   - explain success/failure status
   - state whether the delivered product can actually be operated and verified
   - state whether the delivered product significantly deviates from the prompt topic
3. `docs/document-completeness-report.md`
   - based on `~/slopmachine/document-completeness.md`
   - create a similar table document for this codebase
   - include screenshots with real product data where applicable
4. Add to `docs/submission-self-test-index.md`
   - Self-Test Status — Delivery Completeness Statement
   - answer whether the deliverables fully cover the core prompt requirements
   - answer whether the product has a real 0-to-1 delivery form rather than partial/schematic output
   - if unable, provide a specific reason
5. `docs/engineering-architecture-quality-report.md`
   - based on `~/slopmachine/quality-document.md`
   - generate a report with the same type of information and proper formatting
   - include a screenshot of the current working directory folder structure that will be delivered as `repo/`
   - suggested method: run a tree command such as `tree . -a -L 3 > ../repo-structure.txt`, open that output in the terminal or editor, and capture a screenshot manually
6. `docs/engineering-architecture-quality-statement.md`
   - Self-test results: engineering and architecture quality statement
7. `docs/engineering-results-report.md`
   - based on `~/slopmachine/engineering-results.md`
   - generate a similar document for this codebase
8. `docs/engineering-details-professionalism.md`
   - cover these exact checks:
     - Error handling: all APIs return standard HTTP codes and JSON error format
     - Logging: key flows have structured logs
     - Input validation: all body/query/path params validated
     - No secrets/keys in config files
     - No `node_modules` / `.venv` committed
     - No stray debug statements such as `console.log("here")`
9. `docs/implementation-comparison-report.md`
   - based on `~/slopmachine/implementation-comparison.md`
   - generate a similar document for this codebase with real evidence
10. `docs/prompt-understanding-adaptability.md`
    - explain whether the delivered product accurately understands and responds to business objectives, use cases, and implicit constraints in the prompt rather than merely implementing presentation-layer technical requirements
11. `docs/aesthetics.md`
    - for fullstack or pure frontend only
    - answer whether the visual/interaction fits the scene and is aesthetically pleasing
    - if not applicable, write `Not Applicable`
    - if aesthetically unappealing, provide a specific reason
12. `docs/aesthetics-assessment.md`
    - for fullstack or pure frontend only
    - produce the aesthetics assessment in a separate document
    - if not applicable, write `Not Applicable`
    - if aesthetically unappealing, provide a specific reason
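For the structure capture in item 5, a `tree`-free fallback can generate the same text when `tree` is not installed. A sketch; the depth limit loosely mirrors `-L 3` and hidden files are included as with `-a`:

```python
import os
from pathlib import Path

def write_structure(repo: Path, out: Path, max_depth: int = 3) -> None:
    """Write an indented listing of repo, similar to `tree . -a -L 3`, to out."""
    lines = [f"{repo.name}/"]
    for dirpath, dirnames, filenames in os.walk(repo):
        depth = len(Path(dirpath).relative_to(repo).parts)
        if depth >= max_depth:
            dirnames[:] = []  # stop descending past the depth limit
        indent = "  " * (depth + 1)
        for d in sorted(dirnames):
            lines.append(f"{indent}{d}/")
        for f in sorted(filenames):
            lines.append(f"{indent}{f}")
    out.write_text("\n".join(lines) + "\n")

# Usage: write_structure(Path("."), Path("../repo-structure.txt")), then screenshot the file.
```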

## Cleanliness rules

- remove caches, temp outputs, build junk, editor noise, and accidental local files that are not part of the package contract
- do not remove required evidence just because it looks noisy
- verify no local secret files or sensitive values are included in the final package
- verify frontend screenshots do not show demo/debug/setup leakage
- verify evaluation rebuttal notes, if any, remain attached to the relevant evaluation reports
- verify the final package does not include pointless tests or artifacts whose only purpose was to assert documentation-directory existence
- verify that required evidence is centralized under parent-root `../docs/` rather than scattered across repo-local output directories
- do not leave required screenshots stranded only inside raw e2e artifact directories

## Final packaging verification

- do one final package review before declaring packaging complete
- confirm the package is coherent as a delivered submission, not just a working repo snapshot
- confirm the final git checkpoint can be created cleanly for the packaged state
- if packaging reveals a real defect or missing artifact, fix it before closing the phase

## Required completion checklist

Do not close packaging until all of these are true:

- final docs exist in parent-root `../docs/`
- current `docs/` no longer remains in the delivered repo tree
- parent-root `../metadata.json` exists and is populated correctly
- parent-root `../session.json` or `../session-N.json` exists as required
- `../sessions/trajectory.json` or `../sessions/trajectory-N.json` exists as required
- evaluation reports are present under parent-root `../docs/`
- required screenshots and proof materials are present under parent-root `../docs/`
- required submission-report documents exist in parent-root `../docs/`
- delivered repo no longer contains forbidden junk directories
- final structure matches the required layout exactly

@@ -0,0 +1,106 @@ package/assets/skills/verification-gates/SKILL.md
---
name: verification-gates
description: Owner-side review, verify-fix loop, heavy-gate interpretation, and runtime acceptance rules for repo-cwd blueprint-driven projects.
---

# Verification Gates

Use this skill after development begins whenever you are reviewing developer work, deciding acceptance, interpreting phase exits, or enforcing hard verification boundaries.

## Usage rules

- Load this skill before review, acceptance, rejection, runtime gate interpretation, hardening readiness decisions, or heavy-gate decisions.
- Treat it as owner-side review and gate guidance, not developer-visible text.
- Use `get-overlays` as the source of truth for developer-facing execution guidance.
- Use this skill as the source of truth for owner-side verification, review pressure, and gate interpretation.

## Review standard

- do not accept weak tests
- do not accept shallow Docker verification
- do not accept documentation drift
- do not accept happy-path-only implementation when failure paths matter
- do not accept unsupported claims
- do not accept work that looks complete but is not resilient
- do not accept committed secrets, hardcoded sensitive values, or sloppy env handling
- do not accept frontend/backend drift in fullstack work
- do not accept missing end-to-end coverage for major fullstack flows
- do not accept UI claims without screenshot-backed Playwright evidence when the change affects real frontend behavior
- do not accept frontend placeholder, demo, setup, or debug messaging in product-facing UI
- do not accept prototype residue such as seeded credentials, weak demo defaults, login hints, or unsanitized user-facing error behavior
- do not accept multi-tenant or cross-user security claims without negative isolation evidence when that boundary matters
- do not accept file-bearing flows without path confinement and traversal-style validation when that boundary matters
- do not accept partial `foundation` work for complex features when the prompt implies broader usable scope, infrastructure depth, or security depth than what was actually delivered
- do not accept known user-facing, release-facing, production-path, or build failures as compatible with completion unless explicitly scoped out
- do not accept frontend-bearing slice completion without checking production build health when the change materially affects frontend code or tooling
- do not accept module completion that ignores integration seams or cross-cutting consistency with the existing system
- do not accept end-to-end evidence that bypasses a required user-facing or admin-facing surface with direct API shortcuts

## Verify-fix loop

- inspect the result and the evidence, not just the developer's confidence
- review technical quality, prompt alignment, architecture impact, and verification depth of the current work
- during normal implementation iteration, prefer fast local language-native or framework-native verification for the changed area instead of forcing `run_tests.sh` every turn
- require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
- if the local toolchain is missing, require the developer to install or enable it first; allow fallback to `run_tests.sh` only when that is not practical
- do not accept hand-wavy claims that local verification is unavailable without a real setup attempt and clear explanation
- for applicable fullstack or UI-bearing work, require local Playwright on affected flows plus screenshot review and explicit UI validation
- if verification is weak, missing, or failing, require fixes and reruns before acceptance
- if docs drift, secrets leak, contracts drift, or frontend integrity is compromised, require cleanup before acceptance
- keep looping until the current work is genuinely acceptable

## Heavy-gate definition

- a heavy gate is an owner-run integrated verification boundary, not every ordinary phase change
- a phase change alone does not automatically require a heavy gate unless that phase's exit criteria explicitly call for one
- a heavy gate normally means some combination of full clean runtime proof, full `run_tests.sh`, and Playwright plus screenshot evidence when UI or fullstack flows exist
- heavy gates are required at scaffold acceptance, integrated/full verification, and post-evaluation remediation re-acceptance
- a mid-phase extra heavy gate is allowed only when the risk profile justifies it, such as major runtime, infra, migration, auth, security, build, or cross-module integration changes
- planning acceptance, ordinary module acceptance, and routine in-phase verification are not heavy gates by default and should rely on targeted local verification unless the risk profile says otherwise

## Testing cadence interpretation

- the first required `run_tests.sh` pass happens in scaffold once the clean foundation exists
- after scaffold, do not force `docker compose up --build` or `run_tests.sh` on every normal development step when faster local verification is sufficient
- prefer local targeted or native test commands during module implementation and ordinary verify-fix iteration
- local verification should run inside the current working directory using the project's own environment and tooling rather than hidden global assumptions
- during applicable fullstack or UI-bearing implementation work, require local Playwright on affected flows and review screenshots
- treat `docker compose up --build` and `run_tests.sh` as critical-gate verification commands for integrated/full verification, hard gates, and final-evaluation readiness rather than normal iteration tools
- the workflow owner handles those expensive critical-gate runs; do not require the developer to duplicate them during normal phase progression
- run `run_tests.sh` again at integrated/full verification
- integrated/full verification must also run Playwright for major flows and inspect screenshots
- run `run_tests.sh` again after post-evaluation remediation before re-acceptance
- after post-evaluation remediation affecting real flows or UI, rerun Playwright and inspect fresh screenshots before re-acceptance
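The owner-run critical-gate cadence can be scripted. A sketch; the commands are the ones this skill names, with `-d` added as an assumption so the Compose step returns instead of blocking (a real run would also tear the stack down afterwards):

```python
import subprocess

# Critical-gate commands named by this skill; owner-run, not per-iteration.
GATE_COMMANDS = [
    "docker compose up --build -d",
    "./run_tests.sh",
]

def run_gate(commands=GATE_COMMANDS) -> dict[str, int]:
    """Run each critical-gate command in order and record its exit code."""
    return {cmd: subprocess.run(cmd, shell=True).returncode for cmd in commands}

# The gate passes only when every recorded exit code is 0.
```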

## Runtime gate interpretation

Use evidence such as Bead metadata, structured Bead comments, verification command results, and file/project-state checks.

- clarification requires the `clarification-gate` conditions plus an explicit approval record
- development bootstrap requires the `developer-session-lifecycle` conditions plus a fresh planning-oriented start in the current working directory with working planning docs under `docs/`
- scaffold requires evidence for `docker compose up --build`, `run_tests.sh`, baseline logging/config, and, when relevant, the chosen frontend stack and UI approach being set intentionally
- scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, and a usable local test environment in the current working directory when practical
- when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
- for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
- Dockerized scaffold acceptance also requires self-contained Compose namespacing, no unnecessary fragile `container_name` usage, and clean startup plus teardown behavior in the intended shared-environment model
- module implementation requires module planning notes, a module definition of done, relevant local verification for the changed area, and, for applicable fullstack or UI work, local Playwright evidence with screenshots, plus docs sync and review acceptance
- module implementation also requires integration-seam verification against adjacent modules and cross-cutting concerns where relevant, and known release-facing or build failures block acceptance unless explicitly scoped out
- module implementation acceptance should also challenge tenant isolation, path confinement, sanitized error behavior, and prototype residue when those concerns are in scope
- integrated verification requires owner-run `docker compose up --build`, owner-run `run_tests.sh`, end-to-end, Playwright, prompt-alignment, README/runtime, and cross-module evidence
- fullstack integrated verification must include Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
- if a required flow cannot be exercised through the intended UI surface, treat that as incomplete implementation rather than acceptable E2E coverage
- hardening requires security, maintainability, exploratory, and release-freeze evidence
- hardening must explicitly re-check secret handling, redaction, and frontend/backend observability hygiene
- final evaluation readiness requires automated evaluation to be complete and triaged, with a clear go-to-packaging vs return-to-fixes decision
- remediation requires accepted issue records plus rerun verification, and after post-evaluation remediation it requires an owner-run fresh `run_tests.sh` pass and a Playwright rerun where applicable before re-acceptance

## Hardening and pre-evaluation discipline

When all planned modules are complete:

- run integrated verification
- run hardening and exploratory testing
- for fullstack applications, rerun Playwright coverage for major flows and inspect screenshots for frontend regressions or weak UX
- enforce release-candidate freeze
- allow only fixes, verification improvements, doc corrections, and packaging work
- prepare the package and evidence cleanly before the final evaluation decision gate
|
|
@@ -0,0 +1,275 @@
|
|
|
1
|
+
You are the reviewer responsible for Delivery Acceptance and Project Architecture Audit.
|
|
2
|
+
|
|
3
|
+
In the current working directory, review the project against the Business / Task Prompt and the Acceptance / Scoring Criteria.
|
|
4
|
+
|
|
5
|
+
[Business / Task Prompt]
|
|
6
|
+
{prompt}
|
|
7
|
+
|
|
8
|
+
[Acceptance / Scoring Criteria (the only authority)]
|
|
9
|
+
{
|
|
10
|
+
|
|
11
|
+
1. Hard Gates
|
|
12
|
+
|
|
13
|
+
1.1 Whether the delivered project can actually be run and verified
|
|
14
|
+
|
|
15
|
+
- Whether clear startup or run instructions are provided
|
|
16
|
+
- Whether the project can be started or run without modifying core code
|
|
17
|
+
- Whether the actual runtime behavior is broadly consistent with the delivery documentation
|
|
18
|
+
|
|
19
|
+
1.2 Whether the delivered project materially deviates from the Prompt
|
|
20
|
+
|
|
21
|
+
- Whether the implementation is centered on the business goal or usage scenario described in the Prompt
|
|
22
|
+
- Whether there are major parts of the implementation that are only loosely related, or unrelated, to the Prompt
|
|
23
|
+
- Whether the project replaces, weakens, or ignores the core problem definition in the Prompt without justification
|
|
24
|
+
|
|
25
|
+
2. Delivery Completeness

2.1 Whether the delivered project fully covers the core requirements explicitly stated in the Prompt

- Whether all explicitly stated core functional requirements in the Prompt are implemented

2.2 Whether the delivered project represents a basic end-to-end deliverable from 0 to 1, rather than a partial feature, illustrative implementation, or code fragment

- Whether mock / hardcoded behavior is used in place of real logic without explanation
- Whether the project includes a complete project structure rather than scattered code or a single-file example
- Whether basic project documentation is provided, such as a README or equivalent

3. Engineering and Architecture Quality

3.1 Whether the project adopts a reasonable engineering structure and module decomposition for the scale of the problem

- Whether the project structure is clear and module responsibilities are reasonably defined
- Whether the project contains redundant or unnecessary files
- Whether the implementation is excessively piled into a single file

3.2 Whether the project shows basic maintainability and extensibility, rather than being a temporary or stacked implementation

- Whether there are obvious signs of chaotic structure or tight coupling
- Whether the core logic leaves room for extension rather than being completely hard-coded

4. Engineering Details and Professionalism

4.1 Whether the engineering details and overall shape reflect professional software practice, including but not limited to error handling, logging, validation, and API design

- Whether error handling is basically reliable and user-friendly
- Whether logs support troubleshooting rather than being random print statements or completely absent
- Whether necessary validation is present for key inputs and boundary conditions

4.2 Whether the project is organized like a real product or service, rather than remaining at the level of an example or demo

- Whether the overall deliverable resembles a real application instead of a teaching sample or demonstration-only project

5. Prompt Understanding and Requirement Fit

5.1 Whether the project accurately understands and responds to the business goal, usage scenario, and implicit constraints described in the Prompt, rather than merely implementing surface-level technical features

- Whether the core business objective in the Prompt is implemented correctly
- Whether there are obvious misunderstandings of the requirement semantics or deviations from the actual problem
- Whether key constraints in the Prompt are changed or ignored without explanation

6. Aesthetics (frontend-only / full-stack tasks only)

6.1 Whether the visual and interaction design fits the scenario and demonstrates reasonable visual quality

- Whether different functional areas of the page are visually distinguishable through background, spacing, separation, or hierarchy
- Whether the overall layout is reasonable, and whether alignment, spacing, and proportions are broadly consistent
- Whether UI elements, including text, images, and icons, render and display correctly
- Whether visual elements are consistent with the page theme and textual content, and whether there are obvious mismatches between images, illustrations, decorative elements, and the actual content
- Whether basic interaction feedback is provided, such as hover states, click states, or transitions, so users can understand the current interaction state
- Whether fonts, font sizes, colors, and icon styles are generally consistent, without obvious visual inconsistency or mixed design language
}

Review Objective

Determine whether the delivered project is a credible, runnable, prompt-aligned, and minimally professional 0-to-1 deliverable.

Priority Order

1. delivery runnability boundary
2. prompt requirement fit
3. security-critical flaws
4. test sufficiency
5. major engineering quality issues
6. frontend aesthetics only if clearly applicable

Execution Rules

1. Review only the highest-impact findings that can change the final verdict.
Do not perform exhaustive enumeration of every secondary or tertiary checklist item.

2. Do not relax standards for:

- security
- prompt-fit
- delivery completeness
- test sufficiency
- evidence for material conclusions

3. Do not skip any issue that could independently cause a Fail or Partial Pass verdict.

4. If a security, prompt-fit, runnability, or core test-sufficiency issue is suspected, continue investigation until it is either evidenced or explicitly marked Cannot Confirm.

5. Stop after either:

- identifying up to 10 findings total, or
- identifying up to 5 High / Blocker findings,
whichever comes first.
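
The stopping rule can be read as a simple predicate over the running list of findings. A minimal sketch, assuming findings are dicts with a `severity` field (an illustrative representation, not part of the prompt):

```python
def should_stop(findings):
    """Stop the review once 10 findings total, or 5 High/Blocker
    findings, have been recorded, whichever limit is hit first."""
    high_or_blocker = sum(
        1 for f in findings if f["severity"] in ("High", "Blocker")
    )
    return len(findings) >= 10 or high_or_blocker >= 5
```

Either limit alone is enough to stop, so a review of nine Low findings continues, while five Blockers end it immediately.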

6. Do not modify project code.

7. Use evidence only for material conclusions.
For any conclusion that changes the final verdict, provide concrete, traceable evidence using file path + line number, tool output, or explicit runtime result.

8. If evidence is insufficient, do not guess.
Use "Cannot Confirm" or explicitly label the assumption and its boundary.

9. Perform runtime verification only when all of the following are true:

- the command is explicitly documented
- no Docker is required
- no Docker-related command is required
- no container orchestration is required
- no privileged system access is required
- no external network / third-party dependency is required
- expected completion is short
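
These preconditions form a single conjunctive gate: if any one fails, runtime verification is skipped. A minimal sketch, with illustrative metadata field names (not a real API):

```python
def may_run(meta):
    """Rule 9 gate: every precondition must hold before a command
    is executed for runtime verification. `meta` keys are illustrative."""
    return (
        meta["command_documented"]
        and not meta["needs_docker"]
        and not meta["needs_orchestration"]
        and not meta["needs_privileged_access"]
        and not meta["needs_external_network"]
        and meta["expected_short"]
    )
```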

10. Never run Docker-related commands.
This includes, but is not limited to:

- docker
- docker compose
- docker-compose
- podman
- container runtime / orchestration commands with equivalent effect
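
A review harness could enforce this prohibition with a first-token filter before executing any shell command. A minimal sketch; the prefix list mirrors the examples above, and "equivalent effect" tools would need to be added case by case:

```python
# Prohibited command names drawn from the list above; not exhaustive.
BLOCKED_PREFIXES = ("docker", "docker-compose", "podman")

def is_blocked(command: str) -> bool:
    """True if the command's first token is a prohibited container tool.
    'docker compose up' is caught because its first token is 'docker'."""
    tokens = command.strip().split()
    return bool(tokens) and tokens[0] in BLOCKED_PREFIXES
```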

11. If verification would require Docker or any container-related command, do not execute it.
Instead:

- state that Docker-based runtime verification was not performed
- treat it as a verification boundary, not automatically as a project defect
- provide the local reproduction command(s) the user can run
- state what was confirmed statically
- state what remains unconfirmed

12. Docker non-execution is a verification constraint, not a project defect by itself.
Only report a defect if the project itself lacks runnable documentation, has broken setup logic, or shows static evidence of delivery failure.

13. Security review has priority over style issues.
Always assess:

- authentication entry points
- route-level authorization
- object-level authorization
- tenant / user isolation

Expand security review further only if relevant code paths exist, such as:

- admin / internal / debug endpoints
- function-level authorization
- privilege escalation paths

14. Test review is mandatory, but do not build a full requirement-to-test traceability matrix.
Assess only whether tests sufficiently cover:

- the core business happy path
- major failure paths such as validation failure, 401, 403, 404, 409 where relevant
- security-critical areas
- obvious high-risk boundaries directly relevant to the business flow

15. For test coverage, state only:

- covered / partially covered / missing / cannot confirm
- one or two supporting evidence points
- the minimum additional test needed if coverage is weak

16. Logging review is mandatory but concise.
Assess only:

- whether logging exists for meaningful troubleshooting
- whether logging categories are reasonably clear if present
- whether there is obvious sensitive-data leakage risk in logs or responses

17. Mock / stub / fake behavior is not a defect by itself unless the Prompt or documentation requires real integration.
If present, explain only:

- the mock scope
- how it is enabled
- whether there is obvious accidental production-use risk

18. Do not continue searching for additional low-severity issues after the final verdict is already supportable.

19. Do not read unrelated files once enough evidence has been collected to support the verdict and top findings.

Required Output Format

Return exactly these sections:

1. Verdict

- Pass / Partial Pass / Fail / Cannot Confirm

2. Scope and Verification Boundary

- what was reviewed
- what was not executed
- whether Docker-based verification was required but not executed
- what remains unconfirmed

3. Top Findings

- up to 10 findings only
- each finding must include:
  - Severity: Blocker / High / Medium / Low
  - Conclusion
  - Brief rationale
  - Evidence
  - Impact
  - Minimum actionable fix
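
To keep findings uniform, the required fields can be modeled as a small record type. A minimal sketch; the class and field names are illustrative, not part of the required output format:

```python
from dataclasses import dataclass

SEVERITIES = ("Blocker", "High", "Medium", "Low")

@dataclass
class Finding:
    severity: str     # must be one of SEVERITIES
    conclusion: str
    rationale: str
    evidence: str     # file path + line number, tool output, or runtime result
    impact: str
    minimum_fix: str

    def __post_init__(self):
        # Reject severities outside the allowed scale at construction time.
        if self.severity not in SEVERITIES:
            raise ValueError(f"invalid severity: {self.severity}")
```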

4. Security Summary

- authentication
- route authorization
- object-level authorization
- tenant / user isolation

For each, return:

- Pass / Partial Pass / Fail / Cannot Confirm
- brief evidence or verification boundary

5. Test Sufficiency Summary
Return:

- Test Overview
  - whether unit tests exist
  - whether API / integration tests exist
  - obvious test entry points if present
- Core Coverage
  - happy path: covered / partial / missing / cannot confirm
  - key failure paths: covered / partial / missing / cannot confirm
  - security-critical coverage: covered / partial / missing / cannot confirm
- Major Gaps
  - up to 3 highest-risk missing tests
- Final Test Verdict
  - Pass / Partial Pass / Fail / Cannot Confirm

6. Engineering Quality Summary
Assess only major maintainability / architecture concerns that materially affect delivery confidence.

7. Next Actions

- up to 5 minimum actions only
- prioritize by severity and unblock value

Final Verification Before Output

Before finalizing, check all of the following:

1. Does each material conclusion have supporting evidence?
2. Are any claims stronger than the evidence supports?
3. If unsupported observations are removed, does the final verdict still hold?
4. Has any uncertain point been incorrectly presented as a confirmed fact?
5. Has security or test sufficiency been judged too loosely without evidence?
6. Has any Docker non-execution boundary been incorrectly described as a confirmed runtime failure?

If file writing is supported, save the final report to a markdown file.
Otherwise, return the report in-chat.