theslopmachine 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/MANUAL.md +63 -0
  2. package/README.md +23 -0
  3. package/RELEASE.md +81 -0
  4. package/assets/agents/developer.md +294 -0
  5. package/assets/agents/slopmachine.md +510 -0
  6. package/assets/skills/beads-operations/SKILL.md +75 -0
  7. package/assets/skills/clarification-gate/SKILL.md +51 -0
  8. package/assets/skills/developer-session-lifecycle/SKILL.md +75 -0
  9. package/assets/skills/final-evaluation-orchestration/SKILL.md +75 -0
  10. package/assets/skills/frontend-design/SKILL.md +41 -0
  11. package/assets/skills/get-overlays/SKILL.md +157 -0
  12. package/assets/skills/planning-gate/SKILL.md +68 -0
  13. package/assets/skills/submission-packaging/SKILL.md +268 -0
  14. package/assets/skills/verification-gates/SKILL.md +106 -0
  15. package/assets/slopmachine/backend-evaluation-prompt.md +275 -0
  16. package/assets/slopmachine/beads-init.js +428 -0
  17. package/assets/slopmachine/document-completeness.md +45 -0
  18. package/assets/slopmachine/engineering-results.md +59 -0
  19. package/assets/slopmachine/frontend-evaluation-prompt.md +304 -0
  20. package/assets/slopmachine/implementation-comparison.md +36 -0
  21. package/assets/slopmachine/quality-document.md +108 -0
  22. package/assets/slopmachine/templates/AGENTS.md +114 -0
  23. package/assets/slopmachine/utils/convert_ai_session.py +1837 -0
  24. package/assets/slopmachine/utils/strip_session_parent.py +66 -0
  25. package/bin/slopmachine.js +9 -0
  26. package/package.json +25 -0
  27. package/src/cli.js +32 -0
  28. package/src/constants.js +77 -0
  29. package/src/init.js +179 -0
  30. package/src/install.js +330 -0
  31. package/src/utils.js +162 -0
@@ -0,0 +1,268 @@ package/assets/skills/submission-packaging/SKILL.md
---
name: submission-packaging
description: Final delivery packaging checklist and artifact assembly rules for repo-cwd blueprint-driven projects. Use only during the submission packaging phase.
---

# Submission Packaging

Use this skill only when the project has already passed integrated verification, hardening, and the final evaluation/remediation loop, and now qualifies for packaging.

## Usage rules

- Load this skill only in the submission packaging phase.
- Treat it as internal packaging guidance, not developer-visible workflow text.
- Do not use it to justify skipping verification or evaluation gates.
- Packaging is not cleanup theater. Preserve required evidence and remove only local junk, caches, and accidental noise.
- Do not declare packaging complete early.
- Packaging is incomplete until every required step in this skill has been executed and every required artifact path has been verified to exist.

## Completion discipline

- execute packaging in order; do not jump to the end and assume missing pieces can be filled later
- after each major packaging block, verify the expected outputs before continuing
- before closing the packaging phase, explicitly check that every required file and directory exists in its final location
- if any required artifact is missing, packaging is still in progress
- if any cleanup, move, export, or reporting step is incomplete, packaging is still in progress

## Packaging goals

- produce the exact required delivery structure
- make the package reviewable and reproducible
- include the required docs, metadata, evidence, and session artifacts
- exclude local-only junk, transient noise, and secret material

## Required final structure

The final submission layout in the parent project root must be:

- `docs/`
  - `design.md`
  - `api-spec.md` when applicable
  - `test-coverage.md`
  - `questions.md`
  - `submission-self-test-index.md`
  - `hard-threshold-explanation.md`
  - `document-completeness-report.md`
  - `engineering-architecture-quality-report.md`
  - `engineering-architecture-quality-statement.md`
  - `engineering-results-report.md`
  - `engineering-details-professionalism.md`
  - `implementation-comparison-report.md`
  - `prompt-understanding-adaptability.md`
  - `aesthetics.md` when applicable
  - `aesthetics-assessment.md` when applicable
  - evaluation reports
  - screenshots and proof materials
    - relocated screenshots and proof images needed for submission review
- current working directory delivered as parent-root `repo/`
- `../sessions/`
  - `trajectory.json`
  - `trajectory-N.json` when multiple session trajectories exist
- `../metadata.json`
- `../session.json`
- `../session-N.json` when multiple exported sessions exist
- parent-root `../.tmp/` directory moved out of current `.tmp/` when it exists

Do not treat `.ai/` as part of the final submission structure unless the user explicitly changes that requirement later.

## Required packaging actions

- verify the root package structure matches the blueprint exactly
- ensure parent-root `../metadata.json` is complete and reflects the delivered project truthfully
- create or finalize parent-root `../docs/design.md` from the working `docs/design.md`
- create or finalize parent-root `../docs/api-spec.md` from the working `docs/api-spec.md` when applicable
- create or finalize parent-root `../docs/test-coverage.md` from the working `docs/test-coverage.md`
- create or finalize parent-root `../docs/questions.md` from the accepted clarification/question record
- move the final docs set out of current `docs/` into parent-root `../docs/` after development is complete
- ensure `README.md` matches the delivered runtime, verification, and feature behavior
- include `run_tests.sh`
- relocate evaluation artifacts into parent-root `../docs/`, including:
  - backend evaluation report
  - frontend evaluation report
- relocate screenshots and proof materials relevant to runtime behavior and major flows into parent-root `../docs/`
- gather required screenshots from test-output or artifact directories and relocate them into parent-root `../docs/` during packaging
- include exported session artifacts at the parent project root using the naming rules:
  - `../session.json` for a single exported session
  - `../session-N.json` when multiple exported sessions exist
- include trajectory artifacts in `../sessions/` using the naming rules:
  - `../sessions/trajectory.json` for a single trajectory
  - `../sessions/trajectory-N.json` when multiple trajectories exist
- include any required package/tree proof or delivery evidence

## Session export sequence

For the developer session, run these exact steps:

- this sequence is mandatory
- do not skip, reorder, or replace these commands with approximations
- if one of these commands fails, stop and fix the export pipeline before continuing packaging
- do not proceed to later packaging steps until the export, strip, and trajectory conversion steps have all succeeded

1. `opencode export <developer-session-id> > ../session-export.json`
2. `python3 ~/slopmachine/utils/strip_session_parent.py ../session-export.json --output ../session.json`
3. `python3 ~/slopmachine/utils/convert_ai_session.py -i ../session.json -o ../trajectory.json`

After those steps:

- keep the cleaned final exported session as parent-root `../session.json` unless multiple exports require `../session-N.json`
- move or copy the generated trajectory into `../sessions/trajectory.json` unless multiple trajectories require `../sessions/trajectory-N.json`
- treat `../session-export.json` and the intermediate parent-root `../session.json` as temporary packaging intermediates unless the package contract later says otherwise
- pause immediately after the export/clean/convert sequence and verify that all expected directories and required files exist before running any later packaging scripts
- if the required utilities or output files are missing, packaging is not ready to continue

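The single-vs-multiple naming rule above can be sketched as a small helper. This is illustrative only; `export_names` is a hypothetical function, not part of the slopmachine toolchain:

```python
def export_names(count: int, kind: str = "session") -> list[str]:
    """Return artifact file names per the naming rules:
    one export is `<kind>.json`; multiple exports are `<kind>-N.json`."""
    if count <= 0:
        return []
    if count == 1:
        return [f"{kind}.json"]
    return [f"{kind}-{n}.json" for n in range(1, count + 1)]
```

The same rule applies to both session exports and trajectories, e.g. `export_names(2, "trajectory")` for two trajectory files.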
## Required file moves

- move the final docs set out of current `docs/` into parent-root `../docs/`
- after packaging, current `docs/` must not remain in the delivered `repo/` tree
- if current `.tmp/` exists, collect the relevant evaluation reports and proof artifacts from it into parent-root `../docs/`
- collect screenshots and other required proof materials from repo-local runtime/output directories into parent-root `../docs/`
- after relocation, the final submission should not require digging through repo-local output directories to find evidence
- find required screenshots in their generated locations and move or copy them into parent-root `../docs/` so reviewers do not have to inspect e2e artifact directories manually
- keep screenshot filenames clear enough that the referenced runtime page, flow, or evidence purpose is understandable from the file name or nearby document reference

## Repo cleanup rules

Clean the current working directory that will be delivered as parent-root `repo/` before final submission.

Remove runtime, editor, cache, and tooling noise such as:

- `.tmp/`
- `.git/`
- `.opencode/`
- `.codex/`
- `.vscode/`
- `__pycache__/`
- `.pytest_cache/`
- `node_modules/`
- `.venv/`
- `.net/`
- `AGENTS.md`
- similar environment-dependent or local-only directories

Product attachment packaging is not allowed.

Do not include environment-dependent local runtime content in the delivered package.

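A minimal sketch of that cleanup pass. The denylist mirrors the bullets above and is illustrative, not exhaustive; `clean_repo` is a hypothetical helper, not a shipped utility:

```python
import shutil
from pathlib import Path

# Entries that must not ship in the delivered repo/ tree (see list above).
JUNK = {".tmp", ".git", ".opencode", ".codex", ".vscode", "__pycache__",
        ".pytest_cache", "node_modules", ".venv", ".net", "AGENTS.md"}

def clean_repo(root: Path) -> list[str]:
    """Remove junk entries anywhere under root; return what was removed."""
    removed = []
    # Deepest-first so nested matches are handled before their parents.
    for path in sorted(root.rglob("*"), key=lambda p: -len(p.parts)):
        if path.name in JUNK and path.exists():
            shutil.rmtree(path) if path.is_dir() else path.unlink()
            removed.append(str(path.relative_to(root)))
    return removed
```

Run it against the working copy immediately before it is delivered as `repo/`, and review the returned list so required evidence is never deleted by accident.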
## Validation checklist

- confirm the final package contains what reviewers actually need to inspect the project
- confirm docs describe delivered behavior, not planned or aspirational behavior
- confirm `docs/test-coverage.md` explains the tested flows, coverage boundaries, and how the evaluator should interpret the coverage evidence
- confirm evaluation reports and screenshots have been relocated into parent-root `../docs/` and correspond to the final qualified state, not an older revision
- confirm current `docs/` has been handled correctly so final delivery docs live in parent-root `../docs/`
- confirm repo-local evidence directories have been harvested so required reports and screenshots now live in parent-root `../docs/`
- confirm required screenshots have been relocated into parent-root `../docs/`
- confirm parent-root metadata fields are populated correctly:
  - `prompt`
  - `project_type`
  - `frontend_language`
  - `backend_language`
  - `database`
  - `session_id`
  - `frontend_framework`
  - `backend_framework`
- confirm session export naming rules are followed:
  - parent-root `../session.json`
  - parent-root `../session-N.json` when multiple exported sessions exist
  - `../sessions/trajectory.json`
  - `../sessions/trajectory-N.json` when multiple trajectories exist

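The metadata-field check can be mechanized. A sketch, assuming `../metadata.json` is a flat JSON object with the fields listed in the checklist above; `missing_metadata_fields` is a hypothetical helper:

```python
import json
from pathlib import Path

# Required fields, taken from the validation checklist above.
REQUIRED_FIELDS = ["prompt", "project_type", "frontend_language",
                   "backend_language", "database", "session_id",
                   "frontend_framework", "backend_framework"]

def missing_metadata_fields(path: Path) -> list[str]:
    """Return required fields that are absent or empty in metadata.json."""
    data = json.loads(path.read_text())
    return [f for f in REQUIRED_FIELDS if not data.get(f)]
```

An empty return value means the metadata gate of the checklist passes; anything else names the fields still to fill in.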
## Submission reporting documents

Create a root document at:

- `docs/submission-self-test-index.md`

Use it as the master index of packaging/self-test outputs.

Rules:

- answer the direct questions inline in this index when a standalone document is not required
- when a standalone document is required, create the file and include its path in the index
- keep the answers tied to the real delivered project, not generic template text
- use real screenshots from the delivered product and real project evidence

Create or populate these deliverables:

1. `docs/submission-self-test-index.md`
   - answer whether the delivered product can actually run and be verified
   - link screenshots proving successful runtime pages and, when relevant, error pages
2. `docs/hard-threshold-explanation.md`
   - explain success/failure status
   - state whether the delivered product can actually be operated and verified
   - state whether the delivered product significantly deviates from the prompt topic
3. `docs/document-completeness-report.md`
   - based on `~/slopmachine/document-completeness.md`
   - create a similar table document for this codebase
   - include screenshots with real product data where applicable
4. Add to `docs/submission-self-test-index.md`
   - Self-Test Status — Delivery Completeness Statement
   - answer whether the deliverables fully cover the core prompt requirements
   - answer whether the product has a real 0-to-1 delivery form rather than partial/schematic output
   - if unable, provide a specific reason
5. `docs/engineering-architecture-quality-report.md`
   - based on `~/slopmachine/quality-document.md`
   - generate a report with the same type of information and proper formatting
   - include a screenshot of the current working directory folder structure that will be delivered as `repo/`
   - suggested method: run a tree command such as `tree . -a -L 3 > ../repo-structure.txt`, open that output in the terminal or editor, and capture a screenshot manually
6. `docs/engineering-architecture-quality-statement.md`
   - Self-test results: engineering and architecture quality statement
7. `docs/engineering-results-report.md`
   - based on `~/slopmachine/engineering-results.md`
   - generate a similar document for this codebase
8. `docs/engineering-details-professionalism.md`
   - cover these exact checks:
     - Error handling: all APIs return standard HTTP codes and JSON error format
     - Logging: key flows have structured logs
     - Input validation: all body/query/path params validated
     - No secrets/keys in config files
     - No `node_modules` / `.venv` committed
     - No stray debug statements such as `console.log("here")`
9. `docs/implementation-comparison-report.md`
   - based on `~/slopmachine/implementation-comparison.md`
   - generate a similar document for this codebase with real evidence
10. `docs/prompt-understanding-adaptability.md`
    - explain whether the delivered product accurately understands and responds to business objectives, use cases, and implicit constraints in the prompt rather than merely implementing presentation-layer technical requirements
11. `docs/aesthetics.md`
    - for fullstack or pure frontend only
    - answer whether the visual/interaction fits the scene and is aesthetically pleasing
    - if not applicable, write `Not Applicable`
    - if aesthetically unappealing, provide a specific reason
12. `docs/aesthetics-assessment.md`
    - for fullstack or pure frontend only
    - produce the aesthetics assessment in a separate document
    - if not applicable, write `Not Applicable`
    - if aesthetically unappealing, provide a specific reason

## Cleanliness rules

- remove caches, temp outputs, build junk, editor noise, and accidental local files that are not part of the package contract
- do not remove required evidence just because it looks noisy
- verify no local secret files or sensitive values are included in the final package
- verify frontend screenshots do not show demo/debug/setup leakage
- verify evaluation rebuttal notes, if any, remain attached to the relevant evaluation reports
- verify the final package does not include pointless tests or artifacts whose only purpose was to assert documentation-directory existence
- verify that required evidence is centralized under parent-root `../docs/` rather than scattered across repo-local output directories
- do not leave required screenshots stranded only inside raw e2e artifact directories

## Final packaging verification

- do one final package review before declaring packaging complete
- confirm the package is coherent as a delivered submission, not just a working repo snapshot
- confirm the final git checkpoint can be created cleanly for the packaged state
- if packaging reveals a real defect or missing artifact, fix it before closing the phase

## Required completion checklist

Do not close packaging until all of these are true:

- final docs exist in parent-root `../docs/`
- current `docs/` no longer remains in the delivered repo tree
- parent-root `../metadata.json` exists and is populated correctly
- parent-root `../session.json` or `../session-N.json` exists as required
- `../sessions/trajectory.json` or `../sessions/trajectory-N.json` exists as required
- evaluation reports are present under parent-root `../docs/`
- required screenshots and proof materials are present under parent-root `../docs/`
- required submission-report documents exist in parent-root `../docs/`
- delivered repo no longer contains forbidden junk directories
- final structure matches the required layout exactly
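The existence-based items of the checklist above can be sketched as a final gate check. This is a sketch under the layout described in this skill; `packaging_gaps` is a hypothetical helper, and it covers only path existence, not content correctness:

```python
from pathlib import Path

def packaging_gaps(parent: Path) -> list[str]:
    """Return required submission artifacts missing under the parent root."""
    gaps = [p for p in ("docs", "metadata.json", "repo")
            if not (parent / p).exists()]
    # Either session.json or at least one session-N.json must exist.
    if not (parent / "session.json").exists() and not list(parent.glob("session-*.json")):
        gaps.append("session.json")
    # Either sessions/trajectory.json or sessions/trajectory-N.json must exist.
    sessions = parent / "sessions"
    if not (sessions / "trajectory.json").exists() and not list(sessions.glob("trajectory-*.json")):
        gaps.append("sessions/trajectory.json")
    return gaps
```

An empty result satisfies the existence checks; packaging still requires the content-level confirmations in the validation checklist.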
@@ -0,0 +1,106 @@ package/assets/skills/verification-gates/SKILL.md
---
name: verification-gates
description: Owner-side review, verify-fix loop, heavy-gate interpretation, and runtime acceptance rules for repo-cwd blueprint-driven projects.
---

# Verification Gates

Use this skill after development begins whenever you are reviewing developer work, deciding acceptance, interpreting phase exits, or enforcing hard verification boundaries.

## Usage rules

- Load this skill before review, acceptance, rejection, runtime gate interpretation, hardening readiness decisions, or heavy-gate decisions.
- Treat it as owner-side review and gate guidance, not developer-visible text.
- Use `get-overlays` as the source of truth for developer-facing execution guidance.
- Use this skill as the source of truth for owner-side verification, review pressure, and gate interpretation.

## Review standard

- do not accept weak tests
- do not accept shallow Docker verification
- do not accept documentation drift
- do not accept happy-path-only implementation when failure paths matter
- do not accept unsupported claims
- do not accept work that looks complete but is not resilient
- do not accept committed secrets, hardcoded sensitive values, or sloppy env handling
- do not accept frontend/backend drift in fullstack work
- do not accept missing end-to-end coverage for major fullstack flows
- do not accept UI claims without screenshot-backed Playwright evidence when the change affects real frontend behavior
- do not accept frontend placeholder, demo, setup, or debug messaging in product-facing UI
- do not accept prototype residue such as seeded credentials, weak demo defaults, login hints, or unsanitized user-facing error behavior
- do not accept multi-tenant or cross-user security claims without negative isolation evidence when that boundary matters
- do not accept file-bearing flows without path confinement and traversal-style validation when that boundary matters
- do not accept partial `foundation` work for complex features when the prompt implies broader usable scope, infrastructure depth, or security depth than what was actually delivered
- do not accept known user-facing, release-facing, production-path, or build failures as compatible with completion unless explicitly scoped out
- do not accept frontend-bearing slice completion without checking production build health when the change materially affects frontend code or tooling
- do not accept module completion that ignores integration seams or cross-cutting consistency with the existing system
- do not accept end-to-end evidence that bypasses a required user-facing or admin-facing surface with direct API shortcuts

## Verify-fix loop

- inspect the result and the evidence, not just the developer's confidence
- review technical quality, prompt alignment, architecture impact, and verification depth of the current work
- during normal implementation iteration, prefer fast local language-native or framework-native verification for the changed area instead of forcing `run_tests.sh` every turn
- require the developer to set up and use the project-appropriate local test environment in the current working directory when normal local verification is needed
- if the local toolchain is missing, require the developer to install or enable it first; allow fallback to `run_tests.sh` only when that is not practical
- do not accept hand-wavy claims that local verification is unavailable without a real setup attempt and clear explanation
- for applicable fullstack or UI-bearing work, require local Playwright on affected flows plus screenshot review and explicit UI validation
- if verification is weak, missing, or failing, require fixes and reruns before acceptance
- if docs drift, secrets leak, contracts drift, or frontend integrity is compromised, require cleanup before acceptance
- keep looping until the current work is genuinely acceptable

## Heavy-gate definition

- a heavy gate is an owner-run integrated verification boundary, not every ordinary phase change
- a phase change alone does not automatically require a heavy gate unless that phase's exit criteria explicitly call for one
- a heavy gate normally means some combination of full clean runtime proof, full `run_tests.sh`, and Playwright plus screenshot evidence when UI or fullstack flows exist
- heavy gates are required at scaffold acceptance, integrated/full verification, and post-evaluation remediation re-acceptance
- a mid-phase extra heavy gate is allowed only when the risk profile justifies it, such as major runtime, infra, migration, auth, security, build, or cross-module integration changes
- planning acceptance, ordinary module acceptance, and routine in-phase verification are not heavy gates by default and should rely on targeted local verification unless the risk profile says otherwise

+
61
+ ## Testing cadence interpretation
62
+
63
+ - the first required `run_tests.sh` pass happens in scaffold once the clean foundation exists
64
+ - after scaffold, do not force `docker compose up --build` or `run_tests.sh` on every normal development step when faster local verification is sufficient
65
+ - prefer local targeted or native test commands during module implementation and ordinary verify-fix iteration
66
+ - local verification should run inside the current working directory using the project's own environment and tooling rather than hidden global assumptions
67
+ - during applicable fullstack or UI-bearing implementation work, require local Playwright on affected flows and review screenshots
68
+ - treat `docker compose up --build` and `run_tests.sh` as critical-gate verification commands for integrated/full verification, hard gates, and final-evaluation readiness rather than normal iteration tools
69
+ - the workflow owner handles those expensive critical-gate runs; do not require the developer to duplicate them during normal phase progression
70
+ - run `run_tests.sh` again at integrated/full verification
71
+ - integrated/full verification must also run Playwright for major flows and inspect screenshots
72
+ - run `run_tests.sh` again after post-evaluation remediation before re-acceptance
73
+ - after post-evaluation remediation affecting real flows or UI, rerun Playwright and inspect fresh screenshots before re-acceptance
74
+
## Runtime gate interpretation

Use evidence such as Bead metadata, structured Bead comments, verification command results, and file/project-state checks.

- clarification requires the `clarification-gate` conditions plus explicit approval record
- development bootstrap requires the `developer-session-lifecycle` conditions plus a fresh planning-oriented start in the current working directory with working planning docs under `docs/`
- scaffold requires evidence for `docker compose up --build`, `run_tests.sh`, baseline logging/config, and when relevant the chosen frontend stack and UI approach being set intentionally
- scaffold also requires safe env/config handling, no persisted local secrets, real migration/runtime foundations, and a usable local test environment in the current working directory when practical
- when scaffold includes prompt-critical security controls, acceptance requires real runtime or endpoint verification of the protection rather than helper-only or shape-only proof
- for security-bearing scaffolds, require applicable rejection evidence such as stale replay rejection, nonce reuse rejection, CSRF rejection on protected mutations, lockout triggering when lockout is in scope, or equivalent proof that the control is truly enforced
- Dockerized scaffold acceptance also requires self-contained Compose namespacing, no unnecessary fragile `container_name` usage, and clean startup plus teardown behavior in the intended shared-environment model
- module implementation requires module planning notes, module definition of done, relevant local verification for the changed area, and for applicable fullstack or UI work local Playwright evidence with screenshots, plus docs sync and review acceptance
- module implementation also requires integration-seam verification against adjacent modules and cross-cutting concerns where relevant, and known release-facing or build failures block acceptance unless explicitly scoped out
- module implementation acceptance should also challenge tenant isolation, path confinement, sanitized error behavior, and prototype residue when those concerns are in scope
- integrated verification requires owner-run `docker compose up --build`, owner-run `run_tests.sh`, end-to-end, Playwright, prompt-alignment, README/runtime, and cross-module evidence
- fullstack integrated verification must include Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
- if a required flow cannot be exercised through the intended UI surface, treat that as incomplete implementation rather than acceptable E2E coverage
- hardening requires security, maintainability, exploratory, and release-freeze evidence
- hardening must explicitly re-check secret handling, redaction, and frontend/backend observability hygiene
- final evaluation readiness requires automated evaluation to be complete and triaged, with a clear go-to-packaging vs return-to-fixes decision
- remediation requires accepted issue records plus rerun verification, and after post-evaluation remediation it requires an owner-run fresh `run_tests.sh` pass and Playwright rerun where applicable before re-acceptance

## Hardening and pre-evaluation discipline

When all planned modules are complete:

- run integrated verification
- run hardening and exploratory testing
- for fullstack applications, rerun Playwright coverage for major flows and inspect screenshots for frontend regressions or weak UX
- enforce release-candidate freeze
- allow only fixes, verification improvements, doc corrections, and packaging work
- prepare the package and evidence cleanly before the final evaluation decision gate
@@ -0,0 +1,275 @@ package/assets/slopmachine/backend-evaluation-prompt.md
You are the reviewer responsible for Delivery Acceptance and Project Architecture Audit.

In the current working directory, review the project against the Business / Task Prompt and the Acceptance / Scoring Criteria.

[Business / Task Prompt]
{prompt}

[Acceptance / Scoring Criteria (the only authority)]
{

11
+ 1. Hard Gates
12
+
13
+ 1.1 Whether the delivered project can actually be run and verified
14
+
15
+ - Whether clear startup or run instructions are provided
16
+ - Whether the project can be started or run without modifying core code
17
+ - Whether the actual runtime behavior is broadly consistent with the delivery documentation
18
+
19
+ 1.2 Whether the delivered project materially deviates from the Prompt
20
+
21
+ - Whether the implementation is centered on the business goal or usage scenario described in the Prompt
22
+ - Whether there are major parts of the implementation that are only loosely related, or unrelated, to the Prompt
23
+ - Whether the project replaces, weakens, or ignores the core problem definition in the Prompt without justification
24
+
25
+ 2. Delivery Completeness
26
+
27
+ 2.1 Whether the delivered project fully covers the core requirements explicitly stated in the Prompt
28
+
29
+ - Whether all explicitly stated core functional requirements in the Prompt are implemented
30
+
31
+ 2.2 Whether the delivered project represents a basic end-to-end deliverable from 0 to 1, rather than a partial feature, illustrative implementation, or code fragment
32
+
33
+ - Whether mock / hardcoded behavior is used in place of real logic without explanation
34
+ - Whether the project includes a complete project structure rather than scattered code or a single-file example
35
+ - Whether basic project documentation is provided, such as a README or equivalent
36
+
37
+ 3. Engineering and Architecture Quality
38
+
39
3.1 Whether the project adopts a reasonable engineering structure and module decomposition for the scale of the problem

- Whether the project structure is clear and module responsibilities are reasonably defined
- Whether the project contains redundant or unnecessary files
- Whether the implementation is excessively crammed into a single file

3.2 Whether the project shows basic maintainability and extensibility, rather than being a throwaway or hastily assembled implementation

- Whether there are obvious signs of chaotic structure or tight coupling
- Whether the core logic leaves room for extension rather than being completely hard-coded

4. Engineering Details and Professionalism

4.1 Whether the engineering details and overall shape reflect professional software practice, including but not limited to error handling, logging, validation, and API design

- Whether error handling is basically reliable and user-friendly
- Whether logs support troubleshooting rather than being random print statements or completely absent
- Whether necessary validation is present for key inputs and boundary conditions

4.2 Whether the project is organized like a real product or service, rather than remaining at the level of an example or demo

- Whether the overall deliverable resembles a real application instead of a teaching sample or demonstration-only project

5. Prompt Understanding and Requirement Fit

5.1 Whether the project accurately understands and responds to the business goal, usage scenario, and implicit constraints described in the Prompt, rather than merely implementing surface-level technical features

- Whether the core business objective in the Prompt is implemented correctly
- Whether there are obvious misunderstandings of the requirement semantics or deviations from the actual problem
- Whether key constraints in the Prompt are changed or ignored without explanation

6. Aesthetics (frontend-only / full-stack tasks only)

6.1 Whether the visual and interaction design fits the scenario and demonstrates reasonable visual quality

- Whether different functional areas of the page are visually distinguishable through background, spacing, separation, or hierarchy
- Whether the overall layout is reasonable, and whether alignment, spacing, and proportions are broadly consistent
- Whether UI elements, including text, images, and icons, render and display correctly
- Whether visual elements are consistent with the page theme and textual content, and whether there are obvious mismatches between images, illustrations, decorative elements, and the actual content
- Whether basic interaction feedback is provided, such as hover states, click states, or transitions, so users can understand the current interaction state
- Whether fonts, font sizes, colors, and icon styles are generally consistent, without obvious visual inconsistency or mixed design language
}

Review Objective

Determine whether the delivered project is a credible, runnable, prompt-aligned, and minimally professional 0-to-1 deliverable.

Priority Order

1. Delivery runnability boundary
2. Prompt requirement fit
3. Security-critical flaws
4. Test sufficiency
5. Major engineering quality issues
6. Frontend aesthetics (only if clearly applicable)

Execution Rules

1. Review only the highest-impact findings that can change the final verdict.
Do not perform exhaustive enumeration of every secondary or tertiary checklist item.

2. Do not relax standards for:

- security
- prompt-fit
- delivery completeness
- test sufficiency
- evidence for material conclusions

3. Do not skip any issue that could independently cause a Fail or Partial Pass verdict.

4. If a security, prompt-fit, runnability, or core test-sufficiency issue is suspected, continue investigation until it is either evidenced or explicitly marked Cannot Confirm.

5. Stop after either:

- identifying up to 10 findings total, or
- identifying up to 5 High / Blocker findings,

whichever comes first.

6. Do not modify project code.

7. Use evidence only for material conclusions.
For any conclusion that changes the final verdict, provide concrete, traceable evidence using file path + line number, tool output, or explicit runtime result.

8. If evidence is insufficient, do not guess.
Use "Cannot Confirm" or explicitly label the assumption and its boundary.

9. Perform runtime verification only when all of the following are true:

- the command is explicitly documented
- no Docker is required
- no Docker-related command is required
- no container orchestration is required
- no privileged system access is required
- no external network / third-party dependency is required
- expected completion is short

10. Never run Docker-related commands.
This includes, but is not limited to:

- docker
- docker compose
- docker-compose
- podman
- container runtime / orchestration commands with equivalent effect
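
Rules 9 and 10 amount to a mechanical pre-flight gate, which could be sketched as below. This is an illustrative helper, not part of the slopmachine package; the blocklist entries beyond the tools named above (`nerdctl`, `kubectl`) and the 60-second interpretation of "short" are this sketch's own assumptions.

```python
import shlex

# Container tools that must never be executed (Rule 10). "nerdctl" and
# "kubectl" are added here as examples of "equivalent effect" commands.
BLOCKED = {"docker", "docker-compose", "podman", "nerdctl", "kubectl"}

def is_container_command(command: str) -> bool:
    """True if any token of the command names a blocked container tool,
    directly or via an absolute path like /usr/bin/podman."""
    return any(tok.rsplit("/", 1)[-1] in BLOCKED for tok in shlex.split(command))

def may_run(command: str, documented: bool, needs_network: bool,
            needs_privilege: bool, expected_seconds: float) -> bool:
    """Apply the Rule 9 gate: run only documented, local, unprivileged,
    short, container-free commands. The 60 s cutoff is an assumption."""
    return (documented
            and not needs_network
            and not needs_privilege
            and expected_seconds <= 60
            and not is_container_command(command))
```

Note that `is_container_command` also catches `docker compose`, because the leading `docker` token alone triggers the block.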

11. If verification would require Docker or any container-related command, do not execute it.
Instead:

- state that Docker-based runtime verification was not performed
- treat it as a verification boundary, not automatically as a project defect
- provide the local reproduction command(s) the user can run
- state what was confirmed statically
- state what remains unconfirmed

12. Docker non-execution is a verification constraint, not a project defect by itself.
Only report a defect if the project itself lacks runnable documentation, has broken setup logic, or shows static evidence of delivery failure.

13. Security review has priority over style issues.
Always assess:

- authentication entry points
- route-level authorization
- object-level authorization
- tenant / user isolation

Expand security review further only if relevant code paths exist, such as:

- admin / internal / debug endpoints
- function-level authorization
- privilege escalation paths

14. Test review is mandatory, but do not build a full requirement-to-test traceability matrix.
Assess only whether tests sufficiently cover:

- the core business happy path
- major failure paths such as validation failure, 401, 403, 404, 409 where relevant
- security-critical areas
- obvious high-risk boundaries directly relevant to the business flow

15. For test coverage, state only:

- covered / partially covered / missing / cannot confirm
- one or two supporting evidence points
- the minimum additional test needed if coverage is weak

16. Logging review is mandatory but concise.
Assess only:

- whether logging exists for meaningful troubleshooting
- whether logging categories are reasonably clear if present
- whether there is obvious sensitive-data leakage risk in logs or responses

17. Mock / stub / fake behavior is not a defect by itself unless the Prompt or documentation requires real integration.
If present, explain only:

- the mock scope
- how it is enabled
- whether there is obvious accidental production-use risk

18. Do not continue searching for additional low-severity issues after the final verdict is already supportable.

19. Do not read unrelated files once enough evidence has been collected to support the verdict and top findings.

Required Output Format

Return exactly these sections:

1. Verdict

- Pass / Partial Pass / Fail / Cannot Confirm

2. Scope and Verification Boundary

- what was reviewed
- what was not executed
- whether Docker-based verification was required but not executed
- what remains unconfirmed

3. Top Findings

- up to 10 findings only
- each finding must include:
  - Severity: Blocker / High / Medium / Low
  - Conclusion
  - Brief rationale
  - Evidence
  - Impact
  - Minimum actionable fix
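
The finding structure above, together with the Rule 5 stop condition, could be modeled as a small record type plus a budget check. The names below are illustrative only and are not provided by the package.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    BLOCKER = "Blocker"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"

@dataclass
class Finding:
    severity: Severity
    conclusion: str
    rationale: str    # brief rationale
    evidence: str     # file path + line number, tool output, or runtime result
    impact: str
    minimum_fix: str  # minimum actionable fix

def within_budget(findings: list[Finding]) -> bool:
    """Rule 5 stop condition: at most 10 findings total,
    at most 5 High/Blocker findings."""
    severe = sum(f.severity in (Severity.BLOCKER, Severity.HIGH) for f in findings)
    return len(findings) <= 10 and severe <= 5
```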

4. Security Summary

- authentication
- route authorization
- object-level authorization
- tenant / user isolation

For each, return:

- Pass / Partial Pass / Fail / Cannot Confirm
- brief evidence or verification boundary

5. Test Sufficiency Summary

Return:

- Test Overview
  - whether unit tests exist
  - whether API / integration tests exist
  - obvious test entry points if present
- Core Coverage
  - happy path: covered / partial / missing / cannot confirm
  - key failure paths: covered / partial / missing / cannot confirm
  - security-critical coverage: covered / partial / missing / cannot confirm
- Major Gaps
  - up to 3 highest-risk missing tests
- Final Test Verdict
  - Pass / Partial Pass / Fail / Cannot Confirm

6. Engineering Quality Summary

Assess only major maintainability / architecture concerns that materially affect delivery confidence.

7. Next Actions

- up to 5 minimum actions only
- prioritize by severity and unblock value
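
The seven required sections can be assembled mechanically in fixed order. The sketch below shows one way to do that; `render_report` is a hypothetical helper, not something the package ships, and rendering a missing section as "Cannot Confirm" is this sketch's own choice.

```python
# The seven sections required above, in fixed order.
REPORT_SECTIONS = [
    "Verdict",
    "Scope and Verification Boundary",
    "Top Findings",
    "Security Summary",
    "Test Sufficiency Summary",
    "Engineering Quality Summary",
    "Next Actions",
]

def render_report(bodies: dict[str, str]) -> str:
    """Join section bodies under numbered headings; a missing section is
    rendered as "Cannot Confirm" rather than silently dropped."""
    parts = []
    for i, name in enumerate(REPORT_SECTIONS, start=1):
        parts.append(f"{i}. {name}\n\n{bodies.get(name, 'Cannot Confirm')}")
    return "\n\n".join(parts)
```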

Final Verification Before Output

Before finalizing, check all of the following:

1. Does each material conclusion have supporting evidence?
2. Are any claims stronger than the evidence supports?
3. If unsupported observations are removed, does the final verdict still hold?
4. Has any uncertain point been incorrectly presented as a confirmed fact?
5. Has security or test sufficiency been judged too loosely without evidence?
6. Has any Docker non-execution boundary been incorrectly described as a confirmed runtime failure?

If file writing is supported, save the final report to a markdown file.
Otherwise, return the report in-chat.