theslopmachine 0.3.0
- package/MANUAL.md +63 -0
- package/README.md +23 -0
- package/RELEASE.md +81 -0
- package/assets/agents/developer.md +294 -0
- package/assets/agents/slopmachine.md +510 -0
- package/assets/skills/beads-operations/SKILL.md +75 -0
- package/assets/skills/clarification-gate/SKILL.md +51 -0
- package/assets/skills/developer-session-lifecycle/SKILL.md +75 -0
- package/assets/skills/final-evaluation-orchestration/SKILL.md +75 -0
- package/assets/skills/frontend-design/SKILL.md +41 -0
- package/assets/skills/get-overlays/SKILL.md +157 -0
- package/assets/skills/planning-gate/SKILL.md +68 -0
- package/assets/skills/submission-packaging/SKILL.md +268 -0
- package/assets/skills/verification-gates/SKILL.md +106 -0
- package/assets/slopmachine/backend-evaluation-prompt.md +275 -0
- package/assets/slopmachine/beads-init.js +428 -0
- package/assets/slopmachine/document-completeness.md +45 -0
- package/assets/slopmachine/engineering-results.md +59 -0
- package/assets/slopmachine/frontend-evaluation-prompt.md +304 -0
- package/assets/slopmachine/implementation-comparison.md +36 -0
- package/assets/slopmachine/quality-document.md +108 -0
- package/assets/slopmachine/templates/AGENTS.md +114 -0
- package/assets/slopmachine/utils/convert_ai_session.py +1837 -0
- package/assets/slopmachine/utils/strip_session_parent.py +66 -0
- package/bin/slopmachine.js +9 -0
- package/package.json +25 -0
- package/src/cli.js +32 -0
- package/src/constants.js +77 -0
- package/src/init.js +179 -0
- package/src/install.js +330 -0
- package/src/utils.js +162 -0
@@ -0,0 +1,75 @@
---
name: developer-session-lifecycle
description: Startup, canonical developer session persistence, recovery, and initial project structure rules for repo-cwd tracked development.
---

# Developer Session Lifecycle

Use this skill during startup, tracked developer-session creation, and recovery.

## Usage rules

- Load this skill before starting the canonical developer session.
- Load it again during any recovery or session-consistency check.
- Treat it as internal orchestration guidance, not developer-visible text.

## Canonical developer session

- use the current working directory as the live codebase and start the fresh developer session in it
- ensure the parent project root has the required supporting structure, especially `../sessions/`
- record the developer session id immediately in Beads
- persist the developer session id in more than one durable place
- reuse that same session throughout development and remediation whenever possible
- if the developer session crashes, resume it and continue the same work loop
- if the owner process crashes, recovery happens externally; when restarted, recover from state and continue

## Startup contract

Expect to start from:

- a project prompt
- tech stack information when it is not already clear from the prompt

Optional startup inputs may include:

- task id
- project type
- explicit constraints or preferences

## Startup flow

1. receive the prompt and stack context
2. create `../.ai/metadata.json` for internal workflow state
3. initialize parent-root `../metadata.json` with the required schema and store the full prompt text in `prompt`
4. initialize root workflow state and top-level phase Beads
5. complete clarification using the clarification skill
6. stop for approval before development starts
7. ensure the parent project root has the required working structure, especially `../sessions/`
8. start the canonical developer session
9. send `Let's plan this project: <original-prompt>` as the first message in that session
10. wait for the developer's first exchange
11. send the approved clarification prompt as the next guidance message
12. continue orchestration from there
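
Steps 2, 3, and 7 of the startup flow could be sketched roughly as below. This is an illustration under assumptions, not the package's implementation: the helper name `init_startup_state` is hypothetical, the "required schema" is not spelled out in this skill, so only the `prompt` field named in step 3 is written, and the empty object written to `../.ai/metadata.json` is a placeholder.

```python
import json
from pathlib import Path


def init_startup_state(repo_cwd: Path, prompt: str) -> dict:
    """Create the parent-root files and directories named in the startup flow.

    Hypothetical helper: only the paths and the `prompt` field come from the
    skill text; everything else here is an assumption for illustration.
    """
    parent_root = repo_cwd.resolve().parent

    # Step 2: internal workflow state lives in ../.ai/metadata.json.
    ai_dir = parent_root / ".ai"
    ai_dir.mkdir(parents=True, exist_ok=True)
    ai_metadata = ai_dir / "metadata.json"
    if not ai_metadata.exists():
        # Placeholder content; the real internal schema is not described here.
        ai_metadata.write_text(json.dumps({}, indent=2), encoding="utf-8")

    # Step 3: ../metadata.json stores the full prompt text in `prompt`.
    metadata = {"prompt": prompt}
    (parent_root / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )

    # Step 7: the session artifact directory ../sessions/ must exist.
    (parent_root / "sessions").mkdir(exist_ok=True)

    return metadata
```

Resolving the parent root from the working directory keeps the sketch consistent with the `../` paths used throughout this skill.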

## Initial structure rule

- during development, working technical docs live under the current working directory `docs/`
- parent-root `../docs/` is a final delivery structure created or finalized during submission packaging
- parent-root `../sessions/` is the session artifact directory for exported conversation traces

## Recovery rule

- orchestrator restart is handled externally
- developer-session restart is your responsibility
- on recovery, read root Bead metadata, `../.ai/metadata.json`, `../metadata.json`, current phase Bead, latest `SESSION:` comment, latest unresolved `ISSUE:` comments, and any persistent session record before continuing
- treat resume as deterministic state recovery, not guesswork

## Session persistence rule

- store the canonical developer session id in root Bead metadata as `session_id`
- mirror it in a `SESSION:` comment on the root Bead
- mirror it in `../.ai/metadata.json`
- mirror it in parent-root `../metadata.json` as `session_id`
- once created, treat that session id as locked unless an explicit reset policy is introduced later
- if these records disagree, stop and resolve the inconsistency before continuing
- do not silently create a replacement primary developer session if the canonical one can still be resumed
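
The "stop if these records disagree" rule can be illustrated with a small consistency check. The sketch below is hypothetical: it takes the ids already read from each location, because how Beads metadata and comments are read is not described here, and it assumes a missing record counts as a disagreement.

```python
from typing import Mapping, Optional


class SessionIdMismatch(RuntimeError):
    """Raised when the persisted developer session ids disagree."""


def resolve_canonical_session_id(records: Mapping[str, Optional[str]]) -> str:
    """Return the single session id shared by every record, or raise.

    `records` maps a label for each storage location to the id read from it
    (None when missing). Reading the values is left to the caller.
    """
    present = {label: sid for label, sid in records.items() if sid}
    if not present:
        raise SessionIdMismatch("no developer session id is recorded anywhere")

    distinct = set(present.values())
    if len(distinct) > 1:
        raise SessionIdMismatch(f"session id records disagree: {present}")

    # Assumption: a location with no id is treated as an inconsistency too.
    missing = sorted(set(records) - set(present))
    if missing:
        raise SessionIdMismatch(f"session id missing from: {', '.join(missing)}")

    return distinct.pop()
```

A caller would pass one entry per location listed above, for example the root Bead metadata, the `SESSION:` comment, `../.ai/metadata.json`, and `../metadata.json`, and only continue when the function returns.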

@@ -0,0 +1,75 @@
---
name: final-evaluation-orchestration
description: Dual final-evaluation workflow, prompt composition, triage, report integrity, and bounded remediation-loop rules for repo-cwd execution.
---

# Final Evaluation Orchestration

Use this skill only after integrated verification and hardening are complete enough for final evaluation.

## Usage rules

- Load this skill only during final evaluation and evaluation-driven remediation decisions.
- Treat it as internal orchestration guidance.
- Do not let evaluator findings automatically override your triage judgment.

## Prompt sources

- `~/slopmachine/backend-evaluation-prompt.md`
- `~/slopmachine/frontend-evaluation-prompt.md`

## Evaluation execution rules

- when the project reaches final-evaluation readiness, run two separate evaluations:
  - backend/non-frontend evaluation using `~/slopmachine/backend-evaluation-prompt.md`
  - frontend evaluation using `~/slopmachine/frontend-evaluation-prompt.md`
- read the full original project prompt from parent-root `../metadata.json` field `prompt`
- read the respective evaluation prompt file contents yourself before launching evaluation
- compose each evaluation request yourself as one large final prompt block
- prefix each evaluation request with a clear instruction that the reviewer must work in the current project directory and evaluate that delivered project
- inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content
- send that fully composed text block directly to the fresh `General` evaluator session
- never tell the evaluator to go read prompt files, metadata files, or evaluation template paths on its own
- never send only a path, filename, or shorthand reference and expect the evaluator to assemble the prompt itself
- run each evaluation in its own fresh ephemeral `General` session only
- never reuse, resume, or continue a prior evaluation session
- run the two evaluations sequentially, not in parallel, so shared runtime state, Docker containers, ports, databases, and artifacts do not conflict
- track backend and frontend evaluation status separately
- once backend evaluation passes, do not run backend evaluation again in later remediation rounds
- once frontend evaluation passes, do not run frontend evaluation again in later remediation rounds
- require each evaluation session to produce its own detailed evaluation report artifact
- always compare both evaluations against the original prompt for alignment, not just the delivered implementation

On the first evaluation round, preserve and use this instruction exactly inside both evaluation prompts/sessions:

"Please confirm whether the current project tests are genuine and effective rather than superficial or fake tests, whether the API tests actually invoke real HTTP endpoints, and whether they cover more than 90% of the overall API surface."
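
The composition rules above (prefix, `{prompt}` injection, and the first-round instruction) might be sketched as follows. The function name, the exact prefix wording, and the choice to append the first-round instruction at the end are assumptions; only the `{prompt}` placeholder and the quoted instruction come from this skill.

```python
# Quoted verbatim from the skill text.
FIRST_ROUND_INSTRUCTION = (
    "Please confirm whether the current project tests are genuine and effective "
    "rather than superficial or fake tests, whether the API tests actually invoke "
    "real HTTP endpoints, and whether they cover more than 90% of the overall API surface."
)

# Assumed wording; the skill only asks for "a clear instruction" to this effect.
WORKDIR_PREFIX = (
    "Work in the current project directory and evaluate the project delivered there."
)


def compose_evaluation_request(template: str, original_prompt: str, first_round: bool) -> str:
    """Build one self-contained text block to send to a fresh evaluator session."""
    if "{prompt}" not in template:
        raise ValueError("evaluation template has no {prompt} placeholder")

    # str.replace instead of str.format, so other braces in the template
    # or in the project prompt are left untouched.
    body = template.replace("{prompt}", original_prompt)

    parts = [WORKDIR_PREFIX, body]
    if first_round:
        parts.append(FIRST_ROUND_INSTRUCTION)
    return "\n\n".join(parts)
```

The caller is expected to have read the template file and the `prompt` field itself, so the evaluator receives the full text and never a path.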

## Triage rules

- read both reports and merge the findings into one explicit triage set before deciding what happens next
- use the evaluator's priority ordering directly when triaging findings
- any finding marked `Blocker` or `High` must be returned to the original `Developer` session for remediation
- findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
- findings marked `Low` may be passed without remediation
- do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
- if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
- evaluation findings are advisory inputs to triage, not automatic implementation orders
- if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
- if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
- challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
- never edit or rewrite the evaluation report itself
- if you need to add context, disagreement, or justification, append it only as a clearly labeled `User comment/message` section at the bottom of the report
- do not enter remediation just because a new evaluation found something; the evaluator will almost always find additional issues
- once all `Blocker` and `High` findings are resolved, do not loop forever chasing every newly surfaced `Medium` or `Low` issue
- after each re-evaluation, decide explicitly whether the remaining findings still justify another remediation round or whether the project is now qualified to package
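
As a rough sketch of the severity rules above, the merged triage set could be bucketed by a default decision per severity. The `Finding` shape and the names below are hypothetical, and the result is only a starting point: the rules still let the owner defend the project against test-coverage complaints and challenge weak findings.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REMEDIATE = "remediate"            # return to the Developer session
    OWNER_JUDGMENT = "owner_judgment"  # usually fix, may pass in limited cases
    PASS = "pass"                      # may pass without remediation


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str  # "Blocker", "High", "Medium", or "Low", as labeled by the evaluator
    track: str     # "backend" or "frontend"


def default_decision(finding: Finding) -> Decision:
    """Map the evaluator's severity label to the default triage outcome."""
    if finding.severity in ("Blocker", "High"):
        return Decision.REMEDIATE
    if finding.severity == "Medium":
        return Decision.OWNER_JUDGMENT
    if finding.severity == "Low":
        return Decision.PASS
    raise ValueError(f"unknown severity: {finding.severity!r}")


def merge_and_triage(backend: list, frontend: list) -> dict:
    """Merge both reports into one triage set, grouped by default decision."""
    merged = [*backend, *frontend]
    return {d: [f for f in merged if default_decision(f) is d] for d in Decision}
```

Keeping the track on each finding makes it possible to see which evaluation a remediation item came from after the two reports are merged.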

## Remediation loop

- route accepted blocking issues back into remediation in the same long-lived `Developer` session
- after remediation, rerun full verification before any re-evaluation:
  - `docker compose up --build`
  - `run_tests.sh`
  - Playwright where applicable, with fresh screenshots
- rerun only the evaluation tracks that have not already passed, each in a brand new fresh `General` session and still sequentially
- keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
- remember the external process allows a maximum of 3 repair rounds
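
One way to keep the loop "bounded and explicit" is to track per-track status and the repair-round count in a small state object. The class below is a hypothetical sketch; the limit of 3 comes from the text, while the backend-then-frontend order is an assumption.

```python
from dataclasses import dataclass, field

MAX_REPAIR_ROUNDS = 3  # stated limit of the external process


@dataclass
class EvaluationState:
    """Tracks which evaluation tracks passed and how many repair rounds ran."""

    passed: dict = field(default_factory=lambda: {"backend": False, "frontend": False})
    repair_rounds_used: int = 0

    def tracks_to_rerun(self) -> list:
        # Only tracks that have not passed are re-evaluated. The fixed order
        # keeps reruns sequential (backend first is an assumption).
        return [t for t in ("backend", "frontend") if not self.passed[t]]

    def start_repair_round(self) -> int:
        """Record a new repair round, refusing to exceed the stated maximum."""
        if self.repair_rounds_used >= MAX_REPAIR_ROUNDS:
            raise RuntimeError("repair round limit reached")
        self.repair_rounds_used += 1
        return self.repair_rounds_used
```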

@@ -0,0 +1,41 @@
---
name: frontend-design
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
---

This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.

The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.

## Design Thinking

Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?

**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.

Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail

## Frontend Aesthetics Guidelines

Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.

NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.

Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.

**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.

Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.

@@ -0,0 +1,157 @@
---
name: get-overlays
description: Loads the developer phase overlays so the workflow owner can extract only the relevant guidance for the current implementation phase without naming workflow internals to the developer.
---

# Get Overlays

Use this skill when you need the detailed developer overlay guidance for one of the active implementation phases.

## Usage rules

- Load this skill when entering an overlay-backed developer phase or when switching to a different overlay-backed developer phase.
- Do not mention the skill or the word `overlay` to the developer.
- Treat this content as internal scaffolding for composing a natural developer message.
- Pass only the relevant guidance for the current engineering step, not the whole section verbatim unless the moment truly needs it.
- Prefer short, natural teammate-style prompts.

## Phase mapping

- `P2 Development Bootstrap and Planning` -> `Planning And Design`
- `P3 Scaffold and Foundation` -> `Scaffold And Foundation`
- `P4 Module Implementation` -> `Module Implementation`
- `P5 Ongoing Verification` -> `Verification And Review`
- `P6 Integrated Verification` -> `Verification And Review`
- `P7 Hardening` -> `Hardening`
- `P9 Remediation` -> `Remediation`
- `P10 Submission Packaging` -> `Packaging Preparation`
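
The mapping above translates directly into a lookup table. The sketch below is illustrative only; the names `PHASE_TO_OVERLAY` and `overlay_for_phase` are hypothetical, and returning `None` for a phase that is not listed is an assumption about how unmapped phases should be handled.

```python
from typing import Optional

# Phase names and overlay section titles exactly as listed in the mapping.
PHASE_TO_OVERLAY = {
    "P2 Development Bootstrap and Planning": "Planning And Design",
    "P3 Scaffold and Foundation": "Scaffold And Foundation",
    "P4 Module Implementation": "Module Implementation",
    "P5 Ongoing Verification": "Verification And Review",
    "P6 Integrated Verification": "Verification And Review",
    "P7 Hardening": "Hardening",
    "P9 Remediation": "Remediation",
    "P10 Submission Packaging": "Packaging Preparation",
}


def overlay_for_phase(phase: str) -> Optional[str]:
    """Return the overlay section for a phase, or None if it has no overlay."""
    return PHASE_TO_OVERLAY.get(phase)
```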

## Planning And Design

- start from the actual project prompt and build the plan from there
- carry the settled project requirements forward consistently as you plan
- identify the hard non-negotiable requirements early and do not quietly trade them away for implementation convenience
- break the problem into explicit requirements, constraints, flows, boundaries, and edge cases
- map each meaningful requirement to its owning module, visible UI/API surface, failure behavior, test target, and final acceptance check
- create or update working design notes and API/spec notes when relevant
- keep the spec focused on required behavior rather than turning it into a progress or completion narrative
- define major modules as meaningful delivery units, not arbitrary folders
- for fullstack work, map frontend surfaces, routes, components, and state boundaries to the backend modules and contracts that support them
- define failure paths, permissions, validation, logging, runtime assumptions, and test strategy before coding
- define logging and observability expectations for both frontend and backend
- call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
- define end-to-end coverage for major user flows before coding
- for fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable
- when UI-bearing flows are material, explicitly plan screenshot review as part of Playwright verification so UI correctness is checked, not just browser success
- aim for at least 90 percent meaningful coverage of the relevant behavior surface
- define verification strategy, Docker expectations, and documentation implications before coding
- make the plan detailed enough to guide real implementation and later verification
- review the module map and make sure it is stable before deeper implementation begins
- do not move into deeper implementation with vague architecture or unstable module boundaries
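
The requirement-mapping item in the list above (owning module, visible UI/API surface, failure behavior, test target, final acceptance check) can be pictured as a small record per requirement. This is a hypothetical sketch; the `RequirementTrace` name and its field names are assumptions chosen to mirror that bullet.

```python
from dataclasses import dataclass, fields


@dataclass
class RequirementTrace:
    """One planned requirement and where it is owned, surfaced, and verified."""

    requirement: str
    owning_module: str = ""
    surface: str = ""            # visible UI/API surface
    failure_behavior: str = ""
    test_target: str = ""
    acceptance_check: str = ""

    def missing(self) -> list:
        """Names of the mapping fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

For example, a trace that only names its module and surface would report `failure_behavior`, `test_target`, and `acceptance_check` as still missing, which is a quick signal that the plan is not yet detailed enough.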

## Scaffold And Foundation

- create the initial project structure intentionally
- establish Docker as the main runtime contract
- create `run_tests.sh` as the standard test entrypoint
- create required testing directories and baseline docs structure
- put baseline config and logging structure in place
- put migrations, worker/job foundation, and real runtime health surfaces in place when the project needs them
- keep real secrets out of the repository and rely on Docker-managed runtime injection for any sensitive values
- keep committed env files to placeholders or clearly non-production defaults only
- remove prototype residue from runtime foundations: no placeholder titles, hidden setup, fake defaults, or seeded live-path assumptions
- make prompt-critical runtime behavior visible in the scaffold instead of hand-waving it for later, especially offline, worker, backup, or HTTPS requirements
- establish README structure early instead of leaving it until the end
- prove the scaffold in a clean state before deeper feature work
- verify `docker compose up` and `run_tests.sh` in the clean scaffold state
- do not treat scaffold as placeholder boilerplate or rely on hidden setup

## Module Implementation

- work module by module in vertical slices
- define lightweight planning notes for the module before coding
- define the module's purpose, constraints, and edge cases before coding
- keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
- implement real behavior, not partial scattered logic
- handle failure paths and boundary conditions
- add or update tests as part of the module work
- prefer fast local language-native or framework-native test commands for the changed area during normal iteration
- set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
- if the local toolchain is missing, try to install or enable it before falling back to `run_tests.sh`
- for applicable fullstack or UI-bearing work, run local Playwright on the affected flows during implementation and inspect screenshots to confirm the UI actually matches
- make sure the module is moving toward full definition-of-done completion, not just happy-path completion
- keep auth, authorization, ownership, validation, and logging concerns in view when relevant
- keep frontend and backend contracts synchronized when the module spans both sides
- do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
- when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
- do not leave computed-but-unrendered or partially surfaced requirement behavior in place
- do not ship frontend screens with demo/debug/setup messaging or development-only status text; product UI should serve the real workflow only
- use the `frontend-design` skill for frontend component or page work
- use the `frontend-design` skill during frontend/UI verification when reviewing Playwright screenshots and tightening the interface
- do not hardcode secrets or persist local sensitive values in the repo while implementing
- update relevant docs when behavior changes
- verify the module against its planned behavior before trying to move on
- do not move on while the module is still obviously weak or half-finished

## Verification And Review

- run the relevant tests for the changed behavior
- during normal in-phase verification, prefer the fastest meaningful local test commands for the changed area
- if local tooling is unavailable, try to install or enable the repo-local test setup when practical; otherwise fall back to `run_tests.sh`
- the workflow owner handles the expensive critical-gate runs for `docker compose up --build` and `run_tests.sh`; use local verification to prepare for those gates rather than duplicating them casually
- integrated/full verification should rely on owner-run `docker compose up --build`, owner-run `run_tests.sh`, and Playwright gate evidence
- after post-evaluation remediation, strengthen local verification and rerun affected Playwright checks so the next owner-run gate pass is likely to succeed
- for applicable fullstack or UI-bearing work, run Playwright for the affected flows in-phase, capture screenshots, and verify the UI behavior and quality directly
- rerun the runtime startup path when the phase expects it
- verify requirement closure, not just feature existence
- verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
- verify end-to-end flow behavior where the change affects real workflows
- for fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
- use `frontend-design` while reviewing frontend screenshots so UI issues are challenged, not just functionally observed
- verify screenshots do not contain demo placeholders, scaffold instructions, debug notices, or other development-only UI leakage
- verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
- verify required remediation guidance is actually visible to the user, not just computed internally
- verify security-sensitive behavior where applicable
- verify secrets are not committed, hardcoded, or leaking through logs/config/docs
- verify docs do not overstate implementation completeness or claim behavior that is only partial
- verify docs still match implementation reality
- trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
- do not treat a module as done until functional behavior, failure behavior, tests, docs, security considerations, and required runtime verification are all in place
- call out weak evidence, missing coverage, or unresolved issues plainly
- do not treat developer claims as enough without real verification

## Hardening

- audit security boundaries, validation, ownership, and secret handling
- audit env/config paths so sensitive values are injected safely and are not baked into committed files or images
- inspect architecture, coupling, file size, and maintainability risks
- check for bad engineering practices that accumulated during implementation
- tighten weak tests, weak docs, and weak operational instructions
- run exploratory testing around awkward states, repeated actions, and realistic edge behavior
- re-check frontend and backend observability, redaction, and operator visibility paths
- run a prompt-fidelity sweep for silent requirement substitution, partially delivered hard requirements, and frontend/backend mismatch
- run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
- re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
- enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
- make sure the system is genuinely reviewable and reproducible

## Packaging Preparation

- move or copy the final development docs into the required root `docs/` structure
- make sure package structure matches the blueprint exactly
- verify required artifacts are present and named correctly
- prepare screenshots, proof materials, and other delivery evidence
- review package cleanliness so caches, junk, and local-only files are not included
- verify no local secret files or sensitive values are included in the package
- verify docs, specs, and runtime instructions describe what is actually delivered rather than what was only planned or partially built
- archive or exclude transient review artifacts so delivery evidence stays intentional and clean

## Remediation

- focus only on accepted defects and the work needed to fix them cleanly
- fix the issue completely instead of layering hacks on top
- trace the fix back to the original requirement so the remediation restores fidelity instead of only hiding the symptom
- rerun the relevant verification after each fix
- if the issue exposed drift, docs overclaim, or missing acceptance coverage, repair that too before closing the issue
- update docs if behavior or instructions changed
- report exactly what was fixed, what was rerun, and what still looks risky if anything remains
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: planning-gate
|
|
3
|
+
description: Owner-side planning acceptance, cross-document consistency, and decomposition gate rules for repo-cwd blueprint-driven projects.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Planning Gate
|
|
7
|
+
|
|
8
|
+
Use this skill during `P2 Development Bootstrap and Planning` when reviewing, tightening, or accepting the first real technical plan.
|
|
9
|
+
|
|
10
|
+
## Usage rules
|
|
11
|
+
|
|
12
|
+
- Load this skill before accepting planning, before declaring the plan sufficient, and before creating deep execution sub-beads from the plan.
|
|
13
|
+
- Treat it as owner-side planning gate guidance, not developer-visible text.
|
|
14
|
+
- Use `get-overlays` as the source of truth for developer-facing planning guidance.
|
|
15
|
+
- Use this skill as the source of truth for owner-side planning acceptance and decomposition readiness.
|
|
16
|
+
|
|
17
|
+
## Core planning gate
|
|
18
|
+
|
|
19
|
+
- the developer should produce the first in-depth technical plan
|
|
20
|
+
- do not create deep execution sub-beads before the technical plan is accepted
|
|
21
|
+
- do not accept planning that reduces, weakens, narrows, or silently reinterprets the original prompt
|
|
22
|
+
- declare prompt-critical planning acceptance criteria before accepting the first planning pass when those criteria are already visible from the prompt
|
|
23
|
+
- require relevant cross-cutting system contracts to be explicitly planned rather than left to per-module invention
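
The ordering rule above (no deep execution sub-beads before the technical plan is accepted) can be sketched as a small guard. This is only an illustration: `PlanStatus`, `PlanningState`, and `create_sub_bead` are hypothetical names, not part of this package or of any beads tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class PlanStatus(Enum):
    DRAFT = "draft"
    ACCEPTED = "accepted"


@dataclass
class PlanningState:
    """Hypothetical owner-side record of where the planning gate stands."""
    status: PlanStatus = PlanStatus.DRAFT
    sub_beads: list[str] = field(default_factory=list)


def create_sub_bead(state: PlanningState, title: str) -> None:
    """Refuse to decompose work until the technical plan has been accepted."""
    if state.status is not PlanStatus.ACCEPTED:
        raise RuntimeError("technical plan not accepted; refusing to create sub-beads")
    state.sub_beads.append(title)
```

Raising instead of silently skipping keeps an out-of-order decomposition attempt visible to the owner.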
|
|
24
|
+
|
|
25
|
+
## Cross-document discipline
|
|
26
|
+
|
|
27
|
+
- require working planning docs under `docs/` when relevant, especially `docs/design.md`, `docs/api-spec.md`, and `docs/test-coverage.md`
|
|
28
|
+
- require cross-document consistency so design, API/spec, and test-planning artifacts do not drift on lifecycle/state models, permissions, flow coverage, or operational behavior
|
|
29
|
+
- if planning docs disagree on core system behavior, planning is still in progress
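
A rough way to mechanize part of this discipline is to check that the three planning docs exist and that shared vocabulary, such as lifecycle state names, appears in all of them. The sketch below assumes the doc paths named above; the helper names and the plain substring matching are illustrative assumptions, and a real consistency review still needs a human read.

```python
from pathlib import Path

# Paths taken from the rule above; everything else here is a hypothetical helper.
REQUIRED_DOCS = ("docs/design.md", "docs/api-spec.md", "docs/test-coverage.md")


def missing_planning_docs(repo_root: str) -> list[str]:
    """Return the required planning docs that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_DOCS if not (root / rel).is_file()]


def docs_missing_term(repo_root: str, terms: list[str]) -> dict[str, list[str]]:
    """Map each term to the existing planning docs that never mention it."""
    root = Path(repo_root)
    texts = {
        rel: (root / rel).read_text(encoding="utf-8").lower()
        for rel in REQUIRED_DOCS
        if (root / rel).is_file()
    }
    gaps: dict[str, list[str]] = {}
    for term in terms:
        absent = [rel for rel, text in texts.items() if term.lower() not in text]
        if absent:
            gaps[term] = absent
    return gaps
```

A non-empty result from either function suggests planning is still in progress, for example when a state name appears in the design doc but not in the API spec.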
|
|
30
|
+
|
|
31
|
+
## Cross-cutting planning requirements
|
|
32
|
+
|
|
33
|
+
- require shared lifecycle and state models to be aligned across planning artifacts when the product has meaningful workflow state
|
|
34
|
+
- require explicit cross-cutting system contracts when relevant, especially:
|
|
35
|
+
  - error normalization and user-visible error behavior
|
|
36
|
+
  - audit/logging and redaction patterns
|
|
37
|
+
  - permission alignment across UI, route guards, and API enforcement
|
|
38
|
+
  - state-transition and context-switch behavior
|
|
39
|
+
  - auth/session edge cases such as expiry, refresh, or clock skew tolerance
|
|
40
|
+
- when the prompt says behavior is configurable, require the real configuration surface, permissions, operator flow, and backend support to be planned explicitly
|
|
41
|
+
- when a feature must be admin-manageable or operator-manageable, require a real usable UI surface for that management flow, not just API endpoints or data-model notes
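
As one illustration of a cross-cutting contract, permission alignment across UI, route guards, and API enforcement can be treated as a set comparison. The sketch below assumes each layer's planned permissions have already been collected into a set; the layer names and permission strings in the usage note are made up.

```python
def permission_gaps(layers: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each layer, return permissions declared in another layer but missing here."""
    all_permissions: set[str] = set().union(*layers.values())
    return {
        name: all_permissions - declared
        for name, declared in layers.items()
        if all_permissions - declared
    }
```

For example, if `ui` and `api` both plan `report.export` but `route_guard` does not, the result is `{"route_guard": {"report.export"}}`, which flags a contract left to per-module invention.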
|
|
42
|
+
|
|
43
|
+
## Architecture-depth requirements
|
|
44
|
+
|
|
45
|
+
- for complex security, offline, sync, authorization, storage, or data-governance features, define what `done` means across all prompt-promised dimensions rather than accepting a partial foundation or hook layer
|
|
46
|
+
- define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
|
|
47
|
+
- define frontend validation and accessibility expectations when the product surface materially depends on them
|
|
48
|
+
- if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
|
|
49
|
+
|
|
50
|
+
## Planning acceptance checklist
|
|
51
|
+
|
|
52
|
+
Before accepting planning, apply this checklist when relevant:
|
|
53
|
+
|
|
54
|
+
- major user-facing flows are mapped to backend support and verification targets
|
|
55
|
+
- frontend route, page, component, and state boundaries are planned when the UI is material
|
|
56
|
+
- configurable behaviors are concretely planned where the prompt requires configurability
|
|
57
|
+
- lifecycle and state models are aligned across design and API/spec artifacts
|
|
58
|
+
- prompt-critical operational obligations and operator visibility paths are concretely planned
|
|
59
|
+
- prompt-literal storage, partitioning, indexing, retention, or performance requirements are explicitly represented
|
|
60
|
+
- relevant cross-cutting system contracts are explicitly defined rather than left to per-module invention
|
|
61
|
+
- each major module has a clear integration contract with existing modules and shared patterns
|
|
62
|
+
- verification plans include cross-module seam checks, not just isolated feature tests
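
The "when relevant" qualifier on this checklist can be modeled explicitly, so that items that do not apply never block acceptance while applicable, unmet items do. This is a minimal sketch; `ChecklistItem` and both helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    name: str
    relevant: bool   # does this item apply to the current prompt?
    satisfied: bool  # has the plan concretely addressed it?


def blocking_items(items: list[ChecklistItem]) -> list[str]:
    """Names of relevant checklist items the plan has not yet satisfied."""
    return [item.name for item in items if item.relevant and not item.satisfied]


def plan_accepted(items: list[ChecklistItem]) -> bool:
    """The plan passes the checklist only when no relevant item is unmet."""
    return not blocking_items(items)
```

Returning the names of the blocking items, not only a boolean, gives the owner a concrete list to send back to the developer.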
|
|
63
|
+
|
|
64
|
+
## Exit conditions
|
|
65
|
+
|
|
66
|
+
- the first real technical plan is accepted against this gate
|
|
67
|
+
- planning artifacts are internally consistent enough to guide implementation
|
|
68
|
+
- deep execution sub-beads can be created from the accepted plan without guesswork
|