theslopmachine 0.3.7 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/MANUAL.md +13 -9
  2. package/README.md +163 -3
  3. package/RELEASE.md +11 -3
  4. package/assets/agents/developer-v2.md +86 -0
  5. package/assets/agents/developer.md +21 -23
  6. package/assets/agents/slopmachine-v2.md +219 -0
  7. package/assets/agents/slopmachine.md +56 -38
  8. package/assets/skills/beads-operations/SKILL.md +32 -31
  9. package/assets/skills/beads-operations-v2/SKILL.md +82 -0
  10. package/assets/skills/clarification-gate/SKILL.md +8 -1
  11. package/assets/skills/clarification-gate-v2/SKILL.md +74 -0
  12. package/assets/skills/developer-session-lifecycle/SKILL.md +45 -14
  13. package/assets/skills/developer-session-lifecycle-v2/SKILL.md +148 -0
  14. package/assets/skills/development-guidance-v2/SKILL.md +60 -0
  15. package/assets/skills/evaluation-triage-v2/SKILL.md +38 -0
  16. package/assets/skills/final-evaluation-orchestration/SKILL.md +9 -11
  17. package/assets/skills/final-evaluation-orchestration-v2/SKILL.md +57 -0
  18. package/assets/skills/get-overlays/SKILL.md +77 -6
  19. package/assets/skills/hardening-gate-v2/SKILL.md +64 -0
  20. package/assets/skills/integrated-verification-v2/SKILL.md +47 -0
  21. package/assets/skills/owner-evidence-discipline-v2/SKILL.md +15 -0
  22. package/assets/skills/planning-gate/SKILL.md +6 -4
  23. package/assets/skills/planning-gate-v2/SKILL.md +91 -0
  24. package/assets/skills/planning-guidance-v2/SKILL.md +100 -0
  25. package/assets/skills/remediation-guidance-v2/SKILL.md +31 -0
  26. package/assets/skills/report-output-discipline-v2/SKILL.md +15 -0
  27. package/assets/skills/scaffold-guidance-v2/SKILL.md +57 -0
  28. package/assets/skills/session-rollover-v2/SKILL.md +41 -0
  29. package/assets/skills/submission-packaging/SKILL.md +147 -115
  30. package/assets/skills/submission-packaging-v2/SKILL.md +142 -0
  31. package/assets/skills/verification-gates/SKILL.md +44 -16
  32. package/assets/skills/verification-gates-v2/SKILL.md +102 -0
  33. package/assets/slopmachine/backend-evaluation-prompt.md +9 -2
  34. package/assets/slopmachine/frontend-evaluation-prompt.md +9 -2
  35. package/assets/slopmachine/templates/AGENTS-v2.md +55 -0
  36. package/assets/slopmachine/templates/AGENTS.md +20 -17
  37. package/assets/slopmachine/tracker-init.js +104 -0
  38. package/assets/slopmachine/workflow-init-v2.js +99 -0
  39. package/package.json +1 -1
  40. package/src/constants.js +22 -3
  41. package/src/init.js +33 -28
  42. package/src/install.js +186 -140
  43. package/src/utils.js +19 -0
  44. package/assets/slopmachine/beads-init.js +0 -439
package/assets/skills/developer-session-lifecycle/SKILL.md
@@ -1,6 +1,6 @@
  ---
  name: developer-session-lifecycle
- description: Startup, canonical developer session persistence, recovery, and initial project structure rules for repo-cwd tracked development.
+ description: Startup, primary developer-session persistence, recovery, and initial project structure rules for repo-cwd tracked development.
  ---
 
  # Developer Session Lifecycle
@@ -9,15 +9,16 @@ Use this skill during startup, tracked developer-session creation, and recovery.
 
  ## Usage rules
 
- - Load this skill before starting the canonical developer session.
+ - Load this skill before starting the main developer session.
  - Load it again during any recovery or session-consistency check.
  - Treat it as internal orchestration guidance, not developer-visible text.
+ - outside the initial clarification approval stop, do not use this skill to create extra pause points; startup should continue directly into development bootup once approval exists
 
- ## Canonical developer session
+ ## Main developer session
 
  - use the current working directory as the live codebase and start the fresh developer session in it
  - ensure the parent project root has the required supporting structure, especially `../sessions/`
- - record the developer session id immediately in Beads
+ - record the developer session id immediately in the tracker and `.ai/metadata.json`
  - persist the developer session id in more than one durable place
  - reuse that same session throughout development and remediation whenever possible
  - if the developer session crashes, resume it and continue the same work loop
@@ -41,35 +42,65 @@ Optional startup inputs may include:
  1. receive the prompt and stack context
  2. create `../.ai/metadata.json` for internal workflow state
  3. initialize parent-root `../metadata.json` with the required schema and store the full prompt text in `prompt`
- 4. initialize root workflow state and top-level phase Beads
+ 4. initialize root workflow state and top-level phase tracker items
  5. complete clarification using the clarification skill
- 6. stop for approval before development starts
- 7. ensure the parent project root has the required working structure, especially `../sessions/`
- 8. start the canonical developer session
+ 6. wait only for the initial clarification approval before development starts
+ 7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
+ 8. start the main developer session
  9. send `Let's plan this project: <original-prompt>` as the first message in that session
  10. wait for the developer's first exchange
  11. send the approved clarification prompt as the next guidance message
  12. continue orchestration from there
 
+ ## Metadata and workflow files
+
+ - the internal workflow mirror is `../.ai/metadata.json`
+ - the internal clarification artifact is `../.ai/clarification-prompt.md`
+ - the project artifact metadata file is `../metadata.json`
+ - do not use `../metadata.json` as internal workflow scratch state
+
+ Required internal workflow fields in `../.ai/metadata.json`:
+
+ - `workflow_state`
+ - `current_phase_item`
+ - `session_id`
+ - `clarification_approved`
+ - `awaiting_human`
+ - `remediation_round`
+
+ Required project metadata fields in `../metadata.json`:
+
+ - `prompt`
+ - `project_type`
+ - `frontend_language`
+ - `backend_language`
+ - `database`
+ - `session_id`
+ - `frontend_framework`
+ - `backend_framework`
+
+ - maintain `../metadata.json` as a real project artifact from the beginning of the workflow
+ - fill known values immediately and keep the file current as the project becomes clearer
+ - prefer explicit values; use `null` only when a field is genuinely unknown or not applicable
+
  ## Initial structure rule
 
- - during development, working technical docs live under the current working directory `docs/`
- - parent-root `../docs/` is a final delivery structure created or finalized during submission packaging
+ - parent-root `../docs/` is the owner-maintained external documentation directory
+ - do not treat repo-local `docs/` as the active documentation location in `v3`
  - parent-root `../sessions/` is the session artifact directory for exported conversation traces
 
  ## Recovery rule
 
  - orchestrator restart is handled externally
  - developer-session restart is your responsibility
- - on recovery, read root Bead metadata, `../.ai/metadata.json`, `../metadata.json`, current phase Bead, latest `SESSION:` comment, latest unresolved `ISSUE:` comments, and any persistent session record before continuing
+ - on recovery, read `../.ai/metadata.json`, `../metadata.json`, current phase tracker item, latest `SESSION:` comment, latest unresolved `ISSUE:` comments, and any persistent session record before continuing
  - treat resume as deterministic state recovery, not guesswork
 
  ## Session persistence rule
 
- - store the canonical developer session id in root Bead metadata as `session_id`
- - mirror it in a `SESSION:` comment on the root Bead
+ - store the main developer session id in the root tracker item comments using `SESSION:`
  - mirror it in `../.ai/metadata.json`
  - mirror it in parent-root `../metadata.json` as `session_id`
  - once created, treat that session id as locked unless an explicit reset policy is introduced later
  - if these records disagree, stop and resolve the inconsistency before continuing
- - do not silently create a replacement primary developer session if the canonical one can still be resumed
+ - do not silently create a replacement main developer session if the existing one can still be resumed
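The session persistence rule in this hunk keeps one developer session id in three places (a `SESSION:` comment on the root tracker item, `../.ai/metadata.json`, and `../metadata.json`) and requires stopping when they disagree. A minimal JavaScript sketch of that check follows. The helper name `checkSessionConsistency`, the `SESSION: <id>` comment format, and the four status values are assumptions for illustration, not code from the package.

```javascript
// Hypothetical helper, not part of theslopmachine's published code.
// Reads the developer session id from each durable record and reports
// whether the records agree, so the orchestrator can stop on a conflict.

/**
 * @param {string[]} trackerComments  root tracker item comments, oldest first
 * @param {object} aiMetadata         parsed ../.ai/metadata.json
 * @param {object} projectMetadata    parsed ../metadata.json
 * @returns {{status: string, sessionId: string|null, records: object}}
 */
function checkSessionConsistency(trackerComments, aiMetadata, projectMetadata) {
  // Assumed comment format "SESSION: <id>"; the latest such comment wins.
  const sessionIds = trackerComments
    .filter((comment) => comment.startsWith("SESSION:"))
    .map((comment) => comment.slice("SESSION:".length).trim());

  const records = {
    tracker: sessionIds.length > 0 ? sessionIds[sessionIds.length - 1] : null,
    aiMetadata: aiMetadata.session_id ?? null,
    projectMetadata: projectMetadata.session_id ?? null,
  };

  const present = Object.values(records).filter((id) => id !== null);
  const distinct = new Set(present);

  if (present.length === 0) {
    return { status: "none", sessionId: null, records }; // nothing recorded yet
  }
  if (distinct.size > 1) {
    return { status: "mismatch", sessionId: null, records }; // stop and resolve
  }
  if (present.length < 3) {
    return { status: "partial", sessionId: present[0], records }; // a mirror is missing
  }
  return { status: "consistent", sessionId: present[0], records };
}
```

Keeping `partial` apart from `mismatch` is a design choice: a missing mirror can plausibly be rewritten from the other records, while a conflicting id calls for the stop-and-resolve step the skill describes.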
package/assets/skills/developer-session-lifecycle-v2/SKILL.md
@@ -0,0 +1,148 @@
+ ---
+ name: developer-session-lifecycle-v2
+ description: Startup, resume detection, metadata consistency, and developer-session recovery rules for slopmachine-v2.
+ ---
+
+ # Developer Session Lifecycle v2
+
+ Use this skill during `P0 Intake and Setup` and whenever startup or recovery state is uncertain.
+
+ ## Purpose
+
+ - detect whether the run is new or resumed
+ - initialize or recover workflow metadata consistently
+ - initialize the planned bounded developer-session slots
+ - recover the current active developer session when one already exists
+
+ ## Usage rules
+
+ - keep startup and recovery in one skill; both begin from the same state inspection problem
+ - treat this as internal orchestration guidance, not developer-visible text
+ - do not launch the developer during `P0` or `P1`
+ - do not use this skill to create extra approval stops beyond the two allowed human gates
+
+ ## State inspection sequence
+
+ Inspect:
+
+ - Beads root state and current phase
+ - `../.ai/metadata.json`
+ - `../metadata.json`
+ - existing session comments or recorded session ids
+
+ Decide whether the run is:
+
+ - a fresh startup
+ - a resumed run with consistent state
+ - a run needing consistency repair before it can continue
+
+ ## Startup contract
+
+ Expect to start from:
+
+ - a project prompt
+ - tech stack information when it is not already clear from the prompt
+
+ Optional startup inputs may include:
+
+ - task id
+ - project type
+ - explicit constraints or preferences
+
+ ## Startup flow
+
+ 1. receive the prompt and stack context
+ 2. create `../.ai/metadata.json` for internal workflow state
+ 3. initialize parent-root `../metadata.json` with the required schema and store the full prompt text in `prompt`
+ 4. initialize root workflow state and top-level phase Beads items
+ 5. complete clarification using the clarification skill
+ 6. wait only for the initial clarification approval before development starts
+ 7. ensure the parent project root has the required working structure, especially `../sessions/` and `../docs/`
+ 8. initialize the bounded developer-session slots
+ 9. start the build developer session only after `P2` is ready to begin
+ 10. send `Let's plan this project: <original-prompt>` as the first message in that session
+ 11. wait for the developer's first exchange
+ 12. send the approved clarification prompt as the next guidance message
+ 13. continue orchestration from there
+
+ ## Required startup outputs
+
+ - root workflow state exists
+ - `../.ai/metadata.json` exists
+ - `../metadata.json` exists
+ - planned developer session slots are initialized
+ - required parent-root directories exist
+
+ ## Metadata and workflow files
+
+ - the internal workflow mirror is `../.ai/metadata.json`
+ - the internal clarification artifact is `../.ai/clarification-prompt.md`
+ - the project artifact metadata file is `../metadata.json`
+ - do not use `../metadata.json` as internal workflow scratch state
+
+ ## Suggested metadata fields
+
+ Track at least:
+
+ - `current_phase`
+ - `awaiting_human`
+ - `clarification_approved`
+ - `remediation_round`
+ - `developer_sessions`
+ - `active_developer_session_index`
+
+ Each planned developer session record should include enough to recover it later, such as:
+
+ - `index`
+ - `label`
+ - `phase_group`
+ - `session_id`
+ - `status`
+ - `handoff_in`
+ - `handoff_out`
+
+ Required project metadata fields in `../metadata.json` when relevant:
+
+ - `prompt`
+ - `project_type`
+ - `frontend_language`
+ - `backend_language`
+ - `database`
+ - `session_id`
+ - `frontend_framework`
+ - `backend_framework`
+
+ - maintain `../metadata.json` as a real project artifact from the beginning of the workflow
+ - fill known values immediately and keep the file current as the project becomes clearer
+ - prefer explicit values; use `null` only when a field is genuinely unknown or not applicable
+
+ ## Bounded session model
+
+ Track up to three planned developer sessions:
+
+ 1. build
+ 2. stabilization
+ 3. remediation
+
+ Later session slots may remain unused if the workflow never needs them.
+
+ ## Initial structure rule
+
+ - parent-root `../docs/` is the owner-maintained external documentation directory
+ - parent-root `../sessions/` is the session artifact directory for exported conversation traces
+ - do not treat repo-local `docs/` as the active external documentation location
+
+ ## Recovery rule
+
+ - if session records disagree, stop and resolve the inconsistency before continuing
+ - if the current phase already has an active developer session, recover that session instead of silently creating a new one
+ - treat resume as deterministic recovery, not guesswork
+ - on recovery, read `../.ai/metadata.json`, `../metadata.json`, current phase Beads item, latest `SESSION:` comment, latest unresolved `ISSUE:` comments, and any persistent session record before continuing
+
+ ## Session persistence rule
+
+ - store the active developer session id in Beads comments using `SESSION:`
+ - mirror it in `../.ai/metadata.json`
+ - mirror the active session id in parent-root `../metadata.json` as `session_id`
+ - if these records disagree, stop and resolve the inconsistency before continuing
+ - do not silently create a replacement developer session if the intended existing one can still be resumed
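The v2 state inspection sequence and bounded session model can be sketched as two small functions: one creates the three planned slots with the suggested record fields, and one classifies a run as fresh, resumed, or needing repair. This sketch assumes both metadata files are passed in already parsed (or `null` when absent). The function names, the `planned` status value, and the `null` `phase_group` placeholder are hypothetical, and it inspects only the metadata files; a fuller version would also consult Beads state and session comments.

```javascript
// Hypothetical sketch of the v2 startup/recovery decision. Field names
// follow the "Suggested metadata fields" list; status values are assumed.
const SLOT_LABELS = ["build", "stabilization", "remediation"];

function initSessionSlots() {
  return SLOT_LABELS.map((label, index) => ({
    index,
    label,
    phase_group: null, // assumed: filled in when the slot is assigned to phases
    session_id: null,
    status: "planned", // assumed vocabulary: planned | active | closed
    handoff_in: null,
    handoff_out: null,
  }));
}

/**
 * Classify a run from the two metadata files (null when a file is absent).
 * Returns "fresh", "resumed", or "needs-repair".
 */
function classifyRun(aiMetadata, projectMetadata) {
  if (aiMetadata === null && projectMetadata === null) return "fresh";
  if (aiMetadata === null || projectMetadata === null) return "needs-repair";

  const slots = aiMetadata.developer_sessions ?? [];
  const activeIndex = aiMetadata.active_developer_session_index;

  // No developer session launched yet (e.g. still in P0/P1): the project
  // metadata should not claim a session id either.
  if (activeIndex === null || activeIndex === undefined) {
    return projectMetadata.session_id == null ? "resumed" : "needs-repair";
  }

  // The active slot must exist and agree with the mirrored session_id.
  const active = slots.find((slot) => slot.index === activeIndex);
  if (!active || active.session_id !== projectMetadata.session_id) {
    return "needs-repair";
  }
  return "resumed";
}
```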
package/assets/skills/development-guidance-v2/SKILL.md
@@ -0,0 +1,60 @@
+ ---
+ name: development-guidance-v2
+ description: Developer-facing slice execution and local verification guidance for slopmachine-v2.
+ ---
+
+ # Development Guidance v2
+
+ Use this skill during `P4 Development` before prompting the developer.
+
+ ## Slice model
+
+ - work in bounded vertical slices
+ - complete the real user-facing and admin-facing surface for the slice
+ - keep slice-local planning, implementation, verification, and doc sync together
+
+ ## Module implementation guidance
+
+ - define lightweight planning notes for the module before coding
+ - define the module purpose, constraints, and edge cases before coding
+ - keep the original requirement and clarified interpretation visible while implementing so the module does not silently drift
+ - implement real behavior, not partial scattered logic
+ - handle failure paths and boundary conditions
+ - add or update tests as part of the module work
+ - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
+ - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
+ - keep frontend and backend contracts synchronized when the module spans both sides
+ - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
+ - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
+ - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
+ - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
+ - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
+ - perform a clean-slate sweep before reporting module completion: remove seeded credentials, weak demo defaults, test-account hints, prototype residue, and other production-inappropriate artifacts
+ - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
+ - when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
+ - if a required user-facing or admin-facing surface is missing, treat that gap as incomplete implementation rather than a reason to bypass the surface with direct API calls or test-only shortcuts
+ - do not leave computed-but-unrendered or partially surfaced requirement behavior in place
+ - do not treat a module as complete when a meaningful user-facing, release-facing, production-path, or build verification is known to be failing unless the owner explicitly scopes that check out
+ - use the `frontend-design` skill for frontend component or page work
+ - use the `frontend-design` skill during frontend/UI verification when reviewing Playwright screenshots and tightening the interface
+ - do not hardcode secrets or persist local sensitive values in the repo while implementing
+ - explain behavior changes clearly enough that the documentation discipline can be satisfied accurately
+ - verify the module against its planned behavior before trying to move on
+ - do not move on while the module is still obviously weak or half-finished
+
+ ## Verification model
+
+ - use targeted local verification by default
+ - use local Playwright on affected flows when UI is material
+ - avoid broad Docker/full-suite commands during ordinary slice work
+ - prefer fast local language-native or framework-native test commands for the changed area during normal iteration
+ - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
+ - if the local toolchain is missing, try to install or enable it before falling back to the broad gate path
+ - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
+
+ ## Quality rules
+
+ - do not bypass required UI surfaces with API shortcuts and call that done
+ - do not leave placeholder, demo, setup, or debug UI in product-facing screens
+ - do not report completion while known release-facing failures still exist
+ - product UI should serve the real workflow only
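One guidance item above asks that file and export paths be validated and confined to allowed roots, with user-safe errors. A common Node.js approach resolves the requested path against the root and rejects anything whose relative path climbs out of it. The helper name `resolveWithinRoot` is hypothetical, and this sketch does not follow symlinks, so a production check may also need `fs.realpathSync` on paths that already exist.

```javascript
const path = require("node:path");

// Hypothetical helper: resolve a requested file path against an allowed root
// and refuse anything that escapes it ("../" segments, absolute paths, or a
// different drive on Windows). Symlinks are not followed here.
function resolveWithinRoot(root, requestedPath) {
  const resolvedRoot = path.resolve(root);
  const target = path.resolve(resolvedRoot, requestedPath);
  const relative = path.relative(resolvedRoot, target);

  // Check for a leading ".." segment rather than any "..": a file literally
  // named "..notes.txt" inside the root is still allowed.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    // Keep the message generic so the response does not leak internal paths.
    throw new Error("requested path is outside the allowed location");
  }
  return target;
}
```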
package/assets/skills/evaluation-triage-v2/SKILL.md
@@ -0,0 +1,38 @@
+ ---
+ name: evaluation-triage-v2
+ description: Owner-side evaluation report triage rules for slopmachine-v2.
+ ---
+
+ # Evaluation Triage v2
+
+ Use this skill during `P7 Evaluation and Triage` after evaluation reports exist.
+
+ ## Rules
+
+ - evaluation findings are advisory inputs, not automatic orders
+ - accept or reject findings explicitly
+ - keep accepted findings concrete and bounded
+ - do not enter remediation just because a report found something; enter it only when the accepted findings justify it
+ - if no remediation is needed, move directly to the final human decision
+
+ ## Triage rules
+
+ - read both reports and merge the findings into one explicit triage set before deciding what happens next
+ - use the evaluator priority ordering directly when triaging findings unless stronger direct evidence says otherwise
+ - any finding marked `Blocker` or `High` should normally be returned for remediation
+ - findings marked `Medium` may be passed in limited cases, but should usually be fixed when they materially improve confidence, correctness, or acceptance readiness
+ - findings marked `Low` may be passed without remediation
+ - do not treat complaints about test coverage depth, unverifiable tests, or evaluator inability to confirm a test path as automatic blockers by themselves
+ - if your own direct evidence shows the tests run and the coverage is acceptable for qualification, defend the project and pass those findings instead of automatically remediating
+ - if a report says it could not verify some behavior because of environment limits or avoidable verification setup issues, first decide whether you can remove that constraint and rerun the evaluation in a cleaner state
+ - if the evaluator could not verify something but your own verified evidence already shows the behavior is acceptable, do not treat that as an automatic remediation trigger
+ - challenge weak, random, or overreaching findings using your stronger project context and direct codebase knowledge
+ - never edit or rewrite the evaluation report itself
+ - if you need to add context, disagreement, or justification, append it only as a clearly labeled `User comment/message` section at the bottom of the report
+ - do not loop forever chasing every newly surfaced medium or low issue once the project is otherwise qualified
+
+ ## Output standard
+
+ - keep a clear accepted-finding set
+ - keep a clear rejected or passed set when disagreement matters
+ - keep the remediation brief focused on accepted issues only
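The severity rules above map fairly directly onto a small triage function that produces the accepted set and the passed set. This sketch assumes each finding is an object with a `severity` field using the evaluator labels, plus two hypothetical owner-set flags: `ownerEvidenceAcceptable` (the owner's own verified evidence already covers the point) and `materiallyImprovesConfidence` (used to decide `Medium` findings). It only sorts findings; the judgment behind those flags stays with the owner.

```javascript
// Hypothetical triage sketch; severity labels follow the evaluator ordering
// (Blocker, High, Medium, Low). The two boolean flags are assumed inputs.
function triageFindings(findings) {
  const accepted = [];
  const passed = [];

  for (const finding of findings) {
    // Stronger direct owner evidence overrides the evaluator ordering, e.g.
    // "could not verify" or coverage-depth complaints already disproven.
    if (finding.ownerEvidenceAcceptable) {
      passed.push(finding);
      continue;
    }
    switch (finding.severity) {
      case "Blocker":
      case "High":
        accepted.push(finding); // normally returned for remediation
        break;
      case "Medium":
        (finding.materiallyImprovesConfidence ? accepted : passed).push(finding);
        break;
      case "Low":
        passed.push(finding); // may be passed without remediation
        break;
      default:
        throw new Error(`unknown severity: ${finding.severity}`);
    }
  }

  // Remediation is entered only when the accepted set justifies it.
  return { accepted, passed, needsRemediation: accepted.length > 0 };
}
```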
package/assets/skills/final-evaluation-orchestration/SKILL.md
@@ -12,17 +12,18 @@ Use this skill only after integrated verification and hardening are complete eno
  - Load this skill only during final evaluation and evaluation-driven remediation decisions.
  - Treat it as internal orchestration guidance.
  - Do not let evaluator findings automatically override your triage judgment.
+ - this phase is the only allowed later human-stop point after development has started; do not create any other approval pauses outside the final evaluation decision itself
 
  ## Prompt sources
 
- - `~/slopmachine/backend-evaluation-prompt.md`
- - `~/slopmachine/frontend-evaluation-prompt.md`
+ - `~/backend-evaluation-prompt.md`
+ - `~/frontend-evaluation-prompt.md`
 
  ## Evaluation execution rules
 
  - when the project reaches final-evaluation readiness, run two separate evaluations:
- - backend/non-frontend evaluation using `~/slopmachine/backend-evaluation-prompt.md`
- - frontend evaluation using `~/slopmachine/frontend-evaluation-prompt.md`
+ - backend/non-frontend evaluation using `~/backend-evaluation-prompt.md`
+ - frontend evaluation using `~/frontend-evaluation-prompt.md`
  - read the full original project prompt from parent-root `../metadata.json` field `prompt`
  - read the respective evaluation prompt file contents yourself before launching evaluation
  - compose each evaluation request yourself as one large final prompt block
@@ -40,10 +41,6 @@ Use this skill only after integrated verification and hardening are complete eno
  - require each evaluation session to produce its own detailed evaluation report artifact
  - always compare both evaluations against the original prompt for alignment, not just the delivered implementation
 
- On the first evaluation round, preserve and use this instruction exactly inside both evaluation prompts/sessions:
-
- "Please confirm whether the current project tests are genuine and effective rather than superficial or fake tests, whether the API tests actually invoke real HTTP endpoints, and whether they cover more than 90% of the overall API surface."
-
  ## Triage rules
 
  - read both reports and merge the findings into one explicit triage set before deciding what happens next
@@ -66,10 +63,11 @@ On the first evaluation round, preserve and use this instruction exactly inside
  ## Remediation loop
 
  - route accepted blocking issues back into remediation in the same long-lived `Developer` session
- - after remediation, rerun full verification before any re-evaluation:
- - `docker compose up --build`
- - `run_tests.sh`
+ - after remediation, rerun strong local verification before any re-evaluation:
+ - relevant local test commands
+ - local runtime checks when affected behavior needs runtime proof
  - Playwright where applicable, with fresh screenshots
+ - if remediation materially reopens an owner-run milestone boundary, route the project back to that milestone gate before any re-evaluation instead of treating every remediation pass as an automatic Docker and `run_tests.sh` moment
  - rerun only the evaluation tracks that have not already passed, each in a brand new fresh `General` session and still sequentially
  - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
  - remember the external process allows a maximum of 3 repair rounds
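The remediation loop in this hunk keeps two pieces of state: which evaluation tracks have already passed, and how many repair rounds have been used. A minimal sketch follows, assuming the track names `backend` and `frontend`; the helper names are hypothetical, and the `round` counter could plausibly be mirrored into the `remediation_round` metadata field. `tracksToRerun` returns tracks in a fixed order so they can be run one after another, as the skill requires.

```javascript
// Hypothetical round tracker for the evaluation/remediation loop.
const MAX_REPAIR_ROUNDS = 3; // "maximum of 3 repair rounds" per the skill text
const TRACKS = ["backend", "frontend"]; // fixed order, run sequentially

function createEvaluationState() {
  return { round: 0, passed: { backend: false, frontend: false } };
}

// Merge new results; a track that has passed stays passed and is never rerun.
function recordResults(state, results) {
  const passed = { ...state.passed };
  for (const [track, ok] of Object.entries(results)) {
    passed[track] = passed[track] || ok;
  }
  return { ...state, passed };
}

function tracksToRerun(state) {
  return TRACKS.filter((track) => !state.passed[track]);
}

// Open the next repair round, refusing to exceed the external limit.
function startRepairRound(state) {
  if (state.round >= MAX_REPAIR_ROUNDS) {
    throw new Error("repair round limit reached");
  }
  return { ...state, round: state.round + 1 };
}
```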
package/assets/skills/final-evaluation-orchestration-v2/SKILL.md
@@ -0,0 +1,57 @@
+ ---
+ name: final-evaluation-orchestration-v2
+ description: Evaluation execution rules for slopmachine-v2.
+ ---
+
+ # Final Evaluation Orchestration v2
+
+ Use this skill only during `P7 Evaluation and Triage`.
+
+ ## Usage rules
+
+ - load this skill only during final evaluation and evaluation-driven remediation decisions
+ - treat it as internal orchestration guidance
+ - do not let evaluator findings automatically override your triage judgment
+
+ ## Prompt sources
+
+ - `~/backend-evaluation-prompt.md`
+ - `~/frontend-evaluation-prompt.md`
+
+ ## Evaluation execution rules
+
+ - run backend and frontend evaluation in separate fresh `General` sessions
+ - compose the evaluation prompts yourself; do not tell the evaluator to read prompt files on its own
+ - use the original project prompt from metadata
+ - read the respective evaluation prompt file contents yourself before launching evaluation
+ - compose each evaluation request yourself as one large final prompt block
+ - prefix each evaluation request with a clear instruction that the reviewer must work in the current project directory and evaluate that delivered project
+ - inject the full original project prompt into the `{prompt}` placeholder for the chosen evaluation prompt content
+ - send that fully composed text block directly to the fresh `General` evaluator session
+ - never tell the evaluator to go read prompt files, metadata files, or evaluation template paths on its own
+ - never send only a path, filename, or shorthand reference and expect the evaluator to assemble the prompt itself
+ - never reuse, resume, or continue a prior evaluation session
+ - run the two evaluations sequentially, not in parallel, so shared runtime state, ports, databases, and artifacts do not conflict
+ - track backend and frontend evaluation status separately
+ - once backend evaluation passes, do not run backend evaluation again in later remediation rounds
+ - once frontend evaluation passes, do not run frontend evaluation again in later remediation rounds
+ - require each evaluation session to produce its own detailed evaluation report artifact
+ - always compare both evaluations against the original prompt for alignment, not just the delivered implementation
+ - keep reports file-backed and bring only short summaries into chat
+ - rerun only the evaluation track that still needs re-evaluation after remediation
+
+ ## Remediation loop
+
+ - route accepted blocking issues back into the active remediation developer-session slot rather than inventing an untracked side path
+ - after remediation, rerun strong local verification before any re-evaluation:
+ - relevant local test commands
+ - local runtime checks when affected behavior needs runtime proof
+ - Playwright where applicable, with fresh screenshots
+ - if remediation materially reopens an owner-run broad milestone boundary, route the project back to that boundary before re-evaluation instead of treating every remediation pass as an automatic broad rerun moment
+ - keep the remediation loop bounded and explicit so you never lose track of the active evaluation round or the accepted issue set
+ - remember the external process allows a maximum of 3 repair rounds
+
+ ## Boundaries
+
+ - this phase is owner-side analysis, not the final human decision gate
+ - do not create extra human pauses while evaluation and triage are still active
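The v2 execution rules require the owner to read the template and the original prompt itself and send one fully composed block, instead of pointing the evaluator at files. A sketch of that composition follows. The prefix wording and function names are illustrative assumptions, as is the file layout: templates sit directly in the home directory as `backend-evaluation-prompt.md` and `frontend-evaluation-prompt.md`, matching the `~/...` paths listed above. Node does not expand `~` itself, hence `os.homedir()`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical prefix wording; the skill only requires that the reviewer be
// told to work in the current project directory and evaluate that project.
const EVALUATION_PREFIX =
  "Work in the current project directory and evaluate the project delivered there.\n\n";

// Build one self-contained request so the evaluator never has to read
// prompt files, metadata files, or template paths on its own.
function composeEvaluationRequest(templateText, originalPrompt) {
  if (!templateText.includes("{prompt}")) {
    throw new Error("evaluation template has no {prompt} placeholder");
  }
  // split/join replaces every occurrence and, unlike String.prototype.replace
  // with a string replacement, does not treat "$&" or "$$" in the prompt specially.
  return EVALUATION_PREFIX + templateText.split("{prompt}").join(originalPrompt);
}

// track is "backend" or "frontend"; the prompt comes from ../metadata.json.
function loadEvaluationRequest(track, metadataPath = "../metadata.json") {
  const templatePath = path.join(os.homedir(), `${track}-evaluation-prompt.md`);
  const templateText = fs.readFileSync(templatePath, "utf8");
  const { prompt } = JSON.parse(fs.readFileSync(metadataPath, "utf8"));
  return composeEvaluationRequest(templateText, prompt);
}
```

The composed string is what gets pasted into the fresh `General` session; nothing in it refers back to a file path.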
package/assets/skills/get-overlays/SKILL.md
@@ -15,6 +15,22 @@ Use this skill when you need the detailed developer overlay guidance for one of
  - Pass only the relevant guidance for the current engineering step, not the whole section verbatim unless the moment truly needs it.
  - Prefer short, natural teammate-style prompts.
 
+ ## Cross-Cutting Documentation Discipline
+
+ - the owner maintains external docs under parent-root `../docs/`
+ - the developer should keep `README.md` and any codebase-local docs accurate for repo-local use
+ - planning and implementation guidance should stay explicit enough that owner-maintained external docs can be updated accurately
+ - verification and hardening should check both `README.md` and the owner-maintained external docs against implementation reality
+ - `README.md` must stay codebase-specific and must not become an index or explanation of the external docs set
+
+ ## Cross-Cutting Env-File Discipline
+
+ - never create or keep `.env` files anywhere in the repo tree
+ - do not allow committed `.env` files even as placeholders or examples
+ - keep real secrets out of the repository and rely on Docker-provided runtime variables for sensitive values
+ - if the stack requires env-file format at runtime, generate it ephemerally from Docker-provided runtime variables rather than storing it in the repo or package
+ - verify the delivered project can start from scratch without any preexisting `.env` file in the repo or package
+
  ## Phase mapping
 
  - `P2 Development Bootstrap and Planning` -> `Planning And Design`
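For the env-file rule in this hunk, one way to satisfy a stack that insists on env-file format is to render the file at container start from variables Docker already injected (for example via Compose `environment:`), write it outside the repository tree, and discard it afterward. The helper names are hypothetical. The sketch rejects multiline values because env-file parsers handle them inconsistently, and the `0o600` mode only restricts access on POSIX systems.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: render the named runtime variables as env-file text.
function renderEnvFile(keys, env = process.env) {
  const lines = keys.map((key) => {
    const value = env[key];
    if (value === undefined) {
      throw new Error(`missing runtime variable: ${key}`);
    }
    if (/[\r\n]/.test(value)) {
      // Env-file parsers disagree on multiline values, so refuse them.
      throw new Error(`multiline value not supported for: ${key}`);
    }
    return `${key}=${value}`;
  });
  return lines.join("\n") + "\n";
}

// Write the file into a fresh temp directory outside the repo tree.
// The caller is responsible for deleting it when the process exits.
function writeEphemeralEnvFile(keys, env = process.env) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "runtime-env-"));
  const file = path.join(dir, ".env");
  fs.writeFileSync(file, renderEnvFile(keys, env), { mode: 0o600 });
  return file;
}
```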
@@ -31,20 +47,44 @@ Use this skill when you need the detailed developer overlay guidance for one of
  - start from the actual project prompt and build the plan from there
  - carry the settled project requirements forward consistently as you plan
  - identify the hard non-negotiable requirements early and do not quietly trade them away for implementation convenience
+ - when planning technical items that depend on a library, framework, API, or tool, check Context7 documentation first for authoritative usage details
+ - when planning needs targeted outside research beyond direct documentation, use Exa web search next
+ - use technical research to strengthen concrete planning decisions, interfaces, constraints, and verification strategy rather than leaving them vague
  - break the problem into explicit requirements, constraints, flows, boundaries, and edge cases
  - map each meaningful requirement to its owning module, visible UI/API surface, failure behavior, test target, and final acceptance check
- - create or update working design notes and API/spec notes when relevant
+ - make the planning explicit enough that the owner can maintain external design notes and API/spec docs accurately when relevant
  - keep the spec focused on required behavior rather than turning it into a progress or completion narrative
  - define major modules as meaningful delivery units, not arbitrary folders
  - for fullstack work, map frontend surfaces, routes, components, and state boundaries to the backend modules and contracts that support them
+ - for fullstack work, make the frontend-to-backend crosswalk explicit enough that each major route, page, component group, or state boundary has a defined supporting backend module, endpoint, and data shape
+ - when the prompt says behavior is configurable, plan the real configuration surface, data model, permissions, and operator flow rather than treating configurability as an implementation detail to invent later
+ - when a feature must be admin-manageable or operator-manageable, plan the real usable UI surface for that management flow, not just the backing API or data model
  - define failure paths, permissions, validation, logging, runtime assumptions, and test strategy before coding
+ - for complex security, offline, sync, authorization, or data-governance features, define what `done` means across all prompt-promised dimensions rather than stopping at a partial foundation or hook layer
+ - define shared lifecycle and state models when the product has meaningful workflow state, and keep those models aligned across design notes and API/spec notes
+ - require cross-document consistency so design, API/spec, and test-planning artifacts do not drift on lifecycle/state models, flow coverage, permissions, or operational behavior
  - define logging and observability expectations for both frontend and backend
+ - define operator visibility and operator workflow expectations when the prompt implies admin, operational, audit, backup, or support responsibilities
68
+ - when the system has meaningful cross-cutting behavior, define shared implementation contracts early rather than leaving each module to invent its own pattern
69
+ - define error-handling contracts when relevant, including normalization patterns for user-visible errors and backend error-shape expectations
70
+ - define audit contracts when relevant, including centralized helper or service expectations and redaction rules
71
+ - define permission contracts when relevant so navigation visibility, route guards, and API enforcement stay aligned
72
+ - define state-lifecycle contracts when relevant, including context-switch or tenant-switch cleanup expectations
73
+ - define auth edge-case expectations when relevant, such as token refresh, session expiry, or clock-skew tolerance
42
74
  - call out operational obligations early when they are prompt-critical, such as scheduling, retention, backups, workers, auditability, or offline behavior
75
+ - define infrastructure requirements early when they are material to correctness, such as rate limiting, encryption boundaries, production-equivalent test infrastructure, and browser-storage rules for sensitive data
76
+ - define frontend validation and accessibility expectations when the product surface materially depends on them, including keyboard, focus, feedback, and other user-interaction quality requirements where relevant
77
+ - if backup or recovery behavior is prompt-critical, plan the designated media, operator drill flow, visibility, and verification expectations explicitly
78
+ - if the prompt names literal storage, indexing, partitioning, retention, or performance dimensions, represent them literally in the planning artifacts rather than abstracting them away
79
+ - for frontend work, unless the prompt, existing repository, or established stack clearly dictates otherwise, default to Tailwind CSS for styling and `shadcn/ui` for component primitives
80
+ - if the existing project already uses a different UI system, preserve and extend that system instead of forcing Tailwind CSS or `shadcn/ui` into it
43
81
  - define end-to-end coverage for major user flows before coding
44
82
  - for fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable
45
83
  - when UI-bearing flows are material, explicitly plan screenshot review as part of Playwright verification so UI correctness is checked, not just browser success
46
84
  - aim for at least 90 percent meaningful coverage of the relevant behavior surface
47
85
  - define verification strategy, Docker expectations, and documentation implications before coding
86
+ - for each major module, define how it integrates with existing modules and which shared contracts it must follow consistently
87
+ - define verification plans that include cross-module scenarios and seam checks, not just isolated feature checks
48
88
  - make the plan detailed enough to guide real implementation and later verification
49
89
  - review the module map and make sure it is stable before deeper implementation begins
50
90
  - do not move into deeper implementation with vague architecture or unstable module boundaries
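The permission-contract item above is easier to review when one shared table drives navigation visibility, route guards, and API enforcement. The sketch below is illustrative only. The route paths, permission names, and helpers such as `visible_nav` and `contract_drift` are hypothetical, and a real project would express the same idea in its own stack. It denies unknown routes and endpoints by default. It also includes a planning-time drift check that flags permissions which gate a screen but are enforced by no endpoint.

```python
from dataclasses import dataclass

# Hypothetical permission names, routes, and endpoints, for illustration only.
ROUTE_PERMISSIONS = {
    "/orders": "orders:read",
    "/orders/new": "orders:write",
    "/admin/users": "users:manage",
}
ENDPOINT_PERMISSIONS = {
    ("GET", "/api/orders"): "orders:read",
    ("POST", "/api/orders"): "orders:write",
    ("GET", "/api/users"): "users:manage",
}


@dataclass(frozen=True)
class User:
    permissions: frozenset


class Forbidden(Exception):
    pass


def visible_nav(user):
    """Navigation shows only routes the user could actually open."""
    return [route for route, perm in ROUTE_PERMISSIONS.items() if perm in user.permissions]


def can_enter_route(user, route):
    """Route guard: unknown routes are denied by default."""
    required = ROUTE_PERMISSIONS.get(route)
    return required is not None and required in user.permissions


def enforce_api(user, method, path):
    """Server-side check; raises for unknown endpoints or a missing permission."""
    required = ENDPOINT_PERMISSIONS.get((method, path))
    if required is None or required not in user.permissions:
        raise Forbidden("forbidden")  # user-safe message, no internal detail


def contract_drift():
    """Permissions that gate a route but are enforced by no endpoint."""
    return set(ROUTE_PERMISSIONS.values()) - set(ENDPOINT_PERMISSIONS.values())
```

Because all three enforcement points read the same source of truth, adding a route without a backing endpoint permission shows up in `contract_drift()` instead of surfacing later as a nav/guard/API mismatch.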
@@ -57,13 +97,25 @@ Use this skill when you need the detailed developer overlay guidance for one of
57
97
  - create required testing directories and baseline docs structure
58
98
  - put baseline config and logging structure in place
59
99
  - put migrations, worker/job foundation, and real runtime health surfaces in place when the project needs them
60
- - keep real secrets out of the repository and rely on Docker-managed runtime injection for any sensitive values
61
- - keep committed env files to placeholders or clearly non-production defaults only
100
+ - never create or keep `.env` files anywhere in the repo tree
101
+ - treat prompt-critical security controls as real baseline runtime behavior, not placeholder checks or visual wiring
102
+ - if a requirement implies enforcement, persistence, statefulness, or rejection behavior, make that behavior real in the scaffold unless the prompt clearly scopes it down
103
+ - do not accept shape-only security implementations such as header presence checks, passive constants, or partially wired middleware when the requirement implies real protection
104
+ - when applicable at scaffold time, require real security baselines such as nonce reuse rejection rather than nonce-header presence, real lockout behavior rather than config-only lockout values, CSRF rejection on protected mutations, and meaningful server-side state when the protection model depends on it
105
+ - keep real secrets out of the repository and rely on Docker-provided runtime variables for sensitive values
106
+ - do not allow committed `.env` files even as placeholders or examples
107
+ - if the stack requires env-file format at runtime, generate it ephemerally from Docker-provided runtime variables rather than storing it in the repo or package
62
108
  - remove prototype residue from runtime foundations: no placeholder titles, hidden setup, fake defaults, or seeded live-path assumptions
63
109
  - make prompt-critical runtime behavior visible in the scaffold instead of hand-waving it for later, especially offline, worker, backup, or HTTPS requirements
110
+ - keep Docker runtime isolation clean in shared environments: use self-contained Compose namespacing, avoid fragile generic project names, and prefer Compose-managed service naming over unnecessary hardcoded `container_name` values
111
+ - require reproducible build and tooling foundations: prefer lockfile-driven installs where the stack supports them, keep source and build outputs clearly separated, and do not allow generated runtime artifacts to drift back into source directories
112
+ - for typed build pipelines, keep source-of-truth boundaries clean so compiled output does not create TS/JS or similar dual-source drift in the working tree
64
113
  - establish README structure early instead of leaving it until the end
65
114
  - prove the scaffold in a clean state before deeper feature work
66
115
  - verify `docker compose up` and `run_tests.sh` in the clean scaffold state
116
+ - verify clean `docker compose up --build -d` and `docker compose down` behavior under the chosen project namespace when Dockerized execution is in scope
117
+ - when the architecture materially depends on infrastructure capabilities such as rate limiting, encryption, offline support, or browser-storage policy, put the baseline framework and policy in place during scaffold rather than deferring it to late implementation
118
+ - for backend integration paths, prefer production-equivalent test infrastructure when practical rather than silently substituting a weaker database or runtime model that can hide real defects
67
119
  - do not treat scaffold as placeholder boilerplate or rely on hidden setup
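The scaffold items above contrast real security baselines with shape-only checks. The following sketch shows the difference for nonce reuse rejection and account lockout. It keeps state in in-process dictionaries for brevity. That is an assumption: a real service would persist this state server-side in shared storage so that restarts or additional replicas cannot reopen a replay or brute-force window. `NonceStore`, `LoginLockout`, and their parameters are hypothetical names, and the injectable `clock` exists only to make the behavior testable.

```python
import time


class NonceStore:
    """Rejects a nonce seen within `ttl` seconds (in-memory sketch)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._seen = {}  # nonce -> first-seen timestamp

    def accept(self, nonce):
        now = self.clock()
        # Drop expired entries so memory stays bounded.
        self._seen = {n: t for n, t in self._seen.items() if now - t < self.ttl}
        if not nonce or nonce in self._seen:
            return False  # missing or replayed nonce is rejected, not merely "present"
        self._seen[nonce] = now
        return True


class LoginLockout:
    """Locks an account for `lock_seconds` after `max_failures` consecutive failures."""

    def __init__(self, max_failures, lock_seconds, clock=time.monotonic):
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self.clock = clock
        self._failures = {}
        self._locked_until = {}

    def is_locked(self, account):
        until = self._locked_until.get(account)
        return until is not None and self.clock() < until

    def record_failure(self, account):
        if self.is_locked(account):
            return  # in this sketch, failures during a lock neither extend nor reset it
        count = self._failures.get(account, 0) + 1
        if count >= self.max_failures:
            self._locked_until[account] = self.clock() + self.lock_seconds
            count = 0
        self._failures[account] = count

    def record_success(self, account):
        # Callers check is_locked() before verifying credentials, so this only clears the streak.
        self._failures.pop(account, None)
```

A login handler would call `is_locked()` first, then verify credentials, then call `record_failure()` or `record_success()`. Unlike a config-only lockout value, this actually rejects attempts during the lock window.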
68
120
 
69
121
  ## Module Implementation
@@ -79,17 +131,26 @@ Use this skill when you need the detailed developer overlay guidance for one of
79
131
  - set up and use the local test environment inside the current working directory so normal verification does not depend on hidden global tooling assumptions
80
132
  - if the local toolchain is missing, try to install or enable it before falling back to `run_tests.sh`
81
133
  - for applicable fullstack or UI-bearing work, run local Playwright on the affected flows during implementation and inspect screenshots to confirm the UI actually matches
134
+ - when the slice materially changes frontend code, frontend tooling, or release-facing build behavior, include production build health in meaningful local verification when practical
82
135
  - make sure the module is moving toward full definition-of-done completion, not just happy-path completion
83
136
  - keep auth, authorization, ownership, validation, and logging concerns in view when relevant
84
137
  - keep frontend and backend contracts synchronized when the module spans both sides
138
+ - verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
139
+ - check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
140
+ - verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
141
+ - verify file and export paths are validated and confined to allowed roots when the module reads, writes, imports, or exports files
142
+ - verify error and auth responses are user-safe and do not leak internal reasons, paths, stack details, or sensitive state
143
+ - perform a clean-slate sweep before reporting module completion: remove seeded credentials, weak demo defaults, test-account hints, prototype residue, and other production-inappropriate artifacts
85
144
  - do not treat backend existence, composable existence, or partial wiring as completion if the user-visible flow is still incomplete
86
145
  - when the prompt says users can manage or configure something, implement full management behavior rather than create-only controls where appropriate
146
+ - if a required user-facing or admin-facing surface is missing, treat that gap as incomplete implementation rather than a reason to bypass the surface with direct API calls or test-only shortcuts
87
147
  - do not leave computed-but-unrendered or partially surfaced requirement behavior in place
148
+ - do not treat a module as complete when a meaningful user-facing, release-facing, production-path, or build verification is known to be failing unless the owner explicitly scopes that check out
88
149
  - do not ship frontend screens with demo/debug/setup messaging or development-only status text; product UI should serve the real workflow only
89
150
  - use the `frontend-design` skill for frontend component or page work
90
151
  - use the `frontend-design` skill during frontend/UI verification when reviewing Playwright screenshots and tightening the interface
91
152
  - do not hardcode secrets or persist local sensitive values in the repo while implementing
92
- - update relevant docs when behavior changes
153
+ - explain behavior changes clearly enough that the documentation discipline can be satisfied accurately
93
154
  - verify the module against its planned behavior before trying to move on
94
155
  - do not move on while the module is still obviously weak or half-finished
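Two of the module checks above, path confinement and user-safe error responses, can be sketched together. This example assumes Python 3.9+ for `Path.is_relative_to`. `confine`, `to_user_error`, `UnsafePath`, and the logger name are hypothetical. Resolving both the root and the candidate path before comparing them catches `..` traversal, absolute paths, and symlinks that point outside the allowed root. The error mapper logs full detail server-side and returns only a generic message plus a correlation reference, so internal paths and reasons never reach the user.

```python
import logging
import uuid
from pathlib import Path

log = logging.getLogger("app")  # hypothetical logger name


class UnsafePath(ValueError):
    pass


def confine(root, user_path):
    """Resolve user_path under root; reject `..` traversal, absolute paths, and escaping symlinks."""
    base = Path(root).resolve()
    # An absolute user_path replaces base when joined, then fails the containment check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise UnsafePath("path outside allowed root")
    return candidate


def to_user_error(exc):
    """Log full detail server-side; return only a generic message plus a correlation reference."""
    ref = uuid.uuid4().hex[:12]
    log.error("request failed ref=%s", ref, exc_info=exc)
    if isinstance(exc, UnsafePath):
        return {"error": "invalid_path", "message": "The requested file is not available.", "ref": ref}
    return {"error": "internal_error", "message": "Something went wrong. Please try again.", "ref": ref}
```

Traversal-style negative cases such as `../secrets.txt` or `/etc/passwd` belong in the module's tests, not just the happy-path export.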
95
156
 
@@ -106,16 +167,23 @@ Use this skill when you need the detailed developer overlay guidance for one of
106
167
  - verify requirement closure, not just feature existence
107
168
  - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
108
169
  - verify end-to-end flow behavior where the change affects real workflows
170
+ - do not use mocked APIs as integration evidence; integration verification must use real HTTP requests against the actual running service surface
109
171
  - for fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
172
+ - end-to-end coverage must use the real intended user-facing or admin-facing surfaces for the flow; if the flow cannot be exercised that way, treat the missing surface as incomplete work
110
173
  - use `frontend-design` while reviewing frontend screenshots so UI issues are challenged, not just functionally observed
111
174
  - verify screenshots do not contain demo placeholders, scaffold instructions, debug notices, or other development-only UI leakage
112
175
  - verify important failure, conflict, stale-state, negative-auth, and cross-user-isolation paths where relevant
113
176
  - verify required remediation guidance is actually visible to the user, not just computed internally
114
177
  - verify security-sensitive behavior where applicable
178
+ - verify multi-tenant and cross-user isolation where applicable, including negative checks rather than single-actor happy paths only
179
+ - verify file/path safety for file-bearing flows where applicable, including traversal-style negative cases
115
180
  - verify secrets are not committed, hardcoded, or leaking through logs/config/docs
116
- - verify docs do not overstate implementation completeness or claim behavior that is only partial
117
- - verify docs still match implementation reality
181
+ - apply the cross-cutting env-file discipline during verification
182
+ - verify error surfaces and auth-related failures are sanitized for users and operators appropriately
183
+ - apply the cross-cutting documentation discipline during verification
118
184
  - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
185
+ - challenge integration seams and adjacent-module behavior, not just the changed module's local path
186
+ - when frontend behavior or tooling changed materially, treat known production build breakage as blocking evidence rather than optional cleanup
119
187
  - do not treat a module as done until functional behavior, failure behavior, tests, docs, security considerations, and required runtime verification are all in place
120
188
  - call out weak evidence, missing coverage, or unresolved issues plainly
121
189
  - do not treat developer claims as enough without real verification
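The verification items above require real HTTP requests and negative isolation checks rather than mocked APIs or single-actor happy paths. The sketch below uses only the standard library. So that it can run on its own, it starts a tiny stand-in server. That server is purely an assumption for illustration. In actual verification, `base_url` would point at the real running stack, for example the service brought up by `docker compose`. The tokens, document IDs, and URL paths are hypothetical. Proxy environment variables are deliberately ignored so the requests go straight to the target.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in data for the self-contained demo; real checks target the actual running service.
DOCS = {"doc-1": "tenant-a", "doc-2": "tenant-b"}  # hypothetical record -> owning tenant
TOKENS = {"token-a": "tenant-a", "token-b": "tenant-b"}  # hypothetical bearer tokens
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy env vars


class StandInHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        auth = self.headers.get("Authorization", "")
        tenant = TOKENS.get(auth.removeprefix("Bearer "))
        doc_id = self.path.rsplit("/", 1)[-1]
        if tenant is None:
            self._send(401, {"error": "unauthorized"})
        elif DOCS.get(doc_id) != tenant:
            self._send(404, {"error": "not_found"})  # do not confirm other tenants' records exist
        else:
            self._send(200, {"id": doc_id})

    def _send(self, status, body):
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep output quiet


def get_status(base_url, path, token):
    """Send a real HTTP GET over a socket and return the status code, 4xx/5xx included."""
    req = urllib.request.Request(base_url + path, headers={"Authorization": f"Bearer {token}"})
    try:
        with _OPENER.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_isolation(base_url):
    """One positive check plus cross-tenant and bad-credential negative checks."""
    return {
        "own": get_status(base_url, "/api/docs/doc-1", "token-a"),
        "cross_tenant": get_status(base_url, "/api/docs/doc-2", "token-a"),
        "bad_token": get_status(base_url, "/api/docs/doc-1", "bogus"),
    }


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), StandInHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(check_isolation(f"http://127.0.0.1:{server.server_address[1]}"))
    server.shutdown()
```

The useful evidence is the `cross_tenant` and `bad_token` results. A verification that only asserts `own == 200` would not show isolation at all.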
@@ -131,6 +199,9 @@ Use this skill when you need the detailed developer overlay guidance for one of
131
199
  - re-check frontend and backend observability, redaction, and operator visibility paths
132
200
  - run a prompt-fidelity sweep for silent requirement substitution, partially delivered hard requirements, and frontend/backend mismatch
133
201
  - run a prototype-residue sweep for hardcoded preview values, placeholder text, seeded defaults, hidden fallbacks, and computed-but-unrendered behavior
202
+ - enforce the cross-cutting env-file discipline during hardening
203
+ - run documentation verification against the real codebase and runtime behavior, not just document existence
204
+ - enforce the cross-cutting documentation discipline during hardening
134
205
  - re-check prompt-critical operational obligations such as scheduled jobs, retention, backups, worker behavior, privacy/accountability logging, and admin controls
135
206
  - enter release-candidate mode: stop feature work and focus only on fixes, verification, docs, and packaging preparation
136
207
  - make sure the system is genuinely reviewable and reproducible