theslopmachine 0.9.14 → 1.0.0
- package/MANUAL.md +1 -0
- package/README.md +2 -2
- package/assets/agents/developer.md +4 -3
- package/assets/agents/slopmachine-claude.md +14 -12
- package/assets/agents/slopmachine.md +14 -12
- package/assets/claude/agents/developer.md +3 -2
- package/assets/skills/beads-operations/SKILL.md +9 -0
- package/assets/skills/clarification-gate/SKILL.md +3 -0
- package/assets/skills/developer-session-lifecycle/SKILL.md +4 -4
- package/assets/skills/development-guidance/SKILL.md +6 -0
- package/assets/skills/evaluation-triage/SKILL.md +11 -5
- package/assets/skills/final-evaluation-orchestration/SKILL.md +36 -9
- package/assets/skills/integrated-verification/SKILL.md +25 -1
- package/assets/skills/planning-gate/SKILL.md +24 -0
- package/assets/skills/planning-guidance/SKILL.md +6 -0
- package/assets/skills/report-output-discipline/SKILL.md +4 -0
- package/assets/skills/retrospective-analysis/SKILL.md +28 -1
- package/assets/skills/scaffold-guidance/SKILL.md +15 -7
- package/assets/skills/submission-packaging/SKILL.md +44 -23
- package/assets/skills/verification-gates/SKILL.md +23 -5
- package/assets/slopmachine/clarifier-agent-prompt.md +2 -0
- package/assets/slopmachine/owner-verification-checklist.md +19 -0
- package/assets/slopmachine/phase-1-design-template.md +16 -0
- package/assets/slopmachine/phase-2-execution-planning-prompt.md +30 -0
- package/assets/slopmachine/phase-2-plan-template.md +85 -0
- package/assets/slopmachine/scaffold-playbooks/selection-matrix.md +106 -67
- package/assets/slopmachine/scaffold-playbooks/shared-contract.md +207 -0
- package/assets/slopmachine/scaffold-playbooks/stack-android-room-offline.md +30 -0
- package/assets/slopmachine/scaffold-playbooks/stack-browser-only-offline-spa.md +29 -0
- package/assets/slopmachine/scaffold-playbooks/stack-generic.md +25 -0
- package/assets/slopmachine/scaffold-playbooks/stack-go-gin-templ-postgres.md +70 -0
- package/assets/slopmachine/scaffold-playbooks/stack-react-go-postgres.md +33 -0
- package/assets/slopmachine/scaffold-playbooks/stack-rust-fullstack-workspace.md +31 -0
- package/assets/slopmachine/scaffold-playbooks/stack-vue-koa-mysql.md +77 -0
- package/assets/slopmachine/scaffold-playbooks/stack-vue-laravel-mysql.md +32 -0
- package/assets/slopmachine/scaffold-playbooks/stack-winforms-localdb.md +30 -0
- package/assets/slopmachine/scaffold-playbooks/tech-backend-gin-templ.md +22 -0
- package/assets/slopmachine/scaffold-playbooks/tech-backend-go.md +23 -0
- package/assets/slopmachine/scaffold-playbooks/tech-backend-koa.md +22 -0
- package/assets/slopmachine/scaffold-playbooks/tech-backend-laravel.md +33 -0
- package/assets/slopmachine/scaffold-playbooks/tech-db-localdb.md +20 -0
- package/assets/slopmachine/scaffold-playbooks/tech-db-mysql.md +22 -0
- package/assets/slopmachine/scaffold-playbooks/tech-db-postgres.md +22 -0
- package/assets/slopmachine/scaffold-playbooks/tech-db-room.md +20 -0
- package/assets/slopmachine/scaffold-playbooks/tech-frontend-react.md +22 -0
- package/assets/slopmachine/scaffold-playbooks/tech-frontend-vue.md +23 -0
- package/assets/slopmachine/scaffold-playbooks/tech-rust-workspace.md +21 -0
- package/assets/slopmachine/scaffold-playbooks/type-api-service.md +52 -0
- package/assets/slopmachine/scaffold-playbooks/type-background-jobs.md +45 -0
- package/assets/slopmachine/scaffold-playbooks/type-database.md +81 -0
- package/assets/slopmachine/scaffold-playbooks/type-desktop.md +30 -0
- package/assets/slopmachine/scaffold-playbooks/type-mobile-android.md +32 -0
- package/assets/slopmachine/scaffold-playbooks/type-offline-local-first.md +31 -0
- package/assets/slopmachine/scaffold-playbooks/type-web-spa.md +63 -0
- package/assets/slopmachine/templates/AGENTS.md +9 -7
- package/assets/slopmachine/templates/CLAUDE.md +9 -7
- package/assets/slopmachine/templates/plan.md +85 -0
- package/assets/slopmachine/utils/analyze_claude_project_dir.mjs +197 -0
- package/assets/slopmachine/utils/claude_live_common.mjs +7 -0
- package/assets/slopmachine/utils/claude_live_launch.mjs +11 -0
- package/assets/slopmachine/utils/package_claude_session.mjs +28 -101
- package/package.json +1 -1
- package/src/constants.js +31 -28
- package/src/init.js +3 -2
- package/src/install.js +28 -0
- package/assets/slopmachine/scaffold-playbooks/android-kotlin-compose.md +0 -73
- package/assets/slopmachine/scaffold-playbooks/android-kotlin-views.md +0 -138
- package/assets/slopmachine/scaffold-playbooks/android-native-java.md +0 -203
- package/assets/slopmachine/scaffold-playbooks/angular-default.md +0 -129
- package/assets/slopmachine/scaffold-playbooks/backend-baseline.md +0 -126
- package/assets/slopmachine/scaffold-playbooks/backend-family-matrix.md +0 -80
- package/assets/slopmachine/scaffold-playbooks/database-module-matrix.md +0 -80
- package/assets/slopmachine/scaffold-playbooks/django-default.md +0 -109
- package/assets/slopmachine/scaffold-playbooks/docker-baseline.md +0 -146
- package/assets/slopmachine/scaffold-playbooks/docker-shared-contract.md +0 -338
- package/assets/slopmachine/scaffold-playbooks/electron-vite-default.md +0 -124
- package/assets/slopmachine/scaffold-playbooks/expo-react-native-default.md +0 -73
- package/assets/slopmachine/scaffold-playbooks/fastapi-default.md +0 -97
- package/assets/slopmachine/scaffold-playbooks/frontend-baseline.md +0 -138
- package/assets/slopmachine/scaffold-playbooks/frontend-family-matrix.md +0 -134
- package/assets/slopmachine/scaffold-playbooks/generic-unknown-tech-guide.md +0 -136
- package/assets/slopmachine/scaffold-playbooks/go-chi-default.md +0 -103
- package/assets/slopmachine/scaffold-playbooks/ios-linux-portable.md +0 -93
- package/assets/slopmachine/scaffold-playbooks/ios-native-objective-c.md +0 -151
- package/assets/slopmachine/scaffold-playbooks/ios-native-swift.md +0 -188
- package/assets/slopmachine/scaffold-playbooks/laravel-default.md +0 -143
- package/assets/slopmachine/scaffold-playbooks/livewire-default.md +0 -172
- package/assets/slopmachine/scaffold-playbooks/overlay-module-matrix.md +0 -130
- package/assets/slopmachine/scaffold-playbooks/platform-family-matrix.md +0 -79
- package/assets/slopmachine/scaffold-playbooks/spring-boot-default.md +0 -100
- package/assets/slopmachine/scaffold-playbooks/tauri-default.md +0 -68
- package/assets/slopmachine/scaffold-playbooks/vue-vite-default.md +0 -140
- package/assets/slopmachine/scaffold-playbooks/web-default.md +0 -96
package/MANUAL.md
CHANGED
@@ -58,6 +58,7 @@ slopmachine init -o
 - copies the packaged Claude repo rulebook into `repo/CLAUDE.md`
 - seeds `repo/README.md`, `repo/plan.md`, and `repo/.claude/settings.json`
 - seeds `.ai/startup-context.md` plus the parent-root planning docs under `docs/`
+- later, when `P5` closes, the workflow preserves the final truthful execution record in `docs/plan.md` and removes `repo/plan.md` before evaluation begins
 - creates the initial git commit so the workspace starts with a clean tree
 - optionally opens `opencode` in `repo/`
 - parallel worktrees should stay under hidden parent-root `.ai/worktrees/` so the visible workspace root stays clean
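The `P5` handoff added to `MANUAL.md` amounts to a copy followed by a delete. A minimal sketch of that ordering, assuming the workspace is modeled as an in-memory `Map` of relative path to content; the function name `closeP5PlanHandoff` is hypothetical and is not taken from the package source:

```javascript
// Hypothetical sketch of the P5 plan handoff; not the package's implementation.
// The workspace is modeled as a Map of relative path -> file content.
function closeP5PlanHandoff(workspace) {
  const repoPlan = "repo/plan.md";
  const docsPlan = "docs/plan.md";
  if (!workspace.has(repoPlan)) {
    throw new Error(`${repoPlan} is missing; nothing to preserve`);
  }
  const next = new Map(workspace); // leave the caller's Map untouched
  // Preserve the final execution record under the parent-root docs folder first...
  next.set(docsPlan, workspace.get(repoPlan));
  // ...and only then remove the repo-local copy, before evaluation begins.
  next.delete(repoPlan);
  return next;
}

const before = new Map([
  ["repo/plan.md", "# Plan\n- [x] scaffold\n"],
  ["repo/README.md", "# Demo\n"],
]);
const after = closeP5PlanHandoff(before);
console.log([...after.keys()].sort());
```

Writing the preserved copy before deleting the repo-local file means a failure partway through never loses the plan.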
package/README.md
CHANGED
@@ -169,7 +169,7 @@ Important details:
 - Beads lives in the workspace root, not inside `repo/`
 - `repo/.claude/settings.json` seeds Claude Code to use the custom `developer` agent by default for that repo
 - planned parallel git worktrees should live under hidden parent-root `.ai/worktrees/` by default so root-level `repo-lane-*` folders do not clutter the workspace
--
+- when `P5` completes, the workflow moves `repo/plan.md` to parent-root `docs/plan.md`; packaging later validates that `repo/plan.md`, `repo/AGENTS.md`, and `repo/CLAUDE.md` are absent from the delivered `repo/`
 - after non-`-o` bootstrap, the command prints the exact `cd repo` next step so you can continue immediately
 - `--adopt` moves the current project files into `repo/`, preserves root workflow state in the parent workspace, and skips the automatic bootstrap commit
 - `--continue-from <PX>` is a smoother alias for existing-project bootstrap; it implies adoption mode and seeds the requested start phase in one step
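The packaging rule in the added README line (no `plan.md`, `AGENTS.md`, or `CLAUDE.md` in the delivered `repo/`) can be expressed as a simple absence check. This is a sketch under assumptions: the function name, the constant name, and the idea of passing a flat list of paths relative to `repo/` are all illustrative, not the package's API.

```javascript
// Hypothetical packaging check; only the three file names come from the README text.
const FORBIDDEN_IN_DELIVERED_REPO = ["plan.md", "AGENTS.md", "CLAUDE.md"];

// deliveredPaths: paths relative to the delivered repo/ root.
// Returns the forbidden workflow files that are still present at the repo root.
function findForbiddenRepoFiles(deliveredPaths) {
  const present = new Set(deliveredPaths.map((p) => p.replace(/^\.\//, "")));
  return FORBIDDEN_IN_DELIVERED_REPO.filter((name) => present.has(name));
}

const leftovers = findForbiddenRepoFiles(["README.md", "./plan.md", "src/index.js"]);
if (leftovers.length > 0) {
  console.log(`packaging blocked, remove: ${leftovers.join(", ")}`);
}
```

Only root-level matches are flagged here; whether nested copies should also fail packaging is not stated in the diff.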
@@ -177,7 +177,7 @@ Important details:
 - when a later start phase is seeded for adoption or recovery, the Beads workflow phases before that requested phase are created and immediately marked completed so tracker state matches the seeded entry point
 - in the `slopmachine-claude` path, if adopted or resumed later-phase work has no recoverable tracked Claude developer session yet, the owner must launch and orient the needed Claude lane first and only then continue the substantive work in that same session
 - `--phase <PX>` seeds the initial `current_phase` for adoption/recovery bootstrap; the owner should still fall back if the real repo evidence does not support that later phase
-- `repo/plan.md` is seeded at bootstrap and becomes the definitive repo-local execution checklist
+- `repo/plan.md` is seeded at bootstrap and becomes the definitive repo-local execution checklist through planning, development, and `P5`; after `P5`, the preserved reference copy is `docs/plan.md`

 ### `slopmachine set-token`

package/assets/agents/developer.md
CHANGED
@@ -69,7 +69,7 @@ When accepted planning artifacts already exist, treat them as the primary execut
 - treat follow-up prompts mainly as narrow deltas, guardrails, or correction signals
 - if the current work is the scaffold step at the start of development, treat section 3 of `plan.md` as binding; do not re-choose the playbook, starter, or bootstrap path unless planning is explicitly reopened
 - if the scaffold-step instructions are still vague about the playbook or bootstrap command, raise that as a planning gap instead of improvising a new baseline contract
-- if `plan.md` includes a security execution contract, `Delivery Review Requirements`, `README Contract`, or test coverage execution contract, treat them as binding parts of the current workstream rather than optional follow-up polish
+- if `plan.md` includes a security execution contract, `Core Semantic Path Proof`, `Prompt-Critical Rule Matrix`, `Role Surface Matrix`, `Runtime Lifecycle Checklist`, `Delivery Review Requirements`, `README Contract`, or test coverage execution contract, treat them as binding parts of the current workstream rather than optional follow-up polish
 - treat the execution file tree and owned-file map in `plan.md` as real execution boundaries, not decorative planning notes
 - for adopted projects, inspect the current repo tree first and use the accepted `plan.md` delta tree rather than assuming a greenfield layout
 - keep `plan.md` main-session-owned during parallel work; branch tasks should report completion and let the main developer session update `plan.md` after merge
@@ -100,9 +100,9 @@ When instructed to plan without coding yet:
 - when backend or fullstack API endpoints are added or changed, prefer real HTTP tests for the exact `METHOD + PATH` over controller or service bypasses when practical
 - if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
 - when closing a `plan.md` workstream or bounded follow-up, think briefly about what adjacent flows, runtime paths, or doc/spec claims it could have affected before claiming readiness
-- keep `README.md` as the primary documentation file inside the repo; `plan.md` is the explicit execution-plan exception
+- keep `README.md` as the primary documentation file inside the repo; repo-local `plan.md` is the explicit execution-plan exception only during active implementation through `P5`
 - treat `README.md` and other shared integration-heavy files as main-session-owned by default during parallel work unless the accepted plan explicitly delegates them
-- keep the repo self-sufficient and statically reviewable through code plus `README.md`, with `plan.md` as the deliberate execution-plan exception
+- keep the repo self-sufficient and statically reviewable through code plus `README.md`, with repo-local `plan.md` as the deliberate execution-plan exception only during active implementation through `P5`; do not rely on runtime success alone to make the project understandable
 - keep the repo self-sufficient; do not make it depend on parent-directory docs or sibling artifacts for startup, build/preview, configuration, verification, or basic understanding
 - do not touch workflow or rulebook files such as `AGENTS.md` unless explicitly asked
 - if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming someone else will catch inconsistencies later
@@ -113,6 +113,7 @@ When instructed to plan without coding yet:
 - before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
 - before reporting development complete, run one deliberate main-session reread against the accepted `plan.md`, `../docs/design.md`, accepted `../docs/api-spec.md` when applicable, `README.md`, and the integrated repo so the owner is not first discovering obvious drift in `P5`
 - before reporting development complete, close the common late-failure classes inside development: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup/test wrapper dishonesty, and partial user-facing or admin-facing flow closure
+- before reporting development complete, explicitly report proof status for the core semantic path, prompt-critical rules, role surface matrix if applicable, runtime lifecycle checklist if applicable, and any residual risks instead of relying only on general test success

 ## Parallel Execution Model

package/assets/agents/slopmachine-claude.md
CHANGED
@@ -122,9 +122,9 @@ Think of the workflow as four instruction planes:
 1. owner prompt: lifecycle engine and general discipline
 2. developer prompt: engineering behavior and execution quality
 3. skills: lifecycle-step or activity rules loaded on demand
-4. repo-local rulebooks such as `CLAUDE.md` plus `plan.md`: durable execution guidance the developer should keep seeing in the codebase
+4. repo-local rulebooks such as `CLAUDE.md` plus repo-local `plan.md` during planning, development, and `P5`: durable execution guidance the developer should keep seeing in the codebase

-When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `CLAUDE.md` plus `plan.md`, not here.
+When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `CLAUDE.md` plus repo-local `plan.md` during planning, development, and `P5`, not here.

 ## Source Of Truth

@@ -141,6 +141,7 @@ State split:
 - `../metadata.json` stores project facts and exported project metadata

 Do not create another competing workflow-state system.
+Treat Beads as the primary lifecycle source of truth. Use `../.ai/metadata.json` as an orchestration mirror and repair metadata from Beads when they drift unless evidence proves the Beads state itself needs mutation.

 ## Git Traceability

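The "Beads first, metadata as mirror" rule added in this hunk can be read as a one-directional repair. The sketch below is an assumption-laden illustration: `reconcilePhase`, the `beadsProvenWrong` flag, and the use of a `current_phase` field inside the metadata object are hypothetical, and the real shape of the Beads and metadata state is not shown in this diff.

```javascript
// Hypothetical sketch: Beads wins by default and the metadata mirror is repaired from it.
// `beadsPhase` stands in for whatever phase Beads reports; its real shape is assumed.
function reconcilePhase(beadsPhase, metadata, { beadsProvenWrong = false } = {}) {
  if (metadata.current_phase === beadsPhase) {
    return { action: "none", metadata };
  }
  if (beadsProvenWrong) {
    // Evidence shows the Beads state itself needs mutation: leave the mirror alone.
    return { action: "mutate-beads", metadata };
  }
  // Default drift handling: repair the mirror from Beads, never the other way round.
  return {
    action: "repair-metadata",
    metadata: { ...metadata, current_phase: beadsPhase },
  };
}

const result = reconcilePhase("P5", { current_phase: "P4", project: "demo" });
console.log(result.action, result.metadata.current_phase);
```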
@@ -159,7 +160,7 @@ Use git to preserve meaningful workflow checkpoints.
 Operate in this order:

 1. evaluate the current state critically
-2. identify the active root lifecycle state and its exit evidence
+2. identify the active root lifecycle state from Beads first and verify its exit evidence
 3. load the required skill for that lifecycle state or activity first
 4. compose the developer or owner action for the current step and decide whether the work should stay serial or be fanned out across the planned directory-tree branches or worktrees or Claude helper lanes
 5. verify and review the result
@@ -197,8 +198,9 @@ Phase rules:
 - exactly one root phase should normally be active at a time
 - enter the phase before real work for that phase begins
 - do not close multiple root phases in one transition block
-- `P5 Integrated Verification and Hardening` should normally be one minimal gate that includes the owner-run local test harness check; if that passes and the repo is roughly coherent and broadly correct against `plan.md` plus accepted `../docs/design.md`,
--
+- `P5 Integrated Verification and Hardening` should normally be one minimal gate that includes the owner-run local test harness check; if that passes and the repo is roughly coherent and broadly correct against repo-local `plan.md` plus accepted `../docs/design.md`, preserve the final truthful plan in parent-root `../docs/plan.md`, remove the repo-local copy, and then stop to ask whether to proceed to evaluation; only narrow owner-fixable local-harness/config/wrapper/README/docs/light-script churn should be fixed there directly, and any real code or actual test-file changes should trigger a bounded Claude developer reroute
+- the explicit post-`P5` pause must be recorded in Beads only after repo-local `plan.md` has been preserved in parent-root `../docs/plan.md` and removed from the repo: add a structured comment showing that `P5` evidence is satisfied and that the workflow is waiting for the proceed-to-evaluation decision; do not silently advance into `P7` before that decision arrives
+- `P8 Final Readiness Decision` should be one fast owner-run reconciliation sweep after `P7`: reread the delivered repo, `README.md`, parent-root `../docs/`, carried `../.tmp/` audit artifacts, and archived stale/fail report lineage together, fix small docs or README or repo-hygiene drift directly, record a readiness reconciliation note, and only reopen evaluation or packaging-adjacent follow-up when a material inconsistency remains
 - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect

 ## Developer Session Model
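The new post-`P5` pause rule is an ordering constraint: the pause comment may be recorded only once the plan has been preserved and the repo-local copy is gone. A minimal sketch of that guard, assuming a plain state object whose field names (`p5EvidenceSatisfied`, `docsPlanExists`, `repoPlanExists`) and the comment shape are invented for illustration; the actual Beads comment format is not shown in this diff.

```javascript
// Hypothetical guard for the post-P5 pause; field names are illustrative only.
function checkP5Pause(state) {
  const blockers = [];
  if (!state.p5EvidenceSatisfied) blockers.push("P5 exit evidence is not satisfied");
  if (!state.docsPlanExists) blockers.push("../docs/plan.md has not been preserved");
  if (state.repoPlanExists) blockers.push("repo-local plan.md is still present");
  if (blockers.length > 0) {
    return { ok: false, blockers };
  }
  // Assumed comment shape; the workflow only requires that it be structured.
  return {
    ok: true,
    blockers,
    comment: { phase: "P5", evidence: "satisfied", waitingFor: "proceed-to-evaluation decision" },
  };
}

const pause = checkP5Pause({
  p5EvidenceSatisfied: true,
  docsPlanExists: true,
  repoPlanExists: true,
});
console.log(pause.ok, pause.blockers);
```

Collecting every blocker, instead of stopping at the first, lets the owner see the full list of steps still missing before the pause can be recorded.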
@@ -221,12 +223,12 @@ Maintain exactly one active developer session at a time.
 - each audit result decides the remediation lane:
 - audit session `1` keeps all of its remediation in `bugfix-1`, including fail regenerations and later kept-report fixes
 - audit session `2` keeps all of its remediation in `bugfix-2`, including fail regenerations and later kept-report fixes
-- `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, extract the full issue set from
-- `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane, and treat that kept report
-- `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane for every reported issue and recommendation in that report, and if there are no reported items mark the audit session complete without inventing new issues
+- `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, extract the full issue set from the full failed report file, analyze the exact failing surfaces and what must change to resolve them, send that full owner-analyzed corrective brief to that audit session's exact `bugfix-N` Claude lane, require that whole list to be fixed, and then rerun the full evaluation send packet inside the same evaluator session
+- `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane, and treat the full issue list extracted from that kept report file as the authoritative fix-check scope for the rest of that audit session; send the developer the full owner-analyzed corrective brief for that scope rather than a narrow subset
+- `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` Claude lane for every reported issue and recommendation found in that kept report file, and if there are no reported items mark the audit session complete without inventing new issues
 - `audit_report-<N>-fix_check.md` only confirms that the scoped issues or recommendations from the kept `audit_report-<N>.md` are fixed; if it is not clean, send only the unresolved subset back for remediation, then repeat the same-session fix-check loop against the full kept-report scope, and once that scoped set is confirmed fixed move on to the next audit session or next `P7` subphase
 - require both audit sessions to complete before the final post-audit coverage/README audit can run
-- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and on the initial send and every rerun use the full coverage/README evaluation send packet rather than a hand-written prompt; reread each generated report and reject it if the last evaluator send was not the exact full prepared packet, if it contains prior-run wording such as `previously` or `remaining`, or if it collapses into a tiny targeted issue list instead of a full standalone strict audit;
+- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and on the initial send and every rerun use the full coverage/README evaluation send packet rather than a hand-written prompt; reread each generated report and reject it if the last evaluator send was not the exact full prepared packet, if it contains prior-run wording such as `previously` or `remaining`, or if it collapses into a tiny targeted issue list instead of a full standalone strict audit; then read the full saved report file itself, extract every reported issue/recommendation from that file, and if any remain, move the displaced report into `../.ai/archive/`, route that full extracted issue set to `bugfix-2`, replace the report, and rerun the full test-coverage prompt again in that same evaluator session until the report is a full standalone pass-level report with no remaining issue/recommendation set to hand back; do not fall back to another developer session for this remediation window
 - track the active evaluator session separately in metadata during `P7`
 - if the active Claude developer session becomes rate-limited, keep that session as the active tracked developer session and auto-wait for reset instead of replacing it with owner implementation
 - after every Claude launch or reply outcome, the owner must immediately do one of three things only: continue the workflow, wait for the same session to recover, or stop and inform the user about a real unrecoverable session problem
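The three audit verdicts in this hunk map onto one routing decision per audit session. The sketch below is one possible reading, not the package's code: `planAuditRemediation`, the returned field names, and the `next` step labels are hypothetical, while the lane name `bugfix-N`, the report name `audit_report-<N>.md`, and the archive-on-fail behavior come from the diff text.

```javascript
// Hypothetical routing for one audit session; names of fields and steps are assumed.
function planAuditRemediation(session, verdict, reportedItems) {
  const lane = `bugfix-${session}`; // each audit session keeps its own lane
  const report = `audit_report-${session}.md`;
  switch (verdict) {
    case "fail":
      return {
        lane,
        keepReport: false,
        archiveReport: true, // moved out of ../.tmp/ into ../.ai/archive/
        scope: reportedItems, // the whole extracted list must be fixed
        next: "rerun-full-evaluation-packet",
      };
    case "partial pass":
      return { lane, report, keepReport: true, archiveReport: false, scope: reportedItems, next: "fix-check" };
    case "pass":
      return {
        lane,
        report,
        keepReport: true,
        archiveReport: false,
        scope: reportedItems,
        // With nothing reported, the session completes without inventing issues.
        next: reportedItems.length === 0 ? "session-complete" : "fix-check",
      };
    default:
      throw new Error(`unknown verdict: ${verdict}`);
  }
}

console.log(planAuditRemediation(2, "pass", []));
```

In every branch the scope is the full extracted item list, which mirrors the diff's emphasis on sending the whole owner-analyzed brief instead of a narrow subset.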
@@ -448,15 +450,15 @@ To the developer, this should feel like a normal engineering conversation with a
 - when several issues are found in one review sweep, send them together once as one clear issue list instead of drip-feeding or re-batching them across multiple follow-ups
 - for small non-core fixes such as README cleanup, docs sync, Docker config, wrapper/config glue, light `./run_tests.sh` cleanup, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the Claude developer worker
 - if the fix would require editing actual test files or real product code, do not patch it in the owner session; send it back to the Claude developer worker
-- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md
-- during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`,
+- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or the accepted plan (`plan.md` before `P5` closes, `../docs/plan.md` afterward), fix them directly in the owner session instead of bouncing them back to the Claude developer worker
+- during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`, carried audit artifacts, archived stale/fail report lineage, report-shape validity, and residual risks before packaging starts; prefer direct owner fixes for small drift instead of turning that sweep into another Claude developer loop
 - keep work moving without low-information continuation chatter
 - read only what is needed to answer the current decision
 - keep routine review inside the main owner session; do not use `Explore` or `General` subagents to verify Claude developer work
 - clarification and evaluation may still use their dedicated subagent flows, but owner verification of Claude developer work stays in the main session
 - at planning, scaffold-step review inside development, the opening full-repo review, any rare major reread, and final evaluation review, demand the exact expected outcomes in itemized form rather than relying on implied standards
 - keep comments and metadata auditable and specific
-- keep external docs owner-maintained
+- keep external docs owner-maintained, keep repo-local README developer-maintained, allow repo-local `plan.md` only through planning, development, and `P5`, and preserve the final plan in parent-root `../docs/plan.md` after `P5`

 ## Backend Integrity

package/assets/agents/slopmachine.md
CHANGED
@@ -116,9 +116,9 @@ Think of the workflow as four instruction planes:
 1. owner prompt: lifecycle engine and general discipline
 2. developer prompt: engineering behavior and execution quality
 3. skills: lifecycle-step or activity rules loaded on demand
-4. repo-local rulebooks such as `AGENTS.md` plus `plan.md`: durable execution guidance the developer should keep seeing in the codebase
+4. repo-local rulebooks such as `AGENTS.md` plus repo-local `plan.md` during planning, development, and `P5`: durable execution guidance the developer should keep seeing in the codebase

-When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `AGENTS.md` plus `plan.md`, not here.
+When a rule is not always relevant, it should usually live in a skill or in repo-local rulebooks such as `AGENTS.md` plus repo-local `plan.md` during planning, development, and `P5`, not here.

 ## Source Of Truth

@@ -135,6 +135,7 @@ State split:
 - `../metadata.json` stores project facts and exported project metadata

 Do not create another competing workflow-state system.
+Treat Beads as the primary lifecycle source of truth. Use `../.ai/metadata.json` as an orchestration mirror and repair metadata from Beads when they drift unless evidence proves the Beads state itself needs mutation.

 ## Git Traceability

@@ -153,7 +154,7 @@ Use git to preserve meaningful workflow checkpoints.
 Operate in this order:

 1. evaluate the current state critically
-2. identify the active root lifecycle state and its exit evidence
+2. identify the active root lifecycle state from Beads first and verify its exit evidence
 3. load the required skill for that lifecycle state or activity first
 4. compose the developer or owner action for the current step and decide whether the work should stay serial or be split across the planned directory-tree branches or worktrees
 5. verify and review the result
@@ -191,8 +192,9 @@ Phase rules:
 - exactly one root phase should normally be active at a time
 - enter the phase before real work for that phase begins
 - do not close multiple root phases in one transition block
-- `P5 Integrated Verification and Hardening` should normally be one minimal gate that includes the owner-run local test harness check; if that passes and the repo is roughly coherent and broadly correct against `plan.md` plus accepted `../docs/design.md`,
--
+- `P5 Integrated Verification and Hardening` should normally be one minimal gate that includes the owner-run local test harness check; if that passes and the repo is roughly coherent and broadly correct against repo-local `plan.md` plus accepted `../docs/design.md`, preserve the final truthful plan in parent-root `../docs/plan.md`, remove the repo-local copy, and then stop to ask whether to proceed to evaluation; only narrow owner-fixable local-harness/config/wrapper/README/docs/light-script churn should be fixed there directly, and any real code or actual test-file changes should trigger a bounded developer reroute
+- the explicit post-`P5` pause must be recorded in Beads only after repo-local `plan.md` has been preserved in parent-root `../docs/plan.md` and removed from the repo: add a structured comment showing that `P5` evidence is satisfied and that the workflow is waiting for the proceed-to-evaluation decision; do not silently advance into `P7` before that decision arrives
+- `P8 Final Readiness Decision` should be one fast owner-run reconciliation sweep after `P7`: reread the delivered repo, `README.md`, parent-root `../docs/`, carried `../.tmp/` audit artifacts, and archived stale/fail report lineage together, fix small docs or README or repo-hygiene drift directly, record a readiness reconciliation note, and only reopen evaluation or packaging-adjacent follow-up when a material inconsistency remains
 - `P10 Retrospective` runs automatically after successful packaging and is non-blocking unless it finds a real delivery defect
 - post-packaging external evaluation feedback may reopen `P7 Evaluation and Fix Verification`, then rerun `P8 Final Readiness Decision`, `P9 Submission Packaging`, and `P10 Retrospective`

@@ -211,12 +213,12 @@ Maintain exactly one active developer session at a time.
 - each audit result decides the remediation lane:
 - audit session `1` keeps all of its remediation in `bugfix-1`, including fail regenerations and later kept-report fixes
 - audit session `2` keeps all of its remediation in `bugfix-2`, including fail regenerations and later kept-report fixes
-- `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, extract the full issue set from
-- `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` lane, and treat that kept report
-- `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` lane for every reported issue and recommendation in that report, and if there are no reported items mark the audit session complete without inventing new issues
+- `fail` -> move the fail working report out of `../.tmp/` into `../.ai/archive/`, extract the full issue set from the full failed report file, analyze the exact failing surfaces and what must change to resolve them, send that full owner-analyzed corrective brief to that audit session's exact `bugfix-N` lane, require the whole list to be fixed, and then rerun the full evaluation send packet inside the same evaluator session
+- `partial pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` lane, and treat the full issue list extracted from that kept report file as the authoritative fix-check scope for the rest of that audit session; send the developer the full owner-analyzed corrective brief for that scope rather than a narrow subset
+- `pass` -> keep `audit_report-<N>.md`, use that audit session's exact `bugfix-N` lane for every reported issue and recommendation found in that kept report file, and if there are no reported items mark the audit session complete without inventing new issues
 - `audit_report-<N>-fix_check.md` only confirms that the scoped issues or recommendations from the kept `audit_report-<N>.md` are fixed; if it is not clean, send only the unresolved subset back for remediation, then repeat the same-session fix-check loop against the full kept-report scope, and once that scoped set is confirmed fixed move on to the next audit session or next `P7` subphase
|
|
218
220
|
- require both audit sessions to complete before the final post-audit coverage/README audit can run
|
|
219
|
-
- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and on the initial send and every rerun use the full coverage/README evaluation send packet rather than a hand-written prompt; reread each generated report and reject it if the last evaluator send was not the exact full prepared packet, if it contains prior-run wording such as `previously` or `remaining`, or if it collapses into a tiny targeted issue list instead of a full standalone strict audit;
|
|
221
|
+
- after the second audit session completes, run the installed `~/slopmachine/test-coverage-prompt.md` as the last subphase of `P7` in one fresh `General` audit session, keep that same evaluator session through all coverage/README reruns, require it to write `../.tmp/test_coverage_and_readme_audit_report.md`, and on the initial send and every rerun use the full coverage/README evaluation send packet rather than a hand-written prompt; reread each generated report and reject it if the last evaluator send was not the exact full prepared packet, if it contains prior-run wording such as `previously` or `remaining`, or if it collapses into a tiny targeted issue list instead of a full standalone strict audit; then read the full saved report file itself, extract every reported issue/recommendation from that file, and if any remain, move the displaced report into `../.ai/archive/`, route that full extracted issue set to `bugfix-2`, replace the report, and rerun the full test-coverage prompt again in that same evaluator session until the report is a full standalone pass-level report with no remaining issue/recommendation set to hand back; do not fall back to another developer session for this remediation window
|
|
220
222
|
- track the active evaluator session separately in metadata during `P7`
|
|
221
223
|
- once `P7` starts, keep looping inside `P7` until its exit criteria are actually satisfied; do not stop between audits, remediation turns, fix-check passes, or coverage/README reruns
|
|
222
224
|
|
|
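The verdict-to-lane rules in the hunk above can be sketched as a small routing function. This is an illustrative sketch only, not part of the package: the `Routing` type, the `route_audit_result` name, and the `next_step` labels are hypothetical; only the `bugfix-N` lane naming and the keep/archive behavior come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Routing:
    keep_report: bool      # keep ../.tmp/audit_report-<N>.md in place
    archive_report: bool   # move the working report into ../.ai/archive/
    lane: Optional[str]    # matching bugfix-N lane, or None if nothing to fix
    next_step: str         # hypothetical label: full_rerun, fix_check, session_complete


def route_audit_result(session: int, verdict: str, issues: List[str]) -> Routing:
    """Map one audit verdict to the remediation lane for that audit session."""
    lane = f"bugfix-{session}"
    if verdict == "fail":
        # Fail reports are archived, fully remediated, then fully rerun.
        return Routing(False, True, lane, "full_rerun")
    if verdict == "partial pass":
        # The kept report becomes the authoritative fix-check scope.
        return Routing(True, False, lane, "fix_check")
    if verdict == "pass":
        if issues:
            return Routing(True, False, lane, "fix_check")
        # No reported items: complete the session without inventing issues.
        return Routing(True, False, None, "session_complete")
    raise ValueError(f"unknown verdict: {verdict!r}")
```

The `issues` argument is assumed to be the full set extracted from the saved report file, not an evaluator summary.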
@@ -429,15 +431,15 @@ Do not speak as a relay for a third party.
|
|
|
429
431
|
- when several issues are found in one review sweep, send them together once as one clear issue list instead of drip-feeding or re-batching them across multiple follow-ups
|
|
430
432
|
- for small non-core fixes such as README cleanup, docs sync, Docker config, wrapper/config glue, light `./run_tests.sh` cleanup, or similar release-churn cleanup, fix them directly in the owner session instead of bouncing them back to the developer
|
|
431
433
|
- if the fix would require editing actual test files or real product code, do not patch it in the owner session; send it back to the developer
|
|
432
|
-
- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or `plan.md
|
|
433
|
-
- during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`,
|
|
434
|
+
- for small planning-document contract issues in `../docs/design.md`, `../docs/api-spec.md`, or the accepted plan (`plan.md` before `P5` closes, `../docs/plan.md` afterward), fix them directly in the owner session instead of bouncing them back to the developer
|
|
435
|
+
- during `P8`, do one deliberate cross-surface reconciliation sweep across the delivered repo, `README.md`, parent-root `../docs/`, carried audit artifacts, archived stale/fail report lineage, report-shape validity, and residual risks before packaging starts; prefer direct owner fixes for small drift instead of turning that sweep into another developer loop
|
|
434
436
|
- keep work moving without low-information continuation chatter
|
|
435
437
|
- read only what is needed to answer the current decision
|
|
436
438
|
- after planning is accepted, prefer plan-section references plus explicit acceptance checklists over repeated prompt dumps
|
|
437
439
|
- at planning, scaffold-step review inside development, the opening full-repo review, any rare major reread, and final evaluation review, demand the exact expected outcomes in itemized form rather than relying on implied standards
|
|
438
440
|
- reject development responses that silently collapse named parallel work bundles into serial work without an exact blocker and revised parallelization map
|
|
439
441
|
- keep comments and metadata auditable and specific
|
|
440
|
-
- keep external docs owner-maintained under parent-root `../docs/` as reference copies, keep `README.md` as the primary repo-local documentation file,
|
|
442
|
+
- keep external docs owner-maintained under parent-root `../docs/` as reference copies, keep `README.md` as the primary repo-local documentation file, allow repo-local `plan.md` only through planning, development, and `P5`, and preserve the final plan in parent-root `../docs/plan.md` after `P5`
|
|
441
443
|
- default review scope to the changed files and the specific supporting files named by the developer
|
|
442
444
|
- expand review scope only when a concrete inconsistency or missing dependency forces it
|
|
443
445
|
- avoid `grep` by default; prefer `glob` to identify exact files and `read` with targeted offsets
|
|
@@ -54,7 +54,7 @@ When accepted planning artifacts already exist, treat them as the primary execut
|
|
|
54
54
|
- treat follow-up prompts mainly as narrow deltas, guardrails, or correction signals
|
|
55
55
|
- if the current work is the scaffold step at the start of development, treat section 3 of `plan.md` as binding; do not re-choose the playbook, starter, or bootstrap path unless planning is explicitly reopened
|
|
56
56
|
- if the scaffold-step instructions are still vague about the playbook or bootstrap command, raise that as a planning gap instead of improvising a new baseline contract
|
|
57
|
-
- if `plan.md` includes a security execution contract, `Delivery Review Requirements`, `README Contract`, or test coverage execution contract, treat them as binding parts of the current workstream rather than optional follow-up polish
|
|
57
|
+
- if `plan.md` includes a security execution contract, `Core Semantic Path Proof`, `Prompt-Critical Rule Matrix`, `Role Surface Matrix`, `Runtime Lifecycle Checklist`, `Delivery Review Requirements`, `README Contract`, or test coverage execution contract, treat them as binding parts of the current workstream rather than optional follow-up polish
|
|
58
58
|
- planning-only deliverables inside the repo should normally stay minimal, but `plan.md` is the explicit allowed execution-plan artifact
|
|
59
59
|
- when planning is accepted, treat the execution file tree and file-ownership map in `plan.md` as real execution boundaries rather than decorative notes
|
|
60
60
|
- for adopted projects, inspect the current repo tree first and use the accepted `plan.md` delta tree rather than assuming a greenfield layout
|
|
@@ -67,6 +67,7 @@ When accepted planning artifacts already exist, treat them as the primary execut
|
|
|
67
67
|
- for backend, fullstack, and web projects, keep the canonical `docker compose up --build` contract in `README.md` and also include the exact legacy compatibility string `docker-compose up` somewhere in startup guidance
|
|
68
68
|
- for Android, iOS, and desktop projects, keep the required Docker-contained final contract while also maintaining the project-type-specific host-side guidance sections expected by the strict README audit
|
|
69
69
|
- before reporting development complete, remove local-only setup traces and host-only dependency assumptions from the delivered README and wrapper scripts
|
|
70
|
+
- before reporting development complete, explicitly report proof status for the core semantic path, prompt-critical rules, role surface matrix if applicable, runtime lifecycle checklist if applicable, and any residual risks instead of relying only on general test success
|
|
70
71
|
- keep `README.md` and other shared integration-heavy files main-session-owned by default during parallel work unless the accepted plan explicitly delegates them
|
|
71
72
|
- stay in this one developer session as the primary execution lane, but use internal Claude task sub-agents when they can parallelize independent search, reading, verification, or bounded implementation subtasks usefully
|
|
72
73
|
- prefer internal Claude sub-agents when the work naturally decomposes into independent chunks that can be explored or verified in parallel and merged back cleanly
|
|
@@ -92,7 +93,7 @@ When accepted planning artifacts already exist, treat them as the primary execut
|
|
|
92
93
|
- verify the changed area locally and realistically before reporting completion
|
|
93
94
|
- when backend or fullstack API endpoints are added or changed, prefer real HTTP tests for the exact `METHOD + PATH` over controller or service bypasses when practical
|
|
94
95
|
- if mocked HTTP tests or unit-only tests still exist for an API surface, do not overstate them as equivalent to true no-mock endpoint coverage
|
|
95
|
-
- keep the repo self-sufficient and statically reviewable through code plus `README.md`, with `plan.md` as the deliberate execution-plan exception
|
|
96
|
+
- keep the repo self-sufficient and statically reviewable through code plus `README.md`, with repo-local `plan.md` as the deliberate execution-plan exception only during active implementation through `P5`
|
|
96
97
|
- do not touch workflow or rulebook files such as `CLAUDE.md` unless explicitly asked
|
|
97
98
|
- if the work changes acceptance-critical docs or contracts, review those docs yourself before replying instead of assuming someone else will catch inconsistencies later
|
|
98
99
|
|
|
@@ -18,6 +18,14 @@ When a root phase changes:
|
|
|
18
18
|
5. open the next root phase
|
|
19
19
|
6. add a `STATE:` transition comment
|
|
20
20
|
|
|
21
|
+
When `P5` is complete but the workflow is intentionally paused for the user proceed-to-evaluation decision:
|
|
22
|
+
|
|
23
|
+
1. do not open `P7` yet
|
|
24
|
+
2. keep `P5` as the active root phase until the user explicitly says to proceed
|
|
25
|
+
3. before surfacing that pause, preserve the final truthful contents of repo-local `plan.md` in parent-root `../docs/plan.md` and remove the repo-local copy
|
|
26
|
+
4. add a structured comment such as `APPROVAL:` or `DECISION:` recording that `P5` evidence is satisfied and the workflow is waiting at the evaluation boundary
|
|
27
|
+
5. once the user approves evaluation, close `P5`, update metadata, open `P7`, and add the normal `STATE:` transition comment
|
|
28
|
+
|
|
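The pause steps above might be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `comments` list is a stand-in for Beads comments (the real Beads interface is not modeled), the function names are hypothetical, and the comment wording is invented for illustration. Only the file moves and the phase ordering follow the text.

```python
import shutil
from pathlib import Path
from typing import List


def preserve_plan(repo_root: Path) -> Path:
    """Copy repo-local plan.md to parent-root ../docs/plan.md, then remove the repo copy."""
    source = repo_root / "plan.md"
    target = repo_root.parent / "docs" / "plan.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    source.unlink()
    return target


def pause_after_p5(repo_root: Path, comments: List[str]) -> str:
    """Preserve the plan first, then record the pause; P5 stays the active root phase."""
    preserve_plan(repo_root)
    comments.append("DECISION: P5 evidence satisfied; waiting for proceed-to-evaluation decision")
    return "P5"


def approve_evaluation(comments: List[str]) -> str:
    """After explicit user approval: close P5, open P7, add the STATE transition comment."""
    comments.append("STATE: P5 -> P7")  # hypothetical comment wording
    return "P7"
```

The ordering matters here: the structured pause comment is appended only after the plan has been preserved and the repo-local copy removed.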
21
29
|
## Rules
|
|
22
30
|
|
|
23
31
|
- enter the next phase before real work for that phase begins
|
|
@@ -25,6 +33,7 @@ When a root phase changes:
|
|
|
25
33
|
- keep structured comments specific and auditable
|
|
26
34
|
- treat phase-closure failures as real workflow failures to resolve
|
|
27
35
|
- keep Beads and metadata aligned on current phase and active developer session record when either changes
|
|
36
|
+
- when Beads and metadata disagree on the active root phase, treat Beads as the primary lifecycle source of truth and repair metadata unless concrete evidence proves a Beads mutation is required
|
|
28
37
|
|
|
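The Beads-versus-metadata rule added above can be illustrated with a small reconciliation helper. This is a hypothetical sketch: the function name, the action labels, and the boolean evidence flag are assumptions made for illustration, and it assumes that a proven Beads mutation means Beads is updated to the metadata value.

```python
from typing import Tuple


def reconcile_active_phase(
    beads_phase: str,
    metadata_phase: str,
    beads_mutation_proven: bool = False,
) -> Tuple[str, str]:
    """Return (active_phase, action) when Beads and metadata are compared."""
    if beads_phase == metadata_phase:
        return beads_phase, "aligned"
    if beads_mutation_proven:
        # Only concrete evidence justifies changing Beads itself.
        return metadata_phase, "mutate_beads"
    # Default: Beads is the primary lifecycle source of truth.
    return beads_phase, "repair_metadata"
```

Defaulting `beads_mutation_proven` to `False` keeps the common path on the safe side: metadata is repaired and Beads is left untouched.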
29
38
|
## Structured comment prefixes
|
|
30
39
|
|
|
@@ -51,6 +51,7 @@ It must not become planning, architecture design, execution planning, or conveni
|
|
|
51
51
|
- It should explain what later planning could miss if each important requirement is not carried forward explicitly.
|
|
52
52
|
- It should distinguish between explicit prompt requirements, implied but binding requirements, and locked safe defaults where that separation helps later planning.
|
|
53
53
|
- It should end with a planning-miss checklist strong enough to expose details later design/planning commonly underbuild.
|
|
54
|
+
- It should explicitly cover hidden environment and trust-boundary assumptions when the prompt mentions or implies on-prem, intranet, offline, LAN, browser access, auth cookies/tokens, local storage, self-contained deployment, external reachability, or secure/insecure transport.
|
|
54
55
|
- It should cover material ambiguity only.
|
|
55
56
|
- It should preserve prompt faithfulness and avoid convenience narrowing.
|
|
56
57
|
- Each entry should end with a decisive solution.
|
|
@@ -66,6 +67,7 @@ It must not become planning, architecture design, execution planning, or conveni
|
|
|
66
67
|
- Require it to write `../.ai/clarification-faithfulness-review.md`.
|
|
67
68
|
- If the review finds only small owner-fixable wording or coverage issues, patch `../.ai/requirements-breakdown.md` and `../docs/questions.md` directly.
|
|
68
69
|
- If the review finds material drift or missing prompt-critical requirements, rerun clarification before leaving `P1`.
|
|
70
|
+
- If planning reveals one real prompt contradiction or materially ambiguous product rule immediately after clarification, allow one bounded clarification addendum; keep it narrow, classify it as product-significant rather than operator noise, update the accepted clarification package, and then resume planning.
|
|
69
71
|
|
|
70
72
|
5. Build the approved clarification package.
|
|
71
73
|
- Extract the accepted deep core requirements from `../.ai/requirements-breakdown.md` plus the accepted clarification list from `../docs/questions.md`.
|
|
@@ -87,6 +89,7 @@ Reject the clarification result if:
|
|
|
87
89
|
- planning or implementation structure leaked into `questions.md`
|
|
88
90
|
- the requirements breakdown is shallow, incomplete, or not representative enough of the prompt
|
|
89
91
|
- the requirements breakdown fails to expose planning-sensitive details, hidden constraints, negative conditions, or success-closure details that later planning could easily miss
|
|
92
|
+
- hidden environment or trust-boundary assumptions are relevant but absent or vague
|
|
90
93
|
- the prompt-faithfulness review finds material drift that was not corrected
|
|
91
94
|
- the owner still cannot restate the accepted core requirements and clarifications as a clean planning brief without carrying unresolved noise forward
|
|
92
95
|
|
|
@@ -162,7 +162,7 @@ Keep `../metadata.json` focused on project facts and exported project metadata w
|
|
|
162
162
|
- `frontend_framework`
|
|
163
163
|
- `backend_framework`
|
|
164
164
|
|
|
165
|
-
- use only `fullstack`, `
|
|
165
|
+
- use only `fullstack`, `server`, `android`, `ios`, `desktop`, or `web` for final `project_type` when known; map backend/server-only projects to `server`
|
|
166
166
|
- fill known values early and keep them current
|
|
167
167
|
- prefer explicit values; use empty strings instead of `null` or extra workflow fields
|
|
168
168
|
- do not use `../metadata.json` as owner workflow scratch state
|
|
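As an illustration of the `project_type` rule above, a normalizer could look like the sketch below. The allowed values and the empty-string convention come from the text; the function name and the alias spellings are hypothetical.

```python
from typing import Optional

ALLOWED_PROJECT_TYPES = ("fullstack", "server", "android", "ios", "desktop", "web")

# Hypothetical spellings of backend/server-only projects, all mapped to "server".
ALIASES = {"backend": "server", "backend-only": "server", "server-only": "server"}


def normalize_project_type(raw: Optional[str]) -> str:
    """Return an allowed project_type, or an empty string when it is not yet known."""
    if not raw or not raw.strip():
        return ""  # prefer empty strings over null
    value = raw.strip().lower()
    value = ALIASES.get(value, value)
    if value not in ALLOWED_PROJECT_TYPES:
        raise ValueError(f"unsupported project_type: {raw!r}")
    return value
```

Raising on unknown values, rather than passing them through, keeps `../metadata.json` limited to the six final values.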
@@ -257,9 +257,9 @@ For live Claude lanes specifically:
|
|
|
257
257
|
## Initial structure rule
|
|
258
258
|
|
|
259
259
|
- parent-root `../docs/` is the owner-maintained external documentation directory
|
|
260
|
-
- parent-root
|
|
261
|
-
-
|
|
262
|
-
-
|
|
260
|
+
- a parent-root session-export directory is not part of bootstrap, packaging, or the final Aquila root structure and should not be created
|
|
261
|
+
- Claude-backed developer sessions are packaged once as parent-root `claude-sessions.zip` by packaging the whole resolved Claude project folder for the current cwd; treat that zip as a separate handoff artifact to move or submit separately before Aquila submission
|
|
262
|
+
- the first usable developer-lane session content must include the original prompt text clearly and verbatim enough to anchor `metadata.prompt`; do not start the developer lane from a paraphrase, summary-only handoff, or mid-work continuation without the original prompt
|
|
263
263
|
- parent-root `../.tmp/` is the `P7` evaluation artifact directory for `audit_report-<N>.md`, `audit_report-<N>-fix_check.md`, and `test_coverage_and_readme_audit_report.md`
|
|
264
264
|
- parent-root `../.ai/claude-live/` is the live Claude bridge runtime directory root
|
|
265
265
|
- `../docs/questions.md` is the mandatory clarification record artifact
|
|
@@ -14,6 +14,7 @@ Use this skill before prompting the developer for the main implementation run.
|
|
|
14
14
|
- update `plan.md` in place from the main developer lane as items move from not started to in progress to done
|
|
15
15
|
- use `../docs/design.md` for system intent and architecture only, and use `plan.md` for the execution file tree, exact execution order, file ownership, and progress state
|
|
16
16
|
- treat the security execution contract and test coverage execution contract in `plan.md` as binding execution inputs, not as later cleanup suggestions
|
|
17
|
+
- treat the `Core Semantic Path Proof`, `Prompt-Critical Rule Matrix`, `Role Surface Matrix`, and `Runtime Lifecycle Checklist` sections of `plan.md` as binding execution inputs when present
|
|
17
18
|
- read the planned file tree and ownership map before deeper implementation so fan-out follows real file boundaries instead of vague feature labels
|
|
18
19
|
- treat section 3 of `plan.md` as the scaffold step: land the locked starter/playbook, bootstrap path, minimal proof surface, README structure, Docker/runtime files, repo-root dockerized `./run_tests.sh`, and the separate local test environment/tooling before deeper feature work
|
|
19
20
|
- keep that scaffold strict and minimal: set up the delivery contract honestly, avoid host-side setup beyond documented prerequisites, do not run Docker there, and make the local test harness ready for immediate development use
|
|
@@ -54,6 +55,7 @@ Use this skill before prompting the developer for the main implementation run.
|
|
|
54
55
|
- verify the module integrates cleanly with existing modules, routes, permissions, shared state, and cross-cutting helpers rather than only proving the new feature path in isolation
|
|
55
56
|
- before closing the current workstream, do a narrow adjacent-flow sweep: what existing flows, commands, or docs should still be true after this work lands?
|
|
56
57
|
- before marking a `plan.md` item done, make sure the owned behavior is actually closed: actor-facing task completion works, required validation and authorization exist, matching tests landed, and the result is not just wiring or a demo shell
|
|
58
|
+
- before marking a prompt-critical rule row, role surface row, lifecycle row, or core semantic path proof item done, verify the implementation and matching test evidence named by the plan actually exist
|
|
57
59
|
- check cross-cutting consistency where relevant, especially permissions, error handling, audit/logging/redaction behavior, and state or context transition behavior
|
|
58
60
|
- verify tenant or ownership isolation where relevant so access is scoped to the authorized context rather than merely functionally working for one actor
|
|
59
61
|
- verify route-level, object-level, and function-level authorization where those boundaries exist instead of treating “logged in” as sufficient proof
|
|
@@ -83,6 +85,9 @@ Use this skill before prompting the developer for the main implementation run.
|
|
|
83
85
|
- before reporting development complete, make sure the delivered repo is converging on exactly what `README.md` promises; if the README documents a final runtime command or broad test command, treat that as the required final output format rather than a loose note
|
|
84
86
|
- before reporting development complete, do not leave obvious repo-coherence, local-harness, startup, or Docker wiring issues for `P5` or `P9`; `P5` should only need a rough correctness pass over the prepared local harness before evaluation
|
|
85
87
|
- before reporting development complete, run one deliberate main-lane pre-`P5` reread against the original prompt plus accepted requirements-and-clarification package, accepted `plan.md`, `../docs/design.md`, accepted `../docs/api-spec.md` when applicable, `README.md`, and the integrated repo state so the owner is not first discovering obvious contract drift in `P5`
|
|
88
|
+
- before reporting development complete, fill or update the planned-but-missing proof ledger for core semantic path, prompt-critical rules, role surface matrix, runtime lifecycle behavior, security fail-closed expectations, README command honesty, and behavioral coverage proof
|
|
89
|
+
- before reporting development complete, prove the core semantic path with the exact input/setup, user/API path, expected state/artifact, and failure behavior named in `plan.md`, or report the exact residual risk rather than claiming readiness
|
|
90
|
+
- before reporting development complete, for lifecycle-sensitive behavior, include entrypoint-level proof that the scheduler/worker/timed/export/import/polling/startup/cleanup path is wired and mutates state or artifacts as planned
|
|
86
91
|
- before reporting development complete, close the common `P5` failure classes inside development rather than leaving them for owner rediscovery: `README.md` drift, API-spec drift, missing auth/authorization/ownership enforcement, weak validation or normalized error handling, missing owned tests, startup or wrapper dishonesty, and partial user-facing or admin-facing flow closure
|
|
87
92
|
- before reporting development complete, self-check the integrated repo against the release-readiness requirements already absorbed into `plan.md` `Delivery Review Requirements`; do not leave prompt-fit, static-reviewability, logging/validation, security-boundary, or coverage-structure defects unresolved
|
|
88
93
|
- before reporting development complete, when backend/fullstack APIs exist, make sure endpoint inventory, `METHOD + PATH` mapping, and true no-mock HTTP coverage expectations in `../docs/test-coverage.md` and the repo are genuinely aligned rather than only implied
|
|
@@ -129,6 +134,7 @@ Use this skill before prompting the developer for the main implementation run.
|
|
|
129
134
|
- in each development follow-up or completion reply, report the exact verification commands that were run and the concrete results they produced so the owner can review the evidence without blindly rerunning the same commands
|
|
130
135
|
- when the owner names specific expected outcomes for the current workstream or gate, tie the reported verification and changed files back to those expected outcomes explicitly
|
|
131
136
|
- before reporting overall development complete, run the prepared local test harness and report the exact command plus concrete result; this should normally be the current stack's real local suite rather than an invented placeholder wrapper
|
|
137
|
+
- in the development-complete reply, explicitly report the core semantic path proof, prompt-critical rule proof status, role surface proof status if applicable, lifecycle proof status if applicable, and any accepted or unresolved residual risks
|
|
132
138
|
- when parallel fan-out was part of the planned work, report which planned lanes actually launched, which were skipped, and the exact blocker or revised sequencing for any skipped lane
|
|
133
139
|
- in each development-complete reply, make the owner-side gate questions easy to answer directly: name the closed `plan.md` sections or workstreams, state how the delivered code still matches the accepted design and API contract where applicable, list the exact verification commands and results, and call out only real unresolved issues
|
|
134
140
|
- keep ordinary development follow-up replies short by default: short summary, exact changed files, exact verification commands plus results, and only real unresolved issues unless the owner explicitly asks for a deeper mapping
|
|
@@ -13,7 +13,8 @@ Use this skill during `P7 Evaluation and Fix Verification` after an audit attemp
|
|
|
13
13
|
- keep the issue set concrete and exact
|
|
14
14
|
- route every remediation pass to that audit session's matching `bugfix-N` lane
|
|
15
15
|
- do not split, silently drop, or wave through issues from the current audit output
|
|
16
|
-
- the owner must read the current audit
|
|
16
|
+
- the owner must read the full current audit report file, extract the full issue set from that file, and analyze the exact failing surfaces before talking to the developer
|
|
17
|
+
- do not rely on evaluator summaries, top-issue clusters, short follow-up responses, or reduced issue lists when the full saved report file exists
|
|
17
18
|
- after the developer claims fixes are complete for a kept audit session, return to the same evaluator session that produced that audit report
|
|
18
19
|
- keep `P7` moving; after triage, remediation, regeneration, or fix-check, continue with the next required step until the phase is complete or irrecoverably blocked
|
|
19
20
|
|
|
@@ -25,9 +26,10 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
|
|
|
25
26
|
|
|
26
27
|
- treat the audit as a remediation trigger that stays inside the same audit session
|
|
27
28
|
- extract and hand off all issues to the same audit session's `bugfix-N` lane
|
|
28
|
-
- treat the exact full issue list from that failed
|
|
29
|
-
-
|
|
29
|
+
- treat the exact full issue list extracted from that failed report file as the remediation scope for that fail-regeneration pass
|
|
30
|
+
- analyze every issue from the full saved report file in depth; do not frame remediation scope as only the major issues, top issue clusters, or verdict-driving subset
|
|
30
31
|
- move the fail audit report out of `../.tmp/` into `../.ai/archive/` after triage; do not keep it under the numbered audit-report path
|
|
32
|
+
- do not create or request a fix-check for a failed report; a failed report can only feed developer remediation followed by a full same-session rerun
|
|
31
33
|
- open or reuse the audit session's matching `bugfix-N` lane even during fail-regeneration so all fixes for that numbered audit session stay together
|
|
32
34
|
- after the developer returns, go back to the same evaluator session, prepare the same evaluation send packet again through `node ~/slopmachine/utils/prepare_evaluation_send_packet.mjs --workspace-root .. --prompt-file <chosen-prompt-file> --mode rerun`, and resend that exact packet unchanged as a full rerun instead of sending only a short regeneration instruction
|
|
33
35
|
- do not stop after handing off or fixing a fail audit; keep the current audit session moving until it reaches `pass` or `partial pass`
|
|
@@ -35,7 +37,7 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
|
|
|
35
37
|
### `partial pass`
|
|
36
38
|
|
|
37
39
|
- treat the kept report as the start of a scoped bugfix session
|
|
38
|
-
- use its exact issue list as the scope of that exact audit session's `bugfix-N` lane
|
|
40
|
+
- use its exact issue list extracted from the saved kept report file as the scope of that exact audit session's `bugfix-N` lane
|
|
39
41
|
- save the report as `../.tmp/audit_report-<N>.md`
|
|
40
42
|
- once that report is kept, treat its exact full issue list as the authoritative fix-check scope for the rest of that audit session; later remediation may narrow to the unresolved subset from that kept scope
|
|
41
43
|
- send the full kept-report issue set to the developer in direct human review language, with explicit owner analysis of the failing surfaces and the expected fixes
|
|
@@ -43,7 +45,7 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
|
|
|
43
45
|
### `pass`
|
|
44
46
|
|
|
45
47
|
- keep the audit report as `../.tmp/audit_report-<N>.md`
|
|
46
|
-
- if the report contains any reported issue or recommendation, open or reuse that exact audit session's `bugfix-N` lane and scope it to the full kept-report set
|
|
48
|
+
- if the report contains any reported issue or recommendation, open or reuse that exact audit session's `bugfix-N` lane and scope it to the full kept-report set extracted from the saved report file
|
|
47
49
|
- if the report has no reported issues or recommendations, do not invent a fake issue set; mark the audit session complete and move on
|
|
48
50
|
|
|
49
51
|
## Issue handoff standard
|
|
@@ -56,6 +58,7 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
|
|
|
56
58
|
- require the developer to provide an AI self-test report or concise self-test summary that can be attached or mentioned in the evaluator follow-up
|
|
57
59
|
- if the developer claims an issue is invalid or already fixed, require a concrete justification against the audit output instead of silently omitting it
|
|
58
60
|
- do not reduce the handoff to a small issue subset or a thin summary; the developer-facing prompt should contain the full issue set for the current scope
|
|
61
|
+
- do not reduce the handoff to a small issue subset, top issue cluster list, or thin summary; the developer-facing prompt should contain the full issue set extracted from the saved report file for the current scope
|
|
59
62
|
- for every issue, analyze and state as clearly as possible:
|
|
60
63
|
- what is wrong
|
|
61
64
|
- why it matters
|
|
@@ -65,8 +68,11 @@ Use `final-evaluation-orchestration` as the source of truth for session count, r
|
|
|
65
68
|
- where useful, group issues by failing section or affected surface so the developer sees the real coverage of work instead of a flat checklist
|
|
66
69
|
- include remediation hints or fix suggestions, affected areas, and confirmation text that explicitly requires every listed issue to be fixed before returning
|
|
67
70
|
- treat regenerated-audit quality defects as real issues too: if the kept audit output drops required sections, evidence, or depth from the original audit contract, reject it and rerun instead of waving it through
|
|
71
|
+
- validate report shape before keeping it: verdict, required sections/tables, evidence references, concrete issue list when non-pass, endpoint/surface inventory when applicable, and hard-gate README review for coverage reports
|
|
72
|
+
- if a report is stale or contradicted by current files, archive it with a cited stale reason and rerun instead of replacing it with an owner-authored reconciliation note
|
|
68
73
|
- treat evaluator-send defects as real issues too: if the owner cannot confirm that the last ordinary audit or coverage rerun sent the exact full prepared packet rather than a file reference or short regeneration note, reject the report and rerun instead of waving it through
|
|
69
74
|
- treat tiny regenerated reports as real defects too: outside the scoped fix-check path, reject a measurably small targeted issue list or fix-only note when the evaluator was supposed to regenerate the full report from scratch
|
|
75
|
+
- treat owner-extraction shortcuts as real defects too: if the owner has not extracted every reported issue/recommendation from the saved report file, do not accept the remediation scope as complete
|
|
70
76
|
- prefer a complete owner-analysis format such as: issue id or short label, exact finding, why it matters, failing section or verdict impact, narrow evidence reference, affected surface, required fix, remediation hint, and exact verification target
|
|
71
77
|
|
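As a rough illustration of the owner-analysis format listed above, the fields could be carried in a small record and rendered into a single handoff message. This is only a sketch: `IssueEntry`, `render_handoff`, and every field name are hypothetical and do not come from the package.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IssueEntry:
    """One issue in the owner-analysis format (all field names are illustrative)."""
    label: str
    finding: str
    why_it_matters: str
    failing_section: str
    evidence: str
    affected_surface: str
    required_fix: str
    remediation_hint: str
    verification_target: str


def render_handoff(issues):
    """Render every issue in full; refuse to emit a handoff with blank fields."""
    if not issues:
        # An empty issue set means there is nothing to hand off; do not invent one.
        raise ValueError("no issues to hand off")
    blocks = []
    for issue in issues:
        lines = []
        for f in fields(issue):
            value = getattr(issue, f.name).strip()
            if not value:
                raise ValueError(f"{issue.label or '<unlabeled>'}: missing {f.name}")
            lines.append(f"{f.name.replace('_', ' ')}: {value}")
        blocks.append("\n".join(lines))
    # Confirmation text that requires every listed issue to be fixed.
    footer = f"Fix all {len(issues)} issues listed above before returning."
    return "\n\n".join(blocks + [footer])
```

Rejecting blank fields up front is one way to keep the handoff from shrinking into a thin summary; a real workflow might prefer to flag incomplete entries for the owner instead of raising.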
|
72
78
|
## Same-session fix-check standard
|
|
@@ -50,6 +50,22 @@ Before accepting any ordinary audit report regeneration or any coverage/README r
|
|
|
50
50
|
|
|
51
51
|
If either answer is unacceptable or unknown, reject the generated report, archive or replace it as needed, and rerun inside the same evaluator session with the full prepared packet again.
|
|
52
52
|
|
|
53
|
+
## Report shape validation
|
|
54
|
+
|
|
55
|
+
Before keeping any ordinary audit report or final coverage/README report, validate the saved file itself.
|
|
56
|
+
|
|
57
|
+
Required shape:
|
|
58
|
+
- verdict is present
|
|
59
|
+
- prompt-required major sections and tables are present
|
|
60
|
+
- file-level or route-level evidence references are present where the prompt requires them
|
|
61
|
+
- non-pass reports contain a concrete issue table or issue list
|
|
62
|
+
- coverage reports include endpoint/surface inventory when applicable
|
|
63
|
+
- coverage reports distinguish route/API/surface inventory completeness from behavioral proof sufficiency
|
|
64
|
+
- README audit content reviews hard-gate command, access, auth/credential, mock/debug/demo, and verification disclosures
|
|
65
|
+
- the report contains no owner-authored reconciliation placeholder and no request for the owner to infer missing report sections
|
|
66
|
+
|
|
67
|
+
Reject and archive the report if shape validation fails. A fix-check report is the only allowed narrow report shape, and only after a kept ordinary report exists.
|
|
68
|
+
|
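A minimal sketch of how part of the required-shape checklist could be checked mechanically against the saved file. It assumes the verdict sits on a line containing the word `verdict`, and that required sections can be found by name; the function names, the `issue_section` default, and the placeholder markers are all hypothetical. Checks such as evidence quality or README hard-gate coverage still need a human read.

```python
import re

# Assumption: the verdict appears on one line, e.g. "Verdict: Partial Pass".
VERDICT_RE = re.compile(r"(?im)^.*?\bverdict\b.*?\b(partial pass|pass|fail)\b")
ITEM_RE = re.compile(r"\s*(?:[-*]|\d+\.|\|)\s*\S")


def _has_items_after(text, heading):
    """True if a list item or table row follows the first line naming `heading`."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if heading.lower() in line.lower():
            return any(ITEM_RE.match(later) for later in lines[i + 1:])
    return False


def validate_report_shape(text, required_sections, issue_section="Issues",
                          placeholder_markers=()):
    """Return a list of shape failures; an empty list means these checks passed."""
    failures = []
    match = VERDICT_RE.search(text)
    if match is None:
        failures.append("verdict missing")
    lowered = text.lower()
    for section in required_sections:
        if section.lower() not in lowered:
            failures.append(f"section missing: {section}")
    verdict = match.group(1).lower() if match else None
    # Non-pass reports must carry a concrete issue table or issue list.
    if verdict in ("fail", "partial pass") and not _has_items_after(text, issue_section):
        failures.append("non-pass report has no concrete issue list")
    for marker in placeholder_markers:
        if marker.lower() in lowered:
            failures.append(f"placeholder present: {marker}")
    return failures
```

A non-empty result would correspond to the "reject and archive" branch; an empty result only means these mechanical checks passed, not that the report is acceptable.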
|
53
69
|
## Evaluation selection rule
|
|
54
70
|
|
|
55
71
|
- choose one fresh-audit evaluation prompt kind for the whole `P7` cycle
|
|
@@ -149,7 +165,8 @@ After each audit verdict is produced inside the current audit session, branch by
|
|
|
149
165
|
|
|
150
166
|
- record the attempt as a `fail` in metadata
|
|
151
167
|
- move the generated working report file to `../.ai/archive/` instead of keeping it as `audit_report-<N>.md`
|
|
152
|
-
-
|
|
168
|
+
- never run a scoped fix-check against a failed report; failed reports require full issue extraction, developer remediation, and a full same-session rerun using the prepared packet
|
|
169
|
+
- extract all reported issues from the full generated report file itself, not from any evaluator summary, condensed response, issue cluster list, or owner paraphrase
|
|
153
170
|
- treat that exact full failed-attempt issue list as the remediation scope for the current fail-regeneration pass
|
|
154
171
|
- open or reuse the exact audit-owned remediation lane `bugfix-<N>` for that audit session and send the issues there
|
|
155
172
|
- the owner remediation handoff must include all of the following:
|
|
@@ -170,7 +187,7 @@ After each audit verdict is produced inside the current audit session, branch by
|
|
|
170
187
|
- keep the report as `../.tmp/audit_report-<N>.md`
|
|
171
188
|
- apply the kept-report acceptance rule above before continuing
|
|
172
189
|
- open or reuse `bugfix-<N>`
|
|
173
|
-
- treat the exact issue list from `audit_report-<N>.md` as the full scoped issue set for that bugfix session
|
|
190
|
+
- treat the exact issue list extracted from the full `audit_report-<N>.md` file as the full scoped issue set for that bugfix session
|
|
174
191
|
- once `audit_report-<N>.md` is kept, that exact full issue list becomes the authoritative fix-check scope for the rest of that audit session; later remediation may narrow to the unresolved subset from that kept scope
|
|
175
192
|
- keep using the same evaluator session for the later scoped fix-check loop tied to that audit number
|
|
176
193
|
|
|
@@ -178,7 +195,7 @@ After each audit verdict is produced inside the current audit session, branch by
|
|
|
178
195
|
|
|
179
196
|
- keep the report as `../.tmp/audit_report-<N>.md`
|
|
180
197
|
- apply the kept-report acceptance rule above before continuing
|
|
181
|
-
- if the report contains any reported issue or recommendation, open or reuse `bugfix-<N>` and scope it to the full kept-report set, regardless of severity
|
|
198
|
+
- if the report contains any reported issue or recommendation, open or reuse `bugfix-<N>` and scope it to the full kept-report set extracted from the saved report file, regardless of severity
|
|
182
199
|
- if the report contains no reported issues or recommendations, mark the audit session complete immediately and move to the next audit session or to the final coverage/README audit when `N = 2`
|
|
183
200
|
- do not discard a kept `pass` report
|
|
184
201
|
|
|
@@ -186,11 +203,12 @@ After each audit verdict is produced inside the current audit session, branch by
|
|
|
186
203
|
|
|
187
204
|
Inside a kept audit session after `audit_report-<N>.md` exists:
|
|
188
205
|
|
|
189
|
-
- treat the exact issue list from `audit_report-<N>.md` as the scope of the loop for `partial pass`
|
|
190
|
-
- treat every reported issue and recommendation in the kept report as the scope of the loop for `pass`
|
|
206
|
+
- treat the exact issue list extracted from the saved `audit_report-<N>.md` file as the scope of the loop for `partial pass`
|
|
207
|
+
- treat every reported issue and recommendation found in the saved kept report file as the scope of the loop for `pass`
|
|
191
208
|
- send that full scoped issue set to `bugfix-<N>` in direct human review language with owner analysis of the exact surfaces and expected fixes
|
|
192
209
|
- do not tell the developer to read the audit report file directly
|
|
193
210
|
- phrase the fix request as your own review, for example `fix these issues I found`, rather than as a report handoff
|
|
211
|
+
- do not ask the evaluator for `top issues`, `major issue clusters`, `summary issues`, or any other reduced remediation scope when the full report file already exists; the owner must read the whole file and extract the whole issue set instead
|
|
194
212
|
- require the developer to fix the scoped issue set and report:
|
|
195
213
|
- exact verification commands
|
|
196
214
|
- concrete results
|
|
@@ -225,18 +243,27 @@ These are the errors I encountered during my previous inspection of the current
|
|
|
225
243
|
- prepare the coverage/README evaluator send packet with `node ~/slopmachine/utils/prepare_evaluation_send_packet.mjs --workspace-root .. --prompt-file ~/slopmachine/test-coverage-prompt.md` and send that exact packet file unchanged to the evaluator session; do not tell the evaluator to read the file, and do not trim, excerpt, paraphrase, reorder, or partially paste it
|
|
226
244
|
- do not prepend cwd notes, workflow notes, or custom audit instructions because that prompt already defines its own report path and audit workspace assumptions
|
|
227
245
|
- before each rerun, move the previous `../.tmp/test_coverage_and_readme_audit_report.md` to `../.ai/archive/` before replacing it; do not keep numbered variants for this report in `../.tmp/`
|
|
228
|
-
- judge this audit
|
|
246
|
+
- judge this audit from the full saved report file itself, not from any evaluator summary; extract the full issue/recommendation set from that file and require remediation of that full set before accepting the rerun as complete
|
|
229
247
|
- after each generated `test_coverage_and_readme_audit_report.md`, reread it and reject it if it mentions, implies, or hints at previous runs, prior inspections, earlier fixes, regeneration, or reruns
|
|
230
248
|
- first verify that the last evaluator send for that report was the exact full prepared packet contents rather than a file reference, short note, footer-only message, or other reduced send
|
|
231
249
|
- explicitly look for wording such as `previously`, `remaining`, `still remaining`, `fixed from the prior run`, `rerun`, `regenerated`, `again`, `previous inspection`, or similar prior-run framing; treat those as audit-output defects when they refer to report history rather than current findings
|
|
232
250
|
- if the report contains that kind of prior-run wording, archive or replace it and rerun the same-session coverage/README audit until the kept report reads as a fresh standalone audit of the current repo state
|
|
233
251
|
- also reject the report if it is materially thinner or less complete than the original strict audit contract; do not allow regeneration to reduce the information depth
|
|
234
252
|
- also reject it if it is a measurably tiny file or a targeted issue list instead of a full strict coverage/README audit; this report should normally remain roughly `150+` lines and cover the full prompt-required review shape rather than collapsing to a short note
|
|
235
|
-
- route
|
|
253
|
+
- route the full extracted issue/recommendation set from that saved report file to `bugfix-2` only
|
|
236
254
|
- require fixes plus concrete verification evidence from `bugfix-2`
|
|
237
255
|
- do not run Docker or `./run_tests.sh` anywhere inside `P7`; the ordinary local-harness execution point is the owner-run gate in `P5`, and the first real Docker confirmation plus dockerized broad-test run is `P9`
|
|
238
256
|
- after the fixes land, return to that same evaluator session, prepare the coverage/README evaluator send packet again through `node ~/slopmachine/utils/prepare_evaluation_send_packet.mjs --workspace-root .. --prompt-file ~/slopmachine/test-coverage-prompt.md --mode rerun`, send that exact rerun packet unchanged to the evaluator session again, and replace the old report
|
|
239
|
-
- continue the same-session full-prompt coverage/README rerun loop until the fresh report
|
|
257
|
+
- continue the same-session full-prompt coverage/README rerun loop until the fresh report is a full standalone pass-level report with no remaining issue/recommendation set to hand back, or until an irrecoverable blocker stops the workflow
|
|
258
|
+
|
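The prior-run wording and thin-report checks in the list above lend themselves to a quick first-pass scan. The sketch below uses the phrases quoted in the list; the function names and the hard `150`-line cutoff are assumptions (the source treats `150+` as a rough norm). Hits are candidates for review only, since the rule applies when the wording refers to report history rather than current findings.

```python
import re

# Phrases taken from the list above; longer phrases come first so they win the match.
PRIOR_RUN_PHRASES = [
    "fixed from the prior run",
    "previous inspection",
    "still remaining",
    "previously",
    "regenerated",
    "remaining",
    "rerun",
    "again",
]
PRIOR_RUN_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PRIOR_RUN_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_prior_run_wording(text):
    """Return (line_number, phrase) pairs that need a human look."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PRIOR_RUN_RE.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits


def looks_too_thin(text, min_lines=150):
    """Flag reports shorter than the rough line-count norm (threshold is an assumption)."""
    return len(text.splitlines()) < min_lines
```

For example, a line such as `Two gaps are still remaining after the rerun.` would be reported with both `still remaining` and `rerun`, leaving the owner to decide whether it describes report history.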
|
259
|
+
## Stale or degraded report protocol
|
|
260
|
+
|
|
261
|
+
If a report appears stale, contradicted by current files, or degraded:
|
|
262
|
+
- cite the current-file evidence that contradicts the report before rejecting it as stale
|
|
263
|
+
- archive the stale/degraded report under `../.ai/archive/`; do not keep it in `../.tmp/`
|
|
264
|
+
- record the stale/degraded reason in metadata or Beads comments when available
|
|
265
|
+
- rerun the full prepared packet in the same evaluator session or start a new evaluator session only when same-session recovery is not viable
|
|
266
|
+
- do not replace a stale/degraded report with an owner-authored reconciliation note
|
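The archive-and-record steps of this protocol could be sketched as follows. The function name, the timestamped file name, and the JSON sidecar are assumptions made for illustration; the source only says to archive under `../.ai/archive/` and to record the reason in metadata or Beads comments when available.

```python
import json
import shutil
import time
from pathlib import Path


def archive_stale_report(report_path, archive_dir, reason, evidence):
    """Move a stale/degraded report into the archive and record why.

    `evidence` is a list of current-file citations; an empty list is refused,
    because the protocol requires cited evidence before rejecting a report.
    """
    if not evidence:
        raise ValueError("cite current-file evidence before rejecting a report as stale")
    report = Path(report_path)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = archive / f"{report.stem}-{stamp}{report.suffix}"
    shutil.move(str(report), str(target))
    # Hypothetical sidecar note standing in for the metadata or Beads record.
    note = target.with_name(target.name + ".reason.json")
    note.write_text(json.dumps({"reason": reason, "evidence": list(evidence)}, indent=2))
    return target
```

After archiving, the next step in the protocol is a rerun with the full prepared packet, not an owner-written replacement note, so this helper deliberately writes nothing back into `../.tmp/`.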
|
240
267
|
|
|
241
268
|
## Scope rule
|
|
242
269
|
|
|
@@ -255,7 +282,7 @@ These are the errors I encountered during my previous inspection of the current
|
|
|
255
282
|
- audit session `2` is complete
|
|
256
283
|
- the post-audit coverage/README audit has run as the last subphase of `P7`
|
|
257
284
|
- a clean `pass` or `partial pass` verdict alone does not end an audit session; the corresponding `bugfix-<N>` issue/recommendation loop must also close unless the kept `pass` report had no reported items at all
|
|
258
|
-
- after the second audit session completes, run the coverage/README audit; when the fresh report
|
|
285
|
+
- after the second audit session completes, run the coverage/README audit; only when the fresh report is a full standalone pass-level report and the owner has extracted no remaining issue/recommendation set from the saved file may the workflow move to `P8 Final Readiness Decision`; `P8` then performs one fast reconciliation sweep across the repo, parent-root docs, and carried audit artifacts before packaging begins
|
|
259
286
|
- until that exit target is actually met, never stop merely because one audit attempt, one remediation turn, or one fix-check loop pass has finished
|
|
260
287
|
|
|
261
288
|
## Boundaries
|