@linimin/pi-letscook 0.1.30 → 0.1.31
- package/CHANGELOG.md +16 -0
- package/README.md +48 -1
- package/agents/completion-auditor.md +17 -0
- package/agents/completion-reviewer.md +17 -0
- package/agents/completion-stop-judge.md +17 -0
- package/extensions/completion/index.ts +749 -195
- package/extensions/completion/role-reporting.js +356 -0
- package/package.json +2 -1
- package/scripts/context-proposal-test.sh +115 -6
- package/scripts/refocus-test.sh +11 -0
- package/scripts/release-check.sh +2 -0
- package/scripts/rubric-contract-test.sh +249 -0
- package/scripts/smoke-test.sh +154 -23
- package/skills/completion-protocol/SKILL.md +39 -0
- package/skills/completion-protocol/references/completion.md +71 -0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,21 @@
 # Changelog
 
+## 0.1.31
+
+### Changed
+
+- defined a shared structured evaluation-rubric contract for `completion-reviewer`, `completion-auditor`, and `completion-stop-judge`, including the exact rubric dimensions `Contract coverage`, `Correctness risk`, `Verification evidence`, and `Docs/state parity` with `pass|concern|fail` verdict semantics
+- added canonical `task_type: completion-workflow` and `evaluation_profile: completion-rubric-v1` signaling across the packaged control-plane defaults, verifier schema, and kickoff/reminder/resume surfaces
+- expanded the exact active-slice implementer handoff with canonical `implementation_surfaces` and `verification_commands` fields, and now surface them alongside `priority` / `why_now` in reminder and compaction-resume text
+- documented the rubric-driven evaluation contract plus canonical routing-profile signaling in the packaged completion protocol and README without adding profile-specific rubric-output enforcement yet
+- strengthened the smoke/refocus/context regressions so bootstrap and refocus preserve the new canonical signaling and fail closed when required `task_type` / `evaluation_profile` fields are removed
+- strengthened the smoke regression and control-plane verifier so selected-slice handoffs now fail closed when the expanded implementation-contract fields are missing
+- threaded canonical `evaluation_profile` plus the active-slice implementation contract into reviewer/auditor/stop-judge reminder and dispatch surfaces so those read-only roles can recover from canonical state instead of prose-only summaries
+- made reviewer/auditor/stop-judge transcription fail closed on malformed rubric-bearing outputs while still accepting valid reports, and added deterministic transcription coverage for all three roles in `npm run rubric-contract-test`
+- kept deterministic `rubric-contract-test` coverage wired into `npm run release-check`
+- made the `/cook` confirmation UI critique-aware by rendering critique/risk notes plus recommended `task_type` / `evaluation_profile` routing hints in dedicated sections while keeping the existing Start/Edit/Cancel flow
+- persisted accepted startup/refocus routing choices canonically by writing the selected `task_type` / `evaluation_profile` into the canonical control-plane files and recording the accepted critique outcome in continuation state, with `context-proposal-test` and `release-check` covering the shipped flow
+
 ## 0.1.30
 
 ### Changed
package/README.md
CHANGED
@@ -103,11 +103,15 @@ Startup confirmation uses a custom UI that:
 
 - renders the proposal body separately from the action list
 - keeps Mission / Scope / Constraints / Acceptance readable as a content area
+- renders analyst-derived **Critique and risks** separately from the editable proposal body
+- renders recommended `task_type` / `evaluation_profile` routing hints separately from both the proposal body and the action list
 - presents explicit actions for:
 - **Start**
 - **Edit**
 - **Cancel**
 
+When you accept startup or refocus from that flow, `/cook` now persists the chosen `task_type` and `evaluation_profile` across `.agent/profile.json`, `.agent/state.json`, `.agent/plan.json`, and `.agent/active-slice.json`, and records the accepted critique outcome in canonical continuation state before the re-ground round begins.
+
 The same confirmation flow is reused across:
 
 - discussion-only startup
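The hunk above says accepted routing choices are written to four `.agent/*.json` files. As a rough illustration only, a fail-closed parity check over those files might look like the sketch below. The `RoutingSignals` type, the `checkRoutingParity` function, and the assumption that both fields are top-level string properties are hypothetical; the diff does not show the package's actual schema or verifier.

```typescript
// Hypothetical sketch, not the package's verifier. It assumes both routing
// fields are top-level string properties in each control-plane file.
interface RoutingSignals {
  task_type?: unknown;
  evaluation_profile?: unknown;
}

// File names as listed in the README text of the diff.
const CONTROL_PLANE_FILES = [
  ".agent/profile.json",
  ".agent/state.json",
  ".agent/plan.json",
  ".agent/active-slice.json",
];

const ROUTING_KEYS = ["task_type", "evaluation_profile"] as const;
type RoutingKey = (typeof ROUTING_KEYS)[number];

// Returns a list of problems; an empty list means every file carries both
// signals and all files agree. Anything missing is reported (fail closed).
function checkRoutingParity(
  parsed: Record<string, RoutingSignals | undefined>,
): string[] {
  const problems: string[] = [];
  const seen: Record<RoutingKey, string[]> = {
    task_type: [],
    evaluation_profile: [],
  };
  for (const file of CONTROL_PLANE_FILES) {
    const doc = parsed[file];
    if (doc === undefined) {
      problems.push(`${file}: file not provided`);
      continue;
    }
    for (const key of ROUTING_KEYS) {
      const value = doc[key];
      if (typeof value !== "string" || value.length === 0) {
        problems.push(`${file}: missing ${key}`);
      } else if (seen[key].indexOf(value) === -1) {
        seen[key].push(value);
      }
    }
  }
  for (const key of ROUTING_KEYS) {
    if (seen[key].length > 1) {
      problems.push(`${key}: files disagree (${seen[key].join(", ")})`);
    }
  }
  return problems;
}
```

A caller would parse each file with `JSON.parse` and pass the results keyed by path; a non-empty return value would be treated as a verification failure.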
@@ -133,6 +137,47 @@ While a `completion_role` subprocess is running:
 - running-role output distinguishes tool work from `PROGRESS`, `RATIONALE`, `NEXT`, `VERIFYING`, and `STATE-DELTA`
 - waiting and stalled states are surfaced deterministically from timestamps
 
+## Structured evaluation rubrics
+
+The packaged completion workflow now defines a shared structured evaluation-rubric contract for the read-only evaluation roles:
+
+- `completion-reviewer`
+- `completion-auditor`
+- `completion-stop-judge`
+
+Those roles now use the same rubric section and exact dimension names:
+
+- `Contract coverage`
+- `Correctness risk`
+- `Verification evidence`
+- `Docs/state parity`
+
+Each rubric line uses the same verdict words:
+
+- `pass` — no material issue remains for that dimension
+- `concern` — a real caveat or remaining gap exists, but it does not by itself force rejection or `NO-STOP`
+- `fail` — a blocking issue or contradictory truth exists, so the role's final verdict must not be positive
+
+The packaged control plane now also carries canonical routing signals:
+
+- `task_type: completion-workflow`
+- `evaluation_profile: completion-rubric-v1`
+
+Those identifiers are persisted in `.agent/profile.json`, `.agent/state.json`, `.agent/plan.json`, and `.agent/active-slice.json`, then surfaced in kickoff/reminder/resume text and reviewer/auditor/stop-judge evaluation handoffs so downstream roles can rely on canonical signaling instead of prose inference alone.
+
+The active-slice exact implementer handoff now also carries a stronger implementation contract for selected, in-progress, committed, and done slices:
+
+- `implementation_surfaces` — the repo surfaces expected to change or stay in parity for the slice
+- `verification_commands` — the focused and broader deterministic checks the implementer is expected to run before committing
+
+Those fields are scaffolded by default, enforced by `.agent/verify_completion_control_plane.sh` whenever an exact handoff is required, and surfaced alongside `priority` / `why_now` in reminder and compaction-resume text so implementers can recover from canonical state instead of prose-only summaries.
+
+Reviewer, auditor, and stop-judge dispatch/reminder surfaces now also thread the current active-slice implementation contract (`implementation_surfaces`, `verification_commands`, locked notes, must-fix findings, and before-slice counters) alongside the canonical `evaluation_profile` so those read-only roles can reason from canonical state after compaction.
+
+Canonical reviewer/auditor/stop-judge transcription now fails closed on malformed rubric-bearing reports: the shared rubric heading plus all four rubric dimensions must be present, required role fields must remain intact, and reviewer/stop-judge yes/no verdicts cannot contradict rubric `fail` lines.
+
+Deterministic verification for this packaged contract lives in `npm run rubric-contract-test`, which now exercises reviewer, auditor, and stop-judge transcription paths while the bootstrap/refocus/context regressions plus control-plane verifier fail closed when required canonical signaling is missing.
+
 ## Canonical files
 
 This package stores canonical workflow state under:
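The hunk above describes transcription that fails closed on malformed rubric reports: the `Rubric:` heading and all four dimensions must be present, and a yes/no verdict may not contradict a `fail` line. The sketch below is one possible reading of that rule, not the code shipped in `extensions/completion/role-reporting.js`. The function name `validateRubricReport`, the `RubricResult` shape, and the tolerance for an optional leading `- ` on each line are assumptions made for illustration.

```typescript
// Hypothetical sketch of fail-closed rubric validation. Only the heading,
// dimension names, and verdict words come from the diff.
type Verdict = "pass" | "concern" | "fail";

const DIMENSIONS = [
  "Contract coverage",
  "Correctness risk",
  "Verification evidence",
  "Docs/state parity",
] as const;

interface RubricResult {
  ok: boolean;
  errors: string[];
  verdicts: Record<string, Verdict>;
}

// `finalVerdictLabel` names the role's yes/no field, for example
// "Acceptable as-is" (reviewer) or "Can the project stop now" (stop-judge).
function validateRubricReport(
  report: string,
  finalVerdictLabel?: string,
): RubricResult {
  // Trim each line and drop an optional leading list marker.
  const lines = report
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/^-\s+/, ""));
  const errors: string[] = [];
  const verdicts: Record<string, Verdict> = {};

  if (lines.indexOf("Rubric:") === -1) errors.push("missing Rubric: heading");

  for (const dimension of DIMENSIONS) {
    const prefix = dimension + ":";
    const line = lines.filter((l) => l.indexOf(prefix) === 0)[0];
    if (line === undefined) {
      errors.push(`missing dimension: ${dimension}`);
      continue;
    }
    // The verdict word must be followed by whitespace or end of line, so the
    // unfilled template text "pass|concern|fail" is rejected.
    const match = /^(pass|concern|fail)(?:\s|$)/.exec(
      line.slice(prefix.length).trim(),
    );
    if (match === null) {
      errors.push(`invalid verdict for: ${dimension}`);
      continue;
    }
    verdicts[dimension] = match[1] as Verdict;
  }

  if (finalVerdictLabel !== undefined) {
    const prefix = finalVerdictLabel + ":";
    const line = lines.filter((l) => l.indexOf(prefix) === 0)[0];
    const answer =
      line === undefined ? "" : line.slice(prefix.length).trim().toLowerCase();
    if (answer !== "yes" && answer !== "no") {
      errors.push(`missing or invalid field: ${finalVerdictLabel}`);
    } else if (
      answer === "yes" &&
      DIMENSIONS.some((d) => verdicts[d] === "fail")
    ) {
      errors.push(`${finalVerdictLabel} is yes but a rubric line is fail`);
    }
  }

  return { ok: errors.length === 0, errors, verdicts };
}
```

Returning a list of errors instead of throwing keeps the check easy to assert on in a deterministic test; a caller would refuse to transcribe any report where `ok` is false.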
@@ -198,10 +243,11 @@ npm run smoke-test
 npm run refocus-test
 npm run context-proposal-test
 npm run observability-status-test
+npm run rubric-contract-test
 npm run release-check
 ```
 
-`npm run release-check` is the broad packaged-release verifier. It reruns the startup/refocus/context checks
+`npm run release-check` is the broad packaged-release verifier. It reruns the startup/refocus/context checks — including the critique-aware `/cook` confirmation regression — includes deterministic observability coverage plus the rubric-contract regression, and finishes with `npm pack --dry-run`.
 
 ## Release
 
@@ -213,3 +259,4 @@ See [PUBLISHING.md](https://github.com/linimin/pi-letscook/blob/main/PUBLISHING.
 - The main Pi session is the workflow driver.
 - Package-local role prompts are loaded directly by the extension and do not depend on `~/.pi/agent/agents`.
 - Reviewer, auditor, and stop-judge are enforced as read-only roles.
+- Reviewer, auditor, and stop-judge share the packaged rubric dimensions `Contract coverage`, `Correctness risk`, `Verification evidence`, and `Docs/state parity` with `pass|concern|fail` verdicts.
package/agents/completion-auditor.md
CHANGED
@@ -17,6 +17,8 @@ You must not:
 
 Audit current HEAD truth after a committed slice. Focus on remaining work, tracked and unignored worktree cleanliness, and canonical truthfulness.
 
+Ground the audit in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
 During long work, emit short operator-facing progress lines when useful using these exact prefixes:
 - `PROGRESS: ...`
 - `RATIONALE: ...`
@@ -24,10 +26,25 @@ During long work, emit short operator-facing progress lines when useful using th
 
 These lines are for workflow observability, not hidden reasoning. Keep them brief and truthful.
 
+Always emit the shared rubric section before the remaining audit fields. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
+
+Use `concern` or `fail` to explain why the project is not yet done, why canonical state may be stale, or why backlog truth may need reconciliation.
+
 Answer only:
 
 - `MISSION ANCHOR: ...`
 - `Remaining contract IDs: ...`
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
 - `Why the project is still not done: ...`
 - `Open top-level contract IDs: ...`
 - `Blocker count: ...`
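The auditor prompt above asks the role to ground its audit in the active-slice fields `implementation_surfaces` and `verification_commands`, which the README diff describes as enforced whenever an exact handoff is required. A minimal sketch of such a check follows. The `ActiveSlice` shape, the `status` field name, the status spellings, and the assumption that both fields are non-empty string arrays are all hypothetical; the real check lives in `.agent/verify_completion_control_plane.sh`, which this diff does not show.

```typescript
// Hypothetical shape: only the two contract field names and the four slice
// states come from the diff. The `status` key and its spellings are assumed.
interface ActiveSlice {
  status?: string;
  implementation_surfaces?: unknown;
  verification_commands?: unknown;
}

// Slice states for which the README says the exact handoff carries the contract.
const HANDOFF_STATUSES = ["selected", "in-progress", "committed", "done"];

// True only for a non-empty array whose items are all non-blank strings.
function isNonEmptyStringList(value: unknown): boolean {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((item) => typeof item === "string" && item.trim().length > 0)
  );
}

// Returns the names of contract fields that are missing or malformed.
// Slices outside the handoff states are not checked.
function missingContractFields(slice: ActiveSlice): string[] {
  if (HANDOFF_STATUSES.indexOf(slice.status ?? "") === -1) return [];
  const missing: string[] = [];
  if (!isNonEmptyStringList(slice.implementation_surfaces)) {
    missing.push("implementation_surfaces");
  }
  if (!isNonEmptyStringList(slice.verification_commands)) {
    missing.push("verification_commands");
  }
  return missing;
}
```

Treating an empty array the same as a missing field is a deliberate choice in this sketch: a handoff that names no surfaces or commands gives a read-only role nothing to audit against.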
package/agents/completion-reviewer.md
CHANGED
@@ -33,14 +33,31 @@ Review focus:
 - false closure claims
 - stale or contradictory canonical state
 
+Ground the review in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
 Order findings by severity and include file references.
 
 You must explicitly answer whether the slice is acceptable as-is. If it is not acceptable, provide the exact smallest follow-up slice.
 
+Always emit the shared rubric section before findings. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
+
+If any rubric line is `fail`, `Acceptable as-is` must be `no`.
+
 Output format:
 
 - `MISSION ANCHOR: ...`
 - `Remaining contract IDs: ...`
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
 - `Findings: ...`
 - `Acceptable as-is: yes/no`
 - `Smallest follow-up slice: ...`
package/agents/completion-stop-judge.md
CHANGED
@@ -10,6 +10,8 @@ Load `completion-protocol` before acting.
 
 Judge current HEAD truth, not prior agent claims or conversation memory.
 
+Ground the stop/no-stop decision in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
 You must not:
 
 - edit tracked repo files
@@ -36,10 +38,25 @@ You may conclude the project can stop only if current HEAD truth satisfies all o
 - if canonical state still keeps `FINAL-STOP-01` open or `project_done = false` solely because the current stop wave has not yet been recorded and reconciled, do not treat that pre-reconciliation posture by itself as a `NO-STOP` reason
 - `bash .agent/verify_completion_stop.sh` either already passes, or its only failing condition is the absence of the current wave's required current-HEAD judgment records; any other verifier failure is `NO-STOP`
 
+Always emit the shared rubric section before the stop verdict. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
+
+If any rubric line is `fail`, `Can the project stop now` must be `no`.
+
 Answer only:
 
 - `MISSION ANCHOR: ...`
 - `Remaining contract IDs: ...`
+- `Rubric:`
+- `- Contract coverage: pass|concern|fail - ...`
+- `- Correctness risk: pass|concern|fail - ...`
+- `- Verification evidence: pass|concern|fail - ...`
+- `- Docs/state parity: pass|concern|fail - ...`
 - `Can the project stop now: yes/no`
 - `Exact remaining open top-level contract IDs: ...`
 - `Blocker count: ...`
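The stop-judge prompt above fixes the order of the rubric lines and requires `Can the project stop now` to be `no` whenever any rubric line is `fail`. To make that rule concrete, here is a small sketch that renders the rubric section and derives the yes/no answer. The names `RubricLine`, `renderRubric`, and `stopAnswer` are hypothetical and exist only for this example; the package's roles produce this text as model output, not through such a helper.

```typescript
// Hypothetical helper types; only the dimension names, verdict words, and
// line format come from the prompt text in the diff.
type Verdict = "pass" | "concern" | "fail";

const DIMENSIONS = [
  "Contract coverage",
  "Correctness risk",
  "Verification evidence",
  "Docs/state parity",
] as const;
type Dimension = (typeof DIMENSIONS)[number];

interface RubricLine {
  verdict: Verdict;
  note: string;
}

// Renders the heading plus all four lines, in the order the prompt lists them.
function renderRubric(rubric: Record<Dimension, RubricLine>): string[] {
  const lines = ["Rubric:"];
  for (const dimension of DIMENSIONS) {
    const { verdict, note } = rubric[dimension];
    lines.push(`- ${dimension}: ${verdict} - ${note}`);
  }
  return lines;
}

// A single `fail` forces "no", whatever the judge would otherwise conclude.
// A `concern` alone does not block a positive answer.
function stopAnswer(
  rubric: Record<Dimension, RubricLine>,
  judgeWouldStop: boolean,
): "yes" | "no" {
  const anyFail = DIMENSIONS.some((d) => rubric[d].verdict === "fail");
  return judgeWouldStop && !anyFail ? "yes" : "no";
}
```

The same shape would apply to the reviewer's `Acceptable as-is` field, since the reviewer prompt states the equivalent rule.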