@linimin/pi-letscook 0.1.29 → 0.1.31

package/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # Changelog
 
+ ## 0.1.31
+
+ ### Changed
+
+ - defined a shared structured evaluation-rubric contract for `completion-reviewer`, `completion-auditor`, and `completion-stop-judge`, including the exact rubric dimensions `Contract coverage`, `Correctness risk`, `Verification evidence`, and `Docs/state parity` with `pass|concern|fail` verdict semantics
+ - added canonical `task_type: completion-workflow` and `evaluation_profile: completion-rubric-v1` signaling across the packaged control-plane defaults, verifier schema, and kickoff/reminder/resume surfaces
+ - expanded the exact active-slice implementer handoff with canonical `implementation_surfaces` and `verification_commands` fields, and now surface them alongside `priority` / `why_now` in reminder and compaction-resume text
+ - documented the rubric-driven evaluation contract plus canonical routing-profile signaling in the packaged completion protocol and README without adding profile-specific rubric-output enforcement yet
+ - strengthened the smoke/refocus/context regressions so bootstrap and refocus preserve the new canonical signaling and fail closed when required `task_type` / `evaluation_profile` fields are removed
+ - strengthened the smoke regression and control-plane verifier so selected-slice handoffs now fail closed when the expanded implementation-contract fields are missing
+ - threaded canonical `evaluation_profile` plus the active-slice implementation contract into reviewer/auditor/stop-judge reminder and dispatch surfaces so those read-only roles can recover from canonical state instead of prose-only summaries
+ - made reviewer/auditor/stop-judge transcription fail closed on malformed rubric-bearing outputs while still accepting valid reports, and added deterministic transcription coverage for all three roles in `npm run rubric-contract-test`
+ - kept deterministic `rubric-contract-test` coverage wired into `npm run release-check`
+ - made the `/cook` confirmation UI critique-aware by rendering critique/risk notes plus recommended `task_type` / `evaluation_profile` routing hints in dedicated sections while keeping the existing Start/Edit/Cancel flow
+ - persisted accepted startup/refocus routing choices canonically by writing the selected `task_type` / `evaluation_profile` into the canonical control-plane files and recording the accepted critique outcome in continuation state, with `context-proposal-test` and `release-check` covering the shipped flow
+
+ ## 0.1.30
+
+ ### Changed
+
+ - clarified the README next-round example so the goal text no longer repeats `/cook` in a way that looks like part of the command syntax
+
  ## 0.1.29
 
  ### Changed
package/README.md CHANGED
@@ -54,7 +54,7 @@ Replace the active workflow with a different goal:
  Start the next round after the previous workflow is already done:
 
  ```text
- /cook ship the next workflow round for richer /cook startup proposals
+ /cook improve startup proposal confirmation UX
  ```
 
  ## How `/cook` behaves
@@ -103,11 +103,15 @@ Startup confirmation uses a custom UI that:
 
  - renders the proposal body separately from the action list
  - keeps Mission / Scope / Constraints / Acceptance readable as a content area
+ - renders analyst-derived **Critique and risks** separately from the editable proposal body
+ - renders recommended `task_type` / `evaluation_profile` routing hints separately from both the proposal body and the action list
  - presents explicit actions for:
    - **Start**
    - **Edit**
    - **Cancel**
 
+ When you accept startup or refocus from that flow, `/cook` now persists the chosen `task_type` and `evaluation_profile` across `.agent/profile.json`, `.agent/state.json`, `.agent/plan.json`, and `.agent/active-slice.json`, and records the accepted critique outcome in canonical continuation state before the re-ground round begins.
+
  The same confirmation flow is reused across:
 
  - discussion-only startup
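
The persistence step described in the hunk above can be sketched as follows. This is only an illustration in Python, not the package's implementation: the helper names `persist_routing` and `routing_is_consistent` are hypothetical, and the sketch assumes each canonical file is a JSON object that carries `task_type` and `evaluation_profile` as top-level keys, which the diff does not confirm.

```python
import json
import tempfile
from pathlib import Path

# The four canonical files named in the README text above.
CANONICAL_FILES = ("profile.json", "state.json", "plan.json", "active-slice.json")


def persist_routing(agent_dir: Path, task_type: str, evaluation_profile: str) -> None:
    """Write the accepted routing choice into every canonical file (hypothetical helper)."""
    for name in CANONICAL_FILES:
        path = agent_dir / name
        # Assumption: each file holds a JSON object; a missing file starts empty.
        data = json.loads(path.read_text()) if path.exists() else {}
        data["task_type"] = task_type
        data["evaluation_profile"] = evaluation_profile
        path.write_text(json.dumps(data, indent=2) + "\n")


def routing_is_consistent(agent_dir: Path) -> bool:
    """Return True only if all four files exist and agree on a non-empty routing pair."""
    pairs = set()
    for name in CANONICAL_FILES:
        path = agent_dir / name
        if not path.exists():
            return False  # fail closed on a missing canonical file
        data = json.loads(path.read_text())
        pairs.add((data.get("task_type"), data.get("evaluation_profile")))
    return len(pairs) == 1 and all(pairs.pop())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        agent_dir = Path(tmp)
        persist_routing(agent_dir, "completion-workflow", "completion-rubric-v1")
        print(routing_is_consistent(agent_dir))
```

Writing the same pair into every file and then checking agreement mirrors the "fail closed when required fields are removed" behaviour the changelog attributes to the regressions, under the assumptions stated above.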
@@ -133,6 +137,47 @@ While a `completion_role` subprocess is running:
  - running-role output distinguishes tool work from `PROGRESS`, `RATIONALE`, `NEXT`, `VERIFYING`, and `STATE-DELTA`
  - waiting and stalled states are surfaced deterministically from timestamps
 
+ ## Structured evaluation rubrics
+
+ The packaged completion workflow now defines a shared structured evaluation-rubric contract for the read-only evaluation roles:
+
+ - `completion-reviewer`
+ - `completion-auditor`
+ - `completion-stop-judge`
+
+ Those roles now use the same rubric section and exact dimension names:
+
+ - `Contract coverage`
+ - `Correctness risk`
+ - `Verification evidence`
+ - `Docs/state parity`
+
+ Each rubric line uses the same verdict words:
+
+ - `pass` — no material issue remains for that dimension
+ - `concern` — a real caveat or remaining gap exists, but it does not by itself force rejection or `NO-STOP`
+ - `fail` — a blocking issue or contradictory truth exists, so the role's final verdict must not be positive
+
+ The packaged control plane now also carries canonical routing signals:
+
+ - `task_type: completion-workflow`
+ - `evaluation_profile: completion-rubric-v1`
+
+ Those identifiers are persisted in `.agent/profile.json`, `.agent/state.json`, `.agent/plan.json`, and `.agent/active-slice.json`, then surfaced in kickoff/reminder/resume text and reviewer/auditor/stop-judge evaluation handoffs so downstream roles can rely on canonical signaling instead of prose inference alone.
+
+ The active-slice exact implementer handoff now also carries a stronger implementation contract for selected, in-progress, committed, and done slices:
+
+ - `implementation_surfaces` — the repo surfaces expected to change or stay in parity for the slice
+ - `verification_commands` — the focused and broader deterministic checks the implementer is expected to run before committing
+
+ Those fields are scaffolded by default, enforced by `.agent/verify_completion_control_plane.sh` whenever an exact handoff is required, and surfaced alongside `priority` / `why_now` in reminder and compaction-resume text so implementers can recover from canonical state instead of prose-only summaries.
+
+ Reviewer, auditor, and stop-judge dispatch/reminder surfaces now also thread the current active-slice implementation contract (`implementation_surfaces`, `verification_commands`, locked notes, must-fix findings, and before-slice counters) alongside the canonical `evaluation_profile` so those read-only roles can reason from canonical state after compaction.
+
+ Canonical reviewer/auditor/stop-judge transcription now fails closed on malformed rubric-bearing reports: the shared rubric heading plus all four rubric dimensions must be present, required role fields must remain intact, and reviewer/stop-judge yes/no verdicts cannot contradict rubric `fail` lines.
+
+ Deterministic verification for this packaged contract lives in `npm run rubric-contract-test`, which now exercises reviewer, auditor, and stop-judge transcription paths while the bootstrap/refocus/context regressions plus control-plane verifier fail closed when required canonical signaling is missing.
+
  ## Canonical files
 
  This package stores canonical workflow state under:
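
To make the fail-closed handoff rule in the hunk above concrete, here is a minimal Python sketch. It is not the logic of `.agent/verify_completion_control_plane.sh`, which this diff does not show. The `status` key, the status spellings in `HANDOFF_STATUSES`, the "non-empty list" requirement, and the function name `handoff_problems` are all assumptions made for illustration.

```python
# Routing signals quoted from the README text above.
REQUIRED_ROUTING = {
    "task_type": "completion-workflow",
    "evaluation_profile": "completion-rubric-v1",
}

# Assumption: status spellings are hypothetical; the README only lists the
# slice states as "selected, in-progress, committed, and done".
HANDOFF_STATUSES = {"selected", "in-progress", "committed", "done"}

# The two implementation-contract fields the README says are enforced.
REQUIRED_HANDOFF_FIELDS = ("implementation_surfaces", "verification_commands")


def handoff_problems(active_slice: dict) -> list[str]:
    """Return every reason the slice record should be rejected (empty list means OK)."""
    problems = []
    for key, expected in REQUIRED_ROUTING.items():
        if active_slice.get(key) != expected:
            problems.append(f"{key} must be {expected!r}")
    if active_slice.get("status") in HANDOFF_STATUSES:
        for field in REQUIRED_HANDOFF_FIELDS:
            value = active_slice.get(field)
            # Assumption: each field is a non-empty list of strings.
            if not isinstance(value, list) or not value:
                problems.append(f"{field} must be a non-empty list")
    return problems


if __name__ == "__main__":
    slice_record = {
        "task_type": "completion-workflow",
        "evaluation_profile": "completion-rubric-v1",
        "status": "selected",
        "implementation_surfaces": ["README.md"],
    }
    # verification_commands is missing, so the check reports one problem.
    print(handoff_problems(slice_record))
```

Collecting every problem instead of stopping at the first keeps the verifier output useful to an implementer who has to repair several fields at once.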
@@ -198,10 +243,11 @@ npm run smoke-test
  npm run refocus-test
  npm run context-proposal-test
  npm run observability-status-test
+ npm run rubric-contract-test
  npm run release-check
  ```
 
- `npm run release-check` is the broad packaged-release verifier. It reruns the startup/refocus/context checks, includes deterministic observability coverage, and finishes with `npm pack --dry-run`.
+ `npm run release-check` is the broad packaged-release verifier. It reruns the startup/refocus/context checks — including the critique-aware `/cook` confirmation regression — includes deterministic observability coverage plus the rubric-contract regression, and finishes with `npm pack --dry-run`.
 
  ## Release
 
@@ -213,3 +259,4 @@ See [PUBLISHING.md](https://github.com/linimin/pi-letscook/blob/main/PUBLISHING.
  - The main Pi session is the workflow driver.
  - Package-local role prompts are loaded directly by the extension and do not depend on `~/.pi/agent/agents`.
  - Reviewer, auditor, and stop-judge are enforced as read-only roles.
+ - Reviewer, auditor, and stop-judge share the packaged rubric dimensions `Contract coverage`, `Correctness risk`, `Verification evidence`, and `Docs/state parity` with `pass|concern|fail` verdicts.
@@ -17,6 +17,8 @@ You must not:
 
  Audit current HEAD truth after a committed slice. Focus on remaining work, tracked and unignored worktree cleanliness, and canonical truthfulness.
 
+ Ground the audit in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
  During long work, emit short operator-facing progress lines when useful using these exact prefixes:
  - `PROGRESS: ...`
  - `RATIONALE: ...`
@@ -24,10 +26,25 @@ During long work, emit short operator-facing progress lines when useful using th
 
  These lines are for workflow observability, not hidden reasoning. Keep them brief and truthful.
 
+ Always emit the shared rubric section before the remaining audit fields. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
+
+ Use `concern` or `fail` to explain why the project is not yet done, why canonical state may be stale, or why backlog truth may need reconciliation.
+
  Answer only:
 
  - `MISSION ANCHOR: ...`
  - `Remaining contract IDs: ...`
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
  - `Why the project is still not done: ...`
  - `Open top-level contract IDs: ...`
  - `Blocker count: ...`
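
The rubric section required by the auditor prompt above has a fixed shape: a `Rubric:` heading followed by four lines in a set order. A small Python sketch of emitting that shape is given here as an illustration only. The function name `render_rubric` and its input structure are hypothetical, and the prompts are addressed to a model rather than to code, so nothing in the package is confirmed to render the section this way.

```python
# Dimension names and verdict words quoted from the prompt text above.
DIMENSIONS = (
    "Contract coverage",
    "Correctness risk",
    "Verification evidence",
    "Docs/state parity",
)
VERDICTS = ("pass", "concern", "fail")


def render_rubric(entries: dict[str, tuple[str, str]]) -> str:
    """Render the rubric section; `entries` maps dimension -> (verdict, note)."""
    unknown = set(entries) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown rubric dimensions: {sorted(unknown)}")
    lines = ["Rubric:"]
    for dimension in DIMENSIONS:
        # All four lines are required, even when every verdict is `pass`.
        if dimension not in entries:
            raise ValueError(f"missing rubric dimension: {dimension}")
        verdict, note = entries[dimension]
        if verdict not in VERDICTS:
            raise ValueError(f"invalid verdict for {dimension}: {verdict!r}")
        lines.append(f"- {dimension}: {verdict} - {note}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_rubric({
        "Contract coverage": ("pass", "all locked criteria are met"),
        "Correctness risk": ("concern", "one edge case lacks a regression"),
        "Verification evidence": ("pass", "release-check output attached"),
        "Docs/state parity": ("pass", "README matches canonical state"),
    }))
```

Iterating over the fixed `DIMENSIONS` tuple rather than over the input keeps the output order stable regardless of how the caller builds the dictionary.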
@@ -33,14 +33,31 @@ Review focus:
  - false closure claims
  - stale or contradictory canonical state
 
+ Ground the review in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
  Order findings by severity and include file references.
 
  You must explicitly answer whether the slice is acceptable as-is. If it is not acceptable, provide the exact smallest follow-up slice.
 
+ Always emit the shared rubric section before findings. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
+
+ If any rubric line is `fail`, `Acceptable as-is` must be `no`.
+
  Output format:
 
  - `MISSION ANCHOR: ...`
  - `Remaining contract IDs: ...`
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
  - `Findings: ...`
  - `Acceptable as-is: yes/no`
  - `Smallest follow-up slice: ...`
@@ -10,6 +10,8 @@ Load `completion-protocol` before acting.
 
  Judge current HEAD truth, not prior agent claims or conversation memory.
 
+ Ground the stop/no-stop decision in canonical `.agent/**` routing and active-slice truth, including `evaluation_profile`, locked acceptance criteria, `implementation_surfaces`, `verification_commands`, `locked_notes`, and any `must_fix_findings`, rather than relying on prose-only task summaries.
+
  You must not:
 
  - edit tracked repo files
@@ -36,10 +38,25 @@ You may conclude the project can stop only if current HEAD truth satisfies all o
  - if canonical state still keeps `FINAL-STOP-01` open or `project_done = false` solely because the current stop wave has not yet been recorded and reconciled, do not treat that pre-reconciliation posture by itself as a `NO-STOP` reason
  - `bash .agent/verify_completion_stop.sh` either already passes, or its only failing condition is the absence of the current wave's required current-HEAD judgment records; any other verifier failure is `NO-STOP`
 
+ Always emit the shared rubric section before the stop verdict. Use these exact rubric dimension names and verdict words, and include all four lines even when every dimension is `pass`:
+
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
+
+ If any rubric line is `fail`, `Can the project stop now` must be `no`.
+
  Answer only:
 
  - `MISSION ANCHOR: ...`
  - `Remaining contract IDs: ...`
+ - `Rubric:`
+ - `- Contract coverage: pass|concern|fail - ...`
+ - `- Correctness risk: pass|concern|fail - ...`
+ - `- Verification evidence: pass|concern|fail - ...`
+ - `- Docs/state parity: pass|concern|fail - ...`
  - `Can the project stop now: yes/no`
  - `Exact remaining open top-level contract IDs: ...`
  - `Blocker count: ...`
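
The reviewer and stop-judge prompts above share one consistency rule: a `fail` on any rubric line rules out a `yes` on the final verdict field. The Python sketch below shows one way a transcription step could enforce that rule and reject malformed reports. It is an assumption-laden illustration, not the code behind `npm run rubric-contract-test`. The names `parse_rubric` and `check_final_verdict` are hypothetical, and the sketch assumes reports arrive as plain lines without the list markers and backticks used in the prompt templates.

```python
import re

DIMENSIONS = (
    "Contract coverage",
    "Correctness risk",
    "Verification evidence",
    "Docs/state parity",
)

# One rubric line: "- <dimension>: <verdict> - <note>".
RUBRIC_LINE = re.compile(
    r"^- (" + "|".join(re.escape(d) for d in DIMENSIONS) + r"): (pass|concern|fail) - (.+)$"
)


def parse_rubric(report: str) -> dict[str, str]:
    """Return dimension -> verdict, raising ValueError on a malformed rubric."""
    lines = [line.strip() for line in report.splitlines()]
    if "Rubric:" not in lines:
        raise ValueError("missing 'Rubric:' heading")
    verdicts = {}
    for line in lines[lines.index("Rubric:") + 1:]:
        match = RUBRIC_LINE.match(line)
        if not match:
            break  # the rubric section ends at the first non-rubric line
        verdicts[match.group(1)] = match.group(2)
    missing = [d for d in DIMENSIONS if d not in verdicts]
    if missing:
        raise ValueError(f"missing rubric dimensions: {missing}")
    return verdicts


def check_final_verdict(report: str, field: str) -> str:
    """Return 'yes' or 'no' for `field`, rejecting a 'yes' that contradicts a `fail` line."""
    verdicts = parse_rubric(report)
    match = re.search(rf"^\s*{re.escape(field)}: (yes|no)\s*$", report, re.MULTILINE)
    if not match:
        raise ValueError(f"missing or malformed field: {field}")
    answer = match.group(1)
    if answer == "yes" and "fail" in verdicts.values():
        raise ValueError(f"'{field}: yes' contradicts a rubric `fail` line")
    return answer
```

Under these assumptions, the same function would serve both roles by passing `"Acceptable as-is"` for a reviewer report or `"Can the project stop now"` for a stop-judge report.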