@lumoai/cli 1.45.0 → 1.46.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -111,7 +111,10 @@ what's unmet and why (the exact failure tails), and how many rounds are left.
111
111
  ### What it prints
112
112
 
113
113
  - **Header** — task identifier/title/status + `verification round N/3` (round 0 = never verified) + an escalation warning when the machine loop is exhausted.
114
- - **Machine verification rollup** (LUM-470) — directly under the `Criteria` header, one line `Machine verification: N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria, aligned with the web read model (LUM-456). Printed whenever the contract has ≥1 MACHINE criterion, so the terminal rollup never reads as all-human when a checkpointer actually verified the work.
114
+ - **Claim vs verification** (LUM-564) — 规律 2 声称vs核验: the headline contrast, printed right after the header (whenever the contract exists), so the report shows **both** columns instead of only the verification one. Two sides:
115
+ - **Claim** — what the agent _says_ it did: the latest session's run summary, **same-source** with the web delivery card's `latestRunSummary` (the LLM run summary, else the raw STOP turn digest). Explicitly tagged an **unverified self-report** (`agent self-report · estimated, not verification`) — estimate-tier provenance (估, not 测), with a `↳ source:` line distinguishing an `LLM run summary` from a `raw turn digest — no LLM summary yet`. **Fail-closed**: with no run summary it prints `not generated yet — the agent run summary is synthesized when the task reaches DONE`, never a fabricated claim.
116
+ - **Verification** — what was actually _confirmed_ (measured): the machine-verification rollup (LUM-470/456) `N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria (relocated here from its old standalone line under `Criteria`), plus `X of Y criteria met by their latest verdict`. **Fail-closed**: before any round runs it prints `no verification has run yet — the claim is unconfirmed` rather than implying a pass.
117
+ - Carried in `--json` as `claim { text, source }` (`source: 'RUN_SUMMARY' | 'DIGEST' | null`; `text: null` = none generated). Omitted only against an older server that doesn't emit the field. The machine-verification rollup is still carried top-level as `machineVerification`.
115
118
  - **Criteria** — every criterion as `<glyph> <id> [TYPE] SOURCE@rN statement` (✓ latest verdict passed / ✗ failed / ○ no verdict yet) with its checkpointer and latest verdict line (failure tail on fail). `REVIEW_ADDED@rN` provenance is visible per row.
116
119
  - A passing **MACHINE** criterion's verdict line carries a machine-state tag derived from the read model's `machinePassed` flag, NOT the latest verdict (LUM-470): `· machine-verified` when a checkpointer actually passed it (even after a human later signs the task off), or `· human override (no machine pass)` when it passes only on a human sign-off with no machine run underneath. This keeps the terminal honest with web — a machine-verified criterion that a human co-signed no longer reads as a plain human pass.
117
120
  - A verdict's **evidence is drillable** (LUM-563), rendered as an indented `↳ evidence:` line under the verdict (PASS _and_ FAIL) instead of the inert raw pointer that used to ride the verdict line — so a conclusion points at real proof you can act on, not just the `check:` command: a `cmd:` pointer prints the actual command + exit code (`ran \`…\` → exit N · re-run to reproduce`), a `file:`pointer prints a terminal-clickable`path:line`, and a `commit:` pointer prints a navigable web URL (`<repo>/commit/<hash>`, resolved from the local git `origin`remote) or a`git show <hash>`fallback when no remote resolves. A criterion that **requires evidence but has none recorded yet** (e.g. a HUMAN evidence criterion before sign-off) renders an explicit`↳ evidence: pending — no reference recorded yet`(fail-closed) instead of a bare, dead`[evidence]` tag.
@@ -45,15 +45,14 @@ function formatTaskStatus(data, extras = {}) {
45
45
  pushOpenCrossings(lines, extras);
46
46
  return lines.join('\n') + '\n';
47
47
  }
48
+ // LUM-564 声称 vs 核验: the agent's CLAIM (what it says it did) paired with
49
+ // the measured verification conclusion (what was actually confirmed) — the
50
+ // two columns regular 規律2 wants, instead of only the verification one. The
51
+ // machine-verification rollup (LUM-470) lives on the verification side here
52
+ // rather than as a standalone line, so the contrast is in one place.
53
+ pushClaimVsVerification(lines, data);
48
54
  lines.push('');
49
55
  lines.push(`Criteria (${data.criteria.length} total, ${data.nextActions.length} unmet):`);
50
- // LUM-470: honest machine-verification rollup over the active MACHINE criteria
51
- // (same read model as web, LUM-456) — so the terminal rollup never reads as
52
- // all-human when a checkpointer actually verified the work.
53
- const mv = data.machineVerification;
54
- if (mv.total > 0) {
55
- lines.push(`Machine verification: ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
56
- }
57
56
  for (const c of data.criteria) {
58
57
  const glyph = c.latestVerdict == null
59
58
  ? '○'
@@ -208,6 +207,56 @@ function fmtDuration(totalSec) {
208
207
  parts.push(`${s}s`);
209
208
  return parts.join(' ');
210
209
  }
210
+ /**
211
+ * Append the "Claim vs verification" pairing (LUM-564) — 規律2 声称vs核验. The
212
+ * verification conclusion (machine-verified vs human-override, LUM-470) was the
213
+ * whole-system template, but the agent's run summary — its CLAIM of what it did
214
+ * — lived on a different surface, so the "two-column" contrast was really one
215
+ * column. This puts them side by side: the unverified self-report (estimate-
216
+ * tier — 估, not 测) against the measured verdict (测), so a reader can weigh
217
+ * 声称 against 核验 at a glance.
218
+ *
219
+ * Fail-closed on every axis: a missing claim renders an explicit "not generated
220
+ * yet" line (never a fabricated claim); a not-yet-verified task renders the
221
+ * verification side as explicitly unconfirmed (never an implied pass). The
222
+ * claim text is same-source with the web card (latest session's LLM run
223
+ * summary, else its STOP turn digest) so the two surfaces cannot drift.
224
+ */
225
+ function pushClaimVsVerification(lines, data) {
226
+ lines.push('');
227
+ lines.push('Claim vs verification:');
228
+ // ── Claim (声称): the agent's self-report — estimate-tier, never measured.
229
+ lines.push(' ▸ Claim — what the agent says it did');
230
+ lines.push(' (agent self-report · estimated, not verification):');
231
+ const claim = data.claim;
232
+ if (claim && claim.text) {
233
+ for (const cl of (0, sanitize_1.sanitizeField)(claim.text).split('\n')) {
234
+ lines.push(` ${cl}`);
235
+ }
236
+ lines.push(claim.source === 'RUN_SUMMARY'
237
+ ? ' ↳ source: LLM run summary'
238
+ : ' ↳ source: raw turn digest — no LLM summary yet');
239
+ }
240
+ else {
241
+ // Fail-closed: no run summary → say so, never invent a claim. Run summaries
242
+ // are synthesized when the bound task reaches DONE (LUM-481).
243
+ lines.push(' not generated yet — the agent run summary is synthesized when the task reaches DONE');
244
+ }
245
+ // ── Verification (核验): the measured verdict — machine-verified vs override.
246
+ lines.push(' ▸ Verification — what was actually confirmed (measured):');
247
+ const mv = data.machineVerification;
248
+ if (data.currentRound === 0) {
249
+ // Nothing verified yet — the claim stands unconfirmed. Don't imply a pass.
250
+ lines.push(' no verification has run yet — the claim is unconfirmed');
251
+ }
252
+ else {
253
+ if (mv.total > 0) {
254
+ lines.push(` ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
255
+ }
256
+ const met = data.criteria.length - data.nextActions.length;
257
+ lines.push(` ${met} of ${data.criteria.length} criteria met by their latest verdict`);
258
+ }
259
+ }
211
260
  /**
212
261
  * Append the honest "Cost" section (LUM-560) — 规律 1: surface the costs a human
213
262
  * should weigh (token spend, active time, machine rework) on the same report as
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@lumoai/cli",
3
- "version": "1.45.0",
3
+ "version": "1.46.0",
4
4
  "description": "Lumo CLI — manage tasks and sessions from the terminal",
5
5
  "license": "MIT",
6
6
  "author": "cli@uselumo.ai",