npm - @lumoai/cli - Versions diffs - 1.45.0 → 1.46.0 - Mend

@lumoai/cli 1.45.0 → 1.46.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3) hide show

package/assets/skill/references/verify.md +4 -1
package/dist/cli/src/commands/task-status.js +56 -7
package/package.json +1 -1

package/assets/skill/references/verify.md CHANGED Viewed

@@ -111,7 +111,10 @@ what's unmet and why (the exact failure tails), and how many rounds are left.
 ### What it prints
 - **Header** — task identifier/title/status + `verification round N/3` (round 0 = never verified) + an escalation warning when the machine loop is exhausted.
-- **Machine verification rollup** (LUM-470) — directly under the `Criteria` header, one line `Machine verification: N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria, aligned with the web read model (LUM-456). Printed whenever the contract has ≥1 MACHINE criterion, so the terminal rollup never reads as all-human when a checkpointer actually verified the work.
+- **Claim vs verification** (LUM-564) — 规律 2 声称vs核验: the headline contrast, printed right after the header (whenever the contract exists), so the report shows **both** columns instead of only the verification one. Two sides:
+  - **Claim** — what the agent _says_ it did: the latest session's run summary, **same-source** with the web delivery card's `latestRunSummary` (the LLM run summary, else the raw STOP turn digest). Explicitly tagged an **unverified self-report** (`agent self-report · estimated, not verification`) — estimate-tier provenance (估, not 测), with a `↳ source:` line distinguishing an `LLM run summary` from a `raw turn digest — no LLM summary yet`. **Fail-closed**: with no run summary it prints `not generated yet — the agent run summary is synthesized when the task reaches DONE`, never a fabricated claim.
+  - **Verification** — what was actually _confirmed_ (measured): the machine-verification rollup (LUM-470/456) `N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria (relocated here from its old standalone line under `Criteria`), plus `X of Y criteria met by their latest verdict`. **Fail-closed**: before any round runs it prints `no verification has run yet — the claim is unconfirmed` rather than implying a pass.
+  - Carried in `--json` as `claim { text, source }` (`source: 'RUN_SUMMARY' | 'DIGEST' | null`; `text: null` = none generated). Omitted only against an older server that doesn't emit the field. The machine-verification rollup is still carried top-level as `machineVerification`.
 - **Criteria** — every criterion as `<glyph> <id>  [TYPE]  SOURCE@rN statement` (✓ latest verdict passed / ✗ failed / ○ no verdict yet) with its checkpointer and latest verdict line (failure tail on fail). `REVIEW_ADDED@rN` provenance is visible per row.
   - A passing **MACHINE** criterion's verdict line carries a machine-state tag derived from the read model's `machinePassed` flag, NOT the latest verdict (LUM-470): `· machine-verified` when a checkpointer actually passed it (even after a human later signs the task off), or `· human override (no machine pass)` when it passes only on a human sign-off with no machine run underneath. This keeps the terminal honest with web — a machine-verified criterion that a human co-signed no longer reads as a plain human pass.
   - A verdict's **evidence is drillable** (LUM-563), rendered as an indented `↳ evidence:` line under the verdict (PASS _and_ FAIL) instead of the inert raw pointer that used to ride the verdict line — so a conclusion points at real proof you can act on, not just the `check:` command: a `cmd:` pointer prints the actual command + exit code (`ran \`…\` → exit N · re-run to reproduce`), a `file:`pointer prints a terminal-clickable`path:line`, and a `commit:` pointer prints a navigable web URL (`<repo>/commit/<hash>`, resolved from the local git `origin`remote) or a`git show <hash>`fallback when no remote resolves. A criterion that **requires evidence but has none recorded yet** (e.g. a HUMAN evidence criterion before sign-off) renders an explicit`↳ evidence: pending — no reference recorded yet`(fail-closed) instead of a bare, dead`[evidence]` tag.

package/dist/cli/src/commands/task-status.js CHANGED Viewed

@@ -45,15 +45,14 @@ function formatTaskStatus(data, extras = {}) {
         pushOpenCrossings(lines, extras);
         return lines.join('\n') + '\n';
     }
+    // LUM-564 声称 vs 核验: the agent's CLAIM (what it says it did) paired with
+    // the measured verification conclusion (what was actually confirmed) — the
+    // two columns regular 規律2 wants, instead of only the verification one. The
+    // machine-verification rollup (LUM-470) lives on the verification side here
+    // rather than as a standalone line, so the contrast is in one place.
+    pushClaimVsVerification(lines, data);
     lines.push('');
     lines.push(`Criteria (${data.criteria.length} total, ${data.nextActions.length} unmet):`);
-    // LUM-470: honest machine-verification rollup over the active MACHINE criteria
-    // (same read model as web, LUM-456) — so the terminal rollup never reads as
-    // all-human when a checkpointer actually verified the work.
-    const mv = data.machineVerification;
-    if (mv.total > 0) {
-        lines.push(`Machine verification: ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
-    }
     for (const c of data.criteria) {
         const glyph = c.latestVerdict == null
             ? '○'
@@ -208,6 +207,56 @@ function fmtDuration(totalSec) {
         parts.push(`${s}s`);
     return parts.join(' ');
 }
+/**
+ * Append the "Claim vs verification" pairing (LUM-564) — 規律2 声称vs核验. The
+ * verification conclusion (machine-verified vs human-override, LUM-470) was the
+ * whole-system template, but the agent's run summary — its CLAIM of what it did
+ * — lived on a different surface, so the "two-column" contrast was really one
+ * column. This puts them side by side: the unverified self-report (estimate-
+ * tier — 估, not 测) against the measured verdict (测), so a reader can weigh
+ * 声称 against 核验 at a glance.
+ *
+ * Fail-closed on every axis: a missing claim renders an explicit "not generated
+ * yet" line (never a fabricated claim); a not-yet-verified task renders the
+ * verification side as explicitly unconfirmed (never an implied pass). The
+ * claim text is same-source with the web card (latest session's LLM run
+ * summary, else its STOP turn digest) so the two surfaces cannot drift.
+ */
+function pushClaimVsVerification(lines, data) {
+    lines.push('');
+    lines.push('Claim vs verification:');
+    // ── Claim (声称): the agent's self-report — estimate-tier, never measured.
+    lines.push('  ▸ Claim — what the agent says it did');
+    lines.push('    (agent self-report · estimated, not verification):');
+    const claim = data.claim;
+    if (claim && claim.text) {
+        for (const cl of (0, sanitize_1.sanitizeField)(claim.text).split('\n')) {
+            lines.push(`      ${cl}`);
+        }
+        lines.push(claim.source === 'RUN_SUMMARY'
+            ? '      ↳ source: LLM run summary'
+            : '      ↳ source: raw turn digest — no LLM summary yet');
+    }
+    else {
+        // Fail-closed: no run summary → say so, never invent a claim. Run summaries
+        // are synthesized when the bound task reaches DONE (LUM-481).
+        lines.push('      not generated yet — the agent run summary is synthesized when the task reaches DONE');
+    }
+    // ── Verification (核验): the measured verdict — machine-verified vs override.
+    lines.push('  ▸ Verification — what was actually confirmed (measured):');
+    const mv = data.machineVerification;
+    if (data.currentRound === 0) {
+        // Nothing verified yet — the claim stands unconfirmed. Don't imply a pass.
+        lines.push('      no verification has run yet — the claim is unconfirmed');
+    }
+    else {
+        if (mv.total > 0) {
+            lines.push(`      ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
+        }
+        const met = data.criteria.length - data.nextActions.length;
+        lines.push(`      ${met} of ${data.criteria.length} criteria met by their latest verdict`);
+    }
+}
 /**
  * Append the honest "Cost" section (LUM-560) — 规律 1: surface the costs a human
  * should weigh (token spend, active time, machine rework) on the same report as

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@lumoai/cli",
-  "version": "1.45.0",
+  "version": "1.46.0",
   "description": "Lumo CLI — manage tasks and sessions from the terminal",
   "license": "MIT",
   "author": "cli@uselumo.ai",