npm - @lumoai/cli - Versions diffs - 1.44.0 → 1.46.0 - Mend

@lumoai/cli 1.44.0 → 1.46.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/assets/skill/references/verify.md +8 -2
package/dist/cli/src/commands/task-status.js +165 -11
package/dist/cli/src/lib/evidence-display.js +106 -0
package/dist/shared/src/acceptance-evidence.js +19 -0
package/package.json +1 -1

package/assets/skill/references/verify.md CHANGED

@@ -111,9 +111,13 @@ what's unmet and why (the exact failure tails), and how many rounds are left.
 ### What it prints
 - **Header** — task identifier/title/status + `verification round N/3` (round 0 = never verified) + an escalation warning when the machine loop is exhausted.
-- **Machine verification rollup** (LUM-470) — directly under the `Criteria` header, one line `Machine verification: N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria, aligned with the web read model (LUM-456). Printed whenever the contract has ≥1 MACHINE criterion, so the terminal rollup never reads as all-human when a checkpointer actually verified the work.
-- **Criteria** — every criterion as `<glyph> <id>  [TYPE]  SOURCE@rN statement` (✓ latest verdict passed / ✗ failed / ○ no verdict yet) with its checkpointer and latest verdict line (evidence pointer on pass, failure tail on fail). `REVIEW_ADDED@rN` provenance is visible per row.
+- **Claim vs verification** (LUM-564) — Rule 2, claim vs. verification: the headline contrast, printed right after the header (whenever the contract exists), so the report shows **both** columns instead of only the verification one. Two sides:
+  - **Claim** — what the agent _says_ it did: the latest session's run summary, **same-source** with the web delivery card's `latestRunSummary` (the LLM run summary, else the raw STOP turn digest). Explicitly tagged as an **unverified self-report** (`agent self-report · estimated, not verification`) — estimate-tier provenance (estimated, not measured), with a `↳ source:` line distinguishing an `LLM run summary` from a `raw turn digest — no LLM summary yet`. **Fail-closed**: with no run summary it prints `not generated yet — the agent run summary is synthesized when the task reaches DONE`, never a fabricated claim.
+  - **Verification** — what was actually _confirmed_ (measured): the machine-verification rollup (LUM-470/456) `N machine-verified / M human override (of T MACHINE criteria)` over the active MACHINE criteria (relocated here from its old standalone line under `Criteria`), plus `X of Y criteria met by their latest verdict`. **Fail-closed**: before any round runs it prints `no verification has run yet — the claim is unconfirmed` rather than implying a pass.
+  - Carried in `--json` as `claim { text, source }` (`source: 'RUN_SUMMARY' | 'DIGEST' | null`; `text: null` = none generated). Omitted only against an older server that doesn't emit the field. The machine-verification rollup is still carried top-level as `machineVerification`.
+- **Criteria** — every criterion as `<glyph> <id>  [TYPE]  SOURCE@rN statement` (✓ latest verdict passed / ✗ failed / ○ no verdict yet) with its checkpointer and latest verdict line (failure tail on fail). `REVIEW_ADDED@rN` provenance is visible per row.
   - A passing **MACHINE** criterion's verdict line carries a machine-state tag derived from the read model's `machinePassed` flag, NOT the latest verdict (LUM-470): `· machine-verified` when a checkpointer actually passed it (even after a human later signs the task off), or `· human override (no machine pass)` when it passes only on a human sign-off with no machine run underneath. This keeps the terminal honest with web — a machine-verified criterion that a human co-signed no longer reads as a plain human pass.
+  - A verdict's **evidence is drillable** (LUM-563), rendered as an indented `↳ evidence:` line under the verdict (PASS _and_ FAIL) instead of the inert raw pointer that used to ride the verdict line — so a conclusion points at real proof you can act on, not just the `check:` command: a `cmd:` pointer prints the actual command + exit code (`ran \`…\` → exit N · re-run to reproduce`), a `file:` pointer prints a terminal-clickable `path:line`, and a `commit:` pointer prints a navigable web URL (`<repo>/commit/<hash>`, resolved from the local git `origin` remote) or a `git show <hash>` fallback when no remote resolves. A criterion that **requires evidence but has none recorded yet** (e.g. a HUMAN evidence criterion before sign-off) renders an explicit `↳ evidence: pending — no reference recorded yet` (fail-closed) instead of a bare, dead `[evidence]` tag.
   - A pass can carry a **`⚠ pre-edit version`** note (LUM-457): the criterion was changed after that verdict (reworded, or its checkpointer was swapped so the recorded evidence ran a different command). The pass still counts as met (a stale pass does not block DONE — render-only signal), but it vouches for an older version — **re-run `lumo verify` to re-confirm against the current criterion.** This is the habit whenever you edit a MACHINE criterion's checkpointer mid-task: change the check, then re-verify so the green is honest.
 - **History** — one line per recorded round: `rN · timestamp · X PASS / Y FAIL`.
 - **Last round failures** — the most recent round's FAIL verdicts with their rejection reasons (why the last round bounced).
@@ -127,6 +131,8 @@ what's unmet and why (the exact failure tails), and how many rounds are left.
   When the trail is genuinely empty it states the **basis** (`None recorded — N rounds run, 0 FAIL, no send-backs, no reopens, no leftover follow-ups`); when nothing has been verified yet it says so (`No verification has run yet — cannot confirm there were no difficulties`) rather than rendering an implicitly-clean slate. Carried in `--json` as `struggleTrail` (incl. `pullRequests` + `reopens`).
+- **Trend** (LUM-562) — Rule 7, trend not snapshot: the _movement_ of the key quantities across the task's attempts, not a single snapshot. Where History/Cost/Struggle list current values, this shows direction: **Pass rate** across verification rounds (`r1 60% → r2 100% (↑ +40pts)`), **Cost/session** across the task's sessions (`4.2K → 1.1K tokens (↓), 5.3K total` — per-session spend from the same source as the **Cost** total, so the trajectory's points sum to it), and **Rework** accrual (`3 accrued — 1 FAIL round, 1 reopen, +1 PR cycle (↑ from 0)`). **Honest about a single point:** with only one round and one session every quantity is one data point, so it prints `Single attempt so far — no trajectory yet (a trend needs ≥2 rounds or sessions)` rather than drawing a fake arrow off one value (closes the named "1-round pass = single point" gap). When nothing was verified and no cost was measured it says `No verification rounds or measured cost yet — nothing to trend`. Carried in `--json` as `trend { passRate[], cost[], rework{} }`. Omitted only against an older server.
 - **Next actions** — the unmet criteria (latest verdict is not a pass: failed or never verified, HUMAN ones included). This list IS the plan — recomputed from the event log on every read, never maintained separately. Empty + rounds recorded = awaiting human adjudication.
 - **Open boundary crossings** (LUM-448) — a trailing safety block when the task has ≥1 OPEN (undispositioned) forbidden-action crossing: a count, then one line per crossing `• [SEVERITY] CATEGORY — <clipped detail>` (highest-severity first), each followed by a read-only **attribution** line `↳ by model=<m> · agent=<type>[/branch] · session=<8-char prefix>` (LUM-469 — who/what crossed; any dimension that couldn't be resolved server-side prints `unknown`, never a fabricated value), then a pointer to the web acceptance panel. Silent when there are none, so it never overshadows the criteria.
   - **Read-only awareness** — this surfaces crossings detected elsewhere (LUM-426/435/442); there is no CLI path to disposition or clear one. Disposition stays web + human-only (LUM-426/435/422): an agent/CLI bearer cannot clear its own crossing from the terminal.
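
The `--json` notes above describe three states for the `claim` field: omitted by an older server, present with `text: null`, and present with text plus a `source`. A minimal sketch of how a consumer might tell them apart follows. It assumes the report has already been parsed into an object; `describeClaim` and the labels it returns are hypothetical and are not part of the package.

```javascript
// Hypothetical helper (not part of @lumoai/cli): classify the `claim` field
// of an already-parsed --json report. Field names follow the notes above;
// the returned labels are illustrative only.
function describeClaim(report) {
    // An older server omits the field entirely.
    if (report.claim === undefined || report.claim === null) {
        return 'claim unavailable (server does not emit it)';
    }
    const { text, source } = report.claim;
    // `text: null` means no run summary has been generated yet.
    if (!text) {
        return 'no claim generated yet';
    }
    // Per the notes, `source` is 'RUN_SUMMARY' or 'DIGEST' when text is present.
    const origin = source === 'RUN_SUMMARY' ? 'LLM run summary' : 'raw turn digest';
    return `unverified self-report (${origin}): ${text}`;
}

// Made-up sample reports, one per state.
console.log(describeClaim({}));
console.log(describeClaim({ claim: { text: null, source: null } }));
console.log(describeClaim({ claim: { text: 'Added retry logic', source: 'DIGEST' } }));
```

Treating an empty or missing text as "not generated yet" mirrors the fail-closed wording in the notes: the consumer never invents a claim when none was recorded.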

package/dist/cli/src/commands/task-status.js CHANGED

@@ -7,6 +7,7 @@ const api_1 = require("../lib/api");
 const resolve_bound_task_1 = require("../lib/resolve-bound-task");
 const sanitize_1 = require("../lib/sanitize");
 const open_crossings_1 = require("../lib/open-crossings");
+const evidence_display_1 = require("../lib/evidence-display");
 /** One-line a possibly-multiline crossing detail and cap it so the safety block
  *  stays a glance, never a wall (it must not overshadow the criteria). */
 const CROSSING_DETAIL_CAP = 160;
@@ -44,15 +45,14 @@ function formatTaskStatus(data, extras = {}) {
         pushOpenCrossings(lines, extras);
         return lines.join('\n') + '\n';
     }
+    // LUM-564 claim vs. verification: the agent's CLAIM (what it says it did) paired with
+    // the measured verification conclusion (what was actually confirmed) — the
+    // two columns Rule 2 wants, instead of only the verification one. The
+    // machine-verification rollup (LUM-470) lives on the verification side here
+    // rather than as a standalone line, so the contrast is in one place.
+    pushClaimVsVerification(lines, data);
     lines.push('');
     lines.push(`Criteria (${data.criteria.length} total, ${data.nextActions.length} unmet):`);
-    // LUM-470: honest machine-verification rollup over the active MACHINE criteria
-    // (same read model as web, LUM-456) — so the terminal rollup never reads as
-    // all-human when a checkpointer actually verified the work.
-    const mv = data.machineVerification;
-    if (mv.total > 0) {
-        lines.push(`Machine verification: ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
-    }
     for (const c of data.criteria) {
         const glyph = c.latestVerdict == null
             ? '○'
@@ -85,9 +85,6 @@ function formatTaskStatus(data, extras = {}) {
             lines.push(`      ✗ FAIL@r${v.round}${why}`);
         }
         else {
-            const evidencePart = v.evidencePointer
-                ? ` · ${(0, sanitize_1.sanitizeField)(v.evidencePointer)}`
-                : '';
             // LUM-470: tag a passing MACHINE criterion by the read model's
             // machinePassed flag, not the latest verdict — a criterion a checkpointer
             // verified reads as machine-verified even after a human signs off, and a
@@ -97,13 +94,26 @@ function formatTaskStatus(data, extras = {}) {
                     ? ' · machine-verified'
                     : ' · human override (no machine pass)'
                 : '';
-            lines.push(`      ✓ ${v.verdict}@r${v.round}${evidencePart}${machineTag}`);
+            lines.push(`      ✓ ${v.verdict}@r${v.round}${machineTag}`);
             // LUM-457: a pass that vouches for a pre-edit version of the criterion —
             // render-only downgrade, the criterion still counts met.
             if (c.verdictStale || c.checkMismatch) {
                 lines.push('      ⚠ pre-edit version — criterion changed since this check; re-run `lumo verify` to re-confirm');
             }
         }
+        // LUM-563: drill the verdict's evidence into a navigable / re-runnable
+        // reference instead of the inert raw pointer that used to ride the verdict
+        // line — a commit URL (or `git show` fallback), a clickable path:line, or
+        // the actual command + exit. Rendered for PASS and FAIL alike so a failure
+        // is just as reproducible. When a criterion requires evidence but none is
+        // recorded yet, say so explicitly (fail-closed) rather than leaving the
+        // bare `[evidence]` tag pointing at nothing.
+        if (v?.evidencePointer) {
+            lines.push(`        ↳ evidence: ${(0, sanitize_1.sanitizeField)((0, evidence_display_1.formatEvidenceDrilldown)(v.evidencePointer, extras.repoWebUrl ?? null))}`);
+        }
+        else if (c.evidenceRequired) {
+            lines.push('        ↳ evidence: pending — no reference recorded yet');
+        }
         // LUM-511 Phase 5: send-back lifecycle (was this criterion's send-back
         // resolved, and by which PR).
         const sb = c.sendBackResolution;
@@ -136,6 +146,7 @@ function formatTaskStatus(data, extras = {}) {
     }
     pushCost(lines, data);
     pushStruggleTrail(lines, data);
+    pushTrend(lines, data);
     lines.push('');
     if (data.nextActions.length === 0) {
         lines.push(data.currentRound > 0
@@ -196,6 +207,56 @@ function fmtDuration(totalSec) {
         parts.push(`${s}s`);
     return parts.join(' ');
 }
+/**
+ * Append the "Claim vs verification" pairing (LUM-564) — Rule 2, claim vs. verification. The
+ * verification conclusion (machine-verified vs human-override, LUM-470) was the
+ * whole-system template, but the agent's run summary — its CLAIM of what it did
+ * — lived on a different surface, so the "two-column" contrast was really one
+ * column. This puts them side by side: the unverified self-report (estimate-
+ * tier: estimated, not measured) against the measured verdict, so a reader can
+ * weigh the claim against the verification at a glance.
+ *
+ * Fail-closed on every axis: a missing claim renders an explicit "not generated
+ * yet" line (never a fabricated claim); a not-yet-verified task renders the
+ * verification side as explicitly unconfirmed (never an implied pass). The
+ * claim text is same-source with the web card (latest session's LLM run
+ * summary, else its STOP turn digest) so the two surfaces cannot drift.
+ */
+function pushClaimVsVerification(lines, data) {
+    lines.push('');
+    lines.push('Claim vs verification:');
+    // ── Claim: the agent's self-report — estimate-tier, never measured.
+    lines.push('  ▸ Claim — what the agent says it did');
+    lines.push('    (agent self-report · estimated, not verification):');
+    const claim = data.claim;
+    if (claim && claim.text) {
+        for (const cl of (0, sanitize_1.sanitizeField)(claim.text).split('\n')) {
+            lines.push(`      ${cl}`);
+        }
+        lines.push(claim.source === 'RUN_SUMMARY'
+            ? '      ↳ source: LLM run summary'
+            : '      ↳ source: raw turn digest — no LLM summary yet');
+    }
+    else {
+        // Fail-closed: no run summary → say so, never invent a claim. Run summaries
+        // are synthesized when the bound task reaches DONE (LUM-481).
+        lines.push('      not generated yet — the agent run summary is synthesized when the task reaches DONE');
+    }
+    // ── Verification: the measured verdict — machine-verified vs override.
+    lines.push('  ▸ Verification — what was actually confirmed (measured):');
+    const mv = data.machineVerification;
+    if (data.currentRound === 0) {
+        // Nothing verified yet — the claim stands unconfirmed. Don't imply a pass.
+        lines.push('      no verification has run yet — the claim is unconfirmed');
+    }
+    else {
+        if (mv.total > 0) {
+            lines.push(`      ${mv.machineVerified} machine-verified / ${mv.humanOverridden} human override (of ${mv.total} MACHINE criteria)`);
+        }
+        const met = data.criteria.length - data.nextActions.length;
+        lines.push(`      ${met} of ${data.criteria.length} criteria met by their latest verdict`);
+    }
+}
 /**
  * Append the honest "Cost" section (LUM-560) — Rule 1: surface the costs a human
  * should weigh (token spend, active time, machine rework) on the same report as
@@ -291,6 +352,95 @@ function pushStruggleTrail(lines, data) {
         }
     }
 }
+/** Direction glyph for a delta — ↑ up, ↓ down, → flat. */
+function arrow(delta) {
+    return delta > 0 ? '↑' : delta < 0 ? '↓' : '→';
+}
+/** Pass rate of one round as an integer percent (0 verdicts → 0%). */
+function passPct(p) {
+    const total = p.passed + p.failed;
+    return total === 0 ? 0 : Math.round((p.passed / total) * 100);
+}
+/**
+ * Append the "Trend" section (LUM-562, Rule 7: trend, not snapshot) — the *movement* of the
+ * task's key quantities across its attempts, not a single-point snapshot. Where
+ * History lists each round's PASS/FAIL, Cost shows the running total, and
+ * Struggle lists the scars, this shows direction: pass rate climbing, cost per
+ * session rising/falling, rework accruing. The honesty rule (mirrors the
+ * struggle trail): a single data point is NOT a trajectory — with one round and
+ * one session the section says so outright rather than drawing a fake arrow off
+ * a single value, closing the named "1-round pass = single point" gap. The
+ * per-session cost reuses the same source as the Cost section, so the
+ * trajectory's points sum to the Cost total (no in-report drift). Skipped only
+ * when the server didn't emit the field (older server) — never fabricated.
+ */
+function pushTrend(lines, data) {
+    const trend = data.trend;
+    if (!trend)
+        return; // older server: no trend data — don't invent one.
+    const rounds = trend.passRate;
+    const cost = trend.cost;
+    const rw = trend.rework;
+    lines.push('');
+    // Nothing to trend at all — no verification, no measured cost.
+    if (rounds.length === 0 && cost.length === 0) {
+        lines.push('Trend:');
+        lines.push('  No verification rounds or measured cost yet — nothing to trend. Run `lumo verify`.');
+        return;
+    }
+    // A trajectory needs ≥2 points in at least one dimension. With a single
+    // round AND a single session, every quantity is one data point — say so
+    // plainly instead of implying a flat/healthy trend (the named gap).
+    const hasTrajectory = rounds.length >= 2 || cost.length >= 2;
+    if (!hasTrajectory) {
+        lines.push('Trend:');
+        lines.push('  Single attempt so far — no trajectory yet (a trend needs ≥2 rounds or sessions).');
+        const bits = [];
+        if (rounds.length === 1)
+            bits.push(`pass rate r${rounds[0].round} ${passPct(rounds[0])}%`);
+        if (cost.length === 1)
+            bits.push(`cost ${fmtTokens(cost[0].total)} tokens`);
+        bits.push(rw.total === 0 ? 'rework none' : `rework ${rw.total}`);
+        lines.push(`  ${bits.join(' · ')}`);
+        return;
+    }
+    const sessionWord = cost.length === 1 ? 'session' : 'sessions';
+    lines.push(`Trend (${rounds.length} rounds · ${cost.length} ${sessionWord}):`);
+    // Pass rate across rounds — the climb (or fall) of the verify loop.
+    if (rounds.length >= 2) {
+        const seq = rounds.map(r => `r${r.round} ${passPct(r)}%`).join(' → ');
+        const delta = passPct(rounds[rounds.length - 1]) - passPct(rounds[0]);
+        const sign = delta > 0 ? `+${delta}` : `${delta}`;
+        lines.push(`  Pass rate: ${seq}  (${arrow(delta)} ${sign}pts)`);
+    }
+    else if (rounds.length === 1) {
+        lines.push(`  Pass rate: r${rounds[0].round} ${passPct(rounds[0])}% (1 round — single point)`);
+    }
+    // Cost per session across the task's sessions — the spend trajectory.
+    if (cost.length >= 2) {
+        const first = cost[0].total;
+        const last = cost[cost.length - 1].total;
+        const total = cost.reduce((a, c) => a + c.total, 0);
+        lines.push(`  Cost/session: ${fmtTokens(first)} → ${fmtTokens(last)} tokens  (${arrow(last - first)}), ${fmtTokens(total)} total over ${cost.length} sessions`);
+    }
+    else if (cost.length === 1) {
+        lines.push(`  Cost: ${fmtTokens(cost[0].total)} tokens (1 session — single point)`);
+    }
+    // Rework accrual — monotonic from a clean baseline of 0.
+    if (rw.total === 0) {
+        lines.push('  Rework: none accrued (0) — clean across every attempt');
+    }
+    else {
+        const parts = [];
+        if (rw.reworkRounds > 0)
+            parts.push(`${rw.reworkRounds} FAIL round${rw.reworkRounds === 1 ? '' : 's'}`);
+        if (rw.reopens > 0)
+            parts.push(`${rw.reopens} reopen${rw.reopens === 1 ? '' : 's'}`);
+        if (rw.extraPrCycles > 0)
+            parts.push(`+${rw.extraPrCycles} PR cycle${rw.extraPrCycles === 1 ? '' : 's'}`);
+        lines.push(`  Rework: ${rw.total} accrued — ${parts.join(', ')}  (↑ from 0)`);
+    }
+}
 /**
  * Append the OPEN boundary-crossings safety block (LUM-448) — a count, one line
  * per crossing with its severity + category + clipped detail, and a pointer to
@@ -407,5 +557,9 @@ async function taskStatus(identifier, options = {}) {
     process.stdout.write(formatTaskStatus(data, {
         openCrossings: crossingsResult,
         dispositionUrl: (0, open_crossings_1.dispositionUrl)(base, creds.workspaceSlug ?? 'lumo', data.task.identifier),
+        // LUM-563: resolve the local repo's web base so a `commit:` evidence
+        // pointer drills to a navigable URL; null (not a git repo / no origin)
+        // falls back to a `git show` hint in the renderer.
+        repoWebUrl: (0, evidence_display_1.resolveRepoWebUrl)(),
     }));
 }
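
As a worked example of the pass-rate arithmetic in `pushTrend`, the sketch below copies `arrow` and `passPct` from the hunk above and wraps the multi-round branch in a standalone function. `passRateLine` and the sample rounds are hypothetical and exist only for illustration; the `round`, `passed`, and `failed` field names are taken from the diff.

```javascript
// Copied from the diff above so the example runs on its own.
function arrow(delta) {
    return delta > 0 ? '↑' : delta < 0 ? '↓' : '→';
}
function passPct(p) {
    const total = p.passed + p.failed;
    return total === 0 ? 0 : Math.round((p.passed / total) * 100);
}

// Hypothetical wrapper around the `rounds.length >= 2` branch of pushTrend.
function passRateLine(rounds) {
    if (rounds.length < 2) return null; // a single point is not a trajectory
    const seq = rounds.map(r => `r${r.round} ${passPct(r)}%`).join(' → ');
    // The delta compares the last round with the first, not adjacent rounds.
    const delta = passPct(rounds[rounds.length - 1]) - passPct(rounds[0]);
    const sign = delta > 0 ? `+${delta}` : `${delta}`;
    return `Pass rate: ${seq}  (${arrow(delta)} ${sign}pts)`;
}

// Made-up data: 3 of 5 verdicts pass in round 1, then 5 of 5 in round 2.
const improving = [
    { round: 1, passed: 3, failed: 2 },
    { round: 2, passed: 5, failed: 0 },
];
console.log(passRateLine(improving));
// Pass rate: r1 60% → r2 100%  (↑ +40pts)
```

Because the delta is taken between the first and last rounds only, a dip in a middle round still appears in the printed sequence but does not change the arrow.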

package/dist/cli/src/lib/evidence-display.js ADDED

@@ -0,0 +1,106 @@
+"use strict";
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.normalizeRepoWebUrl = normalizeRepoWebUrl;
+exports.resolveRepoWebUrl = resolveRepoWebUrl;
+exports.formatEvidenceDrilldown = formatEvidenceDrilldown;
+const child_process_1 = require("child_process");
+const acceptance_evidence_1 = require("../../../shared/src/acceptance-evidence");
+/**
+ * Drill-down rendering for acceptance evidence pointers (LUM-563).
+ *
+ * A verdict's evidencePointer is a structured token (`cmd:…#exit=N`,
+ * `file:path:line`, `commit:hash`). Printed raw it satisfies Norman's gulf of
+ * evaluation only halfway: you can read it but not act on it. These helpers
+ * turn each shape into a navigable / re-runnable line — a web commit URL when
+ * the git remote resolves (else a `git show` fallback), a terminal-clickable
+ * `path:line`, or the actual command + exit you can re-run to reproduce.
+ *
+ * No third-party deps; sanitization is applied by the caller (task-status.ts)
+ * before anything reaches the terminal.
+ */
+/**
+ * Normalize a git remote URL to its https web base (no trailing `.git`), or
+ * null when it doesn't look like a remote we can navigate. Fails closed: a
+ * shape we can't confidently map yields null so the caller drops to a
+ * non-URL fallback rather than printing a broken link.
+ *
+ * Handles the three remote forms git emits:
+ *   - scp-style  `git@host:owner/repo.git`
+ *   - https      `https://host/owner/repo.git`
+ *   - ssh url    `ssh://git@host/owner/repo.git`
+ */
+function normalizeRepoWebUrl(remote) {
+    const raw = remote.trim();
+    if (!raw)
+        return null;
+    let host;
+    let path;
+    const scp = /^[\w.-]+@([\w.-]+):(.+)$/.exec(raw);
+    if (scp) {
+        host = scp[1];
+        path = scp[2];
+    }
+    else {
+        let url;
+        try {
+            url = new URL(raw);
+        }
+        catch {
+            return null;
+        }
+        if (url.protocol !== 'https:' && url.protocol !== 'ssh:')
+            return null;
+        host = url.host;
+        path = url.pathname;
+    }
+    const cleanPath = path
+        .replace(/^\/+/, '')
+        .replace(/\.git$/, '')
+        .replace(/\/+$/, '');
+    // A navigable repo path is at minimum owner/repo — reject a bare host.
+    if (!host || !cleanPath || !cleanPath.includes('/'))
+        return null;
+    return `https://${host}/${cleanPath}`;
+}
+/**
+ * Resolve the current repo's web base by reading `origin`'s remote URL. Best
+ * effort and side-effecting (runs git in the cwd), so it lives outside the pure
+ * formatter and is injected via TaskStatusExtras. Null when not a git repo /
+ * no origin / unparseable — the renderer then uses the `git show` fallback.
+ */
+function resolveRepoWebUrl(cwd = process.cwd()) {
+    try {
+        const r = (0, child_process_1.spawnSync)('git', ['remote', 'get-url', 'origin'], {
+            cwd,
+            encoding: 'utf8',
+            timeout: 5_000,
+        });
+        if (r.status !== 0 || !r.stdout)
+            return null;
+        return normalizeRepoWebUrl(r.stdout);
+    }
+    catch {
+        return null;
+    }
+}
+/**
+ * Render the actionable evidence body for one pointer (the text after
+ * `↳ evidence: `). Unrecognised shapes fall back to the raw pointer so data is
+ * never hidden, only upgraded when we know how.
+ */
+function formatEvidenceDrilldown(pointer, repoWebUrl) {
+    const parsed = (0, acceptance_evidence_1.parseEvidencePointer)(pointer);
+    if (!parsed)
+        return pointer;
+    if (parsed.kind === 'cmd') {
+        return `ran \`${parsed.command}\` → exit ${parsed.exitCode} · re-run to reproduce`;
+    }
+    if (parsed.kind === 'file') {
+        return `${parsed.path}:${parsed.line}`;
+    }
+    // commit
+    const short = parsed.hash.slice(0, 10);
+    return repoWebUrl
+        ? `commit ${short} · ${repoWebUrl}/commit/${parsed.hash}`
+        : `commit ${short} · view with \`git show ${parsed.hash}\``;
+}

package/dist/shared/src/acceptance-evidence.js CHANGED

@@ -2,6 +2,7 @@
 Object.defineProperty(exports, "__esModule", { value: true });
 exports.EVIDENCE_POINTER_MAX = exports.EVIDENCE_POINTER_FORMAT_HINT = exports.EVIDENCE_POINTER_PATTERNS = void 0;
 exports.isValidEvidencePointer = isValidEvidencePointer;
+exports.parseEvidencePointer = parseEvidencePointer;
 exports.buildCmdEvidencePointer = buildCmdEvidencePointer;
 /**
  * Evidence-pointer grammar for acceptance verification (LUM-343).
@@ -29,6 +30,24 @@ function isValidEvidencePointer(value) {
 }
 /** Max stored pointer length — mirrors the column-level cap in validation. */
 exports.EVIDENCE_POINTER_MAX = 2_000;
+/**
+ * Decompose an evidence pointer into its parts, or null if it doesn't parse.
+ * The grammar is the single source of truth (EVIDENCE_POINTER_PATTERNS); this
+ * mirrors those exact shapes — a `#` may appear inside a `cmd:` command, so the
+ * exit marker is matched greedily from the END, never the first `#`.
+ */
+function parseEvidencePointer(value) {
+    const commit = /^commit:([0-9a-f]{7,40})$/.exec(value);
+    if (commit)
+        return { kind: 'commit', hash: commit[1] };
+    const file = /^file:([^\s]+):(\d+(?:-\d+)?)$/.exec(value);
+    if (file)
+        return { kind: 'file', path: file[1], line: file[2] };
+    const cmd = /^cmd:([\s\S]+)#exit=(\d+)$/.exec(value);
+    if (cmd)
+        return { kind: 'cmd', command: cmd[1], exitCode: Number(cmd[2]) };
+    return null;
+}
 /**
  * Build a `cmd:` evidence pointer for an executed checkpointer. The command
  * is truncated so the suffix (`#exit=N`) always survives the length cap —
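
The doc comment on `parseEvidencePointer` notes that a `#` may appear inside a `cmd:` command, so the exit marker is matched from the end. The sketch below copies the function from the hunk above and runs it on made-up pointers to show that behaviour alongside the `file:` and `commit:` shapes. The sample commands, paths, and hash are hypothetical.

```javascript
// Copied from the diff above so the example runs on its own.
function parseEvidencePointer(value) {
    const commit = /^commit:([0-9a-f]{7,40})$/.exec(value);
    if (commit) return { kind: 'commit', hash: commit[1] };
    const file = /^file:([^\s]+):(\d+(?:-\d+)?)$/.exec(value);
    if (file) return { kind: 'file', path: file[1], line: file[2] };
    const cmd = /^cmd:([\s\S]+)#exit=(\d+)$/.exec(value);
    if (cmd) return { kind: 'cmd', command: cmd[1], exitCode: Number(cmd[2]) };
    return null;
}

// The command itself contains '#'; only the trailing '#exit=1' is the marker,
// because the greedy capture backtracks from the end of the string.
console.log(parseEvidencePointer('cmd:npm test # smoke#exit=1'));
// { kind: 'cmd', command: 'npm test # smoke', exitCode: 1 }

// A line range is kept as a string, not split into numbers.
console.log(parseEvidencePointer('file:src/app.ts:10-20'));
// { kind: 'file', path: 'src/app.ts', line: '10-20' }

console.log(parseEvidencePointer('commit:abc1234'));
// { kind: 'commit', hash: 'abc1234' }

// Anything outside the three shapes does not parse.
console.log(parseEvidencePointer('commit:not-a-hash'));
// null
```

Returning `null` for unknown shapes is what lets `formatEvidenceDrilldown` fall back to printing the raw pointer instead of hiding it.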

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@lumoai/cli",
-  "version": "1.44.0",
+  "version": "1.46.0",
   "description": "Lumo CLI — manage tasks and sessions from the terminal",
   "license": "MIT",
   "author": "cli@uselumo.ai",