@lumoai/cli 1.43.0 → 1.44.0
@@ -117,6 +117,7 @@ what's unmet and why (the exact failure tails), and how many rounds are left.
 - A pass can carry a **`⚠ pre-edit version`** note (LUM-457): the criterion was changed after that verdict (reworded, or its checkpointer was swapped so the recorded evidence ran a different command). The pass still counts as met (a stale pass does not block DONE — render-only signal), but it vouches for an older version — **re-run `lumo verify` to re-confirm against the current criterion.** This is the habit whenever you edit a MACHINE criterion's checkpointer mid-task: change the check, then re-verify so the green is honest.
 - **History** — one line per recorded round: `rN · timestamp · X PASS / Y FAIL`.
 - **Last round failures** — the most recent round's FAIL verdicts with their rejection reasons (why the last round bounced).
+- **Cost** (LUM-560) — Principle 1: the costs a human should weigh, on the same report as the verdict instead of scattered across the web delivery card and `task lineage`. Three lines: **Tokens** (total input+output+cache across the task's sessions), **Active time** (non-idle agent seconds — Σ per-turn `STOP − prompt`, LUM-487), and **Rework rounds** (verify rounds that recorded a FAIL). Read from the **same** server-side source the web delivery card consumes (`retrospectiveRepository.loadActuals`), so the two reports cannot drift. Token cost is **fail-closed**: when no session usage was recorded it prints `Tokens: not recorded (no session usage captured)`, kept distinct from a measured `0` (not measured vs. spent 0, aligned with LUM-559). Carried in `--json` as `cost { tokenCost, activeTimeSec, reworkRounds }` (`tokenCost: null` = not measured). Omitted only against an older server that doesn't emit the field.
 - **Struggle / rework / outstanding** (LUM-561) — the anti-mum-and-deaf block: **always printed when the contract exists, even on a clean 0-unmet task** so a passing task still shows its scars instead of wiping them to a single PASS count. Lists, when present:
   - **rework rounds** — verify rounds that had a FAIL;
   - **send-backs** — criteria sent back by a human/agent verdict (a MACHINE verify-loop FAIL is not a send-back), with their open/resolved lifecycle, preserved even for since-removed criteria;
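The `--json` shape described in the new **Cost** bullet can be consumed roughly as follows. This is a minimal sketch, not code from the package: only the field names `cost`, `tokenCost`, `activeTimeSec` and `reworkRounds` come from the text above, while `summarizeCost` and the strings it returns are hypothetical names chosen for illustration. The point it shows is the fail-closed rule: a `null` token cost is reported as "not recorded" and is never collapsed into `0`.

```javascript
// Hypothetical consumer of the `cost` object carried in `--json` output.
// Only the three field names are taken from the changelog text; everything
// else here is an illustrative assumption.
function summarizeCost(status) {
  const cost = status.cost;
  // An older server omits the field entirely: report nothing rather than guess.
  if (cost == null) return null;
  return {
    // null means "not measured"; 0 means "measured, and nothing was spent".
    tokens: cost.tokenCost == null ? 'not recorded' : String(cost.tokenCost),
    activeTimeSec: cost.activeTimeSec,
    hadRework: cost.reworkRounds > 0,
  };
}

// Unmeasured usage and a measured zero stay distinguishable.
console.log(summarizeCost({ cost: { tokenCost: null, activeTimeSec: 42, reworkRounds: 0 } }));
console.log(summarizeCost({ cost: { tokenCost: 0, activeTimeSec: 0, reworkRounds: 2 } }));
// A payload without `cost` (older server) yields null.
console.log(summarizeCost({}));
```

A caller that sums token spend across tasks would want to skip the "not recorded" entries rather than add zeros for them, which is the distinction the bullet is protecting.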
@@ -134,6 +134,7 @@ function formatTaskStatus(data, extras = {}) {
             }
         }
     }
+    pushCost(lines, data);
     pushStruggleTrail(lines, data);
     lines.push('');
     if (data.nextActions.length === 0) {
@@ -166,6 +167,57 @@ function formatTaskStatus(data, extras = {}) {
     pushOpenCrossings(lines, extras);
     return lines.join('\n') + '\n';
 }
+/** Compact a token count for the terminal — 1_200_000 → "1.2M", 850_000 →
+ * "850K", 0 → "0". Uses the SAME Intl compact formatter the web card's
+ * fmtCompact uses (notation:'compact', maximumFractionDigits:1) so the same
+ * number reads identically in both surfaces — no presentation drift (LUM-560). */
+const TOKEN_FMT = new Intl.NumberFormat('en-US', {
+    notation: 'compact',
+    maximumFractionDigits: 1,
+});
+function fmtTokens(n) {
+    return TOKEN_FMT.format(n);
+}
+/** Active (non-idle) seconds → a compact "2h 14m" / "3m 5s" / "12s". 0 → "0s". */
+function fmtDuration(totalSec) {
+    const sec = Math.max(0, Math.round(totalSec));
+    if (sec === 0)
+        return '0s';
+    const h = Math.floor(sec / 3600);
+    const m = Math.floor((sec % 3600) / 60);
+    const s = sec % 60;
+    const parts = [];
+    if (h > 0)
+        parts.push(`${h}h`);
+    if (m > 0)
+        parts.push(`${m}m`);
+    // Show seconds only when the duration is under an hour (keeps long runs tidy).
+    if (s > 0 && h === 0)
+        parts.push(`${s}s`);
+    return parts.join(' ');
+}
+/**
+ * Append the honest "Cost" section (LUM-560) — Principle 1: surface the costs a human
+ * should weigh (token spend, active time, machine rework) on the same report as
+ * the acceptance verdict, instead of leaving them scattered across the web
+ * delivery card and `task lineage`. Every number is the server's, read from the
+ * same loadActuals source as the web card (no drift). Token cost is fail-closed:
+ * a null reads as an explicit "not recorded" line, never a silent or fake 0, so
+ * "not measured" (no session usage) stays distinct from a measured "spent 0". Skipped only
+ * when the server didn't emit the field (older server) — never fabricated.
+ */
+function pushCost(lines, data) {
+    const cost = data.cost;
+    if (!cost)
+        return; // older server: can't fabricate cost, so don't claim any.
+    lines.push('');
+    lines.push('Cost:');
+    lines.push(cost.tokenCost == null
+        ? ' Tokens: not recorded (no session usage captured)'
+        : ` Tokens: ${fmtTokens(cost.tokenCost)}`);
+    lines.push(` Active time: ${fmtDuration(cost.activeTimeSec)} (non-idle)`);
+    lines.push(` Rework rounds: ${cost.reworkRounds}${cost.reworkRounds === 0 ? ' (no machine rework)' : ''}`);
+}
 /**
  * Append the honest "Struggle / rework / outstanding" section (LUM-561) — the
  * anti-mum-and-deaf block (kills a silent "Nothing outstanding"). It is ALWAYS
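To see what the two formatters added in this hunk actually print, they can be run on their own. The sketch below copies `TOKEN_FMT`, `fmtTokens` and `fmtDuration` from the added lines; the sample inputs are my own and are not taken from the package's tests. It also makes one edge visible: a missing or non-numeric `activeTimeSec` passes through `Math.round` as `NaN`, skips every branch, and comes back as an empty string rather than `0s`.

```javascript
// Copied from the added lines in this hunk.
const TOKEN_FMT = new Intl.NumberFormat('en-US', {
  notation: 'compact',
  maximumFractionDigits: 1,
});
function fmtTokens(n) {
  return TOKEN_FMT.format(n);
}
function fmtDuration(totalSec) {
  const sec = Math.max(0, Math.round(totalSec));
  if (sec === 0) return '0s';
  const h = Math.floor(sec / 3600);
  const m = Math.floor((sec % 3600) / 60);
  const s = sec % 60;
  const parts = [];
  if (h > 0) parts.push(`${h}h`);
  if (m > 0) parts.push(`${m}m`);
  // Seconds are shown only for durations under an hour.
  if (s > 0 && h === 0) parts.push(`${s}s`);
  return parts.join(' ');
}

// Sample inputs (illustrative, chosen to match the doc comments).
console.log(fmtTokens(1_200_000)); // "1.2M"
console.log(fmtTokens(850_000));   // "850K"
console.log(fmtTokens(0));         // "0"
console.log(fmtDuration(8040));    // "2h 14m"
console.log(fmtDuration(185));     // "3m 5s"
console.log(fmtDuration(12));      // "12s"
console.log(fmtDuration(3605));    // "1h" (seconds dropped past the hour)
console.log(JSON.stringify(fmtDuration(undefined))); // "" (NaN falls through)
```

If the server can ever send `cost` without `activeTimeSec`, the Active time line would render with a blank value; whether that can happen is not stated in the diff.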