@lumoai/cli 1.27.0 → 1.29.0

@@ -1,124 +0,0 @@
- # lumo verify — machine verification loop
-
- `lumo verify` is the machine half of the acceptance system (Acceptance v1,
- LUM-343). It executes every **MACHINE** criterion's checkpointer in the local
- repo, reports one structured PASS/FAIL verdict per criterion to the server,
- and prints what to do next. The judge lives server-side: round numbering, the
- 3-round cap, escalation, and the IN_REVIEW transition all happen there
- (execution on the client, adjudication on the server).
-
- ## The claim-done rule
-
- **Before claiming a task is complete — in conversation, in a wrap-up, or by
- touching its status — run `lumo verify`.** The loop replaces "I read the code
- and it looks done" with executed evidence.
-
- ```
- lumo verify # session-bound task
- lumo verify LUM-42 # explicit task (overrides the session binding)
- lumo verify --timeout 900 # per-checkpointer timeout in seconds (default 600)
- ```
-
- ## What one round does
-
- 1. Loads the task's acceptance contract and picks out MACHINE criteria.
- 2. Runs each checkpointer locally (shell, cwd = current directory), one at a
- time, echoing PASS/FAIL as it goes.
- 3. POSTs the structured verdicts; the server records one VerificationRun per
- criterion at round = previous max + 1 and mirrors each verdict as a
- TaskActivity event.
- 4. Prints the round outcome:
- - **All PASS** → the task transitions to **IN_REVIEW** (existing state
- machine + TASK_IN_REVIEW notification). **Stop here.** Human
- adjudication and any HUMAN criteria take over; never set DONE yourself.
- - **Any FAIL** → task status is untouched; the unmet criteria are printed
- as next actions (statement, checkpointer, failure tail). Fix and re-run.
- - **Round 3 still failing** → the loop escalates: a human is notified
- (AGENT_VERIFY, requires action) and further `lumo verify` rounds are
- rejected with 409. **Stop retrying**; fix only what the human directs.
-
- Exit code 0 = all passed (or nothing to run); 1 = failures, escalation, or
- errors.
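
The round outcomes and exit codes above can be sketched as a small decision function. This is only an illustrative model of the rules as described, not the server's actual implementation: the `Verdict` shape, the outcome labels, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_ROUNDS = 3  # the hard round cap described in the text


@dataclass
class Verdict:
    criterion_id: str  # hypothetical field name
    passed: bool


def next_round(previous_rounds: List[int]) -> int:
    """New runs are recorded at round = previous max + 1 (first round is 1)."""
    return max(previous_rounds, default=0) + 1


def round_outcome(verdicts: List[Verdict], round_no: int) -> str:
    """Classify one completed round. The labels are illustrative only."""
    if not verdicts:
        return "NOTHING_TO_RUN"  # e.g. a HUMAN-only contract
    if all(v.passed for v in verdicts):
        return "IN_REVIEW"  # all PASS: task transitions to IN_REVIEW
    if round_no >= MAX_ROUNDS:
        return "ESCALATED"  # round 3 still failing: a human is notified
    return "RETRY"  # any FAIL: status untouched, fix and re-run


def exit_code(outcome: str) -> int:
    """0 = all passed or nothing to run; 1 = failures or escalation."""
    return 0 if outcome in ("IN_REVIEW", "NOTHING_TO_RUN") else 1
```

In this model, a failing third round yields `ESCALATED` and exit code 1, matching the "stop retrying" rule. Errors (which also exit 1 per the text) are not modeled here.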
-
- ## Verdict semantics (what the CLI sends)
-
- - checkpointer exits 0 → `PASS` with evidence `cmd:<command>#exit=0`
- - non-zero exit → `FAIL`, reason = output tail, enum `CRITERION_UNMET`
- - spawn failure / timeout → `FAIL`, enum `CHECK_EXECUTION_ERROR`
-
- `evidencePointer` is **not free text**: the server only accepts
- `commit:<hash>`, `file:<path>:<line>`, or `cmd:<command>#exit=<code>`.
- Verdicts are PASS|FAIL only; the agent path cannot write HUMAN verdicts or
- `PASS_WITH_FOLLOWUP` (red line — those enter via human-initiated UI paths
- only).
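
As a rough illustration of the mapping above, the sketch below turns a checkpointer result into a verdict and checks an evidence pointer against the three accepted shapes. The dictionary keys other than `evidencePointer`, the function names, and the exact regular expressions (for example, that a commit hash is 7 to 40 hex characters) are assumptions for illustration, not the CLI's or the server's real validation rules.

```python
import re
from typing import Optional

# Assumed patterns for the three accepted evidence pointer shapes.
EVIDENCE_PATTERNS = [
    re.compile(r"commit:[0-9a-f]{7,40}"),  # assumes a hex git hash
    re.compile(r"file:.+:\d+"),            # file:<path>:<line>
    re.compile(r"cmd:.+#exit=-?\d+"),      # cmd:<command>#exit=<code>
]


def is_valid_evidence(pointer: str) -> bool:
    """True if the pointer fully matches one of the assumed patterns."""
    return any(p.fullmatch(pointer) for p in EVIDENCE_PATTERNS)


def to_verdict(command: str, exit_status: Optional[int], output_tail: str = "") -> dict:
    """Map a checkpointer result to a verdict.

    exit_status=None stands in for a spawn failure or timeout.
    """
    if exit_status is None:
        return {"verdict": "FAIL", "enum": "CHECK_EXECUTION_ERROR"}
    if exit_status == 0:
        return {"verdict": "PASS", "evidencePointer": f"cmd:{command}#exit=0"}
    # Non-zero exit: the reason is the tail of the command output.
    return {"verdict": "FAIL", "enum": "CRITERION_UNMET", "reason": output_tail}
```

A free-text string such as `"looks good to me"` fails `is_valid_evidence`, which is the point of the "not free text" rule.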
-
- ## Edge cases
-
- - **No contract yet** → error pointing at `lumo task criteria set`; draft the
- contract first (criteria.md golden rule).
- - **HUMAN-only contract (zero MACHINE criteria)** → nothing to run; the CLI
- says so and suggests handing off for human review
- (`lumo task update <id> --status in_review`). No server write happens.
- - **A round must cover every MACHINE criterion** — the CLI always runs all of
- them; the server rejects partial rounds.
- - Criteria added during review (`REVIEW_ADDED`) appear in the contract and
- are picked up automatically by the next round.
-
- ## Round discipline
-
- Rounds are a hard budget of 3, not a retry loop. Between rounds, actually fix
- the failures — re-running without changes burns a round and (at round 3)
- pages a human. A FAIL round never changes task status; only an all-pass round
- moves it (to IN_REVIEW, never further).
-
- ## lumo task status — the read half (self-check entry point)
-
- `lumo task status [task] [--json]` is the read-only counterpart of the loop
- (LUM-344): pure read, milliseconds, no LLM, never writes — running it costs
- nothing and burns no round. Defaults to the session-bound task; an explicit
- identifier overrides.
-
- ```
- lumo task status # session-bound task
- lumo task status LUM-42 # explicit task
- lumo task status --json # versioned machine-readable payload
- ```
-
- ### When to run it
-
- **Status-first recovery:** run it FIRST — before re-reading code or
- planning — whenever you:
-
- - resume a task in a new session (yours or another agent's earlier work);
- - come back after a verification round was rejected (`lumo verify` failed);
- - were told the task bounced in review (REVIEW_ADDED criteria may have been
- appended at the round they surfaced — they show up here automatically).
-
- It answers "where does the loop stand": what already passed (don't redo it),
- what's unmet and why (the exact failure tails), and how many rounds are left.
-
- ### What it prints
-
- - Header: task identifier/title/status + `verification round N/3` (round 0 =
- never verified) + an escalation warning when the machine loop is exhausted.
- - **Criteria** — every criterion as `<glyph> <id> [TYPE] SOURCE@rN
- statement` (✓ latest verdict passed / ✗ failed / ○ no verdict yet) with its
- checkpointer and latest verdict line (evidence pointer on pass, failure
- tail on fail). `REVIEW_ADDED@rN` provenance is visible per row.
- - **History** — one line per recorded round: `rN · timestamp · X PASS / Y FAIL`.
- - **Last round failures** — the most recent round's FAIL verdicts with their
- rejection reasons (why the last round bounced).
- - **Next actions** — the unmet criteria (latest verdict is not a pass:
- failed or never verified, HUMAN ones included). This list IS the plan —
- it is recomputed from the event log on every read, never maintained
- separately. Empty + rounds recorded = awaiting human adjudication.
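
To make "recomputed from the event log on every read" concrete, here is a minimal sketch of deriving the glyphs and the next-actions list from recorded verdicts. The event tuple shape and function names are hypothetical; only the rule itself (a criterion is unmet unless its latest verdict is a pass) comes from the description above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (round number, criterion id, "PASS" or "FAIL").
Event = Tuple[int, str, str]

# Glyphs as described: latest verdict passed / failed / no verdict yet.
GLYPHS = {"PASS": "✓", "FAIL": "✗", None: "○"}


def latest_verdicts(criteria: List[str], events: List[Event]) -> Dict[str, Optional[str]]:
    """Return each criterion's most recent verdict, or None if never verified."""
    latest: Dict[str, Optional[str]] = {c: None for c in criteria}
    # Replay events in round order so later rounds overwrite earlier ones.
    for _round_no, criterion_id, verdict in sorted(events, key=lambda e: e[0]):
        if criterion_id in latest:
            latest[criterion_id] = verdict
    return latest


def next_actions(criteria: List[str], events: List[Event]) -> List[str]:
    """Unmet criteria: latest verdict is not a pass (failed or never verified)."""
    latest = latest_verdicts(criteria, events)
    return [c for c in criteria if latest[c] != "PASS"]
```

Because the list is derived on each call, there is no separate plan state that could drift from the recorded verdicts.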
-
- ### --json contract
-
- `--json` emits the full read model with a top-level `version` field
- (currently `1`). The schema is versioned: breaking shape changes bump the
- major version; additive fields don't. Pin on `version` when scripting against it.
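
Pinning on `version` might look like the sketch below. Only the top-level `version` field and its current value `1` come from the text; the function name and the choice to raise `ValueError` are assumptions, and no other payload fields are relied on.

```python
import json

SUPPORTED_VERSION = 1  # the value the text describes as current


def parse_status(raw: str) -> dict:
    """Parse `--json` output and refuse schema versions this script was not written for."""
    payload = json.loads(raw)
    version = payload.get("version")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported status schema version: {version!r}")
    # Additive fields are allowed, so unknown keys are simply passed through.
    return payload
```

A script would feed this the standard output of `lumo task status --json`, for example captured with `subprocess.run(..., capture_output=True, text=True)`.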
-
- `status` reads; `verify` judges. Running status never starts a round, never
- escalates, and never changes task state — loop rules (cap 3, IN_REVIEW on
- all-pass, human-only DONE) live entirely in `lumo verify` and the server.