planr 1.1.16 → 1.1.18

@@ -77,7 +77,7 @@ With `--json`, responses follow one convention so agents never guess where data
77
77
  - Other single objects use their semantic key: `plan`, `log`, `review`, `artifact`, `context`.
78
78
  - Optional guidance appears under `hint` or `next` when a follow-up command is the expected move.
79
79
 
80
- `plan check` validates path, YAML frontmatter, and that required sections have content: build plans need `## Scope Decision`, `## Verification`, and `## Acceptance Criteria` filled; product plans need `## Problem`, `## Requirements`, and `## Success Criteria` filled in `PRODUCT_SPEC.md`. Each warning is structured — `{"file", "section", "message", "fix"}` — and names the exact file to edit plus the re-run command, so a failed check is a repair instruction, not a riddle.
80
+ `plan check` validates path, YAML frontmatter, and that required sections have content: build plans need `## Scope Decision`, `## Verification`, and `## Acceptance Criteria` filled; product plans need `## Problem`, `## Requirements`, and `## Success Criteria` filled in `PRODUCT_SPEC.md`. It also flags a task list that still contains only the scaffold placeholder (or no work specs at all) — `map build` would turn that into a single coarse item, so the fix names the granularity contract: one `### TASK-00n:` heading (or `- [ ]` line) per verifiable slice, typically 4-8, in execution order. Each warning is structured — `{"file", "section", "message", "fix"}` — and names the exact file to edit plus the re-run command, so a failed check is a repair instruction, not a riddle.
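To illustrate how an agent might consume these structured warnings, the sketch below renders one warning object as a single repair instruction. Only the four keys (`file`, `section`, `message`, `fix`) come from the text above; the `format_warning` helper and the sample values are hypothetical, and the surrounding response envelope is not modelled.

```python
from typing import Mapping

# The four keys named in the text above; everything else is illustrative.
WARNING_KEYS = ("file", "section", "message", "fix")


def format_warning(warning: Mapping[str, str]) -> str:
    """Render one structured warning as a single repair instruction."""
    missing = [key for key in WARNING_KEYS if key not in warning]
    if missing:
        raise ValueError(f"warning is missing keys: {missing}")
    return (
        f"{warning['file']} [{warning['section']}]: "
        f"{warning['message']} -> {warning['fix']}"
    )


# Hypothetical sample; real messages and fix strings come from `plan check`.
sample = {
    "file": "plans/example.md",
    "section": "## Verification",
    "message": "section is empty",
    "fix": "fill the section, then re-run the check",
}
print(format_warning(sample))
```

Rejecting a warning with missing keys keeps a malformed payload from being silently turned into a half-empty instruction.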
81
81
 
82
82
  `plan audit <plan-id>` is the one-call contract verdict for a plan's map scope. It evaluates four clauses with evidence: `items_settled` (open items listed), `reviews_complete` (open review items listed), `approvals_clear` (requested/denied approvals listed), and `verification_logged` (logs with `--kind verification` on scope items). The stored goal contract (`planr context --tag goal-contract` mentioning the plan id) is included; the verification clause is binding only when such a contract exists. `holds: true` means the contract is satisfied — loop agents use this as their stop condition instead of stitching the verdict together from `map status`, `log list`, and `approval list`. Also available as MCP `planr_plan_audit`.
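The clause logic described above can be sketched as a conjunction in which `verification_logged` only counts when a goal contract is stored. This is a minimal reading of the paragraph, not planr's implementation: it assumes each clause reduces to a boolean, and the `contract_holds` helper and its arguments are hypothetical.

```python
from typing import Mapping

# Clauses that always bind, per the paragraph above.
ALWAYS_BINDING = ("items_settled", "reviews_complete", "approvals_clear")


def contract_holds(clauses: Mapping[str, bool], has_goal_contract: bool) -> bool:
    """Return True when every binding clause is satisfied.

    Assumption: `verification_logged` is checked only when a stored
    goal contract exists for the plan.
    """
    binding = list(ALWAYS_BINDING)
    if has_goal_contract:
        binding.append("verification_logged")
    return all(clauses[name] for name in binding)


# Hypothetical clause results for one plan.
clauses = {
    "items_settled": True,
    "reviews_complete": True,
    "approvals_clear": True,
    "verification_logged": False,
}
print(contract_holds(clauses, has_goal_contract=False))
print(contract_holds(clauses, has_goal_contract=True))
```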
83
83
 
@@ -103,9 +103,9 @@ With `--json`, responses follow one convention so agents never guess where data
103
103
  }
104
104
  ```
105
105
 
106
- `review request` (and `done --review`) moves a picked or running target to `in_review`: work is finished, evidence is logged, the item waits on its gate. The owner keeps the pick and can still log evidence; `in_review` items are never handed out by `pick`.
106
+ `review request` (and `done --review`) moves a picked or running target to `in_review`: work is finished, evidence is logged, the item waits on its gate. The owner keeps the pick and can still log evidence; `in_review` items are never handed out by `pick`. Reviews are gates, so pre-attaching one to a pending or blocked item is legal (the target keeps its status and the gate holds the close later); requesting a review on a settled item fails with `invalid_transition`. `done` on a ready item that was never picked adopts it first — the lease is written retroactively under the current worker, so the completion always carries a maker identity and the `in_review` transition is never skipped. The `done --review` response names the target's resulting status and a plan-scoped reviewer pick command (`next: planr pick --plan <id> --work-type review --json`).
107
107
 
108
- `review close` writes `.planr/reviews/<review-item-id>.review.md` and registers it as a review artifact. A `not-complete` or `unclear` verdict creates fix and follow-up review work; the follow-up review gates the same target item, so the chain keeps working with `--close-target`. With `--close-target` (complete verdicts only) the reviewed item is closed in the same command, provided it already has a completion log; the artifact is rendered after the target transition, so it snapshots the final target status. `--close-target` is also available through MCP `planr_review_close` and HTTP `POST /v1/reviews/{id}/close` (`"close_target": true`). `review close` responses include the same `remaining` progress snapshot as `done` and `close`. `--reviewer <id>` records the checker's identity on the review log, artifact, and event (defaults to the worker id), keeping maker and checker distinguishable in the audit trail. Closing an already-settled review fails with error code `already_closed` instead of silently duplicating evidence logs. The maker/checker split is derived, not declared: `review_mode` compares the closing reviewer identity against the target item's lease holder and reports `single_agent`, `independent`, or `unattributed` in the response, review log, artifact, and event — no ceremony note required.
108
+ `review close` writes `.planr/reviews/<review-item-id>.review.md` and registers it as a review artifact. A `not-complete` or `unclear` verdict creates fix and follow-up review work; the follow-up review gates the same target item, so the chain keeps working with `--close-target`. With `--close-target` (complete verdicts only) the reviewed item is closed in the same command, provided it already has a completion log; the artifact is rendered after the target transition, so it snapshots the final target status. `--close-target` is also available through MCP `planr_review_close` and HTTP `POST /v1/reviews/{id}/close` (`"close_target": true`). `review close` responses include the same `remaining` progress snapshot as `done` and `close`. `--reviewer <id>` records the checker's identity on the review log, artifact, and event (defaults to the worker id), keeping maker and checker distinguishable in the audit trail. Closing an already-settled review fails with error code `already_closed` instead of silently duplicating evidence logs. The maker/checker split is derived, not declared: `review_mode` compares the closing reviewer identity against the target item's lease holder and reports `single_agent`, `independent`, or `unattributed` in the response, review log, artifact, and event — no ceremony note required. An `unattributed` close explains itself in the output: it means the target has no recorded lease (work was never picked or its lease was released), not that the reviewer's identity was missing.
109
109
 
110
110
  `trace item` on a review item inlines the target item and its evidence logs under `target`, so a reviewer's first trace already contains what is being audited. The human (non-JSON) mode renders the packet: status, owner, links, logs.
111
111
 
@@ -115,6 +115,8 @@ With `--json`, responses follow one convention so agents never guess where data
115
115
 
116
116
  `pick --work-type <type>` restricts the lease to one work type, so checker agents pick only `review` items and makers only work items. `pick --plan <plan-id>` restricts the lease to one plan's items, so plan-scoped goal runs never pick work outside their contract even when other plans share the board; an unknown plan id is an error, never a silent unscoped pick. Both filters are available on MCP `planr_pick_item` and HTTP `POST /v1/pick` (`work_type`, `plan`). A null pick is never blind: `{"item": null}` carries a `reason` (`empty_map`, `all_settled`, `nothing_ready`, `ready_items_excluded_by_filter`) and the `remaining` snapshot. When ready work exists but the active filters rejected all of it, `excluded` lists each ready item with the cause (`work_type` mismatch, outside the `--plan` scope, or just requested by this worker) and `repair` carries the exact pick commands that would lease that work — across CLI, MCP, and HTTP. On a review item, `close_effect` previews the full `--close-target` cascade: it lists the work that closing the review (and with it the reviewed item) would unlock.
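A loop that reads the pick response might branch on `reason` along the lines below. The four reason strings come from the text above; the mapping of reasons to actions (`stop`, `wait`, `repair`) is an assumption made for illustration, and `classify_pick` is a hypothetical helper.

```python
from typing import Any, Mapping


def classify_pick(response: Mapping[str, Any]) -> str:
    """Map a pick response to a coarse next action for a worker loop."""
    if response.get("item") is not None:
        return "work"
    reason = response.get("reason")
    if reason in ("empty_map", "all_settled"):
        # Nothing left to lease at all (assumed to mean the loop can stop).
        return "stop"
    if reason == "nothing_ready":
        # Work exists but is not ready yet (assumed to mean retry later).
        return "wait"
    if reason == "ready_items_excluded_by_filter":
        # Ready work exists outside the active filters; the response's
        # `repair` field carries the pick commands that would lease it.
        return "repair"
    raise ValueError(f"unexpected null-pick reason: {reason!r}")


print(classify_pick({"item": None, "reason": "ready_items_excluded_by_filter"}))
```

Raising on an unknown reason keeps the loop from treating a new or misspelled value as a silent stop.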
117
117
 
118
+ `artifact add` infers the mime type from the file extension when `--path` is given without `--mime` (PNG screenshots land as `image/png`, not `text/plain`); inline `--content` defaults to `text/plain`. The same inference applies on MCP `planr_artifact_add` and HTTP `POST /v1/artifacts`.
119
+
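The inference rule can be approximated with Python's standard `mimetypes` module. This is a sketch of the described behaviour, not planr's code: the `infer_mime` helper is hypothetical, and falling back to `text/plain` for an unrecognised extension is an assumption the text does not confirm.

```python
import mimetypes
from typing import Optional


def infer_mime(path: Optional[str] = None, mime: Optional[str] = None) -> str:
    """Pick a MIME type: explicit value first, then the file extension."""
    if mime:
        return mime  # an explicit --mime always wins
    if path:
        guessed, _encoding = mimetypes.guess_type(path)
        if guessed:
            return guessed
    # Inline content, or (assumed) an extension that cannot be resolved.
    return "text/plain"


print(infer_mime(path="screenshot.png"))
print(infer_mime())
```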
118
120
  `review evidence` reports Git worktree status scoped to files named by item logs or artifacts. Dirty files without item provenance are listed as unrelated and are not treated as agent-owned evidence. `--pr-url` records an item-scoped PR reference before returning the evidence package.
119
121
 
120
122
  `recover sweep` previews by default. With `--apply`, timed-out picked work that has a retry budget (`max_retries > 0`) is marked `failed` with an `item_timed_out` event; stale work and timeouts without a retry budget are released back to `ready`. Failed work re-enters `ready` once its retry delay has elapsed (`retry_delay_ms`, doubled per retry under `exponential` backoff) until the budget is exhausted. Every transition records a recovery event. Item pre/post conditions are visible in pick context, trace output, and close previews; post conditions are reported as manual verification gates instead of being guessed automatically.
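The retry timing can be worked through with a small sketch. It assumes the first retry waits exactly `retry_delay_ms` and each later retry doubles the previous delay; it also assumes any non-`exponential` mode means a fixed delay, which the text does not state. The `retry_delay` helper is hypothetical.

```python
def retry_delay(retry_delay_ms: int, retries_so_far: int,
                backoff: str = "exponential") -> int:
    """Delay in milliseconds before failed work re-enters `ready`."""
    if retries_so_far < 0:
        raise ValueError("retries_so_far must be non-negative")
    if backoff == "exponential":
        # Doubled per retry: base, 2x base, 4x base, ...
        return retry_delay_ms * 2 ** retries_so_far
    # Assumption: any other mode uses the base delay unchanged.
    return retry_delay_ms


for attempt in range(4):
    print(attempt, retry_delay(1000, attempt))
```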
package/docs/GOALS.md CHANGED
@@ -65,7 +65,7 @@ planr review close <review-id> --verdict complete --reviewer <id> --close-target
65
65
 
66
66
  `--plan` keeps the lease inside the goal contract: when several plans share the board (a parallel feature, leftovers from an aborted prep run), a plan-scoped goal run never picks work outside its own plan. A pick that finds nothing in scope never widens silently: it reports `reason: "nothing_ready"` when nothing is ready at all, or `reason: "ready_items_excluded_by_filter"` with the excluded items, the cause per item, and the exact `repair` pick commands when ready work exists outside the filter.
67
67
 
68
- `done`/`close`/`review close` responses and the pick packet include a `remaining` snapshot (`counts` with explicit zeros for every status, `settled`, `total`), so the orchestrator evaluates the stop condition straight from the completion output — no extra `map status` round-trip. The same responses list what each settlement `unlocked`, so the loop sees its next work without re-reading the map. `--next` never hands a worker its own freshly created review, so maker and checker stay separate even in compact loops. The review verdict records `review_mode` (`single_agent` or `independent`) automatically from worker identity — no ceremony note needed.
68
+ `done`/`close`/`review close` responses and the pick packet include a `remaining` snapshot (`counts` with explicit zeros for every status, `settled`, `total`), so the orchestrator evaluates the stop condition straight from the completion output — no extra `map status` round-trip. The same responses list what each settlement `unlocked`, so the loop sees its next work without re-reading the map. `--next` never hands a worker its own freshly created review, so maker and checker stay separate even in compact loops. The review verdict records `review_mode` (`single_agent` or `independent`) automatically from worker identity — no ceremony note needed. The contract's "all reviews closed" clause audits review items that exist; an item closed with plain `done` satisfies the contract without a review gate, so low-signal reviews can be skipped without blocking `plan audit`.
69
69
 
70
70
  ### 3. Finish
71
71
 
@@ -126,7 +126,7 @@ planr review close <review-id> \
126
126
 
127
127
  The target item may close only when required review items are closed.
128
128
 
129
- Every `review close` records a derived `review_mode`: the closing reviewer identity is compared against the target item's lease holder and recorded as `single_agent` (same identity), `independent` (different identity), or `unattributed` (no recorded maker). The mode lands in the close response, review log, artifact, and event — independence is proven by recorded identity, not declared by a note.
129
+ Every `review close` records a derived `review_mode`: the closing reviewer identity is compared against the target item's lease holder and recorded as `single_agent` (same identity), `independent` (different identity), or `unattributed` (no recorded maker). The mode lands in the close response, review log, artifact, and event — independence is proven by recorded identity, not declared by a note. `unattributed` should be rare: `done` adopts a never-picked ready item (the lease is written retroactively under the current worker), so every completion path records a maker. When it does appear, the close output explains that the target carried no lease.
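The comparison behind `review_mode` can be sketched in a few lines. The three mode names and the same/different/no-maker rule come from the paragraph above; the `derive_review_mode` function and its argument names are hypothetical.

```python
from typing import Optional


def derive_review_mode(reviewer_id: str, lease_holder: Optional[str]) -> str:
    """Compare the closing reviewer with the target item's lease holder."""
    if not lease_holder:
        # The target carried no lease, so there is no recorded maker.
        return "unattributed"
    if reviewer_id == lease_holder:
        return "single_agent"
    return "independent"


print(derive_review_mode("worker-a", "worker-a"))
print(derive_review_mode("worker-b", "worker-a"))
print(derive_review_mode("worker-b", None))
```

Note that `unattributed` depends only on the missing lease, never on the reviewer's own identity.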
130
130
 
131
131
  ## Evidence
132
132
 
Binary file (4 entries)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "planr",
3
- "version": "1.1.16",
3
+ "version": "1.1.18",
4
4
  "description": "Local-first planning and execution coordination for coding agents.",
5
5
  "license": "MIT",
6
6
  "repository": {
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "planr",
3
3
  "description": "Skill-driven planning and execution loop for coding agents: one planr entry point, an autonomous planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
4
- "version": "1.1.16",
4
+ "version": "1.1.18",
5
5
  "author": {
6
6
  "name": "instructa"
7
7
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "planr",
3
- "version": "1.1.16",
3
+ "version": "1.1.18",
4
4
  "description": "Skill-driven planning and execution loop for coding agents: one $planr entry point, an autonomous $planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
5
5
  "author": {
6
6
  "name": "instructa",
@@ -34,6 +34,8 @@ planr map build --from <plan-id> # idempotent: safe to re-run
34
34
 
35
35
  `plan refine` appends notes; the plan body is yours to edit. When `plan check` fails, each warning names the exact file and section — edit that file directly, fill the section with real content, and re-run the check. Scaffold sections (`## Scope Decision`, `## Verification`, `## Acceptance Criteria`) are filled by editing the plan markdown, not by more `refine` notes.
36
36
 
37
+ Before `map build`, expand the plan's task list: the scaffold ships a single placeholder task, and mapping it produces one coarse item that forces the worker to guess the breakdown later. Replace the placeholder with one `### TASK-00n: <slice>` heading (or `- [ ]` line) per verifiable slice — typically 4-8, in execution order, each one closeable with its own evidence. Derive the slices from the acceptance criteria; `plan check` flags the unexpanded placeholder.
38
+
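A quick pre-flight count of task slices could look like the sketch below. It only counts `### TASK-...:` headings and `- [ ]` lines; the regular expressions are one reading of the patterns named above, the `count_task_slices` and `looks_unexpanded` helpers are hypothetical, and the scaffold's actual placeholder text is not modelled.

```python
import re

# One reading of the two slice markers named in the text above.
TASK_HEADING = re.compile(r"^### TASK-\d+:", re.MULTILINE)
CHECKBOX = re.compile(r"^- \[ \]", re.MULTILINE)


def count_task_slices(plan_markdown: str) -> int:
    """Count task headings plus unchecked checkbox lines."""
    return (len(TASK_HEADING.findall(plan_markdown))
            + len(CHECKBOX.findall(plan_markdown)))


def looks_unexpanded(plan_markdown: str) -> bool:
    """Heuristic: zero or one slice suggests the list was never expanded."""
    return count_task_slices(plan_markdown) <= 1


# Hypothetical plan excerpt.
plan = """## Tasks

### TASK-001: Parse input
### TASK-002: Store results
### TASK-003: Render report
### TASK-004: Verify end to end
"""
print(count_task_slices(plan), looks_unexpanded(plan))
```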
37
39
  `map build` creates one item per plan step and chains them in plan order with `blocks` links; the output lists the created items and links. Review that chain and adjust it only where execution order differs from document order:
38
40
 
39
41
  ```bash
@@ -50,6 +52,8 @@ planr context add "GOAL CONTRACT <plan-id>: DONE when every in-scope map item is
50
52
 
51
53
  One contract per plan scope. Any agent on any host can recover it with `planr context list` or `planr search "GOAL CONTRACT"`. Never weaken a stored contract mid-run; scope changes go through `$planr-plan` and the user. During the run, workers lease with `planr pick --plan <plan-id>` so the loop never picks items outside this contract, even when other plans share the board. The loop checks the contract with `planr plan audit <plan-id>`, which evaluates exactly these clauses with evidence and answers `holds: true/false`.
52
54
 
55
+ "All reviews closed" audits review items that exist — it does not require a review gate on every item. An item closed with plain `done` (evidence still required) satisfies the contract without one; request reviews where they carry signal (implementation slices, user-facing work), not on trivial inspection or scaffold steps.
56
+
53
57
  ## Hand Off
54
58
 
55
59
  Print the starter command, then stop. Do not start execution yourself; ask whether to start now, refine the plan, or stop.
@@ -52,7 +52,7 @@ The short path per item is three commands: `planr pick --json` (one flat work pa
52
52
 
53
53
  `map build` chains created items in plan order with `blocks` links automatically and prints the created items and links. In step 2, verify that chain against real execution-order dependencies and adjust with `planr link add` only where document order and execution order differ. `item breakdown` works the same way: pass one `--into` per child title (or one value with newline-separated titles), and the output lists the chained children plus the next command.
54
54
 
55
- Request reviews where they carry signal: implementation slices and anything user-facing finish with `done --review`. Trivial inspection, baseline, or setup items close with plain `done` (evidence still required) — a review that can only confirm "the repo was empty" adds ceremony, not safety.
55
+ Request reviews where they carry signal: implementation slices and anything user-facing finish with `done --review`. Trivial inspection, baseline, or setup items close with plain `done` (evidence still required) — a review that can only confirm "the repo was empty" adds ceremony, not safety. The goal contract's "all reviews closed" clause audits review items that exist; plain-`done` items satisfy it without a review gate, so skipping low-signal reviews never blocks `plan audit`. In a single-agent host this bar rises: a review you close yourself mostly re-runs your own commands, so reserve gates for the riskiest slices — the core implementation and the final live verification — and close the rest with plain `done`.
56
56
 
57
57
  The loop never closes its own reviews when the host supports a second agent. Maker and checker stay separate. One agent instance keeps one `PLANR_WORKER_ID` for the whole session — never export a second identity inside the same instance to make reviews look `independent`; an honest `single_agent` stamp beats a fake `independent` one.
58
58
 
@@ -22,7 +22,7 @@ The pick output is one flat work packet — item, links, logs, runtime, recovery
22
22
  planr done <item-id> --summary "what changed" --files path-a --files path-b --cmd "exact verification command" --tests "exact test command" --review
23
23
  ```
24
24
 
25
- Put build/serve commands in `--cmd` and test runs in `--tests` — both are recorded as evidence. `done --review` writes the completion log, requests the review, and moves the item to `in_review` (you keep ownership; it is waiting on the gate, not abandoned); add `--next` to pick the following item in the same call. Without `--review` it closes the item directly (only for items that need no review gate). The response reports what your settlement `unlocked`, echoes the item's post condition, and hints when downstream work depends on an item closed without command/test evidence.
25
+ Put build/serve commands in `--cmd` and test runs in `--tests` — both are recorded as evidence. Include the decisive output line in `--summary` (e.g. "12 tests passed", "GET /videos returned 3 entries"): reviewers see your recorded command strings, not your terminal, so the summary must carry what you observed, not just what you ran. Single-quote `--files` values that contain `$` (route files like `watch.$videoId.tsx`), or the shell expands them before planr sees them. `done --review` writes the completion log, requests the review, and moves the item to `in_review` (you keep ownership; it is waiting on the gate, not abandoned) — the response names the target's new status and the plan-scoped reviewer pick command; add `--next` to pick the following item in the same call. Without `--review` it closes the item directly (only for items that need no review gate). Running `done` on a ready item you never picked adopts it: the lease is written retroactively under your worker id so the review always has a maker. The response reports what your settlement `unlocked`, echoes the item's post condition, and hints when downstream work depends on an item closed without command/test evidence.
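When the `done` command is assembled by a script rather than typed, the quoting advice above can be handled with Python's `shlex` module, which single-quotes any argument containing `$` or spaces. The flags are the ones shown in the command above; the `build_done_command` helper, the item id, and the sample values are hypothetical.

```python
import shlex
from typing import Sequence


def build_done_command(item_id: str, summary: str, files: Sequence[str],
                       cmd: str, tests: str, review: bool = True) -> str:
    """Assemble a shell-safe `planr done` command line."""
    argv = ["planr", "done", item_id, "--summary", summary]
    for path in files:
        argv += ["--files", path]
    argv += ["--cmd", cmd, "--tests", tests]
    if review:
        argv.append("--review")
    # shlex.join quotes each argument so the shell passes it through intact.
    return shlex.join(argv)


command = build_done_command(
    "ITEM-1",                              # hypothetical item id
    "12 tests passed",                     # decisive output line
    ["app/routes/watch.$videoId.tsx"],     # `$` must not be expanded
    "npm run build",
    "npm test",
)
print(command)
```

Quoting every argument mechanically avoids having to remember which paths contain shell metacharacters.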
26
26
 
27
27
  Live verification (browser flow, executed binary, real requests) gets its own log kind so `plan audit` can find it:
28
28