planr 1.1.17 → 1.1.19

@@ -11,6 +11,25 @@ The Planr repository is a Claude Code plugin. Install it to get the skills (name
 
  See [Skills](SKILLS.md) for the skill workflow. For autonomous goal runs with `/goal` or `/loop` on top of Planr state, see [Long-Running Goals](GOALS.md).
 
+ ## Long-Running Goals With `/goal`
+
+ Claude Code `/goal` drives autonomous Planr runs the same way Codex does: `/goal` supplies continuation pressure, Planr supplies durable state, evidence, reviews, and recovery. Run the driver session on your strongest model (`/model fable`, `/effort high`), prep once, then start:
+
+ ```text
+ /planr:planr-goal <your goal>
+ /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
+ ```
+
+ The plugin registers the `planr-worker` and `planr-reviewer` subagents automatically. The worker pins a cheaper tier in its frontmatter; the reviewer deliberately inherits the driver's model:
+
+ ```yaml
+ # planr-worker.md frontmatter
+ model: opus # alias tracks the current generation; budget alternative: sonnet
+ effort: medium
+ ```
+
+ Verify the pin once: `CLAUDE_CODE_SUBAGENT_MODEL` must be unset (it silently overrides all subagent frontmatter), then dispatch the worker on a trivial item and confirm the subagent's messages in `~/.claude/projects/<project>/*.jsonl` carry the worker model. Full workflow, recovery, and the tiering rationale: [Long-Running Goals](GOALS.md).
+
  ## MCP
 
  ```bash
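You can check the frontmatter pin and the `CLAUDE_CODE_SUBAGENT_MODEL` trap statically before a run. The following is a minimal Python sketch and is not something Planr ships. `read_frontmatter` and `check_worker_pin` are hypothetical helpers. The parser assumes a leading `---` block of flat `key: value` lines and skips comments and list entries rather than parsing full YAML.

```python
import os


def read_frontmatter(text: str) -> dict:
    """Return flat `key: value` pairs from a leading `---` frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter ends the frontmatter
        if not stripped or stripped.startswith(("#", "-")):
            continue  # blank lines, comments, and list entries are not flat keys
        key, sep, value = stripped.partition(":")
        if sep:
            # Drop an inline comment such as `opus # alias ...`.
            fields[key.strip()] = value.split(" #", 1)[0].strip()
    return fields


def check_worker_pin(worker_md: str, reviewer_md: str) -> list:
    """List setup problems that would make the worker silently run at driver prices."""
    problems = []
    if os.environ.get("CLAUDE_CODE_SUBAGENT_MODEL"):
        problems.append("CLAUDE_CODE_SUBAGENT_MODEL is set and overrides subagent frontmatter")
    if not read_frontmatter(worker_md).get("model"):
        problems.append("worker frontmatter has no model pin")
    if read_frontmatter(reviewer_md).get("model"):
        problems.append("reviewer pins a model instead of inheriting the driver's")
    return problems
```

A clean result only proves the files are set up correctly. To confirm the pin took effect at runtime, you still need to check the worker model in the session log.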
@@ -13,7 +13,7 @@ planr plan check <plan-id>
  planr plan audit <plan-id>
  planr plan show <plan-id>
  planr plan archive <plan-id>
- planr map show
+ planr map show [--plan <plan-id>]
  planr map build --from <plan-id>
  planr map lane --critical
  planr map pressure
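`map show [--plan <plan-id>]` scopes the map to one plan using three rules stated in the CLI notes. It keeps the plan's items, keeps only the links among them, and rejects an unknown plan id. Below is a sketch of those rules over a hypothetical in-memory map. The `plan`, `status`, `from`, and `to` field names are assumptions and do not come from Planr's JSON schema.

```python
def scope_map(plans: set, items: list, links: list, plan_id: str) -> dict:
    """Narrow a map to one plan's items, the links among them, and plan-scoped counts."""
    if plan_id not in plans:
        # Unknown plan id: fail loudly instead of returning the unscoped board.
        raise ValueError(f"unknown plan id: {plan_id}")
    scoped_items = [item for item in items if item["plan"] == plan_id]
    ids = {item["id"] for item in scoped_items}
    # Keep only links whose ends are both in scope; cross-plan links are dropped.
    scoped_links = [link for link in links if link["from"] in ids and link["to"] in ids]
    counts = {}
    for item in scoped_items:
        counts[item["status"]] = counts.get(item["status"], 0) + 1
    return {"items": scoped_items, "links": scoped_links, "counts": counts}
```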
@@ -54,6 +54,7 @@ planr review close <review-item-id> --verdict complete|not-complete|unclear [--c
  planr close [item-id] --summary "..." [--next]
  planr done [item-id] --summary "..." [--files a --files b] [--cmd "..."] [--tests "..."] [--review] [--next]
  planr context add "text" [--item <item-id>] [--tag discovery]
+ planr context list [--item <item-id>] [--tag <tag>]
  planr search "query"
  planr doctor [--client codex|claude|cursor|all]
  planr install codex|claude|cursor [--dry-run]
@@ -77,9 +78,13 @@ With `--json`, responses follow one convention so agents never guess where data
  - Other single objects use their semantic key: `plan`, `log`, `review`, `artifact`, `context`.
  - Optional guidance appears under `hint` or `next` when a follow-up command is the expected move.
 
- `plan check` validates path, YAML frontmatter, and that required sections have content: build plans need `## Scope Decision`, `## Verification`, and `## Acceptance Criteria` filled; product plans need `## Problem`, `## Requirements`, and `## Success Criteria` filled in `PRODUCT_SPEC.md`. Each warning is structured — `{"file", "section", "message", "fix"}` — and names the exact file to edit plus the re-run command, so a failed check is a repair instruction, not a riddle.
+ `plan check` validates path, YAML frontmatter, and that required sections have content: build plans need `## Scope Decision`, `## Verification`, and `## Acceptance Criteria` filled; product plans need `## Problem`, `## Requirements`, and `## Success Criteria` filled in `PRODUCT_SPEC.md`. It also flags a task list that still contains only the scaffold placeholder (or no work specs at all) — `map build` would turn that into a single coarse item, so the fix names the granularity contract: one `### TASK-00n:` heading (or `- [ ]` line) per verifiable slice, typically 4-8, in execution order. Each warning is structured — `{"file", "section", "message", "fix"}` — and names the exact file to edit plus the re-run command, so a failed check is a repair instruction, not a riddle.
 
- `plan audit <plan-id>` is the one-call contract verdict for a plan's map scope. It evaluates four clauses with evidence: `items_settled` (open items listed), `reviews_complete` (open review items listed), `approvals_clear` (requested/denied approvals listed), and `verification_logged` (logs with `--kind verification` on scope items). The stored goal contract (`planr context --tag goal-contract` mentioning the plan id) is included; the verification clause is binding only when such a contract exists. `holds: true` means the contract is satisfied — loop agents use this as their stop condition instead of stitching the verdict together from `map status`, `log list`, and `approval list`. Also available as MCP `planr_plan_audit`.
+ `plan audit <plan-id>` is the one-call contract verdict for a plan's map scope. It evaluates four clauses with evidence: `items_settled` (open items listed), `reviews_complete` (open review items listed), `approvals_clear` (requested/denied approvals listed), and `verification_logged` (logs with `--kind verification` on scope items). The stored goal contract (`planr context --tag goal-contract` mentioning the plan id) is included; the verification clause is binding only when such a contract exists. `holds: true` means the contract is satisfied — loop agents use this as their stop condition instead of stitching the verdict together from `map status`, `log list`, and `approval list`. When the contract is open, `next` carries the exact next command derived from the first actionable gap: build the map, pick the ready review or work item (plan-scoped), resolve the blocking approval, inspect stalled leases, or log the missing verification. Also available as MCP `planr_plan_audit`.
+
+ `map show --plan <plan-id>` narrows the map to one plan's items and the links among them, with plan-scoped counts — plan-scoped goal runs on shared boards audit their slice, not the whole board. An unknown plan id is an error, never a silent unscoped view. Also available on MCP `planr_map_show` (`plan`) and HTTP `GET /v1/projects/{id}/map?plan=<plan-id>`.
+
+ `context list --tag <tag>` filters notes by the tag they were stored with (`context add --tag`), so e.g. the goal contract is recoverable with `planr context list --tag goal-contract` instead of scanning all notes.
 
  `map build` chains the created items in plan order with `blocks` links — build plan steps are ordered, so the map inherits that order instead of leaving everything flat. The output lists every created item with its status, the created links, and the next command; adjust order with `planr link add` before picking if execution order differs from document order.
 
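`plan audit` puts the verdict in `holds` and the follow-up in `next`, so a loop iteration's decision stays small. The sketch below assumes `holds` and `next` are top-level keys of the parsed `--json` output. The JSON convention above suggests they may sit under a semantic key, in which case adjust the lookup. It also treats `next` as an opaque value to act on. `run_audit` and `decide` are hypothetical names. Whether the command exits non-zero on an open contract is not stated, so the sketch does not raise on the exit code.

```python
import json
import subprocess
from typing import Optional, Tuple


def run_audit(plan_id: str) -> dict:
    """Run `planr plan audit <plan-id> --json` and parse its output."""
    proc = subprocess.run(
        ["planr", "plan", "audit", plan_id, "--json"],
        check=False,  # exit-code semantics on an open contract are not documented here
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def decide(audit: dict) -> Tuple[str, Optional[object]]:
    """Map an audit verdict to the loop's next move (assumes top-level `holds`/`next`)."""
    if audit.get("holds") is True:
        return ("stop", None)  # contract satisfied: end the goal run
    if audit.get("next"):
        return ("run", audit["next"])  # follow the first actionable gap
    return ("report", None)  # open contract with no suggested command: surface it
```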
@@ -115,6 +120,8 @@ With `--json`, responses follow one convention so agents never guess where data
 
  `pick --work-type <type>` restricts the lease to one work type, so checker agents pick only `review` items and makers only work items. `pick --plan <plan-id>` restricts the lease to one plan's items, so plan-scoped goal runs never pick work outside their contract even when other plans share the board; an unknown plan id is an error, never a silent unscoped pick. Both filters are available on MCP `planr_pick_item` and HTTP `POST /v1/pick` (`work_type`, `plan`). A null pick is never blind: `{"item": null}` carries a `reason` (`empty_map`, `all_settled`, `nothing_ready`, `ready_items_excluded_by_filter`) and the `remaining` snapshot. When ready work exists but the active filters rejected all of it, `excluded` lists each ready item with the cause (`work_type` mismatch, outside the `--plan` scope, or just requested by this worker) and `repair` carries the exact pick commands that would lease that work — across CLI, MCP, and HTTP. On a review item, `close_effect` previews the full `--close-target` cascade: it lists the work that closing the review (and with it the reviewed item) would unlock.
 
+ `artifact add` infers the mime type from the file extension when `--path` is given without `--mime` (PNG screenshots land as `image/png`, not `text/plain`); inline `--content` defaults to `text/plain`. The same inference applies on MCP `planr_artifact_add` and HTTP `POST /v1/artifacts`.
+
  `review evidence` reports Git worktree status scoped to files named by item logs or artifacts. Dirty files without item provenance are listed as unrelated and are not treated as agent-owned evidence. `--pr-url` records an item-scoped PR reference before returning the evidence package.
 
  `recover sweep` previews by default. With `--apply`, timed-out picked work that has a retry budget (`max_retries > 0`) is marked `failed` with an `item_timed_out` event; stale work and timeouts without a retry budget are released back to `ready`. Failed work re-enters `ready` once its retry delay has elapsed (`retry_delay_ms`, doubled per retry under `exponential` backoff) until the budget is exhausted. Every transition records a recovery event. Item pre/post conditions are visible in pick context, trace output, and close previews; post conditions are reported as manual verification gates instead of being guessed automatically.
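The retry arithmetic in `recover sweep` is easiest to see with numbers. The sketch below makes three assumptions: the first retry waits exactly `retry_delay_ms`, each later retry doubles it under `exponential` backoff, and any other backoff value keeps the delay constant. The function name and the 1-based `retry_number` are illustrative and are not Planr internals.

```python
from typing import Optional


def retry_delay_ms(base_ms: int, retry_number: int, max_retries: int,
                   backoff: str = "exponential") -> Optional[int]:
    """Delay before failed work re-enters `ready`; None when no retry applies.

    retry_number counts from 1 for the first retry.
    """
    if max_retries <= 0 or retry_number > max_retries:
        return None  # no budget, or budget exhausted
    if backoff == "exponential":
        return base_ms * 2 ** (retry_number - 1)  # doubled per retry
    return base_ms  # assumption: any other backoff keeps the delay constant
```

Take a 1000 ms base and `max_retries = 3`. Retries one to three wait 1000, 2000, and 4000 ms. A fourth failure finds the budget exhausted, so no further delay applies.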
package/docs/CODEX.md CHANGED
@@ -17,10 +17,20 @@ Codex `/goal` is the recommended orchestrator for autonomous Planr runs: `/goal`
 
  ```text
  $planr-goal <your goal>
- /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+ /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
  ```
 
- The stop condition lives in Planr (`--tag goal-contract`), so a dead session resumes with the same starter line from zero chat context. Full workflow, recovery, and per-host variants: [Long-Running Goals](GOALS.md).
+ The stop condition lives in Planr (`--tag goal-contract`), so a dead session resumes with the same starter line from zero chat context.
+
+ Run the driver session on your strongest tier (e.g. `gpt-5.5` at `model_reasoning_effort = "high"` in `~/.codex/config.toml`). The provisioned worker role pins a cheaper tier; the reviewer deliberately inherits the session model:
+
+ ```toml
+ # .codex/agents/planr-worker.toml
+ model = "gpt-5.5"
+ model_reasoning_effort = "medium"
+ ```
+
+ Verify the pin once: some Codex versions ignore custom agent files on spawn ([openai/codex#26868](https://github.com/openai/codex/issues/26868)) and the child silently inherits the parent model. Spawn `planr_worker` on a trivial item and confirm the child metadata shows the pinned model and effort with a non-null `agent_path`. Full workflow, recovery, per-host variants, and the tiering rationale: [Long-Running Goals](GOALS.md).
 
  ## MCP
 
package/docs/GOALS.md CHANGED
@@ -39,9 +39,11 @@ planr context add "GOAL CONTRACT pl-csv-export: DONE when every in-scope map ite
  ### 2. Execute — the loop driver runs `$planr-loop`
 
  ```text
- /goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+ /goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
  ```
 
+ The autonomy clause matters on long runs: deep into a session, frontier models occasionally end a turn with a statement of intent instead of the corresponding action, or pause to ask permission they already have. Stating the operating mode up front prevents both.
+
  Each iteration follows the `$planr-loop` protocol:
 
  ```text
@@ -100,7 +102,7 @@ $planr-goal <your goal> # prep: plan, map, contract, starter command
  /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
  ```
 
- The `/goal` PM dispatches `spawn the planr_worker agent for item <id>` and `spawn the planr_reviewer agent for item <id>` — the role files preload `$planr-work` and `$planr-review`, so dispatches stay one line. Codex Automations work the same way: set the automation prompt to the starter line.
+ The `/goal` PM dispatches `spawn the planr_worker agent for item <id>` and `spawn the planr_reviewer agent for item <id>` — the role files preload `$planr-work` and `$planr-review`, so dispatches stay one line. Codex Automations work the same way: set the automation prompt to the starter line. The provisioned worker role pins a cheaper effort tier; see [Cost Tiering](#cost-tiering).
 
  ### Claude Code
 
@@ -111,7 +113,7 @@ Same shape via the plugin (`/plugin install planr@planr`), which registers the `
  /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
  ```
 
- `/loop` works for fixed-cadence runs instead of goal-conditioned ones.
+ `/loop` works for fixed-cadence runs instead of goal-conditioned ones. The registered worker subagent pins a cheaper model tier; see [Cost Tiering](#cost-tiering).
 
  ### Cursor and hosts without a loop primitive
 
@@ -128,6 +130,31 @@ Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context
 
  Any MCP-capable agent uses the same flow over `planr mcp`. Every session starts with map state, so the loop is resumable by construction.
 
+ ## Cost Tiering
+
+ A goal run has three roles with different intelligence needs, so they should not all run on the same model tier:
+
+ - **Driver** (the `/goal` session): decomposition, dispatch decisions, conflict resolution, final synthesis. Run it on the strongest model you have — this is never configured in Planr files, it is simply the model of the main session.
+ - **Worker**: bounded implementation. `planr pick --json` is a complete handoff packet (one item, scope, evidence format, stop after review request), so the worker runs safely on a cheaper tier.
+ - **Reviewer**: the truth gate. It inherits the driver's model on purpose — make workers cheap, not the verdict.
+
+ Where each host configures the worker tier (the shipped role files carry these defaults):
+
+ | Host | Driver | Worker | Configured in |
+ | --- | --- | --- | --- |
+ | Codex | session default (e.g. `gpt-5.5` at `high`) | `model = "gpt-5.5"`, `model_reasoning_effort = "medium"` | `.codex/agents/planr-worker.toml` |
+ | Claude Code | session model (e.g. `fable` at `high` via `/model` + `/effort`) | `model: opus`, `effort: medium` | `planr-worker.md` frontmatter |
+ | Cursor | chat model of the driving session | chosen per dispatch in the host's subagent tooling | no Planr files — pick a cheaper model when dispatching the worker task |
+
+ The defaults use aliases and generic names so they track model generations; pin a full model id (e.g. `claude-opus-4-8`) only if you need determinism, and use `model: sonnet` as the budget alternative. The role files are user-owned copies — `planr project init` provisions them once and never overwrites local edits — so changing the tier is editing one line.
+
+ Two traps to verify once per setup:
+
+ - **Claude Code:** the `CLAUDE_CODE_SUBAGENT_MODEL` environment variable silently overrides every subagent's `model:` frontmatter. Make sure it is unset, then dispatch the worker on a trivial item and confirm the subagent's messages in the session log (`~/.claude/projects/<project>/*.jsonl`) carry the worker model, not the driver's.
+ - **Codex:** some versions ignore custom agent files on spawn ([openai/codex#26868](https://github.com/openai/codex/issues/26868)) — the child then inherits the parent model. Spawn `planr_worker` on a trivial item and confirm the child metadata shows the pinned model and effort with a non-null `agent_path`.
+
+ Both failure modes are silent (the run still works, just at driver prices), which is why the smoke test is worth the two minutes.
+
  ## Coming From Other Goal Tools
 
  If you already run goal workflows with other tools, the concepts map directly:
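You can script the log check for the Claude Code trap. This diff does not document Claude Code's transcript format, so the sketch below assumes each JSONL line is one JSON object and that assistant entries carry the model id at `message.model`. Inspect a line of your own log before relying on that path. The sketch matches the worker alias (for example `opus`) as a substring of the logged id. That is a heuristic, and it only separates tiers when the driver and worker model ids differ. Both function names are hypothetical.

```python
import json
import os
from collections import Counter


def models_in_log(jsonl_text: str) -> Counter:
    """Count model ids found at `message.model` across JSONL entries (assumed field path)."""
    seen = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partial line from a log that is still being written
        message = entry.get("message") if isinstance(entry, dict) else None
        if isinstance(message, dict) and message.get("model"):
            seen[message["model"]] += 1
    return seen


def worker_pin_observed(jsonl_text: str, worker_hint: str) -> bool:
    """True when the override is unset and some logged model id contains `worker_hint`."""
    if os.environ.get("CLAUDE_CODE_SUBAGENT_MODEL"):
        return False
    return any(worker_hint in model for model in models_in_log(jsonl_text))
```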
@@ -151,5 +178,6 @@ Using such tools for intake or visualization alongside Planr is fine — keep on
  - The maker never closes its own review; single-agent hosts record `review-mode` honestly.
  - Two iterations without map-state movement -> stop and report instead of grinding.
  - Destructive or out-of-repo side effects always go behind `planr approval request`.
+ - Lessons that should outlive the iteration (a confirmed approach, a correction, a dead end) go into `planr context add "..." --tag lesson` — the next iteration or a fresh session recovers them with `planr context list --tag lesson`, not from chat history.
 
  See also: [Skills](SKILLS.md), [Operating Model](OPERATING_MODEL.md), [Task Graph Model](TASK_GRAPH_MODEL.md), [Codex](CODEX.md), [Claude Code](CLAUDE_CODE.md), [Cursor](CURSOR.md).
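The stall rule (two iterations without map-state movement) needs a concrete definition of movement. This sketch uses the simplest one. It assumes the `remaining.counts` object from each iteration's last settlement stands in for map state, and it treats unchanged counts as no movement. A swap that leaves the counts equal would be missed. `stalled` is a hypothetical helper.

```python
def stalled(snapshots: list, window: int = 2) -> bool:
    """True when the last `window` iterations ended with identical map counts.

    `snapshots` holds one `remaining` object per finished iteration, oldest first,
    preceded by a baseline taken before the first iteration.
    """
    if len(snapshots) < window + 1:
        return False  # not enough history yet
    recent = [snap["counts"] for snap in snapshots[-(window + 1):]]
    return all(counts == recent[0] for counts in recent[1:])
```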
Binary file
Binary file
Binary file
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "planr",
- "version": "1.1.17",
+ "version": "1.1.19",
  "description": "Local-first planning and execution coordination for coding agents.",
  "license": "MIT",
  "repository": {
@@ -1,7 +1,7 @@
  {
  "name": "planr",
  "description": "Skill-driven planning and execution loop for coding agents: one planr entry point, an autonomous planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
- "version": "1.1.17",
+ "version": "1.1.19",
  "author": {
  "name": "instructa"
  },
@@ -1,6 +1,6 @@
  {
  "name": "planr",
- "version": "1.1.17",
+ "version": "1.1.19",
  "description": "Skill-driven planning and execution loop for coding agents: one $planr entry point, an autonomous $planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
  "author": {
  "name": "instructa",
@@ -3,6 +3,8 @@ name: planr-reviewer
  description: Independent findings-first reviewer for one Planr item. Audits evidence and closes the review with a verdict. Dispatch with the item id.
  skills:
  - planr-review
+ # Deliberately no model override: the reviewer is the truth gate and inherits
+ # the driver's model. Make workers cheap, not the verdict.
  ---
 
  Use the preloaded planr-review skill exactly as written for the single item id you are given.
@@ -3,6 +3,12 @@ name: planr-worker
  description: Implements exactly one picked Planr map item to evidence-backed completion, then requests review and stops. Dispatch with the item id.
  skills:
  - planr-work
+ # Cost tiering: the pick packet bounds the worker's scope, so it runs on a
+ # cheaper tier than the driver. Aliases track the current generation; pin a
+ # full model id (e.g. claude-opus-4-8) only if you need determinism. Budget
+ # alternative: model: sonnet. See docs/GOALS.md "Cost Tiering".
+ model: opus
+ effort: medium
  ---
 
  Use the preloaded planr-work skill exactly as written for the single item id you are given.
@@ -34,6 +34,8 @@ planr map build --from <plan-id> # idempotent: safe to re-run
 
  `plan refine` appends notes; the plan body is yours to edit. When `plan check` fails, each warning names the exact file and section — edit that file directly, fill the section with real content, and re-run the check. Scaffold sections (`## Scope Decision`, `## Verification`, `## Acceptance Criteria`) are filled by editing the plan markdown, not by more `refine` notes.
 
+ Before `map build`, expand the plan's task list: the scaffold ships a single placeholder task, and mapping it produces one coarse item that forces the worker to guess the breakdown later. Replace the placeholder with one `### TASK-00n: <slice>` heading (or `- [ ]` line) per verifiable slice — typically 4-8, in execution order, each one closeable with its own evidence. Derive the slices from the acceptance criteria; `plan check` flags the unexpanded placeholder.
+
  `map build` creates one item per plan step and chains them in plan order with `blocks` links; the output lists the created items and links. Review that chain and adjust it only where execution order differs from document order:
 
  ```bash
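You can pre-check the expansion rule locally before running `plan check`. This heuristic sketch is not Planr's implementation. It counts `### TASK-00n:` headings and `- [ ]` lines, warns when there are fewer than two, and reuses the documented `{file, section, message, fix}` warning shape. The `section` label is hypothetical. Checkbox acceptance criteria in the same file would also be counted.

```python
import re

# Task entries as described above: `### TASK-00n:` headings or `- [ ]` lines.
TASK_HEADING = re.compile(r"^### TASK-\d+:", re.MULTILINE)
CHECKBOX = re.compile(r"^[ \t]*- \[ \]", re.MULTILINE)


def count_task_slices(plan_md: str) -> int:
    """Count task entries; a plan mixing both forms (or checkbox criteria) is over-counted."""
    return len(TASK_HEADING.findall(plan_md)) + len(CHECKBOX.findall(plan_md))


def granularity_warnings(path: str, plan_md: str) -> list:
    """Warn, in the documented warning shape, when fewer than two slices exist."""
    slices = count_task_slices(plan_md)
    if slices >= 2:
        return []
    return [{
        "file": path,
        "section": "task list",  # hypothetical label; planr's own section name may differ
        "message": f"{slices} task slice(s) found; map build would create one coarse item",
        "fix": "add one ### TASK-00n: heading (or - [ ] line) per verifiable slice, "
               "typically 4-8, in execution order, then re-run plan check",
    }]
```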
@@ -48,7 +50,7 @@ The contract must survive compaction, session loss, and host switches, so it liv
  planr context add "GOAL CONTRACT <plan-id>: DONE when every in-scope map item is closed with log evidence, all reviews closed with verdict complete, no open approvals in scope, and a live verification log exists for <oracle>. Iteration budget: 10." --tag goal-contract
  ```
 
- One contract per plan scope. Any agent on any host can recover it with `planr context list` or `planr search "GOAL CONTRACT"`. Never weaken a stored contract mid-run; scope changes go through `$planr-plan` and the user. During the run, workers lease with `planr pick --plan <plan-id>` so the loop never picks items outside this contract, even when other plans share the board. The loop checks the contract with `planr plan audit <plan-id>`, which evaluates exactly these clauses with evidence and answers `holds: true/false`.
+ One contract per plan scope. Any agent on any host can recover it with `planr context list --tag goal-contract` or `planr search "GOAL CONTRACT"`. Never weaken a stored contract mid-run; scope changes go through `$planr-plan` and the user. During the run, workers lease with `planr pick --plan <plan-id>` so the loop never picks items outside this contract, even when other plans share the board. The loop checks the contract with `planr plan audit <plan-id>`, which evaluates exactly these clauses with evidence and answers `holds: true/false`.
 
  "All reviews closed" audits review items that exist — it does not require a review gate on every item. An item closed with plain `done` (evidence still required) satisfies the contract without one; request reviews where they carry signal (implementation slices, user-facing work), not on trivial inspection or scaffold steps.
 
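Recovering the contract is a lookup over tagged notes. The sketch below assumes the notes returned by `planr context list --tag goal-contract` have already been parsed into dicts with a `text` field; that field name is an assumption. It also assumes each contract starts with `GOAL CONTRACT <plan-id>:`, as in the template above. `find_contract` is hypothetical. It raises on duplicates because this skill allows only one contract per plan scope.

```python
import re
from typing import Optional


def find_contract(notes: list, plan_id: str) -> Optional[dict]:
    """Return the stored goal contract for one plan, or None if none was stored."""
    prefix = f"GOAL CONTRACT {plan_id}:"
    matches = [note for note in notes if note.get("text", "").startswith(prefix)]
    if not matches:
        return None  # no contract yet: store one before picking
    if len(matches) > 1:
        # One contract per plan scope: duplicates need a human, not a silent choice.
        raise ValueError(f"{len(matches)} goal contracts stored for {plan_id}")
    text = matches[0]["text"]
    budget = re.search(r"Iteration budget:\s*(\d+)", text)
    return {"text": text, "budget": int(budget.group(1)) if budget else None}
```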
@@ -59,13 +61,13 @@ Print the starter command, then stop. Do not start execution yourself; ask wheth
  With a native loop driver (Codex `/goal`, Claude Code `/goal` or `/loop`):
 
  ```text
- /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+ /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
  ```
 
  Without one (Cursor, plain MCP clients, Codex without /goal):
 
  ```text
- Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
+ Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). You are operating autonomously: never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
  ```
 
  Re-dispatch the same line after any session ends. The map, logs, and stored contract make every iteration resumable from zero context — nothing about the goal lives only in chat.
@@ -31,7 +31,7 @@ Store the contract in Planr so it survives compaction, session loss, and host sw
  planr context add "GOAL CONTRACT <plan-id>: DONE when ... Iteration budget: 10." --tag goal-contract
  ```
 
- `$planr-goal` does this during prep; if the loop starts without a stored contract, store it in iteration 1 before picking. Every iteration re-reads the contract from Planr (`planr context list` or `planr search "GOAL CONTRACT"`), never from chat history. `done`, `close`, and `review close` responses and the pick packet include a `remaining` progress snapshot (`counts` with explicit zeros for every status, `settled`, `total`) plus the list of items each settlement `unlocked`, so the orchestrator can evaluate the stop condition from the completion output without an extra `map status` call.
+ `$planr-goal` does this during prep; if the loop starts without a stored contract, store it in iteration 1 before picking. Every iteration re-reads the contract from Planr (`planr context list --tag goal-contract` or `planr search "GOAL CONTRACT"`), never from chat history. `done`, `close`, and `review close` responses and the pick packet include a `remaining` progress snapshot (`counts` with explicit zeros for every status, `settled`, `total`) plus the list of items each settlement `unlocked`, so the orchestrator can evaluate the stop condition from the completion output without an extra `map status` call.
 
  The stop condition itself is one command: `planr plan audit <plan-id> --json` evaluates the contract clause by clause (items settled, reviews complete, approvals clear, verification logged) with evidence and answers `holds: true/false`. Use it at the top of every iteration and as the final audit — never hand-assemble the verdict from separate calls.
 
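The sketch below reads the `remaining` snapshot. It assumes the parsed response exposes `counts`, `settled`, and `total` as described. Every status is present with an explicit zero, so the code can iterate over `counts` without special-casing missing keys. Both helper names are hypothetical. `time_for_final_audit` is only a cheap pre-check; the verdict still comes from `plan audit`.

```python
def progress_line(remaining: dict) -> str:
    """One-line progress report from a `remaining` snapshot."""
    counts = remaining["counts"]  # every status is present, zeros included
    nonzero = ", ".join(f"{status}={n}" for status, n in sorted(counts.items()) if n)
    return f"{remaining['settled']}/{remaining['total']} settled ({nonzero or 'empty map'})"


def time_for_final_audit(remaining: dict) -> bool:
    """Cheap pre-check only; the verdict still comes from `planr plan audit <plan-id> --json`."""
    return remaining["total"] > 0 and remaining["settled"] == remaining["total"]
```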
@@ -52,17 +52,19 @@ The short path per item is three commands: `planr pick --json` (one flat work pa
 
  `map build` chains created items in plan order with `blocks` links automatically and prints the created items and links. In step 2, verify that chain against real execution-order dependencies and adjust with `planr link add` only where document order and execution order differ. `item breakdown` works the same way: pass one `--into` per child title (or one value with newline-separated titles), and the output lists the chained children plus the next command.
 
- Request reviews where they carry signal: implementation slices and anything user-facing finish with `done --review`. Trivial inspection, baseline, or setup items close with plain `done` (evidence still required) — a review that can only confirm "the repo was empty" adds ceremony, not safety. The goal contract's "all reviews closed" clause audits review items that exist; plain-`done` items satisfy it without a review gate, so skipping low-signal reviews never blocks `plan audit`.
+ Request reviews where they carry signal: implementation slices and anything user-facing finish with `done --review`. Trivial inspection, baseline, or setup items close with plain `done` (evidence still required) — a review that can only confirm "the repo was empty" adds ceremony, not safety. The goal contract's "all reviews closed" clause audits review items that exist; plain-`done` items satisfy it without a review gate, so skipping low-signal reviews never blocks `plan audit`. In a single-agent host this bar rises: a review you close yourself mostly re-runs your own commands, so reserve gates for the riskiest slices — the core implementation and the final live verification — and close the rest with plain `done`.
 
  The loop never closes its own reviews when the host supports a second agent. Maker and checker stay separate. One agent instance keeps one `PLANR_WORKER_ID` for the whole session — never export a second identity inside the same instance to make reviews look `independent`; an honest `single_agent` stamp beats a fake `independent` one.
 
  ## Skills Are The Prompts
 
- When the host supports subagents, delegate with skill references plus an item id, nothing more:
+ When the host supports subagents, the driver never implements: it dispatches, audits, and synthesizes. Driver tokens go into `plan audit`, dispatch decisions, and conflict resolution — implementation and review run in the subagent roles, which the host wiring can pin to a cheaper tier (see the role files and `docs/GOALS.md` "Cost Tiering"). Delegate with skill references plus an item id, nothing more:
 
  - Worker dispatch: `Use $planr-work on item <item-id>. Stop after requesting review.`
  - Checker dispatch: `Use $planr-review on item <item-id>. Close the review with a verdict.`
 
+ A worker subagent may take several items sequentially instead of being respawned per item: `done --next` hands it the next ready work packet, keeps its context warm, and never returns its own freshly created review — maker/checker separation survives long-lived workers.
+
  Host wiring:
 
  - Codex: project agents in `.codex/agents/*.toml` preload the skill via `[[skills.config]]` (TOML templates in `agents/` next to this skill). Spawn explicitly: "spawn the planr_worker agent for item X". Keep `[agents] max_depth = 1`.
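The driver's dispatch step can be reduced to a pure function of the pick packet. The sketch below assumes a parsed pick response with `item` (null on a null pick), `reason`, and an item-level `work_type` that equals `review` for checker work. The `work_type` key on the item and the `next_move` name are assumptions. The dispatch strings are the two templates above.

```python
from typing import Tuple

WORKER = "Use $planr-work on item {id}. Stop after requesting review."
CHECKER = "Use $planr-review on item {id}. Close the review with a verdict."


def next_move(pick: dict) -> Tuple[str, str]:
    """Turn a parsed pick response into a dispatch line or a null-pick reason."""
    item = pick.get("item")
    if item is None:
        # Null picks carry a reason such as empty_map, all_settled, or nothing_ready.
        return ("no-dispatch", pick.get("reason", "unknown"))
    template = CHECKER if item.get("work_type") == "review" else WORKER
    return ("dispatch", template.format(id=item["id"]))
```

A null pick goes back to the driver rather than to a subagent. The `reason` and any `repair` commands are exactly the material the driver's tokens are reserved for.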
@@ -4,6 +4,9 @@ name = "planr_reviewer"
  description = "Independent findings-first reviewer for one Planr item. Audits evidence and closes the review with a verdict."
  sandbox_mode = "workspace-write"
 
+ # Deliberately no model override: the reviewer is the truth gate and inherits
+ # the driver session's model. Make workers cheap, not the verdict.
+
  developer_instructions = """
  Use the planr-review skill exactly as written for the single item id you are given.
  You did not write this code; audit it like an owner. Inspect the actual diff and rerun the
@@ -3,6 +3,13 @@
  name = "planr_worker"
  description = "Implements exactly one picked Planr map item to evidence-backed completion, then requests review and stops."
 
+ # Cost tiering: the pick packet bounds the worker's scope, so it can run on a
+ # lower effort tier than the driver session. Verify the pin took effect after
+ # the first spawn (some Codex versions ignore custom agent files on spawn —
+ # openai/codex#26868); see docs/GOALS.md "Cost Tiering" for the smoke test.
+ model = "gpt-5.5"
+ model_reasoning_effort = "medium"
+
  developer_instructions = """
  Use the planr-work skill exactly as written for the single item id you are given.
  Implement only that item. Log changed files and the real verification commands you ran.
@@ -22,7 +22,7 @@ The pick output is one flat work packet — item, links, logs, runtime, recovery
  planr done <item-id> --summary "what changed" --files path-a --files path-b --cmd "exact verification command" --tests "exact test command" --review
  ```
 
- Put build/serve commands in `--cmd` and test runs in `--tests` — both are recorded as evidence. Single-quote `--files` values that contain `$` (route files like `watch.$videoId.tsx`), or the shell expands them before planr sees them. `done --review` writes the completion log, requests the review, and moves the item to `in_review` (you keep ownership; it is waiting on the gate, not abandoned) — the response names the target's new status and the plan-scoped reviewer pick command; add `--next` to pick the following item in the same call. Without `--review` it closes the item directly (only for items that need no review gate). Running `done` on a ready item you never picked adopts it: the lease is written retroactively under your worker id so the review always has a maker. The response reports what your settlement `unlocked`, echoes the item's post condition, and hints when downstream work depends on an item closed without command/test evidence.
+ Put build/serve commands in `--cmd` and test runs in `--tests` — both are recorded as evidence. Include the decisive output line in `--summary` (e.g. "12 tests passed", "GET /videos returned 3 entries"): reviewers see your recorded command strings, not your terminal, so the summary must carry what you observed, not just what you ran. Single-quote `--files` values that contain `$` (route files like `watch.$videoId.tsx`), or the shell expands them before planr sees them. `done --review` writes the completion log, requests the review, and moves the item to `in_review` (you keep ownership; it is waiting on the gate, not abandoned) — the response names the target's new status and the plan-scoped reviewer pick command; add `--next` to pick the following item in the same call. Without `--review` it closes the item directly (only for items that need no review gate). Running `done` on a ready item you never picked adopts it: the lease is written retroactively under your worker id so the review always has a maker. The response reports what your settlement `unlocked`, echoes the item's post condition, and hints when downstream work depends on an item closed without command/test evidence.
 
  Live verification (browser flow, executed binary, real requests) gets its own log kind so `plan audit` can find it:
 
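Quoting is the fragile part of the `done` line. The sketch below builds it with Python's `shlex.join`, which single-quotes any argument that contains shell-special characters such as `$` or spaces. `done_command` is a hypothetical helper, and the item id and commands in the usage line are made up.

```python
import shlex


def done_command(item_id: str, summary: str, files: list, cmd: str, tests: str,
                 review: bool = True) -> str:
    """Build a shell-safe `planr done` line; shlex.join quotes each argument as needed."""
    argv = ["planr", "done", item_id, "--summary", summary]
    for path in files:
        argv += ["--files", path]  # one --files flag per changed path
    argv += ["--cmd", cmd, "--tests", tests]
    if review:
        argv.append("--review")
    return shlex.join(argv)
```

For example, `done_command("it-3", "12 tests passed", ["src/routes/watch.$videoId.tsx"], "npm run build", "npm test")` returns `planr done it-3 --summary '12 tests passed' --files 'src/routes/watch.$videoId.tsx' --cmd 'npm run build' --tests 'npm test' --review`. If you call `planr` through `subprocess.run` with the argv list instead of a shell string, no quoting is needed.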
Binary file