npm - planr - Versions diffs - 1.1.18 → 1.1.19 - Mend

planr 1.1.18 → 1.1.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/docs/CLAUDE_CODE.md +19 -0
package/docs/CLI_REFERENCE.md +7 -2
package/docs/CODEX.md +12 -2
package/docs/GOALS.md +31 -3
package/npm/native/darwin-arm64/planr +0 -0
package/npm/native/darwin-x86_64/planr +0 -0
package/npm/native/linux-arm64/planr +0 -0
package/npm/native/linux-x86_64/planr +0 -0
package/package.json +1 -1
package/plugins/planr/.claude-plugin/plugin.json +1 -1
package/plugins/planr/.codex-plugin/plugin.json +1 -1
package/plugins/planr/agents/planr-reviewer.md +2 -0
package/plugins/planr/agents/planr-worker.md +6 -0
package/plugins/planr/skills/planr-goal/SKILL.md +3 -3
package/plugins/planr/skills/planr-loop/SKILL.md +4 -2
package/plugins/planr/skills/planr-loop/agents/planr-reviewer.toml +3 -0
package/plugins/planr/skills/planr-loop/agents/planr-worker.toml +7 -0
package/docs/planr-spec.zip +0 -0

package/docs/CLAUDE_CODE.md CHANGED Viewed

@@ -11,6 +11,25 @@ The Planr repository is a Claude Code plugin. Install it to get the skills (name
 See [Skills](SKILLS.md) for the skill workflow. For autonomous goal runs with `/goal` or `/loop` on top of Planr state, see [Long-Running Goals](GOALS.md).
+## Long-Running Goals With `/goal`
+Claude Code `/goal` drives autonomous Planr runs the same way Codex does: `/goal` supplies continuation pressure, Planr supplies durable state, evidence, reviews, and recovery. Run the driver session on your strongest model (`/model fable`, `/effort high`), prep once, then start:
+```text
+/planr:planr-goal <your goal>
+/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
+```
+The plugin registers the `planr-worker` and `planr-reviewer` subagents automatically. The worker pins a cheaper tier in its frontmatter; the reviewer deliberately inherits the driver's model:
+```yaml
+# planr-worker.md frontmatter
+model: opus      # alias tracks the current generation; budget alternative: sonnet
+effort: medium
+```
+Verify the pin once: `CLAUDE_CODE_SUBAGENT_MODEL` must be unset (it silently overrides all subagent frontmatter), then dispatch the worker on a trivial item and confirm the subagent's messages in `~/.claude/projects/<project>/*.jsonl` carry the worker model. Full workflow, recovery, and the tiering rationale: [Long-Running Goals](GOALS.md).
 ## MCP
 ```bash

package/docs/CLI_REFERENCE.md CHANGED Viewed

@@ -13,7 +13,7 @@ planr plan check <plan-id>
 planr plan audit <plan-id>
 planr plan show <plan-id>
 planr plan archive <plan-id>
-planr map show
+planr map show [--plan <plan-id>]
 planr map build --from <plan-id>
 planr map lane --critical
 planr map pressure
@@ -54,6 +54,7 @@ planr review close <review-item-id> --verdict complete|not-complete|unclear [--c
 planr close [item-id] --summary "..." [--next]
 planr done [item-id] --summary "..." [--files a --files b] [--cmd "..."] [--tests "..."] [--review] [--next]
 planr context add "text" [--item <item-id>] [--tag discovery]
+planr context list [--item <item-id>] [--tag <tag>]
 planr search "query"
 planr doctor [--client codex|claude|cursor|all]
 planr install codex|claude|cursor [--dry-run]
@@ -79,7 +80,11 @@ With `--json`, responses follow one convention so agents never guess where data
 `plan check` validates path, YAML frontmatter, and that required sections have content: build plans need `## Scope Decision`, `## Verification`, and `## Acceptance Criteria` filled; product plans need `## Problem`, `## Requirements`, and `## Success Criteria` filled in `PRODUCT_SPEC.md`. It also flags a task list that still contains only the scaffold placeholder (or no work specs at all) — `map build` would turn that into a single coarse item, so the fix names the granularity contract: one `### TASK-00n:` heading (or `- [ ]` line) per verifiable slice, typically 4-8, in execution order. Each warning is structured — `{"file", "section", "message", "fix"}` — and names the exact file to edit plus the re-run command, so a failed check is a repair instruction, not a riddle.
-`plan audit <plan-id>` is the one-call contract verdict for a plan's map scope. It evaluates four clauses with evidence: `items_settled` (open items listed), `reviews_complete` (open review items listed), `approvals_clear` (requested/denied approvals listed), and `verification_logged` (logs with `--kind verification` on scope items). The stored goal contract (`planr context --tag goal-contract` mentioning the plan id) is included; the verification clause is binding only when such a contract exists. `holds: true` means the contract is satisfied — loop agents use this as their stop condition instead of stitching the verdict together from `map status`, `log list`, and `approval list`. Also available as MCP `planr_plan_audit`.
+`plan audit <plan-id>` is the one-call contract verdict for a plan's map scope. It evaluates four clauses with evidence: `items_settled` (open items listed), `reviews_complete` (open review items listed), `approvals_clear` (requested/denied approvals listed), and `verification_logged` (logs with `--kind verification` on scope items). The stored goal contract (`planr context --tag goal-contract` mentioning the plan id) is included; the verification clause is binding only when such a contract exists. `holds: true` means the contract is satisfied — loop agents use this as their stop condition instead of stitching the verdict together from `map status`, `log list`, and `approval list`. When the contract is open, `next` carries the exact next command derived from the first actionable gap: build the map, pick the ready review or work item (plan-scoped), resolve the blocking approval, inspect stalled leases, or log the missing verification. Also available as MCP `planr_plan_audit`.
+`map show --plan <plan-id>` narrows the map to one plan's items and the links among them, with plan-scoped counts — plan-scoped goal runs on shared boards audit their slice, not the whole board. An unknown plan id is an error, never a silent unscoped view. Also available on MCP `planr_map_show` (`plan`) and HTTP `GET /v1/projects/{id}/map?plan=<plan-id>`.
+`context list --tag <tag>` filters notes by the tag they were stored with (`context add --tag`), so e.g. the goal contract is recoverable with `planr context list --tag goal-contract` instead of scanning all notes.
 `map build` chains the created items in plan order with `blocks` links — build plan steps are ordered, so the map inherits that order instead of leaving everything flat. The output lists every created item with its status, the created links, and the next command; adjust order with `planr link add` before picking if execution order differs from document order.

package/docs/CODEX.md CHANGED Viewed

@@ -17,10 +17,20 @@ Codex `/goal` is the recommended orchestrator for autonomous Planr runs: `/goal`
 ```text
 $planr-goal <your goal>
-/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
 ```
-The stop condition lives in Planr (`--tag goal-contract`), so a dead session resumes with the same starter line from zero chat context. Full workflow, recovery, and per-host variants: [Long-Running Goals](GOALS.md).
+The stop condition lives in Planr (`--tag goal-contract`), so a dead session resumes with the same starter line from zero chat context.
+Run the driver session on your strongest tier (e.g. `gpt-5.5` at `model_reasoning_effort = "high"` in `~/.codex/config.toml`). The provisioned worker role pins a cheaper tier; the reviewer deliberately inherits the session model:
+```toml
+# .codex/agents/planr-worker.toml
+model = "gpt-5.5"
+model_reasoning_effort = "medium"
+```
+Verify the pin once: some Codex versions ignore custom agent files on spawn ([openai/codex#26868](https://github.com/openai/codex/issues/26868)) and the child silently inherits the parent model. Spawn `planr_worker` on a trivial item and confirm the child metadata shows the pinned model and effort with a non-null `agent_path`. Full workflow, recovery, per-host variants, and the tiering rationale: [Long-Running Goals](GOALS.md).
 ## MCP

package/docs/GOALS.md CHANGED Viewed

@@ -39,9 +39,11 @@ planr context add "GOAL CONTRACT pl-csv-export: DONE when every in-scope map ite
 ### 2. Execute — the loop driver runs `$planr-loop`
 ```text
-/goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+/goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
 ```
+The autonomy clause matters on long runs: deep into a session, frontier models occasionally end a turn with a statement of intent instead of the corresponding action, or pause to ask permission they already have. Stating the operating mode up front prevents both.
 Each iteration follows the `$planr-loop` protocol:
 ```text
@@ -100,7 +102,7 @@ $planr-goal <your goal>          # prep: plan, map, contract, starter command
 /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
 ```
-The `/goal` PM dispatches `spawn the planr_worker agent for item <id>` and `spawn the planr_reviewer agent for item <id>` — the role files preload `$planr-work` and `$planr-review`, so dispatches stay one line. Codex Automations work the same way: set the automation prompt to the starter line.
+The `/goal` PM dispatches `spawn the planr_worker agent for item <id>` and `spawn the planr_reviewer agent for item <id>` — the role files preload `$planr-work` and `$planr-review`, so dispatches stay one line. Codex Automations work the same way: set the automation prompt to the starter line. The provisioned worker role pins a cheaper effort tier; see [Cost Tiering](#cost-tiering).
 ### Claude Code
@@ -111,7 +113,7 @@ Same shape via the plugin (`/plugin install planr@planr`), which registers the `
 /goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
 ```
-`/loop` works for fixed-cadence runs instead of goal-conditioned ones.
+`/loop` works for fixed-cadence runs instead of goal-conditioned ones. The registered worker subagent pins a cheaper model tier; see [Cost Tiering](#cost-tiering).
 ### Cursor and hosts without a loop primitive
@@ -128,6 +130,31 @@ Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context
 Any MCP-capable agent uses the same flow over `planr mcp`. Every session starts with map state, so the loop is resumable by construction.
+## Cost Tiering
+A goal run has three roles with different intelligence needs, so they should not all run on the same model tier:
+- **Driver** (the `/goal` session): decomposition, dispatch decisions, conflict resolution, final synthesis. Run it on the strongest model you have — this is never configured in Planr files, it is simply the model of the main session.
+- **Worker**: bounded implementation. `planr pick --json` is a complete handoff packet (one item, scope, evidence format, stop after review request), so the worker runs safely on a cheaper tier.
+- **Reviewer**: the truth gate. It inherits the driver's model on purpose — make workers cheap, not the verdict.
+Where each host configures the worker tier (the shipped role files carry these defaults):
+| Host | Driver | Worker | Configured in |
+| --- | --- | --- | --- |
+| Codex | session default (e.g. `gpt-5.5` at `high`) | `model = "gpt-5.5"`, `model_reasoning_effort = "medium"` | `.codex/agents/planr-worker.toml` |
+| Claude Code | session model (e.g. `fable` at `high` via `/model` + `/effort`) | `model: opus`, `effort: medium` | `planr-worker.md` frontmatter |
+| Cursor | chat model of the driving session | chosen per dispatch in the host's subagent tooling | no Planr files — pick a cheaper model when dispatching the worker task |
+The defaults use aliases and generic names so they track model generations; pin a full model id (e.g. `claude-opus-4-8`) only if you need determinism, and use `model: sonnet` as the budget alternative. The role files are user-owned copies — `planr project init` provisions them once and never overwrites local edits — so changing the tier is editing one line.
+Two traps to verify once per setup:
+- **Claude Code:** the `CLAUDE_CODE_SUBAGENT_MODEL` environment variable silently overrides every subagent's `model:` frontmatter. Make sure it is unset, then dispatch the worker on a trivial item and confirm the subagent's messages in the session log (`~/.claude/projects/<project>/*.jsonl`) carry the worker model, not the driver's.
+- **Codex:** some versions ignore custom agent files on spawn ([openai/codex#26868](https://github.com/openai/codex/issues/26868)) — the child then inherits the parent model. Spawn `planr_worker` on a trivial item and confirm the child metadata shows the pinned model and effort with a non-null `agent_path`.
+Both failure modes are silent (the run still works, just at driver prices), which is why the smoke test is worth the two minutes.
 ## Coming From Other Goal Tools
 If you already run goal workflows with other tools, the concepts map directly:
@@ -151,5 +178,6 @@ Using such tools for intake or visualization alongside Planr is fine — keep on
 - The maker never closes its own review; single-agent hosts record `review-mode` honestly.
 - Two iterations without map-state movement -> stop and report instead of grinding.
 - Destructive or out-of-repo side effects always go behind `planr approval request`.
+- Lessons that should outlive the iteration (a confirmed approach, a correction, a dead end) go into `planr context add "..." --tag lesson` — the next iteration or a fresh session recovers them with `planr context list --tag lesson`, not from chat history.
 See also: [Skills](SKILLS.md), [Operating Model](OPERATING_MODEL.md), [Task Graph Model](TASK_GRAPH_MODEL.md), [Codex](CODEX.md), [Claude Code](CLAUDE_CODE.md), [Cursor](CURSOR.md).

package/npm/native/darwin-arm64/planr CHANGED Viewed

Binary file

package/npm/native/darwin-x86_64/planr CHANGED Viewed

Binary file

package/npm/native/linux-arm64/planr CHANGED Viewed

Binary file

package/npm/native/linux-x86_64/planr CHANGED Viewed

Binary file

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "planr",
-  "version": "1.1.18",
+  "version": "1.1.19",
   "description": "Local-first planning and execution coordination for coding agents.",
   "license": "MIT",
   "repository": {

package/plugins/planr/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "planr",
   "description": "Skill-driven planning and execution loop for coding agents: one planr entry point, an autonomous planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
-  "version": "1.1.18",
+  "version": "1.1.19",
   "author": {
     "name": "instructa"
   },

package/plugins/planr/.codex-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "planr",
-  "version": "1.1.18",
+  "version": "1.1.19",
   "description": "Skill-driven planning and execution loop for coding agents: one $planr entry point, an autonomous $planr-loop, and evidence-backed task graph skills powered by the planr CLI.",
   "author": {
     "name": "instructa",

package/plugins/planr/agents/planr-reviewer.md CHANGED Viewed

@@ -3,6 +3,8 @@ name: planr-reviewer
 description: Independent findings-first reviewer for one Planr item. Audits evidence and closes the review with a verdict. Dispatch with the item id.
 skills:
   - planr-review
+# Deliberately no model override: the reviewer is the truth gate and inherits
+# the driver's model. Make workers cheap, not the verdict.
 ---
 Use the preloaded planr-review skill exactly as written for the single item id you are given.

package/plugins/planr/agents/planr-worker.md CHANGED Viewed

@@ -3,6 +3,12 @@ name: planr-worker
 description: Implements exactly one picked Planr map item to evidence-backed completion, then requests review and stops. Dispatch with the item id.
 skills:
   - planr-work
+# Cost tiering: the pick packet bounds the worker's scope, so it runs on a
+# cheaper tier than the driver. Aliases track the current generation; pin a
+# full model id (e.g. claude-opus-4-8) only if you need determinism. Budget
+# alternative: model: sonnet. See docs/GOALS.md "Cost Tiering".
+model: opus
+effort: medium
 ---
 Use the preloaded planr-work skill exactly as written for the single item id you are given.

package/plugins/planr/skills/planr-goal/SKILL.md CHANGED Viewed

@@ -50,7 +50,7 @@ The contract must survive compaction, session loss, and host switches, so it liv
 planr context add "GOAL CONTRACT <plan-id>: DONE when every in-scope map item is closed with log evidence, all reviews closed with verdict complete, no open approvals in scope, and a live verification log exists for <oracle>. Iteration budget: 10." --tag goal-contract
 ```
-One contract per plan scope. Any agent on any host can recover it with `planr context list` or `planr search "GOAL CONTRACT"`. Never weaken a stored contract mid-run; scope changes go through `$planr-plan` and the user. During the run, workers lease with `planr pick --plan <plan-id>` so the loop never picks items outside this contract, even when other plans share the board. The loop checks the contract with `planr plan audit <plan-id>`, which evaluates exactly these clauses with evidence and answers `holds: true/false`.
+One contract per plan scope. Any agent on any host can recover it with `planr context list --tag goal-contract` or `planr search "GOAL CONTRACT"`. Never weaken a stored contract mid-run; scope changes go through `$planr-plan` and the user. During the run, workers lease with `planr pick --plan <plan-id>` so the loop never picks items outside this contract, even when other plans share the board. The loop checks the contract with `planr plan audit <plan-id>`, which evaluates exactly these clauses with evidence and answers `holds: true/false`.
 "All reviews closed" audits review items that exist — it does not require a review gate on every item. An item closed with plain `done` (evidence still required) satisfies the contract without one; request reviews where they carry signal (implementation slices, user-facing work), not on trivial inspection or scaffold steps.
@@ -61,13 +61,13 @@ Print the starter command, then stop. Do not start execution yourself; ask wheth
 With a native loop driver (Codex `/goal`, Claude Code `/goal` or `/loop`):
 ```text
-/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted.
+/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
 ```
 Without one (Cursor, plain MCP clients, Codex without /goal):
 ```text
-Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).
+Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract). You are operating autonomously: never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.
 ```
 Re-dispatch the same line after any session ends. The map, logs, and stored contract make every iteration resumable from zero context — nothing about the goal lives only in chat.

package/plugins/planr/skills/planr-loop/SKILL.md CHANGED Viewed

@@ -31,7 +31,7 @@ Store the contract in Planr so it survives compaction, session loss, and host sw
 planr context add "GOAL CONTRACT <plan-id>: DONE when ... Iteration budget: 10." --tag goal-contract
 ```
-`$planr-goal` does this during prep; if the loop starts without a stored contract, store it in iteration 1 before picking. Every iteration re-reads the contract from Planr (`planr context list` or `planr search "GOAL CONTRACT"`), never from chat history. `done`, `close`, and `review close` responses and the pick packet include a `remaining` progress snapshot (`counts` with explicit zeros for every status, `settled`, `total`) plus the list of items each settlement `unlocked`, so the orchestrator can evaluate the stop condition from the completion output without an extra `map status` call.
+`$planr-goal` does this during prep; if the loop starts without a stored contract, store it in iteration 1 before picking. Every iteration re-reads the contract from Planr (`planr context list --tag goal-contract` or `planr search "GOAL CONTRACT"`), never from chat history. `done`, `close`, and `review close` responses and the pick packet include a `remaining` progress snapshot (`counts` with explicit zeros for every status, `settled`, `total`) plus the list of items each settlement `unlocked`, so the orchestrator can evaluate the stop condition from the completion output without an extra `map status` call.
 The stop condition itself is one command: `planr plan audit <plan-id> --json` evaluates the contract clause by clause (items settled, reviews complete, approvals clear, verification logged) with evidence and answers `holds: true/false`. Use it at the top of every iteration and as the final audit — never hand-assemble the verdict from separate calls.
@@ -58,11 +58,13 @@ The loop never closes its own reviews when the host supports a second agent. Mak
 ## Skills Are The Prompts
-When the host supports subagents, delegate with skill references plus an item id, nothing more:
+When the host supports subagents, the driver never implements: it dispatches, audits, and synthesizes. Driver tokens go into `plan audit`, dispatch decisions, and conflict resolution — implementation and review run in the subagent roles, which the host wiring can pin to a cheaper tier (see the role files and `docs/GOALS.md` "Cost Tiering"). Delegate with skill references plus an item id, nothing more:
 - Worker dispatch: `Use $planr-work on item <item-id>. Stop after requesting review.`
 - Checker dispatch: `Use $planr-review on item <item-id>. Close the review with a verdict.`
+A worker subagent may take several items sequentially instead of being respawned per item: `done --next` hands it the next ready work packet, keeps its context warm, and never returns its own freshly created review — maker/checker separation survives long-lived workers.
 Host wiring:
 - Codex: project agents in `.codex/agents/*.toml` preload the skill via `[[skills.config]]` (TOML templates in `agents/` next to this skill). Spawn explicitly: "spawn the planr_worker agent for item X". Keep `[agents] max_depth = 1`.

package/plugins/planr/skills/planr-loop/agents/planr-reviewer.toml CHANGED Viewed

@@ -4,6 +4,9 @@ name = "planr_reviewer"
 description = "Independent findings-first reviewer for one Planr item. Audits evidence and closes the review with a verdict."
 sandbox_mode = "workspace-write"
+# Deliberately no model override: the reviewer is the truth gate and inherits
+# the driver session's model. Make workers cheap, not the verdict.
 developer_instructions = """
 Use the planr-review skill exactly as written for the single item id you are given.
 You did not write this code; audit it like an owner. Inspect the actual diff and rerun the

package/plugins/planr/skills/planr-loop/agents/planr-worker.toml CHANGED Viewed

@@ -3,6 +3,13 @@
 name = "planr_worker"
 description = "Implements exactly one picked Planr map item to evidence-backed completion, then requests review and stops."
+# Cost tiering: the pick packet bounds the worker's scope, so it can run on a
+# lower effort tier than the driver session. Verify the pin took effect after
+# the first spawn (some Codex versions ignore custom agent files on spawn —
+# openai/codex#26868); see docs/GOALS.md "Cost Tiering" for the smoke test.
+model = "gpt-5.5"
+model_reasoning_effort = "medium"
 developer_instructions = """
 Use the planr-work skill exactly as written for the single item id you are given.
 Implement only that item. Log changed files and the real verification commands you ran.

package/docs/planr-spec.zip DELETED Viewed

Binary file