npm - @open-agent-toolkit/cli - Versions diffs - 0.1.3 → 0.1.5 - Mend

@open-agent-toolkit/cli 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/assets/docs/workflows/projects/implementation-execution.md CHANGED

@@ -53,7 +53,7 @@ Model and effort are separate axes. Each axis logs one of four states:
 - `not-applicable` — this host/API has no meaningful per-dispatch concept for that axis.
 - `host-auto` — exceptional; the host uses that axis internally but the orchestrator cannot read or pin it.
-In Codex, the normal model choice is inherited unless the user requested a model override or the phase clearly requires one. Implementation dispatch maps selected `low`, `medium`, and `high` effort to configured Codex implementer roles (`oat-phase-implementer-low`, `oat-phase-implementer-medium`, and `oat-phase-implementer-high`) rather than relying on per-call effort overrides. The base `oat-phase-implementer` role represents inherited effort. `xhigh` is inherited-only: use it when the parent/orchestrator session is already xhigh, otherwise split/revise the phase or stop for user re-invocation instead of inventing an `xhigh` variant. In Claude Code, subagent model selection is a model axis when available; the separate effort axis is `not-applicable`.
+In Codex, the normal model choice is inherited unless the user requested a model override or the phase clearly requires one. Implementation and fix dispatch default to selected effort: classify the phase, choose the lowest sufficient `low`, `medium`, or `high`, and dispatch the matching configured Codex implementer role (`oat-phase-implementer-low`, `oat-phase-implementer-medium`, or `oat-phase-implementer-high`) rather than relying on per-call effort overrides. The base `oat-phase-implementer` role represents inherited effort, which is the parent-session ceiling path, not a neutral default. Use inherited effort only for an explicit user/Dispatch Profile override, when the phase needs the parent-session ceiling and `selected:high` is insufficient, or when selected-effort roles are unavailable. `xhigh` is inherited-only: use it when the parent/orchestrator session is already xhigh, otherwise split/revise the phase or stop for user re-invocation instead of inventing an `xhigh` variant. In Claude Code, subagent model selection is a model axis when available; the separate effort axis is `not-applicable`.
 Dispatch logs use a consistent structured block so provider behavior is comparable without flattening the model and effort axes:
@@ -100,7 +100,7 @@ Add Dispatch Profile rows only when the user has an explicit constraint or prefe
 For each phase in the plan (whether sequential or inside a parallel group):
 1. **Select runtime dispatch control** for the phase and log the chosen control plus rationale.
-2. **Dispatch the selected implementer role** with a Phase Scope block (project path, phase id, artifact paths, commit convention, workflow mode, and dispatch context when known). In Codex, `effort_axis=selected:low|medium|high` uses `oat-phase-implementer-low|medium|high`; inherited effort uses base `oat-phase-implementer`.
+2. **Dispatch the selected implementer role** with a Phase Scope block (project path, phase id, artifact paths, commit convention, workflow mode, and dispatch context when known). In Codex, `effort_axis=selected:low|medium|high` uses `oat-phase-implementer-low|medium|high`. Inherited effort uses base `oat-phase-implementer` only for an explicit ceiling/override/fallback reason; it should not be used just to avoid choosing a selected effort.
 3. **Receive the summary:** `DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED`.
    - `BLOCKED` stops the run and surfaces the blocker to the user.
 4. **Dispatch `oat-reviewer`** with a Review Scope block (phase id, commit range, optional files-changed hint, and inherited review dispatch context). Review dispatches inherit the parent session's model and effort axes unless the user explicitly requested an override. The commit range is authoritative; the file list is only orientation metadata. In Codex, pass this as a self-contained packet with `fork_context: false`, use the base reviewer role without model or effort overrides, and record `model_axis=inherited, effort_axis=inherited` so the reviewer reads git/OAT artifacts directly instead of inheriting the orchestration thread. In Claude Code, do not pass a per-review model override and record `effort_axis=not-applicable` since Claude Code does not expose a per-dispatch effort axis. If the reviewer does not conclude on the first wait, poll once more, then send a concise "return now with current findings" nudge before falling back inline for that phase.
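The Codex selection rule in this hunk can be sketched in TypeScript as follows. This is a hypothetical illustration, not OAT code. The role names come from the diff. The types, function names, inherited-reason labels, and complexity-to-effort mapping are assumptions made for the example.

```typescript
// Hypothetical sketch of Codex implementer/fix role selection.
// Only the role names are taken from the OAT docs; everything else is illustrative.

type SelectedEffort = "low" | "medium" | "high"; // no xhigh variant exists by design

// Assumed complexity classes, echoing the rationale examples in the skill.
type ComplexityClass = "mechanical" | "normal-multi-file" | "broad-high-risk";

// Assumed "lowest sufficient effort" mapping for each class.
const EFFORT_FOR_CLASS: Record<ComplexityClass, SelectedEffort> = {
  mechanical: "low",
  "normal-multi-file": "medium",
  "broad-high-risk": "high",
};

function effortForComplexity(c: ComplexityClass): SelectedEffort {
  return EFFORT_FOR_CLASS[c];
}

// Allowed reasons for taking the inherited (parent-session ceiling) path.
type InheritedReason =
  | "explicit-user-override"
  | "dispatch-profile-override"
  | "ceiling-needed"
  | "selected-roles-unavailable";

type EffortChoice =
  | { kind: "selected"; effort: SelectedEffort }
  | { kind: "inherited"; reason: InheritedReason; justification?: string };

interface CodexDispatch {
  agentType: string; // role passed as agent_type
  effortAxis: string; // value recorded in the OAT Dispatch log block
}

function selectCodexImplementer(choice: EffortChoice): CodexDispatch {
  if (choice.kind === "selected") {
    return {
      agentType: `oat-phase-implementer-${choice.effort}`,
      effortAxis: `selected:${choice.effort}`,
    };
  }
  // Inherited effort is not a neutral default: a ceiling-needed dispatch
  // must explain why selected:high is insufficient.
  if (choice.reason === "ceiling-needed" && !choice.justification) {
    throw new Error("ceiling-needed dispatch must explain why selected:high is insufficient");
  }
  return { agentType: "oat-phase-implementer", effortAxis: "inherited" };
}
```

Encoding the allowed reasons as a closed union means an inherited dispatch cannot be built without naming one, which mirrors the rule that the rationale must cite the reason.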

package/assets/public-package-versions.json CHANGED

@@ -1,6 +1,6 @@
 {
-  "cli": "0.1.3",
-  "docs-config": "0.1.3",
-  "docs-theme": "0.1.3",
-  "docs-transforms": "0.1.3"
+  "cli": "0.1.5",
+  "docs-config": "0.1.5",
+  "docs-theme": "0.1.5",
+  "docs-transforms": "0.1.5"
 }

package/assets/skills/oat-agent-instructions-analyze/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: oat-agent-instructions-analyze
-version: 1.9.0
+version: 1.10.0
 description: Run when you need to evaluate agent instruction file coverage, quality, and drift. Produces a severity-rated analysis artifact. Run before oat-agent-instructions-apply to identify what needs improvement.
 disable-model-invocation: true
 user-invocable: true
@@ -294,7 +294,7 @@ Any discrepancy that would cause agents to follow the wrong pattern should be fl
 ### Step 4: Assess Coverage Gaps
-Walk the directory tree and evaluate each directory against `references/directory-assessment-criteria.md`.
+Walk the directory tree and evaluate **each directory against `references/directory-assessment-criteria.md`, per-directory and at every depth**. The walk descends into subdirectories recursively — it is not limited to top-level apps and packages. A nested domain subdirectory such as `packages/<pkg>/src/<domain>/` is in scope and is assessed with the same primary indicators as a top-level package; the size of a directory's parent never gates whether that directory is evaluated.
 Before general coverage-gap analysis, assess **provider baseline gaps** for every active provider.
 These checks are mandatory even when the missing file does not appear in the discovered inventory.

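The revised Step 4 asks the analyst to visit every directory at every depth, not only top-level apps and packages. The skill describes an agent procedure rather than shipping code. Below is a minimal TypeScript sketch of that traversal, assuming Node's `fs`, `os`, and `path` modules and using the generated-directory exclusions listed in the criteria file. `listCandidateDirs` is a hypothetical name.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Generated/external directories that the criteria exclude outright.
const EXCLUDED = new Set(["node_modules", "dist", "build", ".git"]);

// Collect every directory under `root`, at every depth, so each one can be
// assessed on its own. Nothing here consults the parent's size.
function listCandidateDirs(root: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    if (!entry.isDirectory() || EXCLUDED.has(entry.name)) continue;
    const dir = join(root, entry.name);
    found.push(dir, ...listCandidateDirs(dir));
  }
  return found;
}

// Usage: a nested domain directory inside a small package is still visited.
const repo = mkdtempSync(join(tmpdir(), "oat-walk-"));
mkdirSync(join(repo, "packages", "pkg", "src", "bigquery-sync"), { recursive: true });
mkdirSync(join(repo, "node_modules", "dep"), { recursive: true });
const candidates = listCandidateDirs(repo).map((d) => relative(repo, d));
rmSync(repo, { recursive: true, force: true });
console.log(candidates);
```

The walk only enumerates candidates. Whether each one actually warrants an instruction file is still decided by the indicators in `directory-assessment-criteria.md`.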
package/assets/skills/oat-agent-instructions-analyze/references/directory-assessment-criteria.md CHANGED

@@ -2,6 +2,8 @@
 When does a directory need its own instruction file? Use these criteria to identify coverage gaps during analysis. The full guidance lives in `docs/agent-instruction.md` — this is a distilled, actionable checklist.
+Apply these criteria **per directory, at every depth** — not just to top-level apps and packages. A nested subdirectory such as `packages/<pkg>/src/<domain>/` is assessed with exactly the same indicators as a top-level package. The size of a directory's parent never gates whether that directory is evaluated.
 ## Primary Indicators (any one = likely needs instructions)
 ### 1. Has Own Build Configuration
@@ -26,32 +28,45 @@ When does a directory need its own instruction file? Use these criteria to ident
 - Represents a bounded context or module with domain-specific business logic
 - Has its own data models, terminology, or invariants
-- Example: `packages/billing/`, `services/auth/`, `lib/search-engine/`
-- **Signal strength:** Moderate — depends on complexity
+- Has non-obvious conventions an agent would otherwise miss — patterns that diverge from the parent's defaults
+- **Applies at any depth.** A top-level package and a nested domain subdirectory are assessed the same way: `packages/billing/`, `services/auth/`, `lib/search-engine/`, **and** `packages/<pkg>/src/<domain>/` (for example, a `bigquery-sync/` or `payment-reconciliation/` subdirectory inside an otherwise modest package).
+- **Signal strength:** Strong — this is the primary trigger for a nested instruction file. A bounded domain with its own models, terminology, invariants, and non-obvious conventions warrants a file **at any depth, regardless of how large or small its parent is**. Domain specificity, not parent size, decides this.
-### 5. Significant Codebase (>10 source files)
+### 5. Significant Codebase
-- Contains more than ~10 source files with specialized conventions
+- Contains a substantial body of code (loosely, 10+ source files) with specialized conventions
 - Has patterns or conventions that differ from the rest of the repo
-- **Signal strength:** Moderate — larger directories benefit more from explicit guidance
+- **Signal strength:** Moderate — a larger directory benefits more from explicit guidance, but only when it has conventions of its own
+- **File count is never sufficient on its own.** A directory with many files that all mirror the parent's conventions is not a trigger — there is nothing distinct to capture. This indicator only fires alongside genuinely divergent, non-obvious conventions. Treat the "10+" figure as a loose illustration of "non-trivial", not a precise threshold.
+## Nested Instruction Files (Progressive Specificity)
+Instruction files form a hierarchy. A root `AGENTS.md` carries repo-wide conventions; deeper files carry progressively more specific guidance for the subtree they scope. **Deeper = more specific, never broader.**
+A nested instruction file **inherits everything from its ancestors** and contains **only the domain-specific delta** — the conventions, models, terminology, and invariants that are true for that subtree and are not already captured (or are contradicted) above it. A child file must **not repeat** the parent: no copied conventions, no restated repo-wide rules. See `references/docs/agent-instruction.md` §13 ("Scoped Files (When and How)") for the full progressive-disclosure model.
+Because a nested file is small and additive — it adds a thin delta rather than a full document — **the cost of adding one is low**. The decision bar is therefore qualitative, not size-based:
-## Large Directory Decomposition
+> Would an agent working only from the nearest existing (ancestor) instruction file get something wrong, or miss something, in this directory's domain?
-When a directory meets 1+ primary indicators and contains **more than 50 source files**, assess its major
-subdirectories starting at depth 1-2 before writing a single broad recommendation.
+If yes, the directory is a coverage-gap candidate. If no — the ancestor file already covers everything an agent needs here — it is not, regardless of how many files the directory contains.
-If the first pass still leaves a sub-area that is large or clearly heterogeneous, keep decomposing deeper until the
-distinct conventions are visible or the analysis stops yielding materially different guidance.
+**Worked example.** A package has ~29 source files overall and a root-level `AGENTS.md` describing the package's general conventions. Inside it, `src/bigquery-sync/` holds ~15 files implementing BigQuery sync: it has its own data models, its own terminology (sync cursors, watermark tables, backfill windows), and non-obvious invariants (ordering guarantees, idempotency keys, partition-boundary handling) that appear nowhere else in the package. The parent package is well under any "large" bar, so an app/package-only or size-gated reading would conclude "no nested file warranted." That conclusion is wrong: an agent editing `src/bigquery-sync/` from the package-level `AGENTS.md` alone would miss the sync invariants and likely introduce a correctness bug. `src/bigquery-sync/` is a legitimate coverage-gap candidate — it meets Indicator 4 — and should be surfaced with a scoped `AGENTS.md` recommendation that captures only the sync-specific delta.
-Check whether subdirectories differ by:
+## Decomposing Broad Recommendations
-- tech stack or runtime (for example, embedded React client vs Node server code)
+When an area you would recommend a single instruction file for actually spans **distinct sub-areas**, decompose the recommendation: assess and recommend per sub-area instead of writing one broad, vague file. The trigger for decomposition is **heterogeneity**, not file count.
+Decompose when the area's subdirectories differ by:
+- tech stack or runtime (for example, an embedded React client vs Node server code)
 - dominant file-type patterns (for example, resolvers vs repositories vs jobs)
 - build or tooling configuration (separate tsconfigs, bundlers, framework configs)
 - domain boundary or API surface
-Record distinct sub-areas in the coverage gap assessment. A scoped `AGENTS.md` recommendation for a large directory
-should enumerate the major sub-areas and their conventions, not just report the total file count.
+When you find heterogeneity, assess its major subdirectories starting at depth 1–2 before writing a single broad recommendation. If the first pass still leaves a sub-area that is clearly heterogeneous, keep decomposing deeper until the distinct conventions are visible or the analysis stops yielding materially different guidance. A homogeneous area — even a large one — needs only one recommendation; there are no distinct sub-areas to split out.
+Record distinct sub-areas in the coverage gap assessment. A scoped `AGENTS.md` recommendation for a heterogeneous area should enumerate the major sub-areas and their conventions, not just report a total file count.
 ## Secondary Indicators (strengthen the case but not sufficient alone)
@@ -81,13 +96,15 @@ For each directory meeting 1+ primary indicators:
 **Severity mapping:**
 - **High:** Primary indicators 1-3 (own build, different stack, public API) — these are clear gaps
-- **Medium:** Primary indicators 4-5 (domain boundary, large codebase) — beneficial but not urgent
+- **Medium:** Primary indicators 4-5 (domain boundary, significant codebase) — beneficial but not urgent
 ## Exclusions
 Do NOT flag these as needing instructions:
 - `node_modules/`, `dist/`, `build/`, `.git/` — generated/external
-- Directories with <5 source files and no build config — too small to warrant overhead
+- Directories that merely follow their parent's conventions with nothing distinct to capture — regardless of size. If an agent working from the nearest ancestor instruction file would already do the right thing here, a nested file would only repeat the parent.
 - Test directories that follow the same patterns as their parent — covered by parent instructions
 - Directories already covered by a parent's scoped rules (e.g., Cursor rule with `globs: packages/cli/**`)
+**Anti-sprawl:** Do not recommend an instruction file for a directory just because it contains many files. File count alone is never a trigger. If those files all follow the parent's conventions — no distinct domain, no divergent patterns, nothing an agent would get wrong from the ancestor file — the directory is excluded no matter how large it is. The positive trigger is always distinct, non-obvious conventions worth capturing, not size.
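The revised criteria can be summarized as a single decision function. This is an illustrative sketch under stated assumptions. Each field of the hypothetical `DirAssessment` type stands for an analyst judgment rather than something computed automatically. The 10-file figure is the document's loose illustration, not a threshold. Checking the exclusions before the indicators is this sketch's own reading of the rules.

```typescript
// Hypothetical model of one directory's assessment; every field is a judgment call.
interface DirAssessment {
  ownBuildConfig: boolean; // Indicator 1
  differentStack: boolean; // Indicator 2
  publicApi: boolean; // Indicator 3
  boundedDomain: boolean; // Indicator 4
  sourceFileCount: number; // input to Indicator 5
  divergentConventions: boolean; // non-obvious conventions the ancestor file misses
  ancestorFileSuffices: boolean; // would the nearest ancestor file already be enough?
}

type Severity = "high" | "medium" | "none";

// Loose illustration of "non-trivial" from Indicator 5, not a precise cutoff.
const SIGNIFICANT_FILE_COUNT = 10;

function assessSeverity(a: DirAssessment): Severity {
  // Exclusion / anti-sprawl: nothing distinct to capture means no file, at any size.
  if (a.ancestorFileSuffices) return "none";
  // Indicators 1-3 map to High severity.
  if (a.ownBuildConfig || a.differentStack || a.publicApi) return "high";
  // Indicator 5 never fires on file count alone.
  const significantCodebase =
    a.sourceFileCount >= SIGNIFICANT_FILE_COUNT && a.divergentConventions;
  // Indicators 4-5 map to Medium severity, regardless of the parent's size.
  if (a.boundedDomain || significantCodebase) return "medium";
  return "none";
}
```

Note that the parent's size does not appear anywhere in the inputs. This matches the rule that parent size never gates whether a directory is evaluated.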

package/assets/skills/oat-project-implement/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: oat-project-implement
-version: 2.0.15
+version: 2.0.16
 description: Use when plan.md is ready for execution. Dispatches phase-level subagents with bounded fix loops; supports plan-declared parallel phase groups with worktree-isolated execution and ordered fan-in.
 argument-hint: '[--retry-limit <N>] [--dry-run]'
 disable-model-invocation: true
@@ -180,7 +180,7 @@ Selection rule:
    - `inherited` — host exposes the axis and the orchestrator deliberately defers to the parent session.
    - `not-applicable` — this host/API has no meaningful per-dispatch concept for that axis.
    - `host-auto` — exceptional; the host uses that axis internally but the orchestrator cannot read or pin it.
-4. In Codex, the model axis normally logs `inherited`; choose `effort_axis=selected:low|medium|high` from phase complexity and dispatch the matching effort-specific implementer role. Use `effort_axis=inherited` for the base implementer role.
+4. In Codex implementation/fix dispatch, the model axis normally logs `inherited`; choose `effort_axis=selected:low|medium|high` from phase complexity and dispatch the matching effort-specific implementer role. Treat `effort_axis=inherited` as the parent-session ceiling path, not a neutral default.
 5. In Claude Code, when subagent model selection is available, choose the lowest sufficient model on the model axis; the effort axis is `not-applicable` because Claude Code does not expose a separate `reasoning_effort` control for subagent dispatch.
 6. If a host uses model/effort internally but exposes neither axis to the orchestrator, log `model_axis=host-auto, effort_axis=host-auto` and include the rationale that would have informed selection.
 7. If confidence is low, choose a stronger available control before dispatch rather than knowingly underpowering the phase.
@@ -190,7 +190,8 @@ Selection rule:
 **Passing axis values to the host dispatch API.** The log shape and the actual dispatch call must agree: never log a `selected:<value>` axis without passing the corresponding parameter on the dispatch invocation, and never pass an explicit parameter that the log does not reflect.
 - **Claude Code implementer/fix dispatch:** when `model_axis=selected:<value>`, pass `model: "<value>"` on the Task tool call. When `model_axis=inherited`, omit the `model` parameter so Claude Code uses its own default. `effort_axis=not-applicable` for both cases because the Task tool exposes no per-dispatch `reasoning_effort` control.
-- **Codex implementer/fix dispatch:** when `effort_axis=selected:low|medium|high`, dispatch the matching configured role: `agent_type: "oat-phase-implementer-low"`, `agent_type: "oat-phase-implementer-medium"`, or `agent_type: "oat-phase-implementer-high"`. Those roles set `model_reasoning_effort` in `.codex/agents/*.toml`. Use the base `agent_type: "oat-phase-implementer"` only for `effort_axis=inherited`. Do not use top-level per-call `reasoning_effort` as the standard OAT selected-effort path; dogfooding showed that path can be inconsistent in some Codex runs.
+- **Codex implementer/fix dispatch:** default to a selected effort. Classify phase complexity, choose the lowest sufficient `effort_axis=selected:low|medium|high`, and dispatch the matching configured role: `agent_type: "oat-phase-implementer-low"`, `agent_type: "oat-phase-implementer-medium"`, or `agent_type: "oat-phase-implementer-high"`. Those roles set `model_reasoning_effort` in `.codex/agents/*.toml`. Use the base `agent_type: "oat-phase-implementer"` only when `effort_axis=inherited` is intentionally selected for an allowed reason below. Do not use top-level per-call `reasoning_effort` as the standard OAT selected-effort path; dogfooding showed that path can be inconsistent in some Codex runs.
+- **Codex inherited effort is the ceiling path:** because inherited effort follows the parent/orchestrator session and may be `xhigh`, do not use `effort_axis=inherited` merely because it is valid, convenient, or avoids choosing a selected effort. Use inherited effort for Codex implementer/fix dispatch only when the user explicitly requested inherited/default parent controls; a valid Dispatch Profile row explicitly requests inherited/default controls; the phase requires the parent-session ceiling and `selected:high` is insufficient; or the selected-effort roles are unavailable or fail to resolve. The dispatch rationale must cite the allowed reason. For ceiling-needed dispatch, explain why `selected:high` is insufficient.
 - **Codex xhigh:** do not create or select an `xhigh` implementer variant. Use `xhigh` only when the parent/orchestrator session is already xhigh and therefore `effort_axis=inherited` on the base role is the correct representation. If a phase appears to require xhigh while the parent is not xhigh, choose `selected:high` only if high is sufficient; otherwise split/revise the phase or stop for user re-invocation at xhigh.
 - **Claude Code `opus`:** unlike Codex `xhigh`, `opus` is directly selectable. Claude Code exposes `opus` through the Task tool's `model` parameter, so OAT may select it when available (`model_axis=selected:opus`) — including as a terminal escalation step. There is no `opus` inherited-only restriction; the `xhigh` rule above is specific to Codex's effort-variant mechanism, not a general "never select the maximum tier" rule.
 - **Reviewer dispatch on either host:** use `model_axis=inherited` by default. For `effort_axis`: use `inherited` on hosts that expose an effort axis (such as Codex); use `not-applicable` on hosts that do not expose a meaningful effort axis (such as Claude Code). Omit `model` and, on Codex, `reasoning_effort` overrides entirely.
@@ -231,6 +232,8 @@ Dispatch target: {host-specific subagent/role/tool target}
 Rationale: {short rationale grounded in phase scope}
 ```
+For Codex implementation/fix dispatches, the rationale must include the phase complexity class that drove the selected effort (for example, mechanical, normal multi-file, or broad/high-risk). If `Effort axis: inherited`, the rationale must also cite the allowed reason for using the parent-session ceiling instead of `selected:low|medium|high`.
 Examples:
 ```text
@@ -563,10 +566,11 @@ For each phase `pNN` in the plan (or each phase in the current parallel group),
 2. Perform a pre-dispatch assertion against the host invocation parameters. The Phase Scope fields are audit/context fields; selected axes must also be represented in the actual host dispatch call.
    - Codex implementer/fix dispatch:
+     - Before building the `spawn_agent` argument map, classify the phase complexity and choose the lowest sufficient selected effort (`low`, `medium`, or `high`) when the matching effort-specific role is available.
      - Build the `spawn_agent` argument map before logging the dispatch. If `effort_axis=selected:low|medium|high`, the argument map MUST use the matching `agent_type`: `"oat-phase-implementer-low"`, `"oat-phase-implementer-medium"`, or `"oat-phase-implementer-high"`. Then derive the `OAT Dispatch:` block `Effort axis:` field from that same argument map.
      - Example selected low payload shape: `agent_type: "oat-phase-implementer-low"` and a Phase Scope message containing `effort_axis: selected:low`.
      - Immediately after spawning, compare the returned Codex status line with the selected effort before waiting on the agent. If the spawned status reports a different effort than the selected value (for example, the log says `effort_axis=selected:medium` but the spawn result reports `gpt-5.5 high`), treat this as an orchestration deviation. Stop, record the deviation in `implementation.md`, and redispatch with corrected parameters before continuing. Do not use work from the mismatched dispatch.
-     - If `effort_axis=inherited`, use base `agent_type: "oat-phase-implementer"` and omit `reasoning_effort`.
+     - If `effort_axis=inherited`, use base `agent_type: "oat-phase-implementer"` and omit `reasoning_effort`. This is the parent-session ceiling path, so the dispatch rationale MUST cite the explicit user/Dispatch Profile override, explain why `selected:high` is insufficient, or record that the selected-effort roles are unavailable or failed to resolve.
    - Claude Code implementer/fix dispatch:
      - If `model_axis=selected:<value>`, the Task tool call MUST include `model: "<value>"`.
      - If `model_axis=inherited`, omit `model`.
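The pre-dispatch assertion can be sketched as a few small checks: build the argument map first, derive the logged axis from it, then compare against the spawn status. Several details are assumptions. The `message` field name is hypothetical. The status line is assumed to end with an effort token, as in the `gpt-5.5 high` example, and the real Codex format may differ. All function names are illustrative.

```typescript
type Effort = "low" | "medium" | "high";

// Assumed shape of the spawn_agent argument map; `message` is a hypothetical field name.
interface SpawnArgs {
  agent_type: string;
  message: string;
}

// Build the argument map first so the logged axis is derived from the real call.
function buildSpawnArgs(effort: Effort, phaseScope: string): SpawnArgs {
  return {
    agent_type: `oat-phase-implementer-${effort}`,
    message: `${phaseScope}\neffort_axis: selected:${effort}`,
  };
}

// Derive the `Effort axis:` log field from the argument map, never independently.
function effortAxisFromArgs(args: SpawnArgs): string {
  const match = /^oat-phase-implementer-(low|medium|high)$/.exec(args.agent_type);
  return match ? `selected:${match[1]}` : "inherited";
}

// Compare the spawned agent's reported effort with the selected one before waiting.
function assertSpawnEffort(args: SpawnArgs, statusLine: string): void {
  const expected = effortAxisFromArgs(args);
  if (!expected.startsWith("selected:")) return; // inherited: nothing to compare
  const reported = statusLine.trim().split(/\s+/).pop();
  if (reported !== expected.slice("selected:".length)) {
    throw new Error(`orchestration deviation: logged ${expected}, spawn reported ${reported}; redispatch`);
  }
}

// Claude Code: pass `model` only when the model axis is selected; omit it when inherited.
function claudeTaskModelParam(modelAxis: string): { model?: string } {
  return modelAxis.startsWith("selected:") ? { model: modelAxis.slice("selected:".length) } : {};
}
```

Throwing on a mismatch stands in for the documented response: stop, record the deviation in `implementation.md`, and redispatch without using the mismatched work.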

package/assets/skills/oat-worktree-bootstrap-auto/SKILL.md CHANGED

@@ -1,9 +1,9 @@
 ---
 name: oat-worktree-bootstrap-auto
-version: 1.3.0
+version: 1.4.0
 description: Use when an orchestrator/subagent needs autonomous worktree bootstrap. Non-interactive companion to oat-worktree-bootstrap.
 argument-hint: '<branch-name> [--base <ref>] [--path <root>] [--baseline-policy <strict|allow-failing>]'
-disable-model-invocation: true
+disable-model-invocation: false
 user-invocable: false
 allowed-tools: Read, Write, Bash, Glob, Grep
 ---
@@ -12,6 +12,8 @@ allowed-tools: Read, Write, Bash, Glob, Grep
 Non-interactive worktree bootstrap for orchestrator and subagent execution flows. Creates or reuses a worktree, runs baseline checks, and reports structured status — all without user prompts.
+This skill is **model-invocable** (`disable-model-invocation: false`): orchestrators such as `oat-project-implement` invoke it programmatically when a parallel phase group needs autonomous worktree bootstrap. It is **not** user-invocable (`user-invocable: false`) — it has no interactive surface and is never offered as a slash command.
 > ⚠️ **When not to substitute.** This skill is the **only** supported mechanism for orchestrator-driven worktree creation in OAT skills. Host-native isolation primitives — Claude Code's `Agent({ isolation: "worktree" })`, Cursor's worktree-isolated agent invocations, and equivalents in other hosts — are **not** substitutes. They may use the primary repo's checkout (often `main`) as the base regardless of the caller's current branch, silently producing a worktree at the wrong base. OAT orchestrators dispatching mid-run from a feature branch MUST go through this skill with an explicit `--base` so the resulting worktree contains the orchestrator's prior commits.
 ## Relationship to oat-worktree-bootstrap
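Orchestrators must always pass an explicit `--base`, so a caller might assemble the skill's documented `argument-hint` with a guard like the one below. `bootstrapArgs` and its option names are hypothetical. Only the positional branch name and the three flags come from the skill's frontmatter.

```typescript
type BaselinePolicy = "strict" | "allow-failing";

interface BootstrapOptions {
  base: string; // required here: orchestrator dispatch must pin the base ref
  path?: string;
  baselinePolicy?: BaselinePolicy;
}

// Hypothetical helper for the documented hint:
// <branch-name> [--base <ref>] [--path <root>] [--baseline-policy <strict|allow-failing>]
function bootstrapArgs(branch: string, opts: BootstrapOptions): string[] {
  if (!branch) throw new Error("branch name is required");
  if (!opts.base) throw new Error("orchestrator dispatch must pass an explicit --base");
  const args = [branch, "--base", opts.base];
  if (opts.path) args.push("--path", opts.path);
  if (opts.baselinePolicy) args.push("--baseline-policy", opts.baselinePolicy);
  return args;
}
```

Making `base` mandatory in the caller's type is one way to avoid the failure mode described above, where a worktree is silently created from the primary checkout instead of the orchestrator's feature branch.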

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@open-agent-toolkit/cli",
-  "version": "0.1.3",
+  "version": "0.1.5",
   "private": false,
   "description": "Open Agent Toolkit CLI",
   "homepage": "https://github.com/voxmedia/open-agent-toolkit/tree/main/packages/cli",
@@ -33,7 +33,7 @@
     "ora": "^9.0.0",
     "yaml": "2.8.2",
     "zod": "^3.25.76",
-    "@open-agent-toolkit/control-plane": "0.1.3"
+    "@open-agent-toolkit/control-plane": "0.1.5"
   },
   "devDependencies": {
     "@types/node": "^22.10.0",