@maestrofrontier/frontier 1.4.1 → 1.4.3

package/README.md CHANGED
@@ -25,7 +25,7 @@
25
25
  <sub>13 fixture tasks &middot; 123 valid A/B runs &middot; 11 voids excluded &amp; re-run &middot; 6 hooks, all tested &middot; ~8 KB always-on kernel &middot; 2-line plugin install</sub>
26
26
  </p>
27
27
 
28
- > **⚡ Install — run the one line for your tool, inside that tool.** Each installs only that tool's setup (the portable `AGENTS.md` doctrine, that tool's adapter, `docs/orchestration.md`, the Frontier engine, and that tool's `/frontier` command) — **append-only; it never overwrites or deletes your files.**
28
+ > **Install — run the one line for your tool, inside that tool.** Each installs only that tool's setup (the portable `AGENTS.md` doctrine, that tool's adapter, `docs/orchestration.md`, the Frontier engine, and that tool's `/frontier` command) — **append-only; it never overwrites or deletes your files.**
29
29
 
30
30
  **Claude Code / Desktop** — native plugin (enforcement hooks, `/maestro:*` commands, skills, status line, Frontier auto-run):
31
31
 
@@ -68,22 +68,6 @@ members — fusing the CLIs you already run buys frontier-tier results. It
68
68
  is the project's new default identity; the doctrine, hooks, skills, and
69
69
  benchmarks are unchanged; the discipline layer is its foundation.
70
70
 
71
- <p align="center">
72
- <img src="assets/frontier-fusion-benchmark.svg" alt="Bar chart: fusion panels versus solo models on a 100-task benchmark (93 tasks scored). Every fusion panel outscores its individual member models; an Opus 4.8 self-fusion (double Opus) reaches ~65.5%, matching Claude Fable 5 solo, while the strongest fusion, Fable 5 + GPT-5.5, tops the field at ~69% and solo Gemini 3.1 Pro and Gemini 3 Flash trail near 43-45%." width="880">
73
- </p>
74
-
75
- <p align="center">
76
- <sub>Fusion vs solo on a 100-task suite (93 scored). Every fusion panel beats its own member models, and the strongest fusion — Fable 5 + GPT-5.5 — leads the field. This is the fusion-vs-solo axis; the in-repo <a href="benchmarks/">A/B harness</a> measures a different one (Maestro doctrine ON vs OFF).</sub>
77
- </p>
78
-
79
- <p align="center">
80
- <img src="assets/frontier-pipeline.svg" alt="Maestro Frontier fusion pipeline: prompt fans out to a parallel panel of local CLIs, a chosen judge model produces structured analysis (consensus, contradictions, unique insights, blind spots), then a chosen synthesizer model writes a grounded response" width="900">
81
- </p>
82
-
83
- The pipeline above is the engine's whole architecture: fan out to a
84
- parallel panel, a judge model that compares the answers (it does not
85
- merge them), then a grounded synthesis.
86
-
87
71
  It ships with the plugin and is driven by `/maestro:frontier`. Three
88
72
  modes, switched at will, **`off` by default** so installing or
89
73
  upgrading changes nothing until you opt in. **Arming it — `single` or
@@ -105,10 +89,6 @@ the synthesized answer. `off` is the disable path.
105
89
  /maestro:frontier off # disable auto-run; back to normal Maestro
106
90
  ```
107
91
 
108
- <p align="center">
109
- <img src="assets/frontier-presets.svg" width="820" alt="Maestro Frontier fusion presets reference card">
110
- </p>
111
-
112
92
  Presets define the panel; the judge and synthesizer default to Opus 4.8
113
93
  (`claude -p`), and you override either with `--judge` / `--synth`:
114
94
 
@@ -135,32 +115,23 @@ and verified end-to-end on real runs of `single` mode and the
135
115
  `opus-gpt`, `opus-duo`, and `frontier-trio` presets**. The `gpt-duo`
136
116
  preset and `--judge`/`--synth` selection share that same code path and
137
117
  are unit-tested, but not yet live-run. The quality *lift* of local fusion
138
- is **measured, not asserted**: on a 100-task suite (93 scored, chart
139
- above) every fusion panel outscored its own member models, with the
140
- strongest fusion leading the field. That fusion-vs-solo result is a
141
- separate axis from the in-repo A/B harness, which measures Maestro
142
- doctrine ON vs OFF; numbers are never mixed across the two.
143
- Operational caveats recorded in the risk burndown: headless web access
144
- differs per CLI (Codex confirmed live; Claude and Gemini are gated
145
- `webTools:false` in this build), and each cold `claude -p`
146
- panel/judge/synth call is non-trivial in cost; use small prompts, and
147
- prefer `opus-gpt` to bound spend. The budget cap is opt-in
148
- (`tokenBudget`, default disabled). The engine is zero-dependency
118
+ is **measured, not asserted**: on a 100-task suite (93 scored) every
119
+ fusion panel outscored its own member models, with the strongest fusion
120
+ leading the field. That fusion-vs-solo result is a separate axis from the
121
+ in-repo A/B harness, which measures Maestro doctrine ON vs OFF; numbers
122
+ are never mixed across the two.
123
+ Operational caveats: headless web access differs per CLI (Codex confirmed
124
+ live; Claude and Gemini are gated `webTools:false` in this build), and
125
+ each cold `claude -p` panel/judge/synth call is non-trivial in cost; use
126
+ small prompts, and prefer `opus-gpt` to bound spend. The budget cap is
127
+ opt-in (`tokenBudget`, default disabled). The engine is zero-dependency
149
128
  CommonJS under [`frontier/`](frontier/); each CLI is resolved from your
150
129
  `PATH` (`claude`, `codex`, `gemini`). Binary overrides and the full
151
130
  operational reference are in
152
131
  [`commands/frontier.md`](commands/frontier.md#binary-overrides).
153
132
 
154
- <p align="center">
155
- <img src="assets/frontier-stack.svg" alt="Maestro Frontier fusion engine sitting on the discipline layer foundation; an amber data-flow connects the two" width="820">
156
- </p>
157
-
158
133
  ## What You Get
159
134
 
160
- <p align="center">
161
- <img src="assets/what-you-get.svg" width="860" alt="What Maestro gives you: five capabilities on a dark card">
162
- </p>
163
-
164
135
  Frontier is the headline; the discipline layer beneath it is what runs on
165
136
  every task. Drop two markdown files into your repo and your agent gains
166
137
  five things:
@@ -171,141 +142,27 @@ five things:
171
142
  4. **Multi-agent only when it pays.** A counted Decision Gate routes work single-agent by default and demands an explicit verdict line before the first edit; orchestration stays behind it.
172
143
  5. **Receipts.** A reproducible A/B benchmark harness ships in-repo, with our own retractions and nulls. Rerun every number yourself.
173
144
 
174
- <p align="center">
175
- <img src="assets/discipline-demo.svg" alt="Two terminal close-outs quoted verbatim from benchmark streams: a baseline agent declares all done although no check ran, while the Maestro run opens with a counted GATE verdict line and exits with the honest status UNVERIFIED, no type-checker or linter configured in this project" width="860">
176
- </p>
177
-
178
145
  The price, measured rather than implied: ON spends about 10% more than a
179
146
  clean agent on a 10-module refactor and 38% more on a 16-file feature
180
- (n=9 medians, t08/t12 below); you are buying verification and
181
- auditability, not speed. The overhead is behavioral, not byte-weight: a
182
- kernel rewrite cut always-on bytes 41% and fixed status reporting
183
- (12/12 vs 3/30) with no measurable cost change. The premium earns its
184
- keep on unattended work (overnight loops, scheduled runs, CI agents)
185
- where nobody reads the 3am transcript and the close-out claim is all you
186
- have.
187
-
188
- That regime is not hypothetical: Maestro runs under its own rules. The
189
- four most recent maintenance loops ran unattended on the S10 long-horizon
190
- doctrine, with checkpoint artifacts and pre-declared budget ceilings, and
191
- together made 75 benchmark runs for $30.12 against $47 in caps, produced 0
192
- voided runs, and shipped the retractions you can read below with no human
193
- in the loop. Output-style compression (terse-reply tools) is orthogonal
194
- and worth ~1% of agentic spend; Maestro will not grow a kernel toggle for
195
- it. The whole design rests on [peer-reviewed
196
- research](https://marklaursen.com/blog/why-your-multi-agent-ai-system-keeps-failing)
197
- showing **79% of multi-agent failures come from coordination, not model
198
- capability**, and that **three optimized agents outperform seven**;
199
- adding agents usually makes things worse, so Maestro makes the single
200
- agent you already have rigorous by default and holds multi-agent
201
- coordination behind a counted gate.
202
-
203
- ## The Discipline Layer It Runs On
204
-
205
- Frontier runs on this. It is the part this repo actually benchmarks: a
206
- verification-first discipline layer that applies to every task, fusion or
207
- not.
208
-
209
- <p align="center">
210
- <img src="assets/maestro-flow.svg" alt="Maestro orchestration flow: task through the S1 decision gate to either a single agent or the planner, specialist group, and staff engineer pipeline, converging on verified delivery" width="780">
211
- </p>
212
-
213
- - **Universal Rules**: verification gates, status vocabulary, surgical scope, edit safety, context economy; applied to every task in both modes, including one-line fixes.
214
- - **Decision Gate**: counts the work and emits a verdict line (`GATE: files=<n> concerns=<m> -> ...`) before the first edit. Most tasks stay single-agent.
215
- - **Planner**: decomposes complex tasks into parallel and sequential subtasks with boundaries and acceptance criteria.
216
- - **Specialists**: execute focused subtasks with scoped context, hard-capped at 4 per parallel group (DyLAN and agent-scaling findings).
217
- - **Cross-Talk Routing**: detects when one specialist's output affects another and routes the minimum necessary context.
218
- - **Staff Engineer Review**: adversarial final verification for contradictions, breakage, and architectural drift.
219
- - **Long-Horizon Operation**: checkpoint artifacts, self-pacing, and explicit end conditions for recurring or multi-session runs; exits graded by a fresh-context verifier, never self-assessed.
220
-
221
- <p align="center">
222
- <img src="assets/loop-lifecycle.svg" alt="Loop engineering lifecycle: read checkpoint, re-anchor goal, execute phase, verify with a fresh-context verifier grading the exit, write checkpoint distilling findings into rules, event or wakeup, exiting to a final report on success or hard cap" width="440">
223
- </p>
224
-
225
- **How it works:**
226
-
227
- 1. You give your AI coding agent a task as normal.
228
- 2. The Decision Gate counts the work and emits a verdict line; most tasks run single-agent with no coordination overhead.
229
- 3. Single-agent work follows the Universal Rules: scoped edits, verification before any completion claim, an honest status token at the end.
230
- 4. Work that crosses the gate's thresholds goes Planner -> Specialists -> adversarial Staff Engineer review.
231
- 5. Long or recurring runs follow the Long-Horizon rules: checkpoint artifacts, explicit end conditions, iteration caps.
232
- 6. You get a result with a verification status you can act on, not a vibe.
233
-
234
- The specialist manifest (S3) and cross-talk handoff packet (S4/S6) also ship as machine-readable JSON Schemas in [`schemas/`](schemas/) for tooling. The prose doctrine remains the source of truth.
235
-
236
- ## Quick Start
237
-
238
- Maestro installs as plain markdown files your AI agent reads on startup. No packages, no build steps, no SDK. Download the files for your runtime into the project root and your agent picks them up automatically.
239
-
240
- | Runtime | Files to add |
241
- |---|---|
242
- | Claude Code | [`AGENTS.md`](AGENTS.md) + [`CLAUDE.md`](CLAUDE.md) |
243
- | Gemini | [`AGENTS.md`](AGENTS.md) + [`GEMINI.md`](GEMINI.md) |
244
- | Codex | [`AGENTS.md`](AGENTS.md); see [Maestro on Codex](docs/codex.md) |
245
- | Cursor | [`.cursorrules`](.cursorrules) |
246
- | GitHub Copilot | [`AGENTS.md`](AGENTS.md); nearest `AGENTS.md` in the directory tree wins, and a root `CLAUDE.md` or `GEMINI.md` also works |
247
- | Cline | [`AGENTS.md`](AGENTS.md); native support (also auto-detects `.cursorrules`) |
248
- | Windsurf | [`AGENTS.md`](AGENTS.md); root file is always-on, processed by the Rules engine |
249
-
250
- > **Already have a `CLAUDE.md`, `AGENTS.md`, or `.cursorrules`?** Don't overwrite them, you'll lose your project context. The per-runtime steps below show how to merge Maestro into an existing setup.
251
-
252
- ### Claude Code
253
-
254
- **Option A, plugin (hooks + context-bar command, one step).** Maestro
255
- is an installable Claude Code plugin; the repo is its own marketplace:
256
-
257
- ```text
258
- /plugin marketplace add mbanderas/maestro
259
- /plugin install maestro@maestro
260
- ```
261
-
262
- The plugin auto-wires all five enforcement hooks (subagent guard, loop
263
- guard, phase-scope guard, gate reminder, opt-in gate telemetry) and the
264
- `/maestro:context-bar` command. Two things it cannot do for you:
265
- the doctrine files (`AGENTS.md`/`CLAUDE.md`) still go in your project
266
- root (Option B), and the status line script still needs a one-line
267
- `statusLine` settings entry (see [`docs/context-bar.md`](docs/context-bar.md));
268
- plugins cannot set the main status line.
269
-
270
- **Option B, plain files (doctrine only, zero machinery):**
271
-
272
- ```bash
273
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/AGENTS.md
274
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/CLAUDE.md
275
- ```
276
-
277
- Claude Code reads `CLAUDE.md` on startup. The `@AGENTS.md` import inside it pulls in the orchestration doctrine. Your next task routes through Maestro's Decision Gate.
278
-
279
- **Already have a `CLAUDE.md`?** Don't overwrite it. Instead, download just `AGENTS.md` and add `@AGENTS.md` to the top of your existing `CLAUDE.md` to import the doctrine. You can optionally merge the runtime rules from Maestro's [`CLAUDE.md`](CLAUDE.md) into yours.
280
-
281
- **Optional:** Maestro also ships a context-window progress bar for the Claude Code status line; see [`docs/context-bar.md`](docs/context-bar.md).
282
-
283
- ### Gemini
284
-
285
- ```bash
286
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/AGENTS.md
287
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/GEMINI.md
288
- ```
289
-
290
- **Already have a `GEMINI.md`?** Don't overwrite it. Download just `AGENTS.md` and add `@AGENTS.md` to the top of your existing `GEMINI.md`. You can optionally merge the runtime rules from Maestro's [`GEMINI.md`](GEMINI.md) into yours.
291
-
292
- ### Codex
293
-
294
- ```bash
295
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/AGENTS.md
296
- ```
297
-
298
- Codex reads `AGENTS.md` directly; no adapter file needed.
299
-
300
- **Already have an `AGENTS.md`?** Don't overwrite it: that file likely contains your project context. Instead, append the contents of Maestro's [`AGENTS.md`](AGENTS.md) to your existing file, or paste it into a section of your `AGENTS.md` so Codex reads both your project context and the orchestration doctrine.
301
-
302
- ### Cursor
303
-
304
- ```bash
305
- curl -O https://raw.githubusercontent.com/mbanderas/maestro/main/.cursorrules
306
- ```
307
-
308
- **Already have a `.cursorrules`?** Don't overwrite it. Cursor does not support file imports, so append the contents of Maestro's [`.cursorrules`](.cursorrules) to your existing file.
147
+ (n=9 medians, t08/t12); you are buying verification and auditability, not
148
+ speed. The premium earns its keep on unattended work (overnight loops,
149
+ scheduled runs, CI agents) where nobody reads the 3am transcript and the
150
+ close-out claim is all you have.
151
+
152
+ ## Discipline, Benchmarks, and Research
153
+
154
+ The discipline layer (verification, scope, honest status) applies to
155
+ every task, fusion or not. The full orchestration protocol lives in
156
+ [`docs/orchestration.md`](docs/orchestration.md). Benchmark data,
157
+ retractions, and methodology including the honest reading that Maestro
158
+ ON has never beaten OFF on success rate in any measured cell and that the
159
+ early efficiency story did not survive replication are in
160
+ [`docs/benchmarks.md`](docs/benchmarks.md) and
161
+ [`benchmarks/README.md`](benchmarks/README.md). The architecture is
162
+ grounded in 700+ sources; the key driver is that
163
+ [79% of multi-agent failures come from coordination, not model
164
+ capability](https://marklaursen.com/blog/why-your-multi-agent-ai-system-keeps-failing),
165
+ and that three optimized agents outperform seven.
309
166
 
310
167
  ## Runtime Adapters
311
168
 
@@ -319,6 +176,8 @@ Maestro separates **portable orchestration doctrine** from **runtime-specific ad
319
176
  | `.cursorrules` | Cursor adapter | Kernel copy (Cursor does not support imports); full S2-S6 in docs/orchestration.md |
320
177
  | [`docs/codex.md`](docs/codex.md) | Codex guide | AGENTS.md precedence and 32 KiB cap, Codex subagent mapping, Automations long-horizon mapping (Codex reads `AGENTS.md` natively) |
321
178
 
179
+ Maestro's tools run on **both Claude Code and Codex** — in Claude Code as `/maestro:*` slash commands, and in Codex as installable skills, with the portable `node settings/cli.cjs` and `maestro frontier ...` CLIs working on any other agent too. The Codex skills (`frontier`, `terse`, `settings`, `update`) are installed to `.agents/skills/<name>/SKILL.md` by `maestro install --target codex`. When Frontier mode is on, the `frontier` skill leads each Codex reply with `Maestro Frontier ON (<label>)` (`single · <model>` or `fusion · <preset>`) — the Codex analog of Claude Code's armed Frontier indicator; run `maestro frontier status --scope codex` to check. This indicator is Codex-scoped only.
180
+
322
181
  GitHub Copilot, Cline, and Windsurf read `AGENTS.md` directly, so the portable core works there with no adapter. Maestro's always-on kernel (`AGENTS.md`) is ~8 KB, under Windsurf's 12,000-character limit and roughly a quarter of Codex's 32 KiB budget; the full multi-agent protocol loads on demand from `docs/orchestration.md`.
323
182
 
324
183
  **Subagents vs Agent Teams (Claude Code):** Maestro's `CLAUDE.md` adapter
@@ -337,11 +196,11 @@ Optional Claude Code machinery; full install steps in the linked docs.
337
196
  - **Hook Pack**: five more zero-dependency hooks (doctrine guard, loop guard, phase-scope, gate reminder, opt-in gate telemetry) enforcing the rest of the doctrine. [`docs/hooks.md`](docs/hooks.md)
338
197
  - **Context Bar**: a status-line context-window progress bar that shifts green to amber to red and detects the model's window (including the 1M Opus tier). [`docs/context-bar.md`](docs/context-bar.md)
339
198
  - **Terse Mode + Compress**: opt-in output-token reduction (`/maestro:terse`) and a memory-file compressor (`/maestro:compress`), adapted from the MIT-licensed Caveman plugin. [`docs/context-bar.md`](docs/context-bar.md)
340
- - **Settings**: `/maestro:settings` changes any toggle in one line (`set terse off`, `frontier fusion opus-gpt`, `help`) or opens a keyboard picker with no arguments (the `AskUserQuestion` selector, not the built-in `/model` widget, which plugins cannot render), plus a portable `node settings/cli.cjs status|list|help|set` for Codex and any other CLI, over the terse, frontier, and context-bar toggles. [`docs/settings.md`](docs/settings.md)
199
+ - **Settings**: `/maestro:settings` changes any toggle in one line (`set terse off`, `frontier fusion opus-gpt`, `help`) or opens a keyboard picker with no arguments, plus a portable `node settings/cli.cjs status|list|help|set` for Codex and any other CLI. [`docs/settings.md`](docs/settings.md)
341
200
 
342
201
  ## Commands & Settings
343
202
 
344
- Every Maestro slash command in Claude Code is namespaced `/maestro:<name>`. On Codex and other CLIs without slash commands, the same actions run through the scripts noted below.
203
+ Every Maestro slash command in Claude Code is namespaced `/maestro:<name>`. The same tools run on Codex as installed skills (`.agents/skills/<name>/SKILL.md`, via `maestro install --target codex`); on any CLI the same actions also run through the portable scripts noted below.
345
204
 
346
205
  | Command | What it does | Usage |
347
206
  |---|---|---|
@@ -358,7 +217,7 @@ Every Maestro slash command in Claude Code is namespaced `/maestro:<name>`. On C
358
217
  | Toggle | Values | What it controls |
359
218
  |---|---|---|
360
219
  | `terse` | `off`, `lite`, `full`, `ultra` | Output-token reduction. Shows an amber level badge (`ULTRA`) on the status bar. |
361
- | `frontier` | `off`; `single:` `opus` / `gpt-5.5` / `gemini`; `fusion:` `opus-duo` / `opus-gpt` / `gpt-duo` / `frontier-trio` / `custom`, each with optional `--judge` / `--synth` | The local fusion engine. When armed it auto-runs on every prompt. The blue `ƒ` panel badge means auto-run is on: `ƒO+C`, `ƒO+C+G`, `ƒ✦3` (`O`=Opus, `C`=ChatGPT/GPT-5.5, `G`=Gemini). |
220
+ | `frontier` | `off`; `single:` `opus` / `gpt-5.5` / `gemini`; `fusion:` `opus-duo` / `opus-gpt` / `gpt-duo` / `frontier-trio` / `custom`, each with optional `--judge` / `--synth` | The local fusion engine. When armed it auto-runs on every prompt. The blue `f` panel badge means auto-run is on: `fO+C`, `fO+C+G`, `f*3` (`O`=Opus, `C`=ChatGPT/GPT-5.5, `G`=Gemini). |
362
221
  | `context-bar` | `on`, `off` | The status-line context-window progress bar. |
363
222
 
364
223
  Portable everywhere, Codex included: `node settings/cli.cjs status | list | help | set <key> <value>` (frontier also takes `--judge`, `--synth`, `--models a,b,c`). Full references: [`docs/settings.md`](docs/settings.md) and [`docs/context-bar.md`](docs/context-bar.md).
@@ -382,109 +241,18 @@ It can't run the reload for you (a slash command can't invoke another slash comm
382
241
  /reload-plugins
383
242
  ```
384
243
 
385
- `/reload-plugins` applies the update in the running session; if Claude Code warns that a restart is required, restart it. Non-interactive equivalent of the pull: `claude plugin marketplace update maestro`. You can also enable marketplace auto-update so the local clone refreshes automatically — check Claude Code's plugin settings.
244
+ `/reload-plugins` applies the update in the running session; if Claude Code warns that a restart is required, restart it. Non-interactive equivalent of the pull: `claude plugin marketplace update maestro`.
386
245
 
387
246
  > **Note:** There is no `/plugin update <name>` command in Claude Code. Use `/maestro:update`, or `/plugin marketplace update maestro` + `/reload-plugins`.
388
247
 
389
248
  ### Codex / Cursor (portable installs, no plugin system)
390
249
 
391
- Run `/update` if your integration file exposes it, or update manually:
392
-
393
250
  - **Git clone:** `git pull` inside the Maestro clone directory.
394
- - **Downloaded copy:** re-run `npx github:mbanderas/maestro install --target auto --project .` from the project root, or re-download the tarball and re-copy `frontier/`, `bin/maestro.cjs`, plus your integration command file (`integrations/codex/prompts/frontier.md` or `integrations/cursor/commands/frontier.md`) from the latest `main`.
251
+ - **Downloaded copy:** re-run `npx github:mbanderas/maestro install --target auto --project .` from the project root, or re-download the tarball and re-copy `frontier/`, `bin/maestro.cjs`, plus your integration command file from the latest `main`.
395
252
 
396
253
  ### Gemini / other CLIs
397
254
 
398
- The same portable manual steps apply: re-pull or re-copy `frontier/` and the relevant integration file from `main`. If your CLI supports custom commands and you have a `/update` wired, run that instead.
399
-
400
- ## When to Use Maestro
401
-
402
- The discipline layer (verification, scope, honest status) applies to every task from a one-line fix upward. The orchestration path helps most on tasks that are genuinely too complex for one pass (large refactors, multi-file features), parallelizable (independent subtasks), or benefit from adversarial review. It is deliberately avoided where a single agent already handles the work, the work is purely sequential reasoning, or the task touches fewer than ~10 files; the research shows coordination overhead makes simple tasks worse, not better.
403
-
404
- ### Why Not CrewAI / LangGraph / AutoGen?
405
-
406
- | | Maestro | CrewAI / LangGraph / AutoGen |
407
- |---|---|---|
408
- | **Setup** | Two lines (`/plugin install`) or copy a folder, done | Install packages, write Python/TS, configure agents |
409
- | **Dependencies** | Zero | Framework + SDK + runtime |
410
- | **Where it runs** | Inside your existing AI coding agent | Standalone process you build and deploy |
411
- | **Agent count** | Hard cap at 4 parallel (research-backed) | Unlimited (user decides) |
412
- | **Default behavior** | Single-agent unless complexity warrants multi | Always multi-agent |
413
- | **Design philosophy** | Fewer agents, structured coordination | More agents, flexible topologies |
414
-
415
- Maestro is not a framework. It's a discipline-and-orchestration layer for AI coding agents that already exist: you copy a couple of files and your existing agent gains verification rigor, scope discipline, and gated multi-agent capabilities. If you need a standalone multi-agent application with custom tools, APIs, and deployment pipelines, use a framework.
416
-
417
- ## Benchmarks
418
-
419
- Maestro ships a reproducible A/B harness in [`benchmarks/`](benchmarks/):
420
- thirteen fixture tasks, a runner for Windows and macOS/Linux (no deps;
421
- the macOS/Linux script needs `jq`), and a deterministic `verify.cjs`
422
- checker per task. Each task runs Maestro ON (doctrine files present) vs
423
- OFF (absent) under an isolated `CLAUDE_CONFIG_DIR`, with the checker
424
- **hidden from the agent until the run ends** (visible oracles inflate
425
- pass rates 20-60%, arXiv:2602.10975).
426
-
427
- <p align="center">
428
- <img src="assets/bench-cells.svg" alt="Bar chart of median cost per task-run for t07 to t10 and t12 with doctrine off, on, and core variant; pass rates per cell shown beneath each group" width="860">
429
- </p>
430
-
431
- Honest reading: **Maestro ON has never beaten OFF on success rate in any
432
- measured cell**; at n=9, t09 is exactly tied (8/9 each) and t08 and t12
433
- are 9/9 both modes. The early efficiency story did not survive
434
- replication: the t12 and t08 n=3 wall, turn, and token gains were all
435
- retracted at n=9, and the only unreplicated positives left standing
436
- (Gemini t08, the t11 pilot) are flagged as such. The full n=3 -> n=9
437
- reversal arithmetic is in
438
- [`docs/benchmarks.md`](docs/benchmarks.md#retractions). On small or linear
439
- tasks the doctrine is pure overhead (t10: +78% median wall). t09 separates
440
- *models* more than modes: gemini-3.1-pro-preview passes 1 of 6 valid runs,
441
- gpt-5.4-mini 4/4, sonnet ~8-in-9. Small samples throughout; no
442
- significance claims.
443
-
444
- The one new directional signal is on a different axis. **t14**, a
445
- checker-less trap task with a non-obvious correctness property, holds both
446
- arms at **6/6 pass** while the honesty metric `claim_consistent` runs
447
- **OFF 1/6 vs ON 4/6** and `target_smoke_tested` **OFF 0/6 vs ON 2/6**, at
448
- ON median cost **$0.1930** vs OFF **$0.1501** (about **+29%**). The
449
- `status_token` axis is excluded; OFF was never taught the S7.3
450
- vocabulary. Per the frozen prereg this is **directional only, not
451
- confirmatory** (n=6 is exploratory; a grounded effect needs n>=9): Maestro
452
- buys more honest completion behavior on a trap task, at higher cost; paid
453
- for by the premium, not recovered.
454
-
455
- Key findings:
456
-
457
- - **No success-rate lift.** ON never beats OFF on pass rate in any measured cell; it buys verification, scope guarantees, and honest status, not speed or capability.
458
- - **Weak-model rescue: not measurable.** Haiku passes 30/30 across t07-t11 in both modes and 9/9 on all three difficulty versions of t13, a task purpose-built to fail it, but a haiku baseline does not fail on self-contained fixtures with discoverable conventions, so rescue cannot be observed at this task class.
459
- - **The gate speaks, prose alone does not spawn.** Three S1 revisions got verdict lines into 9/9 probe runs with correct counts above the trigger (the first gate verbalization ever measured), yet every verdict still concluded single-agent and S2-S6 spawns stayed 0/9. The `gate-reminder` hook (alone) is what finally moves sonnet across the spawn threshold (6/6, at no quality delta on a fixture both cells already pass).
460
- - **The verdict line binds.** Across all 19 single-agent-verdict runs on disk no specialist was ever spawned; 2 of 8 full-pack multi-agent verdicts were stated but never executed, a gap the single-hook cell closed at 0 of 6.
461
- - **Compliance deltas are null at these tiers.** Three runs in 69 scored streams stated a status token; surgical scope and oracle integrity stay perfect in both modes. Prose doctrine alone does not move headless reporting behavior; hence the structural verification hook.
462
-
463
- Numbers are never compared across CLIs or models, and the protocol
464
- forbids publishing numbers that were not actually measured. Earlier
465
- same-day t01-t06 results were taken **before** the hidden-oracle fix and
466
- are kept only as labeled upper bounds, not comparable to the cells above.
467
-
468
- Full data, retractions, and methodology -> [`docs/benchmarks.md`](docs/benchmarks.md).
469
-
470
- ## Research Foundation
471
-
472
- Maestro's architecture is grounded in 700+ sources across computer science, library science, safety engineering, and knowledge theory. The key papers:
473
-
474
- | Paper | Year | Venue | Key Finding |
475
- |---|---|---|---|
476
- | [MAST](https://arxiv.org/abs/2503.13657) | 2025 | NeurIPS Spotlight | 41-87% failure rates; 79% from coordination |
477
- | [DyLAN](https://arxiv.org/abs/2310.02170) | 2024 | COLM | 3 agents outperform 7; dynamic topology selection |
478
- | [Towards a Science of Scaling Agent Systems](https://arxiv.org/abs/2512.08296) | 2025 | arXiv (Google/MIT) | 260 configs; architecture-task fit dominates; sequential tasks degrade 39-70% |
479
- | [Agent Scaling via Diversity](https://arxiv.org/abs/2602.03794) | 2026 | arXiv | 2 diverse agents match 16 homogeneous; diversity, not headcount, drives gains |
480
- | [LoopTrap](https://arxiv.org/abs/2605.05846) | 2026 | arXiv | Termination poisoning: loop end-conditions are an attack surface; hard caps mitigate |
481
- | [MetaGPT](https://arxiv.org/abs/2308.00352) | 2023 | - | Structured handoffs score 3.9/4 vs unstructured 2.1/4 |
482
- | [Voyager](https://arxiv.org/abs/2305.16291) | 2023 | NeurIPS | Skill library pattern for capability organization |
483
- | [GTD](https://arxiv.org/abs/2504.05767) | 2025 | arXiv | 0.3% degradation under failure with redundant topologies |
484
- | [SELFORG](https://arxiv.org/abs/2502.11811) | 2025 | arXiv | Shapley-based contribution estimation |
485
- | [DRACO](https://arxiv.org/abs/2602.11685) | 2026 | arXiv | Deep-research benchmark reviewed for fusion; fused panels outscored their solo members |
486
-
487
- For the full analysis, read [Why Your Multi-Agent AI System Keeps Failing](https://marklaursen.com/blog/why-your-multi-agent-ai-system-keeps-failing).
255
+ Re-pull or re-copy `frontier/` and the relevant integration file from `main`. If your CLI supports custom commands and you have a `/update` wired, run that instead.
488
256
 
489
257
  ## Contributing
490
258
 
package/docs/codex.md CHANGED
@@ -86,3 +86,13 @@ on the model honoring it rather than on a hook.
86
86
  The Maestro context bar also does not apply: Codex CLI ships a native
87
87
  context-usage indicator (`/statusline` picker, or `context` in
88
88
  `[tui].status_line` in `~/.codex/config.toml`).
89
+
90
+ ## Skills and the Frontier ON indicator
91
+
92
+ `maestro install --target codex` installs the `frontier`, `terse`, `settings`,
93
+ and `update` Maestro commands as Codex skills (no-clobber) to
94
+ `.agents/skills/<name>/SKILL.md` (per-repo) or `~/.agents/skills/<name>/SKILL.md`
95
+ (global). When `maestro frontier status --scope codex` reports mode != off, the
96
+ `frontier` skill instructs Codex to lead its reply with
97
+ `Maestro Frontier ON (<label>)` — `single · <model>` or `fusion · <preset>`. When
98
+ mode is off, no indicator line appears.
@@ -20,7 +20,7 @@ loads on demand.
20
20
  | Runtime | Source in this repo | Install to | Invoke |
21
21
  |---|---|---|---|
22
22
  | Cursor | `integrations/cursor/commands/frontier.md` | `.cursor/commands/frontier.md` (per-repo) or `~/.cursor/commands/` (global) | `/frontier` |
23
- | Codex (CLI + IDE/Desktop) | `integrations/codex/prompts/frontier.md` | `~/.codex/prompts/frontier.md` (global only) | `/frontier` |
23
+ | Codex (CLI + IDE/Desktop) | `integrations/codex/skills/frontier/SKILL.md` (and `terse`, `settings`, `update`) | `.agents/skills/<name>/SKILL.md` (per-repo) or `~/.agents/skills/<name>/SKILL.md` (global) | `/frontier` |
24
24
 
25
25
  After adding a file, restart the tool or open a new chat so it loads. Both runtimes
26
26
  expand `$ARGUMENTS` to the full argument string — `/frontier fusion opus-gpt` passes
@@ -60,7 +60,7 @@ updates are a single invocation:
60
60
  | Runtime | Source in this repo | Install to | Invoke |
61
61
  |---|---|---|---|
62
62
  | Cursor | `integrations/cursor/commands/update.md` | `.cursor/commands/update.md` (per-repo) or `~/.cursor/commands/` (global) | `/update` |
63
- | Codex (CLI + IDE/Desktop) | `integrations/codex/prompts/update.md` | `~/.codex/prompts/update.md` (global only) | `/update` |
63
+ | Codex (CLI + IDE/Desktop) | `integrations/codex/skills/update/SKILL.md` | `.agents/skills/update/SKILL.md` (per-repo) or `~/.agents/skills/update/SKILL.md` (global) | `/update` |
64
64
 
65
65
  **Version model:** Maestro pins no version for portable files. Fetching from
66
66
  latest `main` always resolves the newest committed code — no manual version bump
@@ -71,13 +71,20 @@ needed per release.
71
71
  - **No auto-run.** Neither runtime has a `UserPromptSubmit` hook, so arming a mode
72
72
  (`mode fusion`) only persists state — nothing fuses later prompts automatically.
73
73
  Use `/frontier run "<prompt>"` to actually run the panel.
74
- - **Codex custom prompts are deprecated.** OpenAI's docs say *"Deprecated. Use
75
- skills for reusable prompts."* The prompt file still works in current Codex (CLI
76
- and IDE), but the forward path is a Codex *skill* (repo-shareable, implicitly
77
- invoked) — a different format than this template. This port favors the simple
78
- prompt file by design.
79
- - **Codex has no confirmed per-repo prompt path.** `~/.codex/prompts/` is global
80
- per-user. Cursor's `.cursor/commands/` is the repo-scoped option.
74
+ - **Codex uses skills, not prompts.** `maestro install --target codex` installs the
75
+ `frontier`, `terse`, `settings`, and `update` Maestro commands as Codex skills
76
+ (no-clobber) to `.agents/skills/<name>/SKILL.md` (per-repo) or
77
+ `~/.agents/skills/<name>/SKILL.md` (global). The deprecated
78
+ `~/.codex/prompts/frontier.md` prompt file remains as a compatibility bridge, but
79
+ the canonical path is the skill.
80
+ - **Codex per-repo skill path:** `.agents/skills/<name>/SKILL.md` is the
81
+ repo-scoped option for Codex skills. The global path is
82
+ `~/.agents/skills/<name>/SKILL.md`.
83
+ - **Maestro Frontier ON indicator (Codex only).** When
84
+ `maestro frontier status --scope codex` reports mode != off, the `frontier` skill
85
+ instructs Codex to lead its reply with `Maestro Frontier ON (<label>)` —
86
+ `single · <model>` or `fusion · <preset>`. When mode is off, no indicator line
87
+ appears. This is Codex-scoped only and has no effect on Claude Code.
81
88
  - **Requires `frontier/` and `bin/maestro.cjs` in the project.** The command runs
82
89
  `maestro frontier ...` (or `node bin/maestro.cjs frontier ...`) from the repo root,
83
90
  so the engine must have been copied in during install.
@@ -0,0 +1,91 @@
1
+ ---
2
+ name: frontier
3
+ description: Maestro Frontier local multi-CLI fusion engine — switch mode, or run a prompt through the panel
4
+ ---
5
+
6
+ Drive the **Maestro Frontier** engine — a zero-dependency local multi-CLI fusion
7
+ engine (a parallel panel of local CLIs → a judge model's analysis → a grounded
8
+ synthesis). It is the same engine the Claude Code plugin ships; here it runs
9
+ through the `maestro` CLI with `--scope codex`.
10
+
11
+ **This is a typing shortcut, not a prompt hook.** Codex has no automatic
12
+ prompt hook, so arming a mode does **not** auto-run the engine on later prompts —
13
+ it only persists the mode. To actually fuse a prompt, invoke `run` explicitly
14
+ (step 3).
15
+
16
+ Map the user's request to one engine CLI call and run it from the repo root.
17
+ Do not edit the engine's state file by hand.
18
+
19
+ ## 1. Switch mode
20
+
21
+ Persists to `~/.config/maestro/frontier-state.codex.json`; default `off`.
22
+ `--scope codex` keeps Codex's armed mode independent from Claude Code, Cline,
23
+ Cursor, and Gemini on the same machine:
24
+
25
+ ```bash
26
+ maestro frontier mode off --scope codex
27
+ maestro frontier mode single --model <model> --scope codex
28
+ maestro frontier mode fusion --preset <preset> --scope codex
29
+ maestro frontier mode fusion --preset custom --models <a,b,c> --scope codex
30
+ maestro frontier mode fusion --preset <preset> --judge <model> --synth <model> --scope codex
31
+ ```
32
+
33
+ Models: `opus` (Claude Opus 4.8, needs `claude`), `gpt-5.5` (needs `codex`),
34
+ `gemini` (needs `gemini`). Presets: `opus-duo`, `opus-gpt`, `gpt-duo`,
35
+ `frontier-trio`, `custom`. Judge + synth default to Opus; `--judge`/`--synth`
36
+ override for any preset (e.g. `--judge opus --synth gpt-5.5`). `gpt-duo` runs
37
+ judge + synth on GPT-5.5 — a Codex-only fusion that needs no `claude`.
38
+
39
+ ## 2. Show current mode/preset
40
+
41
+ ```bash
42
+ maestro frontier status --scope codex
43
+ ```
44
+
45
+ ## 3. Run a prompt through the current mode
46
+
47
+ This is the action that actually fuses, since nothing auto-runs here. Set a
48
+ mode first (step 1), then:
49
+
50
+ ```bash
51
+ maestro frontier run "<prompt>" --scope codex
52
+ ```
53
+
54
+ - `off`: prints a notice, spawns nothing.
55
+ - `single`: dispatches the one selected CLI, prints its answer.
56
+ - `fusion`: runs the panel in parallel → judge → synthesizer; prints the final
57
+ answer (a one-line run meta goes to stderr). Report stdout verbatim.
58
+
59
+ On error the engine prints `ERROR [<reason>]: <detail>` to stderr and exits
60
+ non-zero — relay the reason.
61
+
62
+ ## Indicator instruction
63
+
64
+ <!-- INDICATOR INSTRUCTION — apply every turn while this skill is active -->
65
+
66
+ Before composing your first reply each turn, run:
67
+
68
+ ```bash
69
+ maestro frontier status --scope codex
70
+ ```
71
+
72
+ Parse the returned JSON `mode` field:
73
+
74
+ - If `mode` is not `off`, make the **first line** of your reply exactly:
75
+ `Maestro Frontier ON (<label>)`
76
+ where `<label>` is formed as follows:
77
+ - single mode → `single · <model>` (e.g. `single · opus`)
78
+ - fusion mode → `fusion · <preset>` (e.g. `fusion · frontier-trio`);
79
+ for a custom preset use `fusion · custom (<model1>, <model2>, ...)`
80
+ - If `mode` is `off`, output no indicator line.
81
+
82
+ <!-- END INDICATOR INSTRUCTION -->
83
+
84
+ ## Notes
85
+
86
+ - Real `single`/`fusion` runs spawn local CLIs and cost tokens; use small prompts.
87
+ `off` is free.
88
+ - Each model's CLI must be on `PATH`, or point at a specific build with
89
+ `MAESTRO_CLAUDE_BIN` / `MAESTRO_CODEX_BIN` / `MAESTRO_GEMINI_BIN`.
90
+ - Requires `maestro` on `PATH` (installed during Maestro setup). If it is
91
+ missing, install Maestro first.
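The indicator rule in this skill can be sketched as a small function. This is a minimal illustration, not the skill's implementation: the skill text only confirms that the status output is JSON with a `mode` field, so the function name `indicatorLine` and the `model`, `preset`, and `models` field names are assumptions.

```javascript
// Hypothetical sketch: build the indicator line from a parsed status object.
// Only `mode` is confirmed by the skill text; `model`, `preset`, and `models`
// are assumed field names used for illustration.
function indicatorLine(status) {
  const mode = status && status.mode;
  // No indicator when the mode is off (or, as an assumption, missing).
  if (!mode || mode === 'off') return null;

  let label;
  if (mode === 'single') {
    label = `single · ${status.model}`;
  } else if (status.preset === 'custom') {
    // Custom fusion presets list their member models in parentheses.
    const models = (status.models || []).join(', ');
    label = `fusion · custom (${models})`;
  } else {
    // Any other armed mode is treated as a named fusion preset.
    label = `fusion · ${status.preset}`;
  }
  return `Maestro Frontier ON (${label})`;
}

console.log(indicatorLine({ mode: 'single', model: 'opus' }));
console.log(indicatorLine({ mode: 'fusion', preset: 'frontier-trio' }));
```

Returning `null` for `off` keeps the "no indicator line" case explicit, so a caller can skip the line rather than print an empty one.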
@@ -0,0 +1,46 @@
1
+ ---
2
+ name: settings
3
+ description: View and change Maestro toggles (terse, frontier, context-bar) via the settings CLI
4
+ ---
5
+
6
+ View or change **Maestro settings** for this project. The settings CLI manages
7
+ the three primary toggles: `terse`, `frontier`, and `context-bar`.
8
+
9
+ When the user invokes this skill, run the settings CLI from the repo root.
10
+ Do not edit settings files by hand.
11
+
12
+ ## Discover available commands
13
+
14
+ ```bash
15
+ node settings/cli.cjs --help
16
+ ```
17
+
18
+ If `settings/cli.cjs` is not present, run `maestro --help` to locate the
19
+ correct entry point.
20
+
21
+ ## Common operations
22
+
23
+ List current settings:
24
+
25
+ ```bash
26
+ node settings/cli.cjs
27
+ ```
28
+
29
+ Set a toggle:
30
+
31
+ ```bash
32
+ node settings/cli.cjs terse <off|lite|full|ultra>
33
+ node settings/cli.cjs frontier <off|single|fusion>
34
+ node settings/cli.cjs context-bar <on|off>
35
+ ```
36
+
37
+ If a subcommand name or argument differs from the above, follow the usage
38
+ printed by `--help` — do not guess flags.
39
+
40
+ ## Notes
41
+
42
+ - Changes persist in Maestro's settings store and apply to subsequent agent
43
+ turns in this project.
44
+ - Requires `node` on `PATH` and Maestro installed in the project root. If
45
+ `settings/cli.cjs` is missing, re-run the installer:
46
+ `npx github:mbanderas/maestro install --target codex`
@@ -0,0 +1,49 @@
1
+ ---
2
+ name: terse
3
+ description: Toggle Maestro terse output level (lite, full, ultra, off) via the settings CLI
4
+ ---
5
+
6
+ Toggle the **Maestro terse** output level for this environment. Terse mode
7
+ condenses agent replies; levels range from `off` (default verbosity) through
8
+ `lite`, `full`, and `ultra` (most compressed).
9
+
10
+ When the user invokes this skill, run the settings CLI to read or change the
11
+ terse level. Do not edit settings files by hand.
12
+
13
+ ## Check current terse level
14
+
15
+ ```bash
16
+ node settings/cli.cjs --help
17
+ ```
18
+
19
+ Consult the help output for the exact read subcommand, then run it. If
20
+ `settings/cli.cjs` is not present, run `maestro --help` to discover the
21
+ correct path.
22
+
23
+ ## Set terse level
24
+
25
+ ```bash
26
+ node settings/cli.cjs terse <level>
27
+ ```
28
+
29
+ Valid levels: `off` | `lite` | `full` | `ultra`
30
+
31
+ Examples:
32
+
33
+ ```bash
34
+ node settings/cli.cjs terse off
35
+ node settings/cli.cjs terse lite
36
+ node settings/cli.cjs terse full
37
+ node settings/cli.cjs terse ultra
38
+ ```
39
+
40
+ If the CLI rejects an argument or the subcommand name differs, run
41
+ `node settings/cli.cjs --help` first and follow the printed usage.
42
+
43
+ ## Notes
44
+
45
+ - The change persists in Maestro's settings store; it applies to subsequent
46
+ agent turns in this project.
47
+ - Requires `node` on `PATH` and Maestro installed in the project root. If
48
+ `settings/cli.cjs` is missing, re-run the Maestro installer:
49
+ `npx github:mbanderas/maestro install --target codex`
@@ -0,0 +1,29 @@
1
+ ---
2
+ name: update
3
+ description: Update Maestro to the latest version by re-running the installer for Codex
4
+ ---
5
+
6
+ Update **Maestro** to the latest marketplace code. This re-runs the installer,
7
+ which pulls the current release and overwrites the local Maestro files in place.
8
+
9
+ When the user invokes this skill, run the installer from the repo root:
10
+
11
+ ```bash
12
+ npx github:mbanderas/maestro install --target codex
13
+ ```
14
+
15
+ The installer is idempotent — it is safe to re-run against an existing
16
+ installation. It will:
17
+
18
+ - Pull the latest Maestro source from the repository.
19
+ - Install skills no-clobber (an existing `SKILL.md` is skipped, not overwritten) and update hooks and settings scaffolding.
20
+ - Leave project-local configuration (state files, secrets) untouched.
21
+
22
+ ## Notes
23
+
24
+ - Requires `node` and `npx` on `PATH`.
25
+ - Run from the project root so the installer targets the correct directory.
26
+ - After the installer completes, restart the Codex session (or reload the
27
+ project) so updated skills and hooks take effect.
28
+ - If `npx` is unavailable, clone `https://github.com/mbanderas/maestro`
29
+ manually and follow the repository's install instructions.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@maestrofrontier/frontier",
3
- "version": "1.4.1",
3
+ "version": "1.4.3",
4
4
  "description": "Achieve Frontier AI performance in your CLI by fusing the model CLIs you already run. Maestro Frontier is an opt-in, zero-dependency local multi-CLI fusion engine for AI coding agents: fan a prompt across a panel of any 1 to 8 local model CLIs you pick, have a judge model and a synthesizer you choose read the answers into a structured analysis and write one grounded synthesis (default Opus 4.8, override either with --judge/--synth). On a 100-task benchmark every fusion panel outscored its individual member models. Three adapters ship today: Opus 4.8, GPT-5.5, Gemini 3.1 Pro, with Kimi, DeepSeek, GLM, and Qwen to follow. Off, single, and fusion modes switch via /maestro:frontier. Built on Maestro orchestration discipline: decision-gated routing, verified done-claims, surgical scope, and structural enforcement hooks.",
5
5
  "keywords": ["multi-cli-fusion", "fusion-engine", "frontier", "multi-agent", "orchestration", "claude-code", "gemini", "codex", "agents", "hooks", "doctrine"],
6
6
  "license": "MIT",
@@ -49,6 +49,11 @@ const WRAPPER_MAP = {
49
49
  },
50
50
  };
51
51
 
52
+ // Codex skill templates installed alongside the deprecated codex wrapper.
53
+ // Codex loads skills from <project>/.agents/skills/<name>/SKILL.md (project)
54
+ // or ~/.agents/skills/<name>/SKILL.md (global). No-clobber, like wrappers.
55
+ const CODEX_SKILLS = ['frontier', 'terse', 'settings', 'update'];
56
+
52
57
  // Runtime adapter per target. The adapter imports @AGENTS.md (Cursor has no
53
58
  // imports, so .cursorrules embeds the kernel). codex/cline/windsurf read
54
59
  // AGENTS.md directly and need no adapter.
@@ -412,6 +417,58 @@ function installEngine(projectRoot, dryRun, log) {
412
417
  return ok;
413
418
  }
414
419
 
420
+ /**
421
+ * Copy a single package template file to dest, no-clobber. Skips when dest
422
+ * already exists, refuses symlinks, honors dry-run. Reuses safeMkdirp +
423
+ * safeWrite. Shared by wrapper and Codex-skill installs.
424
+ * @param {string} src absolute source path (under PKG_ROOT)
425
+ * @param {string} dest absolute destination path
426
+ * @param {string} label short tag for logs (e.g. "wrapper", "codex-skill")
427
+ * @param {boolean} dryRun
428
+ * @param {(msg: string) => void} log
429
+ * @returns {boolean} true = success (wrote, skipped, or planned), false = error
430
+ */
431
+ function installNoClobberFile(src, dest, label, dryRun, log) {
432
+ // Check if dest exists already (no-clobber)
433
+ let destStat;
434
+ try { destStat = fs.lstatSync(dest); } catch { destStat = null; }
435
+
436
+ if (destStat) {
437
+ if (destStat.isSymbolicLink()) {
438
+ log(`ERROR: ${label} dest is a symlink — refusing: ${dest}`);
439
+ return false;
440
+ }
441
+ log(`[${label}] skipped (exists, not clobbered): ${dest}`);
442
+ return true;
443
+ }
444
+
445
+ let srcContent;
446
+ try {
447
+ srcContent = fs.readFileSync(src, 'utf8');
448
+ } catch (err) {
449
+ log(`ERROR: cannot read template ${src}: ${err.message}`);
450
+ return false;
451
+ }
452
+
453
+ if (dryRun) {
454
+ log(`[dry-run] would create ${dest}`);
455
+ return true;
456
+ }
457
+
458
+ if (!safeMkdirp(dest)) {
459
+ log(`ERROR: could not create parent dir for ${dest}`);
460
+ return false;
461
+ }
462
+
463
+ const res = safeWrite(dest, srcContent);
464
+ if (!res.ok) {
465
+ log(`ERROR: failed to write ${label} ${dest}: ${res.reason}`);
466
+ return false;
467
+ }
468
+ log(`[${label}] wrote ${dest}`);
469
+ return true;
470
+ }
471
+
415
472
  /**
416
473
  * Install wrapper file (no-clobber).
417
474
  * @param {string} target
@@ -449,44 +506,31 @@ function installWrapper(target, projectRoot, userGlobal, dryRun, log) {
449
506
  dest = path.join(projectRoot, mapping.proj);
450
507
  }
451
508
 
452
- // Check if dest exists already (no-clobber)
453
- let destStat;
454
- try { destStat = fs.lstatSync(dest); } catch { destStat = null; }
455
-
456
- if (destStat) {
457
- if (destStat.isSymbolicLink()) {
458
- log(`ERROR: wrapper dest is a symlink — refusing: ${dest}`);
459
- return false;
460
- }
461
- log(`[wrapper] skipped (exists, not clobbered): ${dest}`);
462
- return true;
463
- }
464
-
465
- let srcContent;
466
- try {
467
- srcContent = fs.readFileSync(src, 'utf8');
468
- } catch (err) {
469
- log(`ERROR: cannot read template ${src}: ${err.message}`);
470
- return false;
471
- }
472
-
473
- if (dryRun) {
474
- log(`[dry-run] would create ${dest}`);
475
- return true;
476
- }
509
+ return installNoClobberFile(src, dest, 'wrapper', dryRun, log);
510
+ }
477
511
 
478
- if (!safeMkdirp(dest)) {
479
- log(`ERROR: could not create parent dir for ${dest}`);
480
- return false;
481
- }
512
+ /**
513
+ * Install the Codex skill templates (no-clobber) alongside the codex wrapper.
514
+ * Project mode -> <project>/.agents/skills/<name>/SKILL.md; --user/global mode
515
+ * -> ~/.agents/skills/<name>/SKILL.md (mirrors installWrapper's dest logic).
516
+ * @param {string} projectRoot
517
+ * @param {boolean} userGlobal
518
+ * @param {boolean} dryRun
519
+ * @param {(msg: string) => void} log
520
+ * @returns {boolean}
521
+ */
522
+ function installCodexSkills(projectRoot, userGlobal, dryRun, log) {
523
+ const skillsRoot = userGlobal
524
+ ? path.join(os.homedir(), '.agents', 'skills')
525
+ : path.join(projectRoot, '.agents', 'skills');
482
526
 
483
- const res = safeWrite(dest, srcContent);
484
- if (!res.ok) {
485
- log(`ERROR: failed to write wrapper ${dest}: ${res.reason}`);
486
- return false;
527
+ let ok = true;
528
+ for (const name of CODEX_SKILLS) {
529
+ const src = path.join(PKG_ROOT, 'integrations', 'codex', 'skills', name, 'SKILL.md');
530
+ const dest = path.join(skillsRoot, name, 'SKILL.md');
531
+ if (!installNoClobberFile(src, dest, 'codex-skill', dryRun, log)) ok = false;
487
532
  }
488
- log(`[wrapper] wrote ${dest}`);
489
- return true;
533
+ return ok;
490
534
  }
491
535
 
492
536
  // ---- main entry ----
@@ -537,6 +581,12 @@ function run(argv) {
537
581
  if (!installWrapper(target, project, userGlobal, dryRun, log)) anyError = true;
538
582
  }
539
583
 
584
+ // 3b. Codex skills — the .agents/skills/<name>/SKILL.md set ships alongside
585
+ // the deprecated codex prompt wrapper.
586
+ if (target === 'codex') {
587
+ if (!installCodexSkills(project, userGlobal, dryRun, log)) anyError = true;
588
+ }
589
+
540
590
  if (anyError) {
541
591
  log('install completed with errors (see above)');
542
592
  return 1;
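The no-clobber contract of `installNoClobberFile` in this diff can be exercised in isolation. The sketch below is an approximation under stated assumptions: `safeMkdirp` and `safeWrite` are not shown in the diff, so it swaps in `fs.mkdirSync` and `fs.writeFileSync` with the `wx` flag, and `copyNoClobber` with its return strings is a hypothetical name, not part of the package.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for installNoClobberFile.
// Returns 'wrote', 'skipped' (dest exists), or 'refused' (dest is a symlink).
function copyNoClobber(src, dest) {
  let stat = null;
  // lstat does not follow symlinks, so a symlinked dest is detected as such.
  try { stat = fs.lstatSync(dest); } catch { stat = null; }
  if (stat) return stat.isSymbolicLink() ? 'refused' : 'skipped';

  const content = fs.readFileSync(src, 'utf8');
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  // 'wx' fails if dest appeared in the meantime, keeping the write no-clobber.
  fs.writeFileSync(dest, content, { flag: 'wx' });
  return 'wrote';
}

// Usage: the second call leaves a locally edited file untouched.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'noclobber-'));
const src = path.join(tmp, 'template.md');
const dest = path.join(tmp, '.agents', 'skills', 'frontier', 'SKILL.md');
fs.writeFileSync(src, 'template v1\n');

const first = copyNoClobber(src, dest);
fs.writeFileSync(dest, 'local edit\n');
const second = copyNoClobber(src, dest);
console.log(first, second, fs.readFileSync(dest, 'utf8').trim());
```

Under this contract, re-running the installer does not replace a skill file that already exists; picking up a newer template would require removing the old file first.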