devlyn-cli 1.11.0 → 1.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CLAUDE.md CHANGED
@@ -72,7 +72,8 @@ Optional flags:
72
72
  - `--skip-review` — skip team-review phase
73
73
  - `--skip-clean` — skip clean phase
74
74
  - `--skip-docs` — skip update-docs phase
75
- - `--with-codex [evaluate|review|both]` — use OpenAI Codex as cross-model evaluator/reviewer (requires codex-mcp-server)
75
+ - `--engine auto|codex|claude` — intelligent model routing. `auto` routes each phase and team role to the optimal model (Claude or Codex GPT-5.4) based on benchmark data. `codex` forces Codex for implementation, Claude for evaluation. `claude` (default) uses Claude for everything. Requires codex-mcp-server.
76
+ - `--with-codex [evaluate|review|both]` — (legacy, superseded by `--engine`) use OpenAI Codex as cross-model evaluator/reviewer (requires codex-mcp-server)
76
77
 
77
78
  ## Preflight Check (Post-Roadmap Verification)
78
79
 
@@ -91,6 +92,7 @@ Optional flags:
91
92
  - `--autofix` — auto-promote CRITICAL/HIGH findings and run auto-resolve
92
93
  - `--skip-browser` — skip browser validation
93
94
  - `--skip-docs` — skip documentation audit
95
+ - `--engine auto|codex|claude` — route code-auditor to Codex (better at code analysis), docs/browser to Claude
94
96
 
95
97
  **Recommended workflow**: `/devlyn:ideate` → `/devlyn:auto-resolve` (repeat) → `/devlyn:preflight` → fix gaps → `/devlyn:preflight` (verify)
96
98
 
package/README.md CHANGED
@@ -79,6 +79,7 @@ Build → Build Gate → Browser Test → Evaluate → Fix Loop → Simplify →
79
79
 
80
80
  Skip phases you don't need: `--skip-browser`, `--skip-review`, `--skip-clean`, `--skip-docs`, `--skip-build-gate`, `--max-rounds 6`
81
81
  Customize the build gate: `--build-gate strict` (warnings = errors), `--build-gate no-docker` (skip Docker builds for speed)
82
+ Use dual-model routing: `--engine auto` (Codex builds, Claude evaluates — see below)
82
83
 
83
84
  ### Step 3 — Verify with `/devlyn:preflight`
84
85
 
@@ -100,18 +101,64 @@ Reads every commitment from your vision, roadmap, and item specs, then audits th
100
101
 
101
102
  Confirmed gaps become new roadmap items — feed them back into auto-resolve. Use `--autofix` to do this automatically, or `--phase 2` to check only one phase.
102
103
 
103
- ### Bonus — Dual-Model Mode with Codex
104
+ ### Bonus — Intelligent Model Routing with `--engine`
104
105
 
105
106
  Install the Codex MCP server during setup, then:
106
107
 
107
108
  ```
108
- /devlyn:auto-resolve "fix the auth bug" --with-codex
109
+ /devlyn:auto-resolve "fix the auth bug" --engine auto
110
+ ```
111
+
112
+ **`--engine auto`** routes each pipeline phase and team role to the optimal model (Claude Opus 4.6 or GPT-5.4) — validated through A/B testing, not just benchmarks.
113
+
114
+ > `--engine auto` (recommended) · `--engine codex` (force Codex for build) · `--engine claude` (default, Claude only)
115
+
116
+ Works across the full pipeline:
117
+
118
+ ```
119
+ /devlyn:auto-resolve "implement feature" --engine auto
120
+ /devlyn:ideate "plan new project" --engine auto
121
+ /devlyn:preflight --engine auto
109
122
  ```
110
123
 
111
- Claude builds, **OpenAI Codex evaluates independently** — two models collaborating, catching what a single model misses.
124
+ <details>
125
+ <summary><strong>How routing works</strong> — A/B tested on 6 roles, 11 integration tests</summary>
126
+
127
+ **Pipeline phases** — builder and critic are always different models (GAN dynamic):
128
+
129
+ | Phase | Model | Why |
130
+ |---|---|---|
131
+ | Build (implementation) | **Codex GPT-5.4** | SWE-bench Pro +11.7pp for hard coding tasks |
132
+ | Evaluate | **Claude** | Long-context (MRCR +28pp) for full-diff grading |
133
+ | Fix Loop | **Codex GPT-5.4** | Same advantage as Build |
134
+ | Challenge | **Claude** | Fresh skeptical review needs different model family |
135
+ | Browser Validate | **Claude** | Chrome MCP session-bound |
136
+
137
+ **Team roles** — each of 21 roles routes to the best model:
138
+
139
+ | Engine | Roles | Examples |
140
+ |---|---|---|
141
+ | Claude (11) | Analysis, design, architecture | root-cause-analyst, architecture-reviewer, ux-designer, product-analyst |
142
+ | Codex (4) | Code generation, performance | implementation-planner, test-engineer, performance-engineer |
143
+ | Dual (6) | Both models find unique issues | security-auditor, quality-reviewer, api-designer |
144
+
145
+ **Key finding**: Benchmark predictions were only 33% accurate. 4 of 6 A/B-tested roles needed routing changes after real testing — proving that benchmarks alone are insufficient for optimal routing.
146
+
147
+ </details>
148
+
149
+ <details>
150
+ <summary>Legacy: <code>--with-codex</code> (superseded by <code>--engine</code>)</summary>
151
+
152
+ ```
153
+ /devlyn:auto-resolve "fix the auth bug" --with-codex
154
+ ```
112
155
 
113
156
  > `--with-codex evaluate` (default) · `--with-codex review` · `--with-codex both`
114
157
 
158
+ `--engine auto` subsumes `--with-codex both` with broader coverage — Codex is used for build, fix, and 4 team roles, not just evaluate/review.
159
+
160
+ </details>
161
+
115
162
  ---
116
163
 
117
164
  ## Manual Commands
@@ -181,6 +228,7 @@ Selected during install. Run `npx devlyn-cli` again to add more.
181
228
 
182
229
  | Skill | Description |
183
230
  |---|---|
231
+ | `asset-creator` | AI pixel art game asset pipeline — generate, chroma-key, catalog |
184
232
  | `cloudflare-nextjs-setup` | Cloudflare Workers + Next.js with OpenNext |
185
233
  | `generate-skill` | Create Claude Code skills following Anthropic best practices |
186
234
  | `prompt-engineering` | Claude 4 prompt optimization |
@@ -210,7 +258,7 @@ Selected during install. Run `npx devlyn-cli` again to add more.
210
258
 
211
259
  | Server | Description |
212
260
  |---|---|
213
- | `codex-cli` | Codex MCP server — enables `--with-codex` dual-model mode |
261
+ | `codex-cli` | Codex MCP server — enables `--engine auto/codex` intelligent model routing and legacy `--with-codex` mode |
214
262
  | `playwright` | Playwright MCP — powers browser-validate Tier 2 |
215
263
 
216
264
  </details>
package/bin/devlyn.js CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  const fs = require('fs');
4
4
  const path = require('path');
5
+ const os = require('os');
5
6
  const readline = require('readline');
6
7
  const { execSync } = require('child_process');
7
8
 
@@ -581,6 +582,29 @@ async function init(skipPrompts = false) {
581
582
  log(' → settings.json (agent teams + pipeline permissions)', 'dim');
582
583
  }
583
584
 
585
+ // Configure global Claude Code settings (~/.claude/settings.json)
586
+ const globalClaudeDir = path.join(os.homedir(), '.claude');
587
+ const globalSettingsPath = path.join(globalClaudeDir, 'settings.json');
588
+ let globalSettings = {};
589
+ if (fs.existsSync(globalSettingsPath)) {
590
+ try {
591
+ globalSettings = JSON.parse(fs.readFileSync(globalSettingsPath, 'utf8'));
592
+ } catch {
593
+ globalSettings = {};
594
+ }
595
+ }
596
+ if (!globalSettings.env) globalSettings.env = {};
597
+ let globalSettingsChanged = false;
598
+ if (!globalSettings.env.CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING) {
599
+ globalSettings.env.CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING = '1';
600
+ globalSettingsChanged = true;
601
+ }
602
+ if (globalSettingsChanged) {
603
+ if (!fs.existsSync(globalClaudeDir)) fs.mkdirSync(globalClaudeDir, { recursive: true });
604
+ fs.writeFileSync(globalSettingsPath, JSON.stringify(globalSettings, null, 2) + '\n');
605
+ log(' → ~/.claude/settings.json (disabled adaptive thinking)', 'dim');
606
+ }
607
+
584
608
  // Install agents for other detected CLIs
585
609
  const detected = detectOtherCLIs();
586
610
  if (detected.length > 0) {
@@ -33,28 +33,36 @@ This pipeline runs hands-free. The user launches it to walk away and come back t
33
33
  - `--skip-docs` (false) — skip update-docs phase
34
34
  - `--skip-build-gate` (false) — skip the deterministic build gate (Phase 1.4). Not recommended — the build gate is the primary defense against "tests pass locally, breaks in CI/Docker/production" class of bugs.
35
35
  - `--build-gate MODE` (auto) — controls build gate behavior. `auto`: detect project type and run appropriate build/typecheck/lint commands; if Dockerfile(s) are present, Docker builds are included automatically. `strict`: auto + treat warnings as errors. `no-docker`: auto but skip Docker builds even if Dockerfiles exist (for faster iteration). `skip`: same as --skip-build-gate.
36
- - `--with-codex` (false) — use OpenAI Codex as a cross-model evaluator/reviewer via `mcp__codex-cli__*` MCP tools. Accepts: `evaluate`, `review`, or `both` (default when flag is present without value). When enabled, Codex provides an independent second opinion from a different model family, creating a GAN-like dynamic where Claude builds and Codex critiques.
36
+ - `--with-codex` (false) — use OpenAI Codex as a cross-model evaluator/reviewer via `mcp__codex-cli__*` MCP tools. Accepts: `evaluate`, `review`, or `both` (default when flag is present without value). When enabled, Codex provides an independent second opinion from a different model family, creating a GAN-like dynamic where Claude builds and Codex critiques. **Ignored if `--engine` is set** (engine routing subsumes this).
37
+ - `--engine MODE` (claude) — controls which model handles each pipeline phase and team role. Modes:
38
+ - `claude` (default): all phases use Claude subagents. Current behavior, no Codex calls.
39
+ - `codex`: Codex handles implementation/analysis phases, Claude handles orchestration, evaluation, and Chrome MCP.
40
+ - `auto`: each phase and team role routes to the optimal model based on benchmark data. Recommended when Codex MCP server is available. Subsumes `--with-codex both`.
37
41
 
38
42
  Flags can be passed naturally: `/devlyn:auto-resolve fix the auth bug --max-rounds 3 --skip-docs`
39
- Codex examples: `--with-codex` (both), `--with-codex evaluate`, `--with-codex review`
43
+ Engine examples: `--engine auto`, `--engine codex`, `--engine claude`
44
+ Codex examples (legacy): `--with-codex` (both), `--with-codex evaluate`, `--with-codex review`
40
45
  If no flags are present, use defaults.
41
46
 
42
- 3. **If `--with-codex` is enabled**: Read `references/codex-integration.md` and run the "PRE-FLIGHT CHECK" section to verify Codex MCP server availability before proceeding.
47
+ 3. **If `--engine` is `auto` or `codex`**: Read `references/engine-routing.md` for the full routing table. Then call `mcp__codex-cli__ping` to verify the Codex MCP server is available. If ping fails, warn the user and offer: [1] Continue with `--engine claude` (fallback), [2] Abort. If `--engine` is not set but `--with-codex` is enabled, read `references/codex-integration.md` instead and run its pre-flight check.
43
48
 
44
49
  4. Announce the pipeline plan:
45
50
  ```
46
51
  Auto-resolve pipeline starting
47
52
  Task: [extracted task description]
53
+ Engine: [auto / codex / claude]
48
54
  Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → Challenge → [Security] → [Clean] → [Docs]
49
55
  Max evaluation rounds: [N]
50
- Cross-model evaluation (Codex): [evaluate / review / both / disabled]
56
+ Cross-model evaluation (Codex): [evaluate / review / both / disabled / subsumed by --engine]
51
57
  ```
52
58
 
53
59
  ## PHASE 1: BUILD
54
60
 
61
+ **Engine routing**: If `--engine` is `auto` or `codex`, read `references/engine-routing.md` "How to Spawn a Codex BUILD/FIX Agent" section. Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "workspace-write"`, `fullAuto: true`, and the full agent prompt below as the `prompt` parameter. If `--engine` is `claude` (default), spawn a Claude subagent as described below.
62
+
55
63
  Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to investigate and implement the fix. The subagent does NOT have access to skills, so include all necessary instructions inline.
56
64
 
57
- Agent prompt — pass this to the Agent tool:
65
+ Agent prompt — pass this to the Agent tool (or to `mcp__codex-cli__codex` prompt if engine routes to Codex):
58
66
 
59
67
  Investigate and implement the following task. Work through these phases in order:
60
68
 
@@ -74,6 +82,8 @@ Read relevant files in parallel. Build a clear picture of what exists and what n
74
82
  - UI/UX: product-designer + ux-designer + ui-designer (+ accessibility-auditor as needed)
75
83
  Each teammate investigates from their perspective and sends findings back.
76
84
 
85
+ **Engine routing for teammates**: If the orchestrator's `--engine` is `auto` or `codex`, read `references/engine-routing.md` for per-role routing. Roles marked **Codex** are called via `mcp__codex-cli__codex` instead of spawning Agent teammates — include the full role prompt and issue context inline. Roles marked **Claude** use normal Agent teammates. Roles marked **Dual** run both in parallel and merge findings. The orchestrator relays Codex role outputs to Claude teammates that need them.
86
+
77
87
  **Phase D — Synthesize and implement**: After all teammates report, compile findings into a unified plan. Implement the solution — no workarounds, no hardcoded values, no silent error swallowing. For bugs: write a failing test first, then fix. For features: implement following existing patterns, then write tests. For refactors: ensure tests pass before and after.
78
88
 
79
89
  **Phase E — Update done criteria**: Mark each criterion in `.devlyn/done-criteria.md` as satisfied. Run the full test suite.
@@ -127,9 +137,11 @@ Triggered only when PHASE 1.4 returns FAIL.
127
137
 
128
138
  Track a round counter (shared with the main fix loop counter against `max-rounds`). If `round >= max-rounds`, stop with a clear failure report — do NOT continue to evaluate/browser/etc. Code that doesn't build cannot be meaningfully evaluated or tested.
129
139
 
140
+ **Engine routing**: Same as PHASE 2.5 FIX LOOP — if `--engine` is `auto` or `codex`, use `mcp__codex-cli__codex` with `workspace-write` and `fullAuto: true`. If `--engine` is `claude`, spawn a Claude subagent.
141
+
130
142
  Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
131
143
 
132
- Agent prompt — pass this to the Agent tool:
144
+ Agent prompt — pass this to the Agent tool (or to `mcp__codex-cli__codex` prompt if engine routes to Codex):
133
145
 
134
146
  Read `.devlyn/BUILD-GATE.md` — it contains deterministic build/typecheck/lint failures from real compiler output. These are not opinions; the compiler rejected this code. Fix every listed failure at the root cause level.
135
147
 
@@ -220,7 +232,8 @@ Do NOT delete `.devlyn/done-criteria.md` or `.devlyn/EVAL-FINDINGS.md` — the o
220
232
  **After the agent completes**:
221
233
  1. Read `.devlyn/EVAL-FINDINGS.md`
222
234
  2. Extract the verdict
223
- 3. **If `--with-codex` includes `evaluate` or `both`**: Read `references/codex-integration.md` and follow the "PHASE 2-CODEX: CROSS-MODEL EVALUATE" section. This runs Codex as a second evaluator and merges findings into `EVAL-FINDINGS.md`.
235
+ 3. **If `--engine` is `auto` or `codex`**: The evaluate phase always uses Claude (see `references/engine-routing.md`). When `--engine auto`, the builder was Codex Claude evaluating Codex's work creates the GAN dynamic automatically. No separate Codex evaluation pass is needed.
236
+ **If `--engine` is not set and `--with-codex` includes `evaluate` or `both`** (legacy): Read `references/codex-integration.md` and follow the "PHASE 2-CODEX: CROSS-MODEL EVALUATE" section. This runs Codex as a second evaluator and merges findings into `EVAL-FINDINGS.md`.
224
237
  4. Branch on verdict (from the merged findings if Codex was used):
225
238
  - `PASS` → skip to PHASE 3
226
239
  - `PASS WITH ISSUES` → go to PHASE 2.5 (fix loop) — LOW-only issues are still issues; fix them
@@ -232,9 +245,11 @@ Do NOT delete `.devlyn/done-criteria.md` or `.devlyn/EVAL-FINDINGS.md` — the o
232
245
 
233
246
  Track the current round number. If `round >= max-rounds`, stop the loop and proceed to PHASE 3 with a warning that unresolved findings remain.
234
247
 
248
+ **Engine routing**: If `--engine` is `auto` or `codex`, call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "workspace-write"`, `fullAuto: true`, and the fix prompt below. Use a fresh call each round (no sessionId reuse — sandbox/fullAuto only apply on first call of a session). If `--engine` is `claude`, spawn a Claude subagent as below.
249
+
235
250
  Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to fix the evaluation findings.
236
251
 
237
- Agent prompt — pass this to the Agent tool:
252
+ Agent prompt — pass this to the Agent tool (or to `mcp__codex-cli__codex` prompt if engine routes to Codex):
238
253
 
239
254
  Read `.devlyn/EVAL-FINDINGS.md` — it contains specific issues found by an independent evaluator. Fix every finding regardless of severity (CRITICAL, HIGH, MEDIUM, and LOW). The pipeline loops until the evaluator returns PASS — there is no "shippable with issues" shortcut.
240
255
 
@@ -268,11 +283,14 @@ Agent prompt — pass this to the Agent tool:
268
283
 
269
284
  Review all recent changes in this codebase (use `git diff main` and `git status` to determine scope). Assemble a review team using TeamCreate with specialized reviewers: security reviewer, quality reviewer, test analyst. Add UX reviewer, performance reviewer, or API reviewer based on the changes.
270
285
 
286
+ **Engine routing for reviewers**: If the orchestrator passed `--engine auto` or `--engine codex`, read `references/engine-routing.md` for per-role routing in the "team-review roles" table. Route each reviewer to Claude Agent or `mcp__codex-cli__codex` accordingly. For Dual roles (security-reviewer), run both models in parallel and merge findings per the "How to Spawn a Dual Role" section. For `--engine claude`, all reviewers are Claude Agent teammates.
287
+
271
288
  Each reviewer evaluates from their perspective, sends findings with file:line evidence grouped by severity (CRITICAL, HIGH, MEDIUM, LOW). After all reviewers report, synthesize findings, deduplicate, and fix any CRITICAL issues directly. For HIGH issues, fix if straightforward.
272
289
 
273
290
  Clean up the team after completion.
274
291
 
275
- **If `--with-codex` includes `review` or `both`**: Read `references/codex-integration.md` and follow the "PHASE 4B: CODEX REVIEW" section. This runs Codex's independent code review and reconciles findings with the Claude team review.
292
+ **If `--engine` is set**: engine routing already handles cross-model review via per-role routing skip the legacy `--with-codex` review step below.
293
+ **If `--with-codex` includes `review` or `both`** (legacy, only when `--engine` is not set): Read `references/codex-integration.md` and follow the "PHASE 4B: CODEX REVIEW" section. This runs Codex's independent code review and reconciles findings with the Claude team review.
276
294
 
277
295
  **After the review phase completes**:
278
296
  1. If CRITICAL issues remain unfixed, log a warning in the final report
@@ -403,6 +421,7 @@ After all phases complete:
403
421
  ### Auto-Resolve Pipeline Complete
404
422
 
405
423
  **Task**: [original task description]
424
+ **Engine**: [auto / codex / claude — if auto, note which phases used which model]
406
425
 
407
426
  **Pipeline Summary**:
408
427
  | Phase | Status | Notes |
@@ -1,6 +1,8 @@
1
- # Codex Cross-Model Integration
1
+ # Codex Cross-Model Integration (Legacy)
2
2
 
3
- Instructions for using OpenAI Codex as an independent evaluator/reviewer in the auto-resolve pipeline. Only read this file when `--with-codex` is enabled.
3
+ > **Note**: This file is the legacy `--with-codex` integration. For the newer `--engine` flag (which subsumes `--with-codex`), see `references/engine-routing.md`. Only read this file when `--with-codex` is enabled AND `--engine` is NOT set.
4
+
5
+ Instructions for using OpenAI Codex as an independent evaluator/reviewer in the auto-resolve pipeline.
4
6
 
5
7
  Codex is accessed via `mcp__codex-cli__*` MCP tools (provided by codex-mcp-server). This creates a GAN-like adversarial dynamic — Claude builds and Codex critiques, reducing shared blind spots between model families.
6
8
 
@@ -37,7 +39,8 @@ Call `mcp__codex-cli__codex` with:
37
39
  - `prompt`: Include the full content of `.devlyn/done-criteria.md` and the output of `git diff HEAD~1`. Ask Codex to evaluate the changes against the done criteria and report issues by severity (CRITICAL, HIGH, MEDIUM, LOW) with file:line references.
38
40
  - `workingDirectory`: the project root
39
41
  - `sandbox`: `"read-only"` (Codex should only read, not modify files)
40
- - `reasoningEffort`: `"high"`
42
+ - `reasoningEffort`: `"high"` (note: for `--engine auto`, the engine-routing.md uses `"xhigh"` by default)
43
+ - `model`: `"gpt-5.4"` (pass explicitly — the MCP schema default may be outdated)
41
44
 
42
45
  Example prompt to pass:
43
46
  ```
@@ -0,0 +1,205 @@
1
+ # Engine Routing: Intelligent Model Selection
2
+
3
+ Instructions for routing work to the optimal model (Claude or Codex) per role and phase. Only read this file when `--engine` is set to `auto` or `codex`.
4
+
5
+ The routing table below is derived from published benchmarks (April 2026) comparing Claude Opus 4.6 and GPT-5.4 across task-relevant dimensions. The principle: each role's work goes to the model that objectively performs better at that task type.
6
+
7
+ ---
8
+
9
+ ## Benchmark Basis
10
+
11
+ | Dimension | Claude Opus 4.6 | GPT-5.4 | Gap | Source |
12
+ |-----------|-----------------|---------|-----|--------|
13
+ | Long-context retrieval (256k) | 92% | ~64% | Claude +28pp | MRCR v2 |
14
+ | Graduate-level reasoning | 87.4% | 83.9% | Claude +3.5pp | GPQA Diamond |
15
+ | Hard coding problems | ~46% | 57.7% | Codex +11.7pp | SWE-bench Pro |
16
+ | Function-level code gen | 90.4% | 93.1% | Codex +2.7pp | HumanEval |
17
+ | Terminal/CLI tasks | 65.4% | 75.1% | Codex +9.7pp | Terminal-Bench 2.0 |
18
+ | Real-world issue resolution | ~80% | ~80% | Tied | SWE-bench Verified |
19
+ | Security vulnerability detection | — | — | Tied | Semgrep 2025 study |
20
+ | Agentic computer use | 72.7% | 75.0% | Codex +2.3pp | OSWorld |
21
+ | Ambiguous intent handling | Preferred by 70% devs | — | Claude | Developer surveys |
22
+
23
+ ---
24
+
25
+ ## Codex Call Defaults
26
+
27
+ Every Codex call in this file uses these defaults unless stated otherwise:
28
+
29
+ ```
30
+ model: "gpt-5.4"
31
+ reasoningEffort: "xhigh"
32
+ sandbox: varies per role (see table)
33
+ workingDirectory: project root
34
+ ```
35
+
36
+ The `model` field accepts any string — pass `"gpt-5.4"` even if the MCP schema lists older defaults. The Codex CLI resolves it.
37
+
38
+ ---
39
+
40
+ ## Role Routing Table
41
+
42
+ ### team-resolve roles
43
+
44
+ | Role | Engine | Sandbox | Rationale |
45
+ |------|--------|---------|-----------|
46
+ | root-cause-analyst | **Claude** | — | A/B test: Claude traced git history (15 tool calls) finding exact commit + unchecked migration plan. Codex analyzed structure well but lacked git history depth. Tool access > SWE-bench Pro advantage for this role. |
47
+ | test-engineer | **Codex** | workspace-write | Test code generation = HumanEval (+2.7pp), needs file write |
48
+ | security-auditor | **Dual** | read-only | Semgrep: both find unique vulns; GAN > single model |
49
+ | implementation-planner | **Codex** | read-only | Implementation planning = SWE-bench Pro (+11.7pp) |
50
+ | product-designer | **Claude** | — | Ambiguous requirements, user intent = Claude strength |
51
+ | ui-designer | **Claude** | — | Visual spec, design reasoning = non-coding task |
52
+ | ux-designer | **Claude** | — | User flow analysis = ambiguous intent handling |
53
+ | accessibility-auditor | **Claude** | — | A/B test: Claude found 12 issues (1 CRITICAL) vs Codex 4. WCAG auditing requires thoroughness and domain knowledge depth, not code generation speed. Claude 3x coverage. |
54
+ | product-analyst | **Claude** | — | Requirements clarity, scope judgment = ambiguity handling |
55
+ | architecture-reviewer | **Claude** | — | Codebase-wide pattern review = MRCR long-context (+28pp) |
56
+ | performance-engineer | **Codex** | read-only | Terminal tasks + algorithm analysis = Terminal-Bench (+9.7pp) |
57
+ | api-designer | **Dual** | read-only | A/B test: Claude found 9 issues, Codex found 6, with unique findings on both sides (Claude: --version, exit codes; Codex: YAML folded scalar parsing bug). Dual maximizes coverage for API surface review. |
58
+
59
+ ### team-review roles
60
+
61
+ | Role | Engine | Sandbox | Rationale |
62
+ |------|--------|---------|-----------|
63
+ | security-reviewer | **Dual** | read-only | Same as team-resolve security-auditor |
64
+ | quality-reviewer | **Dual** | read-only | A/B test: Claude found 14 issues (2 HIGH), Codex found 11 (3 HIGH), only ~6 overlap. Dual yields ~19 unique findings (+36-73% coverage). Both models find HIGH-severity issues the other misses. |
65
+ | test-analyst | **Codex** | workspace-write | Test gap analysis + test code suggestions |
66
+ | ux-reviewer | **Claude** | — | UX flow assessment = ambiguity handling |
67
+ | ui-reviewer | **Claude** | — | Design token consistency = non-coding task |
68
+ | accessibility-reviewer | **Claude** | — | Same rationale as team-resolve accessibility-auditor: Claude 3x finding coverage on WCAG audits |
69
+ | product-validator | **Claude** | — | Business logic intent = ambiguity handling |
70
+ | api-reviewer | **Dual** | read-only | Same rationale as team-resolve api-designer: both models find unique API issues |
71
+ | performance-reviewer | **Codex** | read-only | Algorithm complexity = Terminal-Bench (+9.7pp) |
72
+
73
+ ### Summary distribution
74
+
75
+ | Engine | team-resolve (12) | team-review (9) | Total |
76
+ |--------|-------------------|-----------------|-------|
77
+ | Claude | 7 | 4 | 11 |
78
+ | Codex | 2 | 2 | 4 |
79
+ | Dual | 3 | 3 | 6 |
80
+
81
+ ---
82
+
83
+ ## Pipeline Phase Routing (auto-resolve)
84
+
85
+ | Phase | --engine auto | --engine codex | --engine claude |
86
+ |-------|--------------|----------------|-----------------|
87
+ | BUILD (implementation) | **Codex** | Codex | Claude |
88
+ | BUILD GATE | bash (model-agnostic) | bash | bash |
89
+ | BROWSER VALIDATE | Claude (Chrome MCP only) | Claude | Claude |
90
+ | EVALUATE | **Claude** | Claude | Claude |
91
+ | FIX LOOP | **Codex** | Codex | Claude |
92
+ | SIMPLIFY | Claude | Codex | Claude |
93
+ | REVIEW (team) | **Mixed per table** | Codex all | Claude all |
94
+ | CHALLENGE | **Claude** | Claude | Claude |
95
+ | SECURITY REVIEW | **Dual** | Codex | Claude |
96
+ | CLEAN | Claude | Codex | Claude |
97
+ | DOCS | Claude | Codex | Claude |
98
+
99
+ Rationale for `--engine auto` choices:
100
+ - BUILD/FIX: Codex — SWE-bench Pro 57.7% vs 46%. The biggest model gap is in hard coding tasks.
101
+ - EVALUATE/CHALLENGE: Claude — evaluating a full diff requires long-context retrieval (MRCR +28pp) and skeptical reasoning (GPQA +3.5pp). Different model family from builder creates GAN dynamic.
102
+ - BROWSER: Claude — Chrome MCP tools are Claude Code session-bound.
103
+ - SECURITY: Dual — Semgrep study shows both models find unique vulnerabilities.
104
+
105
+ ---
106
+
107
+ ## Pipeline Phase Routing (ideate)
108
+
109
+ | Phase | --engine auto | --engine codex | --engine claude |
110
+ |-------|--------------|----------------|-----------------|
111
+ | FRAME | **Claude** | Codex | Claude |
112
+ | EXPLORE | **Claude** | Codex | Claude |
113
+ | CONVERGE | **Claude** | Codex | Claude |
114
+ | CHALLENGE | **Codex** (rubric critic) | Claude (role reversal) | Claude |
115
+ | DOCUMENT | **Claude** | Codex | Claude |
116
+
117
+ Rationale:
118
+ - FRAME/EXPLORE/CONVERGE: Claude — ambiguous intent handling, multi-perspective reasoning.
119
+ - CHALLENGE: When `--engine auto`, Codex runs the rubric pass as critic (same role as `--with-codex` but automatic). When `--engine codex`, Claude runs the challenge (role reversal — builder and critic are always different models).
120
+ - DOCUMENT: Claude — writing quality for spec generation.
121
+
122
+ ---
123
+
124
+ ## Pipeline Phase Routing (preflight)
125
+
126
+ | Phase | --engine auto | --engine codex | --engine claude |
127
+ |-------|--------------|----------------|-----------------|
128
+ | EXTRACT COMMITMENTS | Claude | Codex | Claude |
129
+ | CODE AUDIT | **Codex** | Codex | Claude |
130
+ | DOCS AUDIT | **Claude** | Codex | Claude |
131
+ | BROWSER AUDIT | Claude (Chrome MCP) | Claude | Claude |
132
+ | SYNTHESIZE | Claude | Claude | Claude |
133
+
134
+ ---
135
+
136
+ ## How to Spawn a Codex Role
137
+
138
+ For roles marked **Codex** in the routing table, call `mcp__codex-cli__codex` instead of spawning a Claude Agent subagent. Package the role's full prompt (from the skill's teammate prompt section) into the Codex call.
139
+
140
+ Template:
141
+
142
+ ```
143
+ mcp__codex-cli__codex({
144
+ prompt: "[full role prompt with issue context, file paths, and deliverable format]",
145
+ model: "gpt-5.4",
146
+ reasoningEffort: "xhigh",
147
+ sandbox: "[read-only or workspace-write per table]",
148
+ workingDirectory: "[project root]"
149
+ })
150
+ ```
151
+
152
+ **Important**: Codex has no access to team infrastructure (TeamCreate, SendMessage, TaskCreate). For Codex roles:
153
+ - Include ALL context inline in the prompt (issue description, file paths from investigation, deliverable format)
154
+ - The orchestrator collects Codex's response and routes it where it would have gone via SendMessage
155
+ - Codex roles cannot communicate with other teammates directly — the orchestrator relays findings
156
+
157
+ For roles marked **Claude**, spawn a normal Agent subagent as before.
158
+
159
+ ---
160
+
161
+ ## How to Spawn a Dual Role
162
+
163
+ For roles marked **Dual**, run BOTH models in parallel and merge findings:
164
+
165
+ 1. Spawn a Claude Agent subagent with the role's prompt
166
+ 2. Call `mcp__codex-cli__codex` with the same role's prompt (sandbox: "read-only")
167
+ 3. Wait for both to complete
168
+ 4. Merge findings:
169
+ - Same finding from both → keep more detailed description, mark "confirmed by both models"
170
+ - Claude-only → keep as-is
171
+ - Codex-only → prefix with `[codex]`
172
+ - Conflicting findings → keep both, note the disagreement
173
+ - Take the MORE SEVERE verdict between the two
174
+
175
+ ---
176
+
177
+ ## How to Spawn a Codex BUILD/FIX Agent
178
+
179
+ For BUILD and FIX LOOP phases when engine routes to Codex:
180
+
181
+ ```
182
+ mcp__codex-cli__codex({
183
+ prompt: "[full build/fix prompt with task description, done criteria, and implementation instructions]",
184
+ model: "gpt-5.4",
185
+ reasoningEffort: "xhigh",
186
+ sandbox: "workspace-write",
187
+ fullAuto: true,
188
+ workingDirectory: "[project root]"
189
+ })
190
+ ```
191
+
192
+ **After Codex completes**: verify changes were made (`git diff --stat`), then proceed to the next phase as normal. The file-based handoff (`.devlyn/done-criteria.md`, `.devlyn/EVAL-FINDINGS.md`, etc.) works identically — Codex writes the same files Claude would.
193
+
194
+ **Session management**: For FIX LOOP iterations, use a fresh call each time (no `sessionId` reuse) because sandbox/fullAuto parameters only apply on the first call of a session.
195
+
196
+ ---
197
+
198
+ ## Override Behavior
199
+
200
+ - `--engine claude` → all roles and phases use Claude (current default behavior, no Codex calls)
201
+ - `--engine codex` → all phases use Codex for implementation/analysis, Claude only for orchestration and Chrome MCP
202
+ - `--engine auto` → each role and phase routes to the optimal model per this table
203
+ - `--engine auto` is the recommended default when Codex MCP server is available
204
+
205
+ `--engine` and `--with-codex` are **mutually exclusive**. `--engine auto` subsumes `--with-codex both` — it uses Codex where it's optimal (broader than just evaluate/review). If both flags are passed, `--engine` takes precedence and `--with-codex` is ignored with a warning.
@@ -24,9 +24,15 @@ Concretely:
24
24
 
25
25
  Parse these from the user's invocation message:
26
26
 
27
- - `--with-codex` (default: off) — bare flag. When set, OpenAI Codex runs an independent rubric pass during Phase 3.5 CHALLENGE via `mcp__codex-cli__*` MCP tools, using the same rubric as the solo pass. Codex always runs at `reasoningEffort: "xhigh"` — the entire reason for the flag is maximum reasoning from a second model family.
27
+ - `--with-codex` (default: off) — bare flag. When set, OpenAI Codex runs an independent rubric pass during Phase 3.5 CHALLENGE via `mcp__codex-cli__*` MCP tools, using the same rubric as the solo pass. Codex always runs at `reasoningEffort: "xhigh"` — the entire reason for the flag is maximum reasoning from a second model family. **Ignored if `--engine` is set** (engine routing subsumes this).
28
+ - `--engine MODE` (claude) — controls which model handles each ideation phase. Modes:
29
+ - `claude` (default): all phases use Claude. Current behavior.
30
+ - `codex`: Codex handles FRAME/EXPLORE/CONVERGE/DOCUMENT, Claude runs CHALLENGE (role reversal — builder and critic are always different models).
31
+ - `auto`: Claude handles FRAME/EXPLORE/CONVERGE/DOCUMENT (ambiguous intent, writing quality), Codex runs the CHALLENGE rubric pass as critic (GAN dynamic). Subsumes `--with-codex`. Recommended when Codex MCP is available.
28
32
 
29
- **If `--with-codex` is set**: read `references/challenge-rubric.md` and `references/codex-debate.md` up front, then run the pre-flight check described in `codex-debate.md` to verify the Codex MCP server is available before starting the pipeline. If the server is unavailable and the user opts to continue without Codex, the solo CHALLENGE pass still runs only the cross-model rubric pass is disabled.
33
+ **If `--engine` is `auto` or `codex`**: call `mcp__codex-cli__ping` to verify the Codex MCP server is available. If ping fails, warn the user and offer: [1] Continue with `--engine claude`, [2] Abort. Also read `references/challenge-rubric.md` up front. The engine routing table is defined in the auto-resolve skill's `references/engine-routing.md` under "Pipeline Phase Routing (ideate)".
34
+
35
+ **If `--engine` is not set and `--with-codex` is set** (legacy): read `references/challenge-rubric.md` and `references/codex-debate.md` up front, then run the pre-flight check described in `codex-debate.md` to verify the Codex MCP server is available before starting the pipeline. If the server is unavailable and the user opts to continue without Codex, the solo CHALLENGE pass still runs — only the cross-model rubric pass is disabled.
30
36
 
31
37
  <why_this_matters>
32
38
  When ideas flow directly from conversation to `/devlyn:auto-resolve`, context degrades at each handoff:
@@ -307,9 +313,13 @@ For Quick Add with one new item, one solo pass is enough. For a full greenfield
307
313
 
308
314
  If the plan came from one model in one pass, it almost always fails at least one axis somewhere. Nodding along to your own draft defeats the entire point of the phase.
309
315
 
310
- ### Codex pass (only if `--with-codex` is set)
316
+ ### Codex pass (engine-routed or legacy `--with-codex`)
317
+
318
+ **If `--engine auto`**: Codex runs the CHALLENGE rubric pass automatically. Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, and the packaged plan + rubric as prompt (same format as `codex-debate.md` Step 2). Reconcile findings: same finding from both → "confirmed by both", Codex-only → prefix `[codex]`.
319
+
320
+ **If `--engine codex`**: Role reversal — Codex built the plan (FRAME/EXPLORE/CONVERGE/DOCUMENT), so Claude runs the solo CHALLENGE pass. Do NOT also run Codex on CHALLENGE — builder and critic must be different models. Skip this section entirely.
311
321
 
312
- If the flag is set you have already loaded `references/codex-debate.md` during argument parsing — follow its "PHASE 3.5-CODEX" section now. Codex applies the rubric from `challenge-rubric.md` independently at `reasoningEffort: "xhigh"`. Reconcile findings as `codex-debate.md` describes — findings raised by both sides get "confirmed by both", Codex-only findings get prefixed `[codex]` in internal notes so the user can see where each push came from.
322
+ **If `--engine claude` or `--engine` not set, and `--with-codex` is set** (legacy): follow `references/codex-debate.md` "PHASE 3.5-CODEX" section. Codex applies the rubric from `challenge-rubric.md` independently at `reasoningEffort: "xhigh"`. Reconcile findings as `codex-debate.md` describes — findings raised by both sides get "confirmed by both", Codex-only findings get prefixed `[codex]` in internal notes so the user can see where each push came from.
313
323
 
314
324
  ### Respect explicit user intent
315
325
 
@@ -346,6 +356,12 @@ Get explicit confirmation before proceeding to DOCUMENT.
346
356
 
347
357
  For single-item additions, run one solo rubric pass on just the new item. Even then do not skip — single-item additions are exactly where overengineering and workarounds slip in unnoticed, because the lack of surrounding context makes a bad item look self-contained and harmless.
348
358
 
359
+ ## Engine Routing for FRAME / EXPLORE / CONVERGE / DOCUMENT
360
+
361
+ **If `--engine codex`**: Phases 1-3 and Phase 4 are delegated to Codex. For each phase, call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "workspace-write"`, and the phase instructions + user context as the prompt. Use `sessionId` to maintain conversational context across phases (note: sandbox/fullAuto only apply on the first call). Claude remains the orchestrator — it reads Codex's output, manages the conversation with the user (confirmation prompts, clarifying questions), and routes findings between phases.
362
+
363
+ **If `--engine auto` or `--engine claude`**: All planning phases use Claude directly (current behavior). Claude's ambiguous intent handling and writing quality benchmarks favor it for planning tasks.
364
+
349
365
  ## Phase 4: DOCUMENT
350
366
 
351
367
  <phase_goal>Generate the three-layer document set.</phase_goal>
@@ -1,15 +1,17 @@
1
- # Codex Cross-Model Rubric Pass
1
+ # Codex Cross-Model Rubric Pass (Legacy)
2
+
3
+ > **Note**: This file is the legacy `--with-codex` integration for ideate. For the newer `--engine` flag (which subsumes `--with-codex`), see the engine routing section in SKILL.md. Only read this file when `--with-codex` is set AND `--engine` is NOT set.
2
4
 
3
5
  ## Contents
4
6
  - Pre-flight check (verify Codex MCP server availability)
5
7
  - PHASE 3.5-CODEX: packaging the plan, calling Codex, reconciling findings with the solo pass
6
8
  - Cost notes (one Codex call per ideation session)
7
9
 
8
- Instructions for using OpenAI Codex as an independent critic during Phase 3.5 CHALLENGE. Only read this file when `--with-codex` is set. The 5-axis rubric itself lives in `challenge-rubric.md` — Claude loads that file directly from SKILL.md, not via this file.
10
+ Instructions for using OpenAI Codex as an independent critic during Phase 3.5 CHALLENGE. The 5-axis rubric itself lives in `challenge-rubric.md` — Claude loads that file directly from SKILL.md, not via this file.
9
11
 
10
12
  Codex is accessed via `mcp__codex-cli__*` MCP tools (provided by codex-mcp-server). The intent: one opinionated rubric pass from a different model family, applied right before the user sees the plan. Two model families catch different blind spots; one pass at maximum effort catches more than multiple shallow passes.
11
13
 
12
- **Always use `reasoningEffort: "xhigh"` and `sandbox: "read-only"` for every Codex call in this file.** Maximum reasoning is the whole reason the `--with-codex` flag exists — lowering it defeats the purpose of bringing in a second model.
14
+ **Always use `model: "gpt-5.4"`, `reasoningEffort: "xhigh"` and `sandbox: "read-only"` for every Codex call in this file.** Maximum reasoning is the whole reason the `--with-codex` flag exists — lowering it defeats the purpose of bringing in a second model. Pass `model: "gpt-5.4"` explicitly as the MCP schema default may be outdated.
13
15
 
14
16
  ---
15
17
 
@@ -69,6 +71,7 @@ Call `mcp__codex-cli__codex` with:
69
71
  - `prompt`: the packaged context above, followed by the instructions below
70
72
  - `workingDirectory`: the project root
71
73
  - `sandbox`: `"read-only"`
74
+ - `model`: `"gpt-5.4"` — pass explicitly; the MCP schema default may still show `gpt-5.3-codex`
72
75
  - `reasoningEffort`: `"xhigh"` — the highest setting in the Codex enum (`none < minimal < low < medium < high < xhigh`). Always pick the top level; this is the entire reason for the flag.
73
76
 
74
77
  Instructions to append to the packaged context. **Before sending, inline the full text of `references/challenge-rubric.md` into the prompt under a `## Rubric` heading** — Codex does not have filesystem access to this project, so Claude must ship the rubric itself. Claude already has the rubric loaded from Phase 3.5 setup.
@@ -44,8 +44,15 @@ Parse from `<preflight_config>`:
44
44
  - `--autofix` — auto-promote all findings to roadmap items and run auto-resolve on each
45
45
  - `--skip-browser` — skip browser validation
46
46
  - `--skip-docs` — skip documentation audit
47
+ - `--engine MODE` (claude) — controls which model handles audit phases. Modes:
48
+ - `claude` (default): all auditors use Claude subagents.
49
+ - `codex`: code-auditor uses Codex, docs-auditor and browser-auditor use Claude.
50
+ - `auto`: code-auditor uses Codex (SWE-bench Pro +11.7pp for code analysis), docs-auditor uses Claude (writing quality), browser-auditor uses Claude (Chrome MCP). Recommended when Codex MCP is available.
47
51
 
48
52
  Example: `/devlyn:preflight --phase 2 --skip-browser`
53
+ Example with engine: `/devlyn:preflight --engine auto`
54
+
55
+ **If `--engine` is `auto` or `codex`**: call `mcp__codex-cli__ping` to verify Codex MCP availability. If ping fails, fall back to `--engine claude` with a warning.
49
56
 
50
57
  ## PHASE 0: DISCOVER & SCOPE
51
58
 
@@ -128,6 +135,8 @@ Spawn all applicable auditors in parallel. Each reads `.devlyn/commitment-regist
128
135
 
129
136
  ### code-auditor (always)
130
137
 
138
+ **Engine routing**: If `--engine auto` or `--engine codex`, call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, and the full code-auditor prompt (read from `references/auditors/code-auditor.md`). Include the commitment registry content inline in the prompt since Codex cannot read `.devlyn/commitment-registry.md` directly in read-only sandbox. If `--engine claude`, spawn a Claude subagent as below.
139
+
131
140
  Spawn a subagent with `mode: "bypassPermissions"`. Read the full prompt from `references/auditors/code-auditor.md` and pass it to the subagent.
132
141
 
133
142
  The code-auditor classifies each commitment as IMPLEMENTED, MISSING, INCOMPLETE, DIVERGENT, or BROKEN — with file:line evidence. Also catches cross-feature integration gaps and constraint violations. Writes to `.devlyn/audit-code.md`.
@@ -130,9 +130,28 @@ Use the Agent Teams infrastructure:
130
130
 
131
131
  **IMPORTANT**: When spawning teammates, replace `{team-name}` in each prompt below with the actual team name you chose (e.g., `resolve-null-user-crash`). Include the relevant file paths from your Phase 1 investigation in the spawn prompt.
132
132
 
133
+ ### Engine-Routed Teammate Spawning
134
+
135
+ If the caller passed `--engine auto` or `--engine codex` (check the orchestrator's context or the pipeline config), read the auto-resolve skill's `references/engine-routing.md` for per-role routing under "team-resolve roles".
136
+
137
+ **For roles routed to Codex**: Instead of spawning a Claude Agent teammate, call `mcp__codex-cli__codex` with:
138
+ - `model`: `"gpt-5.4"`
139
+ - `reasoningEffort`: `"xhigh"`
140
+ - `sandbox`: per routing table (`"read-only"` or `"workspace-write"`)
141
+ - `workingDirectory`: project root
142
+ - `prompt`: the full teammate prompt below, with issue context and file paths included inline
143
+
144
+ Codex roles cannot use TeamCreate/SendMessage — the Team Lead (you) relays their findings to other teammates and collects their output directly from the MCP call response.
145
+
146
+ **For roles routed to Claude**: Spawn via Task tool as normal (prompts below).
147
+
148
+ **For Dual roles** (e.g., security-auditor): Run BOTH a Claude Agent teammate AND a `mcp__codex-cli__codex` call in parallel with the same prompt. Merge findings per `engine-routing.md` "How to Spawn a Dual Role" section.
149
+
150
+ If `--engine claude` or no `--engine` flag: all roles use Claude Agent teammates (current default behavior).
151
+
133
152
  ### Teammate Prompts
134
153
 
135
- When spawning each teammate via the Task tool, use these prompts:
154
+ When spawning each teammate via the Task tool (or passing to `mcp__codex-cli__codex` for Codex-routed roles), use these prompts:
136
155
 
137
156
  <root_cause_analyst_prompt>
138
157
  You are the **Root Cause Analyst** on an Agent Team resolving an issue.
@@ -66,9 +66,28 @@ Use the Agent Teams infrastructure:
66
66
 
67
67
  **IMPORTANT**: When spawning reviewers, replace `{team-name}` in each prompt below with the actual team name you chose. Include the specific changed file paths in each reviewer's spawn prompt.
68
68
 
69
+ ### Engine-Routed Reviewer Spawning
70
+
71
+ If the caller passed `--engine auto` or `--engine codex` (check the orchestrator's context or the pipeline config), read the auto-resolve skill's `references/engine-routing.md` for per-role routing under "team-review roles".
72
+
73
+ **For roles routed to Codex**: Instead of spawning a Claude Agent reviewer, call `mcp__codex-cli__codex` with:
74
+ - `model`: `"gpt-5.4"`
75
+ - `reasoningEffort`: `"xhigh"`
76
+ - `sandbox`: per routing table (`"read-only"` or `"workspace-write"`)
77
+ - `workingDirectory`: project root
78
+ - `prompt`: the full reviewer prompt below, with changed file paths and diff included inline
79
+
80
+ Codex reviewers cannot use TeamCreate/SendMessage — the Review Lead (you) collects their output directly from the MCP call response and relays cross-cutting findings to other reviewers.
81
+
82
+ **For roles routed to Claude**: Spawn via Task tool as normal (prompts below).
83
+
84
+ **For Dual roles** (e.g., security-reviewer): Run BOTH a Claude Agent reviewer AND a `mcp__codex-cli__codex` call in parallel with the same prompt. Merge findings per `engine-routing.md` "How to Spawn a Dual Role" section.
85
+
86
+ If `--engine claude` or no `--engine` flag: all roles use Claude Agent reviewers (current default behavior).
87
+
69
88
  ### Reviewer Prompts
70
89
 
71
- When spawning each reviewer via the Task tool, use these prompts:
90
+ When spawning each reviewer via the Task tool (or passing to `mcp__codex-cli__codex` for Codex-routed roles), use these prompts:
72
91
 
73
92
  <security_reviewer_prompt>
74
93
  You are the **Security Reviewer** on an Agent Team performing a code review.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "devlyn-cli",
3
- "version": "1.11.0",
3
+ "version": "1.12.1",
4
4
  "description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration",
5
5
  "homepage": "https://github.com/fysoul17/devlyn-cli#readme",
6
6
  "bin": {