devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (158)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +135 -21
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +175 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -429
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
  117. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  118. package/package.json +12 -2
  119. package/scripts/lint-skills.sh +431 -0
  120. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
  121. package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
  122. package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
  125. package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
  126. package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
  127. package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
  128. package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
  129. package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
  130. package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
  131. package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
  132. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  133. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  134. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  135. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  136. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  137. package/config/skills/devlyn:clean/SKILL.md +0 -285
  138. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  139. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  140. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  141. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  142. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  143. package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
  144. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  145. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  146. package/config/skills/devlyn:preflight/SKILL.md +0 -355
  147. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  148. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
  149. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  150. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  151. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  152. package/config/skills/devlyn:review/SKILL.md +0 -161
  153. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  154. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  155. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  156. package/config/skills/workflow-routing/SKILL.md +0 -73
  157. /package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
  158. /package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
package/bin/devlyn.js CHANGED
@@ -17,7 +17,12 @@ const CLI_TARGETS = {
  codex: {
  name: 'Codex CLI (OpenAI)',
  instructionsFile: 'AGENTS.md',
+ baseInstructionsFile: 'AGENTS.md',
  configDir: null, // Codex uses AGENTS.md at project root
+ // Codex auto-loads skills from ~/.codex/skills/ (user-global). Same
+ // SKILL.md format as Claude Code; descriptions must stay ≤1024 chars.
+ skillsDir: path.join(os.homedir(), '.codex', 'skills'),
+ skillsToInstall: ['devlyn:resolve', 'devlyn:ideate', '_shared'],
  detect: () => fs.existsSync(path.join(process.cwd(), 'AGENTS.md')) || fs.existsSync(path.join(process.cwd(), '.codex')),
  },
  gemini: {
@@ -68,8 +73,15 @@ const DEPRECATED_FILES = [
  'commands/devlyn.pencil-push.md', // migrated to skills/devlyn:pencil-push
  ];
 
- // Skill directories renamed from devlyn-* to devlyn:* in v0.7.x
+ // Skill directories renamed from devlyn-* to devlyn:* in v0.7.x, plus
+ // iter-0034 Phase 4 cutover (2026-05-03): 15 user skills deleted and 3 moved
+ // to optional-skills/. Listed here so post-cutover `npx devlyn-cli` upgrades
+ // force-remove stale legacy skill dirs from downstream `~/.claude/skills/`
+ // even though the source dirs no longer exist (cleanManagedSkillDirs only
+ // removes target dirs that still exist in source — without this list,
+ // deleted-from-source skills persist in user installs forever).
  const DEPRECATED_DIRS = [
+ // v0.7.x rename: devlyn-* → devlyn:*
  'skills/devlyn-clean',
  'skills/devlyn-design-system',
  'skills/devlyn-design-ui',
@@ -87,6 +99,28 @@ const DEPRECATED_DIRS = [
  'skills/devlyn-update-docs',
  'skills/devlyn-pencil-pull',
  'skills/devlyn-pencil-push',
+ // iter-0034 Phase 4 cutover: deleted user skills
+ 'skills/devlyn:auto-resolve',
+ 'skills/devlyn:browser-validate',
+ 'skills/devlyn:clean',
+ 'skills/devlyn:design-ui',
+ 'skills/devlyn:discover-product',
+ 'skills/devlyn:evaluate',
+ 'skills/devlyn:feature-spec',
+ 'skills/devlyn:implement-ui',
+ 'skills/devlyn:preflight',
+ 'skills/devlyn:product-spec',
+ 'skills/devlyn:recommend-features',
+ 'skills/devlyn:review',
+ 'skills/devlyn:team-resolve',
+ 'skills/devlyn:team-review',
+ 'skills/devlyn:update-docs',
+ // iter-0034 Phase 4 cutover: moved to optional-skills/. Force-removed on
+ // upgrade so users only have them if they opt in via the interactive
+ // installer (matches the pencil-pull / pencil-push pattern).
+ 'skills/devlyn:reap',
+ 'skills/devlyn:design-system',
+ 'skills/devlyn:team-design-ui',
  ];
 
  function getTargetDir() {
@@ -148,6 +182,9 @@ const OPTIONAL_ADDONS = [
  { name: 'dokkit', desc: 'Document template filling for DOCX/HWPX — ingest, fill, review, export', type: 'local' },
  { name: 'devlyn:pencil-pull', desc: 'Pull Pencil designs into code with exact visual fidelity', type: 'local' },
  { name: 'devlyn:pencil-push', desc: 'Push codebase UI to Pencil canvas for design sync', type: 'local' },
+ { name: 'devlyn:reap', desc: 'Safely reap orphaned MCP / codex / Superset child processes left behind by long Claude sessions', type: 'local' },
+ { name: 'devlyn:design-system', desc: 'Extract design tokens from a chosen UI style for exact reproduction (creative power-user)', type: 'local' },
+ { name: 'devlyn:team-design-ui', desc: '5 distinct UI style explorations from a full design team (creative power-user)', type: 'local' },
  // External skill packs (installed via npx skills add)
  { name: 'vercel-labs/agent-skills', desc: 'React, Next.js, React Native best practices', type: 'external' },
  { name: 'supabase/agent-skills', desc: 'Supabase integration patterns', type: 'external' },
@@ -155,8 +192,10 @@ const OPTIONAL_ADDONS = [
  { name: 'anthropics/skills', desc: 'Official Anthropic skill-creator with eval framework and description optimizer', type: 'external' },
  { name: 'Leonxlnx/taste-skill', desc: 'Premium frontend design skills — modern layouts, animations, and visual refinement', type: 'external' },
  // MCP servers (installed via claude mcp add)
- { name: 'codex-cli', desc: 'Codex MCP server for cross-model evaluation via OpenAI Codex', type: 'mcp', command: 'npx -y codex-mcp-server' },
- { name: 'playwright', desc: 'Playwright MCP for browser testing powers devlyn:browser-validate Tier 2', type: 'mcp', command: 'npx -y @anthropic-ai/mcp-playwright' },
+ // Note: the Codex integration uses the local `codex` CLI binary (not MCP).
+ // Install the CLI separately per https://platform.openai.com/docs/codex; the
+ // harness auto-detects availability and downgrades to Claude-only on failure.
+ { name: 'playwright', desc: 'Playwright MCP for browser testing — powers /devlyn:resolve BUILD_GATE browser tier', type: 'mcp', command: 'npx -y @anthropic-ai/mcp-playwright' },
  ];
 
  function log(msg, color = 'reset') {
@@ -262,7 +301,7 @@ function cleanupDeprecated(targetDir) {
  const fullPath = path.join(targetDir, relPath);
  if (fs.existsSync(fullPath)) {
  fs.rmSync(fullPath, { recursive: true });
- log(` ✕ ${relPath}/ (renamed)`, 'dim');
+ log(` ✕ ${relPath}/ (removed)`, 'dim');
  removed++;
  }
  }
@@ -273,6 +312,8 @@ function copyRecursive(src, dest, baseDir) {
  const stats = fs.statSync(src);
 
  if (stats.isDirectory()) {
+ // Never install dev workspaces, even when running from source repo.
+ if (UNSHIPPED_SKILL_DIRS.has(path.basename(src))) return;
  if (!fs.existsSync(dest)) {
  fs.mkdirSync(dest, { recursive: true });
  }
@@ -290,6 +331,37 @@ function copyRecursive(src, dest, baseDir) {
  }
  }
 
+ // Dev artifacts that live under config/skills/ but must never ship or install.
+ // Mirrors the `!` exclusions in package.json files[].
+ const UNSHIPPED_SKILL_DIRS = new Set([
+ 'devlyn:auto-resolve-workspace',
+ 'devlyn:ideate-workspace',
+ 'preflight-workspace',
+ 'roadmap-archival-workspace',
+ ]);
+
+ // Clean managed skill directories before copy to prevent stale-file drift.
+ // copyRecursive is a pure overlay: if a file was removed or renamed in source,
+ // the installed mirror keeps the old copy. For each top-level dir under
+ // config/skills/, remove its counterpart in target/skills/ before the copy so
+ // each managed skill is fully replaced on every sync. User-installed skills
+ // (e.g. skill-creator from optional addons) are left alone because they have
+ // no counterpart in source. Dev workspaces are skipped entirely.
+ function cleanManagedSkillDirs(sourceSkillsDir, targetSkillsDir) {
+ if (!fs.existsSync(sourceSkillsDir) || !fs.existsSync(targetSkillsDir)) return 0;
+ let cleaned = 0;
+ for (const entry of fs.readdirSync(sourceSkillsDir, { withFileTypes: true })) {
+ if (!entry.isDirectory()) continue;
+ if (UNSHIPPED_SKILL_DIRS.has(entry.name)) continue;
+ const targetPath = path.join(targetSkillsDir, entry.name);
+ if (fs.existsSync(targetPath)) {
+ fs.rmSync(targetPath, { recursive: true, force: true });
+ cleaned++;
+ }
+ }
+ return cleaned;
+ }
+
  function multiSelect(items) {
  return new Promise((resolve) => {
  const selected = new Set();
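The comments in this hunk and in the `DEPRECATED_DIRS` hunk describe two complementary cleanup passes. The first removes every installed skill dir that still has a counterpart in source before copying. The second force-removes a fixed list of dirs that were deleted from source, which the first pass can never see. The sketch replays both passes on throwaway temp directories to show why each is needed. It is a simplified model under stated assumptions, not the devlyn.js code: `syncSkills`, `overlayCopy`, `listFiles`, and the skill names are hypothetical, and the real `cleanManagedSkillDirs` also skips `UNSHIPPED_SKILL_DIRS`.

```javascript
// Hypothetical sketch, not devlyn-cli code. Plain skill names (no colons)
// keep the demo portable to Windows file systems.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a file and any missing parent directories.
function writeFile(file, text) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, text);
}

// Pure overlay copy: copies src onto dest but never deletes anything in dest.
function overlayCopy(src, dest) {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) overlayCopy(from, to);
    else fs.copyFileSync(from, to);
  }
}

// List files under dir as sorted, '/'-joined relative paths.
function listFiles(dir, prefix = '') {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const rel = prefix ? `${prefix}/${entry.name}` : entry.name;
    if (entry.isDirectory()) out.push(...listFiles(path.join(dir, entry.name), rel));
    else out.push(rel);
  }
  return out.sort();
}

// Clean-then-copy: drop each target dir that has a source counterpart,
// overlay the source, then force-remove deprecated dirs. The clean step
// alone never sees skills that were deleted from source.
function syncSkills(sourceDir, targetDir, deprecatedDirs) {
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    fs.rmSync(path.join(targetDir, entry.name), { recursive: true, force: true });
  }
  overlayCopy(sourceDir, targetDir);
  for (const name of deprecatedDirs) {
    fs.rmSync(path.join(targetDir, name), { recursive: true, force: true });
  }
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-sync-'));
const src = path.join(root, 'source');
const dst = path.join(root, 'target');
writeFile(path.join(src, 'resolve', 'SKILL.md'), 'v2');
writeFile(path.join(dst, 'resolve', 'SKILL.md'), 'v1');
writeFile(path.join(dst, 'resolve', 'old-notes.md'), 'stale'); // dropped from source in v2
writeFile(path.join(dst, 'auto-resolve', 'SKILL.md'), 'x'); // skill deleted from source
writeFile(path.join(dst, 'skill-creator', 'SKILL.md'), 'x'); // user-installed, no source twin

// For contrast: an overlay alone leaves the stale file in place.
const overlayDst = path.join(root, 'overlay-only');
writeFile(path.join(overlayDst, 'resolve', 'old-notes.md'), 'stale');
overlayCopy(src, overlayDst);
const overlayOnly = listFiles(overlayDst);

syncSkills(src, dst, ['auto-resolve']);
const synced = listFiles(dst);
const resolveText = fs.readFileSync(path.join(dst, 'resolve', 'SKILL.md'), 'utf8');
fs.rmSync(root, { recursive: true, force: true });

console.log(overlayOnly); // [ 'resolve/SKILL.md', 'resolve/old-notes.md' ]
console.log(synced); // [ 'resolve/SKILL.md', 'skill-creator/SKILL.md' ]
```

The overlay-only run keeps `old-notes.md`. In the synced target, the deleted skill is gone and the user-installed skill survives.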
@@ -310,8 +382,8 @@ function multiSelect(items) {
  const checkbox = selected.has(i) ? `${COLORS.green}◉${COLORS.reset}` : `${COLORS.dim}○${COLORS.reset}`;
  const pointer = i === cursor ? `${COLORS.cyan}❯${COLORS.reset}` : ' ';
  const name = i === cursor ? `${COLORS.cyan}${item.name}${COLORS.reset}` : item.name;
- const tagLabel = item.type === 'mcp' ? 'mcp' : item.type === 'local' ? 'skill' : 'pack';
- const tagColor = item.type === 'mcp' ? COLORS.green : item.type === 'local' ? COLORS.magenta : COLORS.cyan;
+ const tagLabel = item.type === 'mcp' ? 'mcp' : item.type === 'local' ? 'skill' : item.type === 'cli' ? 'cli' : 'pack';
+ const tagColor = item.type === 'mcp' ? COLORS.green : item.type === 'local' ? COLORS.magenta : item.type === 'cli' ? COLORS.blue : COLORS.cyan;
  const tag = `${tagColor}${tagLabel}${COLORS.reset}`;
  console.log(`${pointer} ${checkbox} ${name} ${COLORS.dim}[${tag}${COLORS.dim}]${COLORS.reset}`);
  console.log(` ${COLORS.dim}${item.desc}${COLORS.reset}`);
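The new `cli` tag extends two parallel nested ternaries that must stay in sync. One hedged alternative, not what devlyn.js ships, keeps label and color together in a single `Map` keyed by `item.type`, so that adding a type becomes a one-line change. The escape values are standard ANSI SGR codes standing in for devlyn.js's `COLORS` object, whose actual values this diff does not show. `formatTag` and `FALLBACK_TAG` are hypothetical names.

```javascript
// Hypothetical refactor sketch; COLORS stands in for devlyn.js's own object.
const COLORS = {
  green: '\x1b[32m',
  magenta: '\x1b[35m',
  blue: '\x1b[34m',
  cyan: '\x1b[36m',
  reset: '\x1b[0m',
};

// One entry per addon type; anything unlisted renders as an external 'pack'.
const TAGS = new Map([
  ['mcp', { label: 'mcp', color: COLORS.green }],
  ['local', { label: 'skill', color: COLORS.magenta }],
  ['cli', { label: 'cli', color: COLORS.blue }],
]);
const FALLBACK_TAG = { label: 'pack', color: COLORS.cyan };

// Colored tag text for one multiSelect row.
function formatTag(type) {
  const { label, color } = TAGS.get(type) ?? FALLBACK_TAG;
  return `${color}${label}${COLORS.reset}`;
}

console.log(JSON.stringify(formatTag('cli'))); // "\u001b[34mcli\u001b[0m"
```

A `Map` is used instead of a plain object so that inherited keys such as `toString` can never match a type by accident.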
@@ -441,6 +513,37 @@ function detectOtherCLIs() {
  return detected;
  }
 
+ // Install /devlyn:resolve + /devlyn:ideate + _shared skills into a CLI's
+ // global skills directory (e.g. ~/.codex/skills/). Returns count of skills
+ // copied. Skipped silently for CLIs without a skillsDir (e.g. cursor, copilot
+ // at the time of writing — they don't have an analogous skill-loader).
+ function installSkillsForCLI(cliKey) {
+ const cli = CLI_TARGETS[cliKey];
+ if (!cli || !cli.skillsDir || !cli.skillsToInstall) return 0;
+
+ const sourceSkillsDir = path.join(CONFIG_SOURCE, 'skills');
+ if (!fs.existsSync(sourceSkillsDir)) return 0;
+ if (!fs.existsSync(cli.skillsDir)) {
+ fs.mkdirSync(cli.skillsDir, { recursive: true });
+ }
+
+ let copied = 0;
+ for (const skillName of cli.skillsToInstall) {
+ const src = path.join(sourceSkillsDir, skillName);
+ const dest = path.join(cli.skillsDir, skillName);
+ if (!fs.existsSync(src)) continue;
+ // Full replace per cleanManagedSkillDirs semantics: stale files in the
+ // installed mirror would otherwise persist forever.
+ if (fs.existsSync(dest)) {
+ fs.rmSync(dest, { recursive: true, force: true });
+ }
+ copyRecursive(src, dest, cli.skillsDir);
+ copied++;
+ log(` → ${cli.skillsDir.replace(os.homedir(), '~')}/${skillName}`, 'dim');
+ }
+ return copied;
+ }
+
  function installAgentsForCLI(cliKey) {
  const cli = CLI_TARGETS[cliKey];
  if (!cli) return false;
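`installAgentsForCLI` rewrites the instructions file on every run. When an earlier devlyn block is found, everything from the preceding `---` separator onward is cut before the new block is appended, which is the `markerIdx` / `lastIndexOf('---', markerIdx)` logic in the next hunk. Re-running the installer therefore replaces the block rather than duplicating it. The sketch models that idea under assumptions: `MARKER` and `SEPARATOR` are hypothetical strings, since the diff does not show the real values, and `appendAgentBlock` is an illustrative name.

```javascript
// Hypothetical marker and separator: the real strings are not in this diff.
const MARKER = '<!-- devlyn:agents -->';
const SEPARATOR = '\n\n---\n\n';

// Return `existing` with any earlier devlyn block removed and a fresh block
// appended. The cut starts at the last '---' before the marker, mirroring
// the `lastIndexOf('---', markerIdx)` logic; a separator at index 0 is
// ignored and the cut falls back to the marker itself.
function appendAgentBlock(existing, agentContent) {
  let base = existing;
  const markerIdx = base.indexOf(MARKER);
  if (markerIdx !== -1) {
    const sepIdx = base.lastIndexOf('---', markerIdx);
    base = base.slice(0, sepIdx > 0 ? sepIdx : markerIdx).trimEnd();
  }
  return base + SEPARATOR + MARKER + '\n' + agentContent + '\n';
}

const first = appendAgentBlock('# Project rules', 'agent rules v1');
const second = appendAgentBlock(first, 'agent rules v2');
console.log(second === appendAgentBlock(second, 'agent rules v2')); // true
console.log(second.split(MARKER).length - 1); // 1
```

The new `else if (cli.baseInstructionsFile)` branch seeds `existing` from the packaged AGENTS.md in the other case. The enclosing condition is outside this hunk, so this is presumably the case where no destination file exists yet. The sketch covers only the re-run path.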
@@ -482,12 +585,25 @@ function installAgentsForCLI(cliKey) {
  const sepIdx = existing.lastIndexOf('---', markerIdx);
  existing = existing.slice(0, sepIdx > 0 ? sepIdx : markerIdx).trimEnd();
  }
+ } else if (cli.baseInstructionsFile) {
+ const baseInstructionsSrc = path.join(__dirname, '..', cli.baseInstructionsFile);
+ if (fs.existsSync(baseInstructionsSrc)) {
+ existing = fs.readFileSync(baseInstructionsSrc, 'utf8').trimEnd();
+ }
  }
 
  fs.writeFileSync(destFile, existing + separator + agentContent + '\n');
  log(` → ${cli.instructionsFile} (agent instructions appended)`, 'dim');
  }
 
+ // If this CLI also supports a global skill-loader (currently Codex), install
+ // /devlyn:resolve + /devlyn:ideate + _shared so the same slash commands work
+ // there. Skipped for CLIs without a skillsDir entry.
+ const skillsCopied = installSkillsForCLI(cliKey);
+ if (skillsCopied > 0) {
+ log(` → ${skillsCopied} skill${skillsCopied > 1 ? 's' : ''} installed (devlyn:resolve / devlyn:ideate / _shared)`, 'dim');
+ }
+
  return true;
  }
 
@@ -514,6 +630,13 @@ async function init(skipPrompts = false) {
  // Install core config
  const targetDir = getTargetDir();
  log('\n📁 Installing core config to .claude/', 'green');
+ const refreshed = cleanManagedSkillDirs(
+ path.join(CONFIG_SOURCE, 'skills'),
+ path.join(targetDir, 'skills'),
+ );
+ if (refreshed > 0) {
+ log(` 🔄 Refreshing ${refreshed} managed skill director${refreshed === 1 ? 'y' : 'ies'}`, 'dim');
+ }
  copyRecursive(CONFIG_SOURCE, targetDir, targetDir);
 
  // Remove deprecated files from previous versions
@@ -522,7 +645,8 @@ async function init(skipPrompts = false) {
  log(`\n🧹 Cleaned up ${removed} deprecated file${removed > 1 ? 's' : ''}`, 'yellow');
  }
 
- // Copy CLAUDE.md to project root
+ // Copy Claude project instructions to project root. Other CLI instruction
+ // files are installed only when explicitly selected below or via `agents`.
  const claudeMdSrc = path.join(__dirname, '..', 'CLAUDE.md');
  const claudeMdDest = path.join(process.cwd(), 'CLAUDE.md');
  if (fs.existsSync(claudeMdSrc)) {
@@ -609,26 +733,42 @@ async function init(skipPrompts = false) {
  log(' → ~/.claude/settings.json (disabled adaptive thinking, enabled 1h prompt caching)', 'dim');
  }
 
- // Install agents for other detected CLIs
- const detected = detectOtherCLIs();
- if (detected.length > 0) {
- log(`\n🔍 Detected other AI CLIs: ${detected.map((k) => CLI_TARGETS[k].name).join(', ')}`, 'blue');
- const agentsInstalled = installAgentsForAllDetected();
- if (agentsInstalled > 0) {
- log(` ✅ Agent instructions installed for ${agentsInstalled} CLI${agentsInstalled > 1 ? 's' : ''}`, 'green');
- }
- }
-
  log('\n✅ Core config installed!', 'green');
 
  // Skip prompts if -y flag or non-interactive
  if (skipPrompts || !process.stdin.isTTY) {
  log('\n💡 Add optional addons later: run `npx devlyn-cli` without -y', 'dim');
+ log(' Add Codex instructions + skills later: run `npx devlyn-cli agents codex`', 'dim');
  log(`\n${COLORS.dim} Enjoying devlyn? Star it on GitHub — it helps others find it:${COLORS.reset}`);
  log(` ${COLORS.purple}→ https://github.com/fysoul17/devlyn-cli${COLORS.reset}\n`);
  return;
  }
 
+ // Ask which non-Claude CLIs should receive instruction files.
+ log('\n🤖 Optional AI CLI instructions:\n', 'blue');
+ const cliOptions = Object.entries(CLI_TARGETS).map(([key, cli]) => {
+ let desc;
+ if (cli.configDir) {
+ desc = `Install agents into ${cli.configDir}/`;
+ } else if (cli.skillsDir) {
+ desc = `Install ${cli.instructionsFile} + /devlyn:resolve + /devlyn:ideate skills (~/.codex/skills/)`;
+ } else {
+ desc = `Install ${cli.instructionsFile}`;
+ }
+ return { key, name: cli.name, desc, type: 'cli' };
+ });
+ const selectedClis = await multiSelect(cliOptions);
+ if (selectedClis.length > 0) {
+ let agentsInstalled = 0;
+ for (const selectedCli of selectedClis) {
+ if (installAgentsForCLI(selectedCli.key)) agentsInstalled++;
+ }
+ log(` ✅ Agent instructions installed for ${agentsInstalled} CLI${agentsInstalled !== 1 ? 's' : ''}`, 'green');
+ } else {
+ log('💡 No additional CLI instructions selected', 'dim');
+ log(' Run `npx devlyn-cli agents codex` later to install Codex AGENTS.md + /devlyn skills', 'dim');
+ }
+
  // Ask about optional addons (local skills + external packs)
  log('\n📚 Optional skills & packs:\n', 'blue');
 
@@ -657,6 +797,9 @@ function showHelp() {
  log(' npx devlyn-cli -y Install without prompts');
  log(' npx devlyn-cli agents Install agents for detected CLIs');
  log(' npx devlyn-cli agents all Install agents for all supported CLIs');
+ log(' npx devlyn-cli benchmark Run the full A/B benchmark suite vs bare');
+ log(' npx devlyn-cli benchmark --n 3 --bless Ship-decision run + promote baseline if pass');
+ log(' npx devlyn-cli benchmark --dry-run Validate suite setup without model invocation');
  log(' npx devlyn-cli --help Show this help\n');
  log('Optional skills (select during install):', 'green');
  OPTIONAL_ADDONS.filter((a) => a.type === 'local').forEach((skill) => {
@@ -694,6 +837,21 @@ switch (command) {
  case 'ls':
  listContents();
  break;
+ case 'benchmark':
+ case 'bench': {
+ // Delegate to benchmark/auto-resolve/scripts/run-suite.sh with all remaining args.
+ const runSuite = path.join(__dirname, '..', 'benchmark', 'auto-resolve', 'scripts', 'run-suite.sh');
+ if (!fs.existsSync(runSuite)) {
+ log('❌ Benchmark suite runner missing — is this a clean devlyn-cli checkout?', 'yellow');
+ log(` Expected: ${runSuite}`, 'dim');
+ process.exit(1);
+ }
+ const { spawnSync } = require('child_process');
+ const forwardedArgs = args.slice(1);
+ const res = spawnSync('bash', [runSuite, ...forwardedArgs], { stdio: 'inherit' });
+ process.exit(res.status ?? 1);
+ break;
+ }
  case 'agents': {
  showLogo();
  log('─'.repeat(44), 'dim');
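The `benchmark` case passes `res.status ?? 1` to `process.exit`. In Node's `spawnSync`, `status` is `null` in two situations: when the child is killed by a signal, and when it never starts (for example, when `bash` is not on `PATH`, in which case the error is in `res.error`). Every such failure therefore collapses to exit code 1. The following is a hedged sketch of a finer mapping. `exitCodeFor` is a hypothetical helper, and the 127 and 128 + N codes are borrowed from POSIX shell convention rather than from devlyn-cli.

```javascript
const os = require('os');
const { spawnSync } = require('child_process');

// Turn a spawnSync result into an exit code for process.exit().
// - spawn failure (e.g. bash not on PATH): res.error is set, status is null
// - killed by a signal: status is null, res.signal names the signal
// The 127 and 128 + N values follow common shell conventions; they are an
// assumption here, not devlyn-cli behavior.
function exitCodeFor(res) {
  if (res.error) return res.error.code === 'ENOENT' ? 127 : 1;
  if (res.status !== null) return res.status;
  const signum = res.signal ? os.constants.signals[res.signal] : undefined;
  return signum ? 128 + signum : 1;
}

// The child here is Node itself, so the demo does not depend on bash.
const res = spawnSync(process.execPath, ['-e', 'process.exit(3)'], { stdio: 'inherit' });
console.log(exitCodeFor(res)); // 3
console.log(exitCodeFor(spawnSync('no-such-binary-for-this-demo'))); // 127
```

Because `process.exit()` never returns, the `break` after it in the hunk is unreachable but harmless.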
@@ -0,0 +1,64 @@
1
+ # Per-engine prompt adapters
2
+
3
+ This folder is the LLM-specific delta layer. The harness's canonical phase prompts (in each skill's `references/phases/<phase>.md`) stay model-neutral and outcome-first. Each adapter file in this folder is a **small delta header** that gets injected BEFORE the canonical body when the phase runs against that specific engine.
4
+
5
+ ## Why adapters exist
6
+
7
+ Anthropic and OpenAI publish official prompt-engineering guides for their flagship models. The two guides converge on outcome-first + decision rules + mechanical validation but **diverge on tactics** (XML structure vs stop-rules format, literal interpretation vs decision-rule phrasing, self-check pattern vs validation-tool primacy). A single canonical prompt can't hit both ceilings.
8
+
9
+ The split:
10
+ - **Canonical body** (in `<skill>/references/phases/`) = the contract: goal, output format, invariants, common-ground rules from both guides.
11
+ - **Adapter header** (here) = the per-engine elaboration: model-specific guidance from that engine's official guide.
12
+
13
+ This is also the load-bearing piece for **multi-LLM evolution**. When Qwen / Gemini / Gemma are added (Mission 2/3), each gets its own adapter file here. The canonical body never moves.
14
+
15
+ ## Format
16
+
17
+ Each adapter is a single markdown file named `<model-id>.md` (e.g. `opus-4-7.md`, `gpt-5-5.md`). Structure:
18
+
19
+ ```markdown
20
+ # <Model name> adapter
21
+
22
+ > Source: <official-prompt-engineering-guide URL>
23
+
24
+ ## Identity
25
+ 1-2 lines telling the model who it is + which guide governs.
26
+
27
+ ## Output discipline
28
+ Verbosity, formatting, length conventions specific to this model.
29
+
30
+ ## Tool-use posture
31
+ When to use tools, when to reason, parallel/sequential preferences.
32
+
33
+ ## Validation pattern
34
+ How this model verifies its work — mechanical-first vs self-check, etc.
35
+
36
+ ## Anti-patterns
37
+ Specific patterns the official guide warns about for this model.
38
+ ```
39
+
40
+ Keep each section to ≤ 8 lines. Adapters are deltas, not full prompts. If an adapter grows past ~80 lines, the content probably belongs in canonical body.
41
+
42
+ ## When to add a new adapter
43
+
44
+ A new adapter file ships when:
45
+ 1. A new LLM is integrated into the pipeline (the engine is now invocable).
46
+ 2. An official prompt-engineering guide for that LLM exists (or a vendor-recommended pattern set).
47
+ 3. An empirical A/B shows the adapter's specific guidance lifts that model's performance over the canonical body alone.
48
+
49
+ Not all models need adapters. If a model performs well on the canonical body without a delta, ship without one.
50
+
51
+ ## What NOT to put here
52
+
53
+ - ❌ Universal rules (those go in the canonical body or `_shared/runtime-principles.md`).
54
+ - ❌ Iter-history annotations (`*(iter-0020: F4 evidence...)*` style).
55
+ - ❌ Full phase prompts (defeats the decoupling).
56
+ - ❌ Per-task or per-spec content (adapters are model-scope, not task-scope).
57
+
58
+ ## Runtime injection
59
+
60
+ A skill's phase invocation prepends the resolved engine's adapter file to the canonical body before sending. The mechanism is left to each skill (a `_shared/adapter-inject.sh` helper may land in a later iter); for now, skills read the adapter file directly at phase-spawn time.
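A minimal sketch of the prepend step, assuming adapter files are named `<model-id>.md` inside a directory the caller passes in. The helper name `build_phase_prompt` and the blank-line separator are hypothetical, not the shipped mechanism. Because not every model needs an adapter, a missing file falls back to the canonical body alone.

```python
from pathlib import Path


def build_phase_prompt(adapter_dir: Path, model_id: str, canonical_body: str) -> str:
    """Prepend the engine's adapter (if one exists) to the canonical phase body.

    Hypothetical helper: adapter files are assumed to be named <model-id>.md.
    """
    adapter_path = adapter_dir / f"{model_id}.md"
    if not adapter_path.is_file():
        # No adapter for this model: ship the canonical body unchanged.
        return canonical_body
    adapter = adapter_path.read_text(encoding="utf-8").rstrip()
    # Adapter header first, canonical body second, separated by a blank line.
    return f"{adapter}\n\n{canonical_body}"
```

Keeping the canonical body as an untouched suffix means the contract text is byte-identical across engines; only the header varies.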
61
+
62
+ ## Standing rule
63
+
64
+ Any iter that touches an adapter file MUST cite the corresponding official guide as part of acceptance: "guide section X.Y says Z, this change applies Z." Generic preferences ("feels cleaner") are rejected.
@@ -0,0 +1,29 @@
1
+ # OpenAI GPT-5.5 adapter
2
+
3
+ > Source: <https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5>
4
+
5
+ ## Identity
6
+
7
+ You are GPT-5.5 by OpenAI. OpenAI's prompt-guidance for this model governs your behavior on top of the canonical phase prompt below. When the canonical body and this header conflict on tactics, the canonical body wins on what to deliver; this header wins on how to deliver it.
8
+
9
+ ## Output discipline
10
+
11
+ Your default style is efficient, direct, and task-oriented. The canonical body specifies the outcome and constraints; you choose the efficient path. Do not over-specify process steps when an outcome is clearly stated. Use headers, bullets, and bold sparingly; favor short paragraphs and natural transitions unless the canonical body or user requests structure. When `text.verbosity` is `low`, prefer even shorter responses.
12
+
13
+ ## Tool-use posture
14
+
15
+ Resolve the request in the fewest useful tool loops without sacrificing correctness. For retrieval tasks: start with one broad search using short discriminative keywords; make another retrieval call only when the top results don't answer the core question or a required fact / parameter / source is missing. For tool-heavy tasks, start with a brief preamble: a one-line acknowledgment of the request and the first step you'll take.
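The retrieval rule reads as a small loop: one broad search, then follow-up calls only for what the results still do not answer. The sketch below assumes a caller-supplied `search(query)` function that returns text snippets and represents required facts as keywords; the `retrieve` name, the keyword matching, and the `max_calls` cap are illustrative assumptions, not part of OpenAI's guide.

```python
from typing import Callable


def retrieve(
    search: Callable[[str], list[str]],
    broad_query: str,
    required: list[str],
    max_calls: int = 3,
) -> tuple[list[str], int]:
    """Return (snippets, calls_made); stop as soon as every required fact is covered."""
    snippets = list(search(broad_query))  # one broad search first (copied, not aliased)
    calls = 1
    missing = [fact for fact in required if not any(fact in s for s in snippets)]
    # Follow-up calls happen only for facts the earlier results did not answer.
    while missing and calls < max_calls:
        snippets.extend(search(missing[0]))
        calls += 1
        missing = [fact for fact in required if not any(fact in s for s in snippets)]
    return snippets, calls
```

When the broad search already covers every required fact, the loop never runs and the call count stays at one.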
16
+
17
+ ## Validation pattern
18
+
19
+ Validation means concrete commands and tools, not self-belief. When the canonical body lists verification commands, execute them and trust their output. Do not substitute your judgment for a deterministic check the harness has provided. When validation tools are available (test runners, lint, type-check, the harness's `spec-verify-check.py`), run them before declaring success. Gather the minimum evidence sufficient to answer correctly, cite it precisely, then stop.
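"Trust their output" in practice means trusting exit codes rather than narration. A minimal sketch, assuming the verification commands arrive as argument lists; the actual commands (a test runner, lint, `spec-verify-check.py`) depend on the project and are not specified here, and both function names are hypothetical.

```python
import subprocess


def run_checks(commands: list[list[str]]) -> dict[str, bool]:
    """Run each verification command and record pass/fail from its exit code."""
    results = {}
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        # A zero exit status is the evidence of success; no self-assessment.
        results[" ".join(argv)] = proc.returncode == 0
    return results


def may_declare_success(results: dict[str, bool]) -> bool:
    # Design choice of this sketch: running zero checks does not count as verified.
    return bool(results) and all(results.values())
```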
20
+
21
+ ## Anti-patterns
22
+
23
+ The official guide warns explicitly about carrying over instructions from older prompt stacks — earlier models needed more help, and process-heavy directives now narrow GPT-5.5's search space.
24
+
25
+ 1. **Avoid absolute imperatives for judgment calls.** ALWAYS / NEVER / must / only are reserved for true safety invariants and required output fields. For judgment calls, use decision rules with conditions ("when X, do Y"). The canonical body uses this style; do not promote softer guidance to absolute rules.
26
+ 2. **Don't over-specify process when the destination is clear.** If the canonical body names the outcome, choose the path; do not narrate every step.
27
+ 3. **Stop rules are explicit.** When the canonical body or the harness asks you to stop / abstain / ask, follow the stop rule rather than looping indefinitely. Loop-minimization does not outrank correctness or required citation.
28
+
29
+ Do not narrate internal deliberation. State results and decisions directly.
@@ -0,0 +1,29 @@
1
+ # Claude Opus 4.7 adapter
2
+
3
+ > Source: <https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices>
4
+
5
+ ## Identity
6
+
7
+ You are Claude Opus 4.7 by Anthropic. Anthropic's prompt-engineering guide for this model governs your behavior on top of the canonical phase prompt below. When the canonical body and this header conflict on tactics, the canonical body wins on what to deliver; this header wins on how to deliver it.
8
+
9
+ ## Output discipline
10
+
11
+ You calibrate response length to task complexity automatically — keep simple lookups short, scale up only when the task warrants it. Do NOT pad with context the user didn't ask for. When the canonical body sets a structural format (XML, JSON, sections), follow it literally; do not silently restructure.
12
+
13
+ ## Tool-use posture
14
+
15
+ You default to fewer tool calls than prior Claude generations. When the canonical body lists tools, use them when their result would change your answer. Make independent tool calls in parallel; chain only when one depends on another's output. Do not narrate "I'll now call X" preambles unless the canonical body requests progress updates.
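The parallel-versus-chain rule can be illustrated with `asyncio`: independent calls run together under `asyncio.gather`, while a call that needs another call's output awaits it first. The `read_file` and `run_tests` coroutines are stand-ins invented for this example, not real harness tools.

```python
import asyncio


async def read_file(path: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real I/O
    return f"contents of {path}"


async def run_tests(target: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a real test run
    return f"tests passed for {target}"


async def main() -> list[str]:
    # Independent calls: neither needs the other's result, so issue them in parallel.
    spec, plan = await asyncio.gather(read_file("spec.md"), read_file("plan.md"))
    # Dependent call: its argument is derived from an earlier result, so chain it.
    report = await run_tests(target=plan.split()[-1])
    return [spec, plan, report]


if __name__ == "__main__":
    print(asyncio.run(main()))
```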
16
+
17
+ ## Validation pattern
18
+
19
+ When the canonical body asks you to verify your output before declaring done ("self-check" instructions), execute that step literally — re-read the spec's acceptance criteria, run the listed verification commands if available, list any gap. This is not optional. Mechanical gates owned by the harness (spec-verify-check.py, build-gate.py) are the primary correctness guard; your self-check is the secondary layer that catches what regex cannot.
20
+
21
+ ## Anti-patterns
22
+
23
+ You interpret instructions more literally than prior Claude versions. The official guide is explicit about three failure modes:
24
+
25
+ 1. **Review-prompt self-filtering**: when the canonical body asks for findings, report every issue you find — including low-severity and low-confidence ones. Do NOT pre-filter for importance; the harness has a separate filter step.
26
+ 2. **Subagent over-spawning**: do NOT spawn a subagent for work you can complete in a single response. Spawn only when the canonical body explicitly requests it OR when fanning out across independent items.
27
+ 3. **Overengineering**: do NOT add files, abstractions, error handling, validation, or "future flexibility" beyond what the spec asks. A bug fix doesn't need surrounding cleanup. The right complexity is the minimum needed for the current task.
28
+
29
+ You do NOT need stronger imperatives ("CRITICAL!", "YOU MUST!") to follow rules. Normal phrasing is sufficient.
@@ -26,6 +26,32 @@ PER_RUN_PATTERNS = (
26
26
  "*.log.md",
27
27
  "fix-batch.round-*.json",
28
28
  "criteria.generated.md",
29
+ # iter-0019.8: spec-verify carrier artifacts get archived alongside
30
+ # other per-run state. Cleanup of orphans from a killed run is enforced separately
31
+ # by spec-verify-check.py main() — when source markdown has no json
32
+ # block AND BENCH_WORKDIR is unset (real-user mode), the script drops
33
+ # any pre-existing .devlyn/spec-verify.json so a stale orphan from a
34
+ # killed prior run cannot poison this run's gate.
35
+ "spec-verify.json",
36
+ "spec-verify.results.json",
37
+ "spec-verify-findings.jsonl",
38
+ # iter-0033a/2026-04-30 archive-fix iter: NEW /devlyn:resolve emits
39
+ # plan.md (PLAN output) + final-report.md (PHASE 6 render) +
40
+ # cumulative.patch (cumulative diff). Smoke 2's archive listing
41
+ # captured all three; archive_run.py was missing them because the
42
+ # patterns predated the new skill's artifact set. Added explicitly
43
+ # so the move is deterministic.
44
+ "plan.md",
45
+ "final-report.md",
46
+ "cumulative.patch",
47
+ # iter-0033c (Codex R-final-smoke Q2): pair-mode VERIFY emits per-judge
48
+ # deliberation transcripts (verify-judge-claude.md / verify-judge-codex.md
49
+ # — and any future-engine analogue via wildcard). Smoke 1a (F2 l2_forced)
50
+ # surfaced the gap: the orchestrator wrote them and listed them as
51
+ # artifacts, but archive_run.py left them in .devlyn/. Gate 8
52
+ # ("pair_judge findings archive distinguishable") would false-fail on
53
+ # every paired fixture without this glob.
54
+ "verify-judge-*.md",
29
55
  )
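To see which `.devlyn/` files the new globs pick up, the sketch below matches file names against the tuple entries visible in this hunk using `fnmatchcase`. Earlier entries of `PER_RUN_PATTERNS` are outside the hunk, and how `archive_run.py` actually applies the patterns is not shown, so treat this as an approximation; `files_to_archive` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Entries of PER_RUN_PATTERNS visible in the hunk above (the full tuple has more).
PER_RUN_PATTERNS = (
    "*.log.md",
    "fix-batch.round-*.json",
    "criteria.generated.md",
    "spec-verify.json",
    "spec-verify.results.json",
    "spec-verify-findings.jsonl",
    "plan.md",
    "final-report.md",
    "cumulative.patch",
    "verify-judge-*.md",
)


def files_to_archive(names: list[str]) -> list[str]:
    """Return the names that match at least one per-run pattern (case-sensitive)."""
    return [n for n in names if any(fnmatchcase(n, p) for p in PER_RUN_PATTERNS)]
```

The single `verify-judge-*.md` glob covers both current judges and any future-engine analogue without another edit.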
30
56
 
31
57
 
@@ -0,0 +1,54 @@
1
+ # Shared — Codex Invocation
2
+
3
+ Single source of truth for how every skill calls Codex. **MCP is not used.** Skills shell out via the wrapper at `_shared/codex-monitored.sh`, which fronts the local Codex CLI (shipped by the `openai-codex` Claude Code plugin).
4
+
5
+ ## Canonical invocations
6
+
7
+ All long-running Codex calls go through `codex-monitored.sh` — a thin wrapper that closes stdin (codex 0.124.0 hangs when both stdin is open and a prompt arg is given), streams Codex stdout fully (no `tail -n` truncation), and prints a `[codex-monitored] heartbeat` line every 30s so the outer `claude -p` byte-watchdog stays fed during long reasoning gaps. The wrapper passes its arguments through verbatim to the underlying CLI, so the canonical flag set is unchanged from a raw call — only the launcher differs.
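The wrapper's three behaviors (stdin closed, stdout streamed in full, periodic heartbeat) can be approximated in Python. This illustrates the technique only; the shipped `codex-monitored.sh` is a shell script whose internals are not shown in this diff, and the `monitored_run` name and configurable interval are assumptions. The heartbeat goes to stderr, matching the wrapper's documented contract that stdout carries only Codex output.

```python
import subprocess
import sys
import threading


def monitored_run(argv: list[str], heartbeat_secs: float = 30.0) -> int:
    """Run argv with stdin closed, stream its stdout, and print heartbeats to stderr."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,  # never leave stdin open for the child
        stdout=subprocess.PIPE,
        text=True,
    )
    done = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once the child has exited.
        while not done.wait(heartbeat_secs):
            print("[codex-monitored] heartbeat", file=sys.stderr, flush=True)

    ticker = threading.Thread(target=heartbeat, daemon=True)
    ticker.start()
    assert proc.stdout is not None
    for line in proc.stdout:  # stream every line; no tail-style truncation
        sys.stdout.write(line)
        sys.stdout.flush()
    rc = proc.wait()
    done.set()
    ticker.join()
    return rc
```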
8
+
9
+ **Read-only critique / adversarial review / debate** (ideate CHALLENGE phase, `/devlyn:resolve` VERIFY pair-mode when triggered). Security review is delegated to the native `security-review` Claude Code skill, invoked from `/devlyn:resolve` BUILD_GATE rather than from Codex.
10
+
11
+ ```bash
12
+ bash .claude/skills/_shared/codex-monitored.sh \
13
+ -C <project-root> \
14
+ -s read-only \
15
+ -c model_reasoning_effort=xhigh \
16
+ "<inlined-prompt>"
17
+ ```
18
+
19
+ **Workspace-write implementation** (`/devlyn:resolve` IMPLEMENT phase when `--engine codex` or `--engine auto` routes to Codex, plus codex-routed `/devlyn:ideate` phases):
20
+
21
+ ```bash
22
+ bash .claude/skills/_shared/codex-monitored.sh \
23
+ -C <project-root> \
24
+ --full-auto \
25
+ -c model_reasoning_effort=xhigh \
26
+ "<inlined-prompt>"
27
+ ```
28
+
29
+ Notes:
30
+ - `-C` — project root so Codex's working directory matches.
31
+ - `-s read-only` / `--full-auto` — sandbox policy. `--full-auto` = `-s workspace-write` with auto-approval of sandboxed commands.
32
+ - `-c model_reasoning_effort=xhigh` — config override for reasoning depth. Required for deep critique; skills may choose `high` or `medium` when thoroughness doesn't warrant xhigh.
33
+ - **Omit `-m <model>`** — Codex CLI uses its configured flagship (currently `gpt-5.5`, and automatically whatever ships next). This is the zero-touch mechanism. Only pass `-m` when a role explicitly needs a different model (e.g., `gpt-5.3-codex` for SWE-bench-heavy coding tasks, `gpt-5.3-codex-spark` for speed).
34
+ - Raw `codex exec ...` invocations are **forbidden** in skill prompts. The benchmark variant arm runs a PATH shim (`scripts/codex-shim/codex`) that transparently re-routes any raw `codex exec` to the wrapper as a safety net, but skills should always emit the wrapper form directly so the orchestrator's first attempt has the right shape. Two prior iterations (iter-0006 universal foreground ban, iter-0008 prompt-level kill-shape contract) failed because the orchestrator picked starvation-prone shapes (`codex exec ... 2>&1 | tail -200`) from its own pattern prior; the wrapper plus the shim is the runtime binding layer those iters lacked. See `autoresearch/iterations/0009-wrapper-and-hook.md`.
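These notes amount to a few rules for assembling the wrapper command line. A sketch, assuming a hypothetical `codex_argv` helper and a `mode` string of `"read-only"` or `"write"`; the flags are the ones documented above, and `-m` is appended only when a caller explicitly names a non-default model.

```python
from typing import Optional

WRAPPER = ".claude/skills/_shared/codex-monitored.sh"


def codex_argv(
    project_root: str,
    prompt: str,
    mode: str = "read-only",
    effort: str = "xhigh",
    model: Optional[str] = None,
) -> list[str]:
    """Build the wrapper invocation; never emits a raw `codex exec` form."""
    if mode not in ("read-only", "write"):
        raise ValueError(f"unknown mode: {mode}")
    argv = ["bash", WRAPPER, "-C", project_root]
    # Sandbox policy: read-only for critique, --full-auto for implementation.
    argv += ["-s", "read-only"] if mode == "read-only" else ["--full-auto"]
    argv += ["-c", f"model_reasoning_effort={effort}"]
    if model is not None:
        # Only when a role needs a different model; otherwise inherit the CLI flagship.
        argv += ["-m", model]
    argv.append(prompt)
    return argv
```

Returning an argument list (rather than a shell string) avoids quoting problems when the inlined prompt contains quotes or newlines.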
35
+
36
+ ## Availability check
37
+
38
+ Before the first Codex call in a run, verify the CLI is on PATH:
39
+
40
+ ```bash
41
+ command -v codex >/dev/null 2>&1
42
+ ```
43
+
44
+ If the check fails, the skill follows the `_shared/engine-preflight.md` downgrade rule — silently switch to Claude for this run and log `engine downgraded: codex-unavailable` in the final report. Never prompt, never abort.
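In Python, the equivalent PATH probe is `shutil.which`. The sketch below pairs it with the downgrade rule. The function name, the `log` list, and the optional `search_path` argument (used to make the check testable) are illustrative assumptions, since `_shared/engine-preflight.md` is not shown here.

```python
import shutil
from typing import Optional


def resolve_engine(requested: str, log: list[str], search_path: Optional[str] = None) -> str:
    """Return the engine to use, downgrading Codex to Claude when the CLI is missing."""
    # search_path=None means "use $PATH", matching `command -v codex`.
    if requested == "codex" and shutil.which("codex", path=search_path) is None:
        # Never prompt, never abort: switch silently and record why.
        log.append("engine downgraded: codex-unavailable")
        return "claude"
    return requested
```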
45
+
46
+ ## Why CLI over other paths
47
+
48
+ The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check covers cleanly.
49
+
50
+ ## Invocation from inside a skill prompt
51
+
52
+ Skills write the invocation as a Bash command the runtime executes. Example shape from `/devlyn:resolve` PHASE 2 IMPLEMENT when routed to Codex:
53
+
54
+ > Run `bash .claude/skills/_shared/codex-monitored.sh -C <state.base_ref.repo_root> --full-auto -c model_reasoning_effort=xhigh "<IMPLEMENT prompt>"`. Omit `-m` so the CLI flagship is auto-selected. Capture stdout as the IMPLEMENT reply; non-zero exit → treat as subagent failure. The wrapper emits `[codex-monitored]` heartbeat and lifecycle lines on **stderr** — stdout stays clean for Codex output, so the orchestrator can parse the reply without filtering. Heartbeat-on-stderr keeps the orchestrator's combined-output stream non-silent (defeats the iter-0008 byte-watchdog kill) without polluting the codex-reply view of stdout.
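From the orchestrator's side, the contract in that example (stdout is the reply, non-zero exit is a failure, lifecycle lines arrive on stderr) maps onto `subprocess.run` with the two streams captured separately. A sketch with a hypothetical `SubagentFailure` exception and a generic `argv`, so any command can stand in for the wrapper:

```python
import subprocess


class SubagentFailure(RuntimeError):
    """Raised when the wrapped Codex call exits non-zero (hypothetical name)."""


def run_implement(argv: list[str]) -> str:
    proc = subprocess.run(argv, stdin=subprocess.DEVNULL, capture_output=True, text=True)
    if proc.returncode != 0:
        # Keep the tail of stderr for the failure report.
        raise SubagentFailure(f"exit {proc.returncode}: {proc.stderr.strip()[-500:]}")
    # stdout carries only Codex output, so it needs no heartbeat filtering.
    return proc.stdout
```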