devlyn-cli 1.15.0 → 2.1.0
- package/AGENTS.md +104 -0
- package/CLAUDE.md +135 -21
- package/README.md +43 -125
- package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
- package/benchmark/auto-resolve/README.md +114 -0
- package/benchmark/auto-resolve/RUBRIC.md +162 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
- package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
- package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
- package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
- package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
- package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
- package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
- package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
- package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
- package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
- package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
- package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
- package/benchmark/auto-resolve/scripts/judge.sh +359 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
- package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
- package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
- package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
- package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
- package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
- package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
- package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
- package/bin/devlyn.js +175 -17
- package/config/skills/_shared/adapters/README.md +64 -0
- package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
- package/config/skills/_shared/adapters/opus-4-7.md +29 -0
- package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
- package/config/skills/_shared/codex-config.md +54 -0
- package/config/skills/_shared/codex-monitored.sh +141 -0
- package/config/skills/_shared/engine-preflight.md +35 -0
- package/config/skills/_shared/expected.schema.json +93 -0
- package/config/skills/_shared/pair-plan-schema.md +298 -0
- package/config/skills/_shared/runtime-principles.md +110 -0
- package/config/skills/_shared/spec-verify-check.py +519 -0
- package/config/skills/devlyn:ideate/SKILL.md +99 -429
- package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
- package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
- package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
- package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
- package/config/skills/devlyn:resolve/SKILL.md +172 -184
- package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
- package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
- package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
- package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
- package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
- package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
- package/package.json +12 -2
- package/scripts/lint-skills.sh +431 -0
- package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
- package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
- package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
- package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
- package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
- package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
- package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
- package/config/skills/devlyn:clean/SKILL.md +0 -285
- package/config/skills/devlyn:design-ui/SKILL.md +0 -351
- package/config/skills/devlyn:discover-product/SKILL.md +0 -124
- package/config/skills/devlyn:evaluate/SKILL.md +0 -564
- package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
- package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
- package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
- package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
- package/config/skills/devlyn:preflight/SKILL.md +0 -355
- package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
- package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
- package/config/skills/devlyn:product-spec/SKILL.md +0 -603
- package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
- package/config/skills/devlyn:review/SKILL.md +0 -161
- package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
- package/config/skills/devlyn:team-review/SKILL.md +0 -493
- package/config/skills/devlyn:update-docs/SKILL.md +0 -463
- package/config/skills/workflow-routing/SKILL.md +0 -73
- package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
- package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
package/bin/devlyn.js
CHANGED
@@ -17,7 +17,12 @@ const CLI_TARGETS = {
   codex: {
     name: 'Codex CLI (OpenAI)',
     instructionsFile: 'AGENTS.md',
+    baseInstructionsFile: 'AGENTS.md',
     configDir: null, // Codex uses AGENTS.md at project root
+    // Codex auto-loads skills from ~/.codex/skills/ (user-global). Same
+    // SKILL.md format as Claude Code; descriptions must stay ≤1024 chars.
+    skillsDir: path.join(os.homedir(), '.codex', 'skills'),
+    skillsToInstall: ['devlyn:resolve', 'devlyn:ideate', '_shared'],
     detect: () => fs.existsSync(path.join(process.cwd(), 'AGENTS.md')) || fs.existsSync(path.join(process.cwd(), '.codex')),
   },
   gemini: {
@@ -68,8 +73,15 @@ const DEPRECATED_FILES = [
   'commands/devlyn.pencil-push.md', // migrated to skills/devlyn:pencil-push
 ];
 
-// Skill directories renamed from devlyn-* to devlyn:* in v0.7.x
+// Skill directories renamed from devlyn-* to devlyn:* in v0.7.x, plus
+// iter-0034 Phase 4 cutover (2026-05-03): 15 user skills deleted and 3 moved
+// to optional-skills/. Listed here so post-cutover `npx devlyn-cli` upgrades
+// force-remove stale legacy skill dirs from downstream `~/.claude/skills/`
+// even though the source dirs no longer exist (cleanManagedSkillDirs only
+// removes target dirs that still exist in source — without this list,
+// deleted-from-source skills persist in user installs forever).
 const DEPRECATED_DIRS = [
+  // v0.7.x rename: devlyn-* → devlyn:*
   'skills/devlyn-clean',
   'skills/devlyn-design-system',
   'skills/devlyn-design-ui',
@@ -87,6 +99,28 @@ const DEPRECATED_DIRS = [
   'skills/devlyn-update-docs',
   'skills/devlyn-pencil-pull',
   'skills/devlyn-pencil-push',
+  // iter-0034 Phase 4 cutover: deleted user skills
+  'skills/devlyn:auto-resolve',
+  'skills/devlyn:browser-validate',
+  'skills/devlyn:clean',
+  'skills/devlyn:design-ui',
+  'skills/devlyn:discover-product',
+  'skills/devlyn:evaluate',
+  'skills/devlyn:feature-spec',
+  'skills/devlyn:implement-ui',
+  'skills/devlyn:preflight',
+  'skills/devlyn:product-spec',
+  'skills/devlyn:recommend-features',
+  'skills/devlyn:review',
+  'skills/devlyn:team-resolve',
+  'skills/devlyn:team-review',
+  'skills/devlyn:update-docs',
+  // iter-0034 Phase 4 cutover: moved to optional-skills/. Force-removed on
+  // upgrade so users only have them if they opt in via the interactive
+  // installer (matches the pencil-pull / pencil-push pattern).
+  'skills/devlyn:reap',
+  'skills/devlyn:design-system',
+  'skills/devlyn:team-design-ui',
 ];
 
 function getTargetDir() {
@@ -148,6 +182,9 @@ const OPTIONAL_ADDONS = [
   { name: 'dokkit', desc: 'Document template filling for DOCX/HWPX — ingest, fill, review, export', type: 'local' },
   { name: 'devlyn:pencil-pull', desc: 'Pull Pencil designs into code with exact visual fidelity', type: 'local' },
   { name: 'devlyn:pencil-push', desc: 'Push codebase UI to Pencil canvas for design sync', type: 'local' },
+  { name: 'devlyn:reap', desc: 'Safely reap orphaned MCP / codex / Superset child processes left behind by long Claude sessions', type: 'local' },
+  { name: 'devlyn:design-system', desc: 'Extract design tokens from a chosen UI style for exact reproduction (creative power-user)', type: 'local' },
+  { name: 'devlyn:team-design-ui', desc: '5 distinct UI style explorations from a full design team (creative power-user)', type: 'local' },
   // External skill packs (installed via npx skills add)
   { name: 'vercel-labs/agent-skills', desc: 'React, Next.js, React Native best practices', type: 'external' },
   { name: 'supabase/agent-skills', desc: 'Supabase integration patterns', type: 'external' },
@@ -155,8 +192,10 @@ const OPTIONAL_ADDONS = [
   { name: 'anthropics/skills', desc: 'Official Anthropic skill-creator with eval framework and description optimizer', type: 'external' },
   { name: 'Leonxlnx/taste-skill', desc: 'Premium frontend design skills — modern layouts, animations, and visual refinement', type: 'external' },
   // MCP servers (installed via claude mcp add)
-
-
+  // Note: the Codex integration uses the local `codex` CLI binary (not MCP).
+  // Install the CLI separately per https://platform.openai.com/docs/codex — the
+  // harness auto-detects availability and downgrades to Claude-only on failure.
+  { name: 'playwright', desc: 'Playwright MCP for browser testing — powers /devlyn:resolve BUILD_GATE browser tier', type: 'mcp', command: 'npx -y @anthropic-ai/mcp-playwright' },
 ];
 
 function log(msg, color = 'reset') {
@@ -262,7 +301,7 @@ function cleanupDeprecated(targetDir) {
     const fullPath = path.join(targetDir, relPath);
     if (fs.existsSync(fullPath)) {
       fs.rmSync(fullPath, { recursive: true });
-      log(` ✕ ${relPath}/ (
+      log(` ✕ ${relPath}/ (removed)`, 'dim');
       removed++;
     }
   }
@@ -273,6 +312,8 @@ function copyRecursive(src, dest, baseDir) {
   const stats = fs.statSync(src);
 
   if (stats.isDirectory()) {
+    // Never install dev workspaces, even when running from source repo.
+    if (UNSHIPPED_SKILL_DIRS.has(path.basename(src))) return;
     if (!fs.existsSync(dest)) {
       fs.mkdirSync(dest, { recursive: true });
     }
@@ -290,6 +331,37 @@ function copyRecursive(src, dest, baseDir) {
   }
 }
 
+// Dev artifacts that live under config/skills/ but must never ship or install.
+// Mirrors the `!` exclusions in package.json files[].
+const UNSHIPPED_SKILL_DIRS = new Set([
+  'devlyn:auto-resolve-workspace',
+  'devlyn:ideate-workspace',
+  'preflight-workspace',
+  'roadmap-archival-workspace',
+]);
+
+// Clean managed skill directories before copy to prevent stale-file drift.
+// copyRecursive is a pure overlay: if a file was removed or renamed in source,
+// the installed mirror keeps the old copy. For each top-level dir under
+// config/skills/, remove its counterpart in target/skills/ before the copy so
+// each managed skill is fully replaced on every sync. User-installed skills
+// (e.g. skill-creator from optional addons) are left alone because they have
+// no counterpart in source. Dev workspaces are skipped entirely.
+function cleanManagedSkillDirs(sourceSkillsDir, targetSkillsDir) {
+  if (!fs.existsSync(sourceSkillsDir) || !fs.existsSync(targetSkillsDir)) return 0;
+  let cleaned = 0;
+  for (const entry of fs.readdirSync(sourceSkillsDir, { withFileTypes: true })) {
+    if (!entry.isDirectory()) continue;
+    if (UNSHIPPED_SKILL_DIRS.has(entry.name)) continue;
+    const targetPath = path.join(targetSkillsDir, entry.name);
+    if (fs.existsSync(targetPath)) {
+      fs.rmSync(targetPath, { recursive: true, force: true });
+      cleaned++;
+    }
+  }
+  return cleaned;
+}
+
 function multiSelect(items) {
   return new Promise((resolve) => {
     const selected = new Set();
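The clean-before-copy behaviour described in the comments of this hunk can be illustrated with a minimal sketch. It runs entirely in a scratch directory, and the helper names (`write`, `overlayCopy`, `cleanManaged`) and skill names are hypothetical stand-ins rather than the package's own code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch layout only; nothing here touches a real ~/.claude install.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'devlyn-sketch-'));
const source = path.join(root, 'source-skills');
const target = path.join(root, 'target-skills');

// Hypothetical helper: write a file, creating parent directories as needed.
function write(file, text) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, text);
}

// Hypothetical stand-in for an overlay copy: adds and overwrites, never deletes.
function overlayCopy(src, dest) {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) overlayCopy(from, to);
    else fs.copyFileSync(from, to);
  }
}

// Same idea as the hunk above: remove only target dirs that have a source counterpart.
function cleanManaged(sourceDir, targetDir) {
  let cleaned = 0;
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const targetPath = path.join(targetDir, entry.name);
    if (fs.existsSync(targetPath)) {
      fs.rmSync(targetPath, { recursive: true, force: true });
      cleaned++;
    }
  }
  return cleaned;
}

// Source ships one managed skill with a single file.
write(path.join(source, 'demo-skill', 'SKILL.md'), 'new');
// Target holds an older install of it (plus a file since removed upstream),
// a skill that no longer exists in source, and a user-installed skill.
write(path.join(target, 'demo-skill', 'SKILL.md'), 'old');
write(path.join(target, 'demo-skill', 'stale.md'), 'stale');
write(path.join(target, 'deleted-skill', 'SKILL.md'), 'legacy');
write(path.join(target, 'user-skill', 'SKILL.md'), 'mine');

const cleaned = cleanManaged(source, target);
overlayCopy(source, target);

const refreshedContent = fs.readFileSync(path.join(target, 'demo-skill', 'SKILL.md'), 'utf8');
const staleGone = !fs.existsSync(path.join(target, 'demo-skill', 'stale.md'));
const deletedSkillPersists = fs.existsSync(path.join(target, 'deleted-skill'));
const userSkillKept = fs.existsSync(path.join(target, 'user-skill', 'SKILL.md'));
console.log({ cleaned, refreshedContent, staleGone, deletedSkillPersists, userSkillKept });

fs.rmSync(root, { recursive: true, force: true });
```

In this sketch the stale file disappears and the user-installed skill survives, but `deleted-skill` is still present afterwards. That is the gap the diff closes by listing removed skills explicitly in `DEPRECATED_DIRS`.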
@@ -310,8 +382,8 @@ function multiSelect(items) {
       const checkbox = selected.has(i) ? `${COLORS.green}◉${COLORS.reset}` : `${COLORS.dim}○${COLORS.reset}`;
       const pointer = i === cursor ? `${COLORS.cyan}❯${COLORS.reset}` : ' ';
       const name = i === cursor ? `${COLORS.cyan}${item.name}${COLORS.reset}` : item.name;
-      const tagLabel = item.type === 'mcp' ? 'mcp' : item.type === 'local' ? 'skill' : 'pack';
-      const tagColor = item.type === 'mcp' ? COLORS.green : item.type === 'local' ? COLORS.magenta : COLORS.cyan;
+      const tagLabel = item.type === 'mcp' ? 'mcp' : item.type === 'local' ? 'skill' : item.type === 'cli' ? 'cli' : 'pack';
+      const tagColor = item.type === 'mcp' ? COLORS.green : item.type === 'local' ? COLORS.magenta : item.type === 'cli' ? COLORS.blue : COLORS.cyan;
       const tag = `${tagColor}${tagLabel}${COLORS.reset}`;
       console.log(`${pointer} ${checkbox} ${name} ${COLORS.dim}[${tag}${COLORS.dim}]${COLORS.reset}`);
       console.log(` ${COLORS.dim}${item.desc}${COLORS.reset}`);
@@ -441,6 +513,37 @@ function detectOtherCLIs() {
   return detected;
 }
 
+// Install /devlyn:resolve + /devlyn:ideate + _shared skills into a CLI's
+// global skills directory (e.g. ~/.codex/skills/). Returns count of skills
+// copied. Skipped silently for CLIs without a skillsDir (e.g. cursor, copilot
+// at the time of writing — they don't have an analogous skill-loader).
+function installSkillsForCLI(cliKey) {
+  const cli = CLI_TARGETS[cliKey];
+  if (!cli || !cli.skillsDir || !cli.skillsToInstall) return 0;
+
+  const sourceSkillsDir = path.join(CONFIG_SOURCE, 'skills');
+  if (!fs.existsSync(sourceSkillsDir)) return 0;
+  if (!fs.existsSync(cli.skillsDir)) {
+    fs.mkdirSync(cli.skillsDir, { recursive: true });
+  }
+
+  let copied = 0;
+  for (const skillName of cli.skillsToInstall) {
+    const src = path.join(sourceSkillsDir, skillName);
+    const dest = path.join(cli.skillsDir, skillName);
+    if (!fs.existsSync(src)) continue;
+    // Full replace per cleanManagedSkillDirs semantics: stale files in the
+    // installed mirror would otherwise persist forever.
+    if (fs.existsSync(dest)) {
+      fs.rmSync(dest, { recursive: true, force: true });
+    }
+    copyRecursive(src, dest, cli.skillsDir);
+    copied++;
+    log(` → ${cli.skillsDir.replace(os.homedir(), '~')}/${skillName}`, 'dim');
+  }
+  return copied;
+}
+
 function installAgentsForCLI(cliKey) {
   const cli = CLI_TARGETS[cliKey];
   if (!cli) return false;
@@ -482,12 +585,25 @@ function installAgentsForCLI(cliKey) {
         const sepIdx = existing.lastIndexOf('---', markerIdx);
         existing = existing.slice(0, sepIdx > 0 ? sepIdx : markerIdx).trimEnd();
       }
+    } else if (cli.baseInstructionsFile) {
+      const baseInstructionsSrc = path.join(__dirname, '..', cli.baseInstructionsFile);
+      if (fs.existsSync(baseInstructionsSrc)) {
+        existing = fs.readFileSync(baseInstructionsSrc, 'utf8').trimEnd();
+      }
     }
 
     fs.writeFileSync(destFile, existing + separator + agentContent + '\n');
     log(` → ${cli.instructionsFile} (agent instructions appended)`, 'dim');
   }
 
+  // If this CLI also supports a global skill-loader (currently Codex), install
+  // /devlyn:resolve + /devlyn:ideate + _shared so the same slash commands work
+  // there. Skipped for CLIs without a skillsDir entry.
+  const skillsCopied = installSkillsForCLI(cliKey);
+  if (skillsCopied > 0) {
+    log(` → ${skillsCopied} skill${skillsCopied > 1 ? 's' : ''} installed (devlyn:resolve / devlyn:ideate / _shared)`, 'dim');
+  }
+
   return true;
 }
 
@@ -514,6 +630,13 @@ async function init(skipPrompts = false) {
   // Install core config
   const targetDir = getTargetDir();
   log('\n📁 Installing core config to .claude/', 'green');
+  const refreshed = cleanManagedSkillDirs(
+    path.join(CONFIG_SOURCE, 'skills'),
+    path.join(targetDir, 'skills'),
+  );
+  if (refreshed > 0) {
+    log(` 🔄 Refreshing ${refreshed} managed skill director${refreshed === 1 ? 'y' : 'ies'}`, 'dim');
+  }
   copyRecursive(CONFIG_SOURCE, targetDir, targetDir);
 
   // Remove deprecated files from previous versions
@@ -522,7 +645,8 @@ async function init(skipPrompts = false) {
     log(`\n🧹 Cleaned up ${removed} deprecated file${removed > 1 ? 's' : ''}`, 'yellow');
   }
 
-  // Copy
+  // Copy Claude project instructions to project root. Other CLI instruction
+  // files are installed only when explicitly selected below or via `agents`.
   const claudeMdSrc = path.join(__dirname, '..', 'CLAUDE.md');
   const claudeMdDest = path.join(process.cwd(), 'CLAUDE.md');
   if (fs.existsSync(claudeMdSrc)) {
@@ -609,26 +733,42 @@ async function init(skipPrompts = false) {
     log(' → ~/.claude/settings.json (disabled adaptive thinking, enabled 1h prompt caching)', 'dim');
   }
 
-  // Install agents for other detected CLIs
-  const detected = detectOtherCLIs();
-  if (detected.length > 0) {
-    log(`\n🔍 Detected other AI CLIs: ${detected.map((k) => CLI_TARGETS[k].name).join(', ')}`, 'blue');
-    const agentsInstalled = installAgentsForAllDetected();
-    if (agentsInstalled > 0) {
-      log(` ✅ Agent instructions installed for ${agentsInstalled} CLI${agentsInstalled > 1 ? 's' : ''}`, 'green');
-    }
-  }
-
   log('\n✅ Core config installed!', 'green');
 
   // Skip prompts if -y flag or non-interactive
   if (skipPrompts || !process.stdin.isTTY) {
     log('\n💡 Add optional addons later: run `npx devlyn-cli` without -y', 'dim');
+    log(' Add Codex instructions + skills later: run `npx devlyn-cli agents codex`', 'dim');
     log(`\n${COLORS.dim} Enjoying devlyn? Star it on GitHub — it helps others find it:${COLORS.reset}`);
     log(` ${COLORS.purple}→ https://github.com/fysoul17/devlyn-cli${COLORS.reset}\n`);
     return;
   }
 
+  // Ask which non-Claude CLIs should receive instruction files.
+  log('\n🤖 Optional AI CLI instructions:\n', 'blue');
+  const cliOptions = Object.entries(CLI_TARGETS).map(([key, cli]) => {
+    let desc;
+    if (cli.configDir) {
+      desc = `Install agents into ${cli.configDir}/`;
+    } else if (cli.skillsDir) {
+      desc = `Install ${cli.instructionsFile} + /devlyn:resolve + /devlyn:ideate skills (~/.codex/skills/)`;
+    } else {
+      desc = `Install ${cli.instructionsFile}`;
+    }
+    return { key, name: cli.name, desc, type: 'cli' };
+  });
+  const selectedClis = await multiSelect(cliOptions);
+  if (selectedClis.length > 0) {
+    let agentsInstalled = 0;
+    for (const selectedCli of selectedClis) {
+      if (installAgentsForCLI(selectedCli.key)) agentsInstalled++;
+    }
+    log(` ✅ Agent instructions installed for ${agentsInstalled} CLI${agentsInstalled !== 1 ? 's' : ''}`, 'green');
+  } else {
+    log('💡 No additional CLI instructions selected', 'dim');
+    log(' Run `npx devlyn-cli agents codex` later to install Codex AGENTS.md + /devlyn skills', 'dim');
+  }
+
   // Ask about optional addons (local skills + external packs)
   log('\n📚 Optional skills & packs:\n', 'blue');
 
@@ -657,6 +797,9 @@ function showHelp() {
   log(' npx devlyn-cli -y Install without prompts');
   log(' npx devlyn-cli agents Install agents for detected CLIs');
   log(' npx devlyn-cli agents all Install agents for all supported CLIs');
+  log(' npx devlyn-cli benchmark Run the full A/B benchmark suite vs bare');
+  log(' npx devlyn-cli benchmark --n 3 --bless Ship-decision run + promote baseline if pass');
+  log(' npx devlyn-cli benchmark --dry-run Validate suite setup without model invocation');
   log(' npx devlyn-cli --help Show this help\n');
   log('Optional skills (select during install):', 'green');
   OPTIONAL_ADDONS.filter((a) => a.type === 'local').forEach((skill) => {
@@ -694,6 +837,21 @@ switch (command) {
   case 'ls':
     listContents();
     break;
+  case 'benchmark':
+  case 'bench': {
+    // Delegate to benchmark/auto-resolve/scripts/run-suite.sh with all remaining args.
+    const runSuite = path.join(__dirname, '..', 'benchmark', 'auto-resolve', 'scripts', 'run-suite.sh');
+    if (!fs.existsSync(runSuite)) {
+      log('❌ Benchmark suite runner missing — is this a clean devlyn-cli checkout?', 'yellow');
+      log(` Expected: ${runSuite}`, 'dim');
+      process.exit(1);
+    }
+    const { spawnSync } = require('child_process');
+    const forwardedArgs = args.slice(1);
+    const res = spawnSync('bash', [runSuite, ...forwardedArgs], { stdio: 'inherit' });
+    process.exit(res.status ?? 1);
+    break;
+  }
   case 'agents': {
     showLogo();
     log('─'.repeat(44), 'dim');
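The `benchmark` case above delegates to a child process and exits with its status. The following is a minimal sketch of that exit-code forwarding, assuming a hypothetical `delegate` helper and using the current Node binary (`process.execPath`) as a stand-in for `bash run-suite.sh` so the example runs anywhere.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a child with inherited stdio and return the exit
// code the parent should use. `status` is null when the child could not be
// spawned or was terminated by a signal, so fall back to 1 in that case.
function delegate(command, args) {
  const res = spawnSync(command, args, { stdio: 'inherit' });
  return res.status ?? 1;
}

// Stand-in children: each one simply exits with a chosen code.
const okCode = delegate(process.execPath, ['-e', 'process.exit(0)']);
const failCode = delegate(process.execPath, ['-e', 'process.exit(3)']);
// A binary that does not exist (hypothetical name) yields a null status.
const missingCode = delegate('devlyn-sketch-no-such-binary', []);

console.log({ okCode, failCode, missingCode });
```

The `?? 1` fallback matters because a spawn failure would otherwise pass `null` to `process.exit`, which would not signal an error to the calling shell.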
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# Per-engine prompt adapters
|
|
2
|
+
|
|
3
|
+
This folder is the LLM-specific delta layer. The harness's canonical phase prompts (in each skill's `references/phases/<phase>.md`) stay model-neutral and outcome-first. Each adapter file in this folder is a **small delta header** that gets injected BEFORE the canonical body when the phase runs against that specific engine.
|
|
4
|
+
|
|
5
|
+
## Why adapters exist
|
|
6
|
+
|
|
7
|
+
Anthropic and OpenAI publish official prompt-engineering guides for their flagship models. The two guides converge on outcome-first + decision rules + mechanical validation but **diverge on tactics** (XML structure vs stop-rules format, literal interpretation vs decision-rule phrasing, self-check pattern vs validation-tool primacy). A single canonical prompt can't hit both ceilings.
|
|
8
|
+
|
|
9
|
+
The split:
|
|
10
|
+
- **Canonical body** (in `<skill>/references/phases/`) = the contract: goal, output format, invariants, common-ground rules from both guides.
|
|
11
|
+
- **Adapter header** (here) = the per-engine elaboration: model-specific guidance from that engine's official guide.
|
|
12
|
+
|
|
13
|
+
This is also the load-bearing piece for **multi-LLM evolution**. When Qwen / Gemini / Gemma are added (Mission 2/3), each gets its own adapter file here. The canonical body never moves.
|
|
14
|
+
|
|
15
|
+
## Format
|
|
16
|
+
|
|
17
|
+
Each adapter is a single markdown file named `<model-id>.md` (e.g. `opus-4-7.md`, `gpt-5-5.md`). Structure:
|
|
18
|
+
|
|
19
|
+
```markdown
|
|
20
|
+
# <Model name> adapter
|
|
21
|
+
|
|
22
|
+
> Source: <official-prompt-engineering-guide URL>
|
|
23
|
+
|
|
24
|
+
## Identity
|
|
25
|
+
1-2 lines telling the model who it is + which guide governs.
|
|
26
|
+
|
|
27
|
+
## Output discipline
|
|
28
|
+
Verbosity, formatting, length conventions specific to this model.
|
|
29
|
+
|
|
30
|
+
## Tool-use posture
|
|
31
|
+
When to use tools, when to reason, parallel/sequential preferences.
|
|
32
|
+
|
|
33
|
+
## Validation pattern
|
|
34
|
+
How this model verifies its work — mechanical-first vs self-check, etc.
|
|
35
|
+
|
|
36
|
+
## Anti-patterns
|
|
37
|
+
Specific patterns the official guide warns about for this model.
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
Keep each section to ≤ 8 lines. Adapters are deltas, not full prompts. If an adapter grows past ~80 lines, the content probably belongs in the canonical body.
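The size budget above could be checked mechanically. The following is a minimal sketch, not part of the package: `check_adapter` is a hypothetical helper, and it assumes that "lines" means non-blank body lines under each `## ` heading, which the README does not specify.

```python
import re

MAX_SECTION_LINES = 8   # per-section budget stated in the README
MAX_TOTAL_LINES = 80    # approximate ceiling for a whole adapter file


def check_adapter(text: str) -> list:
    """Return human-readable budget warnings for one adapter file's text."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > MAX_TOTAL_LINES:
        warnings.append(f"adapter has {len(lines)} lines (> {MAX_TOTAL_LINES})")

    section, count = None, 0

    def flush():
        # Report the section that just ended if it exceeded its budget.
        if section is not None and count > MAX_SECTION_LINES:
            warnings.append(
                f"section '{section}' has {count} lines (> {MAX_SECTION_LINES})"
            )

    for line in lines:
        match = re.match(r"^##\s+(.*)$", line)
        if match:
            flush()
            section, count = match.group(1).strip(), 0
        elif section is not None and line.strip():
            count += 1  # assumption: only non-blank body lines count
    flush()
    return warnings
```

An empty list means the adapter fits both budgets; anything else is a hint that content may belong in the canonical body.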
|
|
41
|
+
|
|
42
|
+
## When to add a new adapter
|
|
43
|
+
|
|
44
|
+
A new adapter file ships when:
|
|
45
|
+
1. A new LLM is integrated into the pipeline (the engine is now invocable).
|
|
46
|
+
2. An official prompt-engineering guide for that LLM exists (or a vendor-recommended pattern set).
|
|
47
|
+
3. An empirical A/B shows the adapter's specific guidance lifts that model's performance over the canonical body alone.
|
|
48
|
+
|
|
49
|
+
Not all models need adapters. If a model performs well on the canonical body without delta, ship without one.
|
|
50
|
+
|
|
51
|
+
## What NOT to put here
|
|
52
|
+
|
|
53
|
+
- ❌ Universal rules (those go in canonical body or `_shared/runtime-principles.md`).
|
|
54
|
+
- ❌ Iter-history annotations (`*(iter-0020: F4 evidence...)*` style).
|
|
55
|
+
- ❌ Full phase prompts (defeats the decoupling).
|
|
56
|
+
- ❌ Per-task or per-spec content (adapters are model-scope, not task-scope).
|
|
57
|
+
|
|
58
|
+
## Runtime injection
|
|
59
|
+
|
|
60
|
+
A skill's phase invocation prepends the resolved engine's adapter file to the canonical body before sending. The mechanism is left to each skill (a `_shared/adapter-inject.sh` helper may land in a later iter); for now, skills read the adapter file directly at phase-spawn time.
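A minimal sketch of that prepend step, assuming the adapter lives at `<adapter_dir>/<model-id>.md` as described under Format. The function name and the fall-through for a missing adapter are illustrative assumptions, not the package's actual mechanism.

```python
from pathlib import Path


def compose_phase_prompt(adapter_dir: Path, model_id: str, canonical_body: str) -> str:
    """Prepend the engine's adapter header, if one exists, to the canonical body."""
    adapter_path = adapter_dir / f"{model_id}.md"
    if not adapter_path.is_file():
        # Assumption: a model without an adapter gets the canonical body unchanged.
        return canonical_body
    header = adapter_path.read_text(encoding="utf-8").rstrip()
    return f"{header}\n\n{canonical_body}"
```

Returning the canonical body untouched when no adapter file exists matches the README's point that not every model needs an adapter.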
|
|
61
|
+
|
|
62
|
+
## Standing rule
|
|
63
|
+
|
|
64
|
+
Any iter that touches an adapter file MUST cite the corresponding official guide as part of acceptance: "guide section X.Y says Z, this change applies Z." Generic preferences ("feels cleaner") are rejected.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# OpenAI GPT-5.5 adapter
|
|
2
|
+
|
|
3
|
+
> Source: <https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5>
|
|
4
|
+
|
|
5
|
+
## Identity
|
|
6
|
+
|
|
7
|
+
You are GPT-5.5 by OpenAI. OpenAI's prompt-guidance for this model governs your behavior on top of the canonical phase prompt below. When the canonical body and this header conflict on tactics, the canonical body wins on what to deliver; this header wins on how to deliver it.
|
|
8
|
+
|
|
9
|
+
## Output discipline
|
|
10
|
+
|
|
11
|
+
Your default is efficient, direct, task-oriented. The canonical body specifies the outcome and constraints; you choose the efficient path. Do not over-specify process steps when an outcome is clearly stated. Use headers, bullets, and bold sparingly — favor short paragraphs and natural transitions unless the canonical body or user requests structure. When `text.verbosity` is `low`, prefer even shorter responses.
|
|
12
|
+
|
|
13
|
+
## Tool-use posture
|
|
14
|
+
|
|
15
|
+
Resolve the request in the fewest useful tool loops without sacrificing correctness. For retrieval tasks: start with one broad search using short discriminative keywords; make another retrieval call only when the top results don't answer the core question or a required fact / parameter / source is missing. For tool-heavy tasks, start with a brief preamble: a one-line acknowledgment of the request and the first step you'll take.
|
|
16
|
+
|
|
17
|
+
## Validation pattern
|
|
18
|
+
|
|
19
|
+
Validation is concrete commands and tools, not self-belief. When the canonical body lists verification commands, execute them and trust their output. Do not substitute your judgment for a deterministic check the harness has provided. When validation tools are available (test runners, lint, type-check, the harness's `spec-verify-check.py`), run them before declaring success. Gather the minimum evidence sufficient to answer correctly, cite it precisely, then stop.
|
|
20
|
+
|
|
21
|
+
## Anti-patterns
|
|
22
|
+
|
|
23
|
+
The official guide warns explicitly about carrying over instructions from older prompt stacks — earlier models needed more help, and process-heavy directives now narrow GPT-5.5's search space.
|
|
24
|
+
|
|
25
|
+
1. **Avoid absolute imperatives for judgment calls.** ALWAYS / NEVER / must / only are reserved for true safety invariants and required output fields. For judgment calls, use decision rules with conditions ("when X, do Y"). The canonical body uses this style; do not promote softer guidance to absolute rules.
|
|
26
|
+
2. **Don't over-specify process when the destination is clear.** If the canonical body names the outcome, choose the path; do not narrate every step.
|
|
27
|
+
3. **Stop rules are explicit.** When the canonical body or the harness asks you to stop / abstain / ask, follow the stop rule rather than retrying loops indefinitely. Loop-minimization does not outrank correctness or required citation.
|
|
28
|
+
|
|
29
|
+
Do not narrate internal deliberation. State results and decisions directly.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Claude Opus 4.7 adapter
|
|
2
|
+
|
|
3
|
+
> Source: <https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices>
|
|
4
|
+
|
|
5
|
+
## Identity
|
|
6
|
+
|
|
7
|
+
You are Claude Opus 4.7 by Anthropic. Anthropic's prompt-engineering guide for this model governs your behavior on top of the canonical phase prompt below. When the canonical body and this header conflict on tactics, the canonical body wins on what to deliver; this header wins on how to deliver it.
|
|
8
|
+
|
|
9
|
+
## Output discipline
|
|
10
|
+
|
|
11
|
+
You calibrate response length to task complexity automatically — keep simple lookups short, scale up only when the task warrants it. Do NOT pad with context the user didn't ask for. When the canonical body sets a structural format (XML, JSON, sections), follow it literally; do not silently restructure.
|
|
12
|
+
|
|
13
|
+
## Tool-use posture
|
|
14
|
+
|
|
15
|
+
You default to fewer tool calls than prior Claude generations. When the canonical body lists tools, use them when their result would change your answer. Make independent tool calls in parallel; chain only when one depends on another's output. Do not narrate "I'll now call X" preambles unless the canonical body requests progress updates.
|
|
16
|
+
|
|
17
|
+
## Validation pattern
|
|
18
|
+
|
|
19
|
+
When the canonical body asks you to verify your output before declaring done ("self-check" instructions), execute that step literally — re-read the spec's acceptance criteria, run the listed verification commands if available, list any gap. This is not optional. Mechanical gates owned by the harness (spec-verify-check.py, build-gate.py) are the primary correctness guard; your self-check is the secondary layer that catches what regex cannot.
|
|
20
|
+
|
|
21
|
+
## Anti-patterns
|
|
22
|
+
|
|
23
|
+
You interpret instructions more literally than prior Claude versions. The official guide is explicit about three failure modes:
|
|
24
|
+
|
|
25
|
+
1. **Review-prompt self-filtering**: when the canonical body asks for findings, report every issue you find — including low-severity and low-confidence ones. Do NOT pre-filter for importance; the harness has a separate filter step.
|
|
26
|
+
2. **Subagent over-spawning**: do NOT spawn a subagent for work you can complete in a single response. Spawn only when the canonical body explicitly requests it OR when fanning out across independent items.
|
|
27
|
+
3. **Overengineering**: do NOT add files, abstractions, error handling, validation, or "future flexibility" beyond what the spec asks. A bug fix doesn't need surrounding cleanup. The right complexity is the minimum needed for the current task.
|
|
28
|
+
|
|
29
|
+
You do NOT need stronger imperatives ("CRITICAL!", "YOU MUST!") to follow rules. Normal phrasing is sufficient.
|
|
@@ -26,6 +26,32 @@ PER_RUN_PATTERNS = (
|
|
|
26
26
|
"*.log.md",
|
|
27
27
|
"fix-batch.round-*.json",
|
|
28
28
|
"criteria.generated.md",
|
|
29
|
+
# iter-0019.8: spec-verify carrier artifacts get archived alongside
|
|
30
|
+
# other per-run state. Killed mid-run cleanup is enforced separately
|
|
31
|
+
# by spec-verify-check.py main() — when source markdown has no json
|
|
32
|
+
# block AND BENCH_WORKDIR is unset (real-user mode), the script drops
|
|
33
|
+
# any pre-existing .devlyn/spec-verify.json so a stale orphan from a
|
|
34
|
+
# killed prior run cannot poison this run's gate.
|
|
35
|
+
"spec-verify.json",
|
|
36
|
+
"spec-verify.results.json",
|
|
37
|
+
"spec-verify-findings.jsonl",
|
|
38
|
+
# iter-0033a/2026-04-30 archive-fix iter: NEW /devlyn:resolve emits
|
|
39
|
+
# plan.md (PLAN output) + final-report.md (PHASE 6 render) +
|
|
40
|
+
# cumulative.patch (cumulative diff). Smoke 2's archive listing
|
|
41
|
+
# captured all three; archive_run.py was missing them because the
|
|
42
|
+
# patterns predated the new skill's artifact set. Added explicitly
|
|
43
|
+
# so the move is deterministic.
|
|
44
|
+
"plan.md",
|
|
45
|
+
"final-report.md",
|
|
46
|
+
"cumulative.patch",
|
|
47
|
+
# iter-0033c (Codex R-final-smoke Q2): pair-mode VERIFY emits per-judge
|
|
48
|
+
# deliberation transcripts (verify-judge-claude.md / verify-judge-codex.md
|
|
49
|
+
# — and any future-engine analogue via wildcard). Smoke 1a (F2 l2_forced)
|
|
50
|
+
# surfaced the gap: the orchestrator wrote them and listed them as
|
|
51
|
+
# artifacts, but archive_run.py left them in .devlyn/. Gate 8
|
|
52
|
+
# ("pair_judge findings archive distinguishable") would false-fail on
|
|
53
|
+
# every paired fixture without this glob.
|
|
54
|
+
"verify-judge-*.md",
|
|
29
55
|
)
|
|
30
56
|
|
|
31
57
|
|
|
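The hunk above adds literal names and one wildcard to `PER_RUN_PATTERNS`. As a rough illustration of how such patterns select files, here is a sketch that assumes shell-style glob matching on bare filenames. The diff does not show how `archive_run.py` applies the tuple, so `is_per_run_artifact` is hypothetical, and the tuple below is only a subset.

```python
from fnmatch import fnmatchcase

# Subset of the patterns from the hunk above, for illustration only.
PER_RUN_PATTERNS = (
    "*.log.md",
    "fix-batch.round-*.json",
    "spec-verify.json",
    "plan.md",
    "verify-judge-*.md",
)


def is_per_run_artifact(filename: str) -> bool:
    """True when the bare filename matches any per-run pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in PER_RUN_PATTERNS)
```

Under this assumption the single `verify-judge-*.md` glob covers both `verify-judge-claude.md` and `verify-judge-codex.md`, plus any future engine's transcript, without another pattern edit.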
@@ -0,0 +1,54 @@
|
|
|
1
|
+
# Shared — Codex Invocation
|
|
2
|
+
|
|
3
|
+
Single source of truth for how every skill calls Codex. **MCP is not used.** Skills shell out via the wrapper at `_shared/codex-monitored.sh`, which fronts the local Codex CLI (shipped by the `openai-codex` Claude Code plugin).
|
|
4
|
+
|
|
5
|
+
## Canonical invocations
|
|
6
|
+
|
|
7
|
+
All long-running Codex calls go through `codex-monitored.sh` — a thin wrapper that closes stdin (codex 0.124.0 hangs when both stdin is open and a prompt arg is given), streams Codex stdout fully (no `tail -n` truncation), and prints a `[codex-monitored] heartbeat` line every 30s so the outer `claude -p` byte-watchdog stays fed during long reasoning gaps. The wrapper passes its arguments through verbatim to the underlying CLI, so the canonical flag set is unchanged from a raw call — only the launcher differs.
|
|
8
|
+
|
|
9
|
+
**Read-only critique / adversarial review / debate** (ideate CHALLENGE phase, `/devlyn:resolve` VERIFY pair-mode when triggered). Security review is delegated to the native `security-review` Claude Code skill, invoked from `/devlyn:resolve` BUILD_GATE rather than from Codex.
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
bash .claude/skills/_shared/codex-monitored.sh \
|
|
13
|
+
-C <project-root> \
|
|
14
|
+
-s read-only \
|
|
15
|
+
-c model_reasoning_effort=xhigh \
|
|
16
|
+
"<inlined-prompt>"
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
**Workspace-write implementation** (`/devlyn:resolve` IMPLEMENT phase when `--engine codex` or `--engine auto` routes to Codex, plus codex-routed `/devlyn:ideate` phases):
|
|
20
|
+
|
|
21
|
+
```bash
|
|
22
|
+
bash .claude/skills/_shared/codex-monitored.sh \
|
|
23
|
+
-C <project-root> \
|
|
24
|
+
--full-auto \
|
|
25
|
+
-c model_reasoning_effort=xhigh \
|
|
26
|
+
"<inlined-prompt>"
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Notes:
|
|
30
|
+
- `-C` — project root so Codex's working directory matches.
|
|
31
|
+
- `-s read-only` / `--full-auto` — sandbox policy. `--full-auto` = `-s workspace-write` with auto-approval of sandboxed commands.
|
|
32
|
+
- `-c model_reasoning_effort=xhigh` — config override for reasoning depth. Required for deep critique; skills may choose `high` or `medium` when thoroughness doesn't warrant xhigh.
|
|
33
|
+
- **Omit `-m <model>`** — Codex CLI uses its configured flagship (currently `gpt-5.5`, automatically whatever ships next). This is the zero-touch mechanism. Only name `-m` when a role explicitly needs a different model (e.g., `gpt-5.3-codex` for SWE-bench-heavy coding tasks, `gpt-5.3-codex-spark` for speed).
|
|
34
|
+
- Raw `codex exec ...` invocations are **forbidden** in skill prompts. The benchmark variant arm runs a PATH shim (`scripts/codex-shim/codex`) that transparently re-routes any raw `codex exec` to the wrapper as a safety net, but skills should always emit the wrapper form directly so the orchestrator's first-attempt has the right shape. Two prior iterations (iter-0006 universal foreground ban, iter-0008 prompt-level kill-shape contract) failed because the orchestrator picked starvation-prone shapes (`codex exec ... 2>&1 | tail -200`) from its own pattern prior — the wrapper plus the shim is the runtime binding layer those iters lacked. See `autoresearch/iterations/0009-wrapper-and-hook.md`.
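The two canonical shapes differ only in the sandbox flag, which can be made explicit in a small builder. This is a sketch using only the flags listed in the notes above; `codex_argv` and its parameters are hypothetical names, not part of the package.

```python
def codex_argv(project_root: str, prompt: str, *, write: bool = False,
               effort: str = "xhigh") -> list:
    """Build the wrapper command line for a read-only or workspace-write call."""
    sandbox = ["--full-auto"] if write else ["-s", "read-only"]
    return [
        "bash", ".claude/skills/_shared/codex-monitored.sh",
        "-C", project_root,
        *sandbox,
        "-c", f"model_reasoning_effort={effort}",
        # No -m flag: the CLI's configured flagship model is used.
        prompt,
    ]
```

Building the argument list in one place keeps the launcher fixed to the wrapper, so a raw `codex exec` shape cannot slip in by accident.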
|
|
35
|
+
|
|
36
|
+
## Availability check
|
|
37
|
+
|
|
38
|
+
Before the first Codex call in a run, verify the CLI is on PATH:
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
command -v codex >/dev/null 2>&1
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
If the check fails, the skill follows the `_shared/engine-preflight.md` downgrade rule — silently switch to Claude for this run and log `engine downgraded: codex-unavailable` in the final report. Never prompt, never abort.
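The same check and downgrade can be sketched in Python, assuming `shutil.which` as the equivalent of `command -v`. `resolve_engine` is a hypothetical helper; the full rule lives in `_shared/engine-preflight.md`, which is not shown here, so the handling of `--engine auto` is deliberately left out.

```python
import shutil
from typing import Callable, Optional, Tuple


def resolve_engine(
    requested: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[str, Optional[str]]:
    """Return (engine, downgrade_note); the note is None when nothing changed."""
    if requested == "codex" and which("codex") is None:
        # Silent downgrade: no prompt and no abort, only a note for the final report.
        return "claude", "engine downgraded: codex-unavailable"
    return requested, None
```

The `which` parameter exists only so the lookup can be replaced in tests; callers would normally omit it.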
|
|
45
|
+
|
|
46
|
+
## Why CLI over other paths
|
|
47
|
+
|
|
48
|
+
The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check covers cleanly.
|
|
49
|
+
|
|
50
|
+
## Invocation from inside a skill prompt
|
|
51
|
+
|
|
52
|
+
Skills write the invocation as a Bash command the runtime executes. Example shape from `/devlyn:resolve` PHASE 2 IMPLEMENT when routed to Codex:
|
|
53
|
+
|
|
54
|
+
> Run `bash .claude/skills/_shared/codex-monitored.sh -C <state.base_ref.repo_root> --full-auto -c model_reasoning_effort=xhigh "<IMPLEMENT prompt>"`. Omit `-m` so the CLI flagship is auto-selected. Capture stdout as the IMPLEMENT reply; non-zero exit → treat as subagent failure. The wrapper emits `[codex-monitored]` heartbeat and lifecycle lines on **stderr** — stdout stays clean for Codex output, so the orchestrator can parse the reply without filtering. Heartbeat-on-stderr keeps the orchestrator's combined-output stream non-silent (defeats the iter-0008 byte-watchdog kill) without polluting the codex-reply view of stdout.
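The stdout/stderr split described above can be sketched as follows, assuming the orchestrator is a Python process. `run_phase` is a hypothetical name, and the real orchestrator may capture output differently.

```python
import subprocess


def run_phase(argv: list) -> str:
    """Run the wrapper command and return its stdout as the phase reply."""
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # harmless here; the wrapper closes stdin itself
        stdout=subprocess.PIPE,    # only the Codex reply is captured
        stderr=None,               # inherited, so heartbeat lines still reach the outer stream
        text=True,
    )
    if result.returncode != 0:
        # Non-zero exit is treated as a subagent failure.
        raise RuntimeError(f"subagent failure: exit {result.returncode}")
    return result.stdout
```

Because stderr is inherited and not merged into the pipe, the captured reply needs no filtering for `[codex-monitored]` lines.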
|