devlyn-cli 2.2.2 → 2.3.0
- package/AGENTS.md +2 -2
- package/CLAUDE.md +4 -4
- package/README.md +5 -5
- package/bin/devlyn.js +12 -1
- package/config/skills/_shared/codex-config.md +3 -3
- package/config/skills/_shared/engine-preflight.md +17 -13
- package/config/skills/_shared/runtime-principles.md +1 -1
- package/config/skills/devlyn:resolve/SKILL.md +32 -11
- package/config/skills/devlyn:resolve/references/phases/implement.md +1 -1
- package/config/skills/devlyn:resolve/references/phases/verify.md +19 -2
- package/config/skills/devlyn:resolve/references/state-schema.md +5 -3
- package/optional-skills/devlyn:design-ui/SKILL.md +364 -0
- package/package.json +1 -1
- package/scripts/lint-skills.sh +4 -4
package/AGENTS.md
CHANGED
|
@@ -28,7 +28,7 @@ ideate (optional) -> resolve -> ship
|
|
|
28
28
|
|
|
29
29
|
- `/devlyn:ideate` (optional) — unstructured idea → `docs/specs/<id>/spec.md` + `spec.expected.json`. Modes: default Q&A, `--quick` (autonomous-pipeline-safe), `--from-spec <path>`, `--project` (multi-feature).
|
|
30
30
|
- `/devlyn:resolve` — hands-free pipeline for any coding task. Free-form goal, `--spec <path>`, or `--verify-only <ref> --spec <path>`. Phases run inline: PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY (fresh-subagent, findings-only).
|
|
31
|
-
-
|
|
31
|
+
- Four creative power-user skills (`/devlyn:reap`, `/devlyn:design-system`, `/devlyn:design-ui`, `/devlyn:team-design-ui`) live in `optional-skills/` and install only when the user opts in.
|
|
32
32
|
|
|
33
33
|
Each skill's `SKILL.md` is the source of truth for flags and workflow. Do not duplicate.
|
|
34
34
|
|
|
@@ -73,7 +73,7 @@ No silent fallbacks.
|
|
|
73
73
|
- Fallbacks allowed only when widely accepted and harmless (CSS fallback fonts, CDN failover, image placeholders).
|
|
74
74
|
- Silent `catch` blocks are bugs.
|
|
75
75
|
- Logging is not user-visible error handling.
|
|
76
|
-
-
|
|
76
|
+
- No engine-availability fallback is permitted for pair/risk-probe routes: if required Codex or Claude is unavailable, emit `BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` with setup guidance. `--no-pair` and `--no-risk-probes` are explicit user opt-outs, not fallbacks.
|
|
77
77
|
|
|
78
78
|
## Evidence Over Claim
|
|
79
79
|
|
package/CLAUDE.md
CHANGED
|
@@ -24,7 +24,7 @@ The runtime sub-agent contract below (Subtractive-first / Goal-locked / No-worka
|
|
|
24
24
|
|
|
25
25
|
## Quick Start
|
|
26
26
|
|
|
27
|
-
Two skills cover the full cycle post iter-0034 Phase 4 cutover (2026-05-04). `/devlyn:ideate` is OPTIONAL; `/devlyn:resolve` is REQUIRED. **Both default to `--engine claude`** for PLAN/IMPLEMENT. Codex BUILD/IMPLEMENT and PLAN-pair remain research-only, but `/devlyn:resolve` VERIFY has
|
|
27
|
+
Two skills cover the full cycle post iter-0034 Phase 4 cutover (2026-05-04). `/devlyn:ideate` is OPTIONAL; `/devlyn:resolve` is REQUIRED. **Both default to `--engine claude`** for PLAN/IMPLEMENT. Codex BUILD/IMPLEMENT and PLAN-pair remain research-only, but `/devlyn:resolve` VERIFY has conditional-default pair-JUDGE when its `SKILL.md` trigger policy fires. Pass `--engine auto` or `--engine codex` explicitly to opt into the broader research path. If a selected or conditionally required engine is unavailable, the run stops with `BLOCKED:<engine>-unavailable` and setup guidance.
|
|
28
28
|
|
|
29
29
|
1. `/devlyn:ideate` (optional) — unstructured idea → `docs/specs/<id>/spec.md` + `spec.expected.json`. Modes: default Q&A, `--quick` (autonomous-pipeline-safe), `--from-spec <path>`, `--project`.
|
|
30
30
|
2. `/devlyn:resolve` — hands-free pipeline for any coding task. Free-form goal, `--spec <path>`, or `--verify-only <diff> --spec <path>`. Phases: PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY (fresh subagent, findings-only).
|
|
@@ -123,7 +123,7 @@ No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scr
|
|
|
123
123
|
|
|
124
124
|
**Permitted exceptions** (explicitly carved out):
|
|
125
125
|
- CSS fallback fonts, CDN failover, image placeholders — widely-accepted best practices.
|
|
126
|
-
-
|
|
126
|
+
- No engine-availability fallback is permitted for `/devlyn:resolve` pair/risk-probe routes. If Codex or Claude is required and unavailable, the run stops with `BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` plus setup guidance. `--no-pair` / `--no-risk-probes` are explicit user opt-outs, not fallbacks.
|
|
127
127
|
<!-- runtime-principles:section=no-workaround:end -->
|
|
128
128
|
|
|
129
129
|
### Evidence over claim
|
|
@@ -141,7 +141,7 @@ A finding without one of these forms is excluded. Vague findings produce vague f
|
|
|
141
141
|
|
|
142
142
|
## Codex invocation
|
|
143
143
|
|
|
144
|
-
When `/devlyn:resolve` or `/devlyn:ideate` route a phase to Codex (`--engine codex
|
|
144
|
+
When `/devlyn:resolve` or `/devlyn:ideate` route a phase to Codex (`--engine codex`, `--engine auto`, or conditional VERIFY pair/risk-probe routing), the wrapper-form contract lives in `config/skills/_shared/codex-config.md` (or `.claude/skills/_shared/codex-config.md` once installed). Omit `-m <model>` — the CLI's current flagship is used automatically. MCP is not in the loop. If Codex is required and unavailable, stop with `BLOCKED:codex-unavailable` and setup guidance.
|
|
145
145
|
|
|
146
146
|
## Working Mode
|
|
147
147
|
|
|
@@ -152,7 +152,7 @@ When `/devlyn:resolve` or `/devlyn:ideate` route a phase to Codex (`--engine cod
|
|
|
152
152
|
|
|
153
153
|
## Skill Boundary Policy
|
|
154
154
|
|
|
155
|
-
Post iter-0034 Phase 4 cutover (2026-05-04) the runtime product surface is two skills — `/devlyn:resolve` and `/devlyn:ideate`. `/devlyn:resolve` runs PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY inline; verification, cleanup, and security review (delegated to the native `security-review` Claude Code skill from BUILD_GATE) all live inside the pipeline. There are no standalone `/devlyn:review`, `/devlyn:evaluate`, `/devlyn:team-resolve`, etc. surfaces to delegate to — those skills were folded into resolve's phases or removed in iter-0034.
|
|
155
|
+
Post iter-0034 Phase 4 cutover (2026-05-04) the runtime product surface is two skills — `/devlyn:resolve` and `/devlyn:ideate`. `/devlyn:resolve` runs PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY inline; verification, cleanup, and security review (delegated to the native `security-review` Claude Code skill from BUILD_GATE) all live inside the pipeline. There are no standalone `/devlyn:review`, `/devlyn:evaluate`, `/devlyn:team-resolve`, etc. surfaces to delegate to — those skills were folded into resolve's phases or removed in iter-0034. Four creative power-user skills (`/devlyn:reap`, `/devlyn:design-system`, `/devlyn:design-ui`, `/devlyn:team-design-ui`) live in `optional-skills/` and are user-invoked only; resolve never delegates to them.
|
|
156
156
|
|
|
157
157
|
Browser validation routes through `_shared/browser-runner.sh` (Chrome MCP → Playwright → curl tier) directly from BUILD_GATE — there is no separate `/devlyn:browser-validate` skill at HEAD.
|
|
158
158
|
|
package/README.md
CHANGED
|
@@ -79,11 +79,11 @@ PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY (fresh subagent
|
|
|
79
79
|
- **VERIFY** runs in a fresh subagent context with no code-mutation tools — findings only, structurally independent.
|
|
80
80
|
- Git checkpoints at every phase for safe rollback. Fix-loop budget shared across BUILD_GATE and VERIFY (`--max-rounds N`, default 4).
|
|
81
81
|
|
|
82
|
-
Common flags: `--engine claude|codex|auto` (default `claude`), `--bypass build-gate,cleanup`, `--pair-verify` (force pair-mode JUDGE in VERIFY), `--perf` (per-phase timing).
|
|
82
|
+
Common flags: `--engine claude|codex|auto` (default `claude`), `--bypass build-gate,cleanup`, `--pair-verify` (force pair-mode JUDGE in VERIFY), `--no-pair` (intentional solo VERIFY), `--risk-probes` / `--no-risk-probes`, `--perf` (per-phase timing).
|
|
83
83
|
|
|
84
|
-
### Engine selection — Claude
|
|
84
|
+
### Engine selection — Claude implementation, conditional pair VERIFY
|
|
85
85
|
|
|
86
|
-
`--engine claude` (default) is the canonical surface.
|
|
86
|
+
`--engine claude` (default) is the canonical implementation surface for PLAN, IMPLEMENT, BUILD_GATE, and CLEANUP. VERIFY/JUDGE conditionally runs pair mode for verify-only runs, high-risk specs, risk probes, mechanical warnings, coverage gaps, or explicit `--pair-verify`.
|
|
87
87
|
|
|
88
88
|
`--engine codex` routes IMPLEMENT to Codex; `--engine auto` opts into the experimental dual-engine routing where applicable. Both are research-only at HEAD: iter-0020 closed Codex BUILD/IMPLEMENT below the quality floor on the 9-fixture suite (L2 vs L1 = −3.6, 3/8 gated fixtures cleared the +5 margin floor — release-readiness FAIL); iter-0033g + iter-0034 closed PLAN-pair as research-only with explicit unblock conditions (container/sandbox infra OR production telemetry capturing positive evidence of subagent introspection). Install the Codex CLI (https://platform.openai.com/docs/codex) and pass the flag explicitly to opt in:
|
|
89
89
|
|
|
@@ -91,7 +91,7 @@ Common flags: `--engine claude|codex|auto` (default `claude`), `--bypass build-g
|
|
|
91
91
|
/devlyn:resolve "fix the auth bug" --engine auto # experimental, research-only
|
|
92
92
|
```
|
|
93
93
|
|
|
94
|
-
If Codex is absent when
|
|
94
|
+
If Codex or Claude is absent when explicitly selected or conditionally required, the harness stops with `BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` and prints setup guidance. Use `--no-pair` only when intentionally accepting solo VERIFY; use `--no-risk-probes` only when intentionally disabling automatic high-risk probes.
|
|
95
95
|
|
|
96
96
|
<details>
|
|
97
97
|
<summary><strong>What's new in 1.14.0</strong> — CPO lens + handoff enforcement</summary>
|
|
@@ -194,7 +194,7 @@ Selected during install. Run `npx devlyn-cli` again to add more.
|
|
|
194
194
|
|---|---|
|
|
195
195
|
| `playwright` | Playwright MCP — powers `/devlyn:resolve` BUILD_GATE browser tier (Chrome MCP → Playwright → curl fallback) |
|
|
196
196
|
|
|
197
|
-
> `--engine auto/codex`
|
|
197
|
+
> `--engine auto/codex` and conditional VERIFY pair mode use the local `codex` CLI binary, not MCP. Install from https://platform.openai.com/docs/codex, run the current Codex auth/login flow, verify `codex --version`, then rerun.
|
|
198
198
|
|
|
199
199
|
</details>
|
|
200
200
|
|
package/bin/devlyn.js
CHANGED
|
@@ -103,7 +103,6 @@ const DEPRECATED_DIRS = [
|
|
|
103
103
|
'skills/devlyn:auto-resolve',
|
|
104
104
|
'skills/devlyn:browser-validate',
|
|
105
105
|
'skills/devlyn:clean',
|
|
106
|
-
'skills/devlyn:design-ui',
|
|
107
106
|
'skills/devlyn:discover-product',
|
|
108
107
|
'skills/devlyn:evaluate',
|
|
109
108
|
'skills/devlyn:feature-spec',
|
|
@@ -184,6 +183,7 @@ const OPTIONAL_ADDONS = [
|
|
|
184
183
|
{ name: 'devlyn:pencil-push', desc: 'Push codebase UI to Pencil canvas for design sync', type: 'local' },
|
|
185
184
|
{ name: 'devlyn:reap', desc: 'Safely reap orphaned MCP / codex / Superset child processes left behind by long Claude sessions', type: 'local' },
|
|
186
185
|
{ name: 'devlyn:design-system', desc: 'Extract design tokens from a chosen UI style for exact reproduction (creative power-user)', type: 'local' },
|
|
186
|
+
{ name: 'devlyn:design-ui', desc: 'N (default 5) distinct UI style explorations from a single Lead Designer (creative power-user)', type: 'local' },
|
|
187
187
|
{ name: 'devlyn:team-design-ui', desc: '5 distinct UI style explorations from a full design team (creative power-user)', type: 'local' },
|
|
188
188
|
// External skill packs (installed via npx skills add)
|
|
189
189
|
{ name: 'vercel-labs/agent-skills', desc: 'React, Next.js, React Native best practices', type: 'external' },
|
|
@@ -467,6 +467,17 @@ function installLocalSkill(skillName) {
|
|
|
467
467
|
|
|
468
468
|
log(`\n🛠️ Installing ${skillName}...`, 'cyan');
|
|
469
469
|
copyRecursive(src, dest, targetDir);
|
|
470
|
+
|
|
471
|
+
// Mirror to every CLI skill-loader directory that already exists so optional
|
|
472
|
+
// skills are picked up by Codex (and any future CLI with a skillsDir) the
|
|
473
|
+
// same way required skills are. Existing dir, not new dir — we don't create
|
|
474
|
+
// a Codex install just because someone opted into a Claude-side skill.
|
|
475
|
+
for (const cli of Object.values(CLI_TARGETS)) {
|
|
476
|
+
if (!cli.skillsDir || !fs.existsSync(cli.skillsDir)) continue;
|
|
477
|
+
const cliDest = path.join(cli.skillsDir, skillName);
|
|
478
|
+
if (fs.existsSync(cliDest)) fs.rmSync(cliDest, { recursive: true, force: true });
|
|
479
|
+
copyRecursive(src, cliDest, cli.skillsDir);
|
|
480
|
+
}
|
|
470
481
|
return true;
|
|
471
482
|
}
|
|
472
483
|
|
|
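The mirroring step added to `installLocalSkill` above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the package's implementation: `mirrorSkill` is a hypothetical name, the shape of the targets map beyond its `skillsDir` field is assumed, and `fs.cpSync` stands in for the package's own `copyRecursive` helper, whose signature is not shown in this diff.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper. `cliTargets` mimics CLI_TARGETS from the diff; only the
// `skillsDir` field is taken from the source, the rest is assumed.
function mirrorSkill(src, skillName, cliTargets) {
  const mirrored = [];
  for (const cli of Object.values(cliTargets)) {
    // Only mirror into skill directories that already exist; never create one.
    if (!cli.skillsDir || !fs.existsSync(cli.skillsDir)) continue;
    const dest = path.join(cli.skillsDir, skillName);
    // Drop any stale copy first so files removed upstream do not linger.
    fs.rmSync(dest, { recursive: true, force: true });
    // fs.cpSync is a stand-in for the package's copyRecursive helper.
    fs.cpSync(src, dest, { recursive: true });
    mirrored.push(dest);
  }
  return mirrored;
}
```

Returning the list of destinations is a choice made for this sketch so a caller can log what was mirrored; the diffed code does not return it.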
package/config/skills/_shared/codex-config.md
CHANGED
|
@@ -6,7 +6,7 @@ Single source of truth for how every skill calls Codex. **MCP is not used.** Ski
|
|
|
6
6
|
|
|
7
7
|
All long-running Codex calls go through `codex-monitored.sh` — a thin wrapper that closes stdin (codex 0.124.0 hangs when both stdin is open and a prompt arg is given), streams Codex stdout fully (no `tail -n` truncation), and prints a `[codex-monitored] heartbeat` line every 30s so the outer `claude -p` byte-watchdog stays fed during long reasoning gaps. The wrapper passes its arguments through verbatim to the underlying CLI, so the canonical flag set is unchanged from a raw call — only the launcher differs.
|
|
8
8
|
|
|
9
|
-
**Read-only critique / adversarial review / debate** (ideate CHALLENGE phase, `/devlyn:resolve` VERIFY pair-mode
|
|
9
|
+
**Read-only critique / adversarial review / debate** (ideate CHALLENGE phase, `/devlyn:resolve` VERIFY conditional pair-mode). Security review is delegated to the native `security-review` Claude Code skill, invoked from `/devlyn:resolve` BUILD_GATE rather than from Codex. Read-only critique returns findings on stdout; the orchestrator writes any files.
|
|
10
10
|
|
|
11
11
|
```bash
|
|
12
12
|
bash .claude/skills/_shared/codex-monitored.sh \
|
|
@@ -41,11 +41,11 @@ Before the first Codex call in a run, verify the CLI is on PATH:
|
|
|
41
41
|
command -v codex >/dev/null 2>&1
|
|
42
42
|
```
|
|
43
43
|
|
|
44
|
-
If the check fails
|
|
44
|
+
If the check fails while Codex is explicitly selected or conditionally required by pair/risk-probe VERIFY, follow `_shared/engine-preflight.md`: stop with `BLOCKED:codex-unavailable`, preserve run evidence, and print setup guidance. Do not convert the run to Claude. `--no-pair` and `--no-risk-probes` are explicit user opt-outs for reruns, not automatic fallbacks.
|
|
45
45
|
|
|
46
46
|
## Why CLI over other paths
|
|
47
47
|
|
|
48
|
-
The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check
|
|
48
|
+
The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check reports explicitly.
|
|
49
49
|
|
|
50
50
|
## Invocation from inside a skill prompt
|
|
51
51
|
|
|
package/config/skills/_shared/engine-preflight.md
CHANGED
|
@@ -1,34 +1,38 @@
|
|
|
1
|
-
# Shared —
|
|
1
|
+
# Shared — Engine Pre-flight
|
|
2
2
|
|
|
3
3
|
Used by `/devlyn:resolve` and `/devlyn:ideate`. One shared availability rule so every skill routes identically.
|
|
4
4
|
|
|
5
5
|
## Rule
|
|
6
6
|
|
|
7
|
-
Each skill resolves the effective engine from its own SKILL.md default plus any explicit `--engine` flag passed by the user.
|
|
7
|
+
Each skill resolves the effective engine from its own SKILL.md default plus any explicit `--engine` flag passed by the user. `/devlyn:resolve` also computes conditional pair/risk-probe requirements before the phase that needs the OTHER engine.
|
|
8
8
|
|
|
9
|
-
When
|
|
9
|
+
When a run or phase requires Codex, before spawning that phase:
|
|
10
10
|
|
|
11
11
|
1. Check if the Codex CLI is installed: `command -v codex >/dev/null 2>&1` (or equivalent bash test).
|
|
12
|
-
2. On failure
|
|
13
|
-
3. On success
|
|
12
|
+
2. On failure -> set the current phase/run verdict to `BLOCKED:codex-unavailable`, preserve the failed check evidence, and show setup guidance: install/configure the Codex CLI, run the current Codex auth/login flow, verify `codex --version`, then rerun. If the user intentionally wants solo VERIFY, they may rerun with `--no-pair`.
|
|
13
|
+
3. On success -> proceed with the original engine value.
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
When a run or phase requires Claude, before spawning that phase:
|
|
16
16
|
|
|
17
|
-
|
|
17
|
+
1. Confirm the runtime can spawn Claude agents. Where the CLI is the launcher, `command -v claude >/dev/null 2>&1` is the equivalent check.
|
|
18
|
+
2. On failure -> set the current phase/run verdict to `BLOCKED:claude-unavailable` and show setup guidance: install/configure Claude Code, verify `claude --version` where available, then rerun.
|
|
19
|
+
3. On success -> proceed.
|
|
18
20
|
|
|
19
|
-
|
|
21
|
+
Never prompt the user mid-pipeline. Missing required engines are explicit BLOCKED states, not silent fallbacks.
|
|
20
22
|
|
|
21
|
-
`
|
|
23
|
+
Per-skill defaults: `/devlyn:resolve` defaults to `claude` for PLAN/IMPLEMENT (post iter-0020 close-out — Codex BUILD/IMPLEMENT below quality floor; iter-0033g + iter-0034 close-out — PLAN-pair research-only until container/sandbox infra justifies a measurement). `/devlyn:resolve` VERIFY is the exception: conditional-default pair-JUDGE may invoke the OTHER engine when its SKILL.md trigger policy fires. `/devlyn:ideate` may use cross-model challenge phases when configured. Each skill's SKILL.md flag block is the source of truth for that skill's default.
|
|
22
24
|
|
|
23
|
-
## What a skill must
|
|
25
|
+
## What a skill must report after a BLOCKED engine check
|
|
24
26
|
|
|
25
|
-
When
|
|
27
|
+
When an engine required by the selected route or conditional pair trigger is absent, the final user-facing report/summary shows the requested route, the missing engine, and setup steps:
|
|
26
28
|
|
|
27
29
|
```
|
|
28
|
-
Engine: claude
|
|
30
|
+
Engine: claude + codex pair required
|
|
31
|
+
Verdict: BLOCKED:codex-unavailable
|
|
32
|
+
Setup: install/configure Codex CLI; run the current Codex auth/login flow; verify `codex --version`; rerun. Use `--no-pair` only for an intentional solo VERIFY run.
|
|
29
33
|
```
|
|
30
34
|
|
|
31
|
-
|
|
35
|
+
Do not report a downgraded successful run when a required engine is missing.
|
|
32
36
|
|
|
33
37
|
## Canonical Codex invocation
|
|
34
38
|
|
|
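The fail-closed pre-flight rule in this file can be illustrated with a short sketch. It assumes a hypothetical `preflight` helper and an injectable availability probe; only the `command -v` check, the `BLOCKED:<engine>-unavailable` verdict strings, and the setup wording come from the diff. The engine names are expected to be the fixed strings `codex` or `claude`, since they are interpolated into a shell command.

```javascript
const { spawnSync } = require('node:child_process');

// Setup guidance paraphrased from the wording in the diff.
const SETUP = {
  codex: 'install/configure the Codex CLI; run the current Codex auth/login flow; verify `codex --version`; rerun',
  claude: 'install/configure Claude Code; verify `claude --version` where available; rerun',
};

// Default probe: the `command -v` test the document prescribes.
function onPath(binary) {
  return spawnSync('sh', ['-c', `command -v ${binary} >/dev/null 2>&1`]).status === 0;
}

// Hypothetical helper: returns null when every required engine is present,
// otherwise a BLOCKED result for the first missing one. It never substitutes
// another engine for the missing one.
function preflight(requiredEngines, isAvailable = onPath) {
  for (const engine of requiredEngines) {
    if (!isAvailable(engine)) {
      return { verdict: `BLOCKED:${engine}-unavailable`, setup: SETUP[engine] };
    }
  }
  return null;
}
```

Passing the probe as a parameter keeps the rule itself testable without a real `codex` or `claude` binary on the machine.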
package/config/skills/_shared/runtime-principles.md
CHANGED
|
@@ -79,7 +79,7 @@ No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scr
|
|
|
79
79
|
|
|
80
80
|
**Permitted exceptions** (explicitly carved out):
|
|
81
81
|
- CSS fallback fonts, CDN failover, image placeholders — widely-accepted best practices.
|
|
82
|
-
-
|
|
82
|
+
- No engine-availability fallback is permitted for `/devlyn:resolve` pair/risk-probe routes. If Codex or Claude is required and unavailable, the run stops with `BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` plus setup guidance. `--no-pair` / `--no-risk-probes` are explicit user opt-outs, not fallbacks.
|
|
83
83
|
<!-- runtime-principles:section=no-workaround:end -->
|
|
84
84
|
|
|
85
85
|
## Evidence over claim
|
|
package/config/skills/devlyn:resolve/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: devlyn:resolve
|
|
3
|
-
description: Hands-free pipeline for any coding task — bug fix, feature, refactor, debug, modify, PR review. Free-form goal or formal spec input. Plan → Implement → Build-gate → Cleanup → Verify (fresh subagent, findings-only). Mechanical-first verification; pair-mode is
|
|
3
|
+
description: Hands-free pipeline for any coding task — bug fix, feature, refactor, debug, modify, PR review. Free-form goal or formal spec input. Plan → Implement → Build-gate → Cleanup → Verify (fresh subagent, findings-only). Mechanical-first verification; pair-mode is conditional-default in Verify. Use when the user says "resolve this", "fix this", "implement this", "refactor this", "debug this", "review this PR", or wants hands-off completion.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
Orchestrator for the 2-skill harness pipeline. One subagent per phase; file-based handoff via `.devlyn/pipeline.state.json`. VERIFY spawns a fresh-context subagent so independence is structural — not advisory.
|
|
@@ -17,7 +17,7 @@ Long-horizon agentic work; context auto-compacts. State lives in `.devlyn/pipeli
|
|
|
17
17
|
Hands-free. Measured by how far we get without human intervention.
|
|
18
18
|
|
|
19
19
|
1. Do not prompt the user mid-pipeline. When tempted to ask, pick the safe default, proceed, and log it in the final report.
|
|
20
|
-
2.
|
|
20
|
+
2. Engine availability: follow `_shared/engine-preflight.md`. When a selected or conditionally-required engine is unavailable, fail closed with `BLOCKED:<engine>-unavailable` and setup guidance; do not convert a pair-required or explicitly requested engine into a solo run.
|
|
21
21
|
3. Phases run in declared order. No extra phases.
|
|
22
22
|
4. Orchestrator does not write code. It parses input, spawns phases, reads state, branches on verdicts, emits the report.
|
|
23
23
|
5. Continue by default. Halt only on (a) unrecoverable subagent failure, (b) IMPLEMENT producing zero code changes, (c) BUILD_GATE or VERIFY fix-loop exhausting `max_rounds`.
|
|
@@ -32,7 +32,7 @@ Each phase routes to an engine and prepends the per-engine adapter header from `
|
|
|
32
32
|
|
|
33
33
|
- Claude phases: spawn `Agent` (`mode: "bypassPermissions"`); prompt = adapter-header + canonical-body + task-context.
|
|
34
34
|
- Codex phases: shell out via `bash _shared/codex-monitored.sh` with the same compounded prompt. The wrapper closes stdin and emits a heartbeat. No MCP.
|
|
35
|
-
- Default engine: Claude. `--engine codex` routes IMPLEMENT to Codex; orchestration stays Claude. Pair-mode
|
|
35
|
+
- Default engine: Claude for PLAN / IMPLEMENT / BUILD_GATE / CLEANUP. `--engine codex` routes IMPLEMENT to Codex; orchestration stays Claude. Pair-mode is conditional-default in VERIFY/JUDGE and selects the OTHER engine for the fresh subagent when the trigger policy fires.
|
|
36
36
|
- Multi-LLM evolution: when a new model adapter ships in `_shared/adapters/`, that engine becomes selectable via `--engine <model>` without further skill changes (NORTH-STAR.md "Multi-LLM evolution direction").
|
|
37
37
|
</engine_routing>
|
|
38
38
|
|
|
@@ -59,20 +59,24 @@ Once `state.implement_passed_sha` is non-null (PHASE 2 returned and produced a d
|
|
|
59
59
|
- `--spec <path>` — switches to spec mode.
|
|
60
60
|
- `--verify-only <ref>` — switches to verify-only mode. Requires `--spec`.
|
|
61
61
|
- `--pair-verify` — force pair-mode JUDGE in PHASE 5 even when not auto-triggered.
|
|
62
|
+
- `--no-pair` — disable conditional VERIFY pair-JUDGE for this run. Record `pair_trigger.skipped_reason: "user_no_pair"` whenever a trigger would otherwise fire.
|
|
62
63
|
- `--risk-probes` — insert PHASE 1.5 cross-engine probe derivation. The OTHER engine converts visible `## Verification` bullets into bounded executable probes before IMPLEMENT; BUILD_GATE and VERIFY replay them mechanically.
|
|
64
|
+
- `--no-risk-probes` — disable automatic high-risk risk probes. Explicit `--risk-probes` wins over `--no-risk-probes`.
|
|
63
65
|
- `--bypass <phase>[,...]` — skip specific phases. Valid: `build-gate`, `cleanup`. PLAN, IMPLEMENT, VERIFY are non-bypassable.
|
|
64
66
|
- `--perf` — opt in to per-phase timing.
|
|
65
67
|
|
|
66
|
-
2. Engine pre-flight: follow `_shared/engine-preflight.md`.
|
|
68
|
+
2. Engine pre-flight: follow `_shared/engine-preflight.md`. If a required engine is unavailable, halt with a BLOCKED verdict and setup instructions instead of downgrading.
|
|
67
69
|
|
|
68
|
-
3. Initialize `.devlyn/pipeline.state.json` per `references/state-schema.md`. Set `state.run_id`, `started_at`, `engine`, `base_ref.{branch, sha}`, `rounds.{max_rounds, global: 0}`, `bypasses`, empty `phases`, empty `criteria`.
|
|
70
|
+
3. Initialize `.devlyn/pipeline.state.json` per `references/state-schema.md`. Set `state.run_id`, `started_at`, `engine`, `base_ref.{branch, sha}`, `rounds.{max_rounds, global: 0}`, `bypasses`, empty `phases`, empty `criteria`, and `risk_profile: { high_risk: false, reasons: [], risk_probes_enabled: false, pair_default_enabled: true }`.
|
|
69
71
|
|
|
70
72
|
4. **Mode-specific init**:
|
|
71
73
|
- **Free-form**: read `references/free-form-mode.md`. Run the complexity classifier deterministically (rules over keyword density / file count / spec-shape signals). Set `state.complexity ∈ {trivial, medium, large}`. Trivial: write internal mini-spec to `.devlyn/criteria.generated.md` and proceed. Medium: synthesize a minimal spec from the goal + add 1-2 context anchors from the codebase, write to `.devlyn/criteria.generated.md`, proceed. Large: log `recommend: /devlyn:ideate first` in the final report and either halt (default) or proceed with assumed defaults if `--continue-on-large` flag set.
|
|
72
74
|
- **Spec**: validate spec exists + `## Verification` block parses (run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate carrier shape). Compute `state.source.spec_sha256`. Stage `.devlyn/spec-verify.json` from the spec's verification block.
|
|
73
75
|
- **Verify-only**: skip to PHASE 5 with `state.source.spec_path` set, the supplied diff captured at `.devlyn/external-diff.patch`.
|
|
74
76
|
|
|
75
|
-
5.
|
|
77
|
+
5. Compute `state.risk_profile` from the user goal plus spec/criteria text. Mark `high_risk: true` when the work touches any of: auth/authz, permissions, security, token/session, payment/money/billing/invoice/pricing/tax/ledger, persistence/data mutation/deletion/migration, idempotency/replay/duplicate, API/webhook/raw-body/signature, allocation/scheduling/inventory/rollback/transaction, or explicit error-priority/output-shape contracts. If high-risk and `--no-risk-probes` is absent, set `risk_probes_enabled: true`; explicit `--risk-probes` also sets it true. If `--no-pair` is present, set `pair_default_enabled: false`.
|
|
78
|
+
|
|
79
|
+
6. Announce one line: `resolve starting — run <run_id> — engine <engine> — mode <mode> — complexity <complexity-or-na> — pair <conditional|disabled> — risk_probes <on|off>`.
|
|
76
80
|
|
|
77
81
|
## PHASE 1: PLAN
|
|
78
82
|
|
|
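The `state.risk_profile` computation described in step 5 of this hunk might look roughly like the sketch below. The keyword pattern is a crude, illustrative condensation of the categories listed there (it will produce false positives such as "author"), and `computeRiskProfile` plus the `riskProbes`, `noRiskProbes`, and `noPair` flag names are hypothetical; the diff does not show how the real classifier matches text.

```javascript
// Illustrative prefix matcher condensed from the high-risk categories in the
// diff. The real matching rules are not shown, so treat this list as assumed.
const HIGH_RISK = /\b(auth|permission|security|token|session|payment|billing|invoice|pricing|tax|ledger|migration|delet|idempoten|replay|duplicate|webhook|signature|inventory|rollback|transaction)/i;

// Hypothetical helper returning the risk_profile shape initialized in step 3.
function computeRiskProfile(text, flags = {}) {
  const match = text.match(HIGH_RISK);
  const highRisk = match !== null;
  return {
    high_risk: highRisk,
    reasons: highRisk ? [match[1].toLowerCase()] : [],
    // Explicit --risk-probes wins over --no-risk-probes.
    risk_probes_enabled: Boolean(flags.riskProbes) || (highRisk && !flags.noRiskProbes),
    // --no-pair disables the conditional pair default.
    pair_default_enabled: !flags.noPair,
  };
}
```

For example, `computeRiskProfile('fix the auth bug')` marks the run high-risk and enables risk probes, while adding `{ noRiskProbes: true }` leaves `high_risk` true but turns the probes off.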
@@ -90,8 +94,11 @@ After return:
|
|
|
90
94
|
|
|
91
95
|
## PHASE 1.5: RISK_PROBES
|
|
92
96
|
|
|
93
|
-
Skip unless `--risk-probes` is set
|
|
94
|
-
not a second plan and not
|
|
97
|
+
Skip unless `--risk-probes` is set OR `state.risk_profile.risk_probes_enabled`
|
|
98
|
+
is true. This phase is findings-as-executable-checks, not a second plan and not
|
|
99
|
+
debate. If this phase is required and the OTHER engine is unavailable, halt with
|
|
100
|
+
`BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` plus setup guidance;
|
|
101
|
+
do not silently continue without probes.
|
|
95
102
|
|
|
96
103
|
Engine: OTHER engine from PHASE 2's selected IMPLEMENT engine. Prompt body:
|
|
97
104
|
`references/phases/probe-derive.md`.
|
|
@@ -196,6 +203,9 @@ Two sub-phases:
|
|
|
196
203
|
|
|
197
204
|
2. **JUDGE** (fresh-context Agent): grade the diff against the spec on rubric axes (spec compliance, scope, quality, consistency). Split each Requirement into binding clauses and trace code-order counterexamples; a passing verifier proves only the case it exercises, not neighboring `once` / `regardless` / `duplicate` / auth-order / rollback invariants. Respect scope qualifiers such as `inside a warehouse`, `per resource`, `for this line`, and `after validation`; do not widen a scoped clause into a global invariant, and compose multiple ordering rules in the stated order. For stateful flows, explicitly trace failed-operation rollback and the next entity's state before hunting broader edge cases. For high-complexity specs, construct at least one interaction counterexample that combines ordering/priority with failure handling and state mutation, then execute at least one such scenario through the repo's existing CLI/API/test runner without leaving tracked files behind; one-axis examples and pure mental tracing are insufficient. Default engine = same as IMPLEMENT (solo). Pair-mode (cross-model JUDGE) is eligible only when MECHANICAL has no HIGH/CRITICAL findings; deterministic blockers already decide the verdict and route to the fix loop. Pair-mode fires when eligible and:
|
|
198
205
|
- `--pair-verify` flag set, OR
|
|
206
|
+
- `state.mode == "verify-only"`, OR
|
|
207
|
+
- `state.risk_profile.high_risk == true`, OR
|
|
208
|
+
- `.devlyn/risk-probes.jsonl` exists or `state.risk_profile.risk_probes_enabled == true`, OR
|
|
199
209
|
- spec frontmatter has `complexity: high`, OR `state.complexity` is `"high"` or `"large"`, OR
|
|
200
210
|
- MECHANICAL emits findings flagged `severity: warning` (not disqualifier — those route to fix loop directly), OR
|
|
201
211
|
- `state.verify.coverage_failed == true` (judge could not exercise a required spec axis from available evidence).
|
|
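The trigger list above can be read as a single function that builds `pair_trigger = { eligible, reasons[] }`. The sketch below is an assumption-heavy illustration: the reason labels, the `mechanical_findings` and `spec_complexity` fields, and the `pairVerify`, `noPair`, and `riskProbesFileExists` flag names are hypothetical. Only `state.mode`, `state.risk_profile`, `state.complexity`, `state.verify.coverage_failed`, and `skipped_reason: "user_no_pair"` come from the diff.

```javascript
// Hypothetical helper; reason labels are illustrative, not the package's.
function computePairTrigger(state, flags = {}) {
  const severities = (state.mechanical_findings || []).map((f) => String(f.severity).toUpperCase());
  // Deterministic HIGH/CRITICAL blockers make pair-mode ineligible.
  const eligible = !severities.some((s) => s === 'HIGH' || s === 'CRITICAL');
  const risk = state.risk_profile || {};
  const reasons = [];
  if (flags.pairVerify) reasons.push('pair_verify_flag');
  if (state.mode === 'verify-only') reasons.push('verify_only');
  if (risk.high_risk) reasons.push('high_risk');
  if (risk.risk_probes_enabled || flags.riskProbesFileExists) reasons.push('risk_probes');
  if (['high', 'large'].includes(state.complexity) || state.spec_complexity === 'high') reasons.push('complexity');
  if (severities.includes('WARNING')) reasons.push('mechanical_warning');
  if (state.verify && state.verify.coverage_failed) reasons.push('coverage_failed');
  const trigger = { eligible, reasons };
  // Record the opt-out only when a trigger would otherwise have fired.
  if (flags.noPair && reasons.length > 0) trigger.skipped_reason = 'user_no_pair';
  return trigger;
}
```

How `--pair-verify` and `--no-pair` interact when both are passed is not stated in the diff, so this sketch simply records both facts and leaves that decision to the caller.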
@@ -205,11 +215,22 @@ Before spawning JUDGE, compute `pair_trigger = { eligible, reasons[] }` and writ
|
|
|
205
215
|
The `--engine` flag never suppresses this rule. Explicit `--engine claude`
|
|
206
216
|
means "Claude is the primary judge"; it does not mean "do not run Codex as the
|
|
207
217
|
second pair judge." The only valid skip reasons after a non-empty eligible
|
|
208
|
-
trigger are deterministic MECHANICAL HIGH/CRITICAL blockers or
|
|
209
|
-
unavailability
|
|
218
|
+
trigger are deterministic MECHANICAL HIGH/CRITICAL blockers or an explicit
|
|
219
|
+
`--no-pair`. Engine unavailability is a `BLOCKED:<engine>-unavailable` verdict,
|
|
220
|
+
not a skip reason.
|
|
210
221
|
|
|
211
222
|
Pair-mode JUDGE: spawn a second Agent with the OTHER engine's adapter; the second judge is a bounded adversarial complement, not a duplicate broad audit. The primary judge owns broad coverage; pair-JUDGE targets the two highest-risk explicit `## Verification` bullets that cross state mutation, all-or-nothing rollback, ordering, idempotency, auth, or error-priority clauses. It must not read `.claude/skills`, `.codex/skills`, `CLAUDE.md`, `AGENTS.md`, or other harness docs unless the orchestrator pasted a specific excerpt into the prompt. It may use only the spec, diff, implementation files, tests, and the repo's existing CLI/API/test runner. It may execute at most two targeted probes before first output, and each probe must compare the full externally visible result (exit/stdout/stderr plus full parsed output object, including accepted/scheduled rows, rejected rows, and remaining state when present), not just a single property. For priority/stateful specs, at least one probe must include an earlier input entity that would succeed under input-order processing, a later higher-priority entity that consumes or blocks the critical resource, and a failure/blocked/rollback edge that determines a later entity's state. For cart/pricing specs where visible verification combines duplicate items, line promotions, tax, coupon, and shipping, the success-path probe must include interleaved duplicates plus taxable and non-taxable items and assert full output rows. Scope qualifiers are binding: pair-JUDGE must not reinterpret `inside a warehouse`, `per resource`, or line-scoped rules as global rules. 
When both priority ordering and rollback/blocked-interval behavior appear in the spec, this dominance-loss probe is mandatory and comes before any other probe: an earlier lower-priority entity that would succeed alone or under input-order processing must lose because a later higher-priority entity is processed first; a failed/blocked middle entity must not corrupt later state; and the assertion must cover complete accepted/scheduled and rejected output ordering. It must stop and emit JSONL immediately on the first verdict-binding finding, and must emit PASS immediately if both probes plus static scope/dependency checks pass. Both judgments merge with the rule "any HIGH/CRITICAL finding either model surfaces is verdict-binding; high-confidence MEDIUM findings are also verdict-binding when they identify a concrete behavioral regression against the spec, public contract, or existing test contract." Cross-model disagreement on advisory lower-severity findings is logged but does not change the verdict. If MECHANICAL has a HIGH/CRITICAL finding, skip the second judge and record `pair_judge: null`; the fix loop needs the deterministic finding, not duplicate review.
|
|
212
223
|
|
|
224
|
+
If pair-mode is triggered and the OTHER engine is unavailable, do not downgrade
|
|
225
|
+
or skip the required judge. Set VERIFY to `BLOCKED:<engine>-unavailable`, preserve the
|
|
226
|
+
failed availability check evidence, and print setup guidance:
|
|
227
|
+
|
|
228
|
+
- Codex: install/configure the Codex CLI, run `codex auth` or the current login
|
|
229
|
+
flow, verify `codex --version`, then rerun. Use `--no-pair` only when the user
|
|
230
|
+
intentionally accepts solo VERIFY for this run.
|
|
231
|
+
- Claude: install/configure Claude Code, run `claude --version` when available,
|
|
232
|
+
confirm the host can spawn Claude agents, then rerun.
|
|
233
|
+
|
|
213
234
|
Findings written to `.devlyn/verify.findings.jsonl`. **VERIFY agents have no code-mutation tools.** Codex pair-JUDGE is read-only: invoke `codex-monitored.sh` directly with `-c model_reasoning_effort=medium` for this bounded two-probe review, without piping to `tail`/`head`/`grep`, capture stdout/stderr by direct tool capture or file redirection, require JSONL findings on stdout, and have the orchestrator write `.devlyn/verify.pair.findings.jsonl`. If stdout is first captured as `.devlyn/codex-judge.stdout`, run `python3 .claude/skills/_shared/collect-codex-findings.py` before merge; that script is the deterministic boundary writer for `.devlyn/verify.pair.findings.jsonl`. Raw stdout remains diagnostic only: if stdout contains findings or a non-PASS summary while `.devlyn/verify.pair.findings.jsonl` is empty, `verify-merge-findings.py` blocks VERIFY for `verify.pair.emission-contract`. Do not ask Codex to `apply_patch` or edit `.devlyn`. After primary and pair findings are written, run `python3 .claude/skills/_shared/verify-merge-findings.py --write-state`. Branch only on the merged `state.phases.verify.verdict`; a HIGH/CRITICAL finding from either judge must mechanically become `NEEDS_WORK`. Never write `.devlyn/verify-merged.findings.jsonl` or `.devlyn/verify-merge.summary.json` by hand; `verify-merge-findings.py` is their only writer. State write: `phases.verify.{started_at, verdict, completed_at, duration_ms, sub_verdicts: {mechanical, judge, pair_judge?}, artifacts}`.
|
|
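The merge rule quoted in the pair-mode JUDGE paragraph can be illustrated with a small sketch. This is not the package's `verify-merge-findings.py`, whose internals the diff does not show. The finding fields (`severity`, `confidence`, `behavioral_regression`), the function names, and the use of `PASS` as the non-blocking verdict are assumptions made only for illustration.

```python
from typing import Iterable, Mapping, Optional

# Severities that bind the verdict no matter which judge reported them.
BINDING_SEVERITIES = {"HIGH", "CRITICAL"}


def is_verdict_binding(finding: Mapping[str, object]) -> bool:
    """Return True when a single finding should force NEEDS_WORK.

    Field names here are hypothetical, not the package's JSONL schema.
    """
    severity = str(finding.get("severity", "")).upper()
    if severity in BINDING_SEVERITIES:
        return True
    # High-confidence MEDIUM findings bind only when they name a concrete
    # behavioral regression against the spec or an existing contract.
    return (
        severity == "MEDIUM"
        and finding.get("confidence") == "high"
        and bool(finding.get("behavioral_regression"))
    )


def merge_verdict(
    primary: Iterable[Mapping[str, object]],
    pair: Optional[Iterable[Mapping[str, object]]] = None,
) -> str:
    """Merge primary and pair findings into one verdict string."""
    findings = list(primary) + list(pair or [])
    if any(is_verdict_binding(f) for f in findings):
        return "NEEDS_WORK"
    return "PASS"
```

Under these assumptions, advisory lower-severity disagreement between the two judges never changes the result, which matches the "logged but does not change the verdict" wording.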
214
235
|
|
|
215
236
|
Branch:
|
|
@@ -223,7 +244,7 @@ State write: `phases.final_report.started_at` at the top of this phase.
|
|
|
223
244
|
|
|
224
245
|
1. **Terminal verdict** — derive from `state.phases.{plan, implement, build_gate, cleanup, verify}.verdict` per the precedence rules in `references/state-schema.md#terminal-verdict`. Verify-only mode short-circuits to `state.phases.verify.verdict`.
|
|
225
246
|
|
|
226
|
-
2. **Render report** — sections: header (run_id, engine, mode, verdict, wall-time), per-phase summary, findings table (verify findings only — post-IMPLEMENT phases are findings-only), follow-up notes (any `--continue-on-large` assumptions, any
|
|
247
|
+
2. **Render report** — sections: header (run_id, engine, mode, verdict, wall-time), per-phase summary, pair/risk-probe status, findings table (verify findings only — post-IMPLEMENT phases are findings-only), follow-up notes (any `--continue-on-large` assumptions, any `--no-pair` / `--no-risk-probes` opt-out, and any engine setup guidance after BLOCKED).
|
|
227
248
|
|
|
228
249
|
3. State write: `phases.final_report.{verdict, completed_at, duration_ms}` BEFORE archive runs (archive prune logic skips runs whose `final_report.verdict` is null).
|
|
229
250
|
|
|
@@ -33,7 +33,7 @@ Read `_shared/runtime-principles.md`. Codex-routed phases receive the inlined ex
|
|
|
33
33
|
|
|
34
34
|
- Subtractive-first: every accretion-shaped change is visible in the commit message or a flagged finding. Net-deletion is the default; pure-addition needs a citation.
|
|
35
35
|
- Goal-locked: implement only the listed Requirements. Adjacent code that "looks fixable" is drift unless the spec or plan listed it.
|
|
36
|
-
- No-workaround: no `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass root cause.
|
|
36
|
+
- No-workaround: no `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass root cause. Required unavailable engines stop with `BLOCKED:<engine>-unavailable`; they do not downgrade.
|
|
37
37
|
- Evidence: every claim cites file:line you opened. Hallucinated APIs are excluded.
|
|
38
38
|
</runtime_principles>
|
|
39
39
|
|
|
@@ -95,11 +95,19 @@ VERIFY agent.
|
|
|
95
95
|
|
|
96
96
|
When eligible, trigger pair-mode if any of these are true:
|
|
97
97
|
- `--pair-verify` was set.
|
|
98
|
+
- `state.mode == "verify-only"`.
|
|
98
99
|
- The spec frontmatter has `complexity: high`.
|
|
99
100
|
- `state.complexity` is `"high"` or `"large"`.
|
|
101
|
+
- `state.risk_profile.high_risk == true`.
|
|
102
|
+
- `.devlyn/risk-probes.jsonl` exists or `state.risk_profile.risk_probes_enabled == true`.
|
|
100
103
|
- MECHANICAL emitted warning-level findings but no HIGH/CRITICAL blockers.
|
|
101
104
|
- `state.verify.coverage_failed == true`.
|
|
102
105
|
|
|
106
|
+
If `--no-pair` was set, do not spawn the OTHER-engine judge. Record
|
|
107
|
+
`pair_trigger: { eligible: false, reasons: [], skipped_reason: "user_no_pair" }`
|
|
108
|
+
and continue with solo VERIFY. This is an explicit user opt-out, not an engine
|
|
109
|
+
availability fallback.
|
|
110
|
+
|
|
103
111
|
Before JUDGE spawn, compute and persist:
|
|
104
112
|
|
|
105
113
|
```json
|
|
@@ -118,8 +126,17 @@ The `--engine` flag never disables this rule. Explicit `--engine claude` means
|
|
|
118
126
|
Claude is the primary judge; if pair-mode triggers, Codex is still the mandatory
|
|
119
127
|
OTHER-engine judge. Do not record "explicit --engine claude" as a skip reason.
|
|
120
128
|
The only valid skip reasons after a non-empty eligible trigger are deterministic
|
|
121
|
-
MECHANICAL HIGH/CRITICAL blockers or
|
|
122
|
-
|
|
129
|
+
MECHANICAL HIGH/CRITICAL blockers or an explicit `--no-pair`. Engine
|
|
130
|
+
unavailability is not a skip reason; it is `BLOCKED:<engine>-unavailable`.
|
|
131
|
+
|
|
132
|
+
Before invoking the OTHER-engine judge, run the shared availability pre-flight
|
|
133
|
+
for that engine. If Codex is required and unavailable, set VERIFY to
|
|
134
|
+
`BLOCKED:codex-unavailable` and tell the user to install/configure the Codex CLI,
|
|
135
|
+
run the current Codex auth/login flow, verify `codex --version`, and rerun. If
|
|
136
|
+
Claude is required and the host cannot spawn a Claude agent, set VERIFY to
|
|
137
|
+
`BLOCKED:claude-unavailable` and tell the user to install/configure Claude Code,
|
|
138
|
+
verify `claude --version` where available, and rerun. Do not convert this to a
|
|
139
|
+
solo pass, and do not synthesize pair findings.
|
|
123
140
|
|
|
124
141
|
When eligible and the orchestrator spawns a second VERIFY agent with the OTHER engine's adapter, both judgments are merged:
|
|
125
142
|
- Any HIGH/CRITICAL finding either model surfaces is verdict-binding.
|
|
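The trigger list and the `--no-pair` record in this hunk can be read as a small decision function. The sketch below is an illustration under stated assumptions: the `PairInputs` class, the reason labels, and the function names are hypothetical, and eligibility is passed in as a plain boolean because the rule that computes it lies outside the lines shown here. Only the `user_no_pair` record and the `BLOCKED:<engine>-unavailable` string are taken directly from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PairInputs:
    """Hypothetical flattened view of the flags and state the trigger list reads."""
    no_pair: bool = False
    pair_verify: bool = False
    mode: str = "spec"
    spec_complexity: Optional[str] = None
    state_complexity: Optional[str] = None
    high_risk: bool = False
    risk_probes_enabled: bool = False
    risk_probes_file_exists: bool = False
    mechanical_warnings: bool = False
    mechanical_blockers: bool = False
    coverage_failed: bool = False


def pair_reasons(inp: PairInputs) -> List[str]:
    """Collect one label per trigger that fired (labels are illustrative)."""
    checks = [
        ("pair_verify_flag", inp.pair_verify),
        ("verify_only_mode", inp.mode == "verify-only"),
        ("spec_complexity_high", inp.spec_complexity == "high"),
        ("state_complexity", inp.state_complexity in ("high", "large")),
        ("high_risk", inp.high_risk),
        ("risk_probes", inp.risk_probes_file_exists or inp.risk_probes_enabled),
        # Warnings count only when MECHANICAL has no HIGH/CRITICAL blockers.
        ("mechanical_warnings", inp.mechanical_warnings and not inp.mechanical_blockers),
        ("coverage_failed", inp.coverage_failed),
    ]
    return [label for label, hit in checks if hit]


def pair_trigger(inp: PairInputs, eligible: bool) -> Dict[str, object]:
    """Build the pair_trigger record; --no-pair is an explicit opt-out."""
    if inp.no_pair:
        return {"eligible": False, "reasons": [], "skipped_reason": "user_no_pair"}
    reasons = pair_reasons(inp) if eligible else []
    return {"eligible": eligible, "reasons": reasons, "skipped_reason": None}


def blocked_verdict(engine: str) -> str:
    """Verdict for a required but unavailable engine; never a downgrade."""
    return f"BLOCKED:{engine}-unavailable"
```

The sketch deliberately has no branch that turns an unavailable engine into a skip: that case is expected to surface through `blocked_verdict`. It also does not model the skip for deterministic MECHANICAL HIGH/CRITICAL blockers.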
@@ -12,6 +12,7 @@ Single authoritative verdict source for `/devlyn:resolve`. The orchestrator bran
|
|
|
12
12
|
"engine": "claude",
|
|
13
13
|
"mode": "spec",
|
|
14
14
|
"complexity": null,
|
|
15
|
+
"risk_profile": { "high_risk": false, "reasons": [], "risk_probes_enabled": false, "pair_default_enabled": true },
|
|
15
16
|
"base_ref": { "branch": "main", "sha": "abc123..." },
|
|
16
17
|
"rounds": { "max_rounds": 4, "global": 0 },
|
|
17
18
|
"bypasses": [],
|
|
@@ -44,13 +45,14 @@ Single authoritative verdict source for `/devlyn:resolve`. The orchestrator bran
|
|
|
44
45
|
- **version** — string. Bump major on a breaking schema change.
|
|
45
46
|
- **mode** — `"free-form" | "spec" | "verify-only"`.
|
|
46
47
|
- **complexity** — `null | "trivial" | "medium" | "large"`. Free-form mode populates this; spec/verify-only mode leaves it null.
|
|
47
|
-
- **engine** — `"claude" | "codex" | "auto"` initially;
|
|
48
|
+
- **engine** — `"claude" | "codex" | "auto"` initially; a required unavailable engine stops the run with `BLOCKED:<engine>-unavailable`.
|
|
49
|
+
- **risk_profile** — PHASE 0 classification for conditional defaults. `high_risk` records durable-risk signals from the goal/spec; `risk_probes_enabled` is true for explicit `--risk-probes` or high-risk specs unless `--no-risk-probes`; `pair_default_enabled` is false only for explicit `--no-pair`.
|
|
48
50
|
- **rounds.global** — incremented every fix-loop pass (BUILD_GATE → fix-loop OR VERIFY → fix-loop).
|
|
49
51
|
- **phases.probe_derive** — optional PHASE 1.5 entry when `--risk-probes` is enabled. Artifacts include `.devlyn/risk-probes.jsonl`. Probe failures later surface through BUILD_GATE/VERIFY as `correctness.risk-probe-failed`.
|
|
50
52
|
- **bypasses** — array of phase names from `--bypass`. Valid: `"build-gate" | "cleanup"`. PLAN, IMPLEMENT, VERIFY are non-bypassable (orchestrator rejects at parse time).
|
|
51
53
|
- **implement_passed_sha** — captured at end of PHASE 2; null until then. Activates the post-implement invariant for CLEANUP and VERIFY.
|
|
52
54
|
- **criteria** — generated from spec's `## Requirements` checklist (one per `- [ ]`). `status: pending → implemented` is the legal transition. `failed_by_finding_ids` populates when VERIFY surfaces a finding tied to a criterion.
|
|
53
|
-
- **verify.coverage_failed** — set by VERIFY's JUDGE sub-phase when a spec axis could not be exercised against the diff. Triggers pair-mode escalation when set. Pair-mode also triggers for `complexity: high` specs or `state.complexity` of `"high"`/`"large"` when MECHANICAL has no HIGH/CRITICAL blockers.
|
|
55
|
+
- **verify.coverage_failed** — set by VERIFY's JUDGE sub-phase when a spec axis could not be exercised against the diff. Triggers pair-mode escalation when set. Pair-mode also triggers for verify-only mode, high-risk specs, active risk probes, `complexity: high` specs, or `state.complexity` of `"high"`/`"large"` when MECHANICAL has no HIGH/CRITICAL blockers.
|
|
54
56
|
- **verify.pair_trigger** — VERIFY's trigger decision: `{ "eligible": boolean, "reasons": string[], "skipped_reason": string|null }`. If eligible with any reason, `pair_judge` must be non-null.
|
|
55
57
|
|
|
56
58
|
## Per-phase shape
|
|
@@ -105,7 +107,7 @@ Per-phase summary table: `phase | verdict | duration_ms | round | triggered_by |
|
|
|
105
107
|
|
|
106
108
|
Findings table (post-IMPLEMENT phases only — they are findings-only): each finding's `severity | rule_id | file:line | message | confidence`.
|
|
107
109
|
|
|
108
|
-
Follow-up notes: any `--continue-on-large` assumptions,
|
|
110
|
+
Follow-up notes: any `--continue-on-large` assumptions, pair/risk-probe opt-out state, engine setup guidance for `BLOCKED:<engine>-unavailable`, any `state.verify.coverage_failed` axes.
|
|
109
111
|
|
|
110
112
|
## Archive contract
|
|
111
113
|
|
|
@@ -0,0 +1,364 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: devlyn:design-ui
|
|
3
|
+
description: Generate N (default 5) radically distinct UI style options from a PRD as portfolio-worthy HTML/CSS samples. Pass a leading integer or `count:N` in the brief to change the count.
|
|
4
|
+
source: project
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are the **Lead Designer** with full creative authority. Create N portfolio-worthy HTML/CSS style samples that help stakeholders visualize design directions. These aren't mockups—they're design statements.
|
|
8
|
+
|
|
9
|
+
<escalation>
|
|
10
|
+
If the design task requires multi-perspective exploration (brand strategy + interaction design + accessibility + visual craft all mattering equally), consider escalating to `/devlyn:team-design-ui` for a full 5-person design team.
|
|
11
|
+
</escalation>
|
|
12
|
+
|
|
13
|
+
<context>
|
|
14
|
+
$ARGUMENTS
|
|
15
|
+
</context>
|
|
16
|
+
|
|
17
|
+
<count_resolution>
|
|
18
|
+
**Resolve N before doing any design work.**
|
|
19
|
+
|
|
20
|
+
1. If `$ARGUMENTS` begins with a positive integer (e.g. `3`, `7 dark dashboard`), that is N.
|
|
21
|
+
2. Else if `$ARGUMENTS` contains a `count:N` or `n=N` token (any case), that is N.
|
|
22
|
+
3. Otherwise N defaults to **5**.
|
|
23
|
+
|
|
24
|
+
Clamp N to the range `1..10`. Values outside that range default to 5 and are noted in the final report. After resolving, use N consistently across every phase below — concept count, file count, output table rows.
|
|
25
|
+
|
|
26
|
+
Strip the count token from the brief before using `$ARGUMENTS` as the product description.
|
|
27
|
+
</count_resolution>
|
|
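The count-resolution steps above can be sketched as a short function. This is an illustration, not code shipped in the package: the names `resolve_count` and `Resolved` are hypothetical, and the sketch follows the sentence "values outside that range default to 5" literally rather than clamping to the nearest bound. It also assumes a count token must stand alone as a whitespace-delimited word.

```python
import re
from typing import NamedTuple

DEFAULT_N = 5


class Resolved(NamedTuple):
    n: int              # count to use in every later phase
    brief: str          # arguments with the count token stripped
    out_of_range: bool  # True when the requested value was replaced by the default


# A leading positive integer followed by a word boundary ("3", "7 dark dashboard").
_LEADING = re.compile(r"^\s*([1-9]\d*)\b\s*")
# A standalone count:N or n=N token, matched case-insensitively.
_TOKEN = re.compile(r"(?<!\S)(?:count:|n=)(\d+)(?!\S)", re.IGNORECASE)


def resolve_count(arguments: str) -> Resolved:
    match = _LEADING.match(arguments)
    if match:
        raw = int(match.group(1))
        brief = arguments[match.end():]
    else:
        match = _TOKEN.search(arguments)
        if match:
            raw = int(match.group(1))
            # Remove the token and rejoin the text on either side of it.
            brief = arguments[:match.start()].rstrip() + " " + arguments[match.end():].lstrip()
        else:
            raw, brief = DEFAULT_N, arguments
    brief = brief.strip()
    if 1 <= raw <= 10:
        return Resolved(raw, brief, False)
    # Out-of-range values fall back to the default and are flagged for the report.
    return Resolved(DEFAULT_N, brief, True)
```

For example, `resolve_count("7 dark dashboard")` yields N = 7 with the brief `dark dashboard`, while `resolve_count("dark dashboard count:12")` falls back to N = 5 and sets the out-of-range flag.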
28
|
+
|
|
29
|
+
<input_handling>
|
|
30
|
+
The context above may contain:
|
|
31
|
+
|
|
32
|
+
- **PRD document**: Extract product goals, target users, and brand requirements
|
|
33
|
+
- **Product description**: Parse key features and emotional direction
|
|
34
|
+
- **Image references**: Analyze and replicate the visual style as closely as possible
|
|
35
|
+
|
|
36
|
+
If no input is provided, check for existing PRD at `docs/prd.md` or `README.md`.
|
|
37
|
+
|
|
38
|
+
### When Image References Are Provided
|
|
39
|
+
|
|
40
|
+
**Your primary goal shifts to replication, not invention.**
|
|
41
|
+
|
|
42
|
+
1. **Analyze the reference image(s) precisely:**
|
|
43
|
+
|
|
44
|
+
- Extract exact color values (use color picker precision: #RRGGBB)
|
|
45
|
+
- Identify font characteristics (serif/sans, weight, spacing, size ratios)
|
|
46
|
+
- Map layout structure (grid, spacing rhythm, alignment patterns)
|
|
47
|
+
- Note visual effects (shadows, gradients, blur, textures, border styles)
|
|
48
|
+
- Capture motion cues (if animated reference or implied motion)
|
|
49
|
+
|
|
50
|
+
2. **Generate designs that match the reference:**
|
|
51
|
+
|
|
52
|
+
- **First ~40% of N designs**: Replicate the reference style as closely as possible, adapting to the PRD's content (with N=5 this is designs 1-2; with N=3 just design 1; with N=10 designs 1-4).
|
|
53
|
+
- **Remaining designs**: Variations that preserve the reference's core aesthetic while exploring different directions within that style.
|
|
54
|
+
|
|
55
|
+
3. **Fidelity checklist for reference-based designs:**
|
|
56
|
+
- [ ] Color palette within ±5% of reference values
|
|
57
|
+
- [ ] Typography style matches (same category, similar weight/spacing)
|
|
58
|
+
- [ ] Layout proportions preserved
|
|
59
|
+
- [ ] Visual effects replicated (shadows, gradients, textures)
|
|
60
|
+
- [ ] Overall "feel" is recognizably similar to reference
|
|
61
|
+
|
|
62
|
+
### When No Image References Are Provided
|
|
63
|
+
|
|
64
|
+
Follow the standard creative process: invent tension-based concept names, map across spectrums, and generate N radically different directions.
|
|
65
|
+
</input_handling>
|
|
66
|
+
|
|
67
|
+
<instructions>
|
|
68
|
+
|
|
69
|
+
## Phase 1: Extract Design DNA
|
|
70
|
+
|
|
71
|
+
Keep this brief—creative naming drives the design, not over-analysis.
|
|
72
|
+
|
|
73
|
+
```
|
|
74
|
+
**Product:** [one sentence]
|
|
75
|
+
**User:** [who, in what context, with what goal]
|
|
76
|
+
**Must convey:** [2-3 essential feelings]
|
|
77
|
+
**Count (N):** [resolved N]
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
## Phase 2: Invent N Creative Directions
|
|
81
|
+
|
|
82
|
+
### Check Existing Styles
|
|
83
|
+
|
|
84
|
+
Read `docs/design/` directory. If `style_K_*.html` files exist, continue numbering from K+1. New styles must be visually distinct from existing ones.
|
|
85
|
+
|
|
86
|
+
### Create N Concept Names
|
|
87
|
+
|
|
88
|
+
**Before any design work, invent N evocative names.**
|
|
89
|
+
|
|
90
|
+
Name format: `[word_A]_[word_B]` where:
|
|
91
|
+
|
|
92
|
+
- Word A and Word B create **tension or contrast**
|
|
93
|
+
- The combination should feel unexpected, not obvious
|
|
94
|
+
- Each word pulls the design in a different direction
|
|
95
|
+
|
|
96
|
+
Good patterns:
|
|
97
|
+
|
|
98
|
+
- [temperature]\_[movement]: warm vs cold, static vs dynamic
|
|
99
|
+
- [texture]\_[era]: rough vs smooth, retro vs futuristic
|
|
100
|
+
- [emotion]\_[structure]: soft vs rigid, chaotic vs ordered
|
|
101
|
+
- [material]\_[concept]: organic vs digital, heavy vs light
|
|
102
|
+
|
|
103
|
+
Avoid:
|
|
104
|
+
|
|
105
|
+
- Single adjectives
|
|
106
|
+
- Obvious pairings without tension
|
|
107
|
+
- Generic descriptors
|
|
108
|
+
|
|
109
|
+
**The name drives the design.** Tension in the name forces creative problem-solving.
|
|
110
|
+
|
|
111
|
+
### Map Each Concept Across 7 Spectrums
|
|
112
|
+
|
|
113
|
+
For each concept, mark its position. **Extremes create distinctiveness—avoid the middle.**
|
|
114
|
+
|
|
115
|
+
```
|
|
116
|
+
Concept: [name]
|
|
117
|
+
|
|
118
|
+
Layout: Dense ●○○○○ Spacious
|
|
119
|
+
Color: Monochrome ○○○○● Vibrant
|
|
120
|
+
Typography: Serif ○○●○○ Display
|
|
121
|
+
Depth: Flat ○○○○● Layered
|
|
122
|
+
Energy: Calm ○●○○○ Dynamic
|
|
123
|
+
Theme: Dark ●○○○○ Light
|
|
124
|
+
Shape: Angular ○○○○● Curved
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
### Extreme Rule (Mandatory)
|
|
128
|
+
|
|
129
|
+
**Each design MUST have at least 2 extreme positions** (●○○○○ or ○○○○●).
|
|
130
|
+
|
|
131
|
+
Why: Middle positions (○○●○○) converge to "safe" averages. Extremes force distinctive choices.
|
|
132
|
+
|
|
133
|
+
### Verify Contrast
|
|
134
|
+
|
|
135
|
+
Before proceeding:
|
|
136
|
+
|
|
137
|
+
- [ ] Each design has **2+ extreme positions**
|
|
138
|
+
- [ ] No two concepts share the same position on 4+ spectrums
|
|
139
|
+
- [ ] If N ≥ 2, mix of dark and light themes across the N designs
|
|
140
|
+
- [ ] If N ≥ 2, mix of angular and curved across the N designs
|
|
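The first two checklist items lend themselves to a mechanical check. The sketch below assumes each spectrum position is encoded as an integer from 1 (leftmost dot) to 5 (rightmost dot). That encoding, the spectrum keys, and the function names are hypothetical and not part of the skill. The dark/light and angular/curved mix items are left out because the text does not say how a "mix" is measured.

```python
from itertools import combinations
from typing import List, Mapping

SPECTRUMS = ("layout", "color", "typography", "depth", "energy", "theme", "shape")
EXTREMES = {1, 5}  # leftmost and rightmost positions on a five-dot scale


def extreme_count(positions: Mapping[str, int]) -> int:
    """Number of spectrums on which a concept sits at either end."""
    return sum(1 for s in SPECTRUMS if positions[s] in EXTREMES)


def contrast_problems(concepts: Mapping[str, Mapping[str, int]]) -> List[str]:
    """List violations of the 2+ extremes rule and the 4+ shared-positions rule."""
    problems = []
    for name, positions in concepts.items():
        if extreme_count(positions) < 2:
            problems.append(f"{name}: fewer than 2 extreme positions")
    for (a, pa), (b, pb) in combinations(concepts.items(), 2):
        shared = sum(1 for s in SPECTRUMS if pa[s] == pb[s])
        if shared >= 4:
            problems.append(f"{a} / {b}: same position on {shared} spectrums")
    return problems
```

An empty list means both rules hold for the given set. With N = 1 the pairwise loop does nothing, so only the extremes rule applies.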
141
|
+
|
|
142
|
+
## Phase 3: Define Concrete Specifications
|
|
143
|
+
|
|
144
|
+
For each concept, specify exact values—no adjectives.
|
|
145
|
+
|
|
146
|
+
```
|
|
147
|
+
### [Concept Name]
|
|
148
|
+
|
|
149
|
+
**Palette:**
|
|
150
|
+
- Background: #______
|
|
151
|
+
- Surface: #______
|
|
152
|
+
- Text: #______
|
|
153
|
+
- Text muted: #______
|
|
154
|
+
- Accent: #______
|
|
155
|
+
|
|
156
|
+
**Typography:**
|
|
157
|
+
- Font: [Google Font name]
|
|
158
|
+
- Headline: [size]px / [weight] / [letter-spacing]em
|
|
159
|
+
- Body: [size]px / [weight] / [line-height]
|
|
160
|
+
|
|
161
|
+
**Spacing:**
|
|
162
|
+
- Container max-width: [value]px
|
|
163
|
+
- Section padding: [value]px
|
|
164
|
+
- Element gap: [value]px
|
|
165
|
+
- Border-radius: [value]px
|
|
166
|
+
|
|
167
|
+
**Motion:**
|
|
168
|
+
- Duration: [value]s
|
|
169
|
+
- Easing: cubic-bezier([values])
|
|
170
|
+
- Stagger delay: [value]s
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
## Phase 4: Generate HTML Files
|
|
174
|
+
|
|
175
|
+
<use_parallel_tool_calls>
|
|
176
|
+
Write all N HTML files simultaneously by making N independent Write tool calls in a single response. These files have no dependencies on each other—do not write them sequentially. Maximize parallel execution for speed.
|
|
177
|
+
</use_parallel_tool_calls>
|
|
178
|
+
|
|
179
|
+
<frontend_aesthetics>
|
|
180
|
+
You tend to converge toward generic outputs. Avoid this:
|
|
181
|
+
|
|
182
|
+
**Typography:** Never use Inter, Roboto, Arial, Helvetica, Open Sans, Space Grotesk, or system fonts. Choose distinctive typefaces. Use weight extremes (100 vs 900, not 400 vs 600). Dramatic size jumps (3x+). Tight headline letter-spacing (-0.02em to -0.05em).
|
|
183
|
+
|
|
184
|
+
**Color:** One dominant + one sharp accent. Never pure #FFFFFF or #000000 backgrounds—add subtle tint. No purple gradients.
|
|
185
|
+
|
|
186
|
+
**Motion:** Focus on high-impact moments, not scattered micro-interactions.
|
|
187
|
+
|
|
188
|
+
- **Page load**: Orchestrated staggered reveals (vary `animation-delay` by 0.05-0.1s increments)
|
|
189
|
+
- **Scroll**: Use `IntersectionObserver` for scroll-triggered fade-ins (vanilla JS, no frameworks)
|
|
190
|
+
- **Hover**: Transform + opacity + subtle shadow shifts, not just color changes
|
|
191
|
+
- **Transitions**: Custom `cubic-bezier` easings that feel physical (e.g., `cubic-bezier(0.34, 1.56, 0.64, 1)` for bounce)
|
|
192
|
+
- **Advanced**: Gradient animations via `background-position`, `backdrop-filter` transitions, CSS `@property` for animatable custom properties
|
|
193
|
+
- **Restraint**: One dramatic sequence beats many small animations. If everything moves, nothing stands out.
|
|
194
|
+
|
|
195
|
+
**Backgrounds:** Never flat solid colors. Layer gradients, add subtle noise/grain, create atmosphere.
|
|
196
|
+
|
|
197
|
+
**Layout:** Break at least one standard pattern per design. Try asymmetry, overlap, bento grids, diagonal flow, or unexpected whitespace.
|
|
198
|
+
</frontend_aesthetics>
|
|
199
|
+
|
|
200
|
+
### File Requirements
|
|
201
|
+
|
|
202
|
+
| Requirement | Details |
|
|
203
|
+
| ------------------ | ------------------------------------------------- |
|
|
204
|
+
| **Path** | `docs/design/style_{n}_{concept_name}.html` |
|
|
205
|
+
| **Content** | Realistic view matching product purpose |
|
|
206
|
+
| **Self-contained** | Inline CSS, only Google Fonts external |
|
|
207
|
+
| **Interactivity** | Hover, active, focus states + page load animation |
|
|
208
|
+
| **Responsive** | Basic mobile adaptation |
|
|
209
|
+
| **Real content** | Actual copy from PRD, no lorem ipsum |
|
|
210
|
+
|
|
211
|
+
### HTML Structure
|
|
212
|
+
|
|
213
|
+
```html
|
|
214
|
+
<!DOCTYPE html>
|
|
215
|
+
<html lang="en">
|
|
216
|
+
<head>
|
|
217
|
+
<meta charset="UTF-8" />
|
|
218
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
|
219
|
+
<title>[Product] - [Concept]</title>
|
|
220
|
+
|
|
221
|
+
<link rel="preconnect" href="https://fonts.googleapis.com" />
|
|
222
|
+
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
|
|
223
|
+
<link href="https://fonts.googleapis.com/css2?family=[Font]:[weights]&display=swap" rel="stylesheet" />
|
|
224
|
+
|
|
225
|
+
<style>
|
|
226
|
+
/* Concept: [name]
|
|
227
|
+
Spectrum: L[x] C[x] T[x] D[x] E[x] Th[x] Sh[x]
|
|
228
|
+
Extremes: [list which 2+ are extreme] */
|
|
229
|
+
|
|
230
|
+
:root {
|
|
231
|
+
--bg: #[hex];
|
|
232
|
+
--surface: #[hex];
|
|
233
|
+
--text: #[hex];
|
|
234
|
+
--text-muted: #[hex];
|
|
235
|
+
--accent: #[hex];
|
|
236
|
+
}
|
|
237
|
+
|
|
238
|
+
* {
|
|
239
|
+
margin: 0;
|
|
240
|
+
padding: 0;
|
|
241
|
+
box-sizing: border-box;
|
|
242
|
+
}
|
|
243
|
+
|
|
244
|
+
body {
|
|
245
|
+
font-family: "[Font]", sans-serif;
|
|
246
|
+
background: var(--bg);
|
|
247
|
+
color: var(--text);
|
|
248
|
+
}
|
|
249
|
+
|
|
250
|
+
/* Page load: staggered reveal */
|
|
251
|
+
.reveal {
|
|
252
|
+
opacity: 0;
|
|
253
|
+
transform: translateY(20px);
|
|
254
|
+
animation: fadeUp 0.6s cubic-bezier(0.16, 1, 0.3, 1) forwards;
|
|
255
|
+
}
|
|
256
|
+
.reveal:nth-child(1) {
|
|
257
|
+
animation-delay: 0.1s;
|
|
258
|
+
}
|
|
259
|
+
.reveal:nth-child(2) {
|
|
260
|
+
animation-delay: 0.15s;
|
|
261
|
+
}
|
|
262
|
+
.reveal:nth-child(3) {
|
|
263
|
+
animation-delay: 0.2s;
|
|
264
|
+
}
|
|
265
|
+
|
|
266
|
+
@keyframes fadeUp {
|
|
267
|
+
to {
|
|
268
|
+
opacity: 1;
|
|
269
|
+
transform: translateY(0);
|
|
270
|
+
}
|
|
271
|
+
}
|
|
272
|
+
|
|
273
|
+
/* Scroll-triggered: hidden until in view */
|
|
274
|
+
.scroll-reveal {
|
|
275
|
+
opacity: 0;
|
|
276
|
+
transform: translateY(30px);
|
|
277
|
+
transition: opacity 0.6s cubic-bezier(0.16, 1, 0.3, 1), transform 0.6s cubic-bezier(0.16, 1, 0.3, 1);
|
|
278
|
+
}
|
|
279
|
+
.scroll-reveal.visible {
|
|
280
|
+
opacity: 1;
|
|
281
|
+
transform: translateY(0);
|
|
282
|
+
}
|
|
283
|
+
|
|
284
|
+
/* Hover: physical-feeling bounce */
|
|
285
|
+
.interactive {
|
|
286
|
+
transition: transform 0.3s cubic-bezier(0.34, 1.56, 0.64, 1), box-shadow 0.3s ease;
|
|
287
|
+
}
|
|
288
|
+
.interactive:hover {
|
|
289
|
+
transform: translateY(-4px);
|
|
290
|
+
box-shadow: 0 12px 24px -8px rgba(0, 0, 0, 0.15);
|
|
291
|
+
}
|
|
292
|
+
</style>
|
|
293
|
+
</head>
|
|
294
|
+
<body>
|
|
295
|
+
<!-- Semantic HTML with real content -->
|
|
296
|
+
|
|
297
|
+
<script>
|
|
298
|
+
// Scroll-triggered animations
|
|
299
|
+
const observer = new IntersectionObserver(
|
|
300
|
+
(entries) => {
|
|
301
|
+
entries.forEach((entry) => {
|
|
302
|
+
if (entry.isIntersecting) {
|
|
303
|
+
entry.target.classList.add("visible");
|
|
304
|
+
}
|
|
305
|
+
});
|
|
306
|
+
},
|
|
307
|
+
{ threshold: 0.1 }
|
|
308
|
+
);
|
|
309
|
+
|
|
310
|
+
document.querySelectorAll(".scroll-reveal").forEach((el) => observer.observe(el));
|
|
311
|
+
</script>
|
|
312
|
+
</body>
|
|
313
|
+
</html>
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
## Phase 5: Verify Quality
|
|
317
|
+
|
|
318
|
+
### Per-Design Checklist
|
|
319
|
+
|
|
320
|
+
- [ ] Font is distinctive (not Inter/Roboto/Arial/system)
|
|
321
|
+
- [ ] Background has depth (not flat white/black)
|
|
322
|
+
- [ ] Page load animation with staggered delays
|
|
323
|
+
- [ ] Scroll-triggered reveals on below-fold content
|
|
324
|
+
- [ ] Hover states with transform + shadow (not just color)
|
|
325
|
+
- [ ] Custom easing (cubic-bezier), not default `ease` or `linear`
|
|
326
|
+
- [ ] CSS custom properties for colors
|
|
327
|
+
- [ ] Layout breaks at least one standard pattern
|
|
328
|
+
|
|
329
|
+
### Cross-Design Contrast
|
|
330
|
+
|
|
331
|
+
Each pair of designs must have 5+ obvious visual differences. If not, revise. (Skipped automatically when N=1.)
|
|
332
|
+
|
|
333
|
+
## Phase 6: Save & Report
|
|
334
|
+
|
|
335
|
+
Create `docs/design/` directory if needed. Save all N HTML files.
|
|
336
|
+
|
|
337
|
+
</instructions>
|
|
338
|
+
|
|
339
|
+
<output_format>
|
|
340
|
+
|
|
341
|
+
```
|
|
342
|
+
## Generated Styles (N = {N})
|
|
343
|
+
|
|
344
|
+
| # | Name | Spectrum (L/C/T/D/E/Th/Sh) | Extremes | Palette | Font |
|
|
345
|
+
|---|------|---------------------------|----------|---------|------|
|
|
346
|
+
| {k} | {name} | [x][x][x][x][x][x][x] | {which 2+} | #___, #___, #___ | {font} |
|
|
347
|
+
|
|
348
|
+
### Files
|
|
349
|
+
- docs/design/style_{k}_{name}.html
|
|
350
|
+
- ...
|
|
351
|
+
|
|
352
|
+
### Rationale
|
|
353
|
+
1. **{name}**: [1 sentence connecting to product requirements]
|
|
354
|
+
2. ...
|
|
355
|
+
```
|
|
356
|
+
|
|
357
|
+
</output_format>
|
|
358
|
+
|
|
359
|
+
Make bold choices. Each design should be portfolio-worthy—something you'd proudly present.
|
|
360
|
+
|
|
361
|
+
<next_step>
|
|
362
|
+
After the user picks a style, suggest:
|
|
363
|
+
→ Run `/devlyn:design-system [style-number]` to extract design tokens from the chosen style into a reusable design system reference.
|
|
364
|
+
</next_step>
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "devlyn-cli",
|
|
3
|
-
"version": "2.
|
|
3
|
+
"version": "2.3.0",
|
|
4
4
|
"description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration",
|
|
5
5
|
"homepage": "https://github.com/fysoul17/devlyn-cli#readme",
|
|
6
6
|
"bin": {
|
package/scripts/lint-skills.sh
CHANGED
|
@@ -283,10 +283,10 @@ else
|
|
|
283
283
|
fi
|
|
284
284
|
|
|
285
285
|
# ---------------------------------------------------------------------------
|
|
286
|
-
# 9. Engine
|
|
286
|
+
# 9. Engine availability fails closed; stale silent-downgrade wording is forbidden.
|
|
287
287
|
# ---------------------------------------------------------------------------
|
|
288
|
-
section "Check 9:
|
|
289
|
-
offenders=$(grep -
|
|
288
|
+
section "Check 9: Engine availability fails closed"
|
|
289
|
+
offenders=$(grep -RInE 'codex-ping failed|codex-ping fail|engine downgraded: codex-unavailable|silently downgrades|silently downgrade|silently switch to Claude|Codex CLI availability downgrade' \
|
|
290
290
|
config/skills CLAUDE.md README.md bin/ 2>/dev/null \
|
|
291
291
|
| grep -v 'roadmap-archival-workspace/' \
|
|
292
292
|
| grep -v 'devlyn:auto-resolve-workspace/' \
|
|
@@ -294,7 +294,7 @@ offenders=$(grep -RIln 'codex-ping failed\|codex-ping fail' \
|
|
|
294
294
|
| grep -v 'preflight-workspace/' \
|
|
295
295
|
|| true)
|
|
296
296
|
if [ -z "$offenders" ]; then
|
|
297
|
-
ok "
|
|
297
|
+
ok "engine availability fail-closed wording canonical"
|
|
298
298
|
else
|
|
299
299
|
while IFS= read -r f; do bad "$f"; done <<< "$offenders"
|
|
300
300
|
fi
|