devlyn-cli 1.14.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (148)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +112 -119
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +129 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/_shared/archive_run.py +130 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -481
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/optional-skills/devlyn:reap/SKILL.md +105 -0
  117. package/optional-skills/devlyn:reap/scripts/reap.sh +129 -0
  118. package/optional-skills/devlyn:reap/scripts/scan.sh +116 -0
  119. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  120. package/package.json +16 -2
  121. package/scripts/lint-skills.sh +431 -0
  122. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -602
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -116
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -204
  125. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  126. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  127. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  128. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  129. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  130. package/config/skills/devlyn:clean/SKILL.md +0 -285
  131. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  132. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  133. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  134. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  135. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  136. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  137. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  138. package/config/skills/devlyn:preflight/SKILL.md +0 -370
  139. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  140. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -90
  141. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  142. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  143. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  144. package/config/skills/devlyn:review/SKILL.md +0 -161
  145. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  146. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  147. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  148. package/config/skills/workflow-routing/SKILL.md +0 -73
@@ -1,519 +1,137 @@
  ---
  name: devlyn:ideate
- description: Transforms unstructured ideas into implementation-ready planning documents through structured brainstorming, research, and a built-in self-skeptical rubric pass. Produces a three-layer document architecture (Vision, Roadmap index, auto-resolve-ready specs) to eliminate context pollution in the implementation pipeline. Default `--engine auto` routes the CHALLENGE rubric pass to OpenAI Codex (GPT-5.4) as a cross-model critic for a GAN dynamic. Use when the user wants to brainstorm, plan a new project or feature set, create a vision and roadmap, or structure scattered ideas into an actionable plan. Triggers on "let's brainstorm", "let's plan", "ideate", "I have an idea for", "help me think through", "let's explore", new project planning, feature discovery, roadmap creation, or when the user is throwing ideas that need structuring.
+ description: Extract a verifiable spec from a user's idea by driving the conversation with focused questions. Output is a single-feature `spec.md` + `spec.expected.json` that `/devlyn:resolve --spec` consumes directly. Use when the user has an idea but not a spec, or wants AI to elicit the missing engineering context. Modes: default (single spec, AI drives Q&A), `--quick` (assume-and-confirm from a one-line goal), `--from-spec <path>` (normalize external spec), `--project` (plan.md index + N specs). Optional in the pipeline: `/devlyn:resolve` works standalone via free-form mode for users who skip ideate.
  ---
 
- # Ideation to Implementation Bridge
+ Spec-elicitation surface for users who have ideas but not engineering specifications. AI drives the conversation with focused questions until a structurally valid, verifiable spec exists. The output is consumed directly by `/devlyn:resolve --spec`.
 
- Turn unstructured thinking into auto-resolve-ready documents. The output is a precision-engineered context pipeline — each document layer serves a specific role so that implementation agents receive exactly the context they need, nothing more.
+ <elicit_config>
+ $ARGUMENTS
+ </elicit_config>
 
- <hard_boundary>
- This skill is a PLANNING tool, not an IMPLEMENTATION tool. Your output is documents (VISION.md, ROADMAP.md, item specs), never code changes.
+ <orchestrator_context>
+ This skill is OPTIONAL. `/devlyn:resolve` is standalone-capable: free-form mode handles trivial/medium tasks without a spec, and `--spec` mode accepts handwritten specs from any source. Use ideate when the user wants AI to do the elicitation work.
+ </orchestrator_context>
 
- When the user describes a bug, improvement, or feature request through ideate, they want it CAPTURED in the roadmap, not FIXED in the codebase. Even if the fix seems trivial and obvious, resist the urge to implement it. The user chose `/devlyn:ideate` over `/devlyn:resolve` for a reason — they want planning, not coding.
+ <elicitation_contract>
+ The user does not know context engineering. They will under-specify and over-assume. AI's job is to ask focused, specific questions that surface the missing engineering decisions.
 
- Concretely:
- - Do NOT read source code to find and fix issues
- - Do NOT edit application files (.tsx, .ts, .py, .js, etc.)
- - DO create or update roadmap documents (VISION.md, ROADMAP.md, item specs)
- - DO explore and research the problem space to write better specs
- - If you catch yourself about to open a source file to make a code change, stop — that's a signal you've left ideation mode
- </hard_boundary>
+ 1. Ask one or two questions per turn, not more. Multi-question lists overwhelm and produce shallow answers.
+ 2. Questions are concrete and decision-grade: what is the input, what is the expected output, what command verifies success, what files are out of scope.
+ 3. Do not ask about design preferences the user clearly does not have. Infer the simplest reasonable default and confirm it in one line.
+ 4. Stop when the spec passes structural lint AND the user explicitly confirms, or when 8 turns have elapsed, whichever comes first. Eight turns is a hard ceiling — beyond that, the spec is either ready or the task is too large for ideate.
+ 5. The output is the spec, not a transcript. Do not include the conversation in the saved files.
+ </elicitation_contract>
 
- ## Arguments
+ <harness_principles>
+ Read `_shared/runtime-principles.md` (Subtractive-first / Goal-locked / No-workaround / Evidence). The principles bind the spec content as well as your conversation. A spec that says "for future flexibility" is a Subtractive-first violation. A spec that asks for `try { ... } catch { return null }` is a No-workaround violation. AI flags these in elicitation, not after `/devlyn:resolve` has built them.
+ </harness_principles>
 
- Parse these from the user's invocation message:
+ <engine_routing>
+ Default engine: Claude. The per-engine adapter from `_shared/adapters/<model>.md` is prepended to the elicitation prompt so the model honors its own official prompt-engineering guidance during the Q&A.
+ </engine_routing>
 
- - `--engine MODE` (auto) — controls which model handles each ideation phase. Modes:
- - `auto` (default): Claude handles FRAME/EXPLORE/CONVERGE/DOCUMENT (ambiguous intent, writing quality), Codex runs the CHALLENGE rubric pass as critic (GAN dynamic). Requires Codex MCP server.
- - `codex`: Codex handles FRAME/EXPLORE/CONVERGE/DOCUMENT, Claude runs CHALLENGE (role reversal — builder and critic are always different models).
- - `claude`: all phases use Claude. No Codex calls.
+ <modes>
+ Four modes, selected by flag:
 
- **Engine pre-flight** (runs unless `--engine claude` was explicitly passed):
- - The default engine is `auto`. If the user did not pass `--engine`, the engine is `auto` not `claude`.
- - Call `mcp__codex-cli__ping` to verify the Codex MCP server is available. If ping fails, warn the user and offer: [1] Continue with `--engine claude`, [2] Abort.
- - Read `references/challenge-rubric.md` up front.
+ 1. **Default** (no flag): single-spec elicitation. AI asks questions in-conversation until lint passes. Output: `<spec-dir>/<id>-<slug>/spec.md` + `<spec-dir>/<id>-<slug>/spec.expected.json`. Default spec dir: `docs/specs/` (configurable via `--spec-dir <path>`).
+ 2. **`--quick`**: one-line goal; AI synthesizes a spec with an explicit assumptions block and asks the user to confirm or correct it in a single turn. Use when the user wants speed over thoroughness.
+ 3. **`--from-spec <path>`**: an external spec already exists. AI lints it for the canonical structure, normalizes section names, generates `spec.expected.json` if it is missing, fixes minor schema issues, and stops. Does NOT reshape Requirements / Out-of-Scope content; structural changes only.
+ 4. **`--project`**: multi-feature project. AI elicits a project description, decomposes it into 3-7 feature specs, and writes `<spec-dir>/plan.md` (the index) plus one `<spec-dir>/<id>/spec.md` + `<spec-dir>/<id>/spec.expected.json` per feature. See `references/project-mode.md`.
 
- **Consolidated flag**: `--with-codex` was rolled into the smarter `--engine auto` default. If the user passes it, inform them once and proceed with `--engine auto`: "Note: `--with-codex` was consolidated into `--engine auto` (default), which routes the CHALLENGE rubric pass to Codex automatically. No flag needed. Continuing with `--engine auto`."
+ `--spec-dir <path>` overrides the default output directory. `--engine <model>` selects the adapter.
+ </modes>
 
- ## Output Architecture
+ <spec_kind_escape_hatch>
+ The spec carries `spec.kind ∈ {feature, spike, prototype}` in its frontmatter. The kind changes downstream behavior:
 
- The skill produces a three-layer progressive disclosure structure:
+ - **feature**: production-quality implementation expected. `/devlyn:resolve --spec` runs the full pipeline (PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY).
+ - **spike**: exploratory work; the deliverable is learning, evidence, or a disposable demo. `/devlyn:resolve --spec` proceeds, but VERIFY's quality bar is relaxed for code the spike declares throwaway.
+ - **prototype**: between feature and spike; production-shape but not production-grade. CLEANUP runs; VERIFY's quality bar is stricter than for spike and looser than for feature.
 
- ```
- docs/
- ├── VISION.md # Layer 1: Strategic WHY (~50-100 lines)
- │ # Orientation only. auto-resolve never reads this.
-
- ├── ROADMAP.md # Layer 2: Tactical index (what, in what order)
- │ # Thin table linking to detail specs. auto-resolve never reads this.
-
- └── roadmap/ # Layer 3: Auto-resolve-ready specs
- ├── phase-1/
- │ ├── _overview.md # Phase-level context and goals
- │ ├── 1.1-xxx.md # Self-contained spec → direct auto-resolve input
- │ └── 1.2-yyy.md
- ├── phase-2/
- │ └── ...
- ├── decisions/ # Architecture decision records (why we chose X over Y)
- │ └── 001-xxx.md
- └── backlog/ # Ideas acknowledged but not yet phased
- └── ...
- ```
+ The user picks the kind during elicitation; it defaults to `feature` when not specified. `--quick` infers it from the goal text (verbs like "explore", "investigate", "spike" → spike; "implement", "ship", "add" → feature).
+ </spec_kind_escape_hatch>
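A minimal Python sketch of the `--quick` kind inference described above. `infer_kind`, `SPIKE_VERBS`, and `FEATURE_VERBS` are hypothetical names; the verb sets hold only the examples the spec lists, the first signal verb in the goal wins (an assumption), and because no verbs are given for `prototype`, the sketch never infers it.

```python
import re

# Hypothetical verb sets: only the example verbs named in the skill text.
SPIKE_VERBS = {"explore", "investigate", "spike"}
FEATURE_VERBS = {"implement", "ship", "add"}


def infer_kind(goal: str) -> str:
    """Guess spec.kind from a one-line --quick goal; falls back to 'feature'."""
    for word in re.findall(r"[a-z]+", goal.lower()):
        # Assumption: the first signal verb in the goal decides the kind.
        if word in SPIKE_VERBS:
            return "spike"
        if word in FEATURE_VERBS:
            return "feature"
    # No signal verb found: use the documented default.
    return "feature"
```

For example, `infer_kind("Explore caching options")` returns `"spike"`, while a goal with none of the listed verbs falls back to `feature`.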
 
- **Core principle**: auto-resolve reads ONE spec file. That file is self-contained. Vision and Roadmap exist for humans and for this ideation skill — not for the implementation pipeline.
+ ## PHASE 0: PARSE + ROUTE
 
- Read `references/templates/` for the exact format of each document type when generating output.
+ 1. Parse flags from `<elicit_config>`:
+ - `--quick`
+ - `--from-spec <path>`
+ - `--project`
+ - `--spec-dir <path>` (default `docs/specs/`)
+ - `--engine MODE` (default `claude`)
+ - `--spec-id <id>` — optional explicit id; auto-generated when absent.
 
- ## Conversation Protocol
+ 2. Engine pre-flight: follow `_shared/engine-preflight.md`.
 
- Ideation is a dialogue, not a monologue. The user will come in with scattered ideas, incomplete thoughts, and implicit assumptions. Your job is to draw out accurate, complete information through back-and-forth conversation — not to fill gaps with guesses.
+ 3. Mode dispatch:
+ - default → PHASE 1.
+ - `--quick` → PHASE 1Q (single-turn assume-and-confirm).
+ - `--from-spec` → PHASE 1F (lint + normalize external).
+ - `--project` → PHASE 1P (project decomposition).
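The flag list and mode dispatch above could be approximated with `argparse`, as in the sketch below. `parse_and_route` is a hypothetical helper, not part of devlyn-cli; treating leftover words as the goal text, and letting `--quick` take precedence when several mode flags appear, are assumptions the skill text does not settle.

```python
import argparse
import shlex


def parse_and_route(arguments: str) -> tuple[str, argparse.Namespace]:
    """Parse hypothetical <elicit_config> text and pick the PHASE to run."""
    parser = argparse.ArgumentParser(prog="devlyn:ideate", add_help=False, allow_abbrev=False)
    parser.add_argument("--quick", action="store_true")
    parser.add_argument("--from-spec", dest="from_spec", metavar="PATH")
    parser.add_argument("--project", action="store_true")
    parser.add_argument("--spec-dir", dest="spec_dir", default="docs/specs/")
    parser.add_argument("--engine", default="claude")
    parser.add_argument("--spec-id", dest="spec_id")  # None means auto-generate
    # Words that are not known flags are kept, in order, as the user's goal.
    opts, rest = parser.parse_known_args(shlex.split(arguments))
    opts.goal = " ".join(rest)

    # Assumption: if several mode flags appear, the first match below wins.
    if opts.quick:
        phase = "1Q"
    elif opts.from_spec:
        phase = "1F"
    elif opts.project:
        phase = "1P"
    else:
        phase = "1"
    return phase, opts
```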
 
- <conversation_rhythm>
- **Ask, don't assume.** When information is missing or ambiguous, ask targeted questions. Generating a spec with wrong assumptions is worse than asking one more question. The user wants accuracy (documents they can trust and hand to auto-resolve), not speed.
+ ## PHASE 1: ELICITATION (default mode)
 
- **2-3 questions at a time, max.** Don't dump a 10-item questionnaire. Ask the most important unknowns, get answers, then ask the next batch based on what you learned. Each exchange should build on the last.
+ Prompt body: `references/elicitation.md`, with the engine adapter prepended.
 
- **Summarize after each exchange.** After the user shares information, reflect it back concisely: "So what I'm hearing is [X]. Is that right, or am I missing something?" This catches misunderstandings early — much cheaper than rewriting specs later.
+ The elicitation agent:
+ 1. Reads the user's initial goal from `<elicit_config>`.
+ 2. Identifies the missing engineering decisions (input shape, output shape, success command, scope boundary, constraints).
+ 3. Asks 1-2 focused questions per turn until each blank is filled or the user accepts an inferred default.
+ 4. Maintains a running draft spec in `.devlyn/ideate-draft.md` (run-scoped, gitignored).
+ 5. Stops when the structural lint passes AND the user confirms, or when 8 turns have elapsed.
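The stop rule for the elicitation loop can be pictured as a bounded loop. This is a sketch only: `run_elicitation` and its three callbacks are hypothetical stand-ins for the agent asking one turn of questions, running the structural lint, and reading the user's confirmation.

```python
from typing import Callable

MAX_TURNS = 8  # hard ceiling from the elicitation contract


def run_elicitation(
    ask_turn: Callable[[int, str], str],
    lint_ok: Callable[[str], bool],
    user_confirmed: Callable[[str], bool],
    draft: str = "",
) -> tuple[str, int, bool]:
    """Run Q&A turns until lint passes AND the user confirms, or 8 turns elapse.

    Returns (final draft, turns used, True if the stop was a clean confirm).
    """
    for turn in range(1, MAX_TURNS + 1):
        # One turn: ask 1-2 questions and fold the answers into the draft.
        draft = ask_turn(turn, draft)
        if lint_ok(draft) and user_confirmed(draft):
            return draft, turn, True
    # Ceiling reached: the spec is either ready or too large for ideate.
    return draft, MAX_TURNS, False
```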
 
- **Confirm before phase transitions.** Before moving from FRAME → EXPLORE, or EXPLORE → CONVERGE, summarize the current state and ask if the user is ready to move on. Never silently transition.
+ Structural lint (inline check, no script needed):
+ - Frontmatter has `id`, `title`, `kind`, `status: planned`.
+ - `## Context` non-empty (≥ 1 sentence).
+ - `## Requirements` has ≥ 1 `- [ ]` bullet.
+ - `## Out of Scope` present (may list "none" if nothing is out of scope).
+ - `## Verification` has either ≥ 1 named command OR an explicit "all Requirements are pure-design" note.
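One way to express the structural lint checks above as code is sketched below. `lint_spec` and `SAMPLE` are hypothetical (the skill says no script is needed); the sketch assumes simple `key: value` frontmatter and treats any backticked span in `## Verification` as a named command, which is a loose stand-in for whatever the agent actually accepts.

```python
import re


def lint_spec(text: str) -> list[str]:
    """Return structural problems in a spec.md; an empty list means lint passes."""
    problems: list[str] = []

    # Frontmatter: "key: value" lines between the leading --- fences.
    meta: dict[str, str] = {}
    fm = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if fm:
        for line in fm.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    for key in ("id", "title", "kind"):
        if not meta.get(key):
            problems.append(f"frontmatter missing {key}")
    if meta.get("status") != "planned":
        problems.append("frontmatter status must be planned")

    # Map each "## Heading" to its body, up to the next "## " heading.
    sections = {
        m.group(1).strip(): m.group(2).strip()
        for m in re.finditer(r"^## (.+?)\n(.*?)(?=^## |\Z)", text, re.DOTALL | re.MULTILINE)
    }
    if not sections.get("Context"):
        problems.append("Context is empty")
    if "- [ ]" not in sections.get("Requirements", ""):
        problems.append("Requirements needs at least one - [ ] bullet")
    if "Out of Scope" not in sections:
        problems.append("Out of Scope section missing")
    verification = sections.get("Verification", "")
    # Assumption: a "named command" is any backticked span.
    if not re.search(r"`[^`]+`", verification) and "pure-design" not in verification:
        problems.append("Verification needs a command or a pure-design note")
    return problems


SAMPLE = """---
id: 001
title: Add --verbose flag
kind: feature
status: planned
---

## Context
The CLI prints nothing on success.

## Requirements
- [ ] `--verbose` prints each processed file.

## Out of Scope
- none

## Verification
- `node --test tests/cli.test.js`
"""
```

`lint_spec(SAMPLE)` returns an empty list; changing `status: planned` to any other value produces a frontmatter problem.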
 
- **Capture energy, then clarify.** When the user is excited and throwing out rapid-fire ideas, don't interrupt the flow with structural questions. Let them finish, capture everything, then come back with targeted clarifications: "Love these ideas. A few things I want to make sure I get right: [questions]."
+ After lint passes:
+ 1. Write `<spec-dir>/<id>-<slug>/spec.md` (the spec).
+ 2. Generate `<spec-dir>/<id>-<slug>/spec.expected.json` from the spec's `## Verification` block + any `forbidden_patterns` / `required_files` / `forbidden_files` / `max_deps_added` the conversation surfaced.
+ 3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If it exits with code 2, fix the carrier and re-run.
+ 4. Print: `spec ready — /devlyn:resolve --spec <spec-path>`.
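The output location in step 1 can be sketched as a small path helper. `slugify` and `spec_paths` are hypothetical, and the slug rule (lowercase, runs of non-alphanumerics collapsed to `-`) is an assumption; the skill does not say how `<slug>` is derived.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    # Hypothetical slug rule: lowercase, runs of non-alphanumerics become "-".
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def spec_paths(spec_dir: str, spec_id: str, title: str) -> tuple[Path, Path]:
    """Return (spec.md, spec.expected.json) under <spec-dir>/<id>-<slug>/."""
    folder = Path(spec_dir) / f"{spec_id}-{slugify(title)}"
    return folder / "spec.md", folder / "spec.expected.json"
```

With these assumptions, `spec_paths("docs/specs/", "001", "Add --verbose flag!")` yields `docs/specs/001-add-verbose-flag/spec.md` and its sibling `spec.expected.json`.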
 
- **Track what's confirmed vs. assumed.** Mentally separate facts the user stated from inferences you made. When generating documents, only write confirmed facts. Flag assumptions explicitly: "I'm assuming [X] based on [Y] — correct?"
- </conversation_rhythm>
+ ## PHASE 1Q: QUICK MODE
 
- ## Detecting the Mode
+ Single-turn assume-and-confirm. Prompt body: see `references/elicitation.md` § "Quick mode".
 
- Before starting, identify what the user needs:
+ 1. AI synthesizes a spec from the one-line goal.
+ 2. AI surfaces an explicit "Assumptions made" section listing every inferred decision.
+ 3. The user responds with "go" / "fix X" / "no, different".
+ 4. On "go": write the spec and spec.expected.json, lint, and announce.
+ 5. On "fix X": apply the correction, re-show, and ask again. After a maximum of 3 correction rounds, escalate to default mode.
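The correction budget in quick mode reads naturally as a small loop over the user's replies. In this sketch `quick_mode` is hypothetical, `apply_fix` stands in for the agent applying a correction, and routing "no, different" straight to default mode is an assumption, since the skill does not say what that reply triggers.

```python
from typing import Callable, Iterable

MAX_CORRECTIONS = 3  # correction rounds allowed before escalating to default mode


def quick_mode(draft, replies: Iterable[str], apply_fix: Callable):
    """Walk the user's replies; return ("write" | "escalate" | "pending", draft)."""
    corrections = 0
    for reply in replies:
        if reply == "go":
            return "write", draft
        if reply.startswith("fix ") and corrections < MAX_CORRECTIONS:
            draft = apply_fix(draft, reply[len("fix "):])
            corrections += 1
            continue  # re-show the corrected draft and ask again
        # Assumption: "no, different" or a 4th correction both fall back to default mode.
        return "escalate", draft
    return "pending", draft  # still waiting on the user
```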
 
- | Signal | Mode | Approach |
- |--------|------|----------|
- | No existing docs, new project or idea | **Greenfield** | Full flow: Frame → Explore → Converge → Document |
- | Existing docs, user adds new ideas | **Expand** | Lighter Frame, focused Explore on new area, merge into existing phases |
- | Existing docs, user describes a single bug/improvement/idea | **Quick Add** | Read existing roadmap, create one item spec, add row to ROADMAP.md |
- | One specific feature needs deep thought | **Deep-dive** | Intensive Explore on one topic, output 1-3 specs |
- | User shares links/resources to process | **Research-first** | Lead with Explore (research synthesis), then standard flow |
- | Existing roadmap, user wants to reprioritize | **Replan** | Read existing docs, focus on Converge, update documents |
+ ## PHASE 1F: FROM-SPEC MODE
 
- **Tie-breaks when a request matches two modes:** choose the narrowest mode that satisfies the request. Quick Add wins over Expand when the user has one concrete item in mind. Research-first wins over Deep-dive when links or resources are the primary input. Deep-dive wins over Expand when one topic specifically needs depth. Replan is chosen only when priority or order changes are explicit. If two modes still look equally plausible after applying these rules, present the top two to the user and let them pick — silently choosing one wastes the session if the other was right.
+ Prompt body: `references/from-spec-mode.md`.
 
- Announce the detected mode and confirm before proceeding.
+ 1. Read the external spec at `<path>`.
+ 2. Lint structure (same checks as default mode).
+ 3. Identify missing pieces (no frontmatter, missing sections, malformed Verification block).
+ 4. Apply structural fixes only — do NOT reshape Requirements / Out-of-Scope content. The user's substantive intent is preserved.
+ 5. Generate `spec.expected.json` if absent (best-effort from the `## Verification` block).
+ 6. Write the normalized spec to `<spec-dir>/<id>-<slug>/`. The original at `<path>` is left untouched unless the user passes `--in-place`.
+ 7. Lint pass → announce. Lint fail → surface the unfixable issue and exit non-zero.
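Steps 6 and 7 boil down to choosing a write target and an exit status. Both helpers below are hypothetical: the `spec.md` file name is borrowed from default mode, reading `--in-place` as "overwrite the file at `<path>`" is an assumption, and exit code 1 is just one valid non-zero value.

```python
import sys
from pathlib import Path


def normalized_target(original: str, spec_dir: str, spec_id: str, slug: str, in_place: bool) -> Path:
    """Where the normalized spec is written in --from-spec mode."""
    if in_place:
        return Path(original)  # assumption: --in-place rewrites the source file
    return Path(spec_dir) / f"{spec_id}-{slug}" / "spec.md"


def finish(problems: list[str]) -> int:
    """Exit status for step 7: 0 on lint pass, non-zero when issues remain."""
    if problems:
        for problem in problems:
            print(f"unfixable: {problem}", file=sys.stderr)
        return 1  # the skill only says "non-zero"; 1 is an assumption
    return 0
```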
 
- ### Expand Mode Detail
+ ## PHASE 1P: PROJECT MODE
 
- Expand is the most common mode after initial setup — the user already has Vision + Roadmap and wants to add new capabilities. This mode requires careful integration with existing documents.
+ Prompt body: `references/project-mode.md`.
 
- **On entry:**
- 1. Read `docs/VISION.md`, `docs/ROADMAP.md`, and existing phase `_overview.md` files to understand the established context
- 2. Scan existing item specs to understand what's built and what's planned
- 3. **Run the Archive Pass** (see Context Archiving below) before summarizing. Summarizing a stale roadmap to the user wastes the exchange — they'll see "Phase 1 has 4 items" when really all 4 are already Done and the phase should be collapsed.
- 4. Summarize your understanding: "Here's what exists: [phases, item count, current status]. You want to add [new area]. Does this expand an existing phase or warrant a new one?"
+ 1. AI elicits a project description (longer Q&A; multi-feature scope warrants more turns).
+ 2. AI decomposes the project into 3-7 feature specs. Each feature is independently shippable; cross-feature dependencies surface explicitly in the spec frontmatter `depends_on:` field.
+ 3. AI writes `<spec-dir>/plan.md`, an index file with: project name, decomposition rationale, list of feature specs with id + title + dependency, and suggested implementation order.
+ 4. AI writes one `<spec-dir>/<id>/spec.md` + `<spec-dir>/<id>/spec.expected.json` per feature, each lint-validated.
+ 5. Announce: `project ready: N specs at <spec-dir>/. Start with /devlyn:resolve --spec <first-spec-path>`.
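The suggested implementation order in `plan.md` follows from the `depends_on:` fields, and a topological sort is one natural way to derive it. The sketch uses Kahn's algorithm with ties broken by spec id; `suggested_order` and the tie-break rule are assumptions, not something the skill prescribes.

```python
import heapq


def suggested_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Topologically sort feature specs by depends_on (Kahn's algorithm, ties by id)."""
    if not 3 <= len(depends_on) <= 7:
        raise ValueError(f"project mode expects 3-7 specs, got {len(depends_on)}")
    indegree = {sid: 0 for sid in depends_on}
    dependents: dict[str, list[str]] = {sid: [] for sid in depends_on}
    for sid, deps in depends_on.items():
        for dep in deps:
            if dep not in depends_on:
                raise ValueError(f"{sid} depends on unknown spec {dep}")
            indegree[sid] += 1
            dependents[dep].append(sid)

    # Min-heap of specs whose dependencies are all scheduled.
    ready = [sid for sid, n in indegree.items() if n == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        sid = heapq.heappop(ready)
        order.append(sid)
        for nxt in dependents[sid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(depends_on):
        raise ValueError("depends_on contains a cycle")
    return order
```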
112
128
 
113
- **During ideation:**
114
- - FRAME is lighter — the vision already exists, focus on framing the NEW area only
115
- - EXPLORE focuses specifically on the new capability and how it integrates with existing features
116
- - CONVERGE must consider dependencies on existing items, not just new ones
129
+ `/devlyn:resolve` consumes one spec at a time; the user works through `plan.md`'s suggested order. Multi-feature parallel runs are Mission 2 work.
117
130
 
118
- **During document generation:**
119
- - Don't overwrite existing VISION.md unless the user explicitly wants to update it
120
- - Continue numbering from existing IDs (if Phase 2 exists with 2.1-2.4, new items start at 2.5 or create Phase 3)
121
- - Add new rows to ROADMAP.md, don't regenerate the whole table
122
- - New item specs can reference existing items in their Dependencies section
123
- - If new items change the meaning of existing items, flag this: "Adding [X] may affect the scope of existing item [Y]. Should we update [Y]'s spec?"
131
+ ## State management
124
132
 
125
- In Replan mode: read existing docs first, **run the Archive Pass** (see Context Archiving below) before any reprioritization — you can't sensibly reorder work that's already finished — then focus on the Converge phase to reprioritize what remains. The Archive Pass also surfaces Backlog items whose Revisit date has passed, which are natural candidates when replanning.
133
+ ideate is conversational, not pipeline-staged. State lives in:
134
+ - `.devlyn/ideate-draft.md` — current draft spec during elicitation (run-scoped, gitignored).
135
+ - `<spec-dir>/<id>-<slug>/` — final output (committed to repo by user choice).
126
136
 
127
- ### Quick Add Mode Detail
128
-
129
- Quick Add is for when the user has a single concrete idea, bug report, or improvement — they don't need a full ideation session, just a new entry in the roadmap. This is the most common trigger for misuse: the request looks like a simple fix, so the temptation is to implement it. Don't. Capture it.
-
- **On entry:**
- 1. Read `docs/ROADMAP.md` and relevant phase `_overview.md` files
- 2. **Run the Archive Pass first** (see Context Archiving below). Do this *before* you figure out where the new item goes — a stale roadmap will mislead phase selection and ID numbering. If the pass moves a phase out of the active section, the new item's natural home may change.
- 3. Identify the best-fit phase for the new item (or suggest a new phase if it doesn't fit)
- 4. Determine the next available item ID (e.g., if phase 2 has 2.1-2.4, the new item is 2.5)
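The ID rule in the removed Quick Add steps above (continue the phase's numbering, so a phase holding 2.1-2.4 gets 2.5 next) can be sketched as a small helper. This is a hypothetical illustration rather than devlyn-cli code, and it assumes item IDs are plain `<phase>.<n>` strings taken from the roadmap rows.

```python
def next_item_id(existing_ids: list[str], phase: int) -> str:
    """Return the next free '<phase>.<n>' ID (hypothetical helper, not devlyn-cli code)."""
    prefix = f"{phase}."
    used = [
        int(item_id[len(prefix):])
        for item_id in existing_ids
        # Only IDs of this phase count; '12.1' does not belong to phase 1.
        if item_id.startswith(prefix) and item_id[len(prefix):].isdigit()
    ]
    # Compare numerically so that 2.10 follows 2.9 rather than 2.1.
    return f"{phase}.{max(used, default=0) + 1}"


print(next_item_id(["1.1", "2.1", "2.2", "2.3", "2.4"], 2))  # 2.5
print(next_item_id(["1.1", "1.2"], 3))  # 3.1 (first item of a new phase)
```

Comparing the numeric suffix instead of the raw string matters once a phase grows past nine items.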
-
- **Workflow (minimal — no full Frame/Explore/Converge):**
- 1. Confirm the idea with the user: "I'll add this as [item title] in Phase [N]. That sound right?"
- 2. Ask 1-2 clarifying questions if the requirement is unclear (skip if the user gave enough detail)
- 3. Generate the item spec following `references/templates/item-spec.md`
- 4. Add a row to `docs/ROADMAP.md`
- 5. Output confirmation: the file path and a suggested auto-resolve command
-
- **Example output:**
- ```
- Added: docs/roadmap/phase-2/2.5-back-to-review-button.md
-
- To implement:
- /devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-2/2.5-back-to-review-button.md"
- ```
-
- ### Context Archiving
-
- ROADMAP.md is the tactical index. Every row that isn't Planned / In Progress / Blocked is noise — it dilutes attention, pads the file past its 150-line target, and makes future ideation sessions read stale context they'll have to mentally filter out. Done work should move; it shouldn't disappear.
-
- The goal state: the active section of ROADMAP.md only lists work that still needs doing. Everything completed lives under a collapsed `## Completed` block at the bottom. Item spec files themselves stay in place — they remain on disk at `docs/roadmap/phase-N/{id}.md` because other specs may reference them — only the index row moves.
-
- #### The Archive Pass
-
- Run this at the start of every Quick Add, Expand, and Replan session (each mode's "On entry" checklist tells you when). It's deterministic and cheap. Never skip it to "save time" — the time you save by skipping it is immediately spent by you and the user arguing about a roadmap that shows phantom work.
-
- 1. **Read `docs/ROADMAP.md`.** For each phase, look at the Status column of every row.
- 2. **For each phase where every row is `Done`:** archive the whole phase.
- - Cut the phase's `## Phase N: …` heading and table out of the active section.
- - If no `## Completed` section exists yet at the bottom of the file, create one.
- - Add a `<details>` block inside Completed for this phase (see format below). Use the latest completion date you can find in the item spec frontmatter (`completed:` field, or today's date if absent). Item count is the row count.
- 3. **For individual `Done` rows inside an otherwise-active phase:** leave them in place. A row only moves when its whole phase is finished. (Mixed-state phases stay mixed so the user can see recent wins alongside open work.)
- 4. **Scan the Backlog table.** Surface any row whose "Revisit" date has passed — mention it to the user as a replan candidate. Don't auto-promote it; that's a conversation.
- 5. **Scan `docs/roadmap/decisions/`.** Flag any decision whose status is `accepted` but whose reasoning is visibly contradicted by the work that's now Done. Don't silently edit decisions; raise them as open questions.
- 6. **Report what you did.** Before moving on to the mode's main work, tell the user in one short paragraph: "Archived Phase 1 (3 items). Active roadmap is now Phase 2 (2 items). Proceeding with [Quick Add / Expand / Replan]." Skip the report only if nothing changed.
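Steps 2 to 4 of the removed Archive Pass are deterministic enough to sketch in code. The snippet below is a hypothetical model, not part of devlyn-cli: it assumes the ROADMAP tables have already been parsed into `Row` objects and the Backlog Revisit dates into `datetime.date` values, and it leaves out markdown parsing and the decision scan in step 5.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Row:
    item_id: str
    status: str  # a ROADMAP Status value, e.g. "In Progress" or "Done"


def phases_to_archive(phases: dict[str, list[Row]]) -> list[str]:
    """Step 2: a phase is archived only when every one of its rows is Done.

    Mixed phases are not returned, so their Done rows stay in place (step 3).
    An empty phase is assumed to stay active.
    """
    return [
        name
        for name, rows in phases.items()
        if rows and all(row.status == "Done" for row in rows)
    ]


def overdue_backlog(revisit_dates: dict[str, date], today: date) -> list[str]:
    """Step 4: list backlog items whose Revisit date is before today.

    These are only reported to the user as replan candidates, never promoted.
    """
    return sorted(item for item, revisit in revisit_dates.items() if revisit < today)


roadmap = {
    "Phase 1: Foundation": [Row("1.1", "Done"), Row("1.2", "Done")],
    "Phase 2: Growth": [Row("2.1", "Done"), Row("2.2", "In Progress")],
}
print(phases_to_archive(roadmap))  # ['Phase 1: Foundation']
```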
-
- **Completed block format** (place at the bottom of ROADMAP.md, below Decisions):
-
- ```markdown
- ## Completed
- <details>
- <summary>Phase 1: Foundation (completed 2026-04-15, 4 items)</summary>
-
- | # | Feature | Completed |
- |---|---------|-----------|
- | 1.1 | Auth & Onboarding | 2026-02-10 |
- | 1.2 | Order Management | 2026-03-05 |
- | 1.3 | Inventory Tracking | 2026-03-28 |
- | 1.4 | Customer Directory | 2026-04-15 |
- </details>
- ```
-
- If the `## Completed` section already exists and you're archiving an additional phase, append a new `<details>` block — don't rewrite existing ones.
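The Completed block format above can be produced by a helper like the one below. It is a hypothetical sketch: the tuple shape of `rows`, the `unknown` placeholder for a row without a `completed:` date, and the singular "item" wording are assumptions. The summary date takes the latest known completion date and falls back to today only when none is recorded.

```python
from datetime import date
from typing import Optional


def render_completed_block(
    phase_title: str,
    rows: list[tuple[str, str, Optional[date]]],
    today: date,
) -> str:
    """Render one <details> block to append under '## Completed'.

    rows are (item_id, feature, completed) tuples taken from the phase table
    and the item specs' `completed:` frontmatter (None when it is absent).
    """
    known = [done for _, _, done in rows if done is not None]
    latest = max(known) if known else today
    noun = "item" if len(rows) == 1 else "items"
    lines = [
        "<details>",
        f"<summary>{phase_title} (completed {latest.isoformat()}, {len(rows)} {noun})</summary>",
        "",
        "| # | Feature | Completed |",
        "|---|---------|-----------|",
    ]
    for item_id, feature, done in rows:
        lines.append(f"| {item_id} | {feature} | {done.isoformat() if done else 'unknown'} |")
    lines.append("</details>")
    return "\n".join(lines)


print(
    render_completed_block(
        "Phase 1: Foundation",
        [
            ("1.1", "Auth & Onboarding", date(2026, 2, 10)),
            ("1.2", "Order Management", None),
        ],
        today=date(2026, 5, 1),
    )
)
```

Because each call returns a single `<details>` block, appending it leaves earlier blocks untouched.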
-
- #### Outdated decisions
-
- When a decision becomes wrong because the world changed under it:
- - Don't delete it — set its `status:` to `superseded` in the decision file's frontmatter and add a one-line pointer to the replacement decision record.
- - This preserves the reasoning history for future reference, which matters more than a tidy decisions table.
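Superseding a decision rather than deleting it is a small, reversible text edit. A minimal sketch, assuming each decision file starts with a `---` delimited frontmatter block that contains a `status:` line; the helper name and the wording of the pointer line are hypothetical.

```python
def supersede_decision(text: str, replacement: str) -> str:
    """Mark a decision record superseded while keeping its reasoning.

    Hypothetical helper: expects frontmatter between '---' lines with a
    'status:' field, e.g. 'status: accepted'.
    """
    lines = text.split("\n")
    if lines[0] != "---":
        raise ValueError("expected frontmatter starting with '---'")
    end = lines.index("---", 1)  # closing frontmatter delimiter (ValueError if missing)
    for i in range(1, end):
        if lines[i].startswith("status:"):
            lines[i] = "status: superseded"
            break
    else:
        raise ValueError("frontmatter has no status: field")
    # One-line pointer to the replacement record, placed right after the
    # frontmatter; the original body (the reasoning) is left untouched.
    lines[end + 1:end + 1] = [f"Superseded by {replacement}.", ""]
    return "\n".join(lines)


record = "---\nstatus: accepted\n---\n# Use SQLite\nReasoning kept for history."
print(supersede_decision(record, "007-use-postgres.md"))
```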
-
- ## Phase 1: FRAME
-
- <phase_goal>Establish problem space boundaries before exploring solutions.</phase_goal>
-
- The biggest risk in ideation is premature convergence — jumping to solutions before understanding the problem. This phase prevents that.
-
- Establish through conversation:
- 1. **Job-to-be-Done**: In one sentence — "When [situation], [user] wants to [motivation], so they can [outcome]." Capture this before anything else. If the user cannot produce it, that is itself the finding — pause and explore the situation until the sentence exists. A bare problem statement without this frame is a state description, not a job, and downstream specs built from it will describe system behavior instead of customer progress.
- 2. **Constraints**: What can't change? (tech stack, timeline, existing commitments)
- 3. **Success criteria**: How will we know this worked? (outcomes, not outputs)
- 4. **Anti-goals**: What are we explicitly NOT trying to do?
-
- Adapt to what the user has already shared — if they came in with a clear vision, this might be a quick confirmation. If the idea is fuzzy, spend more time here. Ask conversationally, not as a rigid questionnaire.
-
- Don't write documents yet. The output of this phase is a shared mental model between you and the user.
-
- ## Phase 2: EXPLORE
-
- <phase_goal>Systematically expand the possibility space before narrowing it.</phase_goal>
-
- This is the creative core — the phase that should take the most conversational turns. The user chose to ideate with AI because they want perspectives, research, and creative expansion they wouldn't get alone.
-
- <use_parallel_tool_calls>
- EXPLORE often needs several independent lookups: web search for prior art, doc fetches, repo greps for existing patterns. When tool calls have no dependencies on each other, issue them in parallel in the same response. Spawn subagents in parallel when fanning out across distinct research topics. Only chain calls that depend on a previous call's output. Pace research across turns rather than front-loading every lookup before the user has framed direction — EXPLORE is dialogue-driven, parallel is just for the lookups inside any single turn.
- </use_parallel_tool_calls>
-
- <research_protocol>
- When relevant, actively research before and during brainstorming:
- - **Existing solutions**: What's already out there? (web search, documentation)
- - **Technical feasibility**: Can this be built within the constraints? Where are the hard parts?
- - **Patterns and prior art**: How have similar problems been solved?
- - **Market/user context**: Who else needs this? What do they currently use?
- - **Evidence discipline**: Treat prior art as source-backed only when verified by a fetched link or documentation the user can open. If a pattern is inferred from memory or analogy, label it `[UNVERIFIED]` inline and do not present it as market fact. The CHALLENGE rubric's NO GUESSWORK axis fires hard on unlabeled claims that look authoritative but are actually recall.
-
- Not every ideation needs all of these — a personal side project doesn't need market research. Judge what's relevant and use subagents for parallel research when multiple topics need investigation.
- </research_protocol>
-
- <multi_perspective>
- For each major idea, consider it from at least three angles:
- - **User**: Is this actually useful? Does it solve a real pain?
- - **Technical**: Is this buildable? Where are the complexity hotspots?
- - **Strategic**: Does this align with the vision? Does it create leverage for future work?
-
- Add perspectives as relevant:
- - **Risk**: What could go wrong? What are the dependencies?
- - **Business**: Does this create value? Is the effort justified?
- - **Accessibility**: Is this inclusive? Who gets left out?
- </multi_perspective>
-
- <creative_expansion>
- When the conversation needs energy or the user feels stuck:
- - **"What if..."** — Remove a constraint and see what emerges
- - **Analogy transfer** — "How does [adjacent domain] solve this?"
- - **Inversion** — "What's the worst version? Now invert it."
- - **10x thinking** — "If this needed 10x users, what changes?"
- - **Minimum viable magic** — "What's the smallest thing that would feel magical?"
-
- Use these naturally in conversation, not as a mechanical checklist.
- </creative_expansion>
-
- As ideas accumulate, periodically synthesize:
- ```
- Here's where we are:
- - Core ideas: [list]
- - Open questions: [list]
- - Tensions to resolve: [list]
- - Research still needed: [list]
- ```
-
- This prevents circular conversations and gives the user a clear sense of progress.
-
- ## Phase 3: CONVERGE
-
- <phase_goal>Transform exploration into decisions.</phase_goal>
-
- When the user signals readiness or exploration winds down naturally, shift to convergence.
-
- ### Theme Clustering
- Group related ideas into coherent themes:
- ```
- Theme A: [name]
- - Ideas: 1, 3, 7
- - Value: [why this matters]
- - Risk: [what could go wrong]
- ```
-
- ### Prioritization
- Use value x feasibility as the primary framework:
- - **High value + High feasibility** → Phase 1 (build first)
- - **High value + Low feasibility** → Phase 2+ (build after foundation exists)
- - **Low value + High feasibility** → Backlog (if time permits)
- - **Low value + Low feasibility** → Cut
-
- Present as a recommendation — the user makes the final call on ordering.
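The removed value x feasibility grid amounts to a four-way lookup. The sketch below is illustrative only: it assumes each idea has already been judged high or low on both axes during the conversation, the idea names are made up, and the result is a recommendation for the user to confirm rather than a final placement.

```python
def recommend_placement(high_value: bool, high_feasibility: bool) -> str:
    """Map a value x feasibility judgement to the suggested placement."""
    if high_value and high_feasibility:
        return "Phase 1"  # build first
    if high_value:
        return "Phase 2+"  # build after the foundation exists
    if high_feasibility:
        return "Backlog"  # if time permits
    return "Cut"


# Hypothetical ideas, judged as (high value?, high feasibility?).
ideas = {"saved filters": (True, True), "offline sync": (True, False)}
for name, (value, feasibility) in ideas.items():
    print(f"{name}: {recommend_placement(value, feasibility)}")
```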
-
- ### Sequencing
- Within each phase:
- - **Dependencies**: What must exist before what?
- - **Risk ordering**: Build uncertain things first (fail fast)
- - **Value delivery**: Each phase should deliver usable value, not just infrastructure
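The dependency rule ("what must exist before what?") is a topological sort. A minimal sketch using the standard library's `graphlib` (Python 3.9+), assuming dependencies are recorded as a mapping from item ID to the set of IDs it depends on; risk ordering and value delivery remain judgment calls and are not modelled.

```python
from graphlib import TopologicalSorter


def id_key(item_id: str) -> tuple[int, ...]:
    """Sort '1.10' after '1.2' by comparing the numeric parts."""
    return tuple(int(part) for part in item_id.split("."))


def recommended_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order item IDs so every item follows the items it depends on.

    Items that become ready at the same time are sorted by ID for stable
    output. prepare() raises graphlib.CycleError on circular dependencies.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    order: list[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=id_key)
        order.extend(ready)
        sorter.done(*ready)
    return order


print(recommended_order({"1.1": set(), "1.2": {"1.1"}, "1.3": {"1.1"}}))  # ['1.1', '1.2', '1.3']
```

Items that only appear as dependencies are still included in the output, so an omitted "no dependencies" entry does not drop an item.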
-
- ### Architecture Decisions
- Surface decisions that affect multiple items — technology choices, data model, integration approaches, UX patterns. For each: **What** was decided, **Why** (tradeoffs), and **What alternatives** were considered. These become decision records.
-
- ### Internal draft — do not show the user yet
-
- At this point you have an internal convergence draft: themes, phases, items, decisions. **Do not present it to the user yet.** Phase 3.5 CHALLENGE runs next, and the user will see exactly one summary — the post-challenge plan, with visibility into what CHALLENGE changed. Showing the pre-challenge draft first and then changing it after challenge creates a two-round confirmation loop that burns the user's trust.
-
- ## Phase 3.5: CHALLENGE
-
- <phase_goal>Apply a strict 5-axis rubric to the internal convergence draft, then present one post-challenge summary to the user for confirmation. Always runs.</phase_goal>
-
- <thinking_effort>
- Engage maximum thinking effort here — both the solo rubric pass and, if enabled, the Codex pass. Use extended thinking ("ultrathink") when reading each item, applying each axis, and producing revisions. The default Claude failure mode in self-review is nodding along to the draft you just produced; shallow thinking here is the exact pattern this phase exists to prevent.
-
- Before finalizing the rubric pass, verify your findings against the rubric one more time: every flagged item should have a specific Quote, a failing axis, and a concrete revision — not a vague concern.
- </thinking_effort>
-
- ### The rubric — single source of truth
-
- Read `references/challenge-rubric.md` before starting. That file is the only definition of the 5 axes, the finding format, the hard rule about respecting explicit user intent, and the good-vs-bad examples. Both the solo pass and the Codex pass use the same rubric; do not re-derive it inline.
-
- ### Solo pass (always runs)
-
- Apply the rubric to the internal convergence draft. Produce findings in the format specified in `challenge-rubric.md` (Severity / Quote / Axis / Why / Fix).
-
- For Quick Add with one new item, one solo pass is enough. For a full greenfield or expand plan, run the rubric once, revise, and run it again on the revision. If a third pass would be needed, the plan has structural problems that belong in the user-facing summary as open questions — surface them rather than iterating further.
-
- ### Codex critic pass (engine-routed)
-
- **If `--engine auto`** (default): Codex runs the CHALLENGE rubric pass automatically as critic.
-
- Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, `workingDirectory: <project root>`. The `prompt` parameter is built from the packaged plan + the inlined rubric + the appended Codex instructions. Codex has no filesystem access to this project, so everything it needs travels in the prompt.
-
- **Step 1 — Package the post-solo plan.** Build the prompt with these sections in this order:
-
- ```
- ## Problem framing (from FRAME phase)
- [problem statement, constraints, success criteria, anti-goals]
-
- ## Confirmed facts vs assumptions
- Confirmed by user: [list each fact the user explicitly confirmed]
- Assumptions (not yet confirmed): [list each assumption the agent made]
-
- ## Plan (post-solo-CHALLENGE)
- Vision: [one sentence]
- Phase 1 ([theme]): [items with one-line descriptions and dependencies]
- Phase 2 ([theme]): ...
- Architecture decisions: [each with what / why / alternatives considered]
- Deferred to backlog: [items + reason]
-
- ## Findings from the solo rubric pass
- [list each with: severity, axis, quote, why, fix, whether applied]
-
- ## Rubric
- [INLINE the full text of references/challenge-rubric.md here verbatim — Codex needs the rubric definition in the prompt itself]
-
- ## Your job
- You are applying an independent rubric pass to the PLANNING document above. This is a roadmap, not code — judge the shape of the plan, not implementation details. The user explicitly asked to be challenged because soft-pedaled plans waste their time.
-
- You are running AFTER a solo pass by Claude. Catch what the solo pass missed; do not just agree with what it already caught. For each existing solo finding, reply either "confirmed" (with one-line agreement) or "I would frame this differently" (with a reason). Then add your own findings that the solo pass missed.
-
- Use the finding format from the rubric above: Severity / Quote / Axis / Why / Fix. The Quote field is load-bearing — anchor each finding to a specific line from the plan.
-
- Respect explicit user intent. If the user confirmed something in the "Confirmed facts" section, the rubric does not override it silently. Raise the conflict as a note and let the orchestrator surface it to the user.
-
- End with a verdict: PASS / PASS WITH MINOR FIXES / FAIL — REVISION REQUIRED, plus a one-line explanation.
- ```
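Because Codex sees nothing outside the prompt, Step 1 amounts to joining the sections above in a fixed order with the rubric inlined verbatim. A hypothetical sketch: the section titles come from the template, while the function name and the empty-section check are assumptions.

```python
SECTION_ORDER = (
    "Problem framing (from FRAME phase)",
    "Confirmed facts vs assumptions",
    "Plan (post-solo-CHALLENGE)",
    "Findings from the solo rubric pass",
    "Rubric",
    "Your job",
)


def build_codex_prompt(sections: dict[str, str], rubric_text: str) -> str:
    """Assemble the critic prompt in the fixed order, with the rubric inlined."""
    # The rubric file's full text travels inside the prompt itself.
    bodies = dict(sections, Rubric=rubric_text)
    missing = [name for name in SECTION_ORDER if not bodies.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt sections: {missing}")
    return "\n\n".join(f"## {name}\n{bodies[name].strip()}" for name in SECTION_ORDER)
```

Failing loudly on an empty section is a design choice: a critic prompt without, say, the confirmed facts cannot apply the "respect explicit user intent" rule.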
-
- **Step 2 — Reconcile.** Merge the two finding lists:
- - Same finding from both → keep the more specific wording, mark "confirmed by both"
- - Codex-only → prefix `[codex]` in internal notes so the user-facing summary can attribute correctly
- - Solo-only → keep as-is
- - Conflicts (solo says X, Codex says not-X) → record both, do not silently pick one; if material, surface as an open question in the user-facing summary
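The first three reconcile rules above can be sketched as a merge keyed on each finding's Quote and Axis. This is a hypothetical model: treating a shared (quote, axis) pair as "the same finding" and a longer Fix as "more specific wording" are simplifying assumptions. Conflicts (solo says X, Codex says not-X) are deliberately not modelled, since detecting them requires reading the findings; they stay with the orchestrator.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    severity: str  # e.g. "CRITICAL" or "HIGH"
    quote: str  # the plan line the finding is anchored to
    axis: str
    why: str
    fix: str
    note: str = ""  # internal attribution used when writing the summary


def reconcile(solo: list[Finding], codex: list[Finding]) -> list[Finding]:
    """Merge solo and Codex findings (conflict handling not modelled)."""
    codex_by_key = {(f.quote, f.axis): f for f in codex}
    merged: list[Finding] = []
    for s in solo:
        c = codex_by_key.pop((s.quote, s.axis), None)
        if c is None:
            merged.append(s)  # solo-only: keep as-is
        else:
            # Both raised it: keep the more specific wording (longer Fix as a proxy).
            richer = c if len(c.fix) > len(s.fix) else s
            merged.append(replace(richer, note="confirmed by both"))
    # Whatever is left was raised by Codex alone.
    merged.extend(replace(c, note="[codex]") for c in codex_by_key.values())
    return merged
```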
-
- If Codex raised CRITICAL or HIGH findings the solo pass missed, apply the fixes to the plan before presenting the user-facing summary — unless fixing would change something the user explicitly confirmed, in which case follow the rubric's "Respect explicit user intent" rule.
-
- **Do not loop.** One Codex pass is enough. If the result is still FAIL after reconciliation, the plan has structural problems that belong in the user-facing summary as open questions rather than further iteration.
-
- **If `--engine codex`**: Role reversal — Codex built the plan, so Claude runs the solo CHALLENGE pass and that is the only pass. Do not also run Codex on CHALLENGE — builder and critic should always be different models. Skip this section.
-
- **If `--engine claude`**: No Codex calls. The solo pass is the only pass.
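The three `--engine` cases above reduce to a small routing table: who built the plan and which CHALLENGE passes run. A hypothetical sketch with illustrative labels; it does not model the second solo round that a full greenfield or expand plan may get.

```python
def challenge_routing(engine: str = "auto") -> dict[str, object]:
    """Who builds the plan and which CHALLENGE passes run for an --engine value."""
    if engine not in ("auto", "codex", "claude"):
        raise ValueError(f"unknown engine: {engine!r}")
    builder = "codex" if engine == "codex" else "claude"
    passes = ["claude solo pass"]  # the solo pass always runs
    if engine == "auto":
        # One independent Codex critic pass after the solo pass; never looped.
        passes.append("codex critic pass")
    # With --engine codex, Codex built the plan, so it is not also the critic.
    return {"builder": builder, "passes": passes}


for engine in ("auto", "codex", "claude"):
    print(engine, challenge_routing(engine))
```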
-
- ### Respect explicit user intent
-
- The rubric is a quality lens, not an override. If a finding conflicts with something the user explicitly and clearly asked for, follow the "Hard rule" section in `challenge-rubric.md`: record the finding, **do not silently rewrite the plan**, and surface it as an open question in the summary below. The user makes the call.
-
- ### User-facing summary (the first and only time the user sees the plan)
-
- After the rubric pass(es), present the post-challenge plan to the user for confirmation. This is the first time the user sees the converged plan — by design, so they see a rubric-checked result rather than a draft that immediately gets revised.
-
- Format:
- ```
- Vision: [one sentence]
- Phases: [N] phases, [M] total items
- Phase 1 ([theme]): [items with brief descriptions]
- Phase 2 ([theme]): [items]
- Key decisions: [list]
- Deferred: [items with reasons]
-
- ## CHALLENGE results
-
- Solo pass: [N findings, M applied]
- Codex pass: [N findings, M applied] ← only on --engine auto
-
- Changes applied during CHALLENGE:
- - [item]: [what changed and which axis triggered it]
-
- Open questions for you (rubric flagged something you explicitly asked for):
- - [item]: rubric says [finding]; you asked for [original]; here is the tradeoff — proceed as-is, or adopt the alternative?
- ```
-
- Get explicit confirmation before proceeding to DOCUMENT.
-
- ### Quick Add mode
-
- For single-item additions, run one solo rubric pass on just the new item. Even then do not skip — single-item additions are exactly where overengineering and workarounds slip in unnoticed, because the lack of surrounding context makes a bad item look self-contained and harmless.
-
- ## Engine Routing for FRAME / EXPLORE / CONVERGE / DOCUMENT
-
- **If `--engine codex`**: Phases 1-3 and Phase 4 are delegated to Codex. For each phase, call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "workspace-write"`, and the phase instructions + user context as the prompt. Use `sessionId` to maintain conversational context across phases (note: sandbox/fullAuto only apply on the first call). Claude remains the orchestrator — it reads Codex's output, manages the conversation with the user (confirmation prompts, clarifying questions), and routes findings between phases.
-
- **If `--engine auto` or `--engine claude`**: All planning phases use Claude directly (current behavior). Claude's ambiguous intent handling and writing quality benchmarks favor it for planning tasks.
-
- ## Phase 4: DOCUMENT
-
- <phase_goal>Generate the three-layer document set.</phase_goal>
-
- Read the templates before generating:
- - `references/templates/vision.md` — VISION.md format
- - `references/templates/roadmap.md` — ROADMAP.md index format
- - `references/templates/item-spec.md` — Auto-resolve-ready spec format
- - `references/templates/decision.md` — Architecture decision record format
-
- ### Generation Order
- 1. `docs/VISION.md` — from Phase 1 framing + Phase 3 decisions
- 2. `docs/roadmap/decisions/` — one file per architecture decision
- 3. `docs/roadmap/phase-N/_overview.md` — phase-level context
- 4. `docs/roadmap/phase-N/{id}-{name}.md` — one per roadmap item
- 5. `docs/ROADMAP.md` — index linking to everything above
-
- ### Item Spec Quality
-
- Each Layer 3 spec is the direct input to auto-resolve. Its quality determines implementation quality.
-
- <spec_quality_criteria>
- **Requirements section** — becomes auto-resolve's done-criteria:
- - Testable: a test can assert it OR a human can verify in under 30 seconds
- - Specific: not "handles errors well" but "returns 400 with `{error: 'missing_field', field: 'email'}`"
- - Scoped: tied to this item only, not aspirational
-
- **Context section** — 2-3 sentences maximum. Just enough for auto-resolve to understand WHY without loading the full vision.
-
- **Out of Scope** — explicitly states what this item does NOT do. This is what prevents auto-resolve from over-building, which is one of its most common failure modes.
-
- **Constraints** — technical constraints with reasoning. Auto-resolve respects constraints significantly better when it understands the motivation behind them.
- </spec_quality_criteria>
-
- If an item is too vague to write specific requirements, it needs more exploration (revisit Phase 2 for that item) or should be split into smaller items.
-
- ### Handling Existing Documents
- In **Expand** and **Replan** modes:
- - Read existing documents first
- - Merge new items into the existing phase structure
- - Preserve existing items (don't overwrite or reorder without confirmation)
- - Update ROADMAP.md index to include new entries
-
- ### Output Summary
- After generating all documents:
- ```
- Documents created:
- - docs/VISION.md
- - docs/ROADMAP.md
- - docs/roadmap/phase-1/_overview.md
- - docs/roadmap/phase-1/1.1-xxx.md
- - docs/roadmap/phase-1/1.2-yyy.md
- - docs/roadmap/decisions/001-xxx.md
- [total: N files]
- ```
-
- ## Phase 5: BRIDGE
-
- <phase_goal>Connect documents to the implementation pipeline.</phase_goal>
-
- After document generation, output the implementation guide:
-
- ```
- ## Implementation
-
- To implement each item:
- /devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-1/1.1-xxx.md — read the spec file for requirements, constraints, and scope boundaries"
-
- Recommended order (respecting dependencies):
- 1. 1.1 [name] — no dependencies
- 2. 1.2 [name] — depends on 1.1
- 3. 1.3 [name] — depends on 1.1
- ...
-
- After completing each item:
- 1. Update status in the item spec frontmatter (status: done)
- 2. Update ROADMAP.md status column
- ```
-
- The auto-resolve prompt explicitly tells the build agent to read the spec file — this ensures done-criteria are adopted from the spec rather than generated from scratch, preserving the ideation context through to implementation.
-
- ## Quality Checklist
-
- Before finalizing, verify:
- - [ ] Every roadmap item has a linked spec file
- - [ ] Every spec has testable requirements (not vague statements)
- - [ ] Every spec has an Out of Scope section
- - [ ] Every spec's Context section is 3 sentences or fewer
- - [ ] ROADMAP.md is an index only — no inline specifications
- - [ ] No spec requires reading VISION.md to be understood (self-contained)
- - [ ] Dependencies between items are documented in both specs
- - [ ] Architecture decisions include reasoning and alternatives considered
- - [ ] CHALLENGE ran against `references/challenge-rubric.md` (solo, plus Codex critic on `--engine auto`); no item still fails any axis at CRITICAL or HIGH severity
- - [ ] User saw the post-challenge plan as the first and only confirmation prompt — no pre-challenge draft was shown first
- - [ ] Any rubric finding that conflicted with explicit user intent was surfaced as an open question, not silently applied
- - [ ] Every requirement is traceable to a confirmed fact, a verified source, or an explicitly labeled assumption — no unmarked guesses slipped into the specs
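Some checklist items can be checked mechanically before finalizing. A rough sketch, assuming item specs use `## Requirements`, `## Context`, and `## Out of Scope` headings (the real headings come from `references/templates/item-spec.md` and may differ). Sentences are counted naively by terminal punctuation, so abbreviations such as "e.g." over-count; the remaining items (linked specs, traceability, CHALLENGE results) still need a human or agent read.

```python
import re
from typing import Optional


def get_section(spec: str, name: str) -> Optional[str]:
    """Body of a '## <name>' section, or None if the heading is missing."""
    match = re.search(
        rf"^## {re.escape(name)}[ \t]*\n(.*?)(?=^## |\Z)",
        spec,
        flags=re.MULTILINE | re.DOTALL,
    )
    return match.group(1).strip() if match else None


def check_spec(spec: str) -> list[str]:
    """Return checklist problems found in one item spec (empty list if none)."""
    problems = []
    for required in ("Requirements", "Context", "Out of Scope"):
        if get_section(spec, required) is None:
            problems.append(f"missing section: {required}")
    context = get_section(spec, "Context")
    if context:
        # Crude count: '.', '!' or '?' followed by whitespace or end of text.
        sentences = re.findall(r"[.!?](?:\s|$)", context)
        if len(sentences) > 3:
            problems.append(f"Context has {len(sentences)} sentences (max 3)")
    return problems


spec = "## Context\nA. B. C. D.\n\n## Requirements\n- returns 400\n"
print(check_spec(spec))  # ['missing section: Out of Scope', 'Context has 4 sentences (max 3)']
```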
-
- ## Language
-
- Generate all documents in the language the user communicates in. If the user mixes languages, match their primary language for prose and keep technical terms in English.
+ No `pipeline.state.json` here; that's resolve's surface.