devlyn-cli 1.15.0 → 2.0.0
- package/AGENTS.md +104 -0
- package/CLAUDE.md +135 -21
- package/README.md +43 -125
- package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
- package/benchmark/auto-resolve/README.md +114 -0
- package/benchmark/auto-resolve/RUBRIC.md +162 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
- package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
- package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
- package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
- package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
- package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
- package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
- package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
- package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
- package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
- package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
- package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
- package/benchmark/auto-resolve/scripts/judge.sh +359 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
- package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
- package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
- package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
- package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
- package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
- package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
- package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
- package/bin/devlyn.js +129 -17
- package/config/skills/_shared/adapters/README.md +64 -0
- package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
- package/config/skills/_shared/adapters/opus-4-7.md +29 -0
- package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
- package/config/skills/_shared/codex-config.md +54 -0
- package/config/skills/_shared/codex-monitored.sh +141 -0
- package/config/skills/_shared/engine-preflight.md +35 -0
- package/config/skills/_shared/expected.schema.json +93 -0
- package/config/skills/_shared/pair-plan-schema.md +298 -0
- package/config/skills/_shared/runtime-principles.md +110 -0
- package/config/skills/_shared/spec-verify-check.py +519 -0
- package/config/skills/devlyn:ideate/SKILL.md +99 -429
- package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
- package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
- package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
- package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
- package/config/skills/devlyn:resolve/SKILL.md +172 -184
- package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
- package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
- package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
- package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
- package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
- package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
- package/package.json +12 -2
- package/scripts/lint-skills.sh +431 -0
- package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
- package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
- package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
- package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
- package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
- package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
- package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
- package/config/skills/devlyn:clean/SKILL.md +0 -285
- package/config/skills/devlyn:design-ui/SKILL.md +0 -351
- package/config/skills/devlyn:discover-product/SKILL.md +0 -124
- package/config/skills/devlyn:evaluate/SKILL.md +0 -564
- package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
- package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
- package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
- package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
- package/config/skills/devlyn:preflight/SKILL.md +0 -355
- package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
- package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
- package/config/skills/devlyn:product-spec/SKILL.md +0 -603
- package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
- package/config/skills/devlyn:review/SKILL.md +0 -161
- package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
- package/config/skills/devlyn:team-review/SKILL.md +0 -493
- package/config/skills/devlyn:update-docs/SKILL.md +0 -463
- package/config/skills/workflow-routing/SKILL.md +0 -73
- /package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
- /package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
|
@@ -1,467 +1,137 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: devlyn:ideate
|
|
3
|
-
description:
|
|
3
|
+
description: Extract a verifiable spec from a user's idea by driving the conversation with focused questions. Output is a single-feature `spec.md` + `spec.expected.json` that `/devlyn:resolve --spec` consumes directly. Use when the user has an idea but not a spec, or wants AI to elicit the missing engineering context. Modes: default (single spec, AI drives Q&A), `--quick` (assume-and-confirm from one-line goal), `--from-spec <path>` (normalize external spec), `--project` (plan.md index + N specs). Optional in the pipeline — `/devlyn:resolve` works standalone via free-form mode for users who skip ideate.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
|
|
6
|
+
Spec-elicitation surface for users who have ideas but not engineering specifications. AI drives the conversation with focused questions until a structurally-valid, verifiable spec exists. Output consumed directly by `/devlyn:resolve --spec`.
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
<elicit_config>
|
|
9
|
+
$ARGUMENTS
|
|
10
|
+
</elicit_config>
|
|
9
11
|
|
|
10
|
-
<
|
|
11
|
-
This skill is
|
|
12
|
+
<orchestrator_context>
|
|
13
|
+
This skill is OPTIONAL. `/devlyn:resolve` is standalone-capable: free-form mode handles trivial/medium tasks without a spec, `--spec` mode accepts handwritten specs from any source. Use ideate when the user wants AI to do the elicitation work.
|
|
14
|
+
</orchestrator_context>
|
|
12
15
|
|
|
13
|
-
|
|
16
|
+
<elicitation_contract>
|
|
17
|
+
The user does not know context engineering. They will under-specify and over-assume. AI's job is to ask focused, specific questions that surface the missing engineering decisions.
|
|
14
18
|
|
|
15
|
-
|
|
16
|
-
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
</hard_boundary>
|
|
19
|
+
1. Ask one or two questions per turn, not more. Multi-question lists overwhelm and produce shallow answers.
|
|
20
|
+
2. Questions are concrete and decision-grade — what is the input, what is the expected output, what command verifies success, what files are out of scope.
|
|
21
|
+
3. Do not ask design preferences the user clearly does not have. Infer the simplest reasonable default and confirm in one line.
|
|
22
|
+
4. Stop when the spec passes structural lint AND the user explicitly confirms or 8 turns have elapsed (whichever comes first). Eight turns is a hard ceiling — beyond that, the spec is either ready or the task is too large for ideate.
|
|
23
|
+
5. The output is the spec, not a transcript. Do not include the conversation in the saved files.
|
|
24
|
+
</elicitation_contract>
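Rule 4 of the contract mixes an AND and an OR, so its precedence is worth spelling out. A minimal sketch, assuming the intended reading is "(lint passes AND user confirms) OR the turn ceiling is reached"; the function and parameter names are hypothetical and do not come from the package:

```python
MAX_TURNS = 8  # hard ceiling stated in the elicitation contract


def should_stop(lint_passed: bool, user_confirmed: bool, turns_elapsed: int) -> bool:
    """Return True when elicitation should end.

    Assumed reading of rule 4: the spec is ready (lint passes and the user
    confirms), or the 8-turn ceiling has been hit, whichever comes first.
    """
    spec_ready = lint_passed and user_confirmed
    return spec_ready or turns_elapsed >= MAX_TURNS
```

Under this reading a passing lint alone never ends the conversation; the user still has to confirm unless the ceiling is reached.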
|
|
22
25
|
|
|
23
|
-
|
|
26
|
+
<harness_principles>
|
|
27
|
+
Read `_shared/runtime-principles.md` (Subtractive-first / Goal-locked / No-workaround / Evidence). The principles bind the spec content as well as your conversation. A spec that says "for future flexibility" is a Subtractive-first violation. A spec that asks for `try { ... } catch { return null }` is a No-workaround violation. AI flags these in elicitation, not after `/devlyn:resolve` has built them.
|
|
28
|
+
</harness_principles>
|
|
24
29
|
|
|
25
|
-
|
|
30
|
+
<engine_routing>
|
|
31
|
+
Default engine: Claude. The per-engine adapter from `_shared/adapters/<model>.md` is prepended to the elicitation prompt so the model honors its own official prompt-engineering guidance during the Q&A.
|
|
32
|
+
</engine_routing>
|
|
26
33
|
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
- `codex`: Codex handles FRAME/EXPLORE/CONVERGE/DOCUMENT, Claude runs CHALLENGE (role reversal — builder and critic are always different models).
|
|
30
|
-
- `claude`: all phases use Claude. No Codex calls.
|
|
34
|
+
<modes>
|
|
35
|
+
Four modes, selected by flag:
|
|
31
36
|
|
|
32
|
-
|
|
33
|
-
-
|
|
34
|
-
|
|
35
|
-
-
|
|
37
|
+
1. **Default** (no flag) — single-spec elicitation. AI asks questions in-conversation until lint passes. Output: `<spec-dir>/<id>-<slug>/spec.md` + `<spec-dir>/<id>-<slug>/spec.expected.json`. Default spec dir: `docs/specs/` (configurable via `--spec-dir <path>`).
|
|
38
|
+
2. **`--quick`** — one-line goal, AI synthesizes a spec with explicit assumptions block, asks the user to confirm or correct in a single turn. Use when the user wants speed over thoroughness.
|
|
39
|
+
3. **`--from-spec <path>`** — external spec exists. AI lints it for the canonical structure, normalizes section names, generates a missing `spec.expected.json` if absent, fixes minor schema issues, and stops. Does NOT reshape Requirements / Out-of-Scope content; structural changes only.
|
|
40
|
+
4. **`--project`** — multi-feature project. AI elicits a project description, decomposes it into 3-7 feature specs, writes `<spec-dir>/plan.md` (the index) and one `<spec-dir>/<id>/spec.md` + `<spec-dir>/<id>/spec.expected.json` per feature. See `references/project-mode.md`.
|
|
36
41
|
|
|
37
|
-
|
|
42
|
+
`--spec-dir <path>` overrides the default output directory. `--engine <model>` selects the adapter.
|
|
43
|
+
</modes>
|
|
38
44
|
|
|
39
|
-
|
|
45
|
+
<spec_kind_escape_hatch>
|
|
46
|
+
The spec carries `spec.kind ∈ {feature, spike, prototype}` in its frontmatter. The kind changes downstream behavior:
|
|
40
47
|
|
|
41
|
-
|
|
48
|
+
- **feature** — production-quality implementation expected. `/devlyn:resolve --spec` runs the full pipeline (PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY).
|
|
49
|
+
- **spike** — exploratory work; deliverable is learning, evidence, or a disposable demo. `/devlyn:resolve --spec` proceeds but VERIFY's quality bar is relaxed for code that the spike says is throwaway.
|
|
50
|
+
- **prototype** — between feature and spike. Production-shape but not production-grade. CLEANUP runs; VERIFY's quality bar is stricter than spike, looser than feature.
|
|
42
51
|
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
├── VISION.md # Layer 1: Strategic WHY (~50-100 lines)
|
|
46
|
-
│ # Orientation only. auto-resolve never reads this.
|
|
47
|
-
│
|
|
48
|
-
├── ROADMAP.md # Layer 2: Tactical index (what, in what order)
|
|
49
|
-
│ # Thin table linking to detail specs. auto-resolve never reads this.
|
|
50
|
-
│
|
|
51
|
-
└── roadmap/ # Layer 3: Auto-resolve-ready specs
|
|
52
|
-
├── phase-1/
|
|
53
|
-
│ ├── _overview.md # Phase-level context and goals
|
|
54
|
-
│ ├── 1.1-xxx.md # Self-contained spec → direct auto-resolve input
|
|
55
|
-
│ └── 1.2-yyy.md
|
|
56
|
-
├── phase-2/
|
|
57
|
-
│ └── ...
|
|
58
|
-
├── decisions/ # Architecture decision records (why we chose X over Y)
|
|
59
|
-
│ └── 001-xxx.md
|
|
60
|
-
└── backlog/ # Ideas acknowledged but not yet phased
|
|
61
|
-
└── ...
|
|
62
|
-
```
|
|
52
|
+
The user picks the kind during elicitation. Default = feature when not specified. `--quick` infers from the goal text (verbs like "explore", "investigate", "spike" → spike; "implement", "ship", "add" → feature).
|
|
53
|
+
</spec_kind_escape_hatch>
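The `--quick` inference described above can be sketched as a keyword lookup. This is an illustration only: the verb lists are the examples quoted in the text, the helper name `infer_kind` is hypothetical, and checking spike verbs before feature verbs is an assumption, since the text does not say which wins when both appear.

```python
import re

# Example verbs quoted in the skill text; the real prompt may use a wider set.
SPIKE_VERBS = {"explore", "investigate", "spike"}
FEATURE_VERBS = {"implement", "ship", "add"}


def infer_kind(goal: str) -> str:
    """Guess spec.kind from a one-line goal; fall back to the stated default."""
    words = set(re.findall(r"[a-z]+", goal.lower()))
    if words & SPIKE_VERBS:  # assumption: spike verbs take precedence
        return "spike"
    if words & FEATURE_VERBS:
        return "feature"
    return "feature"  # "Default = feature when not specified"
```

Note that `prototype` is never inferred in this sketch, because the text gives no verbs for it; the user would have to pick it explicitly.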
|
|
63
54
|
|
|
64
|
-
|
|
55
|
+
## PHASE 0: PARSE + ROUTE
|
|
65
56
|
|
|
66
|
-
|
|
57
|
+
1. Parse flags from `<elicit_config>`:
|
|
58
|
+
- `--quick`
|
|
59
|
+
- `--from-spec <path>`
|
|
60
|
+
- `--project`
|
|
61
|
+
- `--spec-dir <path>` (default `docs/specs/`)
|
|
62
|
+
- `--engine MODE` (default `claude`)
|
|
63
|
+
- `--spec-id <id>` — optional explicit id; auto-generated when absent.
|
|
67
64
|
|
|
68
|
-
|
|
65
|
+
2. Engine pre-flight: `_shared/engine-preflight.md`.
|
|
69
66
|
|
|
70
|
-
|
|
67
|
+
3. Mode dispatch:
|
|
68
|
+
- default → PHASE 1.
|
|
69
|
+
- `--quick` → PHASE 1Q (single turn assume-and-confirm).
|
|
70
|
+
- `--from-spec` → PHASE 1F (lint + normalize external).
|
|
71
|
+
- `--project` → PHASE 1P (project decomposition).
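The flag parsing and mode dispatch in PHASE 0 can be pictured with a small `argparse` sketch. The skill itself is a prompt and is not known to use `argparse`; the function names, the choice to make the three mode flags mutually exclusive, and the use of `shlex` to split `$ARGUMENTS` are all assumptions made for illustration.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="devlyn:ideate", add_help=False)
    mode = parser.add_mutually_exclusive_group()  # assumption: one mode per run
    mode.add_argument("--quick", action="store_true")
    mode.add_argument("--from-spec", metavar="PATH")
    mode.add_argument("--project", action="store_true")
    parser.add_argument("--spec-dir", default="docs/specs/")
    parser.add_argument("--engine", default="claude")
    parser.add_argument("--spec-id", default=None)  # auto-generated when absent
    parser.add_argument("goal", nargs="*")  # remaining words form the goal text
    return parser


def route(arguments: str):
    """Return (phase name, parsed args) for a raw $ARGUMENTS string."""
    args = build_parser().parse_args(shlex.split(arguments))
    if args.quick:
        phase = "PHASE 1Q"
    elif args.from_spec:
        phase = "PHASE 1F"
    elif args.project:
        phase = "PHASE 1P"
    else:
        phase = "PHASE 1"
    return phase, args
```

For example, `route("--quick add a dry-run option")` selects `PHASE 1Q` and leaves `spec_dir` and `engine` at their documented defaults.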
|
|
71
72
|
|
|
72
|
-
|
|
73
|
-
**Ask, don't assume.** When information is missing or ambiguous, ask targeted questions. Generating a spec with wrong assumptions is worse than asking one more question. The user wants accuracy (documents they can trust and hand to auto-resolve), not speed.
|
|
73
|
+
## PHASE 1: ELICITATION (default mode)
|
|
74
74
|
|
|
75
|
-
|
|
75
|
+
Prompt body: `references/elicitation.md`. Adapter prepended.
|
|
76
76
|
|
|
77
|
-
|
|
77
|
+
The elicitation agent:
|
|
78
|
+
1. Reads the user's initial goal from `<elicit_config>`.
|
|
79
|
+
2. Identifies the missing engineering decisions (input shape, output shape, success command, scope boundary, constraints).
|
|
80
|
+
3. Asks 1-2 focused questions per turn until each blank is filled or the user accepts an inferred default.
|
|
81
|
+
4. Maintains a running draft spec in `.devlyn/ideate-draft.md` (run-scoped, gitignored).
|
|
82
|
+
5. Stops when the structural lint passes AND user confirms, or 8 turns elapsed.
|
|
78
83
|
|
|
79
|
-
|
|
84
|
+
Structural lint (inline check, no script needed):
|
|
85
|
+
- Frontmatter has `id`, `title`, `kind`, `status: planned`.
|
|
86
|
+
- `## Context` non-empty (≥ 1 sentence).
|
|
87
|
+
- `## Requirements` has ≥ 1 `- [ ]` bullet.
|
|
88
|
+
- `## Out of Scope` present (may list "none" if truly nothing).
|
|
89
|
+
- `## Verification` has either ≥ 1 named command OR an explicit "all Requirements are pure-design" note.
|
|
80
90
|
|
|
81
|
-
|
|
91
|
+
After lint passes:
|
|
92
|
+
1. Write `<spec-dir>/<id>-<slug>/spec.md` (the spec).
|
|
93
|
+
2. Generate `<spec-dir>/<id>-<slug>/spec.expected.json` from the spec's `## Verification` block + any `forbidden_patterns` / `required_files` / `forbidden_files` / `max_deps_added` the conversation surfaced.
|
|
94
|
+
3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If exit 2, fix the carrier and re-run.
|
|
95
|
+
4. Print: `spec ready — /devlyn:resolve --spec <spec-path>`.
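The output layout `<spec-dir>/<id>-<slug>/` used in steps 1 and 2 can be illustrated with a small path helper. The diff does not show how the slug is derived, so the `slugify` rule below (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, as are both function names.

```python
import re
from pathlib import PurePosixPath


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def spec_paths(spec_id: str, title: str, spec_dir: str = "docs/specs/"):
    """Return the (spec.md, spec.expected.json) paths for one spec."""
    folder = PurePosixPath(spec_dir) / f"{spec_id}-{slugify(title)}"
    return folder / "spec.md", folder / "spec.expected.json"
```

Both files share one folder, which keeps the spec and its verification carrier together when the user commits them.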
|
|
82
96
|
|
|
83
|
-
|
|
84
|
-
</conversation_rhythm>
|
|
97
|
+
## PHASE 1Q: QUICK MODE
|
|
85
98
|
|
|
86
|
-
|
|
99
|
+
Single-turn assume-and-confirm. Prompt body: see `references/elicitation.md` § "Quick mode".
|
|
87
100
|
|
|
88
|
-
|
|
101
|
+
1. AI synthesizes a spec from the one-line goal.
|
|
102
|
+
2. AI surfaces an explicit "Assumptions made" section listing every inferred decision.
|
|
103
|
+
3. User responds with "go" / "fix X" / "no, different".
|
|
104
|
+
4. On "go": write spec + spec.expected.json + lint + announce.
|
|
105
|
+
5. On "fix X": apply correction, re-show, ask again. Maximum 3 correction rounds before escalating to default mode.
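The confirm-or-correct loop above can be sketched as a small state function. This is one possible reading, not the skill's actual logic: it assumes up to three "fix" replies are applied and a fourth triggers escalation to default mode. The text does not say what happens on "no, different", so the `rejected` outcome is a placeholder, and all names are hypothetical.

```python
MAX_CORRECTION_ROUNDS = 3  # stated cap before escalating to default mode


def quick_mode_outcome(replies):
    """Walk the user's replies in order and return the resulting outcome."""
    corrections = 0
    for reply in replies:
        text = reply.strip().lower()
        if text == "go":
            return "write"  # write spec + spec.expected.json, lint, announce
        if text.startswith("fix"):
            if corrections == MAX_CORRECTION_ROUNDS:
                return "escalate"  # hand over to default-mode elicitation
            corrections += 1  # apply correction, re-show, ask again
            continue
        return "rejected"  # placeholder: behaviour not specified in the text
    return "pending"  # still waiting for the user's next reply
```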
|
|
89
106
|
|
|
90
|
-
|
|
91
|
-
|--------|------|----------|
|
|
92
|
-
| No existing docs, new project or idea | **Greenfield** | Full flow: Frame → Explore → Converge → Document |
|
|
93
|
-
| Existing docs, user adds new ideas | **Expand** | Lighter Frame, focused Explore on new area, merge into existing phases |
|
|
94
|
-
| Existing docs, user describes a single bug/improvement/idea | **Quick Add** | Read existing roadmap, create one item spec, add row to ROADMAP.md |
|
|
95
|
-
| One specific feature needs deep thought | **Deep-dive** | Intensive Explore on one topic, output 1-3 specs |
|
|
96
|
-
| User shares links/resources to process | **Research-first** | Lead with Explore (research synthesis), then standard flow |
|
|
97
|
-
| Existing roadmap, user wants to reprioritize | **Replan** | Read existing docs, focus on Converge, update documents |
|
|
107
|
+
## PHASE 1F: FROM-SPEC MODE
|
|
98
108
|
|
|
99
|
-
|
|
109
|
+
Prompt body: `references/from-spec-mode.md`.
|
|
100
110
|
|
|
101
|
-
|
|
111
|
+
1. Read the external spec at `<path>`.
|
|
112
|
+
2. Lint structure (same checks as default mode).
|
|
113
|
+
3. Identify missing pieces (no frontmatter, missing sections, malformed Verification block).
|
|
114
|
+
4. Apply structural fixes only — do NOT reshape Requirements / Out-of-Scope content. The user's substantive intent is preserved.
|
|
115
|
+
5. Generate `spec.expected.json` if absent (best-effort from `## Verification` block).
|
|
116
|
+
6. Write the normalized spec back to `<spec-dir>/<id>-<slug>/` (preserves original at `<path>` untouched unless user passes `--in-place`).
|
|
117
|
+
7. Lint pass → announce. Lint fail → surface the unfixable issue and exit non-zero.
|
|
102
118
|
|
|
103
|
-
|
|
119
|
+
## PHASE 1P: PROJECT MODE
|
|
104
120
|
|
|
105
|
-
|
|
121
|
+
Prompt body: `references/project-mode.md`.
|
|
106
122
|
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
123
|
+
1. AI elicits a project description (longer Q&A — multi-feature scope warrants more turns).
|
|
124
|
+
2. AI decomposes the project into 3-7 feature specs. Each feature is independently shippable; cross-feature dependencies surface explicitly in the spec frontmatter `depends_on:` field.
|
|
125
|
+
3. AI writes `<spec-dir>/plan.md` — index file with: project name, decomposition rationale, list of feature specs with id + title + dependency, suggested implementation order.
|
|
126
|
+
4. AI writes one `<spec-dir>/<id>/spec.md` + `<spec-dir>/<id>/spec.expected.json` per feature, each lint-validated.
|
|
127
|
+
5. Announce: `project ready — N specs at <spec-dir>/. Start with /devlyn:resolve --spec <first-spec-path>`.
|
|
112
128
|
|
|
113
|
-
|
|
114
|
-
- FRAME is lighter — the vision already exists, focus on framing the NEW area only
|
|
115
|
-
- EXPLORE focuses specifically on the new capability and how it integrates with existing features
|
|
116
|
-
- CONVERGE must consider dependencies on existing items, not just new ones
|
|
129
|
+
`/devlyn:resolve` consumes one spec at a time; the user works through `plan.md`'s suggested order. Multi-feature parallel runs are Mission 2 work.
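One way to derive the suggested implementation order in `plan.md` from the `depends_on:` fields is a topological sort. The skill does not state how the order is computed, so this is a sketch under that assumption, using Python's standard `graphlib` (3.9+); the spec ids in the usage note are made up.

```python
from graphlib import TopologicalSorter


def suggested_order(depends_on):
    """Order spec ids so every spec comes after the specs it depends on.

    depends_on maps a spec id to the list of ids named in its depends_on field.
    Raises graphlib.CycleError when the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

For instance, `suggested_order({"001-auth": [], "002-orders": ["001-auth"], "003-reports": ["002-orders"]})` yields the three ids in that order, which matches working through the specs one at a time with `/devlyn:resolve --spec`.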
|
|
117
130
|
|
|
118
|
-
|
|
119
|
-
- Don't overwrite existing VISION.md unless the user explicitly wants to update it
|
|
120
|
-
- Continue numbering from existing IDs (if Phase 2 exists with 2.1-2.4, new items start at 2.5 or create Phase 3)
|
|
121
|
-
- Add new rows to ROADMAP.md, don't regenerate the whole table
|
|
122
|
-
- New item specs can reference existing items in their Dependencies section
|
|
123
|
-
- If new items change the meaning of existing items, flag this: "Adding [X] may affect the scope of existing item [Y]. Should we update [Y]'s spec?"
|
|
131
|
+
## State management
|
|
124
132
|
|
|
125
|
-
|
|
133
|
+
ideate is conversational, not pipeline-staged. State lives in:
|
|
134
|
+
- `.devlyn/ideate-draft.md` — current draft spec during elicitation (run-scoped, gitignored).
|
|
135
|
+
- `<spec-dir>/<id>-<slug>/` — final output (committed to repo by user choice).
|
|
126
136
|
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
Quick Add is for when the user has a single concrete idea, bug report, or improvement — they don't need a full ideation session, just a new entry in the roadmap. This is the most common trigger for misuse: the request looks like a simple fix, so the temptation is to implement it. Don't. Capture it.
|
|
130
|
-
|
|
131
|
-
**On entry:**
|
|
132
|
-
1. Read `docs/ROADMAP.md` and relevant phase `_overview.md` files
|
|
133
|
-
2. **Run the Archive Pass first** (see Context Archiving below). Do this *before* you figure out where the new item goes — a stale roadmap will mislead phase selection and ID numbering. If the pass moves a phase out of the active section, the new item's natural home may change.
|
|
134
|
-
3. Identify the best-fit phase for the new item (or suggest a new phase if it doesn't fit)
|
|
135
|
-
4. Determine the next available item ID (e.g., if phase 2 has 2.1-2.4, the new item is 2.5)
|
|
136
|
-
|
|
137
|
-
**Workflow (minimal — no full Frame/Explore/Converge):**
|
|
138
|
-
1. Confirm the idea with the user: "I'll add this as [item title] in Phase [N]. That sound right?"
|
|
139
|
-
2. Ask 1-2 clarifying questions if the requirement is unclear (skip if the user gave enough detail)
|
|
140
|
-
3. Generate the item spec following `references/templates/item-spec.md`
|
|
141
|
-
4. Add a row to `docs/ROADMAP.md`
|
|
142
|
-
5. Output confirmation: the file path and a suggested auto-resolve command
|
|
143
|
-
|
|
144
|
-
**Example output:**
|
|
145
|
-
```
|
|
146
|
-
Added: docs/roadmap/phase-2/2.5-back-to-review-button.md
|
|
147
|
-
|
|
148
|
-
To implement:
|
|
149
|
-
/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-2/2.5-back-to-review-button.md"
|
|
150
|
-
```
|
|
151
|
-
|
|
152
|
-
### Context Archiving
|
|
153
|
-
|
|
154
|
-
ROADMAP.md is the tactical index. Done work should move to a collapsed `## Completed` block at the bottom, not clutter the active view. Item spec files stay on disk at `docs/roadmap/phase-N/{id}.md` — only the index row moves.
|
|
155
|
-
|
|
156
|
-
#### The Archive Pass (conditional)
|
|
157
|
-
|
|
158
|
-
Run this at the start of Quick Add / Expand / Replan **only when** `docs/ROADMAP.md` contains at least one phase where every row is `Done`. A quick scan tells you within seconds. Skip the pass otherwise — running it on a roadmap with no fully-done phases is no-op bookkeeping that burns the user's turn.
|
|
159
|
-
|
|
160
|
-
When it runs:
|
|
161
|
-
|
|
162
|
-
1. Read `docs/ROADMAP.md`.
|
|
163
|
-
2. For each phase where every row is `Done`: cut the `## Phase N: …` heading and table, move it into a new or existing `## Completed` block at the bottom as a `<details>` entry (see format below). Use the latest completion date found in item spec frontmatter (`completed:`), or today's if absent. Item count is the row count.
|
|
164
|
-
3. Individual `Done` rows inside an otherwise-active phase stay put — mixed phases show recent wins alongside open work.
|
|
165
|
-
4. Scan the Backlog table; surface any row whose `Revisit` date has passed as a replan candidate (don't auto-promote — that's a conversation).
|
|
166
|
-
5. Scan `docs/roadmap/decisions/` for `accepted` decisions whose reasoning is visibly contradicted by newly-Done work; raise them as open questions rather than silently editing.
|
|
167
|
-
6. One-sentence report of what was archived, then proceed with the mode's main work. Skip the report if nothing changed.
|
|
168
|
-
|
|
169
|
-
**Completed block format** (place at the bottom of ROADMAP.md, below Decisions):
|
|
170
|
-
|
|
171
|
-
```markdown
|
|
172
|
-
## Completed
|
|
173
|
-
<details>
|
|
174
|
-
<summary>Phase 1: Foundation (completed 2026-04-15, 4 items)</summary>
|
|
175
|
-
|
|
176
|
-
| # | Feature | Completed |
|
|
177
|
-
|---|---------|-----------|
|
|
178
|
-
| 1.1 | Auth & Onboarding | 2026-02-10 |
|
|
179
|
-
| 1.2 | Order Management | 2026-03-05 |
|
|
180
|
-
| 1.3 | Inventory Tracking | 2026-03-28 |
|
|
181
|
-
| 1.4 | Customer Directory | 2026-04-15 |
|
|
182
|
-
</details>
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
If the `## Completed` section already exists and you're archiving an additional phase, append a new `<details>` block — don't rewrite existing ones.
|
|
186
|
-
|
|
187
|
-
#### Outdated decisions
|
|
188
|
-
|
|
189
|
-
When a decision becomes wrong because the world changed under it:
|
|
190
|
-
- Don't delete it — set its `status:` to `superseded` in the decision file's frontmatter and add a one-line pointer to the replacement decision record.
|
|
191
|
-
- This preserves the reasoning history for future reference, which matters more than a tidy decisions table.
|
|
192
|
-
|
|
193
|
-
## Phase 1: FRAME
|
|
194
|
-
|
|
195
|
-
<phase_goal>Establish problem space boundaries before exploring solutions.</phase_goal>
|
|
196
|
-
|
|
197
|
-
The biggest risk in ideation is premature convergence — jumping to solutions before understanding the problem. This phase prevents that.
|
|
198
|
-
|
|
199
|
-
Establish through conversation:
|
|
200
|
-
1. **Job-to-be-Done**: In one sentence — "When [situation], [user] wants to [motivation], so they can [outcome]." Capture this before anything else. If the user cannot produce it, that is itself the finding — pause and explore the situation until the sentence exists. A bare problem statement without this frame is a state description, not a job, and downstream specs built from it will describe system behavior instead of customer progress.
|
|
201
|
-
2. **Constraints**: What can't change? (tech stack, timeline, existing commitments)
|
|
202
|
-
3. **Success criteria**: How will we know this worked? (outcomes, not outputs)
|
|
203
|
-
4. **Anti-goals**: What are we explicitly NOT trying to do?
|
|
204
|
-
|
|
205
|
-
Adapt to what the user has already shared — if they came in with a clear vision, this might be a quick confirmation. If the idea is fuzzy, spend more time here. Ask conversationally, not as a rigid questionnaire.
|
|
206
|
-
|
|
207
|
-
Don't write documents yet. The output of this phase is a shared mental model between you and the user.
|
|
208
|
-
|
|
209
|
-
## Phase 2: EXPLORE
|
|
210
|
-
|
|
211
|
-
<phase_goal>Systematically expand the possibility space before narrowing it.</phase_goal>
|
|
212
|
-
|
|
213
|
-
This is the creative core — the phase that should take the most conversational turns. The user chose to ideate with AI because they want perspectives, research, and creative expansion they wouldn't get alone.
|
|
214
|
-
|
|
215
|
-
<use_parallel_tool_calls>
|
|
216
|
-
EXPLORE often needs several independent lookups: web search for prior art, doc fetches, repo greps for existing patterns. When tool calls have no dependencies on each other, issue them in parallel in the same response. Spawn subagents in parallel when fanning out across distinct research topics. Only chain calls that depend on a previous call's output. Pace research across turns rather than front-loading every lookup before the user has framed direction: EXPLORE is dialogue-driven, and parallelism applies only to the lookups inside a single turn.
|
|
217
|
-
</use_parallel_tool_calls>
|
|
218
|
-
|
|
219
|
-
<research_protocol>
|
|
220
|
-
When relevant, actively research before and during brainstorming:
|
|
221
|
-
- **Existing solutions**: What's already out there? (web search, documentation)
|
|
222
|
-
- **Technical feasibility**: Can this be built within the constraints? Where are the hard parts?
|
|
223
|
-
- **Patterns and prior art**: How have similar problems been solved?
|
|
224
|
-
- **Market/user context**: Who else needs this? What do they currently use?
|
|
225
|
-
- **Evidence discipline**: Treat prior art as source-backed only when verified by a fetched link or documentation the user can open. If a pattern is inferred from memory or analogy, label it `[UNVERIFIED]` inline and do not present it as market fact. The CHALLENGE rubric's NO GUESSWORK axis fires hard on unlabeled claims that look authoritative but are actually recall.
|
|
226
|
-
|
|
227
|
-
Not every ideation needs all of these — a personal side project doesn't need market research. Judge what's relevant and use subagents for parallel research when multiple topics need investigation.
|
|
228
|
-
</research_protocol>
|
|
229
|
-
|
|
230
|
-
<multi_perspective>
|
|
231
|
-
For each major idea, consider it from at least three angles:
|
|
232
|
-
- **User**: Is this actually useful? Does it solve a real pain?
|
|
233
|
-
- **Technical**: Is this buildable? Where are the complexity hotspots?
|
|
234
|
-
- **Strategic**: Does this align with the vision? Does it create leverage for future work?
|
|
235
|
-
|
|
236
|
-
Add perspectives as relevant:
|
|
237
|
-
- **Risk**: What could go wrong? What are the dependencies?
|
|
238
|
-
- **Business**: Does this create value? Is the effort justified?
|
|
239
|
-
- **Accessibility**: Is this inclusive? Who gets left out?
|
|
240
|
-
</multi_perspective>
|
|
241
|
-
|
|
242
|
-
<creative_expansion>
|
|
243
|
-
When the conversation needs energy or the user feels stuck:
|
|
244
|
-
- **"What if..."** — Remove a constraint and see what emerges
|
|
245
|
-
- **Analogy transfer** — "How does [adjacent domain] solve this?"
|
|
246
|
-
- **Inversion** — "What's the worst version? Now invert it."
|
|
247
|
-
- **10x thinking** — "If this needed 10x users, what changes?"
|
|
248
|
-
- **Minimum viable magic** — "What's the smallest thing that would feel magical?"
|
|
249
|
-
|
|
250
|
-
Use these naturally in conversation, not as a mechanical checklist.
|
|
251
|
-
</creative_expansion>
|
|
252
|
-
|
|
253
|
-
As ideas accumulate, periodically synthesize:
|
|
254
|
-
```
|
|
255
|
-
Here's where we are:
|
|
256
|
-
- Core ideas: [list]
|
|
257
|
-
- Open questions: [list]
|
|
258
|
-
- Tensions to resolve: [list]
|
|
259
|
-
- Research still needed: [list]
|
|
260
|
-
```
|
|
261
|
-
|
|
262
|
-
This prevents circular conversations and gives the user a clear sense of progress.
|
|
263
|
-
|
|
264
|
-
## Phase 3: CONVERGE
|
|
265
|
-
|
|
266
|
-
<phase_goal>Transform exploration into decisions.</phase_goal>
|
|
267
|
-
|
|
268
|
-
When the user signals readiness or exploration winds down naturally, shift to convergence.
|
|
269
|
-
|
|
270
|
-
### Theme Clustering
|
|
271
|
-
Group related ideas into coherent themes:
|
|
272
|
-
```
|
|
273
|
-
Theme A: [name]
|
|
274
|
-
- Ideas: 1, 3, 7
|
|
275
|
-
- Value: [why this matters]
|
|
276
|
-
- Risk: [what could go wrong]
|
|
277
|
-
```
|
|
278
|
-
|
|
279
|
-
### Prioritization
|
|
280
|
-
Use value × feasibility as the primary framework:
|
|
281
|
-
- **High value + High feasibility** → Phase 1 (build first)
|
|
282
|
-
- **High value + Low feasibility** → Phase 2+ (build after foundation exists)
|
|
283
|
-
- **Low value + High feasibility** → Backlog (if time permits)
|
|
284
|
-
- **Low value + Low feasibility** → Cut
|
|
285
|
-
|
|
286
|
-
Present as a recommendation — the user makes the final call on ordering.
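
The four quadrants above reduce to a two-input lookup. The sketch below is illustrative only; the function name `bucket` and the boolean inputs are assumptions, and in practice "high" versus "low" is a judgment reached in conversation rather than a computed flag.

```python
def bucket(high_value: bool, high_feasibility: bool) -> str:
    """Map a value/feasibility judgment to the recommended placement."""
    if high_value and high_feasibility:
        return "Phase 1"   # build first
    if high_value:
        return "Phase 2+"  # build after the foundation exists
    if high_feasibility:
        return "Backlog"   # if time permits
    return "Cut"
```

The output is a starting recommendation; the user can still reorder it.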
|
|
287
|
-
|
|
288
|
-
### Sequencing
|
|
289
|
-
Within each phase:
|
|
290
|
-
- **Dependencies**: What must exist before what?
|
|
291
|
-
- **Risk ordering**: Build uncertain things first (fail fast)
|
|
292
|
-
- **Value delivery**: Each phase should deliver usable value, not just infrastructure
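
One way to combine the dependency and risk rules is a topological sort that breaks ties by uncertainty. This is a sketch under stated assumptions: the function name `sequence`, the `deps` mapping (item to the items it depends on), and the numeric `risk` scores are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def sequence(deps: dict[str, set[str]], risk: dict[str, int]) -> list[str]:
    """Order items so dependencies come first; riskier items lead each wave.

    deps maps an item to the set of items that must exist before it.
    risk maps an item to an illustrative uncertainty score (higher = riskier).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order: list[str] = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies satisfied.
        ready = sorted(sorter.get_ready(),
                       key=lambda item: (-risk.get(item, 0), item))
        for item in ready:
            order.append(item)
            sorter.done(item)
    return order
```

A circular dependency surfaces as an exception, which is a useful signal that two items should be merged or re-scoped.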
|
|
293
|
-
|
|
294
|
-
### Architecture Decisions
|
|
295
|
-
Surface decisions that affect multiple items — technology choices, data model, integration approaches, UX patterns. For each: **What** was decided, **Why** (tradeoffs), and **What alternatives** were considered. These become decision records.
|
|
296
|
-
|
|
297
|
-
### Internal draft — do not show the user yet
|
|
298
|
-
|
|
299
|
-
At this point you have an internal convergence draft: themes, phases, items, decisions. **Do not present it to the user yet.** Phase 3.5 CHALLENGE runs next, and the user will see exactly one summary — the post-challenge plan, with visibility into what CHALLENGE changed. Showing the pre-challenge draft first and then changing it after challenge creates a two-round confirmation loop that burns the user's trust.
|
|
300
|
-
|
|
301
|
-
## Phase 3.5: CHALLENGE
|
|
302
|
-
|
|
303
|
-
<phase_goal>Apply a strict 5-axis rubric to the internal convergence draft, then present one post-challenge summary to the user for confirmation. Always runs.</phase_goal>
|
|
304
|
-
|
|
305
|
-
<thinking_effort>
|
|
306
|
-
Engage maximum thinking effort here — both the solo rubric pass and, if enabled, the Codex pass. Use extended thinking ("ultrathink") when reading each item, applying each axis, and producing revisions. The default Claude failure mode in self-review is nodding along to the draft you just produced; shallow thinking here is the exact pattern this phase exists to prevent.
|
|
307
|
-
|
|
308
|
-
Before finalizing the rubric pass, verify your findings against the rubric one more time: every flagged item should have a specific Quote, a failing axis, and a concrete revision — not a vague concern.
|
|
309
|
-
</thinking_effort>
|
|
310
|
-
|
|
311
|
-
### The rubric — single source of truth
|
|
312
|
-
|
|
313
|
-
Read `references/challenge-rubric.md` before starting. That file is the only definition of the 5 axes, the finding format, the hard rule about respecting explicit user intent, and the good-vs-bad examples. Both the solo pass and the Codex pass use the same rubric; do not re-derive it inline.
|
|
314
|
-
|
|
315
|
-
### Solo pass (always runs)
|
|
316
|
-
|
|
317
|
-
Apply the rubric to the internal convergence draft. Produce findings in the format specified in `challenge-rubric.md` (Severity / Quote / Axis / Why / Fix).
|
|
318
|
-
|
|
319
|
-
For Quick Add with one new item, one solo pass is enough. For a full greenfield or expand plan, run the rubric once, revise, and run it again on the revision. If a third pass would be needed, the plan has structural problems that belong in the user-facing summary as open questions — surface them rather than iterating further.
|
|
320
|
-
|
|
321
|
-
### Codex critic pass (engine-routed)
|
|
322
|
-
|
|
323
|
-
**If `--engine auto`** (default): Codex runs the CHALLENGE rubric pass automatically as critic.
|
|
324
|
-
|
|
325
|
-
Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, `workingDirectory: <project root>`. The `prompt` parameter is built from the packaged plan + the inlined rubric + the appended Codex instructions. Codex has no filesystem access to this project, so everything it needs travels in the prompt.
|
|
326
|
-
|
|
327
|
-
**Step 1 — Package the post-solo plan.** Build the prompt per `references/codex-critic-template.md` (section order, rubric inlining, Codex-specific instructions all live there verbatim — follow the template structure, fill in the plan/findings sections).
|
|
328
|
-
|
|
329
|
-
**Step 2 — Reconcile.** Merge the two finding lists:
|
|
330
|
-
- Same finding from both → keep the more specific wording, mark "confirmed by both"
|
|
331
|
-
- Codex-only → prefix `[codex]` in internal notes so the user-facing summary can attribute correctly
|
|
332
|
-
- Solo-only → keep as-is
|
|
333
|
-
- Conflicts (solo says X, Codex says not-X) → record both, do not silently pick one; if material, surface as an open question in the user-facing summary
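
The first three merge rules can be sketched as follows. Several things here are assumptions for illustration: the `Finding` fields mirror the Severity / Quote / Axis / Why / Fix format, two findings count as "the same" when quote and axis match, and "more specific wording" is approximated by the longer `fix` text. Conflict detection (solo says X, Codex says not-X) needs judgment and is deliberately left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    severity: str
    quote: str
    axis: str
    why: str
    fix: str


def reconcile(solo: list[Finding],
              codex: list[Finding]) -> list[tuple[str, Finding]]:
    """Merge two finding lists into (attribution, finding) pairs."""
    codex_by_key = {(f.quote, f.axis): f for f in codex}
    confirmed: set[tuple[str, str]] = set()
    merged: list[tuple[str, Finding]] = []

    for finding in solo:
        key = (finding.quote, finding.axis)
        other = codex_by_key.get(key)
        if other is None:
            merged.append(("solo", finding))  # solo-only: keep as-is
        else:
            # Assumption: longer fix text stands in for "more specific".
            best = other if len(other.fix) > len(finding.fix) else finding
            merged.append(("confirmed by both", best))
            confirmed.add(key)

    for finding in codex:
        if (finding.quote, finding.axis) not in confirmed:
            merged.append(("[codex]", finding))  # Codex-only: attributed
    return merged
```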
|
|
334
|
-
|
|
335
|
-
If Codex raised CRITICAL or HIGH findings the solo pass missed, apply the fixes to the plan before presenting the user-facing summary — unless fixing would change something the user explicitly confirmed, in which case follow the rubric's "Respect explicit user intent" rule.
|
|
336
|
-
|
|
337
|
-
**Do not loop.** One Codex pass is enough. If the result is still FAIL after reconciliation, the plan has structural problems that belong in the user-facing summary as open questions rather than further iteration.
|
|
338
|
-
|
|
339
|
-
**If `--engine codex`**: Role reversal — Codex built the plan, so Claude runs the solo CHALLENGE pass and that is the only pass. Do not also run Codex on CHALLENGE — builder and critic should always be different models. Skip this section.
|
|
340
|
-
|
|
341
|
-
**If `--engine claude`**: No Codex calls. The solo pass is the only pass.
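
The three engine cases amount to a small routing table for the CHALLENGE phase. The sketch below restates them; the function name `challenge_passes` and the pass labels are hypothetical.

```python
def challenge_passes(engine: str) -> list[str]:
    """Return which CHALLENGE passes run for a given --engine value."""
    if engine == "auto":
        # Claude built the plan: solo pass, then Codex as critic.
        return ["solo", "codex"]
    if engine in ("codex", "claude"):
        # codex: Codex built the plan, so Claude's solo pass is the only critic.
        # claude: no Codex calls at all.
        return ["solo"]
    raise ValueError(f"unknown engine: {engine!r}")
```

In every case the solo pass runs, and Codex never critiques a plan it built itself.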
|
|
342
|
-
|
|
343
|
-
### Respect explicit user intent
|
|
344
|
-
|
|
345
|
-
The rubric is a quality lens, not an override. If a finding conflicts with something the user explicitly and clearly asked for, follow the "Hard rule" section in `challenge-rubric.md`: record the finding, **do not silently rewrite the plan**, and surface it as an open question in the summary below. The user makes the call.
|
|
346
|
-
|
|
347
|
-
### User-facing summary (the first and only time the user sees the plan)
|
|
348
|
-
|
|
349
|
-
After the rubric pass(es), present the post-challenge plan to the user for confirmation. This is the first time the user sees the converged plan — by design, so they see a rubric-checked result rather than a draft that immediately gets revised.
|
|
350
|
-
|
|
351
|
-
Format:
|
|
352
|
-
```
|
|
353
|
-
Vision: [one sentence]
|
|
354
|
-
Phases: [N] phases, [M] total items
|
|
355
|
-
Phase 1 ([theme]): [items with brief descriptions]
|
|
356
|
-
Phase 2 ([theme]): [items]
|
|
357
|
-
Key decisions: [list]
|
|
358
|
-
Deferred: [items with reasons]
|
|
359
|
-
|
|
360
|
-
## CHALLENGE results
|
|
361
|
-
|
|
362
|
-
Solo pass: [N findings, M applied]
|
|
363
|
-
Codex pass: [N findings, M applied] ← only on --engine auto
|
|
364
|
-
|
|
365
|
-
Changes applied during CHALLENGE:
|
|
366
|
-
- [item]: [what changed and which axis triggered it]
|
|
367
|
-
|
|
368
|
-
Open questions for you (rubric flagged something you explicitly asked for):
|
|
369
|
-
- [item]: rubric says [finding]; you asked for [original]; here is the tradeoff — proceed as-is, or adopt the alternative?
|
|
370
|
-
```
|
|
371
|
-
|
|
372
|
-
Get explicit confirmation before proceeding to DOCUMENT.
|
|
373
|
-
|
|
374
|
-
### Quick Add mode
|
|
375
|
-
|
|
376
|
-
For single-item additions, run one solo rubric pass on just the new item. Do not skip the pass even for a single item: these additions are exactly where overengineering and workarounds slip in unnoticed, because the lack of surrounding context makes a bad item look self-contained and harmless.
|
|
377
|
-
|
|
378
|
-
## Engine Routing for FRAME / EXPLORE / CONVERGE / DOCUMENT
|
|
379
|
-
|
|
380
|
-
**If `--engine codex`**: Phases 1-3 and Phase 4 are delegated to Codex. For each phase, call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "workspace-write"`, and the phase instructions + user context as the prompt. Use `sessionId` to maintain conversational context across phases (note: sandbox/fullAuto only apply on the first call). Claude remains the orchestrator — it reads Codex's output, manages the conversation with the user (confirmation prompts, clarifying questions), and routes findings between phases.
|
|
381
|
-
|
|
382
|
-
**If `--engine auto` or `--engine claude`**: All planning phases use Claude directly (current behavior). Benchmarks of ambiguous-intent handling and writing quality favor Claude for planning tasks.
|
|
383
|
-
|
|
384
|
-
## Phase 4: DOCUMENT
|
|
385
|
-
|
|
386
|
-
<phase_goal>Generate the three-layer document set.</phase_goal>
|
|
387
|
-
|
|
388
|
-
Read the templates before generating:
|
|
389
|
-
- `references/templates/vision.md` — VISION.md format
|
|
390
|
-
- `references/templates/roadmap.md` — ROADMAP.md index format
|
|
391
|
-
- `references/templates/item-spec.md` — Auto-resolve-ready spec format
|
|
392
|
-
- `references/templates/decision.md` — Architecture decision record format
|
|
393
|
-
|
|
394
|
-
### Generation Order
|
|
395
|
-
1. `docs/VISION.md` — from Phase 1 framing + Phase 3 decisions
|
|
396
|
-
2. `docs/roadmap/decisions/` — one file per architecture decision
|
|
397
|
-
3. `docs/roadmap/phase-N/_overview.md` — phase-level context
|
|
398
|
-
4. `docs/roadmap/phase-N/{id}-{name}.md` — one per roadmap item
|
|
399
|
-
5. `docs/ROADMAP.md` — index linking to everything above
|
|
400
|
-
|
|
401
|
-
### Item Spec Quality
|
|
402
|
-
|
|
403
|
-
Each Layer 3 spec is the direct input to auto-resolve. Its quality determines implementation quality.
|
|
404
|
-
|
|
405
|
-
<spec_quality_criteria>
|
|
406
|
-
**Requirements section** — becomes auto-resolve's done-criteria:
|
|
407
|
-
- Testable: a test can assert it OR a human can verify in under 30 seconds
|
|
408
|
-
- Specific: not "handles errors well" but "returns 400 with `{error: 'missing_field', field: 'email'}`"
|
|
409
|
-
- Scoped: tied to this item only, not aspirational
|
|
410
|
-
|
|
411
|
-
**Context section** — 2-3 sentences maximum. Just enough for auto-resolve to understand WHY without loading the full vision.
|
|
412
|
-
|
|
413
|
-
**Out of Scope** — explicitly states what this item does NOT do. This is what prevents auto-resolve from over-building, which is one of its most common failure modes.
|
|
414
|
-
|
|
415
|
-
**Constraints** — technical constraints with reasoning. Auto-resolve respects constraints significantly better when it understands the motivation behind them.
|
|
416
|
-
</spec_quality_criteria>
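
To make the "Testable" and "Specific" criteria concrete, here is what a requirement phrased that way lets a test assert. The handler `create_user` and its return shape are hypothetical and exist only for this illustration; the error payload is the one quoted in the criteria above.

```python
def create_user(payload: dict) -> tuple[int, dict]:
    """Hypothetical handler returning (status_code, body)."""
    if not payload.get("email"):
        return 400, {"error": "missing_field", "field": "email"}
    return 201, {"email": payload["email"]}


def test_missing_email_returns_400() -> None:
    # A specific requirement maps to one exact assertion.
    status, body = create_user({})
    assert status == 400
    assert body == {"error": "missing_field", "field": "email"}
```

A requirement such as "handles errors well" offers nothing equivalent to assert, which is why it fails the criteria.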
|
|
417
|
-
|
|
418
|
-
If an item is too vague to write specific requirements, it needs more exploration (revisit Phase 2 for that item) or should be split into smaller items.
|
|
419
|
-
|
|
420
|
-
### Handling Existing Documents
|
|
421
|
-
In **Expand** and **Replan** modes:
|
|
422
|
-
- Read existing documents first
|
|
423
|
-
- Merge new items into the existing phase structure
|
|
424
|
-
- Preserve existing items (don't overwrite or reorder without confirmation)
|
|
425
|
-
- Update ROADMAP.md index to include new entries
|
|
426
|
-
|
|
427
|
-
### Output Summary
|
|
428
|
-
After generating all documents:
|
|
429
|
-
```
|
|
430
|
-
Documents created:
|
|
431
|
-
- docs/VISION.md
|
|
432
|
-
- docs/ROADMAP.md
|
|
433
|
-
- docs/roadmap/phase-1/_overview.md
|
|
434
|
-
- docs/roadmap/phase-1/1.1-xxx.md
|
|
435
|
-
- docs/roadmap/phase-1/1.2-yyy.md
|
|
436
|
-
- docs/roadmap/decisions/001-xxx.md
|
|
437
|
-
[total: N files]
|
|
438
|
-
```
|
|
439
|
-
|
|
440
|
-
## Phase 5: BRIDGE
|
|
441
|
-
|
|
442
|
-
<phase_goal>Connect documents to the implementation pipeline.</phase_goal>
|
|
443
|
-
|
|
444
|
-
After document generation, output the implementation guide:
|
|
445
|
-
|
|
446
|
-
```
|
|
447
|
-
## Implementation
|
|
448
|
-
|
|
449
|
-
To implement each item:
|
|
450
|
-
/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-1/1.1-xxx.md — read the spec file for requirements, constraints, and scope boundaries"
|
|
451
|
-
|
|
452
|
-
Recommended order (respecting dependencies):
|
|
453
|
-
1. 1.1 [name] — no dependencies
|
|
454
|
-
2. 1.2 [name] — depends on 1.1
|
|
455
|
-
3. 1.3 [name] — depends on 1.1
|
|
456
|
-
...
|
|
457
|
-
|
|
458
|
-
After completing each item:
|
|
459
|
-
1. Update status in the item spec frontmatter (status: done)
|
|
460
|
-
2. Update ROADMAP.md status column
|
|
461
|
-
```
|
|
462
|
-
|
|
463
|
-
The auto-resolve prompt explicitly tells the build agent to read the spec file — this ensures done-criteria are adopted from the spec rather than generated from scratch, preserving the ideation context through to implementation.
|
|
464
|
-
|
|
465
|
-
## Language
|
|
466
|
-
|
|
467
|
-
Generate all documents in the language the user communicates in. If the user mixes languages, match their primary language for prose and keep technical terms in English.
|
|
137
|
+
No `pipeline.state.json` here — that's resolve's surface.
|