devlyn-cli 2.2.2 → 2.3.1

Files changed (220)
  1. package/AGENTS.md +2 -2
  2. package/CLAUDE.md +4 -4
  3. package/README.md +85 -34
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +61 -44
  5. package/benchmark/auto-resolve/BENCHMARK-RESULTS.md +341 -0
  6. package/benchmark/auto-resolve/README.md +307 -44
  7. package/benchmark/auto-resolve/RUBRIC.md +23 -14
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +7 -3
  9. package/benchmark/auto-resolve/fixtures/F10-persist-write-collision/NOTES.md +8 -3
  10. package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/NOTES.md +8 -3
  11. package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/NOTES.md +10 -4
  12. package/benchmark/auto-resolve/fixtures/F15-frozen-diff-race-review/NOTES.md +10 -4
  13. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/NOTES.md +12 -0
  14. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/spec.md +6 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +7 -4
  16. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/NOTES.md +12 -0
  17. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/spec.md +6 -0
  18. package/benchmark/auto-resolve/fixtures/F22-cli-ledger-close/NOTES.md +8 -0
  19. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/NOTES.md +12 -0
  20. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/spec.md +6 -0
  21. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/NOTES.md +16 -4
  22. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/spec.md +7 -0
  23. package/benchmark/auto-resolve/fixtures/F26-cli-payout-ledger-rules/NOTES.md +11 -5
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +8 -1
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +4 -2
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +1 -1
  27. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/NOTES.md +34 -0
  28. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/expected.json +57 -0
  29. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/metadata.json +10 -0
  30. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/setup.sh +2 -0
  31. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/spec.md +67 -0
  32. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/task.txt +7 -0
  33. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/duplicate-event-error.js +35 -0
  34. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/priority-transfer-rollback.js +53 -0
  35. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/expected.json +57 -0
  37. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/setup.sh +2 -0
  39. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/spec.md +70 -0
  40. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/task.txt +3 -0
  41. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/duplicate-renewal-error.js +42 -0
  42. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/priority-credit-rollback.js +70 -0
  43. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +10 -3
  44. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +7 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +5 -0
  46. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +7 -0
  47. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +3 -0
  48. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +1 -1
  49. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +15 -3
  50. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +1 -1
  51. package/benchmark/auto-resolve/fixtures/SCHEMA.md +53 -7
  52. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/NOTES.md +37 -0
  53. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/RETIRED.md +13 -0
  54. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/expected.json +56 -0
  55. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/setup.sh +18 -0
  57. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/spec.md +69 -0
  58. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/task.txt +7 -0
  59. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/exact-proration.js +48 -0
  60. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/rules-source-and-conflict.js +79 -0
  61. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/NOTES.md +54 -0
  62. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/RETIRED.md +7 -0
  63. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/expected.json +67 -0
  64. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/metadata.json +10 -0
  65. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/setup.sh +2 -0
  66. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/spec.md +67 -0
  67. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/task.txt +5 -0
  68. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/policy-precedence.js +72 -0
  69. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-and-immutability.js +43 -0
  70. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-boundary.js +116 -0
  71. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/NOTES.md +35 -0
  72. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/RETIRED.md +12 -0
  73. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/expected.json +58 -0
  74. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/metadata.json +10 -0
  75. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/setup.sh +2 -0
  76. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/spec.md +73 -0
  77. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/task.txt +17 -0
  78. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/mixed-idempotent-settlement.js +53 -0
  79. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/rejection-boundaries.js +74 -0
  80. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/NOTES.md +60 -0
  81. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/RETIRED.md +29 -0
  82. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/expected.json +73 -0
  83. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/metadata.json +10 -0
  84. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/setup.sh +28 -0
  85. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/spec.md +58 -0
  86. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/task.txt +5 -0
  87. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.json +82 -0
  88. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.md +18 -0
  89. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.json +46 -0
  90. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.md +17 -0
  91. package/benchmark/auto-resolve/run-real-benchmark.md +303 -0
  92. package/benchmark/auto-resolve/scripts/audit-headroom-rejections.py +441 -0
  93. package/benchmark/auto-resolve/scripts/audit-pair-evidence.py +1256 -0
  94. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +147 -15
  95. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +28 -16
  96. package/benchmark/auto-resolve/scripts/collect-swebench-predictions.py +11 -1
  97. package/benchmark/auto-resolve/scripts/compile-report.py +208 -46
  98. package/benchmark/auto-resolve/scripts/fetch-swebench-instances.py +22 -4
  99. package/benchmark/auto-resolve/scripts/frozen-verify-gate.py +175 -30
  100. package/benchmark/auto-resolve/scripts/full-pipeline-pair-gate.py +408 -46
  101. package/benchmark/auto-resolve/scripts/headroom-gate.py +270 -39
  102. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +164 -33
  103. package/benchmark/auto-resolve/scripts/iter-0033c-l1-summary.py +97 -0
  104. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +150 -38
  105. package/benchmark/auto-resolve/scripts/judge.sh +153 -26
  106. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +12 -5
  107. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +25 -2
  108. package/benchmark/auto-resolve/scripts/pair-candidate-frontier.py +469 -0
  109. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +5 -5
  110. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +9 -2
  111. package/benchmark/auto-resolve/scripts/pair-rejected-fixtures.sh +91 -0
  112. package/benchmark/auto-resolve/scripts/pair_evidence_contract.py +269 -0
  113. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-case.py +39 -10
  114. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-corpus.py +34 -4
  115. package/benchmark/auto-resolve/scripts/prepare-swebench-solver-worktree.py +23 -5
  116. package/benchmark/auto-resolve/scripts/recent-benchmark-summary.py +232 -0
  117. package/benchmark/auto-resolve/scripts/run-fixture.sh +118 -51
  118. package/benchmark/auto-resolve/scripts/run-frozen-verify-pair.sh +211 -39
  119. package/benchmark/auto-resolve/scripts/run-full-pipeline-pair-candidate.sh +335 -39
  120. package/benchmark/auto-resolve/scripts/run-headroom-candidate.sh +249 -6
  121. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +22 -48
  122. package/benchmark/auto-resolve/scripts/run-suite.sh +44 -7
  123. package/benchmark/auto-resolve/scripts/run-swebench-frozen-corpus.sh +120 -19
  124. package/benchmark/auto-resolve/scripts/run-swebench-solver-batch.sh +32 -14
  125. package/benchmark/auto-resolve/scripts/ship-gate.py +219 -50
  126. package/benchmark/auto-resolve/scripts/solo-ceiling-avoidance.py +53 -0
  127. package/benchmark/auto-resolve/scripts/solo-headroom-hypothesis.py +77 -0
  128. package/benchmark/auto-resolve/scripts/swebench-frozen-matrix.py +239 -26
  129. package/benchmark/auto-resolve/scripts/test-audit-headroom-rejections.sh +288 -0
  130. package/benchmark/auto-resolve/scripts/test-audit-pair-evidence.sh +1672 -0
  131. package/benchmark/auto-resolve/scripts/test-benchmark-arg-parsing.sh +933 -0
  132. package/benchmark/auto-resolve/scripts/test-build-pair-eligible-manifest.sh +491 -0
  133. package/benchmark/auto-resolve/scripts/test-check-f9-artifacts.sh +91 -0
  134. package/benchmark/auto-resolve/scripts/test-frozen-verify-gate.sh +328 -3
  135. package/benchmark/auto-resolve/scripts/test-full-pipeline-pair-gate.sh +497 -18
  136. package/benchmark/auto-resolve/scripts/test-headroom-gate.sh +331 -14
  137. package/benchmark/auto-resolve/scripts/test-iter-0033c-compare.sh +525 -0
  138. package/benchmark/auto-resolve/scripts/test-iter-0033c-l1-summary.sh +254 -0
  139. package/benchmark/auto-resolve/scripts/test-lint-fixtures.sh +580 -0
  140. package/benchmark/auto-resolve/scripts/test-pair-candidate-frontier.sh +591 -0
  141. package/benchmark/auto-resolve/scripts/test-run-full-pipeline-pair-candidate.sh +497 -0
  142. package/benchmark/auto-resolve/scripts/test-run-headroom-candidate.sh +401 -0
  143. package/benchmark/auto-resolve/scripts/test-run-swebench-solver-batch.sh +111 -0
  144. package/benchmark/auto-resolve/scripts/test-ship-gate.sh +1189 -0
  145. package/benchmark/auto-resolve/scripts/test-swebench-frozen-case.sh +924 -5
  146. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/NOTES.md +28 -0
  147. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/expected.json +63 -0
  148. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/metadata.json +10 -0
  149. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/setup.sh +3 -0
  150. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/spec.md +47 -0
  151. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/task.txt +1 -0
  152. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/NOTES.md +34 -0
  153. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/expected.json +53 -0
  154. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/metadata.json +10 -0
  155. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/setup.sh +3 -0
  156. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/spec.md +50 -0
  157. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/task.txt +1 -0
  158. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/duplicate-order-error.js +27 -0
  159. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/priority-stock-reservation.js +44 -0
  160. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/NOTES.md +34 -0
  161. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/expected.json +55 -0
  162. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/metadata.json +10 -0
  163. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/setup.sh +3 -0
  164. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/spec.md +52 -0
  165. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/task.txt +1 -0
  166. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/duplicate-ticket-error.js +29 -0
  167. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/priority-agent-assignment.js +48 -0
  168. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/NOTES.md +34 -0
  169. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/expected.json +55 -0
  170. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/metadata.json +10 -0
  171. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/setup.sh +3 -0
  172. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/spec.md +55 -0
  173. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/task.txt +1 -0
  174. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/duplicate-return-error.js +43 -0
  175. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/priority-return-routing.js +70 -0
  176. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/NOTES.md +37 -0
  177. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/expected.json +54 -0
  178. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/metadata.json +10 -0
  179. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/setup.sh +3 -0
  180. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/spec.md +59 -0
  181. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/task.txt +1 -0
  182. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/credit-ledger-priority.js +98 -0
  183. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/duplicate-charge-error.js +38 -0
  184. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/NOTES.md +36 -0
  185. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/expected.json +56 -0
  186. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/metadata.json +10 -0
  187. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/setup.sh +3 -0
  188. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/spec.md +59 -0
  189. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/task.txt +1 -0
  190. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/duplicate-refund-error.js +41 -0
  191. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/priority-refund-ledger.js +65 -0
  192. package/bin/devlyn.js +221 -17
  193. package/config/skills/_shared/adapters/README.md +3 -0
  194. package/config/skills/_shared/adapters/gpt-5-5.md +5 -1
  195. package/config/skills/_shared/adapters/opus-4-7.md +9 -1
  196. package/config/skills/_shared/archive_run.py +78 -6
  197. package/config/skills/_shared/codex-config.md +5 -4
  198. package/config/skills/_shared/codex-monitored.sh +46 -1
  199. package/config/skills/_shared/collect-codex-findings.py +20 -5
  200. package/config/skills/_shared/engine-preflight.md +17 -13
  201. package/config/skills/_shared/runtime-principles.md +6 -9
  202. package/config/skills/_shared/spec-verify-check.py +2664 -107
  203. package/config/skills/_shared/verify-merge-findings.py +1369 -19
  204. package/config/skills/devlyn:design-ui/SKILL.md +364 -0
  205. package/config/skills/devlyn:ideate/SKILL.md +7 -4
  206. package/config/skills/devlyn:ideate/references/elicitation.md +50 -4
  207. package/config/skills/devlyn:ideate/references/from-spec-mode.md +26 -4
  208. package/config/skills/devlyn:ideate/references/project-mode.md +20 -1
  209. package/config/skills/devlyn:ideate/references/spec-template.md +10 -1
  210. package/config/skills/devlyn:resolve/SKILL.md +78 -26
  211. package/config/skills/devlyn:resolve/references/free-form-mode.md +15 -0
  212. package/config/skills/devlyn:resolve/references/phases/build-gate.md +2 -2
  213. package/config/skills/devlyn:resolve/references/phases/implement.md +1 -1
  214. package/config/skills/devlyn:resolve/references/phases/probe-derive.md +74 -2
  215. package/config/skills/devlyn:resolve/references/phases/verify.md +80 -29
  216. package/config/skills/devlyn:resolve/references/state-schema.md +9 -4
  217. package/package.json +47 -2
  218. package/scripts/lint-fixtures.sh +349 -0
  219. package/scripts/lint-shadow-fixtures.sh +58 -0
  220. package/scripts/lint-skills.sh +3645 -95
@@ -0,0 +1,364 @@
1
+ ---
2
+ name: devlyn:design-ui
3
+ description: Generate N (default 5) radically distinct UI style options from a PRD as portfolio-worthy HTML/CSS samples. Pass a leading integer or `count:N` in the brief to change the count.
4
+ source: project
5
+ ---
6
+
7
+ You are the **Lead Designer** with full creative authority. Create N portfolio-worthy HTML/CSS style samples that help stakeholders visualize design directions. These aren't mockups—they're design statements.
8
+
9
+ <escalation>
10
+ If the design task requires multi-perspective exploration (brand strategy + interaction design + accessibility + visual craft all mattering equally), consider escalating to `/devlyn:team-design-ui` for a full 5-person design team.
11
+ </escalation>
12
+
13
+ <context>
14
+ $ARGUMENTS
15
+ </context>
16
+
17
+ <count_resolution>
18
+ **Resolve N before doing any design work.**
19
+
20
+ 1. If `$ARGUMENTS` begins with a positive integer (e.g. `3`, `7 dark dashboard`), that is N.
21
+ 2. Else if `$ARGUMENTS` contains a `count:N` or `n=N` token (any case), that is N.
22
+ 3. Otherwise N defaults to **5**.
23
+
24
+ Restrict N to the range `1..10`. Values outside that range fall back to the default of 5, and the fallback is noted in the final report. After resolving, use N consistently across every phase below: concept count, file count, output table rows.
25
+
26
+ Strip the count token from the brief before using `$ARGUMENTS` as the product description.
27
+ </count_resolution>
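Review note: the count rules above can be sketched roughly as follows. This is an illustration, not code shipped in the package. `resolveCount` and its return shape are hypothetical, and treating a leading number as a count only when whitespace or end-of-input follows it is an assumption.

```javascript
// Hypothetical helper: resolveCount is not part of devlyn-cli; it only mirrors the rules above.
const DEFAULT_COUNT = 5;
const MIN_COUNT = 1;
const MAX_COUNT = 10;

function resolveCount(rawArguments) {
  const text = (rawArguments ?? "").trim();
  let requested = null;
  let brief = text;

  // Rule 1: a leading positive integer, e.g. "7 dark dashboard".
  const leading = text.match(/^([1-9]\d*)(?=\s|$)/);
  // Rule 2: a `count:N` or `n=N` token anywhere, in any case.
  const token = text.match(/(?:^|\s)(?:count:|n=)(\d+)(?=\s|$)/i);

  if (leading) {
    requested = Number(leading[1]);
    brief = text.slice(leading[0].length);
  } else if (token) {
    requested = Number(token[1]);
    // Strip the count token so only the product description remains.
    brief = text.slice(0, token.index) + " " + text.slice(token.index + token[0].length);
  }

  // Rule 3 and the range check: a missing or out-of-range value falls back to 5.
  const outOfRange =
    requested !== null && (requested < MIN_COUNT || requested > MAX_COUNT);
  const count = requested === null || outOfRange ? DEFAULT_COUNT : requested;

  return { count, brief: brief.replace(/\s+/g, " ").trim(), outOfRange };
}
```

Under these assumptions, `resolveCount("7 dark dashboard")` yields a count of 7 with the brief `dark dashboard`, while `resolveCount("n=12 kiosk")` falls back to 5 and sets `outOfRange` so the final report can mention it.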
28
+
29
+ <input_handling>
30
+ The context above may contain:
31
+
32
+ - **PRD document**: Extract product goals, target users, and brand requirements
33
+ - **Product description**: Parse key features and emotional direction
34
+ - **Image references**: Analyze and replicate the visual style as closely as possible
35
+
36
+ If no input is provided, check for an existing PRD at `docs/prd.md` or `README.md`.
37
+
38
+ ### When Image References Are Provided
39
+
40
+ **Your primary goal shifts to replication, not invention.**
41
+
42
+ 1. **Analyze the reference image(s) precisely:**
43
+
44
+ - Extract exact color values (use color picker precision: #RRGGBB)
45
+ - Identify font characteristics (serif/sans, weight, spacing, size ratios)
46
+ - Map layout structure (grid, spacing rhythm, alignment patterns)
47
+ - Note visual effects (shadows, gradients, blur, textures, border styles)
48
+ - Capture motion cues (if animated reference or implied motion)
49
+
50
+ 2. **Generate designs that match the reference:**
51
+
52
+ - **First ~40% of N designs**: Replicate the reference style as closely as possible, adapting to the PRD's content (with N=5 this is designs 1-2; with N=3 just design 1; with N=10 designs 1-4).
53
+ - **Remaining designs**: Variations that preserve the reference's core aesthetic while exploring different directions within that style.
54
+
55
+ 3. **Fidelity checklist for reference-based designs:**
56
+ - [ ] Color palette within ±5% of reference values
57
+ - [ ] Typography style matches (same category, similar weight/spacing)
58
+ - [ ] Layout proportions preserved
59
+ - [ ] Visual effects replicated (shadows, gradients, textures)
60
+ - [ ] Overall "feel" is recognizably similar to reference
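Review note: the "first ~40% of N" split in step 2 matches a rounded-down 40% in all three stated cases (N=5, N=3, N=10). A minimal sketch, assuming rounding down; the skill text does not say what happens for N=1 or N=2, so the minimum of one replica below is an assumption, and `replicaCount` is a hypothetical name.

```javascript
// Hypothetical helper: how many of the N designs replicate the reference closely.
function replicaCount(n) {
  // Integer form of floor(0.4 * n); reproduces the stated cases for N = 3, 5, 10.
  const fortyPercent = Math.floor((n * 2) / 5);
  // Assumption: at least one design replicates the reference, even for N = 1 or 2.
  return Math.max(1, fortyPercent);
}
```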
61
+
62
+ ### When No Image References Are Provided
63
+
64
+ Follow the standard creative process: invent tension-based concept names, map across spectrums, and generate N radically different directions.
65
+ </input_handling>
66
+
67
+ <instructions>
68
+
69
+ ## Phase 1: Extract Design DNA
70
+
71
+ Keep this brief—creative naming drives the design, not over-analysis.
72
+
73
+ ```
74
+ **Product:** [one sentence]
75
+ **User:** [who, in what context, with what goal]
76
+ **Must convey:** [2-3 essential feelings]
77
+ **Count (N):** [resolved N]
78
+ ```
79
+
80
+ ## Phase 2: Invent N Creative Directions
81
+
82
+ ### Check Existing Styles
83
+
84
+ Read `docs/design/` directory. If `style_K_*.html` files exist, continue numbering from K+1. New styles must be visually distinct from existing ones.
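Review note: a rough sketch of the "continue numbering from K+1" rule, combined with the `docs/design/style_{n}_{concept_name}.html` path from the File Requirements table. `nextStylePaths` is a hypothetical helper, and taking K as the highest existing number is an assumption. It works on a plain list of file names so it stays independent of how the directory is read.

```javascript
// Hypothetical helper; takes the file names already found in docs/design/.
function nextStylePaths(existingFiles, conceptNames) {
  // Highest K among names shaped like style_K_*.html (0 when none exist).
  const highest = existingFiles.reduce((max, file) => {
    const match = file.match(/^style_(\d+)_.+\.html$/);
    return match ? Math.max(max, Number(match[1])) : max;
  }, 0);

  // New styles are numbered K+1, K+2, ... in concept order.
  return conceptNames.map(
    (name, offset) => `docs/design/style_${highest + offset + 1}_${name}.html`
  );
}
```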
85
+
86
+ ### Create N Concept Names
87
+
88
+ **Before any design work, invent N evocative names.**
89
+
90
+ Name format: `[word_A]_[word_B]` where:
91
+
92
+ - Word A and Word B create **tension or contrast**
93
+ - The combination should feel unexpected, not obvious
94
+ - Each word pulls the design in a different direction
95
+
96
+ Good patterns:
97
+
98
+ - [temperature]\_[movement]: warm vs cold, static vs dynamic
99
+ - [texture]\_[era]: rough vs smooth, retro vs futuristic
100
+ - [emotion]\_[structure]: soft vs rigid, chaotic vs ordered
101
+ - [material]\_[concept]: organic vs digital, heavy vs light
102
+
103
+ Avoid:
104
+
105
+ - Single adjectives
106
+ - Obvious pairings without tension
107
+ - Generic descriptors
108
+
109
+ **The name drives the design.** Tension in the name forces creative problem-solving.
110
+
111
+ ### Map Each Concept Across 7 Spectrums
112
+
113
+ For each concept, mark its position. **Extremes create distinctiveness—avoid the middle.**
114
+
115
+ ```
116
+ Concept: [name]
117
+
118
+ Layout: Dense ●○○○○ Spacious
119
+ Color: Monochrome ○○○○● Vibrant
120
+ Typography: Serif ○○●○○ Display
121
+ Depth: Flat ○○○○● Layered
122
+ Energy: Calm ○●○○○ Dynamic
123
+ Theme: Dark ●○○○○ Light
124
+ Shape: Angular ○○○○● Curved
125
+ ```
126
+
127
+ ### Extreme Rule (Mandatory)
128
+
129
+ **Each design MUST have at least 2 extreme positions** (●○○○○ or ○○○○●).
130
+
131
+ Why: Middle positions (○○●○○) converge to "safe" averages. Extremes force distinctive choices.
132
+
133
+ ### Verify Contrast
134
+
135
+ Before proceeding:
136
+
137
+ - [ ] Each design has **2+ extreme positions**
138
+ - [ ] No two concepts share the same position on 4+ spectrums
139
+ - [ ] If N ≥ 2, mix of dark and light themes across the N designs
140
+ - [ ] If N ≥ 2, mix of angular and curved across the N designs
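Review note: the Extreme Rule and the contrast checklist above are mechanical enough to sketch in code. The data model here is hypothetical: each concept maps the seven spectrums to a position from 1 (left pole) to 5 (right pole). Reading positions 1-2 as "dark" or "angular" and 4-5 as "light" or "curved" is an assumption, since the checklist only asks for a "mix".

```javascript
// Hypothetical data model: each concept maps the seven spectrum names to a position 1..5.
const SPECTRUMS = ["layout", "color", "typography", "depth", "energy", "theme", "shape"];

const isExtreme = (position) => position === 1 || position === 5;

function extremeCount(concept) {
  return SPECTRUMS.filter((s) => isExtreme(concept.positions[s])).length;
}

function sharedPositions(a, b) {
  return SPECTRUMS.filter((s) => a.positions[s] === b.positions[s]).length;
}

// Assumption: positions 1-2 count as the left pole and 4-5 as the right pole.
function hasBothPoles(concepts, spectrum) {
  const values = concepts.map((c) => c.positions[spectrum]);
  return values.some((v) => v < 3) && values.some((v) => v > 3);
}

function verifyContrast(concepts) {
  const problems = [];
  // Each design needs at least 2 extreme positions.
  for (const concept of concepts) {
    if (extremeCount(concept) < 2) {
      problems.push(`${concept.name}: fewer than 2 extreme positions`);
    }
  }
  // No two concepts may share the same position on 4 or more spectrums.
  for (let i = 0; i < concepts.length; i++) {
    for (let j = i + 1; j < concepts.length; j++) {
      if (sharedPositions(concepts[i], concepts[j]) >= 4) {
        problems.push(`${concepts[i].name} / ${concepts[j].name}: same position on 4+ spectrums`);
      }
    }
  }
  // The theme and shape mixes only apply when there are at least 2 designs.
  if (concepts.length >= 2) {
    if (!hasBothPoles(concepts, "theme")) problems.push("no mix of dark and light themes");
    if (!hasBothPoles(concepts, "shape")) problems.push("no mix of angular and curved shapes");
  }
  return problems;
}
```

An empty array means every checklist item passed. The sample spectrum map shown earlier would be encoded as `{ layout: 1, color: 5, typography: 3, depth: 5, energy: 2, theme: 1, shape: 5 }`.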
141
+
142
+ ## Phase 3: Define Concrete Specifications
143
+
144
+ For each concept, specify exact values—no adjectives.
145
+
146
+ ```
147
+ ### [Concept Name]
148
+
149
+ **Palette:**
150
+ - Background: #______
151
+ - Surface: #______
152
+ - Text: #______
153
+ - Text muted: #______
154
+ - Accent: #______
155
+
156
+ **Typography:**
157
+ - Font: [Google Font name]
158
+ - Headline: [size]px / [weight] / [letter-spacing]em
159
+ - Body: [size]px / [weight] / [line-height]
160
+
161
+ **Spacing:**
162
+ - Container max-width: [value]px
163
+ - Section padding: [value]px
164
+ - Element gap: [value]px
165
+ - Border-radius: [value]px
166
+
167
+ **Motion:**
168
+ - Duration: [value]s
169
+ - Easing: cubic-bezier([values])
170
+ - Stagger delay: [value]s
171
+ ```
172
+
173
+ ## Phase 4: Generate HTML Files
174
+
175
+ <use_parallel_tool_calls>
176
+ Write all N HTML files simultaneously by making N independent Write tool calls in a single response. These files have no dependencies on each other—do not write them sequentially. Maximize parallel execution for speed.
177
+ </use_parallel_tool_calls>
178
+
179
+ <frontend_aesthetics>
180
+ You tend to converge toward generic outputs. Avoid this:
181
+
182
+ **Typography:** Never use Inter, Roboto, Arial, Helvetica, Open Sans, Space Grotesk, or system fonts. Choose distinctive typefaces. Use weight extremes (100 vs 900, not 400 vs 600). Dramatic size jumps (3x+). Tight headline letter-spacing (-0.02em to -0.05em).
183
+
184
+ **Color:** One dominant + one sharp accent. Never pure #FFFFFF or #000000 backgrounds—add subtle tint. No purple gradients.
185
+
186
+ **Motion:** Focus on high-impact moments, not scattered micro-interactions.
187
+
188
+ - **Page load**: Orchestrated staggered reveals (vary `animation-delay` by 0.05-0.1s increments)
189
+ - **Scroll**: Use `IntersectionObserver` for scroll-triggered fade-ins (vanilla JS, no frameworks)
190
+ - **Hover**: Transform + opacity + subtle shadow shifts, not just color changes
191
+ - **Transitions**: Custom `cubic-bezier` easings that feel physical (e.g., `cubic-bezier(0.34, 1.56, 0.64, 1)` for bounce)
192
+ - **Advanced**: Gradient animations via `background-position`, `backdrop-filter` transitions, CSS `@property` for animatable custom properties
193
+ - **Restraint**: One dramatic sequence beats many small animations. If everything moves, nothing stands out.
194
+
195
+ **Backgrounds:** Never flat solid colors. Layer gradients, add subtle noise/grain, create atmosphere.
196
+
197
+ **Layout:** Break at least one standard pattern per design. Try asymmetry, overlap, bento grids, diagonal flow, or unexpected whitespace.
198
+ </frontend_aesthetics>
199
+
200
+ ### File Requirements
201
+
202
+ | Requirement | Details |
203
+ | ------------------ | ------------------------------------------------- |
204
+ | **Path** | `docs/design/style_{n}_{concept_name}.html` |
205
+ | **Content** | Realistic view matching product purpose |
206
+ | **Self-contained** | Inline CSS, only Google Fonts external |
207
+ | **Interactivity** | Hover, active, focus states + page load animation |
208
+ | **Responsive** | Basic mobile adaptation |
209
+ | **Real content** | Actual copy from PRD, no lorem ipsum |
210
+
211
+ ### HTML Structure
212
+
213
+ ```html
214
+ <!DOCTYPE html>
215
+ <html lang="en">
216
+ <head>
217
+ <meta charset="UTF-8" />
218
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
219
+ <title>[Product] - [Concept]</title>
220
+
221
+ <link rel="preconnect" href="https://fonts.googleapis.com" />
222
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
223
+ <link href="https://fonts.googleapis.com/css2?family=[Font]:[weights]&display=swap" rel="stylesheet" />
224
+
225
+ <style>
226
+ /* Concept: [name]
227
+ Spectrum: L[x] C[x] T[x] D[x] E[x] Th[x] Sh[x]
228
+ Extremes: [list which 2+ are extreme] */
229
+
230
+ :root {
231
+ --bg: #[hex];
232
+ --surface: #[hex];
233
+ --text: #[hex];
234
+ --text-muted: #[hex];
235
+ --accent: #[hex];
236
+ }
237
+
238
+ * {
239
+ margin: 0;
240
+ padding: 0;
241
+ box-sizing: border-box;
242
+ }
243
+
244
+ body {
245
+ font-family: "[Font]", sans-serif;
246
+ background: var(--bg);
247
+ color: var(--text);
248
+ }
249
+
250
+ /* Page load: staggered reveal */
251
+ .reveal {
252
+ opacity: 0;
253
+ transform: translateY(20px);
254
+ animation: fadeUp 0.6s cubic-bezier(0.16, 1, 0.3, 1) forwards;
255
+ }
256
+ .reveal:nth-child(1) {
257
+ animation-delay: 0.1s;
258
+ }
259
+ .reveal:nth-child(2) {
260
+ animation-delay: 0.15s;
261
+ }
262
+ .reveal:nth-child(3) {
263
+ animation-delay: 0.2s;
264
+ }
265
+
266
+ @keyframes fadeUp {
267
+ to {
268
+ opacity: 1;
269
+ transform: translateY(0);
270
+ }
271
+ }
272
+
273
+ /* Scroll-triggered: hidden until in view */
274
+ .scroll-reveal {
275
+ opacity: 0;
276
+ transform: translateY(30px);
277
+ transition: opacity 0.6s cubic-bezier(0.16, 1, 0.3, 1), transform 0.6s cubic-bezier(0.16, 1, 0.3, 1);
278
+ }
279
+ .scroll-reveal.visible {
280
+ opacity: 1;
281
+ transform: translateY(0);
282
+ }
283
+
284
+ /* Hover: physical-feeling bounce */
285
+ .interactive {
286
+ transition: transform 0.3s cubic-bezier(0.34, 1.56, 0.64, 1), box-shadow 0.3s ease;
287
+ }
288
+ .interactive:hover {
289
+ transform: translateY(-4px);
290
+ box-shadow: 0 12px 24px -8px rgba(0, 0, 0, 0.15);
291
+ }
292
+ </style>
293
+ </head>
294
+ <body>
295
+ <!-- Semantic HTML with real content -->
296
+
297
+ <script>
298
+ // Scroll-triggered animations
299
+ const observer = new IntersectionObserver(
300
+ (entries) => {
301
+ entries.forEach((entry) => {
302
+ if (entry.isIntersecting) {
303
+ entry.target.classList.add("visible");
304
+ }
305
+ });
306
+ },
307
+ { threshold: 0.1 }
308
+ );
309
+
310
+ document.querySelectorAll(".scroll-reveal").forEach((el) => observer.observe(el));
311
+ </script>
312
+ </body>
313
+ </html>
314
+ ```
315
+
316
+ ## Phase 5: Verify Quality
317
+
318
+ ### Per-Design Checklist
319
+
320
+ - [ ] Font is distinctive (not Inter/Roboto/Arial/system)
321
+ - [ ] Background has depth (not flat white/black)
322
+ - [ ] Page load animation with staggered delays
323
+ - [ ] Scroll-triggered reveals on below-fold content
324
+ - [ ] Hover states with transform + shadow (not just color)
325
+ - [ ] Custom easing (cubic-bezier), not default `ease` or `linear`
326
+ - [ ] CSS custom properties for colors
327
+ - [ ] Layout breaks at least one standard pattern
328
+
329
+ ### Cross-Design Contrast
330
+
331
+ Each pair of designs must have 5+ obvious visual differences. If not, revise. (Skipped automatically when N=1.)
332
+
333
+ ## Phase 6: Save & Report
334
+
335
+ Create `docs/design/` directory if needed. Save all N HTML files.
336
+
337
+ </instructions>
338
+
339
+ <output_format>
340
+
341
+ ```
342
+ ## Generated Styles (N = {N})
343
+
344
+ | # | Name | Spectrum (L/C/T/D/E/Th/Sh) | Extremes | Palette | Font |
345
+ |---|------|---------------------------|----------|---------|------|
346
+ | {k} | {name} | [x][x][x][x][x][x][x] | {which 2+} | #___, #___, #___ | {font} |
347
+
348
+ ### Files
349
+ - docs/design/style_{k}_{name}.html
350
+ - ...
351
+
352
+ ### Rationale
353
+ 1. **{name}**: [1 sentence connecting to product requirements]
354
+ 2. ...
355
+ ```
356
+
357
+ </output_format>
358
+
359
+ Make bold choices. Each design should be portfolio-worthy—something you'd proudly present.
360
+
361
+ <next_step>
362
+ After the user picks a style, suggest:
363
+ → Run `/devlyn:design-system [style-number]` to extract design tokens from the chosen style into a reusable design system reference.
364
+ </next_step>
@@ -82,7 +82,7 @@ The elicitation agent:
82
82
  5. Stops when the structural lint passes AND user confirms, or 8 turns elapsed.
83
83
 
84
84
  Structural lint (inline check, no script needed):
85
- - Frontmatter has `id`, `title`, `kind`, `status: planned`.
85
+ - Frontmatter has `id`, `title`, `kind`, `status: planned`, `complexity`.
86
86
  - `## Context` non-empty (≥ 1 sentence).
87
87
  - `## Requirements` has ≥ 1 `- [ ]` bullet.
88
88
  - `## Out of Scope` present (may list "none" if truly nothing).
@@ -91,8 +91,9 @@ Structural lint (inline check, no script needed):
91
91
  After lint passes:
92
92
  1. Write `<spec-dir>/<id>-<slug>/spec.md` (the spec).
93
93
  2. Generate `<spec-dir>/<id>-<slug>/spec.expected.json` from the spec's `## Verification` block + any `forbidden_patterns` / `required_files` / `forbidden_files` / `max_deps_added` the conversation surfaced.
94
- 3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If exit 2, fix the carrier and re-run.
95
- 4. Print: `spec ready /devlyn:resolve --spec <spec-path>`.
94
+ 3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape, supported `complexity` frontmatter, and any present actionable solo-headroom hypothesis; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`. If exit 2, fix the carrier/frontmatter/hypothesis and re-run.
95
+ 4. Run `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` to validate sibling `spec.expected.json` against `_shared/expected.schema.json` plus sibling spec `complexity` frontmatter and any present actionable solo-headroom hypothesis; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`. If exit 2, fix the JSON/frontmatter/hypothesis and re-run.
96
+ 5. Print: `spec ready — /devlyn:resolve --spec <spec-path>`.
96
97
 
97
98
  ## PHASE 1Q: QUICK MODE
98
99
 
@@ -103,6 +104,7 @@ Single-turn assume-and-confirm. Prompt body: see `references/elicitation.md` §
103
104
  3. User responds with "go" / "fix X" / "no, different".
104
105
  4. On "go": write spec + spec.expected.json + lint + announce.
105
106
  5. On "fix X": apply correction, re-show, ask again. Maximum 3 correction rounds before escalating to default mode.
107
+ 6. Exception: for benchmark, risk-probe, or pair-evidence goals, do not infer a solo-headroom hypothesis. Ask for the actionable hypothesis first; if unavailable, exit with `spec not ready — solo-headroom hypothesis required`. For new unmeasured benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence candidates, also do not infer solo ceiling avoidance; ask for the concrete difference from rejected or solo-saturated controls such as `S2`-`S6`, and exit with `spec not ready — solo ceiling avoidance required` if unavailable.
106
108
 
107
109
  ## PHASE 1F: FROM-SPEC MODE
108
110
 
@@ -114,7 +116,8 @@ Prompt body: `references/from-spec-mode.md`.
114
116
  4. Apply structural fixes only — do NOT reshape Requirements / Out-of-Scope content. The user's substantive intent is preserved.
115
117
  5. Generate `spec.expected.json` if absent (best-effort from `## Verification` block).
116
118
  6. Write the normalized spec back to `<spec-dir>/<id>-<slug>/` (preserves original at `<path>` untouched unless user passes `--in-place`).
117
- 7. Lint pass announce. Lint fail → surface the unfixable issue and exit non-zero.
119
+ 7. Run both lint checks: `--check <spec-path>` and `--check-expected <expected-path>`.
120
+ 8. Lint pass → announce. Lint fail → surface the unfixable issue and exit non-zero. If the source is a pair-evidence candidate without an actionable solo-headroom hypothesis, the announcement must say `pair-evidence not ready` instead of implying measurement readiness.
118
121
 
119
122
  ## PHASE 1P: PROJECT MODE
120
123
 
@@ -30,7 +30,37 @@ For most coding tasks, the under-specified blanks are:
30
30
  3. **Failure shape**: what happens on bad input? Exit code, error message format, fallback behavior (silent vs visible)?
31
31
  4. **Scope boundary**: which files are in-scope, which are out-of-scope? "Don't touch the auth module" is a boundary worth surfacing.
32
32
  5. **Constraints**: dependency policy (new deps allowed?), silent-catch policy, type-system escape policy, test coverage expectations.
33
- 6. **Verification**: how does the user know it worked? Pick the smallest concrete check.
33
+ 6. **Complexity signal**: set spec frontmatter `complexity` to `high` when
34
+ the spec needs a compound scenario crossing state mutation with ordering,
35
+ idempotency, auth/error priority, rollback/failure handling, or exact output
36
+ shape. This is a downstream VERIFY pair-trigger signal, not a vague
37
+ difficulty label.
38
+ 7. **Verification**: how does the user know it worked? Pick the smallest concrete check.
39
+ If the goal combines state mutation with ordering/priority, idempotency,
40
+ auth/error priority, or exact output shape, ask for one concrete compound
41
+ scenario that exercises the interaction end-to-end instead of accepting only
42
+ isolated happy-path checks.
43
+ 8. **Pair-candidate headroom**: when the user is creating a benchmark, risk
44
+ probe, or pair-evidence candidate, ask for one solo-headroom hypothesis in
45
+ actionable form: the spec must literally contain `solo-headroom hypothesis`,
46
+ `solo_claude`, `miss`, and a backticked observable command while naming the
47
+ visible behavior a capable `solo_claude` baseline should miss; the backticked
48
+ line itself must contain `miss` and be framed as the command/observable that exposes it. If the
49
+ answer is only "the task is hard", rework the candidate before spending provider
50
+ calls. Do not write a benchmark/risk-probe/pair-evidence spec until this
51
+ hypothesis is actionable; if the user cannot provide it, stop with
52
+ `spec not ready — solo-headroom hypothesis required` and ask them to return
53
+ with the visible behavior `solo_claude` is expected to miss.
54
+ 9. **Solo ceiling avoidance**: for a new unmeasured benchmark, shadow-fixture,
55
+ golden-fixture, risk-probe, or pair-evidence candidate, ask how this candidate
56
+ differs from rejected or solo-saturated controls such as `S2`-`S6`. The note
57
+ must literally contain `solo ceiling avoidance`, mention `solo_claude`, and
58
+ name the concrete difference expected to preserve `solo_claude` headroom.
59
+ Benchmark fixture directories put this in `NOTES.md` as
60
+ `## Solo ceiling avoidance`; ordinary specs keep it in `## Verification`
61
+ next to the solo-headroom hypothesis. Do not write or measure the candidate
62
+ if this answer is missing; stop with
63
+ `spec not ready — solo ceiling avoidance required`.
34
64
 
35
65
  Walk through these in roughly this order. Skip the ones already clear from the user's initial text.
36
66
  </missing_decisions_to_surface>
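
The literal-token contract in items 8 and 9 above can be approximated mechanically. The following is a minimal Python sketch under stated assumptions, not the logic of `spec-verify-check.py`: the function names are hypothetical, "the backticked line" is read as a single line that holds both `miss` and a backticked command, and the ceiling-note check is case-insensitive so that a `## Solo ceiling avoidance` heading counts.

```python
import re

# Literal tokens the elicitation notes require for a solo-headroom hypothesis.
HYPOTHESIS_LITERALS = ("solo-headroom hypothesis", "solo_claude", "miss")
# Literal tokens required for a solo ceiling avoidance note.
CEILING_LITERALS = ("solo ceiling avoidance", "solo_claude")

BACKTICKED = re.compile(r"`([^`]+)`")


def has_actionable_hypothesis(spec_text: str) -> bool:
    """Hypothetical check: all literals are present, and at least one line
    holds both the word 'miss' and a backticked command."""
    if not all(token in spec_text for token in HYPOTHESIS_LITERALS):
        return False
    for line in spec_text.splitlines():
        # A backticked mention of a required literal is not a command.
        commands = [c for c in BACKTICKED.findall(line) if c not in HYPOTHESIS_LITERALS]
        if commands and "miss" in line:
            return True
    return False


def has_ceiling_note(text: str) -> bool:
    """Hypothetical check for the literal tokens only (case-insensitive by
    assumption). Whether the note names a concrete difference from the
    rejected controls still needs a reader's judgment."""
    lowered = text.lower()
    return all(token in lowered for token in CEILING_LITERALS)
```

A check like this can only reject drafts that miss the literal contract. An answer such as "the task is hard" can still slip through if it happens to contain the tokens, so the rework decision stays with the elicitation conversation.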
@@ -56,14 +86,18 @@ When you're about to ask the user a question, look at the draft first — if the
56
86
 
57
87
  <lint>
58
88
  Before declaring the spec ready, verify structurally:
59
- - Frontmatter has `id`, `title`, `kind`, `status: planned`.
89
+ - Frontmatter has `id`, `title`, `kind`, `status: planned`, `complexity`.
60
90
  - All 5 H2 sections present (`## Context`, `## Requirements`, `## Constraints`, `## Out of Scope`, `## Verification`).
61
91
  - Requirements ≥ 1 bullet.
62
92
  - Verification has either ≥ 1 named command OR the explicit pure-design escape phrase.
63
93
 
64
94
  If the lint fails, fix the missing piece (ask one focused question if needed) before announcing.
65
95
 
66
- After lint passes, also run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If exit 2: read the stderr message, fix the carrier, re-run. Exit 0 = ready.
96
+ After lint passes, run both mechanical checks:
97
+ 1. `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` validates the spec's verification carrier shape, supported `complexity` frontmatter, and any present actionable solo-headroom hypothesis; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`.
98
+ 2. `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` validates sibling `spec.expected.json` against `_shared/expected.schema.json` plus sibling spec `complexity` frontmatter and any present actionable solo-headroom hypothesis; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`.
99
+
100
+ If either exits 2: read the stderr message, fix the malformed carrier or JSON, and re-run the failed command. Both commands must exit 0 before ready.
67
101
  </lint>
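
The two mechanical checks described in this lint block could be driven from a small wrapper. This is a sketch only: the script path and the `--check` / `--check-expected` flags are taken from the text above, while `build_commands`, `run_subprocess`, and `spec_is_ready` are hypothetical helper names. Any non-zero exit is treated as "not ready", which is an assumption that goes slightly beyond the exit-2 case the text describes.

```python
import subprocess
from typing import Callable, List, Tuple

CHECKER = ".claude/skills/_shared/spec-verify-check.py"

# A runner takes an argv list and returns (exit code, stderr text).
Runner = Callable[[List[str]], Tuple[int, str]]


def build_commands(spec_path: str, expected_path: str) -> List[List[str]]:
    """Return the two checker invocations in the order the lint block lists them."""
    return [
        ["python3", CHECKER, "--check", spec_path],
        ["python3", CHECKER, "--check-expected", expected_path],
    ]


def run_subprocess(argv: List[str]) -> Tuple[int, str]:
    """Default runner: execute the command and capture stderr for the fix step."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def spec_is_ready(
    spec_path: str, expected_path: str, run: Runner = run_subprocess
) -> Tuple[bool, List[str]]:
    """Run both checks; the spec is ready only when both exit 0."""
    problems: List[str] = []
    for argv in build_commands(spec_path, expected_path):
        code, stderr = run(argv)
        if code != 0:
            problems.append(f"{argv[2]} exited {code}: {stderr.strip()}")
    return (not problems, problems)
```

Passing the runner as a parameter keeps the wrapper testable without the checker script on disk; in normal use the default `run_subprocess` is enough.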
68
102
 
69
103
  <output>
@@ -83,9 +117,21 @@ When `--quick` is set:
83
117
  1. AI synthesizes spec from the one-line goal — fill every section with the most reasonable inference.
84
118
  2. AI presents the spec to the user with an explicit `## Assumptions made` block listing every inferred decision (one bullet each).
85
119
  3. User responds with "go" / "fix X to be Y" / "no, different".
86
- 4. On "go": write the spec + spec.expected.json, run lint, announce.
120
+ 4. On "go": write the spec + spec.expected.json, run both lint checks, announce.
87
121
  5. On "fix X": apply correction, re-present, ask again. Maximum 3 correction rounds before escalating to default mode.
88
122
 
123
+ Exception: quick mode must not infer a solo-headroom hypothesis for benchmark,
124
+ risk-probe, or pair-evidence goals. If the one-line goal lacks the actionable
125
+ `solo-headroom hypothesis` / `solo_claude` / `miss` / backticked-command
126
+ contract, ask exactly one focused follow-up for that hypothesis before showing a
127
+ draft; if the user cannot provide it, exit with
128
+ `spec not ready — solo-headroom hypothesis required`. For a new unmeasured
129
+ benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence
130
+ candidate, quick mode also must not infer the `solo ceiling avoidance` note; ask
131
+ for the concrete difference from rejected or solo-saturated controls such as
132
+ `S2`-`S6`, and exit with `spec not ready — solo ceiling avoidance required` if
133
+ the user cannot provide it.
134
+
89
135
  Quick mode trades thoroughness for speed. Use it for trivial-medium tasks where the user has a clear-enough goal that one round of inference + correction is sufficient.
90
136
 
91
137
  ## Anti-patterns
@@ -14,7 +14,7 @@ The user already wrote a spec (or has one from elsewhere — a teammate, a previ
14
14
 
15
15
  <allowed_changes>
16
16
  You may:
17
- 1. Add missing frontmatter fields (id from filename, kind=feature default, status=planned).
17
+ 1. Add missing frontmatter fields (id from filename, kind=feature default, status=planned, complexity=medium default; set complexity=high only when preserved Requirements clearly combine state/order/failure/output-shape risks).
18
18
  2. Rename non-canonical section headings to canonical (`## Goals` → `## Requirements`, `## Notes` ignored unless they clearly belong in Constraints).
19
19
  3. Add a missing `## Out of Scope` section with `- (no explicit non-goals provided by author)`.
20
20
  4. Add a missing `## Verification` section if Requirements imply observable runtime checks — best-effort one-command-per-Requirement, then surface to user for review.
@@ -37,14 +37,36 @@ You must NOT:
37
37
  3. For each missing/malformed piece: apply the smallest allowed fix.
38
38
  4. Write the normalized spec. Default location: `<spec-dir>/<id>-<slug>/spec.md`. With `--in-place` flag: write to `<path>` directly (overwrites the original).
39
39
  5. Generate or fix `spec.expected.json` per the rules above. Same dir as the spec.
40
- 6. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate.
41
- 7. If lint still fails after allowed fixes (e.g. Requirements section is empty in the source), surface the issue and exit non-zero do NOT invent Requirements.
40
+ 6. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the spec carrier and supported `complexity` frontmatter; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`.
41
+ 7. Run `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` to validate sibling `spec.expected.json` plus sibling spec `complexity` frontmatter and any present solo-headroom hypothesis command against `spec.expected.json.verification_commands[].cmd`.
42
+ 8. If lint still fails after allowed fixes (e.g. Requirements section is empty in the source), surface the issue and exit non-zero — do NOT invent Requirements.
43
+ 9. If the preserved Requirements combine state mutation with ordering/priority,
44
+ idempotency, auth/error priority, or exact output shape but the Verification
45
+ section lacks a compound end-to-end scenario, do not rewrite the author's
46
+ content. Add a final warning that `/devlyn:resolve` may need default-mode
47
+ ideation or a stronger Verification section before pair-relevant risks are
48
+ measurable.
49
+ 10. If the source is a benchmark, risk probe, or pair-evidence candidate and it
50
+ lacks an actionable solo-headroom hypothesis, do not invent one. Add a final
51
+ warning that the candidate may be solo-saturated until Context or
52
+ Verification literally contains `solo-headroom hypothesis`, `solo_claude`,
53
+ `miss`, and a backticked observable command while naming the visible
54
+ behavior a capable `solo_claude` baseline is expected to miss; the
55
+ backticked line itself must contain `miss` and be framed as the
56
+ command/observable that exposes it. Do not call the normalized spec pair-evidence ready.
57
+ 11. If the source is a new unmeasured benchmark, shadow-fixture, golden-fixture,
58
+ risk-probe, or pair-evidence candidate and it lacks a solo ceiling avoidance
59
+ note, do not invent one. Add a final warning that the candidate may replay
60
+ rejected or solo-saturated controls until Context, Verification, or fixture
61
+ `NOTES.md` literally contains `solo ceiling avoidance`, mentions
62
+ `solo_claude`, and names a concrete difference from rejected controls such
63
+ as `S2`-`S6`. Do not call the normalized spec pair-evidence ready.
42
64
  </flow>
43
65
 
44
66
  <output>
45
67
  Same as default mode: `<spec-dir>/<id>-<slug>/spec.md` + `<spec-dir>/<id>-<slug>/spec.expected.json`.
46
68
 
47
- Final announcement: `spec normalized — /devlyn:resolve --spec <spec-path>`. If the spec was lint-passing with no changes needed, announce: `spec already canonical — /devlyn:resolve --spec <spec-path>`.
69
+ Final announcement: `spec normalized — /devlyn:resolve --spec <spec-path>`. If the spec was lint-passing with no changes needed, announce: `spec already canonical — /devlyn:resolve --spec <spec-path>`. If step 9 applies, append: `warning: Verification may need one compound end-to-end scenario before pair-relevant risks are measurable`. If step 10 applies, append: `pair-evidence not ready — Pair-candidate headroom is unproven until the spec states a solo-headroom hypothesis`. If step 11 applies, append: `pair-evidence not ready — Pair-candidate headroom is unproven until the spec states solo ceiling avoidance`.
48
70
 
49
71
  If lint failed unfixably: print the specific failure, exit non-zero. Do not write a partial output.
50
72
  </output>
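
The frontmatter defaults that from-spec mode is allowed to add (id from filename, `kind=feature`, `status=planned`, `complexity=medium`) amount to filling only the keys the author left out. A minimal sketch, assuming the frontmatter has already been parsed into a flat dictionary and that "id from filename" means the filename stem; `fill_frontmatter_defaults` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict


def fill_frontmatter_defaults(frontmatter: Dict[str, str], source_path: str) -> Dict[str, str]:
    """Add only the missing fields; never overwrite what the author wrote."""
    filled = dict(frontmatter)
    # Assumption: the spec id is the source filename without its extension.
    filled.setdefault("id", Path(source_path).stem)
    filled.setdefault("kind", "feature")
    filled.setdefault("status", "planned")
    # `high` is chosen by judgment from the preserved Requirements, not here.
    filled.setdefault("complexity", "medium")
    return filled
```

Using `setdefault` mirrors the "structural fixes only" rule: an author-supplied `complexity: high` survives normalization untouched.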
@@ -11,6 +11,25 @@ The user wants to build a project, not a single feature. Your job is to elicit t
11
11
  2. Ask the same question categories as default mode (input/output/failure/scope/constraints/verification) but at the project level first, then drill into each feature.
12
12
  3. Decompose into 3-7 features. Fewer = the project is actually one big feature; recommend default mode. More = the project is too large; recommend splitting into separate ideate runs.
13
13
  4. Each feature must be independently shippable: a feature whose verification depends on another feature's runtime behavior is a dependency, not a feature.
14
+ 5. When a feature combines state mutation with ordering/priority, idempotency,
15
+ auth/error priority, or exact output shape, its per-feature Verification must
16
+ include one compound end-to-end scenario; do not hide the interaction in
17
+ project-level prose.
18
+ 6. When a feature is intended as a benchmark, risk probe, or pair-evidence
19
+ candidate, its per-feature Verification must include a solo-headroom
20
+ hypothesis. The feature spec must literally contain
21
+ `solo-headroom hypothesis`, `solo_claude`, `miss`, and a backticked
22
+ observable command while naming the visible behavior a capable
23
+ `solo_claude` baseline is expected to miss; the backticked line itself must
24
+ contain `miss` and be framed as the command/observable that exposes it. Do not defer that to
25
+ project-level prose, and rework the feature spec if the hypothesis is only
26
+ "the task is hard".
27
+ 7. When a feature is a new unmeasured benchmark, shadow-fixture, golden-fixture,
28
+ risk-probe, or pair-evidence candidate, its per-feature Verification must also include a solo ceiling avoidance note. The feature spec must literally
29
+ contain `solo ceiling avoidance`, mention `solo_claude`, and name a concrete
30
+ difference from rejected or solo-saturated controls such as `S2`-`S6`. Do not
31
+ defer that to project-level prose; benchmark fixture directories mirror the
32
+ same note in `NOTES.md` as `## Solo ceiling avoidance`.
14
33
  </conversation_rules>
15
34
 
16
35
  <decomposition_rules>
@@ -63,7 +82,7 @@ Anything binding all features (e.g. "no new top-level dependencies", "all CLI ou
63
82
  - `<spec-dir>/<id-N>/spec.md` for each feature (per `references/spec-template.md`).
64
83
  - `<spec-dir>/<id-N>/spec.expected.json` for each feature (per `_shared/expected.schema.json`).
65
84
 
66
- Each per-feature spec is structurally lint-validated using `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>`.
85
+ Each per-feature spec is structurally lint-validated using `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>`, which also covers supported `complexity` frontmatter and any present actionable solo-headroom hypothesis. Each sibling expected contract is validated using `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>`, which also covers the sibling spec's `complexity` frontmatter and any present actionable solo-headroom hypothesis; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`.
67
86
 
68
87
  Final announcement: `project ready — N specs at <spec-dir>/. Start with /devlyn:resolve --spec <first-spec-path>`.
69
88
  </output>
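
The rule that a hypothesis command must match `spec.expected.json.verification_commands[].cmd` can be illustrated with a short lookup. This is a sketch under assumptions: the `verification_commands` list of objects with a `cmd` key follows the path named above, but comparing by exact string equality after trimming whitespace is a guess at what "match" means, and `hypothesis_command_matches` is a hypothetical name.

```python
import json
from typing import Optional


def hypothesis_command_matches(expected_json: str, hypothesis_cmd: Optional[str]) -> bool:
    """Return True when there is no hypothesis, or when its command appears
    among the expected contract's verification commands."""
    if hypothesis_cmd is None:
        return True  # nothing to match when the spec has no hypothesis
    expected = json.loads(expected_json)
    commands = {
        entry.get("cmd", "").strip()
        for entry in expected.get("verification_commands", [])
    }
    return hypothesis_cmd.strip() in commands
```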
@@ -10,6 +10,7 @@ id: "<spec-id>" # kebab-case, unique per spec-dir; auto-generated if us
10
10
  title: "<short title>" # one line, descriptive
11
11
  kind: feature # feature | spike | prototype
12
12
  status: planned # planned → in_progress → done. ideate writes "planned"; resolve's CLEANUP flips to "done".
13
+ complexity: medium # trivial | medium | high. Use high when Verification needs compound state/order/failure checks.
13
14
  depends_on: [] # list of spec ids this depends on (empty for standalone). project mode populates this.
14
15
  ---
15
16
  ```
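
The frontmatter template above, together with the structural lint (`id`, `title`, `kind`, `status: planned`, `complexity`), can be checked with a few lines of Python. This is a minimal sketch, not the shipped checker: the parser handles only flat `key: value  # comment` lines rather than full YAML, and both function names are hypothetical.

```python
from typing import Dict, List

ALLOWED_KIND = {"feature", "spike", "prototype"}
ALLOWED_COMPLEXITY = {"trivial", "medium", "high"}
REQUIRED_KEYS = ("id", "title", "kind", "status", "complexity")


def parse_frontmatter(block: str) -> Dict[str, str]:
    """Tiny parser for flat `key: value  # comment` lines (not full YAML)."""
    fields: Dict[str, str] = {}
    for raw in block.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line == "---" or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip().strip('"')
    return fields


def lint_frontmatter(fields: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the frontmatter passes."""
    problems = [f"missing `{key}`" for key in REQUIRED_KEYS if not fields.get(key)]
    if fields.get("status") and fields["status"] != "planned":
        problems.append("status must be `planned` when ideate writes the spec")
    if fields.get("kind") and fields["kind"] not in ALLOWED_KIND:
        problems.append(f"unsupported kind `{fields['kind']}`")
    if fields.get("complexity") and fields["complexity"] not in ALLOWED_COMPLEXITY:
        problems.append(f"unsupported complexity `{fields['complexity']}`")
    return problems
```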
@@ -78,7 +79,11 @@ If all Requirements are pure-design (no observable runtime check), the body of t
78
79
 
79
80
  ## Sibling file: `spec.expected.json`
80
81
 
81
- Schema: `_shared/expected.schema.json`. Required when Requirements have observable checks; optional when all Requirements are pure-design.
82
+ Schema: `_shared/expected.schema.json`. Required when Requirements have observable checks; optional when all Requirements are pure-design. Validate before announcing ready:
83
+
84
+ ```bash
85
+ python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>
86
+ ```
82
87
 
83
88
  Generated by ideate from the conversation:
84
89
  - `verification_commands` ← parsed from `## Verification` body + any commands the conversation surfaced.
@@ -99,4 +104,8 @@ Substantive (ideate's job during elicitation):
99
104
  - Each Constraint has a reasoning clause.
100
105
  - Out of Scope explicitly enumerates non-goals.
101
106
  - Verification commands actually verify the Requirement they map to (no "looks plausible" verification).
107
+ - Frontmatter `complexity` reflects verification shape: `high` when the spec combines state mutation with ordering/priority, idempotency, auth/error priority, rollback/failure handling, or exact output shape; `medium` for ordinary multi-step work; `trivial` only for a single localized behavior.
108
+ - When Requirements combine state mutation, ordering/priority, idempotency, auth/error priority, or exact output shape, Verification includes at least one compound scenario that exercises the interaction end-to-end; isolated happy paths are insufficient.
109
+ - For benchmark, risk-probe, or pair-evidence specs, include a solo-headroom hypothesis inside `## Verification`: the artifact must literally contain `solo-headroom hypothesis`, `solo_claude`, `miss`, and a backticked observable command while naming the visible behavior a capable `solo_claude` baseline is expected to miss; the backticked line itself must contain `miss`, be framed as the command/observable that exposes it, and match a `spec.expected.json.verification_commands[].cmd` entry. For regenerated pair evidence, this hypothesis is the source for VERIFY's canonical `spec.solo_headroom_hypothesis` trigger reason and must pass `benchmark audit --require-hypothesis-trigger`. If the hypothesis is only "the task is hard", rework the spec before measurement.
110
+ - For new unmeasured benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence candidates, include a solo ceiling avoidance note before measurement: the artifact must literally contain `solo ceiling avoidance`, mention `solo_claude`, and name a concrete difference from rejected or solo-saturated controls such as `S2`-`S6`. If the note cannot say why the candidate should preserve `solo_claude` headroom, rework the candidate instead of spending provider calls. Benchmark fixture directories put this in `NOTES.md` as `## Solo ceiling avoidance`; ordinary specs keep the note in `## Verification` next to the solo-headroom hypothesis.
102
111
  - Spec text is plain language — no jargon walls, no "for future flexibility" hedging.
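
The `complexity` rule in the checklist above reads as a small decision table. The sketch below encodes it with hypothetical boolean flags; in practice the value is a judgment made during elicitation, so this only shows how the three levels relate, not how the tool decides.

```python
from dataclasses import dataclass


@dataclass
class SpecShape:
    """Hypothetical summary of what a spec's Requirements involve."""
    mutates_state: bool = False
    ordering_or_priority: bool = False
    idempotency: bool = False
    auth_or_error_priority: bool = False
    rollback_or_failure_handling: bool = False
    exact_output_shape: bool = False
    single_localized_behavior: bool = False


def suggest_complexity(shape: SpecShape) -> str:
    """`high` when state mutation is combined with any interacting concern,
    `trivial` for a single localized behavior, otherwise `medium`."""
    interacting = any([
        shape.ordering_or_priority,
        shape.idempotency,
        shape.auth_or_error_priority,
        shape.rollback_or_failure_handling,
        shape.exact_output_shape,
    ])
    if shape.mutates_state and interacting:
        return "high"
    if shape.single_localized_behavior:
        return "trivial"
    return "medium"
```

A spec that lands on `high` is also the case where Verification is expected to carry at least one compound end-to-end scenario.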