npm - devlyn-cli - Versions diffs - 2.2.2 → 2.3.1 - Mend

devlyn-cli 2.2.2 → 2.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (220)

package/config/skills/devlyn:design-ui/SKILL.md ADDED

@@ -0,0 +1,364 @@
+---
+name: devlyn:design-ui
+description: Generate N (default 5) radically distinct UI style options from a PRD as portfolio-worthy HTML/CSS samples. Pass a leading integer or `count:N` in the brief to change the count.
+source: project
+---
+You are the **Lead Designer** with full creative authority. Create N portfolio-worthy HTML/CSS style samples that help stakeholders visualize design directions. These aren't mockups—they're design statements.
+<escalation>
+If the design task requires multi-perspective exploration (brand strategy + interaction design + accessibility + visual craft all mattering equally), consider escalating to `/devlyn:team-design-ui` for a full 5-person design team.
+</escalation>
+<context>
+$ARGUMENTS
+</context>
+<count_resolution>
+**Resolve N before doing any design work.**
+1. If `$ARGUMENTS` begins with a positive integer (e.g. `3`, `7 dark dashboard`), that is N.
+2. Else if `$ARGUMENTS` contains a `count:N` or `n=N` token (any case), that is N.
+3. Otherwise N defaults to **5**.
+Clamp N to the range `1..10`. Values outside that range default to 5 and are noted in the final report. After resolving, use N consistently across every phase below — concept count, file count, output table rows.
+Strip the count token from the brief before using `$ARGUMENTS` as the product description.
+</count_resolution>
+<input_handling>
+The context above may contain:
+- **PRD document**: Extract product goals, target users, and brand requirements
+- **Product description**: Parse key features and emotional direction
+- **Image references**: Analyze and replicate the visual style as closely as possible
+If no input is provided, check for existing PRD at `docs/prd.md` or `README.md`.
+### When Image References Are Provided
+**Your primary goal shifts to replication, not invention.**
+1. **Analyze the reference image(s) precisely:**
+   - Extract exact color values (use color picker precision: #RRGGBB)
+   - Identify font characteristics (serif/sans, weight, spacing, size ratios)
+   - Map layout structure (grid, spacing rhythm, alignment patterns)
+   - Note visual effects (shadows, gradients, blur, textures, border styles)
+   - Capture motion cues (if animated reference or implied motion)
+2. **Generate designs that match the reference:**
+   - **First ~40% of N designs**: Replicate the reference style as closely as possible, adapting to the PRD's content (with N=5 this is designs 1-2; with N=3 just design 1; with N=10 designs 1-4).
+   - **Remaining designs**: Variations that preserve the reference's core aesthetic while exploring different directions within that style.
+3. **Fidelity checklist for reference-based designs:**
+   - [ ] Color palette within ±5% of reference values
+   - [ ] Typography style matches (same category, similar weight/spacing)
+   - [ ] Layout proportions preserved
+   - [ ] Visual effects replicated (shadows, gradients, textures)
+   - [ ] Overall "feel" is recognizably similar to reference
+### When No Image References Are Provided
+Follow the standard creative process: invent tension-based concept names, map across spectrums, and generate N radically different directions.
+</input_handling>
+<instructions>
+## Phase 1: Extract Design DNA
+Keep this brief—creative naming drives the design, not over-analysis.
+```
+**Product:** [one sentence]
+**User:** [who, in what context, with what goal]
+**Must convey:** [2-3 essential feelings]
+**Count (N):** [resolved N]
+```
+## Phase 2: Invent N Creative Directions
+### Check Existing Styles
+Read `docs/design/` directory. If `style_K_*.html` files exist, continue numbering from K+1. New styles must be visually distinct from existing ones.
+### Create N Concept Names
+**Before any design work, invent N evocative names.**
+Name format: `[word_A]_[word_B]` where:
+- Word A and Word B create **tension or contrast**
+- The combination should feel unexpected, not obvious
+- Each word pulls the design in a different direction
+Good patterns:
+- [temperature]\_[movement]: warm vs cold, static vs dynamic
+- [texture]\_[era]: rough vs smooth, retro vs futuristic
+- [emotion]\_[structure]: soft vs rigid, chaotic vs ordered
+- [material]\_[concept]: organic vs digital, heavy vs light
+Avoid:
+- Single adjectives
+- Obvious pairings without tension
+- Generic descriptors
+**The name drives the design.** Tension in the name forces creative problem-solving.
+### Map Each Concept Across 7 Spectrums
+For each concept, mark its position. **Extremes create distinctiveness—avoid the middle.**
+```
+Concept: [name]
+Layout:      Dense ●○○○○ Spacious
+Color:       Monochrome ○○○○● Vibrant
+Typography:  Serif ○○●○○ Display
+Depth:       Flat ○○○○● Layered
+Energy:      Calm ○●○○○ Dynamic
+Theme:       Dark ●○○○○ Light
+Shape:       Angular ○○○○● Curved
+```
+### Extreme Rule (Mandatory)
+**Each design MUST have at least 2 extreme positions** (●○○○○ or ○○○○●).
+Why: Middle positions (○○●○○) converge to "safe" averages. Extremes force distinctive choices.
+### Verify Contrast
+Before proceeding:
+- [ ] Each design has **2+ extreme positions**
+- [ ] No two concepts share the same position on 4+ spectrums
+- [ ] If N ≥ 2, mix of dark and light themes across the N designs
+- [ ] If N ≥ 2, mix of angular and curved across the N designs
+## Phase 3: Define Concrete Specifications
+For each concept, specify exact values—no adjectives.
+```
+### [Concept Name]
+**Palette:**
+- Background: #______
+- Surface: #______
+- Text: #______
+- Text muted: #______
+- Accent: #______
+**Typography:**
+- Font: [Google Font name]
+- Headline: [size]px / [weight] / [letter-spacing]em
+- Body: [size]px / [weight] / [line-height]
+**Spacing:**
+- Container max-width: [value]px
+- Section padding: [value]px
+- Element gap: [value]px
+- Border-radius: [value]px
+**Motion:**
+- Duration: [value]s
+- Easing: cubic-bezier([values])
+- Stagger delay: [value]s
+```
+## Phase 4: Generate HTML Files
+<use_parallel_tool_calls>
+Write all N HTML files simultaneously by making N independent Write tool calls in a single response. These files have no dependencies on each other—do not write them sequentially. Maximize parallel execution for speed.
+</use_parallel_tool_calls>
+<frontend_aesthetics>
+You tend to converge toward generic outputs. Avoid this:
+**Typography:** Never use Inter, Roboto, Arial, Helvetica, Open Sans, Space Grotesk, or system fonts. Choose distinctive typefaces. Use weight extremes (100 vs 900, not 400 vs 600). Dramatic size jumps (3x+). Tight headline letter-spacing (-0.02em to -0.05em).
+**Color:** One dominant + one sharp accent. Never pure #FFFFFF or #000000 backgrounds—add subtle tint. No purple gradients.
+**Motion:** Focus on high-impact moments, not scattered micro-interactions.
+- **Page load**: Orchestrated staggered reveals (vary `animation-delay` by 0.05-0.1s increments)
+- **Scroll**: Use `IntersectionObserver` for scroll-triggered fade-ins (vanilla JS, no frameworks)
+- **Hover**: Transform + opacity + subtle shadow shifts, not just color changes
+- **Transitions**: Custom `cubic-bezier` easings that feel physical (e.g., `cubic-bezier(0.34, 1.56, 0.64, 1)` for bounce)
+- **Advanced**: Gradient animations via `background-position`, `backdrop-filter` transitions, CSS `@property` for animatable custom properties
+- **Restraint**: One dramatic sequence beats many small animations. If everything moves, nothing stands out.
+**Backgrounds:** Never flat solid colors. Layer gradients, add subtle noise/grain, create atmosphere.
+**Layout:** Break at least one standard pattern per design. Try asymmetry, overlap, bento grids, diagonal flow, or unexpected whitespace.
+</frontend_aesthetics>
+### File Requirements
+| Requirement        | Details                                           |
+| ------------------ | ------------------------------------------------- |
+| **Path**           | `docs/design/style_{n}_{concept_name}.html`       |
+| **Content**        | Realistic view matching product purpose           |
+| **Self-contained** | Inline CSS, only Google Fonts external            |
+| **Interactivity**  | Hover, active, focus states + page load animation |
+| **Responsive**     | Basic mobile adaptation                           |
+| **Real content**   | Actual copy from PRD, no lorem ipsum              |
+### HTML Structure
+```html
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>[Product] - [Concept]</title>
+    <link rel="preconnect" href="https://fonts.googleapis.com" />
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+    <link href="https://fonts.googleapis.com/css2?family=[Font]:[weights]&display=swap" rel="stylesheet" />
+    <style>
+      /* Concept: [name]
+       Spectrum: L[x] C[x] T[x] D[x] E[x] Th[x] Sh[x]
+       Extremes: [list which 2+ are extreme] */
+      :root {
+        --bg: #[hex];
+        --surface: #[hex];
+        --text: #[hex];
+        --text-muted: #[hex];
+        --accent: #[hex];
+      }
+      * {
+        margin: 0;
+        padding: 0;
+        box-sizing: border-box;
+      }
+      body {
+        font-family: "[Font]", sans-serif;
+        background: var(--bg);
+        color: var(--text);
+      }
+      /* Page load: staggered reveal */
+      .reveal {
+        opacity: 0;
+        transform: translateY(20px);
+        animation: fadeUp 0.6s cubic-bezier(0.16, 1, 0.3, 1) forwards;
+      }
+      .reveal:nth-child(1) {
+        animation-delay: 0.1s;
+      }
+      .reveal:nth-child(2) {
+        animation-delay: 0.15s;
+      }
+      .reveal:nth-child(3) {
+        animation-delay: 0.2s;
+      }
+      @keyframes fadeUp {
+        to {
+          opacity: 1;
+          transform: translateY(0);
+        }
+      }
+      /* Scroll-triggered: hidden until in view */
+      .scroll-reveal {
+        opacity: 0;
+        transform: translateY(30px);
+        transition: opacity 0.6s cubic-bezier(0.16, 1, 0.3, 1), transform 0.6s cubic-bezier(0.16, 1, 0.3, 1);
+      }
+      .scroll-reveal.visible {
+        opacity: 1;
+        transform: translateY(0);
+      }
+      /* Hover: physical-feeling bounce */
+      .interactive {
+        transition: transform 0.3s cubic-bezier(0.34, 1.56, 0.64, 1), box-shadow 0.3s ease;
+      }
+      .interactive:hover {
+        transform: translateY(-4px);
+        box-shadow: 0 12px 24px -8px rgba(0, 0, 0, 0.15);
+      }
+    </style>
+  </head>
+  <body>
+    <!-- Semantic HTML with real content -->
+    <script>
+      // Scroll-triggered animations
+      const observer = new IntersectionObserver(
+        (entries) => {
+          entries.forEach((entry) => {
+            if (entry.isIntersecting) {
+              entry.target.classList.add("visible");
+            }
+          });
+        },
+        { threshold: 0.1 }
+      );
+      document.querySelectorAll(".scroll-reveal").forEach((el) => observer.observe(el));
+    </script>
+  </body>
+</html>
+```
+## Phase 5: Verify Quality
+### Per-Design Checklist
+- [ ] Font is distinctive (not Inter/Roboto/Arial/system)
+- [ ] Background has depth (not flat white/black)
+- [ ] Page load animation with staggered delays
+- [ ] Scroll-triggered reveals on below-fold content
+- [ ] Hover states with transform + shadow (not just color)
+- [ ] Custom easing (cubic-bezier), not default `ease` or `linear`
+- [ ] CSS custom properties for colors
+- [ ] Layout breaks at least one standard pattern
+### Cross-Design Contrast
+Each pair of designs must have 5+ obvious visual differences. If not, revise. (Skipped automatically when N=1.)
+## Phase 6: Save & Report
+Create `docs/design/` directory if needed. Save all N HTML files.
+</instructions>
+<output_format>
+```
+## Generated Styles (N = {N})
+| # | Name | Spectrum (L/C/T/D/E/Th/Sh) | Extremes | Palette | Font |
+|---|------|---------------------------|----------|---------|------|
+| {k} | {name} | [x][x][x][x][x][x][x] | {which 2+} | #___, #___, #___ | {font} |
+### Files
+- docs/design/style_{k}_{name}.html
+- ...
+### Rationale
+1. **{name}**: [1 sentence connecting to product requirements]
+2. ...
+```
+</output_format>
+Make bold choices. Each design should be portfolio-worthy—something you'd proudly present.
+<next_step>
+After the user picks a style, suggest:
+→ Run `/devlyn:design-system [style-number]` to extract design tokens from the chosen style into a reusable design system reference.
+</next_step>
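
Note on `<count_resolution>` in the added file above: the three resolution rules plus the range check can be read as a small parsing routine. The following is a minimal sketch of one reading, not code from the package. The function name `resolve_count`, the regular expressions, and the wording of the note are assumptions. The added text says both "clamp" and "default to 5" for out-of-range values; this sketch follows the "default to 5" sentence.

```python
import re
from typing import Optional, Tuple

DEFAULT_N = 5
MIN_N, MAX_N = 1, 10

# Hypothetical patterns: the skill text only names a leading integer,
# or a `count:N` / `n=N` token in any case.
_LEADING_INT = re.compile(r"^\s*(\d+)\b\s*")
_COUNT_TOKEN = re.compile(r"(?<!\S)(?:count:|n=)(\d+)(?!\S)\s*", re.IGNORECASE)


def resolve_count(arguments: str) -> Tuple[int, str, Optional[str]]:
    """Return (N, brief with the count token stripped, note for the report or None)."""
    # Rule 1 (leading integer) wins over rule 2 (count token).
    match = _LEADING_INT.match(arguments) or _COUNT_TOKEN.search(arguments)
    if match is None:
        # Rule 3: no count given, fall back to the default.
        return DEFAULT_N, arguments.strip(), None
    requested = int(match.group(1))
    # Strip the count token before the brief is used as the product description.
    brief = (arguments[: match.start()] + arguments[match.end():]).strip()
    if MIN_N <= requested <= MAX_N:
        return requested, brief, None
    note = f"requested count {requested} is outside {MIN_N}..{MAX_N}; using {DEFAULT_N}"
    return DEFAULT_N, brief, note
```

Under these assumptions, `resolve_count("7 dark dashboard")` yields N = 7 with the brief `dark dashboard`, while `resolve_count("N=12 landing")` falls back to 5 and returns a note for the final report.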

package/config/skills/devlyn:ideate/SKILL.md CHANGED

@@ -82,7 +82,7 @@ The elicitation agent:
 5. Stops when the structural lint passes AND user confirms, or 8 turns elapsed.
 Structural lint (inline check, no script needed):
-- Frontmatter has `id`, `title`, `kind`, `status: planned`.
+- Frontmatter has `id`, `title`, `kind`, `status: planned`, `complexity`.
 - `## Context` non-empty (≥ 1 sentence).
 - `## Requirements` has ≥ 1 `- [ ]` bullet.
 - `## Out of Scope` present (may list "none" if truly nothing).
@@ -91,8 +91,9 @@ Structural lint (inline check, no script needed):
 After lint passes:
 1. Write `<spec-dir>/<id>-<slug>/spec.md` (the spec).
 2. Generate `<spec-dir>/<id>-<slug>/spec.expected.json` from the spec's `## Verification` block + any `forbidden_patterns` / `required_files` / `forbidden_files` / `max_deps_added` the conversation surfaced.
-3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If exit 2, fix the carrier and re-run.
-4. Print: `spec ready — /devlyn:resolve --spec <spec-path>`.
+3. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape, supported `complexity` frontmatter, and any present actionable solo-headroom hypothesis; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`. If exit 2, fix the carrier/frontmatter/hypothesis and re-run.
+4. Run `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` to validate sibling `spec.expected.json` against `_shared/expected.schema.json` plus sibling spec `complexity` frontmatter and any present actionable solo-headroom hypothesis; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`. If exit 2, fix the JSON/frontmatter/hypothesis and re-run.
+5. Print: `spec ready — /devlyn:resolve --spec <spec-path>`.
 ## PHASE 1Q: QUICK MODE
@@ -103,6 +104,7 @@ Single-turn assume-and-confirm. Prompt body: see `references/elicitation.md` §
 3. User responds with "go" / "fix X" / "no, different".
 4. On "go": write spec + spec.expected.json + lint + announce.
 5. On "fix X": apply correction, re-show, ask again. Maximum 3 correction rounds before escalating to default mode.
+6. Exception: for benchmark, risk-probe, or pair-evidence goals, do not infer a solo-headroom hypothesis. Ask for the actionable hypothesis first; if unavailable, exit with `spec not ready — solo-headroom hypothesis required`. For new unmeasured benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence candidates, also do not infer solo ceiling avoidance; ask for the concrete difference from rejected or solo-saturated controls such as `S2`-`S6`, and exit with `spec not ready — solo ceiling avoidance required` if unavailable.
 ## PHASE 1F: FROM-SPEC MODE
@@ -114,7 +116,8 @@ Prompt body: `references/from-spec-mode.md`.
 4. Apply structural fixes only — do NOT reshape Requirements / Out-of-Scope content. The user's substantive intent is preserved.
 5. Generate `spec.expected.json` if absent (best-effort from `## Verification` block).
 6. Write the normalized spec back to `<spec-dir>/<id>-<slug>/` (preserves original at `<path>` untouched unless user passes `--in-place`).
-7. Lint pass → announce. Lint fail → surface the unfixable issue and exit non-zero.
+7. Run both lint checks: `--check <spec-path>` and `--check-expected <expected-path>`.
+8. Lint pass → announce. Lint fail → surface the unfixable issue and exit non-zero. If the source is a pair-evidence candidate without an actionable solo-headroom hypothesis, the announcement must say `pair-evidence not ready` instead of implying measurement readiness.
 ## PHASE 1P: PROJECT MODE
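
Note on the "Structural lint (inline check, no script needed)" items in the first hunk above: the visible checks (frontmatter keys including the new `complexity`, a non-empty `## Context`, at least one `- [ ]` bullet under `## Requirements`, and a present `## Out of Scope`) could be sketched as below. This is an illustration only. It covers just the items shown in the hunk, the frontmatter parsing is deliberately naive rather than real YAML, and the names `split_spec` and `structural_lint` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

REQUIRED_KEYS = ("id", "title", "kind", "status", "complexity")


def split_spec(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a spec into naive frontmatter key/values and H2 section bodies."""
    frontmatter: Dict[str, str] = {}
    body = text
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                # Drop a trailing "# comment" and surrounding quotes (not real YAML).
                frontmatter[key.strip()] = value.split("#")[0].strip().strip('"')
        body = match.group(2)
    sections: Dict[str, str] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return frontmatter, sections


def structural_lint(text: str) -> List[str]:
    """Return a list of problems; an empty list means these checks pass."""
    frontmatter, sections = split_spec(text)
    problems = [f"frontmatter missing `{k}`" for k in REQUIRED_KEYS if k not in frontmatter]
    if frontmatter.get("status", "planned") != "planned":
        problems.append("frontmatter `status` must be `planned`")
    if not sections.get("Context", "").strip():
        problems.append("`## Context` is empty or missing")
    if "- [ ]" not in sections.get("Requirements", ""):
        problems.append("`## Requirements` needs at least one `- [ ]` bullet")
    if "Out of Scope" not in sections:
        problems.append("`## Out of Scope` is missing")
    return problems
```

Returning a list of problems rather than a boolean keeps the "ask one focused question" step simple: each entry maps to one missing piece.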

package/config/skills/devlyn:ideate/references/elicitation.md CHANGED

@@ -30,7 +30,37 @@ For most coding tasks, the under-specified blanks are:
 3. **Failure shape**: what happens on bad input? Exit code, error message format, fallback behavior (silent vs visible)?
 4. **Scope boundary**: which files are in-scope, which are out-of-scope? "Don't touch the auth module" is a boundary worth surfacing.
 5. **Constraints**: dependency policy (new deps allowed?), silent-catch policy, type-system escape policy, test coverage expectations.
-6. **Verification**: how does the user know it worked? Pick the smallest concrete check.
+6. **Complexity signal**: set spec frontmatter `complexity` to `high` when
+   the spec needs a compound scenario crossing state mutation with ordering,
+   idempotency, auth/error priority, rollback/failure handling, or exact output
+   shape. This is a downstream VERIFY pair-trigger signal, not a vague
+   difficulty label.
+7. **Verification**: how does the user know it worked? Pick the smallest concrete check.
+   If the goal combines state mutation with ordering/priority, idempotency,
+   auth/error priority, or exact output shape, ask for one concrete compound
+   scenario that exercises the interaction end-to-end instead of accepting only
+   isolated happy-path checks.
+8. **Pair-candidate headroom**: when the user is creating a benchmark, risk
+   probe, or pair-evidence candidate, ask for one solo-headroom hypothesis in
+   actionable form: the spec must literally contain `solo-headroom hypothesis`,
+   `solo_claude`, `miss`, and a backticked observable command while naming the
+   visible behavior a capable `solo_claude` baseline should miss; the backticked
+   line itself must contain `miss` and be framed as the command/observable that exposes it. If the
+   answer is only "the task is hard", rework the candidate before spending provider
+   calls. Do not write a benchmark/risk-probe/pair-evidence spec until this
+   hypothesis is actionable; if the user cannot provide it, stop with
+   `spec not ready — solo-headroom hypothesis required` and ask them to return
+   with the visible behavior `solo_claude` is expected to miss.
+9. **Solo ceiling avoidance**: for a new unmeasured benchmark, shadow-fixture,
+   golden-fixture, risk-probe, or pair-evidence candidate, ask how this candidate
+   differs from rejected or solo-saturated controls such as `S2`-`S6`. The note
+   must literally contain `solo ceiling avoidance`, mention `solo_claude`, and
+   name the concrete difference expected to preserve `solo_claude` headroom.
+   Benchmark fixture directories put this in `NOTES.md` as
+   `## Solo ceiling avoidance`; ordinary specs keep it in `## Verification`
+   next to the solo-headroom hypothesis. Do not write or measure the candidate
+   if this answer is missing; stop with
+   `spec not ready — solo ceiling avoidance required`.
 Walk through these in roughly this order. Skip the ones already clear from the user's initial text.
 </missing_decisions_to_surface>
@@ -56,14 +86,18 @@ When you're about to ask the user a question, look at the draft first — if the
 <lint>
 Before declaring the spec ready, verify structurally:
-- Frontmatter has `id`, `title`, `kind`, `status: planned`.
+- Frontmatter has `id`, `title`, `kind`, `status: planned`, `complexity`.
 - All 5 H2 sections present (`## Context`, `## Requirements`, `## Constraints`, `## Out of Scope`, `## Verification`).
 - Requirements ≥ 1 bullet.
 - Verification has either ≥ 1 named command OR the explicit pure-design escape phrase.
 If the lint fails, fix the missing piece (ask one focused question if needed) before announcing.
-After lint passes, also run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the verification carrier shape. If exit 2: read the stderr message, fix the carrier, re-run. Exit 0 = ready.
+After lint passes, run both mechanical checks:
+1. `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` validates the spec's verification carrier shape, supported `complexity` frontmatter, and any present actionable solo-headroom hypothesis; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`.
+2. `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` validates sibling `spec.expected.json` against `_shared/expected.schema.json` plus sibling spec `complexity` frontmatter and any present actionable solo-headroom hypothesis; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`.
+If either exits 2: read the stderr message, fix the malformed carrier or JSON, and re-run the failed command. Both commands must exit 0 before ready.
 </lint>
 <output>
@@ -83,9 +117,21 @@ When `--quick` is set:
 1. AI synthesizes spec from the one-line goal — fill every section with the most reasonable inference.
 2. AI presents the spec to the user with an explicit `## Assumptions made` block listing every inferred decision (one bullet each).
 3. User responds with "go" / "fix X to be Y" / "no, different".
-4. On "go": write the spec + spec.expected.json, run lint, announce.
+4. On "go": write the spec + spec.expected.json, run both lint checks, announce.
 5. On "fix X": apply correction, re-present, ask again. Maximum 3 correction rounds before escalating to default mode.
+Exception: quick mode must not infer a solo-headroom hypothesis for benchmark,
+risk-probe, or pair-evidence goals. If the one-line goal lacks the actionable
+`solo-headroom hypothesis` / `solo_claude` / `miss` / backticked-command
+contract, ask exactly one focused follow-up for that hypothesis before showing a
+draft; if the user cannot provide it, exit with
+`spec not ready — solo-headroom hypothesis required`. For a new unmeasured
+benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence
+candidate, quick mode also must not infer the `solo ceiling avoidance` note; ask
+for the concrete difference from rejected or solo-saturated controls such as
+`S2`-`S6`, and exit with `spec not ready — solo ceiling avoidance required` if
+the user cannot provide it.
 Quick mode trades thoroughness for speed. Use it for trivial-medium tasks where the user has a clear-enough goal that one round of inference + correction is sufficient.
 ## Anti-patterns
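
Note on the "Pair-candidate headroom" item added above: the literal contract (the spec must contain `solo-headroom hypothesis`, `solo_claude`, `miss`, and a backticked observable command on a line that itself contains `miss`) lends itself to a quick pre-check. The sketch below is one interpretation and is not the logic of `spec-verify-check.py`, which this diff does not show. The name `headroom_gaps` is hypothetical, and the check cannot tell whether the hypothesis is more than "the task is hard".

```python
import re
from typing import List

REQUIRED_LITERALS = ("solo-headroom hypothesis", "solo_claude", "miss")
_BACKTICKED = re.compile(r"`[^`\n]+`")


def headroom_gaps(spec_text: str) -> List[str]:
    """List which parts of the literal contract are absent from the spec text."""
    gaps = [f"missing literal: {lit}" for lit in REQUIRED_LITERALS if lit not in spec_text]
    # Interpretation: some single line must carry both a backticked span and "miss".
    has_command_line = any(
        _BACKTICKED.search(line) and "miss" in line for line in spec_text.splitlines()
    )
    if not has_command_line:
        gaps.append("no line pairs a backticked command with `miss`")
    return gaps
```

An empty result only means the literals are present; whether the named behavior is something a capable `solo_claude` baseline would really miss still needs human judgment.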

package/config/skills/devlyn:ideate/references/from-spec-mode.md CHANGED

@@ -14,7 +14,7 @@ The user already wrote a spec (or has one from elsewhere — a teammate, a previ
 <allowed_changes>
 You may:
-1. Add missing frontmatter fields (id from filename, kind=feature default, status=planned).
+1. Add missing frontmatter fields (id from filename, kind=feature default, status=planned, complexity=medium default; set complexity=high only when preserved Requirements clearly combine state/order/failure/output-shape risks).
 2. Rename non-canonical section headings to canonical (`## Goals` → `## Requirements`, `## Notes` ignored unless they clearly belong in Constraints).
 3. Add a missing `## Out of Scope` section with `- (no explicit non-goals provided by author)`.
 4. Add a missing `## Verification` section if Requirements imply observable runtime checks — best-effort one-command-per-Requirement, then surface to user for review.
@@ -37,14 +37,36 @@ You must NOT:
 3. For each missing/malformed piece: apply the smallest allowed fix.
 4. Write the normalized spec. Default location: `<spec-dir>/<id>-<slug>/spec.md`. With `--in-place` flag: write to `<path>` directly (overwrites the original).
 5. Generate or fix `spec.expected.json` per the rules above. Same dir as the spec.
-6. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate.
-7. If lint still fails after allowed fixes (e.g. Requirements section is empty in the source), surface the issue and exit non-zero — do NOT invent Requirements.
+6. Run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate the spec carrier and supported `complexity` frontmatter; if the spec uses a legacy inline `## Verification` JSON carrier, any solo-headroom hypothesis command must match that carrier's `verification_commands[].cmd`.
+7. Run `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>` to validate sibling `spec.expected.json` plus sibling spec `complexity` frontmatter and any present solo-headroom hypothesis command against `spec.expected.json.verification_commands[].cmd`.
+8. If lint still fails after allowed fixes (e.g. Requirements section is empty in the source), surface the issue and exit non-zero — do NOT invent Requirements.
+9. If the preserved Requirements combine state mutation with ordering/priority,
+   idempotency, auth/error priority, or exact output shape but the Verification
+   section lacks a compound end-to-end scenario, do not rewrite the author's
+   content. Add a final warning that `/devlyn:resolve` may need default-mode
+   ideation or a stronger Verification section before pair-relevant risks are
+   measurable.
+10. If the source is a benchmark, risk probe, or pair-evidence candidate and it
+    lacks an actionable solo-headroom hypothesis, do not invent one. Add a final
+    warning that the candidate may be solo-saturated until Context or
+    Verification literally contains `solo-headroom hypothesis`, `solo_claude`,
+    `miss`, and a backticked observable command while naming the visible
+    behavior a capable `solo_claude` baseline is expected to miss; the
+    backticked line itself must contain `miss` and be framed as the
+    command/observable that exposes it. Do not call the normalized spec pair-evidence ready.
+11. If the source is a new unmeasured benchmark, shadow-fixture, golden-fixture,
+    risk-probe, or pair-evidence candidate and it lacks a solo ceiling avoidance
+    note, do not invent one. Add a final warning that the candidate may replay
+    rejected or solo-saturated controls until Context, Verification, or fixture
+    `NOTES.md` literally contains `solo ceiling avoidance`, mentions
+    `solo_claude`, and names a concrete difference from rejected controls such
+    as `S2`-`S6`. Do not call the normalized spec pair-evidence ready.
 </flow>
 <output>
 Same as default mode: `<spec-dir>/<id>-<slug>/spec.md` + `<spec-dir>/<id>-<slug>/spec.expected.json`.
-Final announcement: `spec normalized — /devlyn:resolve --spec <spec-path>`. If the spec was lint-passing with no changes needed, announce: `spec already canonical — /devlyn:resolve --spec <spec-path>`.
+Final announcement: `spec normalized — /devlyn:resolve --spec <spec-path>`. If the spec was lint-passing with no changes needed, announce: `spec already canonical — /devlyn:resolve --spec <spec-path>`. If step 9 applies, append: `warning: Verification may need one compound end-to-end scenario before pair-relevant risks are measurable`. If step 10 applies, append: `pair-evidence not ready — Pair-candidate headroom is unproven until the spec states a solo-headroom hypothesis`. If step 11 applies, append: `pair-evidence not ready — Pair-candidate headroom is unproven until the spec states solo ceiling avoidance`.
 If lint failed unfixably: print the specific failure, exit non-zero. Do not write a partial output.
 </output>
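
Note on allowed change 1 above (add missing frontmatter fields): the defaults named in the changed line can be expressed as a merge that never overwrites what the author wrote. This is a minimal sketch. The function name is hypothetical, and treating "id from filename" as the file stem is an assumption. The `complexity=high` decision is left to the caller because the text ties it to a judgment about the preserved Requirements.

```python
from pathlib import Path
from typing import Dict


def fill_frontmatter_defaults(frontmatter: Dict[str, str], source_path: str) -> Dict[str, str]:
    """Return a copy with only the missing fields filled; existing values are kept."""
    defaults = {
        "id": Path(source_path).stem,  # assumption: "id from filename" means the stem
        "kind": "feature",
        "status": "planned",
        "complexity": "medium",  # raising this to "high" is a separate, explicit judgment
    }
    # Author-provided values come last so they always win over defaults.
    return {**defaults, **frontmatter}
```

For example, `fill_frontmatter_defaults({"title": "Retry queue"}, "specs/retry-queue.md")` adds `id`, `kind`, `status`, and `complexity` while leaving `title` untouched.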

package/config/skills/devlyn:ideate/references/project-mode.md CHANGED

@@ -11,6 +11,25 @@ The user wants to build a project, not a single feature. Your job is to elicit t
 2. Ask the same question categories as default mode (input/output/failure/scope/constraints/verification) but at the project level first, then drill into each feature.
 3. Decompose into 3-7 features. Fewer = the project is actually one big feature; recommend default mode. More = the project is too large; recommend splitting into separate ideate runs.
 4. Each feature must be independently shippable: a feature whose verification depends on another feature's runtime behavior is a dependency, not a feature.
+5. When a feature combines state mutation with ordering/priority, idempotency,
+   auth/error priority, or exact output shape, its per-feature Verification must
+   include one compound end-to-end scenario; do not hide the interaction in
+   project-level prose.
+6. When a feature is intended as a benchmark, risk probe, or pair-evidence
+   candidate, its per-feature Verification must include a solo-headroom
+   hypothesis. The feature spec must literally contain
+   `solo-headroom hypothesis`, `solo_claude`, `miss`, and a backticked
+   observable command while naming the visible behavior a capable
+   `solo_claude` baseline is expected to miss; the backticked line itself must
+   contain `miss` and be framed as the command/observable that exposes it. Do not defer that to
+   project-level prose, and rework the feature spec if the hypothesis is only
+   "the task is hard".
+7. When a feature is a new unmeasured benchmark, shadow-fixture, golden-fixture,
+   risk-probe, or pair-evidence candidate, its per-feature Verification must also include a solo ceiling avoidance note. The feature spec must literally
+   contain `solo ceiling avoidance`, mention `solo_claude`, and name a concrete
+   difference from rejected or solo-saturated controls such as `S2`-`S6`. Do not
+   defer that to project-level prose; benchmark fixture directories mirror the
+   same note in `NOTES.md` as `## Solo ceiling avoidance`.
 </conversation_rules>
 <decomposition_rules>
@@ -63,7 +82,7 @@ Anything binding all features (e.g. "no new top-level dependencies", "all CLI ou
 - `<spec-dir>/<id-N>/spec.md` for each feature (per `references/spec-template.md`).
 - `<spec-dir>/<id-N>/spec.expected.json` for each feature (per `_shared/expected.schema.json`).
-Each per-feature spec is structurally lint-validated using `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>`.
+Each per-feature spec is structurally lint-validated using `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>`, including supported `complexity` frontmatter and any present actionable solo-headroom hypothesis, and each sibling expected contract plus sibling spec `complexity` frontmatter and any present actionable solo-headroom hypothesis is validated using `python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>`; if the spec has a solo-headroom hypothesis, its observable command must match `spec.expected.json.verification_commands[].cmd`.
 Final announcement: `project ready — N specs at <spec-dir>/. Start with /devlyn:resolve --spec <first-spec-path>`.
 </output>
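
Note on the validation sentence changed above: each per-feature spec now goes through two invocations of the same script, `--check <spec-path>` and `--check-expected <expected-path>`. A driver for that pair might look like the sketch below. The script path and both flags come from the diff; the function names and the injectable `run` parameter are assumptions added so the logic can be exercised without the script present.

```python
import subprocess
from typing import Callable, List, Sequence

CHECKER = ".claude/skills/_shared/spec-verify-check.py"


def _exit_code(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_mechanical_checks(
    spec_path: str,
    expected_path: str,
    run: Callable[[Sequence[str]], int] = _exit_code,
) -> List[str]:
    """Run --check then --check-expected; return the flags that did not exit 0."""
    commands = [
        ("--check", spec_path),
        ("--check-expected", expected_path),
    ]
    failed = []
    for flag, path in commands:
        if run(["python3", CHECKER, flag, path]) != 0:
            failed.append(flag)
    return failed
```

An empty list corresponds to "both commands must exit 0 before ready"; a non-empty list names which command to fix and re-run.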

package/config/skills/devlyn:ideate/references/spec-template.md CHANGED

@@ -10,6 +10,7 @@ id: "<spec-id>"          # kebab-case, unique per spec-dir; auto-generated if us
 title: "<short title>"    # one line, descriptive
 kind: feature             # feature | spike | prototype
 status: planned           # planned → in_progress → done. ideate writes "planned"; resolve's CLEANUP flips to "done".
+complexity: medium        # trivial | medium | high. Use high when Verification needs compound state/order/failure checks.
 depends_on: []            # list of spec ids this depends on (empty for standalone). project mode populates this.
 ---
 ```
@@ -78,7 +79,11 @@ If all Requirements are pure-design (no observable runtime check), the body of t
 ## Sibling file: `spec.expected.json`
-Schema: `_shared/expected.schema.json`. Required when Requirements have observable checks; optional when all Requirements are pure-design.
+Schema: `_shared/expected.schema.json`. Required when Requirements have observable checks; optional when all Requirements are pure-design. Validate before announcing ready:
+```bash
+python3 .claude/skills/_shared/spec-verify-check.py --check-expected <expected-path>
+```
 Generated by ideate from the conversation:
 - `verification_commands` ← parsed from `## Verification` body + any commands the conversation surfaced.
@@ -99,4 +104,8 @@ Substantive (ideate's job during elicitation):
 - Each Constraint has a reasoning clause.
 - Out of Scope explicitly enumerates non-goals.
 - Verification commands actually verify the Requirement they map to (no "looks plausible" verification).
+- Frontmatter `complexity` reflects verification shape: `high` when the spec combines state mutation with ordering/priority, idempotency, auth/error priority, rollback/failure handling, or exact output shape; `medium` for ordinary multi-step work; `trivial` only for a single localized behavior.
+- When Requirements combine state mutation, ordering/priority, idempotency, auth/error priority, or exact output shape, Verification includes at least one compound scenario that exercises the interaction end-to-end; isolated happy paths are insufficient.
+- For benchmark, risk-probe, or pair-evidence specs, include a solo-headroom hypothesis inside `## Verification`: the artifact must literally contain `solo-headroom hypothesis`, `solo_claude`, `miss`, and a backticked observable command while naming the visible behavior a capable `solo_claude` baseline is expected to miss; the backticked line itself must contain `miss`, be framed as the command/observable that exposes it, and match a `spec.expected.json.verification_commands[].cmd` entry. For regenerated pair evidence, this hypothesis is the source for VERIFY's canonical `spec.solo_headroom_hypothesis` trigger reason and must pass `benchmark audit --require-hypothesis-trigger`. If the hypothesis is only "the task is hard", rework the spec before measurement.
+- For new unmeasured benchmark, shadow-fixture, golden-fixture, risk-probe, or pair-evidence candidates, include a solo ceiling avoidance note before measurement: the artifact must literally contain `solo ceiling avoidance`, mention `solo_claude`, and name a concrete difference from rejected or solo-saturated controls such as `S2`-`S6`. If the note cannot say why the candidate should preserve `solo_claude` headroom, rework the candidate instead of spending provider calls. Benchmark fixture directories put this in `NOTES.md` as `## Solo ceiling avoidance`; ordinary specs keep the note in `## Verification` next to the solo-headroom hypothesis.
 - Spec text is plain language — no jargon walls, no "for future flexibility" hedging.
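
Note on the new `complexity` guidance above: the three values reduce to a simple decision, with `high` reserved for state mutation combined with at least one of the listed risk dimensions. The sketch below is illustrative only; the function name and the risk label strings are hypothetical, and the template itself treats this as a judgment made during elicitation rather than a computed value.

```python
from typing import Iterable

# Hypothetical labels for the risk dimensions named in the template.
COMPOUND_RISKS = frozenset(
    {"ordering", "idempotency", "auth_error_priority", "rollback_failure", "exact_output_shape"}
)


def suggest_complexity(mutates_state: bool, risks: Iterable[str], multi_step: bool) -> str:
    """Suggest a `complexity` value: trivial, medium, or high."""
    # high: state mutation combined with at least one listed risk dimension.
    if mutates_state and COMPOUND_RISKS.intersection(risks):
        return "high"
    # medium: ordinary multi-step work; trivial: a single localized behavior.
    return "medium" if multi_step else "trivial"
```

Under this reading, a risk dimension on its own, without state mutation, does not raise a spec to `high`.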