@hegemonart/get-design-done 1.48.0 → 1.50.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. package/.claude-plugin/marketplace.json +2 -2
  2. package/.claude-plugin/plugin.json +8 -2
  3. package/CHANGELOG.md +93 -0
  4. package/README.md +4 -0
  5. package/SKILL.md +2 -1
  6. package/agents/design-auditor.md +37 -4
  7. package/agents/design-context-builder.md +2 -0
  8. package/agents/design-debt-crawler.md +36 -5
  9. package/agents/design-executor.md +2 -0
  10. package/agents/design-fixer.md +4 -1
  11. package/agents/design-planner.md +2 -0
  12. package/agents/design-reflector.md +2 -0
  13. package/agents/design-research-synthesizer.md +2 -0
  14. package/agents/design-verifier.md +7 -15
  15. package/dist/claude-code/.claude/skills/audit/SKILL.md +1 -1
  16. package/dist/claude-code/.claude/skills/brief/SKILL.md +1 -1
  17. package/dist/claude-code/.claude/skills/compare/SKILL.md +1 -1
  18. package/dist/claude-code/.claude/skills/connections/SKILL.md +1 -1
  19. package/dist/claude-code/.claude/skills/darkmode/SKILL.md +1 -1
  20. package/dist/claude-code/.claude/skills/design/SKILL.md +1 -1
  21. package/dist/claude-code/.claude/skills/discover/SKILL.md +1 -1
  22. package/dist/claude-code/.claude/skills/do/SKILL.md +1 -1
  23. package/dist/claude-code/.claude/skills/explore/SKILL.md +1 -1
  24. package/dist/claude-code/.claude/skills/fast/SKILL.md +1 -1
  25. package/dist/claude-code/.claude/skills/health/SKILL.md +2 -2
  26. package/dist/claude-code/.claude/skills/live/SKILL.md +1 -1
  27. package/dist/claude-code/.claude/skills/new-skill/SKILL.md +90 -0
  28. package/dist/claude-code/.claude/skills/plan/SKILL.md +1 -1
  29. package/dist/claude-code/.claude/skills/progress/SKILL.md +9 -1
  30. package/dist/claude-code/.claude/skills/quick/SKILL.md +1 -1
  31. package/dist/claude-code/.claude/skills/scan/SKILL.md +1 -1
  32. package/dist/claude-code/.claude/skills/ship/SKILL.md +1 -1
  33. package/dist/claude-code/.claude/skills/verify/SKILL.md +1 -1
  34. package/hooks/gdd-design-quality-check.js +340 -0
  35. package/hooks/hooks.json +9 -0
  36. package/package.json +12 -2
  37. package/reference/anti-slop-rubric.md +173 -0
  38. package/reference/audit-scoring.md +4 -0
  39. package/reference/debt-categories.md +20 -1
  40. package/reference/registry.json +28 -0
  41. package/reference/reviewer-confidence-gate.md +108 -0
  42. package/reference/skill-authoring-contract.md +97 -15
  43. package/reference/skill-graph.md +118 -0
  44. package/reference/visual-tells.md +383 -0
  45. package/scripts/lib/confidence-route.cjs +60 -0
  46. package/scripts/lib/manifest/scaffolder.cjs +261 -0
  47. package/scripts/lib/manifest/schemas/skills.schema.json +14 -0
  48. package/scripts/lib/manifest/skills.json +26 -18
  49. package/scripts/lib/worktree-resolve.cjs +221 -0
  50. package/sdk/mcp/gdd-state/server.js +37 -4
  51. package/sdk/mcp/gdd-state/tools/shared.ts +61 -0
  52. package/skills/audit/SKILL.md +1 -1
  53. package/skills/brief/SKILL.md +1 -1
  54. package/skills/compare/SKILL.md +1 -1
  55. package/skills/connections/SKILL.md +1 -1
  56. package/skills/darkmode/SKILL.md +1 -1
  57. package/skills/design/SKILL.md +1 -1
  58. package/skills/discover/SKILL.md +1 -1
  59. package/skills/do/SKILL.md +1 -1
  60. package/skills/explore/SKILL.md +1 -1
  61. package/skills/fast/SKILL.md +1 -1
  62. package/skills/health/SKILL.md +2 -2
  63. package/skills/live/SKILL.md +1 -1
  64. package/skills/new-skill/SKILL.md +90 -0
  65. package/skills/plan/SKILL.md +1 -1
  66. package/skills/progress/SKILL.md +9 -1
  67. package/skills/quick/SKILL.md +1 -1
  68. package/skills/scan/SKILL.md +1 -1
  69. package/skills/ship/SKILL.md +1 -1
  70. package/skills/verify/SKILL.md +1 -1
@@ -0,0 +1,173 @@
1
+ ---
2
+ name: anti-slop-rubric
3
+ type: reference
4
+ version: 1.0.0
5
+ phase: 50
6
+ tags: [anti-slop, verb-axes, lens-tag, orthogonal, aesthetic-slop, directness, distinctness, hierarchy, authenticity, density]
7
+ last_updated: 2026-06-03
8
+ ---
9
+
10
+ # Anti-slop Rubric (Verb Axes)
11
+
12
+ The 7-pillar audit in `reference/audit-scoring.md` answers "is the typography wrong,
13
+ is the contrast failing, is the spacing off-grid?". This rubric answers a different
14
+ question: "is the work generically AI-default, even when every pillar passes?". A
15
+ screen can clear all seven pillars and still read as a template a model produced
16
+ without a brief. These five axes name that gap.
17
+
18
+ ## Orthogonal by design
19
+
20
+ This is an ORTHOGONAL lens, not a new pillar. It mirrors the lens-tag pattern that
21
+ `emotion_levels`, `composition_alignment`, and `i18n_readiness` already follow in
22
+ `reference/audit-scoring.md`. Adding these axes does NOT:
23
+
24
+ - add an eighth scored pillar (the reserved Pillar 8 stays unscored),
25
+ - change any pillar weight,
26
+ - change the qualitative /28 total in `agents/design-auditor.md`,
27
+ - change the weighted 0-100 score in `reference/audit-scoring.md`.
28
+
29
+ The axes attach to existing findings as a label. They produce one routing signal (a
30
+ sum threshold) and nothing else touches the scoring math. The `verb_axes` lens-tag is
31
+ registered in `reference/audit-scoring.md` under Lens-Tags (Orthogonal).
32
+
33
+ ## How to score
34
+
35
+ Score each axis 1-10 against its scale, after the pillar pass. Ten is specific,
36
+ chosen, and defensible in three sentences against the brand. One is the move a model
37
+ makes when no brief exists. Read the diagnostic question first, then place the work on
38
+ the scale, then record the number. Each axis below carries three paired before/after
39
+ examples drawn from design-domain content so the boundary between a 3 and an 8 is
40
+ concrete, not a vibe.
41
+
42
+ The five scores are independent of the 1-4 pillar scores and never change them.
43
+
44
+ ---
45
+
46
+ ## Axis 1: Directness
47
+
48
+ **Diagnostic question:** Does the copy and the call to action name the specific
49
+ product, verb, and outcome, or does it fall back to a label that would fit any app?
50
+
51
+ | Score | Criteria |
52
+ |-------|----------|
53
+ | 9-10 | Every primary action names its object and outcome; the headline carries a verb and a specific promise |
54
+ | 7-8 | Most actions are specific; one generic label remains on a reversible secondary action |
55
+ | 4-6 | A mix of specific and generic; the hero leans on a template opener |
56
+ | 1-3 | Bare "Get Started", "Submit", "Welcome to [Product]"; no subject, no product-specific verb |
57
+
58
+ Paired examples:
59
+
60
+ 1. Before: button reads "Get Started". After: "Start a free audit".
61
+ 2. Before: headline "Welcome to the platform". After: "Ship your first design pass in ten minutes".
62
+ 3. Before: empty state "No data". After: "No audits yet. Run your first audit to see findings here."
63
+
64
+ ---
65
+
66
+ ## Axis 2: Distinctness
67
+
68
+ **Diagnostic question:** Would this surface be recognizable as this product, or is it
69
+ the default palette, the default typeface, and the default decoration any model reaches
70
+ for first?
71
+
72
+ | Score | Criteria |
73
+ |-------|----------|
74
+ | 9-10 | Palette and type record a brand decision in tokens; decoration earns its place |
75
+ | 7-8 | A clear identity with one or two default-leaning choices left undocumented |
76
+ | 4-6 | Recognizable in places, generic in others; tokens partly present |
77
+ | 1-3 | Purple-violet accent, Inter alone, gradient and glass standing in for identity |
78
+
79
+ Paired examples:
80
+
81
+ 1. Before: `bg-violet-600` hardcoded on every primary button. After: `bg-primary` routed to a documented brand hue token.
82
+ 2. Before: Inter set on the root with no second face and no token. After: a defended display face paired with a body face, both as `--font-*` tokens.
83
+ 3. Before: three gradients and frosted glass carrying the hero. After: one solid surface with weight and spacing carrying hierarchy.
84
+
85
+ ---
86
+
87
+ ## Axis 3: Hierarchy
88
+
89
+ **Diagnostic question:** Does the eye land on one clear focal point and follow an
90
+ obvious reading order, or does every block compete on the same axis with the same
91
+ weight?
92
+
93
+ | Score | Criteria |
94
+ |-------|----------|
95
+ | 9-10 | One primary action per view; reading order is instant; weight and spacing group meaning |
96
+ | 7-8 | Mostly clear; one or two competing priorities |
97
+ | 4-6 | The primary action must be hunted; several blocks share equal weight |
98
+ | 1-3 | Centered-everything; flat weight throughout; no discernible focal point |
99
+
100
+ Paired examples:
101
+
102
+ 1. Before: hero, feature grid, and testimonial all centered with `mx-auto text-center`. After: the hero line centered, body and lists left-aligned on a shared reading edge.
103
+ 2. Before: three buttons styled as primary on one view. After: one primary action, the rest demoted to secondary or text.
104
+ 3. Before: every heading at `font-weight: 400`, same size as body. After: bold headings, regular body, medium labels in a deliberate weight ladder.
105
+
106
+ ---
107
+
108
+ ## Axis 4: Authenticity
109
+
110
+ **Diagnostic question:** Does the surface show the real product, or does it lean on
111
+ stock scenes, placeholder copy, and badge decoration that signal nothing was actually
112
+ built yet?
113
+
114
+ | Score | Criteria |
115
+ |-------|----------|
116
+ | 9-10 | Real screenshots or purpose-drawn art; copy is shipped, not placeholder; badges carry true status |
117
+ | 7-8 | Mostly real with one stock asset or one stray placeholder |
118
+ | 4-6 | A mix of real and stock; some lorem ipsum survives past mockup |
119
+ | 1-3 | unDraw isometric scenes, lorem ipsum, "New" and "AI-powered" badge spam |
120
+
121
+ Paired examples:
122
+
123
+ 1. Before: `undraw_dashboard.svg` in the empty state. After: a real screenshot of the populated dashboard.
124
+ 2. Before: "Lorem ipsum dolor sit amet" in a shipped card body. After: the actual feature description in product voice.
125
+ 3. Before: a row of "New", "Beta", "AI-powered" badges with no state behind them. After: one badge that reflects a real, current status.
126
+
127
+ ---
128
+
129
+ ## Axis 5: Density
130
+
131
+ **Diagnostic question:** Is the information density chosen for the content and the
132
+ reader, or is everything inflated to fill space with oversized single words and airy
133
+ padding that says nothing?
134
+
135
+ | Score | Criteria |
136
+ |-------|----------|
137
+ | 9-10 | Density fits the content; spacing rides the scale; type sizes serve reading, not decoration |
138
+ | 7-8 | Mostly considered; one oversized display moment that could earn its size |
139
+ | 4-6 | Uneven density; some off-scale padding; a few sizes chosen for drama over meaning |
140
+ | 1-3 | One giant word per section, vast empty padding, no content to justify the scale |
141
+
142
+ Paired examples:
143
+
144
+ 1. Before: a single word at `text-9xl` filling a section with nothing else. After: a sized headline plus the supporting sentence the word was standing in for.
145
+ 2. Before: card padding at arbitrary `p-[37px]` off the scale. After: padding snapped to the 8pt step the rest of the layout uses.
146
+ 3. Before: a feature grid where each tile holds three words and 200px of air. After: tiles sized to their content with rhythm matched across siblings.
147
+
148
+ ---
149
+
150
+ ## Threshold and routing
151
+
152
+ Sum the five axis scores for one finding. The maximum is 50 (five axes times ten).
153
+
154
+ ```
155
+ verb_axes_sum = directness + distinctness + hierarchy + authenticity + density
156
+ ```
157
+
158
+ When `verb_axes_sum < 35` (out of 50), the work reads as generically AI-default even
159
+ if the pillars pass. Route that finding to `agents/design-debt-crawler.md` as a debt
160
+ item with `category: aesthetic-slop` (see `reference/debt-categories.md`). The auditor
161
+ attaches the per-axis scores as `verb_axes_scored` under the `verb_axes` lens-tag and records which
162
+ visual-tells categories matched (see `reference/visual-tells.md`).
163
+
164
+ A sum at or above 35 is not a pass on the pillars; the pillars carry their own scores.
165
+ The threshold is a routing rule for the aesthetic-slop debt class only. It changes no
166
+ pillar weight and no total.
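To make the rule concrete, here is a minimal TypeScript sketch of the sum and the routing decision. It assumes integer axis scores from 1 to 10; the names `VerbAxesScored`, `verbAxesSum`, and `slopRoute` are hypothetical illustrations, not exports of the package.

```typescript
// Hypothetical record shape: one integer score from 1 to 10 per axis.
interface VerbAxesScored {
  directness: number;
  distinctness: number;
  hierarchy: number;
  authenticity: number;
  density: number;
}

const SLOP_THRESHOLD = 35; // out of a maximum of 50

// Sums the five axes after checking that each is an integer in 1..10.
function verbAxesSum(s: VerbAxesScored): number {
  const axes = [s.directness, s.distinctness, s.hierarchy, s.authenticity, s.density];
  for (const v of axes) {
    if (!Number.isInteger(v) || v < 1 || v > 10) {
      throw new RangeError(`axis score out of range: ${v}`);
    }
  }
  return axes.reduce((total, v) => total + v, 0);
}

// Returns the debt category to route to, or null when the finding is not routed.
// Pillar scores and weights are never read or changed here.
function slopRoute(s: VerbAxesScored): "aesthetic-slop" | null {
  return verbAxesSum(s) < SLOP_THRESHOLD ? "aesthetic-slop" : null;
}
```

For example, scores of 3, 2, 6, 4, and 5 sum to 20 and route to `aesthetic-slop`, while five 7s sum to exactly 35 and are not routed, because the rule is a strict `< 35`.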
167
+
168
+ ## What this is not
169
+
170
+ This rubric does not replace the pillar audit, does not gate a write, and does not
171
+ produce a 0-100 number. It is a verb-based lens that sits beside the pillars and emits
172
+ one tag plus one routing decision. Real review still applies the full rubric in
173
+ `reference/audit-scoring.md` and the BAN / SLOP catalog in `reference/anti-patterns.md`.
@@ -248,3 +248,7 @@ Attach to findings under the Visual Hierarchy pillar that relate to compositiona
248
248
  ### `i18n_readiness`
249
249
 
250
250
  Attach to findings under the Accessibility pillar (for WCAG 3.1.1 / 3.1.2 violations) or under the Anti-Pattern Compliance pillar (for hardcoded-string / overflow-at-+40% defects). Emitted by the `agents/design-verifier.md` §i18n probes section (Phase 28-06). See [`./i18n.md`](./i18n.md) §WCAG i18n + §Verifier Integration Spec. Does NOT change pillar weights or scores.
251
+
252
+ ### `verb_axes` (anti-slop)
253
+
254
+ Attach to any finding to record how generically AI-default the work reads, orthogonal to whether the pillar itself fails. Emitted by the `agents/design-auditor.md` §Anti-slop scoring section (Phase 50). It attaches `verb_axes_scored: {directness, distinctness, hierarchy, authenticity, density}` (each 1-10) to the finding, per the rubric in [`./anti-slop-rubric.md`](./anti-slop-rubric.md). When the five scores sum below 35 of 50, the finding also routes to `agents/design-debt-crawler.md` with `category: aesthetic-slop` (see [`./debt-categories.md`](./debt-categories.md)). Does NOT add a pillar and does NOT change pillar weights or scores.
@@ -3,7 +3,7 @@ name: debt-categories
3
3
  type: reference
4
4
  version: 1.0.0
5
5
  phase: 48
6
- tags: [debt, taxonomy, audit, crawler, priority-scoring, retroactive]
6
+ tags: [debt, taxonomy, audit, crawler, priority-scoring, retroactive, aesthetic-slop, anti-slop]
7
7
  last_updated: 2026-06-03
8
8
  ---
9
9
 
@@ -105,6 +105,25 @@ empty-state strings such as "No data" or raw error codes.
105
105
  **Fix shape:** Add the accessible name or label; rewrite generic copy to be specific
106
106
  and actionable. Copy-quality detail lives in `reference/copy-quality.md`.
107
107
 
108
+ ### aesthetic-slop
109
+
110
+ **Definition:** Work that reads as generically AI-default even when the pillar audit
111
+ passes: template copy, the default palette and typeface used without a decision, stock
112
+ scenes and placeholder content, flat competing hierarchy, and density inflated to fill
113
+ space. This is the orthogonal verb-axis lens from `reference/anti-slop-rubric.md`, not a
114
+ pillar failure. A surface can clear contrast, typography, and spacing and still be
115
+ aesthetic-slop because nothing about it is chosen.
116
+ **Detection signal:** The five verb axes (Directness, Distinctness, Hierarchy,
117
+ Authenticity, Density) scored 1-10 each by `agents/design-auditor.md`, with the sum
118
+ `< 35` of 50, corroborated by matches in `reference/visual-tells.md` (for example
119
+ `stock-photo-people`, `badge-spam`, `oversized-single-word`,
120
+ `motion-without-content-intent`, `narrator-from-a-distance-UI`, or the v1 tells). Record
121
+ the per-axis `verb_axes_scored` values and the matched tell categories as evidence.
122
+ **Fix shape:** Address the lowest axes first: write specific copy, route color and type
123
+ through documented tokens, establish one focal point, replace stock and placeholder with
124
+ real content, and size density to the content. This is usually a redesign-leaning effort,
125
+ not a one-line swap, so it scores low on the effort factor below.
126
+
108
127
  ---
109
128
 
110
129
  ## Priority Scoring Model
@@ -1121,6 +1121,34 @@
1121
1121
  "type": "output-contract",
1122
1122
  "phase": 48,
1123
1123
  "description": "Phase 48 brief-quality rubric: 5 anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) the brief-auditor surfaces."
1124
+ },
1125
+ {
1126
+ "name": "visual-tells",
1127
+ "path": "reference/visual-tells.md",
1128
+ "type": "heuristic",
1129
+ "phase": 49,
1130
+ "description": "Phase 49 visual-tells catalog: 8 default-AI-aesthetic categories (default-AI-hero, gradient-spam, isometric-illustration-fallback, centered-everything-syndrome, inter-everything, purple-violet-default, glassmorphism-spam, decorative-motion-without-intent) with diagnostic regex + remediation; backs the gdd-design-quality-check hook."
1131
+ },
1132
+ {
1133
+ "name": "reviewer-confidence-gate",
1134
+ "path": "reference/reviewer-confidence-gate.md",
1135
+ "type": "meta-rules",
1136
+ "phase": 49,
1137
+ "description": "Phase 49 reviewer confidence gate: 4-question Pre-Report Gate + confidence 0.0-1.0 field; HIGH (BLOCKER/MAJOR) findings require >=0.8 + cited proof, <0.5 stays Tentative and never reaches design-fixer."
1138
+ },
1139
+ {
1140
+ "name": "anti-slop-rubric",
1141
+ "path": "reference/anti-slop-rubric.md",
1142
+ "type": "heuristic",
1143
+ "phase": 50,
1144
+ "description": "Phase 50 verb-based anti-slop rubric: 5 orthogonal axes (Directness, Distinctness, Hierarchy, Authenticity, Density), 1-10 each; sum below 35/50 routes a finding to design-debt-crawler as aesthetic-slop. Lens-tag, not a pillar."
1145
+ },
1146
+ {
1147
+ "name": "skill-graph",
1148
+ "path": "reference/skill-graph.md",
1149
+ "type": "meta-rules",
1150
+ "phase": 50,
1151
+ "description": "Phase 50 auto-generated skill composition graph (mermaid): skills grouped by lifecycle stage with composes_with and next_skills edges; regenerated by scripts/generate-skill-graph.cjs and drift-gated in CI."
1124
1152
  }
1125
1153
  ]
1126
1154
  }
@@ -0,0 +1,108 @@
1
+ ---
2
+ name: reviewer-confidence-gate
3
+ type: meta-rules
4
+ version: 1.0.0
5
+ phase: 49
6
+ tags: [review, confidence, audit, verify, gap, routing, anti-slop]
7
+ last_updated: 2026-06-03
8
+ ---
9
+
10
+ # Reviewer Confidence Gate
11
+
12
+ Audit and verify findings can inflate severity without proof. A grep hit gets reported as a BLOCKER; a single line read out of context becomes a MAJOR. This contract adds a confidence discipline so review agents (`design-auditor`, `design-verifier`, `design-debt-crawler`) earn the severity they assign, and so `design-fixer` only auto-applies fixes that are backed by evidence.
13
+
14
+ Every emitting agent runs the Pre-Report Gate before writing a finding, stamps each finding with a `confidence` score, and parks weak findings in a `## Tentative` section that the fixer never reads. The routing helper `scripts/lib/confidence-route.cjs` encodes the same rule in code.
15
+
16
+ ## Pre-Report Gate
17
+
18
+ Before you emit any finding or gap, answer these four questions. If you cannot answer all four with a clear yes, the finding is not ready to ship at its stated severity.
19
+
20
+ - **a. Can I cite `file:line`?** Point at the exact location. A finding with no concrete location is a hunch, not a defect.
21
+ - **b. Can I state the failure mode in one sentence?** Name what breaks for the user or the build. If the sentence needs an "and" plus a "maybe", the finding is two findings or none.
22
+ - **c. Did I read context beyond the modified file?** Confirm the call site, the token definition, or the parent component. A value that looks wrong in isolation is often correct once you read what feeds it.
23
+ - **d. Is the severity defensible?** A BLOCKER blocks shipping. A MAJOR is a real deviation from intent. If you would not defend the label to the author, lower it.
24
+
25
+ ## The `confidence` field
26
+
27
+ Every finding carries a `confidence: 0.0-1.0` field. It records how sure you are that the finding is real and correctly classified, not how bad the issue is. Severity and confidence are independent axes: a cosmetic issue can be high confidence, and a suspected BLOCKER can be low confidence.
28
+
29
+ | Range | Meaning | Where it goes |
30
+ |-------|---------|---------------|
31
+ | `>= 0.8` | Cited `file:line`, one-sentence failure mode, context read. | Reported at full severity; eligible for auto-fix. |
32
+ | `0.5 - 0.8` | Real signal, but evidence is partial or context is incomplete. | Reported, routed to user review, never auto-fixed. |
33
+ | `< 0.5` | A hunch, a guess, or a pattern match you could not confirm. | Moved to `## Tentative`; never reaches `design-fixer`. |
34
+
35
+ ## Routing rule
36
+
37
+ The gate controls what reaches the fixer. The rule is:
38
+
39
+ - A HIGH severity finding (BLOCKER or MAJOR) requires `confidence >= 0.8` **and** a `file:line` citation **and** a one-sentence failure mode. Below `0.8`, a HIGH finding is surfaced for user review instead of auto-fix.
40
+ - A finding with `confidence < 0.5` stays in the `## Tentative` section and never reaches `design-fixer`.
41
+ - A finding with `confidence` in the `0.5 - 0.8` band is surfaced in the report but routed to user review, not auto-fix.
42
+
43
+ `scripts/lib/confidence-route.cjs` exports `route({ severity, confidence, tentative })` and returns `'fix'`, `'user-review'`, or `'drop'`. Agents and the fixer share this single decision so the matrix stays consistent.
44
+
45
+ ### Routing matrix
46
+
47
+ The full decision table the helper encodes:
48
+
49
+ | Severity | `tentative` | confidence | Destination |
50
+ |----------|-------------|------------|-------------|
51
+ | any | `true` | any | `drop` (never reaches fixer) |
52
+ | any | `false` | `< 0.5` | `drop` (stays tentative) |
53
+ | BLOCKER or MAJOR | `false` | `0.5 - 0.8` | `user-review` |
54
+ | BLOCKER or MAJOR | `false` | `>= 0.8` | `fix` |
55
+ | MINOR or COSMETIC | `false` | `0.5 - 0.8` | `user-review` |
56
+ | MINOR or COSMETIC | `false` | `>= 0.8` | `fix` |
57
+
58
+ Read the table as: tentative wins first, then the `0.5` floor, then the `0.8` auto-fix gate, which the table applies at the same threshold for every severity.
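The matrix can be reconstructed as a short TypeScript function. This is a sketch derived only from the table, not the shipped `scripts/lib/confidence-route.cjs`; the `Severity`, `Destination`, and `FindingMeta` type names are assumptions.

```typescript
type Severity = "BLOCKER" | "MAJOR" | "MINOR" | "COSMETIC";
type Destination = "fix" | "user-review" | "drop";

interface FindingMeta {
  severity: Severity; // carried for the report; the table uses the same thresholds for every severity
  confidence: number; // 0.0-1.0
  tentative: boolean;
}

// Evaluates the rows in order: tentative, then the 0.5 floor, then the 0.8 gate.
function route(f: FindingMeta): Destination {
  if (f.tentative) return "drop"; // never reaches design-fixer
  if (f.confidence < 0.5) return "drop"; // stays in ## Tentative
  if (f.confidence < 0.8) return "user-review"; // 0.5 <= confidence < 0.8
  return "fix"; // confidence >= 0.8: eligible for auto-fix
}
```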
59
+
60
+ ## How to emit a finding
61
+
62
+ After the Pre-Report Gate passes, write the finding with the `confidence` field on its own line inside the existing locked format. For `design-verifier` gaps this sits alongside the other gap fields:
63
+
64
+ ```text
65
+ ### BLOCKER G-01: raw error object rendered on payment failure
66
+ - Phase: 2
67
+ - Description: Checkout.tsx renders the error object directly
68
+ - Expected: a human-readable failure message
69
+ - Actual: users see "[object Object]"
70
+ - Location: src/Checkout.tsx:88
71
+ - Suggested fix: render error.message with a fallback string
72
+ - confidence: 0.85
73
+ ```
74
+
75
+ A finding that scores `< 0.5` is not written in the gap list at all. It goes under a `## Tentative` heading in the same report, in plain prose, so a human can promote it later if context proves it real.
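A consumer such as `design-fixer` has to read the `confidence` line back out of this locked format. A minimal TypeScript parser for a single gap block might look like this; it assumes the `### SEVERITY G-NN: title` heading and `- Key: value` lines in the example above, and `parseGap` is a hypothetical name.

```typescript
interface Gap {
  severity: string;
  id: string;
  title: string;
  location?: string;
  confidence?: number;
}

// Parses one gap: a "### SEVERITY G-NN: title" heading followed by "- Key: value" lines.
// Returns null when the heading does not match the locked format.
function parseGap(block: string): Gap | null {
  const lines = block.split("\n").map((line) => line.trim());
  const head = /^### (BLOCKER|MAJOR|MINOR|COSMETIC) (G-\d+): (.+)$/.exec(lines[0]);
  if (!head) return null;
  const gap: Gap = { severity: head[1], id: head[2], title: head[3] };
  for (const line of lines.slice(1)) {
    // Split on the first colon only, so "src/Checkout.tsx:88" survives intact.
    const field = /^- ([^:]+):\s*(.*)$/.exec(line);
    if (!field) continue;
    const key = field[1].trim().toLowerCase();
    if (key === "location") gap.location = field[2];
    if (key === "confidence") gap.confidence = Number(field[2]);
  }
  return gap;
}
```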
76
+
77
+ ## Paired examples
78
+
79
+ Each pair shows a raw finding (before the gate) and the same finding after the gate corrects it.
80
+
81
+ ### Example 1: severity inflated, no context read
82
+
83
+ **Before:** `BLOCKER: hardcoded color #1a73e8 in Button.tsx breaks theming.`
84
+
85
+ **After:** `MINOR G-04: raw #1a73e8 instead of a semantic token. confidence: 0.9`. Reading context (question c) showed `Button.tsx:42` is the token definition file, so theming is not broken; the issue is a style-coherence nit, not a shipping blocker. High confidence, low severity.
86
+
87
+ ### Example 2: a grep guess that could not be confirmed
88
+
89
+ **Before:** `MAJOR: missing reduced-motion guard, animations will trigger vestibular issues.`
90
+
91
+ **After:** moved to `## Tentative` with `confidence: 0.4`. The grep matched `framer-motion` but question a failed: no single `file:line` proves the guard is absent app-wide, and a root `MotionConfig` may cover it. Parked as tentative; the fixer never sees it.
92
+
93
+ ### Example 3: real defect, evidence complete
94
+
95
+ **Before:** `error states look weak somewhere in the checkout flow.`
96
+
97
+ **After:** `BLOCKER G-01: Checkout.tsx:88 renders the raw error object, so users see "[object Object]" on a failed payment. confidence: 0.85`. All four questions pass: cited location, one-sentence failure mode, call site read, severity defensible. Auto-fix eligible.
98
+
99
+ ### Example 4: partial evidence, honest mid-band score
100
+
101
+ **Before:** `MAJOR: empty state copy is generic across the app.`
102
+
103
+ **After:** `MINOR G-06: Inbox.tsx:30 empty state reads "No data". confidence: 0.65`. One real instance is cited, but question c is only half done: the "across the app" claim was not verified. Scored mid-band, surfaced for user review rather than auto-fixed, and the severity was lowered to match the single confirmed instance.
104
+
105
+ ## Agent integration
106
+
107
+ - `design-auditor`, `design-verifier`, and `design-debt-crawler` run the Pre-Report Gate, stamp each finding with `confidence`, and route sub-0.5 findings to `## Tentative`.
108
+ - `design-fixer` skips every gap in `## Tentative` and every gap whose `confidence < 0.8` (BLOCKER and MAJOR included), routing the latter to user review instead of auto-fix.
@@ -1,10 +1,10 @@
1
1
  ---
2
2
  name: skill-authoring-contract
3
3
  type: meta-rules
4
- version: 1.0.0
5
- phase: 28.5
6
- tags: [skill, authoring, contract, length-cap, description, frontmatter, progressive-disclosure]
7
- last_updated: 2026-05-18
4
+ version: 3.0.0
5
+ phase: 50
6
+ tags: [skill, authoring, contract, length-cap, description, frontmatter, progressive-disclosure, composition, skill-graph]
7
+ last_updated: 2026-06-03
8
8
  ---
9
9
 
10
10
  Source: mattpocock/skills (MIT) - adapted with permission. See `../NOTICE` for the full attribution block.
@@ -48,27 +48,65 @@ worst-offender and is scheduled for Bucket 1 rework in plan `28.5-04`. `skills/h
48
48
  Two rules:
49
49
 
50
50
  - **Length cap is STRICT.** `description ≤ 1024 chars` - no flag, no override. Under 20 chars
51
- is also blocked as under-specification.
52
- - **Recommended form is LAX by default.** `<what>. Use when <triggers>.` - third person,
53
- first sentence what the skill does, second sentence the trigger conditions. Validator
54
- enforces the form regex only under `--strict-description` or `STRICT_DESCRIPTION=1`. Default
55
- is length-only.
51
+ is also blocked as under-specification. The 1024-char cap is UNCHANGED in v3.
52
+ - **Recommended form is LAX by default.** The validator enforces a form regex only under
53
+ `--strict-description` or `STRICT_DESCRIPTION=1`. Default is length-only.
54
+
55
+ ### v3 form (recommended)
56
+
57
+ ```text
58
+ <what>. Use when <triggers>. Activates for requests involving <kw1>, <kw2>, <kw3>.
59
+ ```
60
+
61
+ Three sentences, third person:
62
+
63
+ 1. **`<what>`** - what the skill does.
64
+ 2. **`Use when <triggers>`** - the trigger conditions.
65
+ 3. **`Activates for requests involving <kw1>, <kw2>, <kw3>`** - a short keyword list. This
66
+ trigger sentence is the v3 addition: naming the activating keywords improves retrieval, so the
67
+ router surfaces the skill on the requests it is meant to handle rather than on near-misses.
68
+
69
+ ### v2 form (still accepted during the transition window)
70
+
71
+ ```text
72
+ <what>. Use when <triggers>.
73
+ ```
74
+
75
+ The v2 form is the two-sentence shape shipped in Phase 28.5 (first sentence what, second sentence
76
+ when). It omits the `Activates for ...` trigger sentence.
77
+
78
+ ### Transition window
79
+
80
+ BOTH the v2 form and the v3 form are accepted for one minor version. Neither is a hard failure
81
+ during the window; the length cap (20-1024) is the only blocking description rule. `gsd-health`
82
+ tracks v3 adoption (the share of descriptions carrying the `Activates for ...` sentence) so the
83
+ rollout is measurable before the v2 form is retired in a later minor.
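A rough TypeScript classifier for the two forms, plus the adoption share described above, is sketched below. The regexes are assumptions that approximate the documented shapes; they are not the validator's `--strict-description` regex, and they will misfire on descriptions with a period inside a sentence (for example `e.g.`).

```typescript
type DescriptionForm = "v3" | "v2" | "other";

// Assumed approximations of the documented shapes, not the validator's real regex.
const V3_FORM = /^[^.]+\.\s+Use when [^.]+\.\s+Activates for requests involving [^.]+\.$/;
const V2_FORM = /^[^.]+\.\s+Use when [^.]+\.$/;

function checkDescription(desc: string): { lengthOk: boolean; form: DescriptionForm } {
  // The 20-1024 cap (counted here in UTF-16 code units) is the only blocking rule.
  const lengthOk = desc.length >= 20 && desc.length <= 1024;
  const form: DescriptionForm = V3_FORM.test(desc) ? "v3" : V2_FORM.test(desc) ? "v2" : "other";
  return { lengthOk, form };
}

// Share of descriptions that carry the v3 "Activates for ..." sentence.
function v3Adoption(descriptions: string[]): number {
  if (descriptions.length === 0) return 0;
  const v3 = descriptions.filter((d) => checkDescription(d).form === "v3").length;
  return v3 / descriptions.length;
}
```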
56
84
 
57
85
  Why lax-by-default (D-02): `obra/superpowers/skills/writing-skills/SKILL.md` documents a
58
- shortcut-effect where an agent reads the description and skips the body - the more
59
- essential the description summary, the more often this happens. Phase 33 ships an A/B
60
- study at `.design/research/description-format-ab.md`; until then the regex stays advisory.
86
+ shortcut-effect where an agent reads the description and skips the body - the more essential the
87
+ description summary, the more often this happens. The form regex therefore stays advisory; only
88
+ length is enforced by default.
61
89
 
62
- Examples (both 20-1024 chars, both pass the length check):
90
+ Examples (all 20-1024 chars, all pass the length check):
63
91
 
64
92
  ```text
65
- # Strict-mode-compliant
93
+ # v3 form (recommended)
94
+ Renders an OKLCH gamut comparison chart. Use when the user asks to see the visible difference between a target gamut and sRGB. Activates for requests involving gamut, OKLCH, sRGB.
95
+
96
+ # v2 form (accepted during the transition window)
66
97
  Renders an OKLCH gamut comparison chart. Use when the user asks to see the visible difference between a target gamut and sRGB.
67
98
 
68
- # Lax-mode-only acceptable
99
+ # Lax-mode-only acceptable (length passes; form regex would flag under --strict-description)
69
100
  Compares OKLCH gamut coverage against sRGB and prints a visual diff chart.
70
101
  ```
71
102
 
103
+ ### Anti-boilerplate gate
104
+
105
+ `scripts/validate-skill-frontmatter.cjs` is a separate, always-on cohort check: if three or more
106
+ skills share an identical opening sentence OR an identical `Use when` clause, it fails. Collapsed
107
+ boilerplate across many descriptions erases the discriminating signal the router needs, so each
108
+ skill keeps a distinct opening and a distinct trigger clause.
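The cohort check amounts to a grouping pass. This TypeScript sketch assumes the opening sentence ends at the first `. ` and that the trigger clause is the first `Use when ... .` sentence; `boilerplateViolations` and its helpers are hypothetical, not the internals of `scripts/validate-skill-frontmatter.cjs`.

```typescript
interface SkillDesc {
  name: string;
  description: string;
}

// Opening sentence: everything up to and including the first ". " period.
function openingSentence(desc: string): string {
  const end = desc.indexOf(". ");
  return (end === -1 ? desc : desc.slice(0, end + 1)).trim();
}

// First "Use when ... ." clause, or null if the description has none.
function useWhenClause(desc: string): string | null {
  const match = /Use when [^.]*\./.exec(desc);
  return match ? match[0] : null;
}

// Returns each clause shared by `limit` or more skills; an empty array means the gate passes.
function boilerplateViolations(skills: SkillDesc[], limit = 3): string[] {
  const owners = new Map<string, Set<string>>();
  const record = (key: string, skill: string): void => {
    const set = owners.get(key) ?? new Set<string>();
    set.add(skill);
    owners.set(key, set);
  };
  for (const s of skills) {
    record(`opening: ${openingSentence(s.description)}`, s.name);
    const clause = useWhenClause(s.description);
    if (clause !== null) record(`use-when: ${clause}`, s.name);
  }
  return Array.from(owners.entries())
    .filter(([, names]) => names.size >= limit)
    .map(([key]) => key);
}
```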
109
+
72
110
  ## Frontmatter
73
111
 
74
112
  Required fields (validator blocks if absent):
@@ -85,6 +123,10 @@ Optional fields (recognized by the Claude Code agent loader):
85
123
  whitelist (pure shortcuts like `help`, `stats`, `note`, `health`, `zoom-out`). The
86
124
  validator blocks if a non-whitelisted skill sets this field to `true`.
87
125
  - `user-invocable: true|false` - whether the slash-command picker exposes the skill.
126
+ - `composes_with: [skill, ...]` - optional (v3). Skill names this skill calls as
127
+ sub-orchestration. See `## Skill composition` below.
128
+ - `next_skills: [skill]` - optional (v3). A pipeline hint listing the skills that naturally
129
+ run after this one. See `## Skill composition` below.
88
130
 
89
131
  Concrete example:
90
132
 
@@ -97,6 +139,35 @@ disable-model-invocation: true
  ---
  ```
 
+ ## Skill composition
+
+ v3 closes the "no skill calls another skill" gap with two optional, machine-parseable frontmatter
+ fields. Both are arrays of skill names and both are OPTIONAL; a skill that sets neither behaves as
+ before.
+
+ - `composes_with: [skill, ...]` - the skills this one calls as sub-orchestration. Use it when a
+ skill spawns or delegates into another skill as part of its own run.
+ - `next_skills: [skill, ...]` - a pipeline hint: the skills that naturally run after this one. It
+ does not call them; it records the intended flow so tooling can suggest the next step.
+
+ Each entry becomes a directed edge from this skill to the referenced skill. The composition
+ graph across all skills MUST be a directed acyclic graph: no chain of edges may lead from a skill
+ back to itself, and every referenced name MUST be a real skill. `scripts/validate-composition-graph.cjs`
+ reads these fields from `scripts/lib/manifest/skills.json` (either as native array fields or parsed
+ from the record's `extra_frontmatter` passthrough lines), then fails on a cycle or a dangling
+ reference. `scripts/generate-skill-graph.cjs` reads the same edges and regenerates
+ `./skill-graph.md`, a mermaid flowchart that groups the skills by lifecycle stage and draws their
+ composition edges; CI drift-gates that file with `--check`.
+
+ ```yaml
+ ---
+ name: audit
+ description: "Runs a design audit and prints a 6-pillar score. Use when the user wants to score the current design. Activates for requests involving audit, score, design review."
+ tools: Read, Write, Task, Glob, Bash
+ composes_with: [scan]
+ next_skills: [reflect]
+ ---
+ ```
+
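The DAG rule can be checked in two passes. First, resolve every `composes_with` / `next_skills` entry to an edge and reject names that are not skills. Second, run a depth-first search that reports any edge pointing back into the current path as a cycle. The Python sketch below illustrates that technique under stated assumptions. The record shape (`name`, native array fields, `extra_frontmatter` lines written as `composes_with: [a, b]`) is hypothetical, as are the rule that native fields win over passthrough lines and the `edges_for` / `validate` helpers. None of this is taken from `scripts/validate-composition-graph.cjs`.

```python
import re

# Hypothetical record shape: {"name": str, "composes_with": [...], "next_skills": [...],
# "extra_frontmatter": ["composes_with: [scan]", ...]}.
FIELDS = ("composes_with", "next_skills")
FLOW_ARRAY = re.compile(r"^(composes_with|next_skills):\s*\[(.*)\]$")


def edges_for(record: dict) -> list[tuple[str, str, str]]:
    """Collect (source, target, field) edges; native array fields win over passthrough lines."""
    found = {field: list(record.get(field) or []) for field in FIELDS}
    for line in record.get("extra_frontmatter") or []:
        match = FLOW_ARRAY.match(line.strip())
        if match and not found[match.group(1)]:
            found[match.group(1)] = [t.strip() for t in match.group(2).split(",") if t.strip()]
    return [(record["name"], target, field) for field in FIELDS for target in found[field]]


def validate(records: list[dict]) -> list[str]:
    """Return error strings for dangling references and cycles; empty means a valid DAG."""
    names = sorted(r["name"] for r in records)
    graph: dict[str, list[str]] = {name: [] for name in names}
    errors = []
    for record in records:
        for source, target, field in edges_for(record):
            if target not in graph:
                errors.append(f"dangling {field}: {source} -> {target}")
            else:
                graph[source].append(target)

    # Iterative DFS with three colours: reaching a GREY node (one still on the
    # current path) means the path has looped back on itself, i.e. a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(names, WHITE)
    for start in names:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        path = [start]
        stack = [iter(graph[start])]
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:
                # All successors explored: this node is finished.
                colour[path.pop()] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                cycle = path[path.index(nxt):] + [nxt]
                errors.append("cycle: " + " -> ".join(cycle))
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                path.append(nxt)
                stack.append(iter(graph[nxt]))
    return errors
```

Given the `audit` example above plus plain `scan` and `reflect` records, `validate` returns no errors. Adding `next_skills: [audit]` to `reflect` would instead yield `cycle: audit -> reflect -> audit`.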
  ## Progressive disclosure
 
  References-one-level-deep is the rule (D-06):
@@ -157,3 +228,14 @@ node scripts/validate-skill-length.cjs --quiet --json
  Exit codes: `0` clean, `1` warnings only, `2` blockers present. Flags: `--quiet` suppresses
  per-skill output, `--strict-description` adds the form regex check, `--json` emits
  machine-readable output. Env: `STRICT_DESCRIPTION=1` and `SKILLS_DIR=<path>` are honored.
+
+ v3 adds three scripts that read the source of truth (SoT), `scripts/lib/manifest/skills.json`:
+
+ ```text
+ node scripts/validate-skill-frontmatter.cjs # fail when 3+ skills share an opening or Use-when clause
+ node scripts/validate-composition-graph.cjs # fail on a composition cycle or dangling ref
+ node scripts/generate-skill-graph.cjs --check # drift-gate the generated skill-graph.md
+ ```
+
+ Each exits `0` when clean, `1` on a failure (drift for the generator under `--check`), `2` on an
+ internal error.
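The generator's exit-code contract under `--check` can be sketched as a small wrapper. It renders the expected file, compares it with the text on disk, and keeps drift (`1`) separate from crashes (`2`). This Python sketch is hypothetical. `run` and the injected `render` callable stand in for whatever `scripts/generate-skill-graph.cjs` does internally.

```python
import sys
from pathlib import Path
from typing import Callable


def run(argv: list[str], out_path: Path, render: Callable[[], str]) -> int:
    """Write or drift-check a generated file; return 0 clean, 1 drift, 2 internal error."""
    try:
        expected = render()
        if "--check" in argv:
            current = out_path.read_text(encoding="utf-8") if out_path.exists() else None
            # Any difference, including a missing file, counts as drift.
            return 0 if current == expected else 1
        out_path.write_text(expected, encoding="utf-8")
        return 0
    except Exception as exc:  # anything unexpected is an internal error, not drift
        print(f"internal error: {exc}", file=sys.stderr)
        return 2
```

A CLI entry point would then call `sys.exit(run(sys.argv[1:], Path("skill-graph.md"), render_graph))`, where `render_graph` is a hypothetical renderer. Treating a missing file as drift means a checkout that lacks the generated file fails `--check` instead of passing silently.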
@@ -0,0 +1,118 @@
+ # Skill Composition Graph
+
+ > GENERATED FILE. Do not edit by hand. Source: scripts/lib/manifest/skills.json.
+ > Regenerate: `node scripts/generate-skill-graph.cjs`; CI drift-gates it with `--check`.
+
+ This graph visualizes every skill grouped by inferred lifecycle stage, plus the skill
+ composition edges declared in v3 frontmatter (see skill-authoring-contract.md). A solid arrow
+ is a `composes_with` edge (the source calls the target as sub-orchestration); a dotted arrow is
+ a `next_skills` edge (a pipeline hint for what runs next). Stage grouping is best-effort and
+ inferred from the skill name; skills with no stage keyword fall under Utility.
+
+ Skills: 88. Composition edges: 0 composes_with, 0 next_skills.
+
+ ```mermaid
15
+ flowchart TD
16
+ subgraph intake["Intake"]
17
+ n_brief["brief"]
18
+ n_discover["discover"]
19
+ n_new_cycle["new-cycle"]
20
+ n_new_project["new-project"]
21
+ n_start["start"]
22
+ end
23
+ subgraph explore["Explore"]
24
+ n_benchmark["benchmark"]
25
+ n_compare["compare"]
26
+ n_explore["explore"]
27
+ n_map["map"]
28
+ n_sketch["sketch"]
29
+ n_sketch_wrap_up["sketch-wrap-up"]
30
+ n_spike["spike"]
31
+ n_spike_wrap_up["spike-wrap-up"]
32
+ end
33
+ subgraph decide["Decide"]
34
+ n_discuss["discuss"]
35
+ n_list_assumptions["list-assumptions"]
36
+ n_plan["plan"]
37
+ n_review_decisions["review-decisions"]
38
+ n_unlock_decision["unlock-decision"]
39
+ end
40
+ subgraph build["Build"]
41
+ n_bootstrap_ds["bootstrap-ds"]
42
+ n_darkmode["darkmode"]
43
+ n_design["design"]
44
+ n_do["do"]
45
+ n_export["export"]
46
+ n_figma_write["figma-write"]
47
+ n_migrate["migrate"]
48
+ n_optimize["optimize"]
49
+ end
50
+ subgraph verify["Verify"]
51
+ n_audit["audit"]
52
+ n_complete_cycle["complete-cycle"]
53
+ n_quality_gate["quality-gate"]
54
+ n_review_backlog["review-backlog"]
55
+ n_scan["scan"]
56
+ n_turn_closeout["turn-closeout"]
57
+ n_verify["verify"]
58
+ end
59
+ subgraph operate["Operate"]
60
+ n_live["live"]
61
+ n_report_issue["report-issue"]
62
+ n_roi["roi"]
63
+ n_rollout_status["rollout-status"]
64
+ n_watch_authorities["watch-authorities"]
65
+ end
66
+ subgraph utility["Utility"]
67
+ n_add_backlog["add-backlog"]
68
+ n_analyze_dependencies["analyze-dependencies"]
69
+ n_apply_reflections["apply-reflections"]
70
+ n_bandit_status["bandit-status"]
71
+ n_budget["budget"]
72
+ n_cache_manager["cache-manager"]
73
+ n_check_update["check-update"]
74
+ n_connections["connections"]
75
+ n_continue["continue"]
76
+ n_debug["debug"]
77
+ n_extract_learnings["extract-learnings"]
78
+ n_fast["fast"]
79
+ n_figma_extract["figma-extract"]
80
+ n_graphify["graphify"]
81
+ n_health["health"]
82
+ n_help["help"]
83
+ n_list_pins["list-pins"]
84
+ n_locale["locale"]
85
+ n_new_skill["new-skill"]
86
+ n_next["next"]
87
+ n_note["note"]
88
+ n_openrouter_status["openrouter-status"]
89
+ n_pause["pause"]
90
+ n_peer_cli_add["peer-cli-add"]
91
+ n_peer_cli_customize["peer-cli-customize"]
92
+ n_peers["peers"]
93
+ n_pin["pin"]
94
+ n_plant_seed["plant-seed"]
95
+ n_pr_branch["pr-branch"]
96
+ n_progress["progress"]
97
+ n_quick["quick"]
98
+ n_reapply_patches["reapply-patches"]
99
+ n_recall["recall"]
100
+ n_reflect["reflect"]
101
+ n_resume["resume"]
102
+ n_router["router"]
103
+ n_settings["settings"]
104
+ n_ship["ship"]
105
+ n_skill_manifest["skill-manifest"]
106
+ n_stats["stats"]
107
+ n_style["style"]
108
+ n_synthesize["synthesize"]
109
+ n_timeline["timeline"]
110
+ n_todo["todo"]
111
+ n_undo["undo"]
112
+ n_unpin["unpin"]
113
+ n_update["update"]
114
+ n_using_gdd["using-gdd"]
115
+ n_warm_cache["warm-cache"]
116
+ n_zoom_out["zoom-out"]
117
+ end
118
+ ```
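The generated graph above has a regular shape. Each node id is `n_` plus the skill name with hyphens turned into underscores, and skills appear sorted within each stage subgraph. The Python sketch below reproduces that shape; it is not the actual `scripts/generate-skill-graph.cjs`. Stage assignment is passed in explicitly because the name-keyword rules are not documented here. The edge lines are also an assumption, since the current file has no edges to copy from. They follow the file's legend using Mermaid's standard solid (`-->`) and dotted (`-.->`) flowchart arrows.

```python
def node_id(skill: str) -> str:
    """Mermaid node id as seen in skill-graph.md: "sketch-wrap-up" -> "n_sketch_wrap_up"."""
    return "n_" + skill.replace("-", "_")


def render(stages: dict[str, tuple[str, list[str]]], edges: list[tuple[str, str, str]]) -> str:
    """Render one subgraph per stage, then the edges; stages maps id -> (label, skills)."""
    lines = ["flowchart TD"]
    for stage_id, (label, skills) in stages.items():
        lines.append(f'subgraph {stage_id}["{label}"]')
        lines.extend(f'{node_id(s)}["{s}"]' for s in sorted(skills))
        lines.append("end")
    for source, target, field in edges:
        # Solid arrow for composes_with, dotted arrow for next_skills (assumed mapping).
        arrow = "-->" if field == "composes_with" else "-.->"
        lines.append(f"{node_id(source)} {arrow} {node_id(target)}")
    return "\n".join(lines) + "\n"
```

With the `audit` frontmatter example from the authoring contract, this would add `n_audit --> n_scan` and `n_audit -.-> n_reflect` after the subgraphs.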