devlyn-cli 1.14.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (148)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +112 -119
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +129 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/_shared/archive_run.py +130 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -481
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/optional-skills/devlyn:reap/SKILL.md +105 -0
  117. package/optional-skills/devlyn:reap/scripts/reap.sh +129 -0
  118. package/optional-skills/devlyn:reap/scripts/scan.sh +116 -0
  119. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  120. package/package.json +16 -2
  121. package/scripts/lint-skills.sh +431 -0
  122. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -602
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -116
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -204
  125. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  126. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  127. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  128. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  129. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  130. package/config/skills/devlyn:clean/SKILL.md +0 -285
  131. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  132. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  133. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  134. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  135. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  136. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  137. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  138. package/config/skills/devlyn:preflight/SKILL.md +0 -370
  139. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  140. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -90
  141. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  142. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  143. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  144. package/config/skills/devlyn:review/SKILL.md +0 -161
  145. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  146. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  147. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  148. package/config/skills/workflow-routing/SKILL.md +0 -73
package/config/skills/devlyn:ideate/references/challenge-rubric.md
@@ -1,122 +0,0 @@
- # CHALLENGE Rubric (single source of truth)
-
- ## Contents
- - Context — this is a planning rubric
- - The 5 axes (NO WORKAROUND, NO GUESSWORK, NO OVERENGINEERING, WORLD-CLASS BEST PRACTICE, OPTIMIZED)
- - Hard rule — respect explicit user intent
- - Finding format
- - Examples (good vs bad findings, plus a detour-sequencing example)
-
- The 5-axis rubric applied in Phase 3.5 CHALLENGE of `devlyn:ideate`. Both the solo Claude pass and the Codex critic pass (on `--engine auto`) use this file — there is exactly one definition of the rubric, and `SKILL.md` instructs both passes to read it directly from here.
-
- The rubric exists because plans produced in a single pass, by a single model, in a single conversation almost always fail at least one axis somewhere. The user's historical experience: every time they asked "is this really no-workaround, no-guesswork, no-overengineering, world-class, optimized?", the honest answer was no. This phase makes the answer honestly yes before the user even has to ask.
-
- ## Context — this is a PLANNING rubric, not a code rubric
-
- This rubric judges the shape of the roadmap: what items exist, in what order, why. It does NOT judge implementation details, code style, or abstractions in code. "Overengineering" here means overengineering the plan, not overengineering a function. When applying it, keep asking: *is this the most direct, optimized path from the user's stated problem to a working outcome?*
-
- ## The 5 axes
-
- ### 1. NO WORKAROUND
-
- Does the item solve the actual problem directly, or does it route around a missing capability? If the direct path is "build X" and the item is "work around not having X", it fails.
-
- Canonical failure pattern: the user asks for a feature that papers over a missing foundation. Building the feature adds an item to the plan without solving the real problem, and often makes the real problem harder to fix later.
-
- ### 2. NO GUESSWORK
-
- Every requirement must be grounded in something the user explicitly confirmed, or in something verifiable from the problem framing. Silent assumptions, "I think the user probably wants...", and requirements invented to fill gaps all fail.
-
- Canonical failure pattern: vague user input ("improve the dashboard") leads to a fully-specified plan full of invented detail. Correct handling is to mark every assumed fact as [ASSUMED], ask clarifying questions, and keep the plan minimal until the user fills in the gaps.
-
- ### 3. NO OVERENGINEERING (planning-stage)
-
- The plan fails this axis when it contains any of:
-
- - **Luxury items** — polish, theming, animations, nice-to-haves that do not serve the stated problem. A polish/theming item in Phase 1 of a tool that does not yet solve its core job.
- - **Filler items** — items added to pad a phase or make the plan feel complete. If an item has no testable requirement a real user would notice if absent, it is filler.
- - **Detour sequencing** — the plan takes the long way around when a direct route exists. Three items building toward X when one item could deliver X. Separate scaffold / store / deploy items when they could be bundled into the actual feature they enable.
- - **Roadmap workarounds masquerading as features** — see axis 1. The same failure can fire on axis 1 (paper-over) and axis 3 (padding the roadmap with the workaround).
-
- The question to ask for every item: *"Is this the most direct, optimized path to the stated goal, or are we decorating / detouring / papering over?"*
-
- ### 4. WORLD-CLASS BEST PRACTICE
-
- Would a senior team at a top company structure the roadmap this way for this kind of product today? If a known-good pattern exists for sequencing or decomposing this kind of problem, name it and use it.
-
- Canonical failure pattern: the plan uses a familiar-but-mediocre decomposition when a better-known-good pattern exists for the specific problem type. Example: using manual export/import for cross-device sync when autosave + cloud draft storage is the standard pattern across mainstream editing tools (Notion, Linear, Gmail, Google Docs).
-
- ### 5. OPTIMIZED
-
- Does the sequencing minimize wait time, front-load risk, and ship user-visible value at every phase boundary? Dead phases — phases that are pure setup with no visible win for a real user — are a fail.
-
- Canonical failure pattern: Phase 1 is entirely infrastructure (scaffold, models, deploy) and the first user-facing win arrives in Phase 2. Better: Phase 1 ships one thin vertical slice that a real user can use, even if it is small.
-
- ## Hard rule — respect explicit user intent
-
- The rubric is a tool to prevent drift from quality, not a tool to override the user. If the user has explicitly and clearly stated a preference ("I want X, not Y"), the rubric does not silently replace X with Y. Instead:
-
- - Run the rubric as normal.
- - If an axis flags X, do not rewrite the plan. Record the finding and surface it to the user as an open question: "The rubric flags X on [axis] because [reason]. You explicitly asked for X — confirm you want to proceed, or consider [alternative]."
- - The user makes the call. The rubric's job is to make the tradeoff visible, not to make the decision.
-
- This rule exists because the 5-axis rubric is an opinionated lens, and opinionated lenses are wrong sometimes. The user's stated intent is ground truth when it is explicit. The rubric is ground truth only for things the user did not explicitly decide.
-
- ## Finding format
-
- For every item that fails any axis, produce a finding in this exact format:
-
- ```
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Quote: [copy the specific item title or line you are critiquing — one line]
- Axis: [which of the five]
- Why it fails: [one sentence]
- Fix: [one concrete revision — not "reconsider X", say what to do instead]
- ```
-
- For the plan as a whole, give a one-line pass/fail per axis with one-sentence reasoning.
-
- End with a verdict: `PASS / PASS WITH MINOR FIXES / FAIL — REVISION REQUIRED`.
-
- The Quote field is load-bearing. It anchors each finding to a specific line in the plan, which prevents the common failure mode of generic unanchored critiques ("too much in Phase 1", "consider refactoring"). Anchored findings are actionable; unanchored findings are noise.
-
- ## Examples
-
- <example>
- BAD finding (too vague, not actionable):
- Severity: HIGH
- Axis: NO OVERENGINEERING
- Why: Phase 1 has too much.
- Fix: Reduce scope.
-
- GOOD finding (anchored, specific, actionable):
- Severity: HIGH
- Quote: "1.3 — Theme customization (light/dark/custom accent colors)"
- Axis: NO OVERENGINEERING (luxury item)
- Why it fails: The product does not yet solve its core job of letting users save a session; theming is a decoration item that does not move the primary problem forward.
- Fix: Move 1.3 to backlog. Phase 1 is shorter by one item. Revisit theming only after the core save flow is shipped and used.
- </example>
-
- <example>
- BAD finding:
- Severity: HIGH
- Axis: NO WORKAROUND
- Why: Item 2.1 is a workaround.
- Fix: Do it properly.
-
- GOOD finding:
- Severity: CRITICAL
- Quote: "2.1 — Export/import session as JSON file so users can move work between devices"
- Axis: NO WORKAROUND
- Why it fails: The real problem is cross-device sync. File export is a roadmap workaround that asks the user to do the sync manually; it adds an item to the plan without solving the stated problem, and makes the real problem harder to fix later.
- Fix: Replace 2.1 with "Cloud-backed session storage" as a direct cross-device solution. If cloud storage is out of scope for the current phase, explicitly defer cross-device sync to a later phase rather than shipping a manual workaround as if it were the feature.
- </example>
-
- <example>
- Detour sequencing finding:
- Severity: MEDIUM
- Quote: "Phase 1: 1.1-scaffold, 1.2-data-store, 1.3-log-today, 1.4-streak-display, 1.5-history-view, 1.6-manage-habits, 1.7-deploy"
- Axis: NO OVERENGINEERING (detour sequencing)
- Why it fails: Scaffold, data store, streak display, and deploy are not features a user would notice as separate items — they are implementation steps of the three actual user capabilities (log a habit, see streak, see history). Splitting them into standalone roadmap items pads the plan without delivering value at each boundary.
- Fix: Collapse Phase 1 to 2 items: "1.1 — Log a habit and see streak" (bundles scaffold + store + log + streak), "1.2 — History view". Deploy is part of each item's done criteria, not a standalone item. Result: 7 items → 2 items, same delivered scope.
- </example>
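The deleted rubric fixes a five-field finding format (Severity, Quote, Axis, Why it fails, Fix) and calls the Quote field load-bearing. As an illustration only, here is a minimal Python sketch of a structural checker for that format, assuming each finding is plain text with one `Field: value` pair per line. `parse_finding` and `validate_finding` are hypothetical helpers; nothing like them is known to ship in devlyn-cli 1.14.0 or 2.0.0.

```python
import re

# Allowed values taken from the rubric's finding format and its five axes.
SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
AXES = (
    "NO WORKAROUND",
    "NO GUESSWORK",
    "NO OVERENGINEERING",
    "WORLD-CLASS BEST PRACTICE",
    "OPTIMIZED",
)
REQUIRED_FIELDS = ("Severity", "Quote", "Axis", "Why it fails", "Fix")

# Assumption: one "Field: value" pair per line, field names spelled exactly.
FIELD_RE = re.compile(r"^(Severity|Quote|Axis|Why it fails|Fix):\s*(.*)$")


def parse_finding(text: str) -> dict[str, str]:
    """Collect recognised `Field: value` lines; other lines are ignored."""
    fields = {}
    for line in text.strip().splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def validate_finding(text: str) -> list[str]:
    """Return a list of problems; an empty list means the finding is well-formed."""
    fields = parse_finding(text)
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    severity = fields.get("Severity")
    if severity and severity not in SEVERITIES:
        problems.append(f"unknown severity {severity!r}")
    axis = fields.get("Axis", "")
    # Axis may carry a qualifier such as "NO OVERENGINEERING (luxury item)".
    if axis and not axis.startswith(AXES):
        problems.append(f"unknown axis {axis!r}")
    return problems


bad = """Severity: HIGH
Axis: NO OVERENGINEERING
Why: Phase 1 has too much.
Fix: Reduce scope."""

good = """Severity: HIGH
Quote: "Theme customization (light/dark/custom accent colors)"
Axis: NO OVERENGINEERING (luxury item)
Why it fails: Theming does not move the core save flow forward.
Fix: Move 1.3 to backlog."""

print(validate_finding(bad))   # ['missing Quote', 'missing Why it fails']
print(validate_finding(good))  # []
```

A check like this only catches missing or mislabeled fields. Whether the Why and Fix are specific enough remains a judgment the rubric leaves to the reviewer.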
package/config/skills/devlyn:ideate/references/templates/item-spec.md
@@ -1,90 +0,0 @@
- # Item Spec Template (Auto-Resolve-Ready)
-
- Generate one file per roadmap item at `docs/roadmap/phase-N/{id}-{name}.md`. This is the most critical template — each spec becomes the direct input to `/devlyn:auto-resolve`.
-
- ---
-
- ```markdown
- ---
- id: "[phase.item]"
- title: "[Feature Name]"
- phase: [N]
- status: planned
- priority: [high | medium | low]
- complexity: [low | medium | high]
- depends-on: []
- ---
-
- # [id] [Feature Name]
-
- ## Context
- <!-- 2-3 sentences MAX. Just enough for auto-resolve to understand WHY this exists. -->
- <!-- Extract only the relevant context from the vision — don't make the implementation agent read the full vision document. -->
- [Project] does [what]. This feature [enables/improves/fixes] [specific user capability].
-
- ## Customer Frame
- <!-- One sentence. When [situation], [user] wants to [motivation] so they can [outcome]. -->
- <!-- Use this to resolve ambiguous requirements: prefer the behavior that best serves this user outcome, and do not add capabilities outside this frame. -->
-
- ## Objective
- <!-- One sentence: what the user can do after this is implemented. -->
-
- ## Requirements
- <!-- These become auto-resolve's done-criteria. Quality of these requirements directly determines implementation quality. -->
- - [ ] [Specific, testable requirement]
- - [ ] [Specific, testable requirement]
- - [ ] [Specific, testable requirement]
- - [ ] ...
-
- ## Constraints
- <!-- Technical constraints WITH reasoning. Implementation agents respect constraints significantly better when they understand the motivation. -->
- - [Constraint] — Why: [reason]
- - ...
-
- ## Out of Scope
- <!-- What this item explicitly does NOT include. This prevents auto-resolve from over-building. -->
- - [Feature/behavior] ([where/when it will be addressed, e.g., "Phase 2, item 2.3"])
- - ...
-
- ## Architecture Notes
- <!-- Technical context that helps implementation. Reference decision records when applicable. -->
- <!-- Remove this section if the implementation is straightforward. -->
-
- ## Dependencies
- - **Internal**: [Other roadmap items that must exist first, e.g., "1.1 User Auth"]
- - **External**: [APIs, services, credentials, third-party setup needed]
-
- ## Verification
- <!-- How to confirm this works. Overlaps with Requirements but focuses on observable user-facing behavior. -->
- - [ ] [Observable verification step]
- - [ ] ...
- ```
-
- ## Quality Criteria
-
- Before writing a spec, verify each requirement against these criteria:
-
- **Testable**: Can a test assert this, or can a human verify it in under 30 seconds?
- - Bad: "The dashboard loads quickly"
- - Good: "Dashboard initial render completes within 2 seconds on 3G throttled connection"
-
- **Specific**: Is there exactly one interpretation of what "done" means?
- - Bad: "Handles errors gracefully"
- - Good: "Failed API calls display an error banner with the message and a retry button"
-
- **Scoped**: Does this belong to THIS item only?
- - Bad: "The app supports multiple languages" (cross-cutting concern, not a single item)
- - Good: "The settings page displays a language selector with EN and KO options"
-
- **Self-contained**: Can auto-resolve implement this without reading VISION.md or ROADMAP.md?
- - If the Context section references principles without explaining them, it's not self-contained
- - The spec should carry its own context, not point to other documents
-
- ## When a Spec Isn't Ready
-
- If you can't write specific requirements for an item, it needs one of:
- 1. **More exploration** — go back to Phase 2 for this item
- 2. **Splitting** — the item is too large; break it into smaller, specifiable pieces
- 3. **A spike** — mark it as a research task whose output is a proper spec
-
- Never generate a spec with vague requirements just to fill the roadmap. A backlog item with "needs exploration" is more honest and more useful than a spec with untestable requirements.
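The deleted template pins down both the output path (`docs/roadmap/phase-N/{id}-{name}.md`) and a small frontmatter vocabulary. A minimal Python sketch of checking a spec against those rules follows. It assumes the frontmatter has already been parsed into an `ItemSpec` value (with `depends-on` mapped to `depends_on`), that `{name}` is a lowercase hyphenated slug of the title, and that the leading number of an `id` such as `1.2` must equal `phase`. `ItemSpec`, `slugify`, `spec_path`, and `check_spec` are hypothetical names, not devlyn-cli APIs.

```python
import re
from dataclasses import dataclass, field

# Allowed values as listed in the template's frontmatter.
PRIORITIES = {"high", "medium", "low"}
COMPLEXITIES = {"low", "medium", "high"}
ID_RE = re.compile(r"^(\d+)\.(\d+)$")  # "[phase.item]", e.g. "1.2"


@dataclass
class ItemSpec:
    id: str
    title: str
    phase: int
    priority: str
    complexity: str
    status: str = "planned"
    depends_on: list[str] = field(default_factory=list)  # frontmatter "depends-on"


def slugify(title: str) -> str:
    """Assumed {name} rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def spec_path(spec: ItemSpec) -> str:
    """Build docs/roadmap/phase-N/{id}-{name}.md for one roadmap item."""
    return f"docs/roadmap/phase-{spec.phase}/{spec.id}-{slugify(spec.title)}.md"


def check_spec(spec: ItemSpec, known_ids: set[str]) -> list[str]:
    """Return frontmatter problems; an empty list means the spec passes."""
    problems = []
    match = ID_RE.match(spec.id)
    if not match:
        problems.append(f"id {spec.id!r} is not in phase.item form")
    elif int(match.group(1)) != spec.phase:
        problems.append(f"id {spec.id!r} does not belong to phase {spec.phase}")
    if spec.priority not in PRIORITIES:
        problems.append(f"invalid priority {spec.priority!r}")
    if spec.complexity not in COMPLEXITIES:
        problems.append(f"invalid complexity {spec.complexity!r}")
    for dep in spec.depends_on:
        if dep not in known_ids:
            problems.append(f"depends-on {dep!r} is not a known item id")
    return problems


history = ItemSpec(id="1.2", title="History view", phase=1,
                   priority="high", complexity="low", depends_on=["1.1"])
print(spec_path(history))                 # docs/roadmap/phase-1/1.2-history-view.md
print(check_spec(history, {"1.1", "1.2"}))  # []
```

This only covers the machine-checkable parts of the template. The Quality Criteria (testable, specific, scoped, self-contained) still need a human or model reviewer.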
package/config/skills/devlyn:implement-ui/SKILL.md
@@ -1,466 +0,0 @@
1
- Build or improve UI by assembling a specialized Agent Team. Each teammate brings a different design and engineering perspective — component architecture, interaction design, accessibility, and visual fidelity — to produce production-quality UI that perfectly matches the design system.
2
-
3
- Works for both:
4
- - **New UI**: Build pages/components from scratch using the design system
5
- - **Improve existing UI**: Audit and upgrade current implementation to match the design system
6
-
7
- <context>
8
- $ARGUMENTS
9
- </context>
10
-
11
- <prerequisites>
12
- This command expects a design system at `docs/design-system.md`. If it doesn't exist, tell the user:
13
- ```
14
- No design system found at docs/design-system.md
15
- Run the pipeline first:
16
- 1. /devlyn:design-ui → Generate style explorations
17
- 2. /devlyn:design-system [style-number] → Extract design tokens
18
- 3. /devlyn:implement-ui → Build/improve UI (this command)
19
- ```
20
- </prerequisites>
21
-
22
- <team_workflow>
23
-
24
- ## Phase 1: INTAKE (You are the Build Lead — work solo first)
25
-
26
- Before spawning any teammates, assess the scope:
27
-
28
- 1. **Read `docs/design-system.md`** — understand all tokens, component patterns, interactive states
29
- 2. **Detect the project framework** — read package.json, config files, existing components to identify the stack (React, Vue, Svelte, Next.js, vanilla, Flutter, etc.)
30
- 3. **Assess build vs improve mode**:
31
-
32
- <mode_detection>
33
- **Build mode** (new UI):
34
- - User explicitly asks to build/create pages or components
35
- - No existing components match the design system
36
- - Feature spec exists but no implementation yet
37
-
38
- **Improve mode** (existing UI):
39
- - User asks to improve, upgrade, or fix existing UI
40
- - Existing components exist but don't match the design system
41
- - UI looks outdated, inconsistent, or has accessibility gaps
42
-
43
- **Hybrid mode** (both):
44
- - Some components exist and need improvement
45
- - Some new components need to be built
46
- - Design system has been updated and implementation needs to catch up
47
- </mode_detection>
48
-
49
- 4. **Map the work**:
50
- - In build mode: read feature specs (`docs/features/`) or product spec (`docs/product-spec.md`) to understand WHAT to build
51
- - In improve mode: read existing components, identify gaps between current implementation and design system
52
- - List all pages/components that need work
53
- 5. **Select teammates** using the matrix below
54
-
55
- <scope_classification>
56
- **Always spawn** (every build/improve):
57
- - component-architect
58
- - ux-engineer
59
- - accessibility-engineer
60
-
61
- **When building for web** (React, Vue, Svelte, Next.js, vanilla HTML/CSS):
62
- - Add: responsive-engineer
63
-
64
- **When improving existing UI** (improve or hybrid mode):
65
- - Add: visual-qa (to audit current implementation against design system)
66
- </scope_classification>
67
-
68
- Announce to the user:
69
- ```
70
- [Build/Improve/Hybrid] mode for: [scope summary]
71
- Framework: [detected framework]
72
- Design System: docs/design-system.md
73
- Teammates: [list of roles being spawned and why]
74
- ```
75
-
76
- ## Phase 2: TEAM ASSEMBLY
77
-
78
- Use the Agent Teams infrastructure:
79
-
80
- 1. **TeamCreate** with name `build-{scope-slug}` (e.g., `build-landing-page`, `build-improve-dashboard`)
81
- 2. **Spawn teammates** using the `Task` tool with `team_name` and `name` parameters. Each teammate is a separate Claude instance.
82
- 3. **TaskCreate** tasks for each teammate — include the design system path, framework info, and their specific mandate.
83
- 4. **Assign tasks** using TaskUpdate with `owner` set to the teammate name.
84
-
85
- **IMPORTANT**: Do NOT hardcode a model. All teammates inherit the user's active model automatically.
86
-
87
- ### Teammate Prompts
88
-
89
- When spawning each teammate via the Task tool, use these prompts:
90
-
91
- <component_architect_prompt>
92
- You are the **Component Architect** on an Agent Team building/improving UI.
93
-
94
- **Your perspective**: Frontend architect who turns design systems into component trees
95
- **Your mandate**: Define the component hierarchy, map design tokens to framework primitives, and plan the implementation structure.
96
-
97
- **Your process**:
98
- 1. Read `docs/design-system.md` thoroughly — understand every token and component pattern
99
- 2. Read the project's existing components and framework setup
100
- 3. If **build mode**: Design the component tree from scratch
101
- 4. If **improve mode**: Audit existing components against the design system, identify gaps
102
-
103
- **Your deliverable**: Send a message to the team lead with:
104
-
105
- 1. **Token mapping**: How each design token maps to the framework
106
- - Colors → CSS variables / theme object / tokens file
107
- - Typography → text style utilities or components
108
- - Spacing → spacing scale or utilities
109
- - Shadows, radii, motion → where they live in code
110
-
111
- 2. **Component tree**: For each component pattern in the design system:
112
- - Component name and file path
113
- - Props/API surface
114
- - Which design tokens it uses
115
- - Variants (if any)
116
- - Composition (what it's made of)
117
-
118
- 3. **Shared patterns**:
119
- - Base layout component (container, section wrapper)
120
- - Animation utilities (reveal, hover, scroll-triggered)
121
- - Theme provider / token distribution strategy
122
-
123
- 4. **In improve mode, additionally**:
124
- - Gap analysis: what exists vs what the design system defines
125
- - Files to modify with specific changes needed
126
- - Files to create
127
-
128
- **Tools available**: Read, Grep, Glob, Bash (read-only)
129
-
130
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share your component tree with other teammates so they can provide feedback.
131
- </component_architect_prompt>
132
-
133
- <ux_engineer_prompt>
134
- You are the **UX Engineer** on an Agent Team building/improving UI.
135
-
136
- **Your perspective**: Interaction designer who makes interfaces feel alive and intuitive
137
- **Your mandate**: Define every interaction pattern, state transition, animation, and micro-interaction based on the design system's motion and interactive state specs.
138
-
139
- **Your process**:
140
- 1. Read `docs/design-system.md` — focus on Motion, Interactive States, and Effects sections
141
- 2. Read existing components (if improve mode) to audit current interaction quality
142
- 3. Define interaction specifications for every interactive element
143
-
144
- **Your deliverable**: Send a message to the team lead with:
145
-
146
- 1. **State machine for each interactive component**:
147
- ```
148
- Button: idle → hover → active → focus → disabled
149
- Card: idle → hover (lift + shadow) → active
150
- Modal: closed → entering → open → exiting → closed
151
- ```
152
-
153
- 2. **Animation specs** (derived from design system motion tokens):
154
- - Page load sequence: which elements appear in what order, with what delays
155
- - Scroll-triggered reveals: threshold, animation type, stagger
156
- - Hover/focus transitions: property, duration, easing (exact cubic-bezier from design system)
157
- - Route transitions (if SPA)
158
-
159
- 3. **UI state coverage** — for each component/page:
160
- - Loading state: skeleton, spinner, or progressive
161
- - Empty state: illustration + message + CTA
162
- - Error state: inline error, toast, or error page
163
- - Success state: confirmation feedback
164
-
165
- 4. **Micro-interactions**:
166
- - Form validation feedback timing
167
- - Button click feedback
168
- - Toast/notification enter/exit
169
- - Scroll indicator behavior
170
-
171
- 5. **In improve mode, additionally**:
172
- - Current interaction gaps (missing states, jarring transitions, no loading states)
173
- - Specific files and lines that need interaction improvements
174
-
175
- **Tools available**: Read, Grep, Glob
176
-
177
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Coordinate with the Component Architect on component state management.
178
- </ux_engineer_prompt>
179
-
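The UX Engineer's state machines can be written as explicit transition tables. Invalid events then become no-ops instead of producing half-open components. Below is a minimal sketch of the `Modal: closed → entering → open → exiting → closed` cycle. The event names (`OPEN`, `ENTER_DONE`, `CLOSE`, `EXIT_DONE`) are hypothetical; in practice `ENTER_DONE`/`EXIT_DONE` might fire from an animation-end callback. Letting `CLOSE` interrupt `entering` is an added assumption, not something the spec states.

```typescript
// Sketch of the Modal state machine as an explicit transition table.
type ModalState = "closed" | "entering" | "open" | "exiting";
type ModalEvent = "OPEN" | "ENTER_DONE" | "CLOSE" | "EXIT_DONE";

const modalTransitions: Record<ModalState, Partial<Record<ModalEvent, ModalState>>> = {
  closed: { OPEN: "entering" },
  entering: { ENTER_DONE: "open", CLOSE: "exiting" }, // assumption: closing mid-animation is allowed
  open: { CLOSE: "exiting" },
  exiting: { EXIT_DONE: "closed" },
};

// Returns the next state, or the current state if the event is not valid here.
function nextModalState(state: ModalState, event: ModalEvent): ModalState {
  return modalTransitions[state][event] ?? state;
}

let s: ModalState = "closed";
for (const e of ["OPEN", "ENTER_DONE", "CLOSE", "EXIT_DONE"] as ModalEvent[]) {
  s = nextModalState(s, e);
}
console.log(s); // closed
```

The same table shape works for the Button and Card machines. It also gives the Accessibility Engineer a precise list of states that each need focus and announcement rules.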
180
- <accessibility_engineer_prompt>
181
- You are the **Accessibility Engineer** on an Agent Team building/improving UI.
182
-
183
- **Your perspective**: Accessibility specialist ensuring WCAG 2.1 AA compliance
184
- **Your mandate**: Every component must be usable by everyone — keyboard users, screen reader users, users with low vision, motor impairments, and cognitive differences.
185
-
186
- **Your process**:
187
- 1. Read `docs/design-system.md` — check color contrast ratios, font sizes, touch targets
188
- 2. Read existing components (if improve mode) to audit current accessibility
189
- 3. Define accessibility requirements for every component
190
-
191
- **Your deliverable**: Send a message to the team lead with:
192
-
193
- 1. **Color contrast audit** (from design system tokens):
194
- - text on background: ratio (PASS/FAIL AA)
195
- - text-muted on background: ratio (PASS/FAIL AA)
196
- - text on surface: ratio (PASS/FAIL AA)
197
- - accent on background: ratio (PASS/FAIL AA for large text)
198
- - If any FAIL: recommend adjusted color values that pass while staying close to design intent
199
-
200
- 2. **Component accessibility requirements**:
201
- For each component pattern in the design system:
202
- - Semantic HTML element to use
203
- - Required ARIA attributes
204
- - Keyboard interaction pattern (what keys do what)
205
- - Focus management (focus order, focus trapping for modals)
206
- - Screen reader announcements (aria-live regions, status updates)
207
-
208
- 3. **Motion accessibility**:
209
- - `prefers-reduced-motion` handling for every animation
210
- - Which animations are decorative (can be removed) vs functional (should simplify)
211
-
212
- 4. **Touch and pointer**:
213
- - Minimum touch target sizes (44x44px)
214
- - Adequate spacing between interactive elements
215
- - Hover-only interactions that need touch alternatives
216
-
217
- 5. **Content accessibility**:
218
- - Image alt text strategy
219
- - Heading hierarchy requirements
220
- - Link text that makes sense out of context
221
- - Form label associations
222
-
223
- 6. **In improve mode, additionally**:
224
- - Current a11y violations with severity and file:line
225
- - Quick wins vs structural fixes
226
-
227
- **Tools available**: Read, Grep, Glob, Bash (for running any a11y audit tools)
228
-
229
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Flag accessibility concerns that affect the Component Architect's component design.
230
- </accessibility_engineer_prompt>
231
-
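The color contrast audit in the Accessibility Engineer's deliverable relies on the WCAG 2.x contrast formula. Each sRGB channel is linearized, combined into a relative luminance, and the ratio is `(L_lighter + 0.05) / (L_darker + 0.05)`. AA requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below assumes token values are plain `#rgb` or `#rrggbb` hex strings. The helper names are illustrative and not part of any library.

```typescript
// Parse #rgb or #rrggbb into 0-255 channel values.
function hexToRgb(hex: string): [number, number, number] {
  let h = hex.replace(/^#/, "");
  if (h.length === 3) h = h.split("").map((ch) => ch + ch).join("");
  if (!/^[0-9a-fA-F]{6}$/.test(h)) throw new Error(`bad hex color: ${hex}`);
  const n = parseInt(h, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Relative luminance per the WCAG 2.x definition (sRGB linearization).
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map((v) => {
    const c = v / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio, always >= 1 regardless of argument order.
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// AA thresholds: 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(2)); // 21.00
console.log(passesAA("#777777", "#ffffff")); // false
console.log(passesAA("#767676", "#ffffff")); // true
```

`#777777` on white lands just under 4.5:1, while the one-step-darker `#767676` passes. This matches the deliverable's instruction to recommend adjusted values that stay close to the design intent.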
232
- <responsive_engineer_prompt>
233
- You are the **Responsive Engineer** on an Agent Team building/improving UI for web.
234
-
235
- **Your perspective**: Responsive design specialist ensuring the UI works beautifully across all screen sizes
236
- **Your mandate**: Define the responsive strategy — breakpoints, fluid typography, layout shifts, and touch adaptation.
237
-
238
- **Your process**:
239
- 1. Read `docs/design-system.md` — understand spacing, typography, and layout patterns
240
- 2. Read existing components (if improve mode) to audit current responsive behavior
241
- 3. Define responsive specifications
242
-
243
- **Your deliverable**: Send a message to the team lead with:
244
-
245
- 1. **Breakpoint strategy**:
246
- - Recommended breakpoints (mobile-first: 640px, 768px, 1024px, 1280px, or project convention)
247
- - Which components change at which breakpoints
248
-
249
- 2. **Layout transformations**:
250
- For each page section / component grid:
251
- - Desktop layout (columns, gaps)
252
- - Tablet layout (columns, gaps, reordering)
253
- - Mobile layout (stacking, gaps, padding reduction)
254
-
255
- 3. **Typography scaling**:
256
- - Font size adjustments per breakpoint (use `clamp()` where supported)
257
- - Line-height adjustments for mobile readability
258
-
259
- 4. **Spacing adjustments**:
260
- - Section padding per breakpoint
261
- - Card padding per breakpoint
262
- - Gap reductions for mobile
263
-
264
- 5. **Component adaptations**:
265
- - Navigation: desktop → hamburger/drawer
266
- - Cards: grid → single column
267
- - Tables: horizontal scroll or card transformation
268
- - Modals: full-screen on mobile vs centered on desktop
269
-
270
- 6. **Touch targets**:
271
- - Minimum 44x44px for all interactive elements on touch devices
272
- - Adequate spacing between tappable items
273
-
274
- **Tools available**: Read, Grep, Glob
275
-
276
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Coordinate with the Component Architect on responsive component variants.
277
- </responsive_engineer_prompt>
278
-
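The Responsive Engineer's "use `clamp()`" guidance for typography scaling usually means interpolating font size linearly between two viewport widths. The preferred value is an intercept in `rem` plus a slope in `vw`. The sketch below is a minimal example under assumptions: it defaults to the 640px and 1280px breakpoints from the suggested list, assumes a 16px root font size, and uses a hypothetical `fluidFontSize` helper.

```typescript
// Build a CSS clamp() that scales font size linearly from minPx at
// minViewportPx to maxPx at maxViewportPx, and holds it outside that range.
function fluidFontSize(
  minPx: number,
  maxPx: number,
  minViewportPx = 640,
  maxViewportPx = 1280,
  rootPx = 16,
): string {
  const slope = (maxPx - minPx) / (maxViewportPx - minViewportPx); // px per viewport px
  const interceptPx = minPx - slope * minViewportPx;              // size at a 0px viewport
  // Round to 4 decimals and drop trailing zeros (e.g. "1.0000" -> "1").
  const rem = (px: number) => `${+(px / rootPx).toFixed(4)}rem`;
  const vw = `${+(slope * 100).toFixed(4)}vw`; // 1vw = 1% of viewport width
  return `clamp(${rem(minPx)}, ${rem(interceptPx)} + ${vw}, ${rem(maxPx)})`;
}

console.log(fluidFontSize(16, 24)); // clamp(1rem, 0.5rem + 1.25vw, 1.5rem)
```

Keeping a `rem` term in the preferred value means the size still responds to user zoom and root font-size changes, which a pure `vw` value would not.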
279
- <visual_qa_prompt>
280
- You are the **Visual QA** on an Agent Team improving existing UI.
281
-
282
- **Your perspective**: Design system compliance auditor who catches every deviation
283
- **Your mandate**: Compare the current implementation against the design system and produce a detailed gap report.
284
-
285
- **Your process**:
286
- 1. Read `docs/design-system.md` — internalize every token value and component pattern
287
- 2. Read ALL existing component/page files
288
- 3. For each file, compare actual values against design system values
289
-
290
- **Your deliverable**: Send a message to the team lead with:
291
-
292
- 1. **Token compliance audit**:
293
- For each design token category (colors, typography, spacing, shadows, radii, motion):
294
- - Which tokens are correctly applied
295
- - Which tokens are wrong (expected vs actual, file:line)
296
- - Which tokens are missing (hardcoded values instead of tokens)
297
-
298
- 2. **Component pattern compliance**:
299
- For each component pattern in the design system:
300
- - Does the implementation match the defined pattern?
301
- - Missing hover/interactive states
302
- - Missing animation/motion
303
- - Wrong component structure
304
-
305
- 3. **Consistency issues**:
306
- - Same component styled differently in different places
307
- - Hardcoded values that should use tokens
308
- - Inconsistent spacing or typography
309
-
310
- 4. **Priority ranking**:
311
- - HIGH: Visible deviations (wrong colors, wrong fonts, missing animations)
312
- - MEDIUM: Subtle deviations (slightly wrong spacing, missing hover states)
313
- - LOW: Minor inconsistencies (token not used but value is correct)
314
-
315
- **Tools available**: Read, Grep, Glob
316
-
317
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share your findings with the Component Architect so they can plan structural fixes.
318
- </visual_qa_prompt>
319
-
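Part of the Visual QA token compliance audit (hardcoded values instead of tokens) can be approximated mechanically, with findings ranked by the priority rules in that deliverable. A hardcoded color that equals a token value is LOW (token not used but value is correct). A color absent from the design system is HIGH (visible deviation). The sketch below is a hypothetical helper, not a devlyn-cli tool. It only handles `#rgb`/`#rrggbb` literals and is a regex heuristic, so an ID selector such as `#add` would be a false positive.

```typescript
type Priority = "HIGH" | "MEDIUM" | "LOW"; // this sketch only emits HIGH and LOW

interface Finding {
  line: number;       // 1-based line in the scanned file
  value: string;      // normalized #rrggbb literal
  priority: Priority;
  note: string;
}

// Expand #rgb to #rrggbb and lowercase, so "#FFF" and "#ffffff" compare equal.
function normalizeHex(hex: string): string {
  const h = hex.slice(1).toLowerCase();
  return "#" + (h.length === 3 ? h.split("").map((c) => c + c).join("") : h);
}

function scanHardcodedColors(source: string, tokenValues: Record<string, string>): Finding[] {
  const byValue = new Map<string, string>(); // normalized value -> token name
  for (const name of Object.keys(tokenValues)) {
    byValue.set(normalizeHex(tokenValues[name]), name);
  }
  const findings: Finding[] = [];
  const hexRe = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;
  source.split("\n").forEach((text, i) => {
    hexRe.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = hexRe.exec(text)) !== null) {
      const value = normalizeHex(m[0]);
      const token = byValue.get(value);
      findings.push(
        token
          ? { line: i + 1, value, priority: "LOW", note: `use token ${token}` }
          : { line: i + 1, value, priority: "HIGH", note: "color not in design system" },
      );
    }
  });
  return findings;
}

const css = [".card { background: #FFFFFF; }", ".card:hover { color: #3366ff; }"].join("\n");
const findings = scanHardcodedColors(css, { "color-surface": "#fff", "color-accent": "#2563eb" });
console.log(findings.map((f) => `${f.line}: ${f.value} ${f.priority}`));
// [ '1: #ffffff LOW', '2: #3366ff HIGH' ]
```

A scan like this only covers the mechanical part of the audit. Component pattern compliance and missing states still need the manual file-by-file comparison the prompt describes.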
320
- ## Phase 3: PARALLEL ANALYSIS
321
-
322
- All teammates work simultaneously. They will:
323
- - Analyze from their unique perspective
324
- - Message each other about cross-cutting concerns
325
- - Send their final specifications/findings to you (Build Lead)
326
-
327
- Wait for all teammates to report back. If a teammate goes idle after sending findings, that's normal — they're done with their analysis.
328
-
329
- ## Phase 4: SYNTHESIS & PLANNING (You, Build Lead)
330
-
331
- After receiving all teammate findings:
332
-
333
- 1. **Read all findings** — component tree, interaction specs, accessibility requirements, responsive strategy, and visual QA gaps (if improve mode)
334
- 2. **Resolve conflicts** — if teammates disagree (e.g., Component Architect's structure conflicts with Accessibility Engineer's semantic requirements), prioritize accessibility
335
- 3. **Create the implementation plan**:
336
-
337
- <implementation_plan>
338
- Organize work into this order:
339
-
340
- **Foundation layer** (do first):
341
- 1. Token/theme setup — CSS variables, theme object, or tokens file from design system values
342
- 2. Base utilities — animation helpers, layout primitives, shared styles
343
-
344
- **Component layer** (do second):
345
- 3. Atomic components — buttons, badges, labels, icons
346
- 4. Composite components — cards, navigation, section headers, forms
347
- 5. Layout components — page wrapper, section containers, grid systems
348
-
349
- **Page layer** (do third):
350
- 6. Page compositions — assemble components into pages
351
- 7. Interaction wiring — state management, transitions, animations
352
- 8. Responsive adjustments — breakpoint-specific overrides
353
-
354
- **Polish layer** (do last):
355
- 9. Accessibility pass — ARIA, keyboard nav, focus management, reduced motion
356
- 10. Animation polish — page load sequences, scroll reveals, hover states
357
- </implementation_plan>
358
-
359
- 4. **Present the plan to the user** — enter plan mode if the scope is large (5+ components or 3+ pages). For smaller scope, proceed directly.
360
-
361
- ## Phase 5: IMPLEMENTATION (You, Build Lead)
362
-
363
- <implementation_standards>
364
- Follow these standards for every component:
365
-
366
- **Design system fidelity**:
367
- - Use design tokens from docs/design-system.md — never hardcode values
368
- - Match component patterns exactly as defined in the design system
369
- - Apply interactive states with exact values from the design system's Interactive States table
370
-
371
- **Accessibility** (non-negotiable):
372
- - Semantic HTML first (nav, main, section, article, button, etc.)
373
- - All ARIA attributes from the Accessibility Engineer's spec
374
- - Keyboard navigation works for all interactive elements
375
- - `prefers-reduced-motion` media query for all animations
376
- - Color contrast meets WCAG 2.1 AA (fix if design system tokens fail)
377
-
378
- **Interaction quality**:
379
- - All UI states implemented: loading, empty, error, success
380
- - Animations use exact easing and duration from design system
381
- - Page load sequence with staggered reveals
382
- - Scroll-triggered animations where specified
383
- - Hover/focus/active states for all interactive elements
384
-
385
- **Responsive** (if web):
386
- - Mobile-first implementation
387
- - Breakpoints from Responsive Engineer's spec
388
- - Touch targets minimum 44x44px on touch devices
389
-
390
- **Code quality**:
391
- - Follow existing codebase patterns and conventions
392
- - Components are composable and reusable
393
- - No inline styles — use the token system
394
- - Server components where possible (Next.js)
395
- - Client components only when interactivity requires it
396
- </implementation_standards>
397
-
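Several of the implementation standards above interact: exact easing and duration from the design system, staggered page-load reveals, and `prefers-reduced-motion` for all animations. The sketch below shows one way to derive reveal timings from motion tokens and collapse them under reduced motion. The token names and values (400ms, 80ms, the cubic-bezier) are placeholder assumptions, as is the policy of removing motion entirely. The accessibility spec may instead call for shortening functional animations rather than removing them.

```typescript
// Hypothetical motion tokens; real values come from docs/design-system.md.
interface MotionTokens {
  durationMs: number; // base reveal duration
  staggerMs: number;  // delay between successive elements
  easing: string;     // exact cubic-bezier string from the design system
}

interface Reveal {
  delayMs: number;
  durationMs: number;
  easing: string;
}

// Timings for `count` elements revealed in order on page load.
function revealSequence(count: number, tokens: MotionTokens, prefersReducedMotion: boolean): Reveal[] {
  return Array.from({ length: count }, (_, i) =>
    prefersReducedMotion
      ? { delayMs: 0, durationMs: 0, easing: "linear" } // no stagger, no movement
      : { delayMs: i * tokens.staggerMs, durationMs: tokens.durationMs, easing: tokens.easing },
  );
}

// In a browser the flag would typically come from:
//   window.matchMedia("(prefers-reduced-motion: reduce)").matches
const motion: MotionTokens = { durationMs: 400, staggerMs: 80, easing: "cubic-bezier(0.16, 1, 0.3, 1)" };
console.log(revealSequence(3, motion, false).map((r) => r.delayMs)); // [ 0, 80, 160 ]
console.log(revealSequence(3, motion, true).every((r) => r.durationMs === 0)); // true
```

The equivalent CSS-only approach is to wrap animation declarations in `@media (prefers-reduced-motion: no-preference)` so that motion is opt-in.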
398
- Build in the order defined in the implementation plan. After each layer, verify it works before proceeding.
399
-
400
- ## Phase 6: VALIDATION (You, Build Lead)
401
-
402
- After implementation:
403
- 1. Run the test suite (if tests exist)
404
- 2. Verify all design tokens are correctly applied
405
- 3. Verify accessibility requirements are met
406
- 4. Check responsive behavior at key breakpoints
407
-
408
- ## Phase 7: CLEANUP
409
-
410
- After build is complete:
411
- 1. Send `shutdown_request` to all teammates via SendMessage
412
- 2. Wait for shutdown confirmations
413
- 3. Call TeamDelete to clean up the team
414
-
415
- </team_workflow>
416
-
417
- <output_format>
418
- Present the result in this format:
419
-
420
- <team_build_summary>
421
-
422
- ### Build Complete
423
-
424
- **Mode**: [Build / Improve / Hybrid]
425
- **Framework**: [detected framework]
426
- **Design System**: docs/design-system.md
427
-
428
- ### Team Findings
429
- - **Component Architect**: [component tree summary — N components mapped]
430
- - **UX Engineer**: [interaction specs — N states defined, N animations specified]
431
- - **Accessibility Engineer**: [a11y requirements — contrast PASS/FAIL, N requirements defined]
432
- - **Responsive Engineer**: [responsive strategy — N breakpoints, key adaptations] (if spawned)
433
- - **Visual QA**: [N deviations found — N high, N medium, N low] (if spawned)
434
-
435
- ### Implemented
436
- **Foundation**:
437
- - [token/theme file] — [N tokens mapped]
438
-
439
- **Components**:
440
- - [component file:line] — [what it is, key features]
441
- - ...
442
-
443
- **Pages** (if applicable):
444
- - [page file] — [what it contains]
445
-
446
- ### Design System Compliance
447
- - [ ] All color tokens applied (no hardcoded colors)
448
- - [ ] Typography matches design system specs
449
- - [ ] Spacing uses design system values
450
- - [ ] Animations use design system motion tokens
451
- - [ ] Interactive states match design system table
452
- - [ ] Component patterns follow design system definitions
453
-
454
- ### Accessibility
455
- - [ ] Color contrast WCAG 2.1 AA compliant
456
- - [ ] Keyboard navigation works for all interactive elements
457
- - [ ] ARIA attributes applied per spec
458
- - [ ] `prefers-reduced-motion` handled
459
- - [ ] Semantic HTML throughout
460
-
461
- ### Next Steps
462
- - Run `/devlyn:team-review` to validate code quality
463
- - Run `/devlyn:team-resolve [feature]` to add features on top of this UI
464
-
465
- </team_build_summary>
466
- </output_format>