devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (158) hide show
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +135 -21
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +175 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -429
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
  117. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  118. package/package.json +12 -2
  119. package/scripts/lint-skills.sh +431 -0
  120. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
  121. package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
  122. package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
  125. package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
  126. package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
  127. package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
  128. package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
  129. package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
  130. package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
  131. package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
  132. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  133. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  134. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  135. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  136. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  137. package/config/skills/devlyn:clean/SKILL.md +0 -285
  138. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  139. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  140. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  141. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  142. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  143. package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
  144. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  145. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  146. package/config/skills/devlyn:preflight/SKILL.md +0 -355
  147. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  148. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
  149. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  150. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  151. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  152. package/config/skills/devlyn:review/SKILL.md +0 -161
  153. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  154. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  155. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  156. package/config/skills/workflow-routing/SKILL.md +0 -73
  157. /package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
  158. /package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
@@ -1,631 +0,0 @@
1
- ---
2
- name: devlyn:team-resolve
3
- description: Multi-perspective issue resolution using a specialized agent team. Use this for complex bugs spanning multiple modules, feature implementations requiring diverse expertise, or any issue where a single perspective is insufficient. Assembles root-cause analysts, test engineers, security auditors, and other specialists as needed. Use when the user says "fix this bug", "resolve this issue", "team resolve", or describes a problem that needs investigation.
4
- ---
5
-
6
- Resolve the following issue by assembling a specialized Agent Team to investigate, analyze, and fix it. Each teammate brings a different engineering perspective — like a real team tackling a hard problem together.
7
-
8
- <issue>
9
- $ARGUMENTS
10
- </issue>
11
-
12
- <team_workflow>
13
-
14
- <code_quality_standards>
15
- Every line of code produced by this team must be **production-grade**. This is not a prototype — treat every fix and feature as code that ships to real users at scale.
16
-
17
- **Non-negotiable standards**:
18
- - **Root cause fixes only** — never workarounds, never "good enough for now" (see `<no_workarounds>` below)
19
- - **Graceful error handling** — errors are caught, surfaced to the user with actionable context, and logged. No silent swallowing. No raw stack traces in UI. Every failure path has a recovery or clear error state.
20
- - **Robust edge case coverage** — handle nulls, empty states, concurrent access, network failures, partial data, and boundary conditions. If it can happen in production, handle it.
21
- - **Optimized for performance** — no unnecessary re-renders, no N+1 queries, no unbounded loops, no blocking I/O on hot paths. Profile before and after when touching performance-sensitive code.
22
- - **Scalable patterns** — solutions must work at 10x the current load. Avoid patterns that degrade with data size (O(n²) where O(n) is possible, in-memory aggregation of unbounded datasets, missing pagination).
23
- - **Best practice adherence** — follow the language/framework idioms of the codebase. Use established patterns over novel approaches. Leverage the type system. Prefer composition over inheritance. Keep functions focused and testable.
24
- - **Clean interfaces** — clear contracts between modules. No leaky abstractions. Inputs validated at boundaries. Return types are explicit, not `any`.
25
- - **Defensive but not paranoid** — validate external inputs rigorously, trust internal interfaces. Don't add guards for impossible states — instead, make impossible states unrepresentable through types.
26
-
27
- Every teammate should evaluate their findings and recommendations against these standards. The Team Lead enforces them during synthesis and implementation.
28
- </code_quality_standards>
29
-
30
- ## Phase 1: INTAKE (You are the Team Lead — work solo first)
31
-
32
- Before spawning any teammates, do your own investigation:
33
-
34
- <investigate_before_answering>
35
- Never speculate about code you have not opened. If the issue references a specific file, you MUST read the file before forming hypotheses. Make sure to investigate and read relevant files BEFORE classifying the issue or assembling a team. Never make any claims about code before investigating unless you are certain of the correct answer — give grounded and hallucination-free answers. Never use placeholders or guess missing details — use tools to discover them.
36
- </investigate_before_answering>
37
-
38
- 1. Read the issue/task description carefully
39
- 2. Read relevant files and error logs in parallel (use parallel tool calls)
40
- 3. Trace the initial code path from entry point to likely source
41
- 4. Classify the issue type using the matrix below
42
- 5. Decide which teammates to spawn (minimum viable team — don't spawn roles whose perspective won't materially change the outcome)
43
-
44
- <issue_classification>
45
-
46
- Classify the issue and select teammates:
47
-
48
- **Bug Report**:
49
- - Always: root-cause-analyst, test-engineer
50
- - Security-related (auth, user data, API endpoints, file handling, env/config): + security-auditor
51
- - User-facing UI bug (wrong rendering, interaction, visual): + ux-designer
52
- - Product behavior mismatch (wrong UX flow, missing feature logic): + product-analyst
53
- - Spans 3+ modules or touches shared utilities/interfaces: + architecture-reviewer
54
- - Performance regression (slow query, slow render, memory): + performance-engineer
55
-
56
- **Feature Implementation**:
57
- - Always: implementation-planner, test-engineer
58
- - User-facing UI feature: + ux-designer
59
- - Accessibility requirements or WCAG compliance: + accessibility-auditor
60
- - Architectural (new patterns, interfaces, cross-cutting concerns): + architecture-reviewer
61
- - Handles user data, auth, or secrets: + security-auditor
62
- - New API design or external integration: + api-designer
63
-
64
- **UI/UX Task** (design, interaction, layout, visual consistency, aesthetics):
65
- - Always: product-designer, ux-designer, ui-designer
66
- - Accessibility requirements: + accessibility-auditor
67
- - Design system or component pattern alignment: + architecture-reviewer
68
-
69
- **Performance Issue**:
70
- - Always: performance-engineer, root-cause-analyst
71
- - Architectural root cause: + architecture-reviewer
72
- - Needs test coverage to catch regressions: + test-engineer
73
-
74
- **Refactor or Chore**:
75
- - Always: architecture-reviewer, test-engineer
76
- - Spans 3+ modules: + root-cause-analyst
77
- - Touches auth, crypto, or secrets: + security-auditor
78
-
79
- **Security Vulnerability**:
80
- - Always: root-cause-analyst, test-engineer, security-auditor
81
- - User-facing impact: + product-analyst
82
-
83
- </issue_classification>
84
-
85
- Announce to the user:
86
- ```
87
- Team assembling for: [issue summary]
88
- Issue type: [classification]
89
- Teammates: [list of roles being spawned and why each was chosen]
90
- ```
91
-
92
- ## Phase 1.5: DEFINITION OF DONE (Sprint Contract)
93
-
94
- Before any code is written, define what "done" looks like. This prevents self-evaluation bias and gives external evaluators (like `/devlyn:evaluate`) concrete criteria to grade against.
95
-
96
- 1. Based on your Phase 1 investigation, write testable success criteria to `.devlyn/done-criteria.md`:
97
-
98
- ```markdown
99
- # Done Criteria: [issue summary]
100
-
101
- ## Success Criteria
102
- - [ ] [Specific, verifiable criterion — e.g., "User sees error toast when API returns 401, not blank screen"]
103
- - [ ] [Each criterion must be testable: runnable test, observable behavior, or measurable metric]
104
- - [ ] [Include edge cases discovered during investigation]
105
-
106
- ## Out of Scope
107
- - [Explicitly list what this fix does NOT address]
108
-
109
- ## Verification Method
110
- - [How to verify: test command, manual steps, or expected UI behavior]
111
- ```
112
-
113
- 2. Each criterion must be:
114
- - **Verifiable** — a test can assert it, or a human can observe it in under 30 seconds
115
- - **Specific** — "handles errors correctly" is too vague; "returns 400 with `{error: 'missing_field', field: 'email'}` when email is omitted" is specific
116
- - **Scoped** — tied to THIS issue, not aspirational improvements
117
-
118
- 3. This file serves as the contract between the generator (you) and any external evaluator. Do not skip it.
119
-
120
- ## Phase 2: TEAM ASSEMBLY
121
-
122
- Use the Agent Teams infrastructure:
123
-
124
- 1. **TeamCreate** with name `resolve-{short-issue-slug}` (e.g., `resolve-null-user-crash`)
125
- 2. **Spawn teammates** using the `Task` tool with `team_name` and `name` parameters. Each teammate is a separate Claude instance with its own context.
126
- 3. **TaskCreate** investigation tasks for each teammate — include the issue description, the specific file paths you discovered in Phase 1, and their mandate.
127
- 4. **Assign tasks** using TaskUpdate with `owner` set to the teammate name.
128
-
129
- **IMPORTANT**: Do NOT hardcode a model. All teammates inherit the user's active model automatically.
130
-
131
- **IMPORTANT**: When spawning teammates, replace `{team-name}` in each prompt below with the actual team name you chose (e.g., `resolve-null-user-crash`). Include the relevant file paths from your Phase 1 investigation in the spawn prompt.
132
-
133
- ### Engine-Routed Teammate Spawning
134
-
135
- If the caller passed `--engine auto` or `--engine codex` (check the orchestrator's context or the pipeline config), read the auto-resolve skill's `references/engine-routing.md` for per-role routing under "team-resolve roles".
136
-
137
- **For roles routed to Codex**: Instead of spawning a Claude Agent teammate, call `mcp__codex-cli__codex` with:
138
- - `model`: `"gpt-5.4"`
139
- - `reasoningEffort`: `"xhigh"`
140
- - `sandbox`: per routing table (`"read-only"` or `"workspace-write"`)
141
- - `workingDirectory`: project root
142
- - `prompt`: the full teammate prompt below, with issue context and file paths included inline
143
-
144
- Codex roles cannot use TeamCreate/SendMessage — the Team Lead (you) relays their findings to other teammates and collects their output directly from the MCP call response.
145
-
146
- **For roles routed to Claude**: Spawn via Task tool as normal (prompts below).
147
-
148
- **For Dual roles** (e.g., security-auditor): Run BOTH a Claude Agent teammate AND a `mcp__codex-cli__codex` call in parallel with the same prompt. Merge findings per `engine-routing.md` "How to Spawn a Dual Role" section.
149
-
150
- If `--engine auto` or no `--engine` flag: routes each role to the optimal model based on benchmark data (see `engine-routing.md`). If `--engine claude`: all roles use Claude Agent teammates.
151
-
152
- ### Teammate Prompts
153
-
154
- When spawning each teammate via the Task tool (or passing to `mcp__codex-cli__codex` for Codex-routed roles), use these prompts:
155
-
156
- <root_cause_analyst_prompt>
157
- You are the **Root Cause Analyst** on an Agent Team resolving an issue.
158
-
159
- **Your perspective**: Engineering detective
160
- **Your mandate**: Apply the 5 Whys technique. Trace from symptom to fundamental cause. Never accept surface explanations.
161
-
162
- **5 Whys Protocol**:
163
- For this issue, apply the 5 Whys:
164
-
165
- Why 1: Why did [symptom] happen?
166
- -> Because [cause 1]. Evidence: [file:line]
167
-
168
- Why 2: Why did [cause 1] happen?
169
- -> Because [cause 2]. Evidence: [file:line]
170
-
171
- Why 3: Why did [cause 2] happen?
172
- -> Because [cause 3]. Evidence: [file:line]
173
-
174
- Continue until you reach something ACTIONABLE — a code change that prevents the entire chain from occurring.
175
-
176
- Stop criteria:
177
- - You've reached a design decision or architectural choice that caused the issue
178
- - You've found a missing validation, wrong assumption, or incorrect logic
179
- - Further "whys" leave the codebase (external dependency, infrastructure)
180
-
181
- Don't stop at "the code does X" — always ask WHY the code does X.
182
-
183
- **Tools available**: Read, Grep, Glob, Bash (read-only commands like git log, git blame, ls, etc.)
184
-
185
- **Your deliverable**: Send a message to the team lead with:
186
- 1. The complete 5 Whys chain with file:line evidence for each step
187
- 2. The identified root cause (the deepest actionable "why")
188
- 3. Your recommended fix approach (what code change addresses the root cause)
189
- 4. Any disagreements with other teammates' findings (if you receive messages from them)
190
-
191
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Communicate findings that may be relevant to other teammates via SendMessage.
192
- </root_cause_analyst_prompt>
193
-
194
- <implementation_planner_prompt>
195
- You are the **Implementation Planner** on an Agent Team building a feature.
196
-
197
- **Your perspective**: Senior engineer who ships without regressions
198
- **Your mandate**: Design the implementation. Map existing patterns. Identify integration points and sequencing. Surface risks before code is written.
199
-
200
- **Your process**:
201
- 1. Understand the full feature spec from the task description
202
- 2. Explore the codebase to find existing patterns this feature should follow
203
- 3. Identify all files that need to change and why
204
- 4. Sequence the changes: what depends on what?
205
- 5. Flag risks: where could this break existing behavior?
206
- 6. Check for similar features already implemented — reuse over re-invent
207
-
208
- **Your checklist**:
209
- - What existing abstractions can this feature extend vs. what needs to be created new?
210
- - Are there API contracts, types, or interfaces this must conform to?
211
- - What are the 3 most likely ways this could go wrong?
212
- - Is there a simpler design that achieves the same outcome?
213
-
214
- **Tools available**: Read, Grep, Glob, Bash (read-only)
215
-
216
- **Your deliverable**: Send a message to the team lead with:
217
- 1. Ordered implementation task list (each step with target file:line or new file)
218
- 2. Existing patterns to follow (with file references)
219
- 3. Integration points and dependencies between steps
220
- 4. Top risks and how to mitigate them
221
- 5. Simplifications worth considering
222
-
223
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share your plan with the architecture-reviewer (if present) via SendMessage for a second opinion on design decisions.
224
- </implementation_planner_prompt>
225
-
226
- <test_engineer_prompt>
227
- You are the **Test Engineer** on an Agent Team resolving an issue.
228
-
229
- **Your perspective**: QA/QAQC specialist
230
- **Your mandate**: Write failing tests that reproduce the issue. Identify edge cases. Think about what ELSE could break.
231
-
232
- **Your process**:
233
- 1. Understand the issue from the task description
234
- 2. Find existing test files that cover the affected code
235
- 3. Write a failing test that reproduces the exact bug/issue
236
- 4. Identify 3-5 edge cases that should also be tested
237
- 5. Write tests for those edge cases
238
- 6. Run the tests to confirm they fail as expected (proving the issue exists)
239
-
240
- **Tools available**: Read, Grep, Glob, Bash (including running tests)
241
-
242
- **Your deliverable**: Send a message to the team lead with:
243
- 1. The reproduction test (file path and code)
244
- 2. Edge case tests written
245
- 3. Test results showing failures (proving the issue)
246
- 4. Any additional issues discovered while writing tests
247
- 5. Suggested test strategy for validating the fix
248
-
249
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share relevant findings with other teammates via SendMessage.
250
- </test_engineer_prompt>
251
-
252
- <security_auditor_prompt>
253
- You are the **Security Auditor** on an Agent Team resolving an issue.
254
-
255
- **Your perspective**: Security-first thinker
256
- **Your mandate**: Check for security implications of BOTH the bug AND any potential fix. Apply OWASP Top 10 thinking.
257
-
258
- **Your checklist**:
259
- - Does the bug expose sensitive data?
260
- - Could an attacker exploit this bug?
261
- - Does the bug involve auth, session management, or access control?
262
- - Are there injection risks (SQL, XSS, command injection, path traversal)?
263
- - Is input validation missing or insufficient?
264
- - Are credentials, tokens, or secrets at risk?
265
- - Could the fix introduce NEW security issues?
266
-
267
- **Tools available**: Read, Grep, Glob
268
-
269
- **Your deliverable**: Send a message to the team lead with:
270
- 1. Security implications of the current bug (if any)
271
- 2. Security constraints the fix MUST satisfy
272
- 3. Any security issues discovered in surrounding code
273
- 4. Approval or rejection of proposed fix approaches from a security perspective
274
-
275
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Alert other teammates immediately if you find critical security issues via SendMessage.
276
- </security_auditor_prompt>
277
-
278
- <product_designer_prompt>
279
- You are the **Product Designer** on an Agent Team resolving an issue.
280
-
281
- **Your perspective**: Holistic design thinker who owns the design vision
282
- **Your mandate**: Define what the experience *should* be and why. Bridge product goals, user needs, and visual/interaction craft into a coherent design direction. You are the design decision-maker.
283
-
284
- **Your process**:
285
- 1. Understand the product goal — what outcome is this feature/fix serving?
286
- 2. Review existing design patterns in the codebase (components, design tokens, visual language)
287
- 3. Define the design direction: what principles should guide all design decisions here?
288
- 4. Identify where existing patterns should be extended vs. where new patterns are needed
289
- 5. Write specific design requirements that the ux-designer and ui-designer must satisfy
290
- 6. Flag design decisions that could set a precedent (good or bad) for the wider product
291
-
292
- **Your checklist**:
293
- - Does the design direction align with the product's established visual identity?
294
- - Are we extending existing design system tokens or introducing inconsistency?
295
- - Does the design solve the actual user problem, not just look polished?
296
- - Are there component reuse opportunities we're missing?
297
- - What is the "feel" this interaction should communicate (fast/calm/playful/trustworthy)?
298
- - Is the design scalable — will it work for future edge cases?
299
-
300
- **Tools available**: Read, Grep, Glob
301
-
302
- **Your deliverable**: Send a message to the team lead with:
303
- 1. Design direction brief (the guiding principles and "feel" for this work)
304
- 2. Design requirements that ux-designer and ui-designer must satisfy
305
- 3. Existing patterns to extend or reuse (with file:line references)
306
- 4. Design decisions that need user or product owner sign-off
307
- 5. Any design system gaps or inconsistencies this work should address
308
-
309
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share your design direction with ux-designer and ui-designer immediately via SendMessage so they can align their work.
310
- </product_designer_prompt>
311
-
312
- <ui_designer_prompt>
313
- You are the **UI Designer** on an Agent Team resolving an issue.
314
-
315
- **Your perspective**: Visual craftsperson — typography, color, spacing, hierarchy, motion
316
- **Your mandate**: Make it beautiful, polished, and pixel-perfect. Translate UX flows and product direction into specific, implementable visual decisions.
317
-
318
- **Your process**:
319
- 1. Read the product-designer's direction (via team message or task description)
320
- 2. Audit the existing visual language: spacing scale, type scale, color palette, border radius, shadow, motion tokens
321
- 3. Design specific visual solutions for each UI element: exact spacing values, font sizes, colors, states
322
- 4. Check every interactive state: default, hover, focus, active, disabled, loading, error
323
- 5. Verify visual hierarchy — does the eye land in the right place first?
324
- 6. Check consistency: does this component look like it belongs in the same product as everything else?
325
-
326
- **Your checklist**:
327
- - Typography: correct font weight, size, line-height, letter-spacing per the scale?
328
- - Color: using design tokens or raw values? Sufficient contrast?
329
- - Spacing: following the spacing scale (4px/8px grid or whatever the project uses)?
330
- - Elevation: correct shadow/border treatment for this layer?
331
- - Motion: are transitions appropriate (duration, easing, purpose)?
332
- - Iconography: correct icon size, stroke weight, optical alignment?
333
- - Empty states: are they designed, not just blank?
334
- - Dark mode / theming: does this work across themes if the product has them?
335
-
336
- **Tools available**: Read, Grep, Glob
337
-
338
- **Your deliverable**: Send a message to the team lead with:
339
- 1. Visual spec for each UI element (exact token values or pixel values)
340
- 2. State-by-state breakdown (default, hover, focus, active, disabled, error, loading)
341
- 3. Code-level notes: specific CSS/Tailwind/token changes to achieve the design
342
- 4. Visual inconsistencies found in surrounding code that should be fixed together
343
- 5. Any visual decisions that require product-designer sign-off
344
-
345
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Align with ux-designer on interaction states and with accessibility-auditor (if present) on contrast and focus indicators via SendMessage.
346
- </ui_designer_prompt>
347
-
348
- <ux_designer_prompt>
349
- You are the **UX Designer** on an Agent Team resolving an issue.
350
-
351
- **Your perspective**: User experience specialist and interaction designer
352
- **Your mandate**: Ensure the fix or feature delivers a coherent, intuitive user experience. Catch UX regressions before they ship.
353
-
354
- **Your checklist**:
355
- - What is the user-visible impact of this bug or feature?
356
- - Are all UI states handled: loading, error, empty, disabled, success?
357
- - Does the interaction model match user mental models?
358
- - Is the visual hierarchy and information architecture clear?
359
- - Consistency: does this match existing patterns in the codebase?
360
- - Are there micro-interaction gaps (focus states, transitions, feedback)?
361
- - Does the copy/text communicate clearly and consistently?
362
- - Mobile/responsive considerations?
363
-
364
- **Your process**:
365
- 1. Read the affected component and page files
366
- 2. Trace the user flow from entry to completion
367
- 3. Identify missing states and edge cases in the UI
368
- 4. Check for consistency with existing UI patterns
369
- 5. Flag any usability regressions the proposed fix might introduce
370
-
371
- **Tools available**: Read, Grep, Glob
372
-
373
- **Your deliverable**: Send a message to the team lead with:
374
- 1. User flow assessment (current vs. expected)
375
- 2. Missing UI states that must be handled
376
- 3. UX concerns about the proposed fix approach
377
- 4. Specific component/interaction recommendations with file:line references
378
- 5. Copy/text issues if any
379
-
380
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Communicate with product-analyst (if present) to align on behavior intent, and with accessibility-auditor (if present) on interaction requirements via SendMessage.
381
- </ux_designer_prompt>
382
-
383
- <accessibility_auditor_prompt>
384
- You are the **Accessibility Auditor** on an Agent Team resolving an issue.
385
-
386
- **Your perspective**: WCAG 2.1 AA compliance specialist
387
- **Your mandate**: Ensure the fix or feature is usable by everyone, including people using assistive technologies.
388
-
389
- **Your checklist** (WCAG 2.1 AA):
390
- - Semantic HTML: are the right elements used for their semantic meaning?
391
- - ARIA labels and roles: are interactive elements properly labeled?
392
- - Keyboard navigation: can all interactions be performed without a mouse?
393
- - Focus management: is focus handled correctly on dialogs, modals, dynamic content?
394
- - Color contrast: do text and interactive elements meet 4.5:1 ratio?
395
- - Screen reader compatibility: do dynamic updates get announced?
396
- - Error messages: are they associated with their input fields?
397
- - Images and icons: do they have appropriate alt text?
398
- - Motion: is `prefers-reduced-motion` respected?
399
-
400
- **Tools available**: Read, Grep, Glob
401
-
402
- **Your deliverable**: Send a message to the team lead with:
403
- 1. Accessibility issues found, each with: severity (CRITICAL/HIGH/MEDIUM), file:line, WCAG criterion, and recommended fix
404
- 2. "CLEAN" if no issues found
405
- 3. Any patterns in the codebase that need consistent a11y fixes
406
-
407
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Alert the ux-designer (if present) about interaction-level a11y concerns via SendMessage.
408
- </accessibility_auditor_prompt>
409
-
410
- <product_analyst_prompt>
411
- You are the **Product Analyst** on an Agent Team resolving an issue.
412
-
413
- **Your perspective**: Product owner / user advocate
414
- **Your mandate**: Ensure the fix aligns with product intent and user expectations. Validate requirements. Flag scope drift.
415
-
416
- **Your checklist**:
417
- - What is the intended behavior from a product perspective?
418
- - Does the bug represent a product spec gap or an implementation error?
419
- - Could the fix change behavior that other features or users depend on?
420
- - Does the fix need documentation or changelog updates?
421
- - Are there user segments differentially impacted?
422
- - Does the proposed fix scope match the actual user problem?
423
-
424
- **Tools available**: Read, Grep, Glob
425
-
426
- **Your deliverable**: Send a message to the team lead with:
427
- 1. Product intent clarification (what should the correct behavior be and why)
428
- 2. Scope assessment (is the proposed fix too narrow, too broad, or off-target?)
429
- 3. Any UX behavior concerns about proposed fix approaches
430
- 4. Documentation or changelog requirements
431
-
432
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Communicate product intent to ux-designer and architecture-reviewer (if present) via SendMessage.
433
- </product_analyst_prompt>
434
-
435
- <architecture_reviewer_prompt>
436
- You are the **Architecture Reviewer** on an Agent Team resolving an issue.
437
-
438
- **Your perspective**: System architect
439
- **Your mandate**: Ensure the fix respects codebase patterns, won't cause cascading issues, and uses the right abstraction level.
440
-
441
- **Your checklist**:
442
- - Does the fix follow existing codebase patterns and conventions?
443
- - Could the fix break other modules that depend on the changed code?
444
- - Is the abstraction level right (not over-engineered, not a hack)?
445
- - Are interfaces/contracts being respected?
446
- - Will this fix scale or create tech debt?
447
- - Are there similar patterns elsewhere that should be fixed consistently?
448
-
449
- **Tools available**: Read, Grep, Glob
450
-
451
- **Your deliverable**: Send a message to the team lead with:
452
- 1. Codebase pattern analysis (how similar issues are handled elsewhere)
453
- 2. Impact assessment (what else could break)
454
- 3. Architectural constraints the fix must satisfy
455
- 4. Approval or concerns about proposed fix approaches
456
-
457
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Challenge other teammates' findings if they violate architectural patterns via SendMessage.
458
- </architecture_reviewer_prompt>
459
-
460
- <performance_engineer_prompt>
461
- You are the **Performance Engineer** on an Agent Team resolving an issue.
462
-
463
- **Your perspective**: Performance specialist
464
- **Your mandate**: Diagnose performance bottlenecks. Identify root causes in algorithms, data access patterns, rendering, or resource usage. Recommend specific optimizations.
465
-
466
- **Your checklist**:
467
- - Algorithmic complexity: is there an O(n²) or worse pattern where O(n log n) or O(n) is feasible?
468
- - N+1 patterns: database or API calls inside loops?
469
- - Unnecessary re-renders: React memo misuse, unstable references, inline object/function creation?
470
- - Bundle and import size: large dependencies imported where tree-shaking or lazy loading applies?
471
- - Memory leaks: event listeners, subscriptions, timers not cleaned up?
472
- - Synchronous blocking: operations that should be async or deferred?
473
- - Unbounded data: missing pagination, limits, or streaming?
474
- - Cache misses: data fetched repeatedly when it could be memoized?
475
-
476
- **Tools available**: Read, Grep, Glob, Bash (profiling tools, bundle analyzers if available)
477
-
478
- **Your deliverable**: Send a message to the team lead with:
479
- 1. Performance diagnosis: exact bottleneck with file:line evidence
480
- 2. Measured or estimated impact (e.g., "this runs N times per render")
481
- 3. Specific optimization recommendation with code sketch
482
- 4. Risk assessment of the optimization (could it break correctness?)
483
-
484
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Alert architecture-reviewer (if present) if the performance issue stems from a design-level problem via SendMessage.
485
- </performance_engineer_prompt>
486
-
487
- <api_designer_prompt>
488
- You are the **API Designer** on an Agent Team implementing a feature.
489
-
490
- **Your perspective**: API design specialist
491
- **Your mandate**: Design clean, consistent, versioning-safe API contracts. Ensure the API matches existing conventions and doesn't create breaking changes.
492
-
493
- **Your checklist**:
494
- - REST: correct HTTP verbs, status codes, and resource naming?
495
- - GraphQL: correct query/mutation/subscription semantics and schema design?
496
- - Consistency: does this API match the style of existing endpoints in the codebase?
497
- - Versioning: does this break existing clients? Is backwards compatibility preserved?
498
- - Error handling: are errors returned in the consistent error envelope format?
499
- - Authentication: is the right auth mechanism applied?
500
- - Input validation: are request payloads validated at the boundary?
501
- - Documentation: are types and contracts clear enough to generate a client SDK?
502
-
503
- **Tools available**: Read, Grep, Glob
504
-
505
- **Your deliverable**: Send a message to the team lead with:
506
- 1. API contract design (endpoint, request shape, response shape, error codes)
507
- 2. Consistency assessment against existing API patterns (with file:line references)
508
- 3. Breaking change risk assessment
509
- 4. Security considerations for this API surface
510
-
511
- Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Share API design with architecture-reviewer and security-auditor (if present) via SendMessage.
512
- </api_designer_prompt>
513
-
514
- ## Phase 3: PARALLEL INVESTIGATION
515
-
516
- All teammates work simultaneously. They will:
517
- - Investigate from their unique perspective
518
- - Message each other to share findings and challenge assumptions
519
- - Send their final findings to you (Team Lead)
520
-
521
- Wait for all teammates to report back. If a teammate goes idle after sending findings, that's normal — they're done with their investigation.
522
-
523
- ## Phase 4: SYNTHESIS (You, Team Lead)
524
-
525
- After receiving all teammate findings:
526
-
527
- 1. Read all findings carefully
528
- 2. If teammates disagree on root cause → re-examine the contested evidence yourself by reading the specific files and lines they reference
529
- 3. Compile a unified root cause analysis
530
- 4. If the fix is complex (multiple files, architectural change) → call the `EnterPlanMode` tool to enter plan mode and present the implementation plan to the user for approval before writing any code
531
- 5. If the fix is simple and all teammates agree → proceed directly
532
-
533
- Present the synthesis to the user before implementing.
534
-
535
- ## Phase 5: IMPLEMENTATION (You, Team Lead)
536
-
537
- <no_workarounds>
538
- Write a high-quality, general-purpose solution that addresses the actual root cause. Do not implement workarounds.
539
-
540
- Do not create helper scripts or workarounds to accomplish the task more efficiently.
541
- Do not hard-code values or create solutions that only work for specific failing cases.
542
- Instead, implement the actual logic that solves the problem generally.
543
-
544
- Workaround indicators (if you catch yourself doing any of these, STOP):
545
- - Adding `|| defaultValue` to mask null/undefined
546
- - Adding `try/catch` that swallows errors silently
547
- - Using optional chaining (?.) to skip over null when null IS the bug
548
- - Hard-coding a value for the specific failing case
549
- - Adding a "just in case" check that shouldn't be needed
550
- - Suppressing warnings/errors instead of fixing them
551
- - Adding retry logic instead of fixing why it fails
552
-
553
- If the task is unreasonable or infeasible, or if any of the tests are incorrect, inform the user rather than working around them. The solution should be robust, maintainable, and extendable.
554
-
555
- If the true fix requires significant refactoring:
556
- 1. Document why in the root cause analysis
557
- 2. Call the `EnterPlanMode` tool to present the scope to the user and get approval before proceeding
558
- 3. Get approval before proceeding
559
- 4. Never ship a workaround "for now"
560
- </no_workarounds>
561
-
562
- <commit_to_approach>
563
- When deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning. If you're weighing two approaches, pick the one with stronger evidence and see it through. Do not oscillate between strategies — diagnose, decide, execute.
564
- </commit_to_approach>
565
-
566
- Implementation order:
567
- 1. Write a failing test based on the Test Engineer's findings
568
- 2. Implement the fix addressing the true root cause
569
- 3. Incorporate security constraints from the Security Auditor (if present)
570
- 4. Respect architectural patterns flagged by the Architecture Reviewer (if present)
571
- 5. Apply UX requirements from the UX Designer and Accessibility Auditor (if present)
572
- 6. **Update done-criteria.md** — mark each criterion you believe is satisfied. Do NOT self-evaluate quality — that is the evaluator's job. Your role is to implement, not to judge your own work.
573
- 7. Run the failing test — if it still fails, revert and re-analyze (never layer fixes)
574
- 8. Run the full test suite for regressions
575
-
576
- ## Phase 6: CLEANUP
577
-
578
- After implementation is complete:
579
- 1. Send `shutdown_request` to all teammates via SendMessage
580
- 2. Wait for shutdown confirmations
581
- 3. Call TeamDelete to clean up the team
582
-
583
- </team_workflow>
584
-
585
- <output_format>
586
- Present findings in this format:
587
-
588
- <team_resolution>
589
-
590
- ### Team Composition
591
- - **[Role]**: [1-line finding summary]
592
- - (list each spawned teammate and their key contribution)
593
-
594
- ### 5 Whys Analysis (Bug/Performance only)
595
- **Why 1**: [symptom] -> [cause] (file:line)
596
- **Why 2**: [cause] -> [deeper cause] (file:line)
597
- **Why 3**: [deeper cause] -> [even deeper cause] (file:line)
598
- ...
599
- **Root Cause**: [fundamental issue] (file:line)
600
-
601
- ### Root Cause / Implementation Plan
602
- **Symptom / Goal**: [what was observed or what must be built]
603
- **Code Path / Integration Points**: [entry -> ... -> issue location with file:line references]
604
- **Fundamental Cause / Chosen Approach**: [the real reason, or the design decision made]
605
- **Why it matters**: [impact if unfixed, or value unlocked]
606
-
607
- ### Fix Applied
608
- - [file:line] — [what changed and why]
609
-
610
- ### Tests
611
- - [test file] — [what it validates]
612
- - Edge cases covered: [list]
613
-
614
- ### Verification
615
- - [ ] Failing test now passes
616
- - [ ] No regressions in full test suite
617
- - [ ] Root cause addressed (no workarounds — see `<no_workarounds>` criteria)
618
- - [ ] Error handling is graceful with user-facing messages
619
- - [ ] Edge cases covered (nulls, empty states, boundaries, concurrent access)
620
- - [ ] Performance verified (no O(n²), no N+1, no unbounded operations)
621
- - [ ] Code follows existing codebase patterns and idioms
622
- - [ ] Types are explicit, interfaces are clean, no `any` leaks
623
- - [ ] UX/accessibility concerns addressed (if applicable)
624
- - [ ] Manual verification (if applicable)
625
-
626
- ### Recommendation
627
- - Run `/devlyn:evaluate` to grade this work against the done criteria with an independent evaluator team
628
- - Or run `/devlyn:auto-resolve` next time for the fully automated pipeline (build → evaluate → fix loop → simplify → review → clean → docs)
629
-
630
- </team_resolution>
631
- </output_format>