@jaggerxtrm/specialists 3.12.0 → 3.13.0

Files changed (32)
  1. package/config/hooks/specialists-session-start.mjs +1 -1
  2. package/config/mandatory-rules/bead-id-verbatim.md +14 -0
  3. package/config/mandatory-rules/per-turn-handoff-schema.md +16 -0
  4. package/config/skills/specialists-creator/SKILL.md +16 -0
  5. package/config/skills/update-specialists/SKILL.md +183 -350
  6. package/config/skills/using-kpi/SKILL.md +86 -0
  7. package/config/skills/using-specialists-v2/SKILL.md +1 -1
  8. package/config/skills/using-specialists-v3/SKILL.md +390 -112
  9. package/config/specialists/changelog-keeper.specialist.json +2 -1
  10. package/config/specialists/code-sanity.specialist.json +3 -1
  11. package/config/specialists/debugger.specialist.json +3 -1
  12. package/config/specialists/executor.specialist.json +3 -1
  13. package/config/specialists/explorer.specialist.json +2 -1
  14. package/config/specialists/overthinker.specialist.json +2 -1
  15. package/config/specialists/planner.specialist.json +3 -1
  16. package/config/specialists/researcher.specialist.json +2 -1
  17. package/config/specialists/reviewer.specialist.json +3 -1
  18. package/config/specialists/security-auditor.specialist.json +53 -10
  19. package/config/specialists/specialists-creator.specialist.json +2 -2
  20. package/config/specialists/sync-docs.specialist.json +3 -1
  21. package/config/specialists/test-runner.specialist.json +2 -1
  22. package/dist/index.js +247 -355
  23. package/dist/lib.js +38 -19
  24. package/dist/types/cli/help.d.ts.map +1 -1
  25. package/dist/types/cli/run.d.ts.map +1 -1
  26. package/dist/types/cli/version-check.d.ts +3 -0
  27. package/dist/types/cli/version-check.d.ts.map +1 -1
  28. package/dist/types/index.d.ts +1 -1
  29. package/dist/types/specialist/mandatory-rules.d.ts +5 -0
  30. package/dist/types/specialist/mandatory-rules.d.ts.map +1 -1
  31. package/package.json +4 -4
  32. package/config/specialists/.serena/project.yml +0 -151
@@ -5,39 +5,39 @@ description: >
5
5
  that should be delegated, tracked, reviewed, fixed, tested, or merged through
6
6
  specialists: code review, debugging, implementation, planning, doc sync,
7
7
  security checks, multi-step chains, and questions about specialist workflow.
8
- version: 3.1
8
+ version: 3.2
9
9
  ---
10
10
 
11
11
  # Using Specialists v3
12
12
 
13
- You are the orchestrator. Your job is to turn user intent into a clear bead contract, choose the right specialist from the live registry, launch the chain, monitor it, consume results, drive fixes, and publish through the specialist merge path.
13
+ You are the orchestrator. Turn user intent into a strong bead contract, choose right specialist from live registry, launch chain, monitor it, consume results, drive fixes, and publish through specialist merge path.
14
14
 
15
- Keep this skill practical. It should contain the core behavior needed to orchestrate well; use live commands for volatile details instead of embedding a static catalog.
15
+ Keep skill practical. Core behavior belongs here; volatile detail stays in live commands.
16
16
 
17
17
  ## When To Delegate
18
18
 
19
19
  Use specialists for substantial work: codebase exploration, debugging, implementation, review, test execution, planning, documentation sync, security/config audit, release publication, and multi-chain epics.
20
20
 
21
- Do small deterministic edits directly when the scope is already obvious and delegation would add ceremony. Do not self-investigate or self-implement a substantial task just because you can read files faster; the audit trail and specialist review are part of the workflow.
21
+ Do small deterministic edits directly when scope is already obvious and delegation would add ceremony. Do not self-investigate or self-implement a substantial task just because you can read files faster; audit trail and specialist review are part of workflow.
22
22
 
23
23
  ## Non-Negotiable Rules
24
24
 
25
- 1. `--bead` is the prompt for tracked work.
26
- 2. Do not dispatch until the bead is a usable task contract.
27
- 3. Never use `--prompt` to supplement tracked work. Update the bead instead.
28
- 4. Choose by task shape, not by habit. Check `specialists list --full` when roles may have changed.
25
+ 1. `--bead` is prompt for tracked work.
26
+ 2. Do not dispatch until bead is usable task contract.
27
+ 3. Never use `--prompt` to supplement tracked work. Update bead instead.
28
+ 4. Choose by task shape, not habit. Check `specialists list --full` when roles may have changed.
29
29
  5. Explorer/debugger answer uncertainty before executor writes code.
30
30
  6. Executor starts only when scope, constraints, and validation are clear.
31
- 7. Reviewer uses its own bead and the executor workspace via `--job <exec-job>`.
31
+ 7. Reviewer uses its own bead and executor workspace via `--job <exec-job>`.
32
32
  8. Keep executor/debugger jobs alive through review so they can be resumed.
33
33
  9. Merge specialist-owned work with `sp merge` or `sp epic merge`, not manual `git merge`.
34
34
  10. Specialists must not perform destructive or irreversible operations.
35
- 11. Treat tests as evidence: classify failures as in-scope, pre-existing, or infrastructure before starting a fix loop.
36
- 12. Drive routine stages autonomously once the task is clear. Escalate only for human judgment, destructive actions, repeated crashes, or reviewer `FAIL`.
35
+ 11. Treat tests as evidence: classify failures as in-scope, pre-existing, or infrastructure before starting fix loop.
36
+ 12. Drive routine stages autonomously once task is clear. Escalate only for human judgment, destructive actions, repeated crashes, or reviewer `FAIL`.
37
37
 
38
38
  ## Live Registry And Help
39
39
 
40
- Use the live registry for role details, permissions, current models, and skills:
40
+ Use live registry for role details, permissions, current models, and skills:
41
41
 
42
42
  ```bash
43
43
  specialists list --full
@@ -58,93 +58,304 @@ sp epic --help
58
58
 
59
59
  Do not rely on stale remembered flags when help is available.
60
60
 
61
- ## Role Selection
62
-
63
- Common routing:
64
-
65
- | Need | Specialist |
66
- | --- | --- |
67
- | Unknown architecture, call flow, dependencies, implementation options | `explorer` |
68
- | Symptom, stack trace, regression, flaky/failing test, root cause | `debugger` |
69
- | Broad feature decomposition, bead board, dependencies, sequencing | `planner` |
70
- | Risky design choice, tradeoff, premortem, critique | `overthinker` |
71
- | Clear implementation or scoped doc edit | `executor` |
72
- | Cheap implementation-quality smell pass before final review | `code-sanity` |
73
- | Security/config/dependency audit with recommendations only | `security-auditor` |
74
- | Final compliance verdict on executor/debugger diff | `reviewer` |
75
- | Run checks and interpret failures without fixing | `test-runner` |
76
- | Exactly one doc needs drift-aware sync | `sync-docs` |
77
- | Current external docs/API/ecosystem research | `researcher` |
78
- | Create or fix specialist config/schema | `specialists-creator` |
79
- | Release changelog/package/dist/tag publication | `changelog-keeper` through the `releasing` skill |
61
+ ## Writing Bead Contracts Well
62
+
63
+ Bead quality controls specialist quality. A title-only bead produces wandering output because specialist has no contract to optimize against. Write contract before dispatch. Tighten vague scope before launch.
64
+
65
+ Bad bead:
66
+
67
+ ```text
68
+ TITLE: Fix bug
69
+ PROBLEM: Something is broken.
70
+ SUCCESS: It works.
71
+ SCOPE: src/
72
+ NON_GOALS: N/A
73
+ CONSTRAINTS: Be careful.
74
+ VALIDATION: Tests pass.
75
+ OUTPUT: Done.
76
+ ```
77
+
78
+ Good bead:
79
+
80
+ ```text
81
+ TITLE: Fix feed cursor regression in sp result
82
+ PROBLEM: specialists feed follow skips events after restart because cursor tracks count, not last seq.
83
+ SUCCESS: feed follow resumes from last seen seq; result still reads terminal output.
84
+ SCOPE: src/cli/feed.ts, src/cli/result.ts, tests/unit/cli/feed.test.ts
85
+ NON_GOALS: No new runtime format, no DB schema change, no unrelated poll changes.
86
+ CONSTRAINTS: Preserve existing job IDs, keep backwards-compatible CLI output, avoid file-based fallback drift.
87
+ VALIDATION: Add regression test for restart resume; run targeted CLI tests.
88
+ OUTPUT: Changed files, test evidence, residual risks.
89
+ ```
90
+
91
+ Fix three bad smells fast:
92
+
93
+ - Title-only bead. Add problem, scope, validation, output.
94
+ - Vague SCOPE like `src/`. Name files, symbols, or bounded docs.
95
+ - Missing VALIDATION. Say what proves done, not just that work is “finished.”
96
+
97
+ What differs: orchestrator writes contract before dispatch, so specialist does less guessing and more useful work.
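The three smells above can be checked mechanically before dispatch. The following is a minimal sketch, not part of the package: `parse_bead` and `lint_bead` are hypothetical helper names, and the "vague SCOPE" test is an assumed heuristic (every scope entry is a bare directory), not a rule the specialists CLI enforces.

```python
import re

# Field labels taken from the bead examples above.
REQUIRED_FIELDS = (
    "PROBLEM", "SUCCESS", "SCOPE", "NON_GOALS",
    "CONSTRAINTS", "VALIDATION", "OUTPUT",
)


def parse_bead(text):
    """Collect 'FIELD: value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Z_]+):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def lint_bead(text):
    """Return a list of contract problems; an empty list means no smell found."""
    fields = parse_bead(text)
    problems = ["missing " + name for name in REQUIRED_FIELDS if not fields.get(name)]
    # Assumed heuristic: a scope made only of bare directories names no file or symbol.
    entries = [e.strip() for e in fields.get("SCOPE", "").split(",") if e.strip()]
    if entries and all(e.endswith("/") for e in entries):
        problems.append("vague SCOPE")
    return problems


if __name__ == "__main__":
    print(lint_bead("TITLE: Fix bug\nSCOPE: src/"))
```

A lint like this only catches missing or directory-only fields. It cannot judge whether `VALIDATION` actually proves the work is done, so the orchestrator still has to read the contract.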
98
+
99
+ ## Dependency Linking
100
+
101
+ Link beads with correct edge shape. The edge tells orchestrator what blocks what, what is only related, and what should auto-nest.
102
+
103
+ - `bd dep add <issue> <depends-on>`: `<issue>` depends on `<depends-on>`, so `<depends-on>` blocks `<issue>`. Use this for hard sequencing. [source: bd dep --help]
104
+ - `bd dep <blocker> --blocks <blocked>`: reverse phrasing of same edge; blocker-first reads better when thinking in blockers. [source: bd dep --help; CLAUDE.md lines 62-64]
105
+ - `bd dep relate <a> <b>`: non-blocking `relates_to` link. Use for context, not order. [source: bd dep --help; CLAUDE.md lines 64, 200-204]
106
+ - `bd create --parent <epic-id>`: epic-child edge; auto-names child `.1`, `.2`, … and adds parent edge. Use for chain members that must live under epic. [source: CLAUDE.md lines 49-50, 154-156; bd create --help]
107
+ - `bd create --deps discovered-from:<id>`: follow-up work discovered from source bead. Use when one bead reveals new tracked work. [source: CLAUDE.md lines 50, 62-65; bd create --help]
108
+
109
+ Use each form for a different reason:
110
+
111
+ - `add` / `--blocks` for must-happen-before dependency.
112
+ - `relate` for soft linkage with no schedule effect.
113
+ - `--parent` for epic ownership and child naming.
114
+ - `discovered-from:` for spawned follow-up beads.
115
+
116
+ What differs: orchestrator chooses edge type deliberately, so graph stays correct for chain execution, epic publish, and follow-up traceability.
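To show why edge type matters for scheduling, here is a small sketch that derives a dispatch order from blocking edges and ignores `relates_to` links. It is an in-memory illustration only: `dispatch_order` is a hypothetical helper, the bead names are placeholders, and nothing here reads from or writes to `bd`. Edge pairs use the same argument order as `bd dep add <issue> <depends-on>`.

```python
from graphlib import CycleError, TopologicalSorter


def dispatch_order(blocking_edges, related_edges=()):
    """Return bead ids in an order that respects blocking edges.

    blocking_edges: (issue, depends_on) pairs; depends_on must run first.
    related_edges:  (a, b) pairs; context only, so they add no ordering.
    """
    graph = {}
    for issue, depends_on in blocking_edges:
        graph.setdefault(issue, set()).add(depends_on)
        graph.setdefault(depends_on, set())
    for a, b in related_edges:
        # Register the nodes so they appear in the output, but add no constraint.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError("dependency cycle: " + " -> ".join(exc.args[1])) from exc


if __name__ == "__main__":
    print(dispatch_order([("impl", "explore"), ("review", "impl")],
                         [("impl", "design-notes")]))
```

Mistaking a blocking edge for a `relate` link would leave `review` free to run before `impl`, which is the graph drift the section warns about.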
117
+
118
+ ## Bead Contract By Bead Type
119
+
120
+ Use shape that fits specialist.
121
+
122
+ Task/epic bead:
123
+
124
+ ```text
125
+ PROBLEM: User-facing or project-facing objective.
126
+ SUCCESS: End-state across all child beads.
127
+ SCOPE: Area of project affected.
128
+ REFERENCES: Optional files, skills, or docs specialist reads only if work needs them.
129
+ NON_GOALS: Boundaries for entire effort.
130
+ CONSTRAINTS: Sequencing, compatibility, branch/merge rules.
131
+ VALIDATION: Final checks before close.
132
+ OUTPUT: What orchestrator reports back.
133
+ ```
134
+
135
+ `SCOPE` is always loaded as context. `REFERENCES` is progressive disclosure: name what exists, but do not force load unless task needs it. Use `REFERENCES` when listing a file in `SCOPE` would bloat payload, for example a huge skill file whose every line gets loaded before specialist knows whether it must read any of it.
136
+
137
+ Example:
138
+
139
+ ```text
140
+ SCOPE: config/skills/using-specialists-v3/SKILL.md, docs/specialists/handoff-schema.md
141
+ REFERENCES: config/skills/prompt-improving/SKILL.md (xml_core conventions), sibling beads per-turn-handoff-schema and bead-id-verbatim once landed
142
+ ```
143
+
144
+ Explorer bead:
145
+
146
+ ```text
147
+ PROBLEM: What is unknown.
148
+ SUCCESS: Questions answered with evidence.
149
+ SCOPE: Code areas, docs, commands, or symbols to inspect.
150
+ NON_GOALS: No implementation, no broad audit outside scope.
151
+ CONSTRAINTS: READ_ONLY, cite files/symbols/flows.
152
+ VALIDATION: Findings cite evidence.
153
+ OUTPUT: Findings, risks, recommended implementation track, stop condition.
154
+ ```
155
+
156
+ Debugger bead:
157
+
158
+ ```text
159
+ PROBLEM: Symptom, regression, or failing test.
160
+ SUCCESS: Root cause plus minimal fix path.
161
+ SCOPE: Logs, reproduction, code paths, and related tests.
162
+ NON_GOALS: No broad refactor.
163
+ CONSTRAINTS: Preserve behavior outside fault line.
164
+ VALIDATION: Repro steps and diagnosis.
165
+ OUTPUT: Root cause, fix options, confidence, remaining unknowns.
166
+ ```
167
+
168
+ Executor bead:
169
+
170
+ ```text
171
+ PROBLEM: Exact behavior or artifact to change.
172
+ SUCCESS: Observable acceptance criteria.
173
+ SCOPE: Target files/symbols; include do-not-touch boundaries.
174
+ NON_GOALS: Related improvements explicitly excluded.
175
+ CONSTRAINTS: API compatibility, style, migrations, safety.
176
+ VALIDATION: Lint/typecheck/tests or manual checks.
177
+ OUTPUT: Changed files, verification, residual risks.
178
+ ```
179
+
180
+ Reviewer bead:
181
+
182
+ ```text
183
+ PROBLEM: Verify executor output against requirements.
184
+ SUCCESS: PASS only if requirements and validation are satisfied.
185
+ SCOPE: Executor job, diff, task bead, acceptance criteria.
186
+ NON_GOALS: Do not rewrite unless explicitly asked.
187
+ CONSTRAINTS: Code-review mindset; findings first.
188
+ VALIDATION: Run or inspect required checks where feasible.
189
+ OUTPUT: PASS/PARTIAL/FAIL with file/line findings.
190
+ ```
191
+
192
+ Test bead:
193
+
194
+ ```text
195
+ PROBLEM: Validate one or more implementation chains.
196
+ SUCCESS: Relevant tests/checks pass or failures are diagnosed.
197
+ SCOPE: Commands and implementation beads covered.
198
+ NON_GOALS: No broad unrelated suite expansion unless requested.
199
+ CONSTRAINTS: Avoid destructive cleanup; report flaky/infra failures separately.
200
+ VALIDATION: Command output and failure interpretation.
201
+ OUTPUT: Pass/fail summary, failing tests, likely owner.
202
+ ```
203
+
204
+ Sync-docs bead:
205
+
206
+ ```text
207
+ PROBLEM: Exactly one doc drifted from source truth.
208
+ SUCCESS: One doc updated and drift checked clean.
209
+ SCOPE: One doc only.
210
+ NON_GOALS: No source-code rewrite.
211
+ CONSTRAINTS: Keep doc and source aligned.
212
+ VALIDATION: Drift scan or bounded source cross-check.
213
+ OUTPUT: Updated doc, drift evidence, remaining doc gaps.
214
+ ```
215
+
216
+ What differs: orchestrator gives each specialist a contract shape that matches job, so role stays narrow and reviewable.
217
+
218
+ For evidence-heavy or multi-item beads, let `SCOPE`, `CONSTRAINTS`, and `EXAMPLES` carry opt-in XML tags. Follow prompt-improving `xml_core` style: wrap only the subpart that needs structure, not whole bead. Example: a debugger bead can put stack trace lines in `<evidence>` and do-not-touch items in `<constraints>`, so specialist can scan facts fast without turning every field into markup.
219
+
220
+ ## Choosing The Specialist
221
+
222
+ Run `specialists list` if you need live registry. Choose by task, not habit.
223
+
224
+ | Need | Specialist | Use when |
225
+ | --- | --- | --- |
226
+ | Architecture/code mapping | `explorer` | Need evidence and scoped implementation track |
227
+ | Root-cause analysis | `debugger` | Symptom, stack trace, failing test, or regression |
228
+ | Planning/decomposition | `planner` | Need beads, dependencies, file scopes, sequencing |
229
+ | Design/tradeoffs | `overthinker` | Approach is risky, ambiguous, or needs critique |
230
+ | Implementation | `executor` | Contract is clear enough to write code or docs |
231
+ | Compliance/code review | `reviewer` | Executor/debugger produced changes that need final PASS/PARTIAL/FAIL |
232
+ | Implementation sanity | `code-sanity` | Diff smells overcomplicated, brittle, or type-risky |
233
+ | Security/dependency audit | `security-auditor` | Need threat modeling, secure-code review, or agent/config security scan |
234
+ | Multiple review perspectives | `parallel-review` | Critical diff needs independent review passes |
235
+ | Test execution | `test-runner` | Need suites run and failures interpreted |
236
+ | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization |
237
+ | External/live research | `researcher` | Current non-security library/docs/media lookup is needed |
238
+ | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config |
239
+ | Release publication | `changelog-keeper` | New tag is being cut |
80
240
 
81
241
  Selection rules:
82
242
 
83
- - Use `explorer` when you need evidence before deciding what to change.
84
- - Use `debugger` instead of explorer when there is a failure symptom.
85
- - Use `executor` only after the task can name target files/symbols or a bounded discovery result.
86
- - Use `reviewer` as the merge gate; code-sanity and security-auditor are advisory.
87
- - Use `test-runner` for running/classifying tests; it does not implement fixes.
88
- - Use `specialists-creator` before changing specialist definitions.
243
+ - Explorer is READ_ONLY and should answer specific questions.
244
+ - Debugger beats explorer for failures because it traces causes and remediation.
245
+ - Planner shapes epic/task graph before executor starts.
246
+ - Overthinker defends risky design before code locks in. It is chain-of-thought (CoT) specialist by design, so thinking-heavy turns and `<thinking>` tags fit there.
247
+ - Reviewer already uses structured evidence/gap matrices, which is CoT in disguise; keep that structure, do not add freeform `<thinking>` blocks.
248
+ - Executor, debugger, changelog-keeper, sync-docs, and test-runner should not carry mandatory `<thinking>` blocks. That bloats output without payoff and hides the real contract.
249
+ - Executor does not own full test validation; use reviewer/test-runner for that phase.
250
+ - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
251
+ - Researcher is for current external info, not repo archaeology.
252
+ - Specialists-creator should precede specialist config/schema edits.
89
253
 
90
- ## Bead Contract
254
+ ## Code-sanity
91
255
 
92
- Every specialist-bound bead must be a usable prompt. Title-only beads are not acceptable.
256
+ Use code-sanity when diff smells overcomplicated, brittle, or type-risky, but not yet broken enough for debugger. Use it before final review when you want cheap simplification check without blocking merge.
93
257
 
94
- Required structure:
258
+ Bead shape:
95
259
 
96
260
  ```text
97
- PROBLEM: What is wrong or needed.
98
- SUCCESS: Observable completion criteria.
99
- SCOPE: Files, symbols, commands, docs, or discovery area.
100
- NON_GOALS: Explicitly out of scope.
101
- CONSTRAINTS: Safety, compatibility, style, permissions, sequencing.
102
- VALIDATION: Checks/tests/review expected before closure.
103
- OUTPUT: Expected handoff format.
261
+ PROBLEM: Diff has complexity, duplication, or type-safety smell that could hide bugs.
262
+ SUCCESS: Findings isolate concrete smell or confirm clean shape.
263
+ SCOPE: Executor diff, risky files, and any nearby helpers.
264
+ NON_GOALS: No edits, no broad refactor, no merge gate decision.
265
+ CONSTRAINTS: READ_ONLY, keep feedback cheap, cite exact lines or symbols.
266
+ VALIDATION: Findings name concrete improvement or say OK.
267
+ OUTPUT: FINDINGS with severity, or OK with caveats.
104
268
  ```
105
269
 
106
- If the existing issue is vague, update it before dispatch:
270
+ Use `sp resume <exec-job> "Code-sanity findings: ..."` or `sp resume <exec-job> "Code-sanity OK; continue to reviewer."` to hand findings back.
107
271
 
108
- ```bash
109
- bd update <id> --notes "CONTRACT: ..."
272
+ OK is not reviewer PASS. It is advisory only.
273
+
274
+ What differs: orchestrator uses code-sanity as cheap smell screen, not as merge gate.
275
+
276
+ ## Security-auditor
277
+
278
+ Use security-auditor when diff touches auth, secrets, input handling, dependency logic, or agent/config surfaces. Keep it advisory and scan-only.
279
+
280
+ Bead shape:
281
+
282
+ ```text
283
+ PROBLEM: Diff may open auth, secrets, input, dependency, or agent-config risk.
284
+ SUCCESS: Findings isolate real security concern or confirm no obvious issue.
285
+ SCOPE: Executor diff, touched configs, and security-relevant paths.
286
+ NON_GOALS: No edits, no package updates, no destructive scans, no live exploit tests.
287
+ CONSTRAINTS: LOW permissions, scan-only, recommendations only.
288
+ VALIDATION: Findings cite risk surface and why it matters.
289
+ OUTPUT: Recommendations for executor to apply in a separate bead.
290
+ ```
291
+
292
+ Use `sp resume <exec-job> "Security findings: ..."` or `sp resume <exec-job> "Security scan clean; continue to reviewer."`.
293
+
294
+ No findings is not reviewer PASS. Executor still applies fixes if any, then reviewer decides publish.
295
+
296
+ What differs: orchestrator uses security-auditor to surface risk early, not to bless merge.
297
+
298
+ ## Dependency Graph Shapes
299
+
300
+ Draw graph before dispatch.
301
+
302
+ Simple chain:
303
+
304
+ ```text
305
+ task -> explore -> impl -> review
110
306
  ```
111
307
 
112
- Contract tuning by role:
308
+ Fix loop:
309
+
310
+ ```text
311
+ debug -> exec -> code-sanity? -> security-auditor? -> reviewer
312
+           ^                                              |
313
+           |--------------- resume PARTIAL ---------------|
314
+ ```
113
315
 
114
- - Explorer: ask specific questions; require citations to files/symbols/flows; forbid implementation.
115
- - Debugger: include symptom, reproduction, expected/actual behavior, logs/tests; ask for root cause and minimal fix path.
116
- - Executor: name target files/symbols and do-not-touch boundaries; require verification evidence.
117
- - Reviewer: reference the executor job, diff, acceptance criteria, constraints, and required verdict format.
118
- - Test-runner: name exact commands/suites and expected classification of failures.
119
- - Sync-docs: exactly one doc in scope.
316
+ Epic:
317
+
318
+ ```text
319
+ epic
320
+ ├─ prep/planner
321
+ ├─ impl-a
322
+ ├─ impl-b
323
+ ├─ test-batch
324
+ └─ merge/review chain(s)
325
+ ```
326
+
327
+ What differs: orchestrator sees edge shape up front, so can pick sequential chain, fix loop, or multi-chain epic without graph drift.
120
328
 
121
329
  ## Canonical Single-Chain Flow
122
330
 
123
- Use this for one implementation branch.
331
+ Use for one implementation branch.
124
332
 
125
333
  ```bash
126
334
  # 1. Create or claim root task bead with complete contract
127
- bd create --title "..." --type task --priority 2 --description "PROBLEM: ..."
335
+ bd create --title "Fix token refresh retry" --type task --priority 2 --description "PROBLEM: login and refresh flow have a retry bug when transient token refresh fails before backoff clears stale state. SUCCESS: token refresh retries once, login survives transient failure, and terminal failure stays clear. SCOPE: src/auth/refresh.ts, src/cli/login.ts, tests/unit/auth/refresh.test.ts. NON_GOALS: no auth provider redesign, no storage migration, no UI changes. CONSTRAINTS: preserve token format, keep error text backward-compatible, avoid broad retry changes outside auth flow. VALIDATION: add regression test for fail-then-succeed path and run targeted auth tests. OUTPUT: changed files, test proof, residual risks."
128
336
  bd update <task> --claim
129
337
 
130
338
  # 2. Optional discovery when path is unknown
131
- bd create --title "Explore ..." --type task --priority 2 --description "PROBLEM: ... OUTPUT: evidence-backed plan."
339
+ bd create --title "Explore auth refresh path" --type task --priority 2 --description "PROBLEM: token refresh retry path is undocumented and likely drifts on failure handling. SUCCESS: evidence-backed plan names exact files, symbols, and risk. SCOPE: src/auth/refresh.ts, src/cli/login.ts, tests/unit/auth/*.test.ts. NON_GOALS: no implementation, no broad audit. CONSTRAINTS: READ_ONLY, cite files/symbols/flows, stay within live repo evidence. VALIDATION: findings cite code path and recommended sequence. OUTPUT: tracked discovery plan with stop condition."
132
340
  bd dep add <explore> <task>
133
341
  specialists run explorer --bead <explore> --context-depth 3
134
342
  specialists result <explore-job>
135
343
 
136
344
  # 3. Implementation
137
- bd create --title "Implement ..." --type task --priority 2 --description "PROBLEM: ... VALIDATION: ..."
345
+ bd create --title "Implement token refresh retry" --type task --priority 2 --description "PROBLEM: login fails after transient token refresh error because retry path returns before backoff completes and error state clears. SUCCESS: retry waits once, preserves session on success, and surfaces final failure clearly. SCOPE: src/auth/refresh.ts, src/cli/login.ts, tests/unit/auth/refresh.test.ts. NON_GOALS: no auth redesign, no storage migration, no UI refresh. CONSTRAINTS: preserve existing token format, keep backward-compatible error text, avoid broad retry changes elsewhere. VALIDATION: add regression test for transient failure then success; run targeted auth tests. OUTPUT: changed files, test evidence, residual risks."
138
346
  bd dep add <impl> <explore-or-task>
139
347
  specialists run executor --bead <impl> --context-depth 3
140
348
  specialists result <exec-job>
141
349
 
142
- # 4. Optional advisory passes
350
+ # 4. Advisory passes when diff smells risky
351
+ bd create --title "Sanity check token retry diff" --type task --priority 2 --description "PROBLEM: auth retry diff has control-flow and state-handling smell that could hide bug. SUCCESS: findings identify concrete simplification or confirm clean shape. SCOPE: executor diff in auth refresh and login flow. NON_GOALS: no edits, no merge gate decision. CONSTRAINTS: READ_ONLY, keep feedback cheap, cite exact lines or symbols. VALIDATION: findings name concrete improvement or say OK. OUTPUT: FINDINGS with severity or OK with caveats."
143
352
  specialists run code-sanity --bead <sanity-bead> --job <exec-job> --context-depth 3
353
+
354
+ bd create --title "Security scan token retry diff" --type task --priority 2 --description "PROBLEM: auth refresh code touches secrets and session handling, so security regression is possible. SUCCESS: findings isolate real risk surface or confirm no obvious issue. SCOPE: executor diff in auth, token storage, and login path. NON_GOALS: no edits, no package updates, no destructive scans, no live exploit tests. CONSTRAINTS: LOW permissions, scan-only, recommendations only. VALIDATION: findings cite auth/secrets/input surface and why it matters. OUTPUT: recommendations for executor to apply in separate bead."
144
355
  specialists run security-auditor --bead <security-bead> --job <exec-job> --context-depth 3
145
356
 
146
357
  # 5. Final review
147
- bd create --title "Review ..." --type task --priority 2 --description "PROBLEM: Verify executor output ... OUTPUT: PASS/PARTIAL/FAIL."
358
+ bd create --title "Review token refresh retry" --type task --priority 2 --description "PROBLEM: verify executor output against auth retry requirements. SUCCESS: PASS only if retry behavior, error handling, and tests satisfy contract. SCOPE: executor job, diff, acceptance criteria, and target auth files. NON_GOALS: do not rewrite unless explicitly asked. CONSTRAINTS: code-review mindset; findings first; verify security and sanity findings were handled. VALIDATION: inspect targeted checks and regression coverage. OUTPUT: PASS/PARTIAL/FAIL with file/line findings."
148
359
  bd dep add <review> <impl>
149
360
  specialists run reviewer --bead <review> --job <exec-job> --context-depth 3
150
361
  specialists result <review-job>
@@ -154,11 +365,46 @@ sp merge <impl>
154
365
  bd close <task> --reason "Reviewer PASS; merged."
155
366
  ```
156
367
 
157
- Edit-capable specialists with `--bead` auto-provision a worktree. `--worktree` is accepted for clarity but is usually unnecessary. Use `--job <exec-job>` for reviewer/fix passes that must enter the existing executor workspace.
368
+ Edit-capable specialists with `--bead` auto-provision a worktree. `--worktree` is accepted for clarity but usually unnecessary. Use `--job <exec-job>` for reviewer/fix passes that must enter existing executor workspace.
369
+
370
+ What differs: orchestrator carries full bead contract inline, so downstream specialists inherit the actual job shape, not a title.
371
+
372
+ ## Multi-Chain Epic Flow
373
+
374
+ Use epic when multiple implementation chains publish together.
375
+
376
+ ```bash
377
+ # Epic bead
378
+ bd create --title "Epic: auth refresh hardening" --type epic --priority 2 --description "PROBLEM: login and refresh flow have retry drift, weak error surfacing, and unclear follow-up ownership. SUCCESS: epic closes with stable retry behavior, tests, docs, and clean publish. SCOPE: src/auth/*, src/cli/login.ts, tests/unit/auth/*, docs/auth-refresh.md. NON_GOALS: no auth provider swap, no storage migration, no unrelated session revamp. CONSTRAINTS: preserve token format, keep login compatible, sequence risky fixes before merge, use child beads for parallelizable slices. VALIDATION: targeted tests, code-sanity or security pass if risk appears, final reviewer PASS. OUTPUT: merged chain set with notes on remaining gaps."
379
+
380
+ # Planner bead
381
+ bd create --title "Plan auth refresh split" --type task --priority 2 --description "PROBLEM: epic needs disjoint chains before executor starts. SUCCESS: child beads, dependency edges, and file ownership split are explicit. SCOPE: auth refresh epic area. NON_GOALS: no code changes. CONSTRAINTS: keep chains disjoint, identify security-sensitive slice, name review order. VALIDATION: plan names beads and edges. OUTPUT: parallel-ready plan with risk notes."
382
+ bd dep add <plan> <epic>
383
+ specialists run planner --bead <plan> --context-depth 3
384
+
385
+ # Parallel impl beads
386
+ bd create --parent <epic> --title "Impl auth retry" --type task --priority 2 --description "PROBLEM: transient refresh failure breaks login flow. SUCCESS: retry path succeeds after one transient failure and preserves session state. SCOPE: src/auth/refresh.ts, tests/unit/auth/refresh.test.ts. NON_GOALS: no UI changes, no storage migration, no unrelated retry framework edits. CONSTRAINTS: preserve error text, keep backoff bounded, avoid side effects outside auth flow. VALIDATION: regression test for fail-then-succeed path. OUTPUT: code diff, test proof, residual risk list."
387
+ bd create --parent <epic> --title "Impl login handoff" --type task --priority 2 --description "PROBLEM: login CLI does not surface refresh outcome clearly enough for operators. SUCCESS: login shows clear success/failure handoff and no stale token state. SCOPE: src/cli/login.ts, tests/unit/cli/login.test.ts. NON_GOALS: no auth protocol redesign. CONSTRAINTS: preserve CLI flags and error codes, keep output terse. VALIDATION: CLI regression test. OUTPUT: login diff and test evidence."
388
+
389
+ specialists run executor --bead <impl-a> --context-depth 3
390
+ specialists run executor --bead <impl-b> --context-depth 3
391
+
392
+ # Per-chain review
393
+ specialists run reviewer --bead <review-a> --job <exec-a-job> --context-depth 3
394
+ specialists run reviewer --bead <review-b> --job <exec-b-job> --context-depth 3
395
+
396
+ # Publish
397
+ sp epic status <epic>
398
+ sp epic merge <epic>
399
+ ```
400
+
401
+ Use `--epic <id>` when job belongs to epic but bead is not direct child. Avoid parallel executors on same file; sequence them or consolidate work.
402
+
403
+ What differs: orchestrator splits graph first, then launches parallel work only when file scopes are provably disjoint.
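One way to make "provably disjoint" concrete is to intersect the file lists in each implementation bead's `SCOPE` before launching executors in parallel. The sketch below assumes scopes are already expanded to explicit file paths. Glob entries such as `src/auth/*` would need expanding first, which this sketch does not do. `overlapping_scopes` and the bead ids are hypothetical.

```python
from itertools import combinations


def overlapping_scopes(scopes):
    """Find bead pairs that share files.

    scopes: dict mapping bead id -> iterable of file paths.
    Returns a list of (bead_a, bead_b, shared_files) tuples, sorted by bead id.
    """
    conflicts = []
    for bead_a, bead_b in combinations(sorted(scopes), 2):
        shared = sorted(set(scopes[bead_a]) & set(scopes[bead_b]))
        if shared:
            conflicts.append((bead_a, bead_b, shared))
    return conflicts


if __name__ == "__main__":
    plan = {
        "impl-a": ["src/auth/refresh.ts", "tests/unit/auth/refresh.test.ts"],
        "impl-b": ["src/cli/login.ts", "tests/unit/cli/login.test.ts"],
    }
    conflicts = overlapping_scopes(plan)
    print("safe to run in parallel" if not conflicts else conflicts)
```

An empty result supports parallel dispatch. Any conflict means the chains should be sequenced or consolidated, as the text above says.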
 
  ## Review And Fix Loop
 
- A chain stays alive until it is merged or abandoned.
+ A chain stays alive until merged or abandoned.
 
  ```text
  executor/debugger -> waiting
@@ -167,16 +413,52 @@ reviewer -> PASS | PARTIAL | FAIL
  ```
 
  - `PASS`: verify expected commit/diff, then publish.
- - `PARTIAL`: resume the same executor/debugger with exact findings, then re-review.
- - `FAIL`: stop and decide whether to replace the chain, re-scope the bead, or ask the operator if judgment is required.
+ - `PARTIAL`: resume same executor/debugger with exact findings, then re-review.
+ - `FAIL`: stop and decide whether to replace chain, re-scope bead, or ask operator if judgment is required.
 
- Prefer resume over spawning a new fix executor when the original job is waiting and context is healthy:
+ Prefer resume over new fix executor when original job is waiting and context is healthy:
 
  ```bash
  sp resume <exec-job> "Reviewer PARTIAL. Fix only these findings: ..."
  ```
 
- Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer PASS.
+ Do not treat job completion, code-sanity OK, security no-findings, or test-runner pass as equivalent to reviewer PASS.
+
+ What differs: orchestrator uses PASS/PARTIAL/FAIL as real control flow, not just status labels.
+
+ ## Mini-Flows For Under-Promoted Specialists
+
+ Planner:
+ - Use when epic needs bead split, dependency graph, or file ownership before code starts.
+ - Bead shape: task/epic contract with clear success criteria, child beads, and edge plan.
+ - Chain position: first or pre-impl.
+
+ Debugger:
+ - Use when symptom exists and root cause is unclear.
+ - Bead shape: reproduction, logs, expected vs actual, scope to investigate.
+ - Chain position: before executor, or after a failing review when cause is unclear.
+
+ Overthinker:
+ - Use for risky design, cross-cutting tradeoffs, or premortem before lock-in.
+ - Bead shape: options, risks, constraint conflicts, decision asked for.
+ - Chain position: before planner/executor when design uncertainty is high.
+
+ Researcher:
+ - Use for current external docs, package behavior, or ecosystem facts that repo cannot answer.
+ - Bead shape: source list, question set, required citations.
+ - Chain position: before executor when outside facts matter.
+
+ Test-runner:
+ - Use when commands need to run and failures need classification, not fixes.
+ - Bead shape: exact command list, suites, and expected failure taxonomy.
+ - Chain position: after executor or between fix loops.
+
+ Sync-docs:
+ - Use when one doc drifts and must be synced to source truth.
+ - Bead shape: one-doc scope, source cross-check, drift checks.
+ - Chain position: parallel to code only when doc scope is isolated; otherwise after code settles.
+
+ What differs: orchestrator uses specialists beyond the common trio, so planning, diagnosis, research, tests, and docs do not collapse into executor work.
 
  ## Monitoring And Steering
 
@@ -186,11 +468,11 @@ Use `sp ps` for state and `sp result` for completed turns.
  sp ps
  sp ps <job-id>
  sp ps --bead <bead-id>
- sp feed <job-id> # live/running output
- sp result <job-id> # done/error/waiting result
+ sp feed <job-id>
+ sp result <job-id>
  ```
 
- If a job is running, use `sp feed`. If it is waiting, use `sp result` and decide whether to resume, review, merge, or stop. Avoid tight polling; sleep based on task size, then check once.
+ If job is running, use `sp feed`. If it is waiting, use `sp result` and decide whether to resume, review, merge, or stop. Avoid tight polling; sleep based on task size, then check once.
 
  Use `steer` for running jobs and `resume` for waiting jobs:
 
@@ -204,10 +486,29 @@ Context usage is an action signal when available:
  - 0-40%: healthy.
  - 40-65%: monitor.
  - 65-80%: steer toward conclusion.
- - Above 80%: finish, summarize, or replace the job.
+ - Above 80%: finish, summarize, or replace job.
 
  Raw token totals are not context percentages.
 
+ ## What Stays Out
+
+ - `memory-processor` — memory synthesis specialist; see `/documenting`.
+ - `xt-merge`: deferred to xt-merge skill; this skill names specialist flow, not merge-wrapper internals.
+
+ ## Adjacent xt commands
+
+ Source: latest xt report + `xt --help`; keep commands here, not full CLI surface.
+ - `xt report` — session report input for release synthesis; see `/session-close-report`.
+ - `xt end` — close worktree session: push, PR, merge, cleanup; see `/xt-end`.
+ - `xt claude` — launch Claude in sandboxed worktree; see `/using-xtrm`.
+ - `xt update` — refresh xtrm-managed files in one repo or many; see `/update-xt`.
+ - `xt doctor` — diagnose xtrm drift in current project; see `/update-xt`.
+ - `xt init` — bootstrap xtrm in project; see xtrm-tools docs.
+ - `xt release prepare/publish` — legacy release path; canonical flow is `/releasing`.
+ - `bd prime` — refresh beads workflow context; see `CLAUDE.md`.
+ - `memory-processor` — memory synthesis specialist; see `/documenting`.
+ - `xt-merge` — defer merge-queue internals to `/xt-merge`.
+
  ## Merge And Publication
 
  Standalone chain:
@@ -225,25 +526,12 @@ sp epic merge <epic-id>
 
  Rules:
 
- - Merge only after reviewer PASS unless the operator explicitly accepts a draft for follow-up work.
+ - Merge only after reviewer PASS unless operator explicitly accepts draft for follow-up work.
  - Use `sp epic merge` for unresolved epic chains; `sp merge` refuses those by design.
  - Do not manually `git merge` specialist branches.
- - If merge refuses because a chain job is still `waiting`, consume the result and either resume/stop/finalize that job deliberately.
- - If merge reports a dirty worktree, inspect that worktree. Revert generated noise only when it is clearly unrelated; otherwise ask or re-dispatch.
- - Run or confirm required gates before closing the root bead or epic.
-
- ## Multi-Chain Epic Flow
-
- Use an epic when multiple implementation chains publish together.
-
- 1. Create an epic bead with complete contract.
- 2. Use planner/explorer for shared prep if needed.
- 3. Create independent implementation beads with disjoint file scopes.
- 4. Dispatch executors in parallel only when scopes are provably disjoint.
- 5. Review each chain with its own review bead and `--job`.
- 6. After every chain has reviewer PASS, publish with `sp epic merge <epic-id>`.
-
- Use `--epic <id>` when a job belongs to an epic but its bead is not a direct child. Avoid parallel executors on the same file; sequence them or consolidate the work.
+ - If merge refuses because chain job is still `waiting`, consume result and either resume/stop/finalize that job deliberately.
+ - If merge reports dirty worktree, inspect that worktree. Revert generated noise only when clearly unrelated; otherwise ask or re-dispatch.
+ - Run or confirm required gates before closing root bead or epic.
 
  ## Failure Recovery
 
@@ -258,27 +546,17 @@ sp doctor
 
  Then choose one action:
 
- - Steer a running job back to scope.
- - Resume a waiting job with exact next instructions.
- - Stop a dead or obsolete job.
- - Rerun with a better bead contract.
- - Switch specialist if the selected role was wrong.
- - Report blocker if destructive/high-risk/manual action is required.
-
- Common recovery commands:
-
- ```bash
- sp stop <job-id>
- sp clean --processes --dry-run
- sp epic status <epic-id>
- sp epic sync <epic-id> --apply
- sp epic abandon <epic-id> --reason "..."
- specialists doctor --check-drift
- sp prune-stale-defaults --dry-run
- ```
-
- Do not silently take over substantial specialist work yourself unless the operator agrees or the remaining change is genuinely small and deterministic.
+ - Resume waiting executor/debugger with exact findings.
+ - Re-run with better bead if contract was weak.
+ - Re-scope bead if scope was wrong.
+ - Escalate if human decision is needed.
+ - Replace specialist only if failure mode repeats.
 
- ## What Stays Out Of This Skill
+ ## What Orchestrator Does Differently Because Of This Skill
 
- Do not embed the full specialist catalog, all CLI help, release mechanics, stale incident reports, or historical gotchas. Keep volatile detail in `specialists list --full`, `sp help`, bead notes, and focused skills such as `releasing`, `using-nodes`, or `specialists-creator`.
+ - Writes bead contract before dispatch.
+ - Chooses edge type before creating chain.
+ - Uses specialist role by job shape, not by habit.
+ - Keeps fix loops alive with resume, not re-spawn.
+ - Treats reviewer PASS as only publish gate.
+ - Keeps memory-processor and xt-merge out of this skill on purpose.
@@ -32,7 +32,8 @@
  "template_sets": [
  "changelog-keeper-scope",
  "changelog-conventions",
- "git-workflow-safe"
+ "git-workflow-safe",
+ "per-turn-handoff-schema"
  ]
  },
  "prompt": {