xtrm-tools 0.7.12 → 0.7.14

Files changed (42)
  1. package/.xtrm/config/hooks.json +10 -0
  2. package/.xtrm/hooks/specialists/specialists-memory-cache-sync.mjs +57 -0
  3. package/.xtrm/hooks/specialists-agent-guard.mjs +76 -0
  4. package/.xtrm/registry.json +509 -393
  5. package/.xtrm/skills/default/premortem/SKILL.md +218 -0
  6. package/.xtrm/skills/default/releasing/SKILL.md +94 -0
  7. package/.xtrm/skills/default/releasing/scripts/xt-reports.ts +18 -0
  8. package/.xtrm/skills/default/session-close-report/SKILL.md +85 -17
  9. package/.xtrm/skills/default/specialists-creator/SKILL.md +117 -42
  10. package/.xtrm/skills/default/specialists-creator/scripts/audit-spec-uniformity.mjs +86 -0
  11. package/.xtrm/skills/default/specialists-creator/scripts/scaffold-specialist.ts +223 -0
  12. package/.xtrm/skills/default/specialists-creator/scripts/validate-specialist.ts +1 -1
  13. package/.xtrm/skills/default/sync-docs/SKILL.md +88 -208
  14. package/.xtrm/skills/default/sync-docs/scripts/pre-context.sh +17 -0
  15. package/.xtrm/skills/default/update-specialists/SKILL.md +99 -201
  16. package/.xtrm/skills/default/update-xt/SKILL.md +34 -0
  17. package/.xtrm/skills/default/using-kpi/SKILL.md +150 -0
  18. package/.xtrm/skills/default/using-nodes/SKILL.md +18 -102
  19. package/.xtrm/skills/default/using-script-specialists/SKILL.md +208 -0
  20. package/.xtrm/skills/default/using-specialists/SKILL.md +13 -0
  21. package/.xtrm/skills/default/using-specialists-v2/SKILL.md +773 -0
  22. package/.xtrm/skills/default/using-specialists-v3/SKILL.md +284 -0
  23. package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +89 -0
  24. package/CHANGELOG.md +17 -0
  25. package/README.md +5 -1
  26. package/cli/dist/index.cjs +3401 -627
  27. package/cli/dist/index.cjs.map +1 -1
  28. package/cli/package.json +1 -1
  29. package/package.json +3 -2
  30. package/packages/pi-extensions/.serena/project.yml +130 -0
  31. package/packages/pi-extensions/extensions/pi-serena-compact/index.ts +4 -12
  32. package/packages/pi-extensions/extensions/xtrm-loader/index.ts +0 -1
  33. package/packages/pi-extensions/extensions/xtrm-ui/index.ts +201 -36
  34. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark-flattools.json +79 -0
  35. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark.json +85 -0
  36. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light-flattools.json +79 -0
  37. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light.json +85 -0
  38. package/packages/pi-extensions/package.json +1 -1
  39. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark-flattools.json +79 -0
  40. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark.json +3 -3
  41. package/packages/pi-extensions/themes/xtrm-ui/pidex-light-flattools.json +79 -0
  42. package/scripts/patch-external-pi-tools.mjs +154 -0
package/.xtrm/skills/default/using-specialists-v2/SKILL.md
@@ -0,0 +1,773 @@
1
+ ---
2
+ name: using-specialists-v2
3
+ description: >
4
+ Use this skill to orchestrate substantial work through project specialists with
5
+ a bead-first workflow. It covers when to delegate, how to write complete bead
6
+ task contracts, how to run explorer/executor/reviewer/test chains, how to use
7
+ --worktree/--job/--epic/--context-depth, and how to merge or recover specialist
8
+ work without drift. Trigger for code review, debugging, implementation,
9
+ planning, test generation, doc sync, multi-chain epics, and any question about
10
+ specialist orchestration.
11
+ version: 1.4
12
+ ---
13
+
14
+ # Specialists V2
15
+
16
+ You are the orchestrator. Your job is to specify the work, choose the right specialist, launch the right chain, monitor progress, and synthesize results. Do not turn orchestration into vague delegation: `--bead` is the prompt.
17
+
18
+ Use this skill for substantial work: codebase exploration, debugging, implementation, review, testing, documentation sync, planning, specialist authoring, and multi-chain orchestration. Do small deterministic edits directly when the scope is already clear and delegation would add ceremony.
19
+
20
+ For one-shot synchronous specialist invocations from services or scripts (template + variables, READ_ONLY, JSON out), use `using-script-specialists` instead. That runtime (`sp script` / `sp serve`) is unrelated to bead-first orchestration.
21
+
22
+ ## Update Awareness On Skill Load
23
+
24
+ On first activation in a session, before substantial work, check whether the local specialists install is current:
25
+
26
+ ```bash
27
+ LOCAL=$(node -p "require('./package.json').version" 2>/dev/null)
28
+ LATEST=$(git ls-remote --tags --refs origin 2>/dev/null | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -1 | sed 's/^v//')
29
+ [ -n "$LATEST" ] && [ "$LOCAL" != "$LATEST" ] && echo "specialists v$LOCAL is local; v$LATEST published — consider /update-specialists before substantial work."
30
+ ```
31
+
32
+ Skip the check entirely when `SPECIALISTS_OFFLINE=1` is set, when stdin is not a TTY (specialist-spawned subagent context), or when the previous turn already surfaced this notice. Surface at most one line — never block, never spam, never auto-update. The operator decides whether to run `/update-specialists`.
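A minimal sketch of the skip conditions above, offered as an illustration and not as part of the shipped skill. The function names `update_check_decision` and `update_notice` and the `UPDATE_NOTICE_SHOWN` variable are hypothetical; only `SPECIALISTS_OFFLINE=1`, the non-TTY rule, and the single-line notice come from the text.

```bash
# Hypothetical guard: prints "check" only when the update check should run.
update_check_decision() {
  # Operator opted out of the check.
  if [ "${SPECIALISTS_OFFLINE:-}" = "1" ]; then echo "skip"; return 0; fi
  # stdin is not a TTY: likely a specialist-spawned subagent context.
  if ! [ -t 0 ]; then echo "skip"; return 0; fi
  # Assumed session flag: the notice was already surfaced once.
  if [ -n "${UPDATE_NOTICE_SHOWN:-}" ]; then echo "skip"; return 0; fi
  echo "check"
}

# Print at most one line, and only when a different published version is known.
update_notice() {
  local_v=$1
  latest_v=$2
  if [ -n "$latest_v" ] && [ "$local_v" != "$latest_v" ]; then
    echo "specialists v$local_v is local; v$latest_v published"
  fi
  return 0
}
```

Keeping the decision separate from the notice means the guard can be exercised without touching the network or the git remote.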
33
+
34
+ When the local version is behind, the latest CHANGELOG entry can be summarized via `head -50 CHANGELOG.md` to anchor what changed; cross-link to the `update-specialists` skill for the actual reconcile flow.
35
+
36
+ ## Hard Rules
37
+
38
+ 1. `--bead` is the prompt for tracked work.
39
+ 2. Do not dispatch until the bead is a complete task contract.
40
+ 3. Never use `--prompt` to supplement tracked work. Update bead notes instead.
41
+ 4. Use explorer only when the implementation path is unknown.
42
+ 5. Use executor only after scope, constraints, and validation are clear enough to act.
43
+ 6. Edit-capable specialists with `--bead` auto-provision a worktree. `--worktree` is still accepted for clarity but not required (the deprecated `--no-worktree` flag is gone).
44
+ 7. Reviewer gets its own bead and enters the executor workspace with `--job <exec-job>`. `--job` auto-resolves the bead if `--bead` is omitted.
45
+ 8. `--context-depth` defaults to 3 (parent task + predecessor + own bead). Override only when the chain needs less or more upstream context.
46
+ 9. Keep executor/debugger jobs alive through review so they can be resumed.
47
+ 10. Merge specialist branches with `sp merge` or `sp epic merge`, never manual `git merge`.
48
+ 11. Specialists must not perform destructive or irreversible actions.
49
+ 12. If a specialist fails, inspect feed/result and either steer, resume, rerun with a better bead, or report the blocker.
50
+ 13. Drive chains autonomously. Do not ask the operator to approve routine stage transitions. Escalate only on critical events (see Autonomous Drive section).
51
+ 14. Stale-base guard: dispatch refuses to provision a worktree when sibling epic chains have unmerged substantive commits. Override only with explicit `--force-stale-base` and a reason. Merge-time rebase happens automatically.
52
+ 15. Auto-checkpoint: executor and debugger commit substantive worktree changes on `waiting` by default (`auto_commit: checkpoint_on_waiting`). Noise paths (`.xtrm/`, `.wolf/`, `.specialists/jobs/`, `.beads/`) are filtered.
53
+ 16. Per-turn output appends to the input bead notes for **all** specialists on every `run_complete`, with `[WAITING — more output may follow]` or `[DONE]` headers. `bd show <bead-id>` is a valid path to read intermediate output.
54
+ 17. Specialist jobs do not orchestrate nested specialist chains. The top-level orchestrator dispatches specialists, collects results, and advances the workflow.
55
+ 18. Treat test failures as evidence to classify against the bead scope. Validate whether failures are in-scope, pre-existing, or infrastructure-related before sending an executor into a fix loop.
56
+
57
+ ## Canonical Runtime State
58
+
59
+ These are current operating facts, not migration notes:
60
+
61
+ - **Asset ownership:** Cat A runtime assets — specialists, mandatory-rules, catalog, and nodes — resolve live from the specialists package after project tiers. Cat B filesystem assets — skills and hooks — are owned by xtrm-tools under `.xtrm/skills/default` and `.xtrm/hooks/default`.
62
+ - **Resolution precedence:** project/user tiers win over managed defaults; package-live is the final fallback. Mandatory-rule indexes are not stacked across tiers; per-id mandatory-rule files may fall through to package canonical when absent locally.
63
+ - **Drift surface:** use `sp doctor --check-drift` to inspect stale managed defaults and `sp prune-stale-defaults --dry-run` to preview cleanup.
64
+ - **Source verification:** resolver/catalog changes in a worktree are verified with `sp config show <name> --resolved --from-source` so evidence comes from the checked-out source, not an installed dist.
65
+ - **Worktree publication:** edit-capable specialists produce worktree branches. Before review or merge, verify the branch diff and status from that worktree.
66
+ - **Epic publication:** epics are the merge-gated identity. Publish through `sp epic merge`; use `sp epic abandon` to deliberately close failed or cancelled epic bookkeeping.
67
+ - **CLI safety:** command help paths are side-effect free. New commands must parse `--help`/`-h` before action and have a no-write help test.
68
+ - **Release context:** changelog-keeper receives xt report context through the `releasing` skill's helper. Release-range logic supports annotated tags.
69
+
70
+ ## Autonomous Drive
71
+
72
+ Once the operator has approved a plan or specified a task, push the chain to completion without pausing for per-stage confirmation. Dispatch, wait with `sleep`, read results, dispatch the next stage, review, and merge. Treat each stage transition as a mechanical step — not a decision point.
73
+
74
+ Escalate to the operator only for:
75
+
76
+ - Reviewer verdict `FAIL` (not `PARTIAL` — fix those autonomously via resume).
77
+ - Destructive/irreversible action required (history rewrite, force push, credential rotation, mass delete, prod-impacting op).
78
+ - Repeated specialist crashes on the same chain (2+ in a row with the same failure mode).
79
+ - Context-exhaustion risk above 80% with no clean handoff available.
80
+ - Ambiguous requirements the bead cannot resolve (rare — fix by updating the bead contract first and retrying).
81
+ - Explicit user-facing question embedded in reviewer output that needs human judgment.
82
+
83
+ Anything else — stage transitions, routine reviewer `PARTIAL` with concrete findings, merge gates passing, test retries — proceed without asking.
84
+
85
+ ### Sleep-Based Polling
86
+
87
+ Use `sleep` between dispatch and status check. Size the first sleep to the observed median for the specialist, then poll once with `sp ps <job-id>` and adjust from there:
88
+
89
+ | Specialist | First sleep | Poll interval after |
90
+ | --- | --- | --- |
91
+ | executor | `sleep 180` (3m) | 60-120s |
92
+ | reviewer | `sleep 120` (2m) | 60s |
93
+ | explorer | `sleep 180` (3m) | 60s |
94
+ | debugger | `sleep 480` (8m) | 120s |
95
+ | overthinker | `sleep 240` (4m) | 60s |
96
+ | planner | `sleep 300` (5m) | 60s |
97
+ | sync-docs | `sleep 180` (3m) | 60s |
98
+ | researcher | `sleep 120` (2m) | 60s |
99
+ | test-runner | `sleep 120` (2m) | 60s |
100
+
101
+ Medians are empirical (derived from run history). Adjust for observed run complexity. If `sp ps` shows `running` after the first sleep, poll once more before assuming stuck. If `waiting`, read `sp result` — reviewer verdicts and READ_ONLY outputs land in the bead notes automatically.
102
+
103
+ Do not busy-loop `sp ps` in tight intervals. One sleep + one confirmation poll is enough for routine runs.
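The first-sleep column of the table above could be captured in a small lookup, sketched here as an illustration only. The function name `first_sleep_for` is hypothetical and is not an `sp` command; the values are copied from the table, and unknown names fail instead of guessing a default.

```bash
# Hypothetical helper: print the first sleep (in seconds) for a specialist.
first_sleep_for() {
  case "$1" in
    executor|explorer|sync-docs)      echo 180 ;;
    reviewer|researcher|test-runner)  echo 120 ;;
    overthinker)                      echo 240 ;;
    planner)                          echo 300 ;;
    debugger)                         echo 480 ;;
    # No median is documented for other specialists, so do not invent one.
    *) echo "unknown specialist: $1" >&2; return 1 ;;
  esac
}
```

A caller might then write `sleep "$(first_sleep_for executor)"` before the confirmation poll.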
104
+
105
+ ### Drive Loop Pattern
106
+
107
+ ```bash
108
+ # Dispatch
109
+ JOB=$(sp run <specialist> --bead <bead-id> --context-depth 3 --background 2>&1 | tail -1)
110
+
111
+ # Sleep for median
112
+ sleep 180
113
+
114
+ # Check
115
+ sp ps "$JOB"
116
+
117
+ # Still running? Short follow-up sleep, then re-check
118
+ # Waiting or done? Read result
119
+ sp result "$JOB"
120
+
121
+ # Advance to next stage based on output — no operator prompt
122
+ ```
123
+
124
+ Launch sleeps in the background when other orchestration work can proceed in parallel; the harness will notify on completion. Return to `sp ps`/`sp result` after the median interval elapses.
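One way to make the "still running / waiting or done" branch of the drive loop explicit is a status-to-step mapping, sketched below under assumptions. The function name and the step labels are hypothetical; the status names (`starting`, `running`, `waiting`, `done`, `error`) are the ones this skill mentions, and how a status string is extracted from `sp ps` output is deliberately left out because that output format is not shown here.

```bash
# Hypothetical helper: map a job status to the next orchestration step.
next_step_for_status() {
  case "$1" in
    # Job is still working: take a short follow-up sleep, then re-check.
    starting|running)    echo "sleep-and-recheck" ;;
    # A turn has finished (or the job ended): the result is readable.
    waiting|done|error)  echo "read-result" ;;
    # Anything else is outside what this sketch knows about.
    *)                   echo "inspect-manually" ;;
  esac
}
```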
125
+
126
+ ## Bead Task Contract
127
+
128
+ Every specialist-bound bead must be a usable prompt. Title-only beads are not acceptable.
129
+
130
+ Required fields:
131
+
132
+ ```text
133
+ PROBLEM: What is wrong or needed.
134
+ SUCCESS: Observable completion criteria.
135
+ SCOPE: Files, symbols, commands, docs, or discovery area.
136
+ NON_GOALS: Explicitly out of scope.
137
+ CONSTRAINTS: Compatibility, safety, style, permissions, sequencing.
138
+ VALIDATION: Checks/tests/review expected before closure.
139
+ OUTPUT: Expected handoff format.
140
+ ```
141
+
142
+ Use `bd update <id> --notes "CONTRACT: ..."` when an existing bead is too vague.
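Before dispatch, a bead description could be checked against the seven required fields with a sketch like the following. This is an illustration, not a feature of `bd` or `sp`: the function name `missing_contract_fields` is hypothetical, and it assumes each field label starts its own line, as in the contract template above.

```bash
# Hypothetical pre-dispatch check: print each required field that is absent.
missing_contract_fields() {
  desc=$1
  for field in PROBLEM SUCCESS SCOPE NON_GOALS CONSTRAINTS VALIDATION OUTPUT; do
    # A field counts as present only when its label begins a line.
    printf '%s\n' "$desc" | grep -q "^${field}:" || printf '%s\n' "$field"
  done
}
```

Empty output means the description carries all seven labels; it says nothing about whether the content behind each label is adequate, which remains the orchestrator's judgment.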
143
+
144
+ ### Contract By Bead Type
145
+
146
+ Task/epic bead:
147
+
148
+ ```text
149
+ PROBLEM: User-facing or project-facing objective.
150
+ SUCCESS: End-state across all child beads.
151
+ SCOPE: Area of project affected.
152
+ NON_GOALS: Boundaries for the entire effort.
153
+ CONSTRAINTS: Sequencing, compatibility, branch/merge rules.
154
+ VALIDATION: Final checks before close.
155
+ OUTPUT: What the orchestrator reports back.
156
+ ```
157
+
158
+ Explorer bead:
159
+
160
+ ```text
161
+ PROBLEM: What is unknown.
162
+ SUCCESS: Questions answered with evidence.
163
+ SCOPE: Code areas, docs, commands, or symbols to inspect.
164
+ NON_GOALS: No implementation, no broad audit outside scope.
165
+ CONSTRAINTS: READ_ONLY, prefer GitNexus/code intelligence when available.
166
+ VALIDATION: Findings cite files/symbols/flows.
167
+ OUTPUT: Findings, risks, recommended implementation track, stop condition.
168
+ ```
169
+
170
+ Executor bead:
171
+
172
+ ```text
173
+ PROBLEM: Exact behavior or artifact to change.
174
+ SUCCESS: Observable acceptance criteria.
175
+ SCOPE: Target files/symbols; include "do not touch" boundaries.
176
+ NON_GOALS: Related improvements explicitly excluded.
177
+ CONSTRAINTS: API compatibility, style, migrations, safety.
178
+ VALIDATION: Lint/typecheck/tests or manual checks.
179
+ OUTPUT: Changed files, verification, residual risks.
180
+ ```
181
+
182
+ Reviewer bead:
183
+
184
+ ```text
185
+ PROBLEM: Verify executor output against requirements.
186
+ SUCCESS: PASS only if requirements and validation are satisfied.
187
+ SCOPE: Executor job, diff, task bead, acceptance criteria.
188
+ NON_GOALS: Do not rewrite unless explicitly asked.
189
+ CONSTRAINTS: Code-review mindset; findings first.
190
+ VALIDATION: Run or inspect required checks where feasible.
191
+ OUTPUT: PASS/PARTIAL/FAIL with file/line findings.
192
+ ```
193
+
194
+ Test bead:
195
+
196
+ ```text
197
+ PROBLEM: Validate one or more implementation chains.
198
+ SUCCESS: Relevant tests/checks pass or failures are diagnosed.
199
+ SCOPE: Commands and implementation beads covered.
200
+ NON_GOALS: No broad unrelated suite expansion unless requested.
201
+ CONSTRAINTS: Avoid destructive cleanup; report flaky/infra failures separately.
202
+ VALIDATION: Command output and failure interpretation.
203
+ OUTPUT: Pass/fail summary, failing tests, likely owner.
204
+ ```
205
+
206
+ ## Choosing The Specialist
207
+
208
+ Run `specialists list` if you need the live registry. Choose by task, not by habit.
209
+
210
+ | Need | Specialist | Use when |
211
+ | --- | --- | --- |
212
+ | Architecture/code mapping | `explorer` | You need evidence and a scoped implementation track. |
213
+ | Root-cause analysis | `debugger` | There is a symptom, stack trace, failing test, or regression. |
214
+ | Planning/decomposition | `planner` | You need beads, dependencies, file scopes, or sequencing. |
215
+ | Design/tradeoffs | `overthinker` | The approach is risky, ambiguous, or needs critique. |
216
+ | Implementation | `executor` | The contract is clear enough to write code or docs. |
217
+ | Compliance/code review | `reviewer` | An executor/debugger produced changes that need the final PASS/PARTIAL/FAIL verdict. |
218
+ | Implementation sanity | `code-sanity` | You want a cheap READ_ONLY smell pass for simplicity, type safety, dead code, brittle async/error handling, or maintainability before reviewer. |
219
+ | Security/dependency audit | `security-auditor` | You need threat modeling, secure-code review, package advisory triage, or agent/config security scanning. LOW: scan/read/recommend only. |
220
+ | Multiple review perspectives | `parallel-review` | A critical diff needs independent review passes. |
221
+ | Test execution | `test-runner` | You need suites run and failures interpreted. |
222
+ | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization. |
223
+ | External/live research | `researcher` | Current non-security library/docs/media lookup is needed. |
224
+ | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config. |
225
+ | Release publication (end-to-end) | `changelog-keeper` | A new tag is being cut. MEDIUM specialist: drafts CHANGELOG section from xt reports, bumps package.json, rebuilds dist, commits, tags, pushes. Use the `releasing` skill to dispatch. |
226
+
227
+ Selection rules:
228
+
229
+ - Explorer is READ_ONLY and should answer specific questions.
230
+ - Debugger is better than explorer for failures because it traces causes and remediation.
231
+ - Executor does not own full test validation; use reviewer/test-runner for that phase.
232
+ - Code-sanity is optional and non-blocking by default: use it when a diff smells overcomplicated or type-risky, then resume executor with concrete findings. It is not a merge gate.
233
+ - Security-auditor may run safe local audit commands and web/source research, but must not edit files, update dependencies, exfiltrate secrets, or run destructive/live-target exploit tests. Executor applies any recommended fixes in a separate bead.
234
+ - Reviewer always uses its own bead plus `--job <executor-job>` and remains the final merge gate.
235
+ - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
236
+ - Specialists-creator should precede specialist config/schema edits.
237
+
238
+ ## Command Surface
239
+
240
+ Daily commands:
241
+
242
+ ```bash
243
+ specialists list
244
+ specialists list-rules # rule × specialist matrix
245
+ specialists doctor
246
+ specialists doctor --check-drift # inspect stale .specialists/default snapshots
247
+ sp prune-stale-defaults --dry-run # preview redundant default snapshots
248
+ specialists run <name> --bead <id> --background
249
+ specialists run executor --bead <impl-bead> --background # worktree auto-provisioned
250
+ specialists run code-sanity --bead <sanity-bead> --job <exec-job> --keep-alive --background
251
+ specialists run security-auditor --bead <security-bead> --job <exec-job> --keep-alive --background
252
+ specialists run reviewer --bead <review-bead> --job <exec-job> --keep-alive --background
253
+ specialists ps
254
+ specialists ps <job-id>
255
+ specialists feed <job-id>
256
+ specialists feed -f
257
+ specialists result <job-id> # works on done/error/waiting
258
+ specialists result <job-id> --wait --timeout 600
259
+ specialists steer <job-id> "new direction"
260
+ specialists resume <job-id> "next task"
261
+ specialists stop <job-id>
262
+ ```
263
+
264
+ Publication commands:
265
+
266
+ ```bash
267
+ sp merge <chain-root-bead>
268
+ sp epic status <epic-id>
269
+ sp epic sync <epic-id> --apply
270
+ sp epic merge <epic-id>
271
+ sp epic abandon <epic-id> --reason "..."
272
+ sp end
273
+ ```
274
+
275
+ `sp result <job-id>` returns the most recent completed turn for `waiting` jobs with a `Session is waiting for your input` footer — use it to inspect a keep-alive job before deciding whether to resume. For `running` jobs, `sp feed <job-id>` is the right tool; `sp poll` is deprecated. Avoid `specialists status --job` for normal monitoring; prefer `sp ps <job-id>`.
276
+
277
+ ## Flag Semantics
278
+
279
+ `--bead <id>` is the task prompt and tracked work identity.
280
+
281
+ `--context-depth N` controls parent/ancestor bead context. Default is **3** (own bead + predecessor + parent task). Lower it when the chain is shallow or the parent context is noisy.
282
+
283
+ `--worktree` provisions a new isolated workspace and branch for edit-capable work. Optional when `--bead` is provided to an edit-capable specialist — a worktree is auto-provisioned. Pass `--worktree` explicitly only when you want it without a bead, or for emphasis. The deprecated `--no-worktree` flag is removed and now errors out.
284
+
285
+ `--job <id>` reuses an existing job's workspace. Use it for reviewer and fix passes. If `--bead` is omitted, bead_id is inferred from the target job's status; explicit `--bead` always wins.
286
+
287
+ `--force-job` overrides the concurrency lock that blocks edit-capable specialists from entering an owner workspace while it is `starting`/`running`. Use only when you accept the write race; prefer `sp stop` on dead jobs first.
288
+
289
+ `--force-stale-base` bypasses the dispatch-time stale-base guard that blocks `--worktree` provisioning when sibling epic chains have unmerged substantive commits. Use only with a clear reason; the guard prevents merge-conflict cascades.
290
+
291
+ `--epic <id>` explicitly associates a job with an epic. Use it for prep jobs whose parent is not the epic but should appear in epic status/readiness.
292
+
293
+ `--keep-alive` keeps interactive specialists waiting after a turn. Use for reviewer, overthinker, researcher, sync-docs, and any job expected to receive follow-up.
294
+
295
+ `--worktree` and `--job` are mutually exclusive.
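A wrapper script could catch the mutually exclusive pair before invoking the CLI. The sketch below is an assumption about how such a pre-flight check might look, not a description of how `specialists run` enforces the rule; the function name `check_run_flags` is hypothetical.

```bash
# Hypothetical pre-flight check for a wrapper script around `specialists run`.
check_run_flags() {
  has_worktree=0
  has_job=0
  for arg in "$@"; do
    case "$arg" in
      --worktree) has_worktree=1 ;;
      --job)      has_job=1 ;;
    esac
  done
  # --worktree creates a new workspace; --job reuses an existing one.
  if [ "$has_worktree" -eq 1 ] && [ "$has_job" -eq 1 ]; then
    echo "error: --worktree and --job are mutually exclusive"
    return 1
  fi
  echo "ok"
}
```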
296
+
297
+ ## Golden Path: Single Implementation Chain
298
+
299
+ Use this when one implementation branch can solve the task.
300
+
301
+ Create a root task bead:
302
+
303
+ ```bash
304
+ bd create --title "Fix token refresh retry on 401" --type bug --priority 2 \
305
+ --description "PROBLEM: API clients fail permanently when token refresh receives a transient 401.
306
+ SUCCESS: Refresh retries are bounded, observable, and callers receive the same public error shape after exhaustion.
307
+ SCOPE: src/auth/refresh.ts, src/auth/client.ts, related auth tests.
308
+ NON_GOALS: Do not redesign auth storage or change public client API.
309
+ CONSTRAINTS: Preserve existing telemetry names and backward compatibility.
310
+ VALIDATION: lint, tsc, auth refresh tests or documented targeted equivalent.
311
+ OUTPUT: Changed files, validation results, residual risk."
312
+ ```
313
+
314
+ Create explorer only if the implementation path is unclear:
315
+
316
+ ```bash
317
+ bd create --title "Explore token refresh retry path" --type task --priority 2 \
318
+ --description "PROBLEM: Need exact refresh call graph and retry insertion point.
319
+ SUCCESS: Identify caller/callee path, current retry behavior, and safest files to modify.
320
+ SCOPE: auth refresh/client modules and tests only.
321
+ NON_GOALS: No implementation.
322
+ CONSTRAINTS: READ_ONLY; cite files/symbols.
323
+ VALIDATION: Findings include recommended executor scope and risks.
324
+ OUTPUT: Evidence-backed implementation plan."
325
+ bd dep add <explore> <task>
326
+ specialists run explorer --bead <explore> --context-depth 3 --background
327
+ specialists result <explore-job>
328
+ ```
329
+
330
+ Create implementation bead:
331
+
332
+ ```bash
333
+ bd create --title "Implement bounded token refresh retry" --type task --priority 2 \
334
+ --description "PROBLEM: Implement the retry behavior identified by exploration.
335
+ SUCCESS: 401 refresh retry is bounded and preserves public errors after exhaustion.
336
+ SCOPE: src/auth/refresh.ts, src/auth/client.ts, auth refresh tests.
337
+ NON_GOALS: No storage redesign, no public API change.
338
+ CONSTRAINTS: Keep telemetry names stable; avoid broad refactor.
339
+ VALIDATION: npm run lint, npx tsc --noEmit, targeted auth tests if available.
340
+ OUTPUT: Diff summary, checks run, follow-up risks."
341
+ bd dep add <impl> <explore-or-task>
342
+ specialists run executor --worktree --bead <impl> --context-depth 3 --background
343
+ specialists result <exec-job>
344
+ ```
345
+
346
+ Optional code-sanity pass for implementation smell checks (use when the diff is non-trivial or likely to accumulate agent-code complexity):
347
+
348
+ ```bash
349
+ bd create --title "Code sanity check token refresh retry" --type task --priority 3 \
350
+ --description "PROBLEM: Cheap READ_ONLY sanity pass for executor implementation quality before final review.
351
+ SUCCESS: Identify concrete simplicity/type-safety/maintainability findings, or return OK.
352
+ SCOPE: executor job <exec-job>, implementation diff only.
353
+ NON_GOALS: No requirements verdict, no security audit, no test execution, no edits.
354
+ CONSTRAINTS: At most 5 concrete findings; cite files/symbols/lines where possible.
355
+ VALIDATION: Findings are suitable to paste into specialists resume <exec-job>.
356
+ OUTPUT: OK/FINDINGS/BLOCKED with handoff."
357
+ bd dep add <sanity> <impl>
358
+ specialists run code-sanity --bead <sanity> --job <exec-job> --context-depth 3 --keep-alive --background
359
+ specialists result <sanity-job>
360
+ ```
361
+
362
+ If code-sanity returns `FINDINGS`, resume executor with those concrete instructions, then rerun code-sanity only if the fixes were substantive. Do not treat code-sanity `OK` as reviewer PASS.
363
+
364
+ Optional security pass when the task touches auth, secrets, input handling, dependency updates, package advisories, agent config, hooks, or exposed endpoints:
365
+
366
+ ```bash
367
+ bd create --title "Security audit token refresh retry" --type task --priority 2 \
368
+ --description "PROBLEM: Scoped security/dependency/config audit for executor changes.
369
+ SUCCESS: Identify evidence-backed security findings or return no findings.
370
+ SCOPE: executor job <exec-job>, changed files, relevant manifests/config only.
371
+ NON_GOALS: No edits, no package updates, no destructive scans, no live exploit testing.
372
+ CONSTRAINTS: LOW permission; recommendations only. HN/social signals are not authoritative proof.
373
+ VALIDATION: Findings cite local evidence or OSV/GHSA/NVD/vendor/package-audit sources.
374
+ OUTPUT: Security audit summary, findings, dependency triage, residual risk."
375
+ bd dep add <security> <impl>
376
+ specialists run security-auditor --bead <security> --job <exec-job> --context-depth 3 --keep-alive --background
377
+ specialists result <security-job>
378
+ ```
379
+
380
+ If security-auditor recommends code or dependency changes, create/resume an executor fix bead. Do not let security-auditor apply updates.
381
+
382
+ Create review bead:
383
+
384
+ ```bash
385
+ bd create --title "Review token refresh retry implementation" --type task --priority 2 \
386
+ --description "PROBLEM: Verify executor changes satisfy token refresh retry contract.
387
+ SUCCESS: PASS only if behavior, scope, constraints, and validation are satisfied.
388
+ SCOPE: executor job <exec-job>, implementation bead, root task contract.
389
+ NON_GOALS: Do not request unrelated auth redesign.
390
+ CONSTRAINTS: Findings first with file/line references.
391
+ VALIDATION: Inspect diff and available checks.
392
+ OUTPUT: PASS/PARTIAL/FAIL verdict with required fixes."
393
+ bd dep add <review> <impl>
394
+ specialists run reviewer --bead <review> --job <exec-job> --context-depth 3 --keep-alive --background
395
+ specialists result <review-job>
396
+ ```
397
+
398
+ If reviewer returns `PARTIAL`, prefer resuming the same executor:
399
+
400
+ ```bash
401
+ specialists resume <exec-job> "Reviewer PARTIAL. Fix only these findings: ..."
402
+ ```
403
+
404
+ Then create a new re-review bead and run reviewer again with the same `--job <exec-job>`.
405
+
406
+ After reviewer `PASS`, publish:
407
+
408
+ ```bash
409
+ sp merge <impl>
410
+ bd close <task> --reason "Fixed token refresh retry. Reviewer PASS. Merged."
411
+ ```
412
+
413
+ ## Golden Path: Multi-Chain Epic
414
+
415
+ Use this when multiple independent implementation chains must publish together.
416
+
417
+ Create a top-level epic with the complete contract:
418
+
419
+ ```bash
420
+ bd create --title "Add specialist bead contract enforcement" --type epic --priority 1 \
421
+ --description "PROBLEM: Specialists drift when --bead issues are under-specified.
422
+ SUCCESS: Docs and runtime guidance require complete bead contracts before dispatch.
423
+ SCOPE: docs/workflow guidance, skill docs, optional validation entry point.
424
+ NON_GOALS: No database migration, no breaking CLI changes.
425
+ CONSTRAINTS: Keep examples canonical and avoid title-only beads.
426
+ VALIDATION: docs review, lint/typecheck for runtime changes, reviewer PASS per chain.
427
+ OUTPUT: Merged epic with documented contract and verification."
428
+ ```
429
+
430
+ Create a shared prep bead:
431
+
432
+ ```bash
433
+ bd create --title "Plan bead contract enforcement tracks" --type task --priority 2 \
434
+ --description "PROBLEM: Need file-disjoint implementation tracks for the epic.
435
+ SUCCESS: Identify independent chains, dependencies, risks, and validation per chain.
436
+ SCOPE: workflow docs, CLI/run validation surfaces, tests.
437
+ NON_GOALS: No implementation.
438
+ CONSTRAINTS: READ_ONLY; produce dependency plan.
439
+ VALIDATION: Plan names file scopes and merge order.
440
+ OUTPUT: Parallel track plan."
441
+ bd dep add <plan> <epic>
442
+ specialists run planner --bead <plan> --epic <epic> --context-depth 3 --background
443
+ ```
444
+
445
+ Create independent implementation beads only when write scopes are disjoint:
446
+
447
+ ```bash
448
+ bd create --title "Implement CLI bead contract warning" --type task --priority 2 \
449
+ --description "PROBLEM: CLI allows specialist dispatch from vague beads.
450
+ SUCCESS: Dispatch warns or blocks according to agreed contract policy.
451
+ SCOPE: src/cli/run.ts, src/specialist/beads.ts, related tests.
452
+ NON_GOALS: No schema migration.
453
+ CONSTRAINTS: Preserve --prompt behavior for explicit ad-hoc runs.
454
+ VALIDATION: lint, tsc, targeted run/beads tests.
455
+ OUTPUT: Diff summary and verification."
456
+ bd dep add <impl-cli> <plan>
457
+
458
+ bd create --title "Update workflow docs for bead contract" --type task --priority 2 \
459
+ --description "PROBLEM: Docs teach title-only specialist beads.
460
+ SUCCESS: Canonical examples use complete task contracts.
461
+ SCOPE: config/skills/using-specialists/SKILL.md, CLAUDE.md, docs/features.md.
462
+ NON_GOALS: No runtime code.
463
+ CONSTRAINTS: Keep docs concise and current.
464
+ VALIDATION: Review examples for contract fields and stale commands.
465
+ OUTPUT: Updated docs summary."
466
+ bd dep add <impl-docs> <plan>
467
+ ```
468
+
469
+ Run parallel executors only if scopes are disjoint:
470
+
471
+ ```bash
472
+ specialists run executor --worktree --bead <impl-cli> --context-depth 3 --background
473
+ specialists run executor --worktree --bead <impl-docs> --context-depth 3 --background
474
+ ```
475
+
476
+ Review each chain with its own review bead and `--job`.
477
+
478
+ After every chain has reviewer `PASS`:
479
+
480
+ ```bash
481
+ sp epic status <epic>
482
+ sp epic merge <epic>
483
+ bd close <epic> --reason "All chains reviewer PASS. Epic merged."
484
+ ```
485
+
486
+ ## Review And Fix Loop
487
+
488
+ A chain stays alive until merge or abandonment.
489
+
490
+ Standard loop:
491
+
492
+ ```text
493
+ executor --worktree --bead impl
+   -> waiting after turn
+ optional code-sanity --bead sanity --job exec-job
+   -> OK: continue
+   -> FINDINGS: resume executor with exact sanity findings
+ optional security-auditor --bead security --job exec-job
+   -> no findings: continue
+   -> findings: create/resume executor fix bead; auditor never edits
+ reviewer --bead review --job exec-job
+   -> PASS: verify commit, publish, stop members if needed
+   -> PARTIAL: resume executor with exact findings
+   -> FAIL: decide whether to resume, replace, or abandon
505
+ ```
506
+
507
+ Prefer `sp resume <exec-job>` over a new fix executor when the original job is waiting and context is healthy. Use a new fix bead with `--job <exec-job>` only when the original executor is dead, context exhausted, or a separate audit trail is required.
508
+
509
+ Code-sanity and security-auditor outputs are advisory inputs to the chain; reviewer output must still be consumed before publishing. Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer acceptance.
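The verdict handling in the loop above can be sketched as a small shell function. This is only an illustration: `next_step_for_verdict` is a hypothetical helper, not part of the `sp` CLI, and it assumes the reviewer verdict has already been extracted as one of `PASS`, `PARTIAL`, or `FAIL`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an sp command): map a reviewer verdict to the
# orchestrator's next step, mirroring the standard loop described above.
next_step_for_verdict() {
  case "$1" in
    PASS)    echo "verify commit, publish, stop members if needed" ;;
    PARTIAL) echo "resume executor with exact findings" ;;
    FAIL)    echo "decide whether to resume, replace, or abandon" ;;
    # Job completion, code-sanity OK, or security no-findings are not verdicts.
    *)       echo "no reviewer verdict: do not publish" ;;
  esac
}
```

Anything that is not an explicit reviewer verdict falls through to the default branch, which matches the rule that only reviewer output counts as acceptance.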
510
+
511
+ ## Dependency Mapping
512
+
513
+ The bead graph should mirror execution order.
514
+
515
+ Simple chain:
516
+
517
+ ```text
518
+ task -> explore -> impl -> review
519
+ ```
520
+
521
+ Fix loop:
522
+
523
+ ```text
524
+ task -> explore -> impl -> review -> re-review
+                     ^        |
+                     |        v
+               resume executor with findings
528
+ ```
529
+
530
+ Epic:
531
+
532
+ ```text
533
+ epic -> shared prep -> impl-a -> review-a
534
+ -> impl-b -> review-b
535
+ -> test-batch
536
+ -> epic merge
537
+ ```
538
+
539
+ Use `bd dep add <issue> <depends-on>` so downstream beads are blocked until upstream context exists. Test beads can depend on multiple implementation beads.
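As a rough sketch of how the simple chain maps onto `bd dep add`, the helper below prints the dependency commands for a list of bead IDs given in execution order. `chain_deps` and the bead IDs are hypothetical; the function only prints commands so they can be reviewed before anything is run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given bead IDs in execution order, print one
# `bd dep add <issue> <depends-on>` command per adjacent pair.
# It prints only; nothing is executed against bd.
chain_deps() {
  local prev="$1"
  shift
  local bead
  for bead in "$@"; do
    echo "bd dep add $bead $prev"
    prev="$bead"
  done
}

# Example with placeholder IDs for task -> explore -> impl -> review:
chain_deps task-1 explore-1 impl-1 review-1
# bd dep add explore-1 task-1
# bd dep add impl-1 explore-1
# bd dep add review-1 impl-1
```

A test bead that depends on several implementation beads would need one extra `bd dep add` line per upstream bead, which this linear sketch does not generate.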
540
+
541
+ ## Monitoring
542
+
543
+ Use `sp ps` instead of ad-hoc polling.
544
+
545
+ ```bash
546
+ sp ps
547
+ sp ps <job-id>
548
+ sp ps --follow
549
+ sp ps --running # only starting/running/waiting jobs
550
+ sp ps --bead <bead-id> # only jobs linked to one bead
551
+ sp ps --since 30m # only jobs started in the last 30 minutes
552
+ sp ps --mine # only jobs whose bead is assigned to you
553
+ sp ps --include-terminal # include merged/abandoned epics (hidden by default)
554
+ sp feed <job-id>
555
+ sp result <job-id>
556
+ ```
557
+
558
+ Filter flags compose: `sp ps --running --bead <id>` is the canonical way to inspect "what's actively working on this issue right now". By default `sp ps` hides epics in `merged` or `abandoned` state to keep the snapshot focused; use `--include-terminal` (or `--all`) to bring them back.
559
+
560
+ When dead epics pile up in `failed` state (sibling-chain conflicts, manual stops), recover with `sp epic abandon <epic-id> --reason "<text>"`. The `failed -> abandoned` transition is allowed specifically for cleanup; live members still require `--force`.
561
+
562
+ Read results at every stage. Every specialist (not just READ_ONLY) auto-appends per-turn output to the input bead's notes on each `run_complete`, with `[WAITING]` or `[DONE]` headers, so `bd show <bead-id>` shows the full handoff trail. `sp result <job-id>` works on `waiting` jobs and returns the most recent turn plus a "Session is waiting for your input" footer; use it to decide whether to resume. If the result is empty, inspect the feed and rerun or switch specialists before relying on it.
563
+
564
+ Context percentage in `sp ps`/feed is an action signal:
565
+
566
+ - 0-40%: healthy.
567
+ - 40-65%: monitor.
568
+ - 65-80%: steer toward conclusion.
569
+ - Above 80%: finish, summarize, or replace the job.
570
+
571
+ Do not confuse raw token totals with context percentage. `sp ps` may show raw token counts around 50k-100k for large-context models; that alone is not a stop signal. Use the context percentage when available, plus stalls, repeated edit failures, or scope drift.
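The context-percentage bands above could be encoded as a small shell function. This is a sketch: `context_action` is a hypothetical helper, and because the listed ranges share their boundaries, it assumes that 40 and 65 fall into the higher band while 80 still counts as "steer".

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an integer context percentage to the action band.
# Assumption: boundary values 40 and 65 go to the higher band; 80 stays in
# "steer" because the last band is described as "above 80%".
context_action() {
  local pct="$1"
  if   (( pct < 40 ));  then echo "healthy"
  elif (( pct < 65 ));  then echo "monitor"
  elif (( pct <= 80 )); then echo "steer toward conclusion"
  else                       echo "finish, summarize, or replace the job"
  fi
}
```

The input is the context percentage, not a raw token count; a raw total such as 80000 would land in the last band and give a misleading answer.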
572
+
573
+ ## Steering And Resume
574
+
575
+ Use `steer` for running jobs:
576
+
577
+ ```bash
578
+ sp steer <job-id> "Stop broad audit. Answer only the three questions in the bead."
579
+ ```
580
+
581
+ Use `resume` for waiting keep-alive jobs:
582
+
583
+ ```bash
584
+ sp resume <job-id> "Reviewer PARTIAL. Fix only findings 1 and 2; do not refactor."
585
+ ```
586
+
587
+ Do not use `resume` as a substitute for a missing bead contract on a new tracked task. Create or update the bead first.
588
+
589
+ ## Merge Rules
590
+
591
+ Standalone chain:
592
+
593
+ ```bash
594
+ sp merge <chain-root-bead>
595
+ ```
596
+
597
+ Epic-owned chains:
598
+
599
+ ```bash
600
+ sp epic status <epic-id>
601
+ sp epic merge <epic-id>
602
+ ```
603
+
604
+ Rules:
605
+
606
+ - Merge only after reviewer `PASS`.
607
+ - Use `sp epic merge` for chains that belong to an unresolved epic; `sp merge` refuses them.
608
+ - Do not merge within a chain between executor and reviewer.
609
+ - Merge between stages only when later stages need the code on the main line.
610
+ - Run or confirm required gates before closing the root bead or epic.
611
+
612
+ ## Release Publication
613
+
614
+ Tagged releases go through the `releasing` skill, which dispatches the
615
+ `changelog-keeper` MEDIUM specialist. The specialist reads xt session
616
+ reports via the releasing skill's `xt-reports.ts` helper, drafts the new
617
+ section into `CHANGELOG.md`, bumps `package.json`, rebuilds `dist/`, commits
618
+ with `release: vX.Y.Z`, tags, and pushes with `--follow-tags`. It also runs
+ `gh release create` if the bead requests it.
620
+
621
+ Operator gate: run a single `git diff --stat HEAD~1 HEAD` after the specialist
+ finishes. It must show only `CHANGELOG.md`, `package.json`, and `dist/`. Anything
+ else means scope was violated; revert and refile.
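For operators who prefer a mechanical check over reading the stat output, the gate above might be scripted roughly as follows. `release_scope_ok` is a hypothetical helper, not part of the `releasing` skill; it reads changed paths from stdin and assumes the whitelist is exactly `CHANGELOG.md`, `package.json`, and anything under `dist/`.

```bash
#!/usr/bin/env bash
# Hypothetical gate: read changed paths on stdin (one per line) and fail if
# any path falls outside the release whitelist described above.
release_scope_ok() {
  local path status=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    case "$path" in
      CHANGELOG.md|package.json|dist/*) ;;          # allowed release paths
      *) echo "out of scope: $path" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Usage inside the release repo (git's --name-only lists one path per line):
#   git diff --name-only HEAD~1 HEAD | release_scope_ok
```

A non-zero exit status corresponds to the "revert and refile" case; the offending paths are written to stderr.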
624
+
625
+ The `changelog-keeper-scope` mandatory rule enforces the edit whitelist at
626
+ the specialist level. See `config/skills/releasing/SKILL.md` for the bead
627
+ template, dispatch command, and recovery commands.
628
+
629
+ Release helper contract:
630
+
631
+ - Report extraction is provided by the `releasing` skill, so consumer repos do not need repo-local release helper scripts.
632
+ - Release ranges support annotated tags and should be validated through the same path used by tagged releases.
633
+
634
+ ## Epic Lifecycle
635
+
636
+ Epics are merge-gated identities with a persisted state machine:
637
+
638
+ ```text
639
+ open -> resolving -> merge_ready -> merged
640
+ -> failed
641
+ -> abandoned
642
+ ```
643
+
644
+ | State | Meaning | Chains mergeable? |
+ | --- | --- | --- |
+ | `open` | Epic created, chains not yet running. | No |
+ | `resolving` | Chains actively running. | No |
+ | `merge_ready` | All chains terminal, reviewer PASS, tsc gate passes. | Yes via `sp epic merge` |
+ | `merged` | Publication complete. | — |
+ | `failed` | One or more chains failed. | Resolve or abandon. |
+ | `abandoned` | Cancelled without merge. | — |
652
+
653
+ Operator transitions:
654
+
655
+ ```bash
656
+ sp epic resolve <epic-id> # open -> resolving (marks epic as merge-ready target)
657
+ sp epic merge <epic-id> # merge_ready -> merged (canonical publication)
658
+ sp epic merge <epic-id> --pr # PR mode (publish via pull request)
659
+ sp epic sync <epic-id> --apply # reconcile DB vs live job state when stuck
660
+ sp epic abandon <epic-id> --reason <t> # terminal close for unrecoverable epic
661
+ sp epic abandon <epic-id> --reason <t> --force # force when active pointers still exist
662
+ ```
663
+
664
+ `sp merge <chain>` refuses if the chain belongs to an unresolved epic. Use
665
+ `sp epic merge` for epic-owned chains.
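One way to read the state table and transition list together is as a lookup from epic state to the next operator command. The sketch below is an interpretation, not the tool's own logic: `epic_next_command` is a hypothetical helper, and the suggestions for `resolving` and `failed` are assumptions drawn from the table wording.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest the next operator command for an epic state.
# Args: <state> <epic-id>. Prints a suggestion; it never invokes sp itself.
epic_next_command() {
  local state="$1" epic="$2"
  case "$state" in
    open)        echo "sp epic resolve $epic" ;;
    resolving)   echo "sp epic status $epic" ;;   # assumption: wait and re-check
    merge_ready) echo "sp epic merge $epic" ;;
    failed)      echo "resolve chains, or: sp epic abandon $epic --reason <text>" ;;
    merged|abandoned) echo "none: $state is terminal" ;;
    *)           echo "unknown state: $state" >&2; return 1 ;;
  esac
}
```

If the persisted state looks wrong compared with live jobs, `sp epic sync <epic-id> --apply` is the reconciliation step listed above; this helper does not try to detect that case.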
666
+
667
+ ## Concurrency And Force Flags
668
+
669
+ Edit-capable specialists (MEDIUM/HIGH permission) are blocked from entering a
670
+ workspace while the owner job is `starting` or `running`. This prevents
671
+ concurrent file corruption. READ_ONLY specialists (explorer, etc.) are always
672
+ allowed.
673
+
674
+ Override with `--force-job` only when the caller explicitly accepts the write
675
+ race (e.g. emergency fix into a stalled-but-not-terminal executor):
676
+
677
+ ```bash
678
+ sp run executor --bead <fix-bead> --job <stalled-exec-job> --force-job --context-depth 3 --background
679
+ ```
680
+
681
+ Do not use `--force-job` as a routine unblock. Inspect `sp ps <job-id>` and
682
+ prefer `sp stop <job-id>` on truly dead jobs first.
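The guard described in this section can be modelled as a small predicate. This is a simplified sketch, not the actual implementation: `may_enter_workspace` is a hypothetical function, and it treats every permission level other than `READ_ONLY` as edit-capable.

```bash
#!/usr/bin/env bash
# Hypothetical model of the workspace guard.
# Args: <permission> <owner-job-status> [force: 0|1]
# Returns 0 when entry is allowed, non-zero when it is blocked.
may_enter_workspace() {
  local permission="$1" owner_status="$2" force="${3:-0}"
  # READ_ONLY specialists are always allowed.
  [ "$permission" = "READ_ONLY" ] && return 0
  case "$owner_status" in
    # Edit-capable entry is blocked while the owner is active, unless forced.
    starting|running) [ "$force" = "1" ] ;;
    *) return 0 ;;
  esac
}
```

The `force` argument stands in for `--force-job`; as the text says, it should be set only when the caller explicitly accepts the write race.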
683
+
684
+ ## Terminology Bridge
685
+
686
+ Historical conversations and docs use "waves" for dispatch batches (e.g. "Wave
687
+ 1" / "Wave 2"). "Waves" are human shorthand only — not persisted. The
688
+ merge-gated identity is the epic. Map mental models as follows:
689
+
690
+ | Legacy speech | Canonical concept |
691
+ | --- | --- |
692
+ | "Wave 1" / prep wave | Stage 1 / shared prep job, `--epic` for membership |
693
+ | "Wave 2" | Implementation chains under one epic |
694
+ | "Between waves merge" | `sp epic merge <epic-id>` |
695
+ | "Parallel in wave" | Parallel chains under the same epic (disjoint scopes) |
696
+
697
+ Treat "wave" as speech, "epic" as truth.
698
+
699
+ ## Failure Handling
700
+
701
+ If a job fails or stalls:
702
+
703
+ ```bash
704
+ sp ps <job-id>
705
+ sp feed <job-id>
706
+ sp result <job-id>
707
+ sp doctor
708
+ ```
709
+
710
+ Then choose one action:
711
+
712
+ - Steer a running job back to scope.
713
+ - Resume a waiting job with exact next instruction.
714
+ - Stop a dead or obsolete job.
715
+ - Rerun with a better bead contract.
716
+ - Switch specialist if the selected role was wrong.
717
+ - Report blocker if destructive/high-risk/manual action is required.
718
+
719
+ Do not silently fall back to doing substantial specialist work yourself unless the user agrees or the work is genuinely small and deterministic.
720
+
721
+ ## Recovery Cheatsheet
722
+
723
+ Dead or zombie process:
724
+
725
+ ```bash
726
+ sp stop <job-id> # explicit single-job stop
727
+ sp clean --processes --dry-run # preview stale non-terminal cancellations (PID dead, or no recorded PID and older than --stale-after, default 24h)
728
+ sp clean --processes # apply: cancel stale rows in observability.db
729
+ ```
730
+
731
+ `sp clean --processes` reads from `observability.db` (DB-first) and uses PID liveness as the primary gate: live PIDs are never cancelled regardless of age. The `--stale-after <hours>` fallback applies only when a row has no recorded PID. `sp clean` with no flags purges terminal rows older than `SPECIALISTS_JOB_TTL_DAYS` (7d default); `--all` purges all terminal rows; `--keep <n>` retains the N most recent.
732
+
733
+ Epic state unclear:
734
+
735
+ ```bash
736
+ sp epic status <epic-id>
737
+ sp epic sync <epic-id> --apply
738
+ ```
739
+
740
+ Specialist missing, config skipped, or stale default snapshots:
741
+
742
+ ```bash
743
+ specialists list
744
+ specialists doctor
745
+ specialists doctor --check-drift
746
+ sp prune-stale-defaults --dry-run
747
+ ```
748
+
749
+ `sp prune-stale-defaults` is intentionally operator-facing. Always run `--dry-run` first unless the bead explicitly asks to apply cleanup.
750
+
751
+ Worktree already exists:
752
+
753
+ ```text
754
+ Rerun with the same bead if it is safe; worktree is reused rather than recreated.
755
+ ```
756
+
757
+ Reviewer cannot enter job workspace:
758
+
759
+ ```text
760
+ Check target job status with sp ps. MEDIUM/HIGH jobs are blocked from entering a running write-capable workspace unless forced.
761
+ ```
762
+
763
+ When resolver/catalog changes are under review inside a worktree, run `sp config show <name> --resolved --from-source` so the reviewer sees local source behavior, not the installed dist.
764
+
765
+ Explorer produced empty output:
766
+
767
+ ```text
768
+ Inspect feed. If no usable final summary exists, rerun with a clearer explorer bead or switch to debugger/planner as appropriate.
769
+ ```
770
+
771
+ ## What Not To Put In This Skill
772
+
773
+ Do not add historical migration notes, stale model names, exhaustive command references, internal token counts, long stuck-state postmortems, or title-only examples. Put long reference material in docs and keep this skill focused on current canonical orchestration.