xtrm-tools 0.7.13 → 0.7.15

@@ -53,7 +53,7 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
  - Your only tool is `bash`. Your only bash commands are `sp node` plus `sp ps`/`sp result`.
  - Do not call `read`, `ls`, `find`, `grep`, or any file inspection tool. You have none.

- 2. **Use only `sp node` + `sp ps` + `sp result` + `sp steer` + `sp resume` command surface for orchestration**
+ 2. **Use only `sp node` command surface for orchestration**
  - Do not emit legacy contract JSON plans as the primary control mechanism.
  - Do not call deprecated node action channels.

@@ -84,8 +84,6 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
  | `sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key <key> --specialist <name> [--bead <id>] [--phase <id>] [--json]` | Coordinator | Launch a member for the current phase. |
  | `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]` | Coordinator | Block until the named phase members reach terminal state. |
  | `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` | Coordinator | Read the persisted output for a specific member after a phase barrier. |
- | `sp steer <job-id> 'direction'` | Coordinator | Steer a running member with new context mid-flight. |
- | `sp resume <job-id> 'next task'` | Coordinator | Resume a waiting member with new task instructions. |
  | `sp node create-bead --node $SPECIALISTS_NODE_ID --title '...' [--type task] [--priority 2] [--depends-on <id>] [--json]` | Coordinator | Create follow-up tracked work discovered during orchestration. |
  | `sp node complete --node <node-id> --strategy <pr\|manual> [--json]` | Operator-only | Force-close node lifecycle when coordinator has reached waiting and operator decides to finalize. |
  | `sp node members <node-id> [--json]` | Operator | Inspect member registry and lineage. |
@@ -110,21 +108,13 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
  - after `wait-phase` succeeds, call `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` for each participating member,
  - synthesize the outputs into the next decision.

- 4. **Steer members dynamically**
- - after reading a member's result, if other members need updated context, steer them with `sp steer <job-id> 'specific direction from findings'`.
- - only steer with concrete, evidence-based direction — never speculative.
- - example: explorer finds X → steer researcher to 'investigate X patterns in external docs'.
-
- 5. **Re-check status**
+ 4. **Re-check status**
  - re-read node status after each command sequence,
  - adjust the plan from actual runtime state.

- 6. **Coordinator terminal behavior**
+ 5. **Coordinator terminal behavior**
  - once goals are satisfied (or terminally blocked with explicit reason),
- - synthesize ALL member evidence into a unified report,
- - this report is your final output — it MUST integrate all member findings,
- - 'Node completed. ok:true.' is NOT acceptable synthesis,
- - enter/remain in `waiting` after producing synthesis.
+ - synthesize evidence and enter/remain in `waiting`.
  - do not issue a completion command; operator decides lifecycle closure via `sp node stop` (or force-close via `sp node complete`).

  ---
@@ -137,70 +127,25 @@ Use this exact loop:

  1. `status`
  2. decide the next phase/member set
- 3. spawn members for THIS phase only (not all phases)
+ 3. launch members
  4. `wait-phase`
- 5. `result --wait` for each member
+ 5. `result --wait`
  6. synthesize evidence
- 7. steer or spawn members for next phase based on synthesis
- 8. repeat until all phases complete
- 9. produce final synthesis report
- 10. enter waiting for operator closure
-
- ### Multi-phase coordination pattern
-
- The coordinator MUST use at least 2 distinct phases:
-
- **Phase 1 — Explore:**
- - Spawn explorer to gather initial evidence
- - wait-phase → read result → synthesize findings
- - Decide: what needs deeper investigation?
-
- **Phase 2 — Deep-dive (conditional):**
- - Based on explore findings, spawn researcher/overthinker with specific context
- - Steer running members with evidence from phase 1
- - wait-phase → read results → synthesize
-
- **Phase 3 — Synthesis:**
- - Read ALL member results from all phases
- - Produce unified report integrating all findings
- - Enter waiting
+ 7. choose next action or enter waiting after synthesis

  ### Synthesis mandate

- Before declaring synthesis complete, the coordinator **MUST** read the persisted results for ALL members across ALL phases.
-
- The synthesis report MUST:
- - Integrate findings from every member
- - Highlight agreements, contradictions, and gaps
- - Provide actionable conclusions
- - Be the coordinator's own substantive output
-
- 'Node completed. ok:true.' is NEVER acceptable as synthesis output.
-
- ### Synthesis mandate (repeated for emphasis)
-
  Before declaring synthesis complete, the coordinator **MUST** read the persisted results for the members that produced the evidence.

  Do not rely only on status transitions. `wait-phase` tells you the members are terminal; `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` tells you what they actually found or changed. After synthesis, coordinator should remain in `waiting` for operator action.

  ### Steering guidance

- Steer when concrete result evidence shows a gap, contradiction, or missed requirement.
-
- **Steering commands:**
- - `sp steer <job-id> 'new direction based on evidence'` — for running members
- - `sp resume <job-id> 'next task with context from phase N'` — for waiting members
- - `sp node spawn-member ... --phase <next-phase>` — for new members with specific context
-
- **Good steering patterns:**
- - Explorer finds module X handles auth → steer researcher: 'Investigate how other frameworks handle auth patterns similar to module X'
- - Researcher finds tradeoff A vs B → spawn overthinker: 'Analyze tradeoff between A and B. Explorer found that X uses A, researcher found Y uses B. Consider: performance, complexity, ecosystem support.'
- - Reviewer finds missing test coverage → spawn executor: 'Add tests for the paths reviewer identified: ...'
+ Only steer when concrete result evidence shows a gap, contradiction, or missed requirement.

- **Bad steering patterns:**
- - Steering a member before reading its completed output
- - Steering with generic instructions ('do better', 'investigate more')
- - Steering speculatively without evidence from a prior member result
+ Do **not** steer speculatively.
+ - Good: result evidence shows a reviewer found a missing acceptance criterion.
+ - Bad: steering a member before reading its completed output.

  ---

@@ -242,49 +187,22 @@ When a command fails:

  ## Example command sequences

- ### Sequence A: multi-phase explore deep-dive synthesis
+ ### Sequence A: explore -> synthesis -> impl -> waiting

  ```bash
- # Phase 1: explore
  sp ps --node $SPECIALISTS_NODE_ID --json
  sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
  sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
  sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
- # Synthesize explore-1 findings. Decide what needs deeper investigation.
-
- # Phase 2: deep-dive (spawned based on explore findings)
- sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key researcher-1 --specialist researcher --phase deep-dive-1 --json
- sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key overthinker-1 --specialist overthinker --phase deep-dive-1 --json
- sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1,overthinker-1 --json
- sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
- sp result $SPECIALISTS_NODE_ID:overthinker-1 --wait --json
- # Synthesize all phase 2 evidence.
-
- # Phase 3: final synthesis
- # Read all member results, produce unified report, enter waiting.
- sp ps --node $SPECIALISTS_NODE_ID --json
- ```
-
- ### Sequence B: explore → steer → synthesis
-
- ```bash
- # Phase 1: explore
- sp ps --node $SPECIALISTS_NODE_ID --json
- sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
- sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
- sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
- # Explorer found X. Researcher is running — steer it.
-
- # Steer researcher with explorer findings
- sp steer <researcher-job-id> 'Based on explorer findings about X, investigate Y patterns in external docs'
- sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1 --json
- sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
-
- # Final synthesis — produce unified report integrating ALL findings
+ # Synthesize the explore findings and decide whether impl is required.
+ sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key impl-1 --specialist executor --phase impl-1 --json
+ sp node wait-phase --node $SPECIALISTS_NODE_ID --phase impl-1 --members impl-1 --json
+ sp result $SPECIALISTS_NODE_ID:impl-1 --wait --json
+ # Synthesize impl evidence, then stay in waiting for operator closure.
  sp ps --node $SPECIALISTS_NODE_ID --json
  ```

- ### Sequence C: discovered work + review synthesis + operator closure
+ ### Sequence B: discovered work + review synthesis + operator closure

  ```bash
  sp ps --node $SPECIALISTS_NODE_ID --json
@@ -319,8 +237,6 @@ sp ps --node $SPECIALISTS_NODE_ID --json
  - `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]`
  - `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json`
  - `sp ps --node $SPECIALISTS_NODE_ID --json`
- - `sp steer <job-id> 'new direction or context'` — steer a running member mid-flight
- - `sp resume <job-id> 'next task'` — resume a waiting member with new instructions

  ### Operator-only closure commands
  - `sp node stop <node-id>`
@@ -0,0 +1,208 @@
+ ---
+ name: using-script-specialists
+ description: >
+   Use this skill for synchronous one-shot specialist invocations via `sp script`
+   (CLI) or `sp serve` (HTTP daemon). These run READ_ONLY, template-driven
+   specialists with `$var` substitution and return JSON in-process — no beads,
+   no chains, no worktrees, no job lifecycle. Trigger when integrating a
+   specialist into a service, script, or library, when the caller needs the
+   output immediately, or when the work is a single LLM call with structured
+   input/output. Do NOT use for tracked agent work — that belongs to
+   `using-specialists-v2`.
+ version: 1.0
+ ---
+
+ # Script-Class Specialists
+
+ `sp script` and `sp serve` are a separate runtime from the bead-first
+ orchestration covered by `using-specialists-v2`. They exist for service and
+ library integration, not for agent chains.
+
+ | Aspect | `sp run` (orchestration) | `sp script` / `sp serve` |
+ | --- | --- | --- |
+ | Driver | bead contract | template + variables |
+ | Execution | supervised job, async | one-shot, synchronous |
+ | Permissions | READ_ONLY / MEDIUM / HIGH | READ_ONLY only |
+ | Worktrees | edit-capable provisions one | rejected |
+ | Output | result.txt + events.jsonl + bead notes | stdout JSON / HTTP body |
+ | Audit | `.specialists/jobs/<id>/` | one row in `.specialists/db/observability.db` |
+
+ Use `sp script` from a shell or build pipeline. Use `sp serve` from a service
+ that needs an HTTP endpoint backed by `pi`. The same `.specialist.json` runs
+ under both.
+
+ ## When To Use This Skill
+
+ Trigger when:
+
+ - A service or script needs a single LLM-backed transform (summarize, classify,
+   extract) returning JSON.
+ - You are integrating specialists into Python/Node code that cannot block on a
+   supervised job lifecycle.
+ - The call is request/response shaped: variables in, structured output out.
+ - You need a sidecar HTTP endpoint (`sp serve`) to wrap a specialist for a
+   service consumer that already speaks HTTP.
+
+ Do NOT trigger for: code review, debugging, implementation, multi-turn work,
+ keep-alive sessions, anything that should write files. Those belong to
+ `using-specialists-v2`.
+
+ ## Specialist Compatibility (compatGuard)
+
+ A spec is rejected at request time (`specialist_load_error`) if any of:
+
+ - `execution.interactive` is `true`
+ - `execution.requires_worktree` is `true`
+ - `execution.permission_required` is anything other than `READ_ONLY`
+ - `skills.scripts` is non-empty
+ - `prompt.task_template` is missing
+ - a referenced `$var` in the chosen template is not supplied (`template_variable_missing`)
+
+ Author specs that explicitly target script-class:
+
+ ```json
+ {
+   "specialist": {
+     "metadata": { "name": "summarize-event", "version": "1.0.0", "category": "ingestion" },
+     "execution": {
+       "mode": "auto",
+       "model": "anthropic/claude-haiku-4-5",
+       "timeout_ms": 30000,
+       "interactive": false,
+       "response_format": "json",
+       "output_type": "custom",
+       "permission_required": "READ_ONLY",
+       "requires_worktree": false,
+       "max_retries": 0
+     },
+     "prompt": {
+       "task_template": "Summarize event $event_id with body: $body. Return JSON {\"summary\": \"...\"}.",
+       "output_schema": { "required": ["summary"] }
+     }
+   }
+ }
+ ```
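
As an illustration only (this sketch is not part of the released file), the static compatGuard conditions listed above could be pre-checked before a spec is shipped. `compat_violations` is a hypothetical helper, not the package's loader, and it assumes that a missing `permission_required` field counts as a violation, which the source does not state.

```python
from typing import Any


def compat_violations(spec: dict[str, Any]) -> list[str]:
    """Return the script-class rules a parsed spec would break (empty list = compatible).

    Hypothetical pre-flight check; the real guard lives in the package runtime.
    """
    body = spec.get("specialist", {})
    execution = body.get("execution", {})
    prompt = body.get("prompt", {})
    skills = body.get("skills", {})

    problems: list[str] = []
    if execution.get("interactive") is True:
        problems.append("execution.interactive is true")
    if execution.get("requires_worktree") is True:
        problems.append("execution.requires_worktree is true")
    # Assumption: an absent permission_required is treated as "not READ_ONLY".
    if execution.get("permission_required") != "READ_ONLY":
        problems.append("execution.permission_required is not READ_ONLY")
    if skills.get("scripts"):
        problems.append("skills.scripts is non-empty")
    if not prompt.get("task_template"):
        problems.append("prompt.task_template is missing")
    return problems
```

The missing-variable condition depends on the variables supplied per request, so it is left out of this static check.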
+
+ ## `sp script` — One-Shot CLI
+
+ ```bash
+ sp script <specialist-name> \
+   --vars key1=value1 --vars key2=value2 \
+   [--template task_template] \
+   [--model anthropic/claude-sonnet-4-6] \
+   [--thinking medium] \
+   [--timeout-ms 60000] \
+   [--db-path /path/to/observability.db] \
+   [--single-instance <lock-name>] \
+   [--no-trace] \
+   [--json]
+ ```
+
+ Behaviour:
+
+ - Loads the spec via `SpecialistLoader` (same loader as `sp run`).
+ - Renders `prompt.task_template` (or named template) with `--vars`.
+ - Spawns `pi --mode json --no-session --no-extensions --no-tools` with the
+   resolved model.
+ - Returns the final assistant text on stdout. With `--json`, returns the full
+   `ScriptGenerateResult` envelope.
+ - Writes one row to `.specialists/db/observability.db` (same writer as `sp run`).
+
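
To make the rendering step above concrete, here is a minimal sketch (not part of the released file) that approximates `$var` substitution with Python's `string.Template`. Whether the real renderer follows the same placeholder rules is an assumption; `parse_vars`, `render_task_template`, and `TemplateVariableMissing` are hypothetical names chosen to mirror `--vars` and the `template_variable_missing` error type.

```python
from string import Template


class TemplateVariableMissing(KeyError):
    """Hypothetical error mirroring the `template_variable_missing` error type."""


def parse_vars(pairs: list[str]) -> dict[str, str]:
    """Turn repeated `--vars key=value` arguments into a mapping."""
    parsed: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first "=" only
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key] = value
    return parsed


def render_task_template(template: str, variables: dict[str, str]) -> str:
    """Substitute every `$var`; fail loudly if a referenced variable is absent."""
    try:
        return Template(template).substitute(variables)
    except KeyError as exc:
        raise TemplateVariableMissing(exc.args[0]) from exc
```

For example, `render_task_template("Summarize event $event_id", parse_vars(["event_id=abc"]))` yields `"Summarize event abc"`, while omitting `event_id` raises `TemplateVariableMissing`.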
+ Exit codes:
+
+ - `0` — success.
+ - non-zero — failure; with `--json`, body has `success: false` and `error_type`.
+
+ Use `--single-instance <lock>` when concurrent invocations of the same logical
+ job must be serialized (cron, batch script).
+
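
A caller written in Python might wrap the CLI roughly as follows. This is a sketch under stated assumptions, not part of the released file: it assumes `sp` is on `PATH`, that the `--json` envelope is written to stdout on both success and failure, and that the envelope carries `success`, `error`, and `error_type` fields as the HTTP response does. `build_script_argv` and `run_script` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Optional


def build_script_argv(name: str, variables: dict[str, str],
                      timeout_ms: Optional[int] = None) -> list[str]:
    """Assemble an `sp script` command line using only flags shown in the usage above."""
    argv = ["sp", "script", name]
    for key, value in variables.items():
        argv += ["--vars", f"{key}={value}"]
    if timeout_ms is not None:
        argv += ["--timeout-ms", str(timeout_ms)]
    argv.append("--json")  # ask for the full envelope rather than bare text
    return argv


def run_script(name: str, variables: dict[str, str],
               timeout_ms: Optional[int] = None) -> dict[str, Any]:
    """Run the specialist once and return the decoded envelope, raising on failure."""
    proc = subprocess.run(build_script_argv(name, variables, timeout_ms),
                          capture_output=True, text=True)
    envelope = json.loads(proc.stdout)  # assumption: envelope is on stdout
    if proc.returncode != 0 or not envelope.get("success", False):
        raise RuntimeError(f"{envelope.get('error_type')}: {envelope.get('error')}")
    return envelope
```

Passing the arguments as a list avoids shell quoting problems when a variable value contains spaces or quotes.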
+ ## `sp serve` — HTTP Daemon
+
+ ```bash
+ sp serve \
+   [--port 8000] \
+   [--concurrency 4] \
+   [--queue-timeout-ms 5000] \
+   [--shutdown-grace-ms 30000] \
+   [--project-dir /path/to/project] \
+   [--fallback-model anthropic/claude-haiku-4-5]
+ ```
+
+ POST `/v1/generate`:
+
+ ```json
+ {
+   "specialist": "summarize-event",
+   "variables": { "event_id": "abc", "body": "..." },
+   "template": "task_template",
+   "model_override": "anthropic/...",
+   "timeout_ms": 60000,
+   "trace": true
+ }
+ ```
+
+ Response (200, success):
+
+ ```json
+ {
+   "success": true,
+   "output": "<final text>",
+   "parsed_json": { "summary": "..." },
+   "meta": {
+     "specialist": "summarize-event",
+     "model": "anthropic/claude-haiku-4-5",
+     "duration_ms": 1234,
+     "trace_id": "<uuid>"
+   }
+ }
+ ```
+
+ Response (200, failure):
+
+ ```json
+ { "success": false, "error": "...", "error_type": "..." }
+ ```
+
+ Error types: `specialist_not_found | specialist_load_error |
+ template_variable_missing | auth | quota | timeout | network | invalid_json |
+ output_too_large | internal`.
+
+ `400` is reserved for malformed HTTP. `429` returns when concurrency cap is
+ saturated past `queue-timeout-ms`.
+
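
Because failures arrive as HTTP 200 with `success: false`, a client has to inspect the envelope and not only the status code. The sketch below (not part of the released file) shows one way to do that with the standard library. `SpecialistError`, `build_generate_request`, and `interpret_response` are hypothetical names; the `busy` and `malformed_request` labels are client-side inventions for the 429 and 400 cases, and the `Content-Type` header is an assumption about what the daemon expects.

```python
import json
import urllib.request
from typing import Any, Optional


class SpecialistError(RuntimeError):
    """Hypothetical client-side error carrying the server's `error_type`."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type


def build_generate_request(base_url: str, specialist: str, variables: dict[str, str],
                           timeout_ms: Optional[int] = None) -> urllib.request.Request:
    """Build the POST /v1/generate request without sending it."""
    payload: dict[str, Any] = {"specialist": specialist, "variables": variables}
    if timeout_ms is not None:
        payload["timeout_ms"] = timeout_ms
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


def interpret_response(status: int, body: str) -> dict[str, Any]:
    """Map status code plus envelope to a result dict or a SpecialistError."""
    if status == 429:
        raise SpecialistError("busy", "concurrency cap saturated past queue timeout")
    if status == 400:
        raise SpecialistError("malformed_request", "server rejected the HTTP request")
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise SpecialistError(envelope.get("error_type", "internal"),
                              envelope.get("error", ""))
    return envelope
```

When the request is sent with `urllib.request.urlopen`, 4xx statuses surface as `urllib.error.HTTPError`, so the caller would catch that exception and pass its `code` to `interpret_response`.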
+ ## Operational Rules
+
+ - One `pi` subprocess per in-flight request, bounded by `--concurrency`.
+ - Credentials come from `pi`'s own `~/.pi/agent/auth.json`. The service never
+   touches API keys.
+ - Observability DB is shared with `sp run`. Audit trail is unified.
+ - The service is sidecar-per-consumer: no multi-tenant routing, no session
+   state, no orchestration. If you need orchestration, use `sp run` + beads.
+ - For container deployments, see `docs/specialists-service-install.md`. Image
+   runs as non-root UID 10001; bind-mount `~/.pi` and `.specialists/`.
+
+ ## When To Switch Back To `using-specialists-v2`
+
+ If any of these become true mid-design, drop script-class and use the
+ orchestration runtime:
+
+ - The work needs to write files.
+ - The caller wants a multi-turn / keep-alive session.
+ - A reviewer pass is needed.
+ - The work should be tracked as a bead with auditability beyond a single
+   observability row.
+ - The output is iterative (steer / resume).
+
+ ## What Not To Put Here
+
+ - Bead workflow, chains, epics, reviewers, worktrees — those live in
+   `using-specialists-v2`.
+ - Orchestration MCP tooling (`use_specialist`).
+ - Long-running multi-turn examples.
+
+ ## Reference
+
+ - `docs/specialists-service.md` — HTTP contract and operational notes.
+ - `docs/specialists-service-install.md` — Docker/Podman install path.
+ - `docs/script-specialists.md` — historical context for the script-class shape.
+ - `src/cli/script.ts`, `src/cli/serve.ts`, `src/specialist/script-runner.ts` — runtime.
@@ -62,6 +62,17 @@ Specialists are autonomous AI agents that run independently — fresh context, d
  8. **No destructive operations by specialists.** No `rm -rf`, no force pushes, no database drops, no credential rotation, no mass deletes, no history rewrites. Surface destructive requirements to the user.
  9. **Executor does not run tests.** Executor runs lint + tsc only. Tests are the reviewer's and test-runner's responsibility in the chained pipeline.
  10. **Keep specialists alive through the review cycle.** Never `sp stop` an executor or debugger before the reviewer delivers its verdict. The specialist stays in `waiting` so you can `resume` it — to commit changes, apply fixes from reviewer feedback, or continue work. Only stop after final reviewer PASS and confirmed commit.
+ 11. **Respect ownership layers and loader precedence.** Loader resolution order is `.specialists/user/*` > `.specialists/default/*` > package fallback `config/*`. Upstream source = package `config/*` (read-only for repo operators); managed mirror = `.specialists/default/*` (no hand edits); repo custom layer = `.specialists/user/*`; runtime/generated = `.specialists/{jobs,ready,db}`.
+ 12. **Keep backlog-clean isolated.** Do not mix backlog-clean changes into specialist ownership/migration tasks.
+
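
The precedence in rule 11 amounts to a first-match lookup across three tiers. The sketch below (not part of the released file) illustrates that idea only; `resolve_specialist` is a hypothetical helper, and the flat `<name>.specialist.json` layout inside each tier is an assumption, since the source only gives the `*` globs.

```python
from pathlib import Path
from typing import Optional


def resolve_specialist(project_dir: Path, package_dir: Path, name: str) -> Optional[Path]:
    """Return the highest-precedence spec file for `name`, or None if no tier has it."""
    filename = f"{name}.specialist.json"  # assumed naming and flat layout
    candidates = [
        project_dir / ".specialists" / "user" / filename,     # repo custom layer
        project_dir / ".specialists" / "default" / filename,  # managed mirror
        package_dir / "config" / filename,                    # package fallback
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A user-tier file therefore shadows the managed mirror and the package copy without either of them being edited.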
+ ## Mandatory-rules template sets
+
+ Use template-driven mandatory rules for repeatable policy bundles.
+
+ - Specialist config field: `specialist.mandatory_rules.template_sets`
+ - Template source: `config/mandatory-rules/*.md`
+ - Template format: YAML frontmatter + body content
+ - Runtime behavior: runner resolves templates and injects rendered rules at end of prompt

  ---

@@ -127,11 +138,13 @@ specialists stop <job-id> --force # 5s SIGTERM timeout, then pgroup

  # Management
  specialists edit <name> # edit specialist config (dot-path, --preset)
+ specialists edit <name> --fork-from <base> # fork non-user specialist into .specialists/user/ then edit
  specialists clean # purge old job dirs + worktree GC
  specialists clean --processes # kill all running/starting specialist jobs
  specialists db vacuum # compact SQLite storage (refuses if jobs running)
  specialists db prune --before <iso|duration> --dry-run|--apply # prune old events/results/terminal jobs
  specialists doctor orphans # integrity scan: orphan, stale-pointer, integrity-violation
+ specialists init --sync-defaults # refresh specialists + mandatory-rules + nodes from canonical defaults
  specialists init --sync-skills # re-sync skills only (no full init)
  specialists init --no-xtrm-check # skip xtrm prerequisite check (CI/testing)
  ```
@@ -8,7 +8,7 @@ description: >
  work without drift. Trigger for code review, debugging, implementation,
  planning, test generation, doc sync, multi-chain epics, and any question about
  specialist orchestration.
- version: 1.1
+ version: 1.4
  ---

  # Specialists V2
@@ -51,6 +51,21 @@ When the local version is behind, the latest CHANGELOG entry can be summarized v
  14. Stale-base guard: dispatch refuses to provision a worktree when sibling epic chains have unmerged substantive commits. Override only with explicit `--force-stale-base` and a reason. Merge-time rebase happens automatically.
  15. Auto-checkpoint: executor and debugger commit substantive worktree changes on `waiting` by default (`auto_commit: checkpoint_on_waiting`). Noise paths (`.xtrm/`, `.wolf/`, `.specialists/jobs/`, `.beads/`) are filtered.
  16. Per-turn output appends to the input bead notes for **all** specialists on every `run_complete`, with `[WAITING — more output may follow]` or `[DONE]` headers. `bd show <bead-id>` is a valid path to read intermediate output.
+ 17. Specialist jobs do not orchestrate nested specialist chains. The top-level orchestrator dispatches specialists, collects results, and advances the workflow.
+ 18. Treat test failures as evidence to classify against the bead scope. Validate whether failures are in-scope, pre-existing, or infrastructure-related before sending an executor into a fix loop.
+
+ ## Canonical Runtime State
+
+ These are current operating facts, not migration notes:
+
+ - **Asset ownership:** Cat A runtime assets — specialists, mandatory-rules, catalog, and nodes — resolve live from the specialists package after project tiers. Cat B filesystem assets — skills and hooks — are owned by xtrm-tools under `.xtrm/skills/default` and `.xtrm/hooks/default`.
+ - **Resolution precedence:** project/user tiers win over managed defaults; package-live is the final fallback. Mandatory-rule indexes are not stacked across tiers; per-id mandatory-rule files may fall through to package canonical when absent locally.
+ - **Drift surface:** use `sp doctor --check-drift` to inspect stale managed defaults and `sp prune-stale-defaults --dry-run` to preview cleanup.
+ - **Source verification:** resolver/catalog changes in a worktree are verified with `sp config show <name> --resolved --from-source` so evidence comes from the checked-out source, not an installed dist.
+ - **Worktree publication:** edit-capable specialists produce worktree branches. Before review or merge, verify the branch diff and status from that worktree.
+ - **Epic publication:** epics are the merge-gated identity. Publish through `sp epic merge`; use `sp epic abandon` to deliberately close failed or cancelled epic bookkeeping.
+ - **CLI safety:** command help paths are side-effect free. New commands must parse `--help`/`-h` before action and have a no-write help test.
+ - **Release context:** changelog-keeper receives xt report context through the `releasing` skill's helper. Release-range logic supports annotated tags.

  ## Autonomous Drive

@@ -199,20 +214,24 @@ Run `specialists list` if you need the live registry. Choose by task, not by hab
  | Planning/decomposition | `planner` | You need beads, dependencies, file scopes, or sequencing. |
  | Design/tradeoffs | `overthinker` | The approach is risky, ambiguous, or needs critique. |
  | Implementation | `executor` | The contract is clear enough to write code or docs. |
- | Compliance/code review | `reviewer` | An executor/debugger produced changes that need a verdict. |
+ | Compliance/code review | `reviewer` | An executor/debugger produced changes that need the final PASS/PARTIAL/FAIL verdict. |
+ | Implementation sanity | `code-sanity` | You want a cheap READ_ONLY smell pass for simplicity, type safety, dead code, brittle async/error handling, or maintainability before reviewer. |
+ | Security/dependency audit | `security-auditor` | You need threat modeling, secure-code review, package advisory triage, or agent/config security scanning. LOW: scan/read/recommend only. |
  | Multiple review perspectives | `parallel-review` | A critical diff needs independent review passes. |
  | Test execution | `test-runner` | You need suites run and failures interpreted. |
  | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization. |
- | External/live research | `researcher` | Current library/docs/media lookup is needed. |
+ | External/live research | `researcher` | Current non-security library/docs/media lookup is needed. |
  | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config. |
- | Release changelog drafting | `changelog-keeper` | A new tag is being cut and a `[X.Y.Z] - YYYY-MM-DD` section is needed. Driven by `sp release prepare`, not invoked directly. |
+ | Release publication (end-to-end) | `changelog-keeper` | A new tag is being cut. MEDIUM specialist: drafts CHANGELOG section from xt reports, bumps package.json, rebuilds dist, commits, tags, pushes. Use the `releasing` skill to dispatch. |

  Selection rules:

  - Explorer is READ_ONLY and should answer specific questions.
  - Debugger is better than explorer for failures because it traces causes and remediation.
  - Executor does not own full test validation; use reviewer/test-runner for that phase.
- - Reviewer always uses its own bead plus `--job <executor-job>`.
+ - Code-sanity is optional and non-blocking by default: use it when a diff smells overcomplicated or type-risky, then resume executor with concrete findings. It is not a merge gate.
+ - Security-auditor may run safe local audit commands and web/source research, but must not edit files, update dependencies, exfiltrate secrets, or run destructive/live-target exploit tests. Executor applies any recommended fixes in a separate bead.
+ - Reviewer always uses its own bead plus `--job <executor-job>` and remains the final merge gate.
  - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
  - Specialists-creator should precede specialist config/schema edits.

@@ -224,8 +243,12 @@ Daily commands:
  specialists list
  specialists list-rules # rule × specialist matrix
  specialists doctor
+ specialists doctor --check-drift # inspect stale .specialists/default snapshots
+ sp prune-stale-defaults --dry-run # preview redundant default snapshots
  specialists run <name> --bead <id> --background
  specialists run executor --bead <impl-bead> --background # worktree auto-provisioned
+ specialists run code-sanity --bead <sanity-bead> --job <exec-job> --keep-alive --background
+ specialists run security-auditor --bead <security-bead> --job <exec-job> --keep-alive --background
  specialists run reviewer --bead <review-bead> --job <exec-job> --keep-alive --background
  specialists ps
  specialists ps <job-id>
@@ -245,6 +268,7 @@ sp merge <chain-root-bead>
  sp epic status <epic-id>
  sp epic sync <epic-id> --apply
  sp epic merge <epic-id>
+ sp epic abandon <epic-id> --reason "..."
  sp end
  ```

@@ -319,6 +343,42 @@ specialists run executor --worktree --bead <impl> --context-depth 3 --background
  specialists result <exec-job>
  ```

+ Optional code-sanity pass for implementation smell checks (use when the diff is non-trivial or likely to accumulate agent-code complexity):
+
+ ```bash
+ bd create --title "Code sanity check token refresh retry" --type task --priority 3 \
+   --description "PROBLEM: Cheap READ_ONLY sanity pass for executor implementation quality before final review.
+ SUCCESS: Identify concrete simplicity/type-safety/maintainability findings, or return OK.
+ SCOPE: executor job <exec-job>, implementation diff only.
+ NON_GOALS: No requirements verdict, no security audit, no test execution, no edits.
+ CONSTRAINTS: At most 5 concrete findings; cite files/symbols/lines where possible.
+ VALIDATION: Findings are suitable to paste into specialists resume <exec-job>.
+ OUTPUT: OK/FINDINGS/BLOCKED with handoff."
+ bd dep add <sanity> <impl>
+ specialists run code-sanity --bead <sanity> --job <exec-job> --context-depth 3 --keep-alive --background
+ specialists result <sanity-job>
+ ```
+
+ If code-sanity returns `FINDINGS`, resume executor with those concrete instructions, then rerun code-sanity only if the fixes were substantive. Do not treat code-sanity `OK` as reviewer PASS.
+
+ Optional security pass when the task touches auth, secrets, input handling, dependency updates, package advisories, agent config, hooks, or exposed endpoints:
+
+ ```bash
+ bd create --title "Security audit token refresh retry" --type task --priority 2 \
+   --description "PROBLEM: Scoped security/dependency/config audit for executor changes.
+ SUCCESS: Identify evidence-backed security findings or return no findings.
+ SCOPE: executor job <exec-job>, changed files, relevant manifests/config only.
+ NON_GOALS: No edits, no package updates, no destructive scans, no live exploit testing.
+ CONSTRAINTS: LOW permission; recommendations only. HN/social signals are not authoritative proof.
+ VALIDATION: Findings cite local evidence or OSV/GHSA/NVD/vendor/package-audit sources.
+ OUTPUT: Security audit summary, findings, dependency triage, residual risk."
+ bd dep add <security> <impl>
+ specialists run security-auditor --bead <security> --job <exec-job> --context-depth 3 --keep-alive --background
+ specialists result <security-job>
+ ```
+
+ If security-auditor recommends code or dependency changes, create/resume an executor fix bead. Do not let security-auditor apply updates.
+
  Create review bead:

  ```bash
@@ -432,6 +492,12 @@ Standard loop:
432
492
  ```text
433
493
  executor --worktree --bead impl
434
494
  -> waiting after turn
495
+ optional code-sanity --bead sanity --job exec-job
496
+ -> OK: continue
497
+ -> FINDINGS: resume executor with exact sanity findings
498
+ optional security-auditor --bead security --job exec-job
499
+ -> no findings: continue
500
+ -> findings: create/resume executor fix bead; auditor never edits
435
501
  reviewer --bead review --job exec-job
436
502
  -> PASS: verify commit, publish, stop members if needed
437
503
  -> PARTIAL: resume executor with exact findings
@@ -440,7 +506,7 @@ reviewer --bead review --job exec-job
440
506
 
441
507
  Prefer `sp resume <exec-job>` over a new fix executor when the original job is waiting and context is healthy. Use a new fix bead with `--job <exec-job>` only when the original executor is dead, context exhausted, or a separate audit trail is required.
442
508
 
443
- Reviewer output must be consumed before publishing. Do not treat job completion as equivalent to acceptance.
509
+ Code-sanity and security-auditor outputs are advisory inputs to the chain; reviewer output must still be consumed before publishing. Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer acceptance.
444
510
 
445
511
  ## Dependency Mapping
446
512
 
@@ -480,10 +546,19 @@ Use `sp ps` instead of ad-hoc polling.
480
546
  sp ps
481
547
  sp ps <job-id>
482
548
  sp ps --follow
549
+ sp ps --running # only starting/running/waiting jobs
550
+ sp ps --bead <bead-id> # only jobs linked to one bead
551
+ sp ps --since 30m # only jobs started in the last 30 minutes
552
+ sp ps --mine # only jobs whose bead is assigned to you
553
+ sp ps --include-terminal # include merged/abandoned epics (hidden by default)
483
554
  sp feed <job-id>
484
555
  sp result <job-id>
485
556
  ```
486
557
 
558
+ Filter flags compose: `sp ps --running --bead <id>` is the canonical way to inspect "what's actively working on this issue right now". By default `sp ps` hides epics in `merged` or `abandoned` state to keep the snapshot focused; use `--include-terminal` (or `--all`) to bring them back.
559
+
560
+ When dead epics pile up in `failed` state (sibling-chain conflicts, manual stops), recover with `sp epic abandon <epic-id> --reason "<text>"`. The `failed -> abandoned` transition is allowed specifically for cleanup; live members still require `--force`.
561
+
487
562
  Read results at every stage. Every specialist (not just READ_ONLY) auto-appends per-turn output to the input bead notes on each `run_complete`, with `[WAITING]` or `[DONE]` headers — `bd show <bead-id>` shows the full handoff trail. `sp result <job-id>` works on `waiting` jobs and returns the most recent turn plus a "Session is waiting for your input" footer; use it to decide whether to resume. If result is empty, inspect feed and rerun or switch specialists before relying on it.
488
563
 
489
564
  Context percentage in `sp ps`/feed is an action signal:
@@ -493,6 +568,8 @@ Context percentage in `sp ps`/feed is an action signal:
493
568
  - 65-80%: steer toward conclusion.
494
569
  - Above 80%: finish, summarize, or replace the job.
495
570
 
571
+ Do not confuse raw token totals with context percentage. `sp ps` may show raw token counts around 50k-100k for large-context models; that alone is not a stop signal. Use the context percentage when available, plus stalls, repeated edit failures, or scope drift.
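
A minimal sketch of the two bands visible in this hunk, assuming an integer context percentage has already been read from `sp ps` or the feed. The function name `context_action` is hypothetical and not part of the `sp` CLI; bands below 65% are not shown in this hunk, so the sketch only points back to them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an integer context percentage to the action
# named in the list above. It does not call `sp`; the caller supplies the number.
context_action() {
  local pct="$1"
  if [ "$pct" -gt 80 ]; then
    echo "finish, summarize, or replace the job"
  elif [ "$pct" -ge 65 ]; then
    echo "steer toward conclusion"
  else
    # Lower bands are defined earlier in the list and are not reproduced here.
    echo "below 65%: follow the lower bands"
  fi
}
```

The input is a percentage, not a raw token count, which keeps the distinction drawn above explicit: a 50k-100k token total is never passed to this function.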
572
+
496
573
  ## Steering And Resume
497
574
 
498
575
  Use `steer` for running jobs:
@@ -534,18 +611,25 @@ Rules:
534
611
 
535
612
  ## Release Publication
536
613
 
537
- Tagged releases go through `sp release`, not manual `git tag`:
614
+ Tagged releases go through the `releasing` skill, which dispatches the
615
+ `changelog-keeper` MEDIUM specialist. The specialist reads xt session
616
+ reports via the releasing skill's `xt-reports.ts` helper, drafts the new
617
+ section into `CHANGELOG.md`, bumps `package.json`, rebuilds `dist/`, commits
618
+ with `release: vX.Y.Z`, tags, and pushes with `--follow-tags`. It also runs
619
+ `gh release create` if the bead requests it.
538
620
 
539
- ```bash
540
- sp release prepare [--major | --minor | --patch] # default: --patch
541
- sp release publish
542
- ```
621
+ Operator gate: a single `git diff --stat HEAD~1 HEAD` after the specialist
622
+ finishes. Must show only `CHANGELOG.md`, `package.json`, `dist/`. Anything
623
+ else means scope was violated — revert and refile.
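
The operator gate above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name `release_scope_ok` is hypothetical, and it reads `git diff --name-only` output instead of the `--stat` form, because one path per line is simpler to check against the whitelist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed paths on stdin, one per line, and
# succeed only if every path is CHANGELOG.md, package.json, or under dist/.
release_scope_ok() {
  local path status=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue
    case "$path" in
      CHANGELOG.md|package.json|dist/*) ;;  # whitelisted release files
      *) echo "out of scope: $path" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Usage, inside the repository after the specialist's release commit:
#   git diff --name-only HEAD~1 HEAD | release_scope_ok || echo "scope violated"
```

A non-zero exit status corresponds to the "revert and refile" case; the sketch only reports the offending paths and does not revert anything itself.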
543
624
 
544
- `prepare` invokes the `changelog-keeper` specialist to draft a Keep-a-Changelog section between the previous tag and the next tag, bumps `package.json`, and stages `CHANGELOG.md` + `package.json` + `dist/index.js`. It does not commit — operator reviews and commits with `release: v<version>`.
625
+ The `changelog-keeper-scope` mandatory rule enforces the edit whitelist at
626
+ the specialist level. See `config/skills/releasing/SKILL.md` for the bead
627
+ template, dispatch command, and recovery commands.
545
628
 
546
- `publish` validates the staged commit (dirty-tree refusal, HEAD message match, version match, top-section match in `CHANGELOG.md`), creates the annotated tag, pushes to origin, and optionally creates a GitHub release via `gh`. Re-emits the empty `[Unreleased]` placeholder for the next cycle.
629
+ Release helper contract:
547
630
 
548
- The `changelog-keeper` specialist is READ_ONLY; the CLI is the file mutator. See `docs/release.md` for the operator runbook.
631
+ - Report extraction is provided by the `releasing` skill, so consumer repos do not need repo-local release helper scripts.
632
+ - Release ranges support annotated tags and should be validated through the same path used by tagged releases.
549
633
 
550
634
  ## Epic Lifecycle
551
635
 
@@ -653,13 +737,17 @@ sp epic status <epic-id>
653
737
  sp epic sync <epic-id> --apply
654
738
  ```
655
739
 
656
- Specialist missing or config skipped:
740
+ Specialist missing, config skipped, or stale default snapshots:
657
741
 
658
742
  ```bash
659
743
  specialists list
660
744
  specialists doctor
745
+ specialists doctor --check-drift
746
+ sp prune-stale-defaults --dry-run
661
747
  ```
662
748
 
749
+ `sp prune-stale-defaults` is intentionally operator-facing. Always run `--dry-run` first unless the bead explicitly asks to apply cleanup.
750
+
663
751
  Worktree already exists:
664
752
 
665
753
  ```text
@@ -672,6 +760,8 @@ Reviewer cannot enter job workspace:
672
760
  Check target job status with sp ps. MEDIUM/HIGH jobs are blocked from entering a running write-capable workspace unless forced.
673
761
  ```
674
762
 
763
+ When resolver/catalog changes are under review inside a worktree, run `sp config show <name> --resolved --from-source` so the reviewer sees local source behavior, not the installed dist.
764
+
675
765
  Explorer produced empty output:
676
766
 
677
767
  ```text