xtrm-tools 0.7.13 → 0.7.15
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.xtrm/config/hooks.json +10 -0
- package/.xtrm/hooks/specialists-agent-guard.mjs +76 -0
- package/.xtrm/registry.json +433 -413
- package/.xtrm/skills/default/releasing/SKILL.md +49 -45
- package/.xtrm/skills/default/releasing/scripts/xt-reports.ts +18 -0
- package/.xtrm/skills/default/session-close-report/SKILL.md +85 -17
- package/.xtrm/skills/default/specialists-creator/SKILL.md +133 -42
- package/.xtrm/skills/default/specialists-creator/scripts/audit-spec-uniformity.mjs +86 -0
- package/.xtrm/skills/default/specialists-creator/scripts/scaffold-specialist.ts +223 -0
- package/.xtrm/skills/default/specialists-creator/scripts/validate-specialist.ts +1 -1
- package/.xtrm/skills/default/update-specialists/SKILL.md +98 -392
- package/.xtrm/skills/default/using-nodes/SKILL.md +18 -102
- package/.xtrm/skills/default/using-script-specialists/SKILL.md +208 -0
- package/.xtrm/skills/default/using-specialists/SKILL.md +13 -0
- package/.xtrm/skills/default/using-specialists-v2/SKILL.md +105 -15
- package/.xtrm/skills/default/using-xtrm/SKILL.md +14 -0
- package/CHANGELOG.md +22 -0
- package/README.md +5 -1
- package/cli/dist/index.cjs +2991 -627
- package/cli/dist/index.cjs.map +1 -1
- package/cli/package.json +1 -1
- package/package.json +3 -2
- package/packages/pi-extensions/.serena/project.yml +11 -0
- package/packages/pi-extensions/package.json +1 -1
- package/scripts/patch-external-pi-tools.mjs +154 -0

@@ -53,7 +53,7 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
 - Your only tool is `bash`. Your only bash commands are `sp node` plus `sp ps`/`sp result`.
 - Do not call `read`, `ls`, `find`, `grep`, or any file inspection tool. You have none.

-2. **Use only `sp node`
+2. **Use only `sp node` command surface for orchestration**
 - Do not emit legacy contract JSON plans as the primary control mechanism.
 - Do not call deprecated node action channels.

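The context lines in this hunk limit the coordinator to `bash` running `sp node`, `sp ps`, and `sp result`. The TypeScript sketch below shows how a harness might enforce that allowlist on raw command strings. The function name `isAllowedCoordinatorCommand` and the metacharacter rule are hypothetical illustrations and are not part of xtrm-tools or the specialists CLI.

```typescript
// Hypothetical allowlist guard for the coordinator's bash surface.
// Not part of xtrm-tools; it only mirrors the rule stated in the skill text.
const ALLOWED_PREFIXES: string[][] = [
  ["sp", "node"],
  ["sp", "ps"],
  ["sp", "result"],
];

// Reject command chaining, pipes, redirects, backticks and `$(...)`.
// Plain variable references such as $SPECIALISTS_NODE_ID stay allowed.
const SHELL_META = /[;&|<>`]|\$\(/;

function isAllowedCoordinatorCommand(command: string): boolean {
  if (SHELL_META.test(command)) return false;
  // Naive whitespace tokenization; quoting is not interpreted.
  const tokens = command.trim().split(/\s+/);
  return ALLOWED_PREFIXES.some((prefix) =>
    prefix.every((word, i) => tokens[i] === word),
  );
}

console.log(isAllowedCoordinatorCommand("sp ps --node $SPECIALISTS_NODE_ID --json")); // true
console.log(isAllowedCoordinatorCommand("grep -r TODO src")); // false
```

Because tokenization ignores shell quoting, a production guard would need a real shell parser. The sketch only shows the prefix-matching idea.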
@@ -84,8 +84,6 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
 | `sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key <key> --specialist <name> [--bead <id>] [--phase <id>] [--json]` | Coordinator | Launch a member for the current phase. |
 | `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]` | Coordinator | Block until the named phase members reach terminal state. |
 | `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` | Coordinator | Read the persisted output for a specific member after a phase barrier. |
-| `sp steer <job-id> 'direction'` | Coordinator | Steer a running member with new context mid-flight. |
-| `sp resume <job-id> 'next task'` | Coordinator | Resume a waiting member with new task instructions. |
 | `sp node create-bead --node $SPECIALISTS_NODE_ID --title '...' [--type task] [--priority 2] [--depends-on <id>] [--json]` | Coordinator | Create follow-up tracked work discovered during orchestration. |
 | `sp node complete --node <node-id> --strategy <pr\|manual> [--json]` | Operator-only | Force-close node lifecycle when coordinator has reached waiting and operator decides to finalize. |
 | `sp node members <node-id> [--json]` | Operator | Inspect member registry and lineage. |
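The coordinator rows of the command table map directly onto argument vectors. The following TypeScript sketch assumes the caller runs the result without a shell, for example `execFileSync("sp", argv.slice(1), { encoding: "utf8" })` from Node's `child_process`. In that setup the node id has to be passed as a value (for instance read from `process.env.SPECIALISTS_NODE_ID`) rather than as the literal `$SPECIALISTS_NODE_ID`. The helper names are hypothetical. Only the flags come from the table.

```typescript
// Hypothetical argv builders for the coordinator rows of the command table.
// Optional flags are appended only when set, matching the [--flag] forms.
interface SpawnMemberOptions {
  node: string;
  memberKey: string;
  specialist: string;
  bead?: string;
  phase?: string;
  json?: boolean;
}

function spawnMemberArgv(o: SpawnMemberOptions): string[] {
  const argv = ["sp", "node", "spawn-member", "--node", o.node,
    "--member-key", o.memberKey, "--specialist", o.specialist];
  if (o.bead !== undefined) argv.push("--bead", o.bead);
  if (o.phase !== undefined) argv.push("--phase", o.phase);
  if (o.json === true) argv.push("--json");
  return argv;
}

function waitPhaseArgv(node: string, phase: string, members: string[], json = true): string[] {
  if (members.length === 0) throw new Error("wait-phase needs at least one member key");
  const argv = ["sp", "node", "wait-phase", "--node", node, "--phase", phase,
    "--members", members.join(",")];
  if (json) argv.push("--json");
  return argv;
}

// Members are addressed as <node-id>:<member-key>.
function resultArgv(node: string, memberKey: string): string[] {
  return ["sp", "result", `${node}:${memberKey}`, "--wait", "--json"];
}

console.log(resultArgv("node-1", "explore-1").join(" "));
// sp result node-1:explore-1 --wait --json
```

Passing an argv array instead of a command string means member keys and titles are never re-interpreted by a shell.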
@@ -110,21 +108,13 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
 - after `wait-phase` succeeds, call `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` for each participating member,
 - synthesize the outputs into the next decision.

-4. **
-- after reading a member's result, if other members need updated context, steer them with `sp steer <job-id> 'specific direction from findings'`.
-- only steer with concrete, evidence-based direction — never speculative.
-- example: explorer finds X → steer researcher to 'investigate X patterns in external docs'.
-
-5. **Re-check status**
+4. **Re-check status**
 - re-read node status after each command sequence,
 - adjust the plan from actual runtime state.

-
+5. **Coordinator terminal behavior**
 - once goals are satisfied (or terminally blocked with explicit reason),
-- synthesize
-- this report is your final output — it MUST integrate all member findings,
-- 'Node completed. ok:true.' is NOT acceptable synthesis,
-- enter/remain in `waiting` after producing synthesis.
+- synthesize evidence and enter/remain in `waiting`.
 - do not issue a completion command; operator decides lifecycle closure via `sp node stop` (or force-close via `sp node complete`).

 ---
@@ -137,70 +127,25 @@ Use this exact loop:

 1. `status`
 2. decide the next phase/member set
-3.
+3. launch members
 4. `wait-phase`
-5. `result --wait`
+5. `result --wait`
 6. synthesize evidence
-7.
-8. repeat until all phases complete
-9. produce final synthesis report
-10. enter waiting for operator closure
-
-### Multi-phase coordination pattern
-
-The coordinator MUST use at least 2 distinct phases:
-
-**Phase 1 — Explore:**
-- Spawn explorer to gather initial evidence
-- wait-phase → read result → synthesize findings
-- Decide: what needs deeper investigation?
-
-**Phase 2 — Deep-dive (conditional):**
-- Based on explore findings, spawn researcher/overthinker with specific context
-- Steer running members with evidence from phase 1
-- wait-phase → read results → synthesize
-
-**Phase 3 — Synthesis:**
-- Read ALL member results from all phases
-- Produce unified report integrating all findings
-- Enter waiting
+7. choose next action or enter waiting after synthesis

 ### Synthesis mandate

-Before declaring synthesis complete, the coordinator **MUST** read the persisted results for ALL members across ALL phases.
-
-The synthesis report MUST:
-- Integrate findings from every member
-- Highlight agreements, contradictions, and gaps
-- Provide actionable conclusions
-- Be the coordinator's own substantive output
-
-'Node completed. ok:true.' is NEVER acceptable as synthesis output.
-
-### Synthesis mandate (repeated for emphasis)
-
 Before declaring synthesis complete, the coordinator **MUST** read the persisted results for the members that produced the evidence.

 Do not rely only on status transitions. `wait-phase` tells you the members are terminal; `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` tells you what they actually found or changed. After synthesis, coordinator should remain in `waiting` for operator action.

 ### Steering guidance

-
-
-**Steering commands:**
-- `sp steer <job-id> 'new direction based on evidence'` — for running members
-- `sp resume <job-id> 'next task with context from phase N'` — for waiting members
-- `sp node spawn-member ... --phase <next-phase>` — for new members with specific context
-
-**Good steering patterns:**
-- Explorer finds module X handles auth → steer researcher: 'Investigate how other frameworks handle auth patterns similar to module X'
-- Researcher finds tradeoff A vs B → spawn overthinker: 'Analyze tradeoff between A and B. Explorer found that X uses A, researcher found Y uses B. Consider: performance, complexity, ecosystem support.'
-- Reviewer finds missing test coverage → spawn executor: 'Add tests for the paths reviewer identified: ...'
+Only steer when concrete result evidence shows a gap, contradiction, or missed requirement.

-**
--
--
-- Steering speculatively without evidence from a prior member result
+Do **not** steer speculatively.
+- Good: result evidence shows a reviewer found a missing acceptance criterion.
+- Bad: steering a member before reading its completed output.

 ---

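The revised seven-step loop can be read as a small driver. This TypeScript sketch assumes two caller-supplied functions. `run` executes one `sp` argv and returns its stdout. `plan` stands in for the coordinator's own judgment: it synthesizes prior evidence and picks the next phase. Both names, and the `Phase` shape, are hypothetical.

```typescript
// Hypothetical driver for the coordinator loop. `run` executes one `sp`
// argv and returns stdout; `plan` stands in for the coordinator's judgment.
type Run = (argv: string[]) => string;

interface Member {
  key: string;
  specialist: string;
}

interface Phase {
  id: string;
  members: Member[];
}

// Returns the next phase to launch, or null to stop and remain in `waiting`.
type Plan = (status: string, evidence: string[]) => Phase | null;

function coordinate(node: string, run: Run, plan: Plan): string[] {
  const evidence: string[] = [];
  for (;;) {
    // 1. status
    const status = run(["sp", "ps", "--node", node, "--json"]);
    // 2. decide the next phase/member set (plan also does 6. synthesis of prior evidence)
    const phase = plan(status, evidence);
    // 7. nothing left to launch: stop here and leave closure to the operator
    if (phase === null) break;
    // 3. launch members
    for (const m of phase.members) {
      run(["sp", "node", "spawn-member", "--node", node, "--member-key", m.key,
        "--specialist", m.specialist, "--phase", phase.id, "--json"]);
    }
    // 4. wait-phase: barrier until the named members are terminal
    const keys = phase.members.map((m) => m.key);
    run(["sp", "node", "wait-phase", "--node", node, "--phase", phase.id,
      "--members", keys.join(","), "--json"]);
    // 5. result --wait: read what each member actually found or changed
    for (const k of keys) {
      evidence.push(run(["sp", "result", `${node}:${k}`, "--wait", "--json"]));
    }
  }
  // No `sp node complete` is issued; that command is operator-only.
  return evidence;
}
```

The driver ends on a status read and never issues a completion command. That matches the terminal behavior described in the previous hunk.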
@@ -242,49 +187,22 @@ When a command fails:

 ## Example command sequences

-### Sequence A:
+### Sequence A: explore -> synthesis -> impl -> waiting

 ```bash
-# Phase 1: explore
 sp ps --node $SPECIALISTS_NODE_ID --json
 sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
 sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
 sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
-# Synthesize explore
-
-
-sp
-
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1,overthinker-1 --json
-sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
-sp result $SPECIALISTS_NODE_ID:overthinker-1 --wait --json
-# Synthesize all phase 2 evidence.
-
-# Phase 3: final synthesis
-# Read all member results, produce unified report, enter waiting.
-sp ps --node $SPECIALISTS_NODE_ID --json
-```
-
-### Sequence B: explore → steer → synthesis
-
-```bash
-# Phase 1: explore
-sp ps --node $SPECIALISTS_NODE_ID --json
-sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
-sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
-# Explorer found X. Researcher is running — steer it.
-
-# Steer researcher with explorer findings
-sp steer <researcher-job-id> 'Based on explorer findings about X, investigate Y patterns in external docs'
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1 --json
-sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
-
-# Final synthesis — produce unified report integrating ALL findings
+# Synthesize the explore findings and decide whether impl is required.
+sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key impl-1 --specialist executor --phase impl-1 --json
+sp node wait-phase --node $SPECIALISTS_NODE_ID --phase impl-1 --members impl-1 --json
+sp result $SPECIALISTS_NODE_ID:impl-1 --wait --json
+# Synthesize impl evidence, then stay in waiting for operator closure.
 sp ps --node $SPECIALISTS_NODE_ID --json
 ```

-### Sequence
+### Sequence B: discovered work + review synthesis + operator closure

 ```bash
 sp ps --node $SPECIALISTS_NODE_ID --json
@@ -319,8 +237,6 @@ sp ps --node $SPECIALISTS_NODE_ID --json
 - `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]`
 - `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json`
 - `sp ps --node $SPECIALISTS_NODE_ID --json`
-- `sp steer <job-id> 'new direction or context'` — steer a running member mid-flight
-- `sp resume <job-id> 'next task'` — resume a waiting member with new instructions

 ### Operator-only closure commands
 - `sp node stop <node-id>`
@@ -0,0 +1,208 @@
+---
+name: using-script-specialists
+description: >
+  Use this skill for synchronous one-shot specialist invocations via `sp script`
+  (CLI) or `sp serve` (HTTP daemon). These run READ_ONLY, template-driven
+  specialists with `$var` substitution and return JSON in-process — no beads,
+  no chains, no worktrees, no job lifecycle. Trigger when integrating a
+  specialist into a service, script, or library, when the caller needs the
+  output immediately, or when the work is a single LLM call with structured
+  input/output. Do NOT use for tracked agent work — that belongs to
+  `using-specialists-v2`.
+version: 1.0
+---
+
+# Script-Class Specialists
+
+`sp script` and `sp serve` are a separate runtime from the bead-first
+orchestration covered by `using-specialists-v2`. They exist for service and
+library integration, not for agent chains.
+
+| Aspect | `sp run` (orchestration) | `sp script` / `sp serve` |
+| --- | --- | --- |
+| Driver | bead contract | template + variables |
+| Execution | supervised job, async | one-shot, synchronous |
+| Permissions | READ_ONLY / MEDIUM / HIGH | READ_ONLY only |
+| Worktrees | edit-capable provisions one | rejected |
+| Output | result.txt + events.jsonl + bead notes | stdout JSON / HTTP body |
+| Audit | `.specialists/jobs/<id>/` | one row in `.specialists/db/observability.db` |
+
+Use `sp script` from a shell or build pipeline. Use `sp serve` from a service
+that needs an HTTP endpoint backed by `pi`. The same `.specialist.json` runs
+under both.
+
+## When To Use This Skill
+
+Trigger when:
+
+- A service or script needs a single LLM-backed transform (summarize, classify,
+  extract) returning JSON.
+- You are integrating specialists into Python/Node code that cannot block on a
+  supervised job lifecycle.
+- The call is request/response shaped: variables in, structured output out.
+- You need a sidecar HTTP endpoint (`sp serve`) to wrap a specialist for a
+  service consumer that already speaks HTTP.
+
+Do NOT trigger for: code review, debugging, implementation, multi-turn work,
+keep-alive sessions, anything that should write files. Those belong to
+`using-specialists-v2`.
+
+## Specialist Compatibility (compatGuard)
+
+A spec is rejected at request time (`specialist_load_error`) if any of:
+
+- `execution.interactive` is `true`
+- `execution.requires_worktree` is `true`
+- `execution.permission_required` is anything other than `READ_ONLY`
+- `skills.scripts` is non-empty
+- `prompt.task_template` is missing
+- a referenced `$var` in the chosen template is not supplied (`template_variable_missing`)
+
+Author specs that explicitly target script-class:
+
+```json
+{
+  "specialist": {
+    "metadata": { "name": "summarize-event", "version": "1.0.0", "category": "ingestion" },
+    "execution": {
+      "mode": "auto",
+      "model": "anthropic/claude-haiku-4-5",
+      "timeout_ms": 30000,
+      "interactive": false,
+      "response_format": "json",
+      "output_type": "custom",
+      "permission_required": "READ_ONLY",
+      "requires_worktree": false,
+      "max_retries": 0
+    },
+    "prompt": {
+      "task_template": "Summarize event $event_id with body: $body. Return JSON {\"summary\": \"...\"}.",
+      "output_schema": { "required": ["summary"] }
+    }
+  }
+}
+```
+
+## `sp script` — One-Shot CLI
+
+```bash
+sp script <specialist-name> \
+  --vars key1=value1 --vars key2=value2 \
+  [--template task_template] \
+  [--model anthropic/claude-sonnet-4-6] \
+  [--thinking medium] \
+  [--timeout-ms 60000] \
+  [--db-path /path/to/observability.db] \
+  [--single-instance <lock-name>] \
+  [--no-trace] \
+  [--json]
+```
+
+Behaviour:
+
+- Loads the spec via `SpecialistLoader` (same loader as `sp run`).
+- Renders `prompt.task_template` (or named template) with `--vars`.
+- Spawns `pi --mode json --no-session --no-extensions --no-tools` with the
+  resolved model.
+- Returns the final assistant text on stdout. With `--json`, returns the full
+  `ScriptGenerateResult` envelope.
+- Writes one row to `.specialists/db/observability.db` (same writer as `sp run`).
+
+Exit codes:
+
+- `0` — success.
+- non-zero — failure; with `--json`, body has `success: false` and `error_type`.
+
+Use `--single-instance <lock>` when concurrent invocations of the same logical
+job must be serialized (cron, batch script).
+
+## `sp serve` — HTTP Daemon
+
+```bash
+sp serve \
+  [--port 8000] \
+  [--concurrency 4] \
+  [--queue-timeout-ms 5000] \
+  [--shutdown-grace-ms 30000] \
+  [--project-dir /path/to/project] \
+  [--fallback-model anthropic/claude-haiku-4-5]
+```
+
+POST `/v1/generate`:
+
+```json
+{
+  "specialist": "summarize-event",
+  "variables": { "event_id": "abc", "body": "..." },
+  "template": "task_template",
+  "model_override": "anthropic/...",
+  "timeout_ms": 60000,
+  "trace": true
+}
+```
+
+Response (200, success):
+
+```json
+{
+  "success": true,
+  "output": "<final text>",
+  "parsed_json": { "summary": "..." },
+  "meta": {
+    "specialist": "summarize-event",
+    "model": "anthropic/claude-haiku-4-5",
+    "duration_ms": 1234,
+    "trace_id": "<uuid>"
+  }
+}
+```
+
+Response (200, failure):
+
+```json
+{ "success": false, "error": "...", "error_type": "..." }
+```
+
+Error types: `specialist_not_found | specialist_load_error |
+template_variable_missing | auth | quota | timeout | network | invalid_json |
+output_too_large | internal`.
+
+`400` is reserved for malformed HTTP. `429` returns when concurrency cap is
+saturated past `queue-timeout-ms`.
+
+## Operational Rules
+
+- One `pi` subprocess per in-flight request, bounded by `--concurrency`.
+- Credentials come from `pi`'s own `~/.pi/agent/auth.json`. The service never
+  touches API keys.
+- Observability DB is shared with `sp run`. Audit trail is unified.
+- The service is sidecar-per-consumer: no multi-tenant routing, no session
+  state, no orchestration. If you need orchestration, use `sp run` + beads.
+- For container deployments, see `docs/specialists-service-install.md`. Image
+  runs as non-root UID 10001; bind-mount `~/.pi` and `.specialists/`.
+
+## When To Switch Back To `using-specialists-v2`
+
+If any of these become true mid-design, drop script-class and use the
+orchestration runtime:
+
+- The work needs to write files.
+- The caller wants a multi-turn / keep-alive session.
+- A reviewer pass is needed.
+- The work should be tracked as a bead with auditability beyond a single
+  observability row.
+- The output is iterative (steer / resume).
+
+## What Not To Put Here
+
+- Bead workflow, chains, epics, reviewers, worktrees — those live in
+  `using-specialists-v2`.
+- Orchestration MCP tooling (`use_specialist`).
+- Long-running multi-turn examples.
+
+## Reference
+
+- `docs/specialists-service.md` — HTTP contract and operational notes.
+- `docs/specialists-service-install.md` — Docker/Podman install path.
+- `docs/script-specialists.md` — historical context for the script-class shape.
+- `src/cli/script.ts`, `src/cli/serve.ts`, `src/specialist/script-runner.ts` — runtime.
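Most of the compatGuard rejection rules in the new skill can be checked before a request is ever sent. The TypeScript sketch below assumes the spec has already been parsed from its `.specialist.json` file. `compatViolations` is a hypothetical helper, not the compatGuard implementation. It also treats an absent `permission_required` as a violation, which is an assumption. The missing-`$var` rule depends on the chosen template and the supplied variables, so it is left out here.

```typescript
// Hypothetical restatement of the compatGuard checks as a pure function.
// Field names follow the skill text; the real loader's logic may differ.
interface ScriptSpecFile {
  specialist: {
    execution?: {
      interactive?: boolean;
      requires_worktree?: boolean;
      permission_required?: string;
    };
    skills?: { scripts?: unknown[] };
    prompt?: { task_template?: string };
  };
}

function compatViolations(spec: ScriptSpecFile): string[] {
  const s = spec.specialist;
  const exec = s.execution;
  const violations: string[] = [];
  if (exec?.interactive === true) violations.push("execution.interactive is true");
  if (exec?.requires_worktree === true) violations.push("execution.requires_worktree is true");
  // Assumption: an absent permission_required is treated as "not READ_ONLY".
  if (exec?.permission_required !== "READ_ONLY") {
    violations.push("execution.permission_required is not READ_ONLY");
  }
  if ((s.skills?.scripts ?? []).length > 0) violations.push("skills.scripts is non-empty");
  if (typeof s.prompt?.task_template !== "string") violations.push("prompt.task_template is missing");
  return violations;
}
```

A CI step could run a check like this over every spec intended for `sp script`, so a `specialist_load_error` surfaces at build time instead of at request time.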
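Template rendering and the `template_variable_missing` check fit naturally in one function. This TypeScript sketch makes two assumptions inferred from the `summarize-event` example rather than from the runtime: a `$name` reference uses identifier characters, and substitution is a single literal pass. `renderTemplate` is a hypothetical name.

```typescript
// Hypothetical `$var` renderer. The variable syntax ($ followed by an
// identifier) is inferred from the example template, not from the runtime.
type RenderResult =
  | { ok: true; text: string }
  | { ok: false; error_type: "template_variable_missing"; missing: string[] };

const VAR = /\$([A-Za-z_][A-Za-z0-9_]*)/g;

function referencedVars(template: string): string[] {
  const names: string[] = [];
  const re = new RegExp(VAR.source, "g"); // fresh lastIndex per call
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) {
    if (names.indexOf(m[1]) === -1) names.push(m[1]);
  }
  return names;
}

function renderTemplate(template: string, vars: Record<string, string>): RenderResult {
  const missing = referencedVars(template).filter(
    (name) => !Object.prototype.hasOwnProperty.call(vars, name),
  );
  if (missing.length > 0) {
    return { ok: false, error_type: "template_variable_missing", missing };
  }
  // Single pass: substituted values are not re-scanned for `$var` references.
  const text = template.replace(new RegExp(VAR.source, "g"), (_match, name: string) => vars[name]);
  return { ok: true, text };
}
```

With the example spec, rendering `task_template` with only `event_id` supplied reports `body` as missing. That is the case the compatibility list rejects at request time.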
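A caller of `sp serve` needs to separate transport outcomes (`400`, `429`) from specialist outcomes. Both specialist success and specialist failure arrive as HTTP 200 with a `success` flag. The TypeScript sketch below shows that classification, using types transcribed from the request and response examples in the new skill. The `GenerateOutcome` shape and the `"unknown"` fallback label are assumptions and are not part of the service contract.

```typescript
// Hypothetical client-side view of POST /v1/generate, transcribed from the
// request/response examples. Only the listed fields are modeled.
interface GenerateRequest {
  specialist: string;
  variables: Record<string, string>;
  template?: string;
  model_override?: string;
  timeout_ms?: number;
  trace?: boolean;
}

type GenerateOutcome =
  | { kind: "success"; output: string; parsed_json?: unknown }
  | { kind: "failure"; error_type: string; error: string }
  | { kind: "bad_request" } // 400: malformed HTTP
  | { kind: "busy" } // 429: concurrency cap saturated past queue-timeout-ms
  | { kind: "unexpected"; status: number };

function generateBody(req: GenerateRequest): string {
  return JSON.stringify(req);
}

function interpretGenerateResponse(status: number, body: string): GenerateOutcome {
  if (status === 400) return { kind: "bad_request" };
  if (status === 429) return { kind: "busy" };
  if (status !== 200) return { kind: "unexpected", status };
  // Success and specialist-level failure both use 200; `success` decides.
  const json = JSON.parse(body) as {
    success?: boolean;
    output?: string;
    parsed_json?: unknown;
    error?: string;
    error_type?: string;
  };
  if (json.success === true) {
    return { kind: "success", output: json.output ?? "", parsed_json: json.parsed_json };
  }
  // "unknown" is a local fallback label, not one of the documented error types.
  return { kind: "failure", error_type: json.error_type ?? "unknown", error: json.error ?? "" };
}
```

In a Node service, the string from `generateBody` would be sent with `fetch` as a `POST` to `/v1/generate` on whichever `--port` the daemon was started with, using a `content-type: application/json` header. The status code and response text then go to `interpretGenerateResponse`.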
@@ -62,6 +62,17 @@ Specialists are autonomous AI agents that run independently — fresh context, d
 8. **No destructive operations by specialists.** No `rm -rf`, no force pushes, no database drops, no credential rotation, no mass deletes, no history rewrites. Surface destructive requirements to the user.
 9. **Executor does not run tests.** Executor runs lint + tsc only. Tests are the reviewer's and test-runner's responsibility in the chained pipeline.
 10. **Keep specialists alive through the review cycle.** Never `sp stop` an executor or debugger before the reviewer delivers its verdict. The specialist stays in `waiting` so you can `resume` it — to commit changes, apply fixes from reviewer feedback, or continue work. Only stop after final reviewer PASS and confirmed commit.
+11. **Respect ownership layers and loader precedence.** Loader resolution order is `.specialists/user/*` > `.specialists/default/*` > package fallback `config/*`. Upstream source = package `config/*` (read-only for repo operators); managed mirror = `.specialists/default/*` (no hand edits); repo custom layer = `.specialists/user/*`; runtime/generated = `.specialists/{jobs,ready,db}`.
+12. **Keep backlog-clean isolated.** Do not mix backlog-clean changes into specialist ownership/migration tasks.
+
+## Mandatory-rules template sets
+
+Use template-driven mandatory rules for repeatable policy bundles.
+
+- Specialist config field: `specialist.mandatory_rules.template_sets`
+- Template source: `config/mandatory-rules/*.md`
+- Template format: YAML frontmatter + body content
+- Runtime behavior: runner resolves templates and injects rendered rules at end of prompt

 ---

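Rule 11 fixes the loader order: `.specialists/user/*`, then `.specialists/default/*`, then the package `config/*` fallback. A first-match resolver captures that order. The TypeScript sketch below injects the existence check so it runs without touching the filesystem. The flat `<layer>/<file>` path layout and the name `resolveSpecialistFile` are assumptions made for illustration.

```typescript
// Hypothetical first-match resolver for the loader order in rule 11:
// .specialists/user/* > .specialists/default/* > package config/*.
type Layer = "user" | "default" | "package";

interface Resolved {
  layer: Layer;
  path: string;
}

function resolveSpecialistFile(
  file: string,
  packageRoot: string,
  exists: (path: string) => boolean,
): Resolved | null {
  const candidates: Resolved[] = [
    { layer: "user", path: `.specialists/user/${file}` },
    { layer: "default", path: `.specialists/default/${file}` },
    { layer: "package", path: `${packageRoot}/config/${file}` },
  ];
  for (const c of candidates) {
    if (exists(c.path)) return c; // the first existing layer wins
  }
  return null;
}
```

In a real project `exists` would wrap Node's `fs.existsSync`. The first-match loop is why a file in `.specialists/user/` shadows the managed mirror without anyone editing the mirror.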
@@ -127,11 +138,13 @@ specialists stop <job-id> --force # 5s SIGTERM timeout, then pgroup

 # Management
 specialists edit <name> # edit specialist config (dot-path, --preset)
+specialists edit <name> --fork-from <base> # fork non-user specialist into .specialists/user/ then edit
 specialists clean # purge old job dirs + worktree GC
 specialists clean --processes # kill all running/starting specialist jobs
 specialists db vacuum # compact SQLite storage (refuses if jobs running)
 specialists db prune --before <iso|duration> --dry-run|--apply # prune old events/results/terminal jobs
 specialists doctor orphans # integrity scan: orphan, stale-pointer, integrity-violation
+specialists init --sync-defaults # refresh specialists + mandatory-rules + nodes from canonical defaults
 specialists init --sync-skills # re-sync skills only (no full init)
 specialists init --no-xtrm-check # skip xtrm prerequisite check (CI/testing)
 ```
@@ -8,7 +8,7 @@ description: >
   work without drift. Trigger for code review, debugging, implementation,
   planning, test generation, doc sync, multi-chain epics, and any question about
   specialist orchestration.
-version: 1.
+version: 1.4
 ---

 # Specialists V2
@@ -51,6 +51,21 @@ When the local version is behind, the latest CHANGELOG entry can be summarized v
 14. Stale-base guard: dispatch refuses to provision a worktree when sibling epic chains have unmerged substantive commits. Override only with explicit `--force-stale-base` and a reason. Merge-time rebase happens automatically.
 15. Auto-checkpoint: executor and debugger commit substantive worktree changes on `waiting` by default (`auto_commit: checkpoint_on_waiting`). Noise paths (`.xtrm/`, `.wolf/`, `.specialists/jobs/`, `.beads/`) are filtered.
 16. Per-turn output appends to the input bead notes for **all** specialists on every `run_complete`, with `[WAITING — more output may follow]` or `[DONE]` headers. `bd show <bead-id>` is a valid path to read intermediate output.
+17. Specialist jobs do not orchestrate nested specialist chains. The top-level orchestrator dispatches specialists, collects results, and advances the workflow.
+18. Treat test failures as evidence to classify against the bead scope. Validate whether failures are in-scope, pre-existing, or infrastructure-related before sending an executor into a fix loop.
+
+## Canonical Runtime State
+
+These are current operating facts, not migration notes:
+
+- **Asset ownership:** Cat A runtime assets — specialists, mandatory-rules, catalog, and nodes — resolve live from the specialists package after project tiers. Cat B filesystem assets — skills and hooks — are owned by xtrm-tools under `.xtrm/skills/default` and `.xtrm/hooks/default`.
+- **Resolution precedence:** project/user tiers win over managed defaults; package-live is the final fallback. Mandatory-rule indexes are not stacked across tiers; per-id mandatory-rule files may fall through to package canonical when absent locally.
+- **Drift surface:** use `sp doctor --check-drift` to inspect stale managed defaults and `sp prune-stale-defaults --dry-run` to preview cleanup.
+- **Source verification:** resolver/catalog changes in a worktree are verified with `sp config show <name> --resolved --from-source` so evidence comes from the checked-out source, not an installed dist.
+- **Worktree publication:** edit-capable specialists produce worktree branches. Before review or merge, verify the branch diff and status from that worktree.
+- **Epic publication:** epics are the merge-gated identity. Publish through `sp epic merge`; use `sp epic abandon` to deliberately close failed or cancelled epic bookkeeping.
+- **CLI safety:** command help paths are side-effect free. New commands must parse `--help`/`-h` before action and have a no-write help test.
+- **Release context:** changelog-keeper receives xt report context through the `releasing` skill's helper. Release-range logic supports annotated tags.

 ## Autonomous Drive

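The CLI safety bullet requires every new command to handle `--help`/`-h` before acting and to ship a no-write help test. This TypeScript sketch shows both halves. It assumes all side effects go through an injected `write` function so the test can observe them. `exampleCommand`, its usage text, and the path it writes are hypothetical.

```typescript
// Hypothetical command: help is parsed before any argument that could act.
interface CommandIO {
  print: (line: string) => void;
  write: (path: string, data: string) => void; // every side effect goes here
}

function exampleCommand(argv: string[], io: CommandIO): number {
  if (argv.indexOf("--help") !== -1 || argv.indexOf("-h") !== -1) {
    io.print("usage: example [--dry-run]");
    return 0; // help exits before any write can happen
  }
  const dryRun = argv.indexOf("--dry-run") !== -1;
  if (dryRun) {
    io.print("would write .specialists/example.txt");
  } else {
    io.write(".specialists/example.txt", "generated\n");
  }
  return 0;
}

// No-write help test: record writes and check that none happened.
function helpIsSideEffectFree(): boolean {
  const writes: string[] = [];
  const io: CommandIO = { print: () => undefined, write: (path) => { writes.push(path); } };
  exampleCommand(["--help"], io);
  exampleCommand(["-h", "--dry-run"], io);
  return writes.length === 0;
}

console.log(helpIsSideEffectFree()); // true
```

Routing writes through one injected function keeps the help test cheap. It only has to count calls, with no temporary directories involved.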
@@ -199,20 +214,24 @@ Run `specialists list` if you need the live registry. Choose by task, not by hab
 | Planning/decomposition | `planner` | You need beads, dependencies, file scopes, or sequencing. |
 | Design/tradeoffs | `overthinker` | The approach is risky, ambiguous, or needs critique. |
 | Implementation | `executor` | The contract is clear enough to write code or docs. |
-| Compliance/code review | `reviewer` | An executor/debugger produced changes that need
+| Compliance/code review | `reviewer` | An executor/debugger produced changes that need the final PASS/PARTIAL/FAIL verdict. |
+| Implementation sanity | `code-sanity` | You want a cheap READ_ONLY smell pass for simplicity, type safety, dead code, brittle async/error handling, or maintainability before reviewer. |
+| Security/dependency audit | `security-auditor` | You need threat modeling, secure-code review, package advisory triage, or agent/config security scanning. LOW: scan/read/recommend only. |
 | Multiple review perspectives | `parallel-review` | A critical diff needs independent review passes. |
 | Test execution | `test-runner` | You need suites run and failures interpreted. |
 | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization. |
-| External/live research | `researcher` | Current library/docs/media lookup is needed. |
+| External/live research | `researcher` | Current non-security library/docs/media lookup is needed. |
 | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config. |
-| Release
+| Release publication (end-to-end) | `changelog-keeper` | A new tag is being cut. MEDIUM specialist: drafts CHANGELOG section from xt reports, bumps package.json, rebuilds dist, commits, tags, pushes. Use the `releasing` skill to dispatch. |

 Selection rules:

 - Explorer is READ_ONLY and should answer specific questions.
 - Debugger is better than explorer for failures because it traces causes and remediation.
 - Executor does not own full test validation; use reviewer/test-runner for that phase.
--
+- Code-sanity is optional and non-blocking by default: use it when a diff smells overcomplicated or type-risky, then resume executor with concrete findings. It is not a merge gate.
+- Security-auditor may run safe local audit commands and web/source research, but must not edit files, update dependencies, exfiltrate secrets, or run destructive/live-target exploit tests. Executor applies any recommended fixes in a separate bead.
+- Reviewer always uses its own bead plus `--job <executor-job>` and remains the final merge gate.
 - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
 - Specialists-creator should precede specialist config/schema edits.

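The updated selection rules make reviewer the only merge gate. Code-sanity is an optional, non-blocking pass whose findings go back to the executor. This TypeScript sketch expresses that routing decision. The `ChainState` shape and the `nextStep` name are assumptions for illustration, and so is routing a PARTIAL verdict the same way as FAIL.

```typescript
// Hypothetical routing for one executor chain. Only a reviewer PASS merges;
// code-sanity findings are advisory and never count as a verdict.
type ReviewerVerdict = "PASS" | "PARTIAL" | "FAIL";

interface ChainState {
  reviewer?: ReviewerVerdict; // absent until the reviewer has reported
  codeSanityFindings?: boolean; // true when an optional code-sanity pass found issues
}

type NextStep = "merge" | "resume-executor" | "run-reviewer";

function nextStep(s: ChainState): NextStep {
  if (s.reviewer === "PASS") return "merge";
  // Assumption: PARTIAL is handled like FAIL, by resuming the executor with feedback.
  if (s.reviewer === "PARTIAL" || s.reviewer === "FAIL") return "resume-executor";
  // No verdict yet: apply optional code-sanity findings first, then review.
  if (s.codeSanityFindings === true) return "resume-executor";
  return "run-reviewer";
}
```

A clean code-sanity result (`codeSanityFindings: false`) still routes to `run-reviewer`. It never reaches `merge` on its own.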
@@ -224,8 +243,12 @@ Daily commands:
 specialists list
 specialists list-rules # rule × specialist matrix
 specialists doctor
+specialists doctor --check-drift # inspect stale .specialists/default snapshots
+sp prune-stale-defaults --dry-run # preview redundant default snapshots
 specialists run <name> --bead <id> --background
 specialists run executor --bead <impl-bead> --background # worktree auto-provisioned
+specialists run code-sanity --bead <sanity-bead> --job <exec-job> --keep-alive --background
+specialists run security-auditor --bead <security-bead> --job <exec-job> --keep-alive --background
 specialists run reviewer --bead <review-bead> --job <exec-job> --keep-alive --background
 specialists ps
 specialists ps <job-id>
@@ -245,6 +268,7 @@ sp merge <chain-root-bead>
 sp epic status <epic-id>
 sp epic sync <epic-id> --apply
 sp epic merge <epic-id>
+sp epic abandon <epic-id> --reason "..."
 sp end
 ```

@@ -319,6 +343,42 @@ specialists run executor --worktree --bead <impl> --context-depth 3 --background
 specialists result <exec-job>
 ```

+Optional code-sanity pass for implementation smell checks (use when the diff is non-trivial or likely to accumulate agent-code complexity):
+
+```bash
+bd create --title "Code sanity check token refresh retry" --type task --priority 3 \
+--description "PROBLEM: Cheap READ_ONLY sanity pass for executor implementation quality before final review.
+SUCCESS: Identify concrete simplicity/type-safety/maintainability findings, or return OK.
+SCOPE: executor job <exec-job>, implementation diff only.
+NON_GOALS: No requirements verdict, no security audit, no test execution, no edits.
+CONSTRAINTS: At most 5 concrete findings; cite files/symbols/lines where possible.
+VALIDATION: Findings are suitable to paste into specialists resume <exec-job>.
+OUTPUT: OK/FINDINGS/BLOCKED with handoff."
+bd dep add <sanity> <impl>
+specialists run code-sanity --bead <sanity> --job <exec-job> --context-depth 3 --keep-alive --background
+specialists result <sanity-job>
+```
+
+If code-sanity returns `FINDINGS`, resume executor with those concrete instructions, then rerun code-sanity only if the fixes were substantive. Do not treat code-sanity `OK` as reviewer PASS.
+
+Optional security pass when the task touches auth, secrets, input handling, dependency updates, package advisories, agent config, hooks, or exposed endpoints:
+
+```bash
+bd create --title "Security audit token refresh retry" --type task --priority 2 \
+--description "PROBLEM: Scoped security/dependency/config audit for executor changes.
+SUCCESS: Identify evidence-backed security findings or return no findings.
+SCOPE: executor job <exec-job>, changed files, relevant manifests/config only.
+NON_GOALS: No edits, no package updates, no destructive scans, no live exploit testing.
+CONSTRAINTS: LOW permission; recommendations only. HN/social signals are not authoritative proof.
+VALIDATION: Findings cite local evidence or OSV/GHSA/NVD/vendor/package-audit sources.
+OUTPUT: Security audit summary, findings, dependency triage, residual risk."
+bd dep add <security> <impl>
+specialists run security-auditor --bead <security> --job <exec-job> --context-depth 3 --keep-alive --background
+specialists result <security-job>
+```
+
+If security-auditor recommends code or dependency changes, create/resume an executor fix bead. Do not let security-auditor apply updates.
+
 Create review bead:

 ```bash
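Both bead templates added in this hunk use the same seven labels (PROBLEM, SUCCESS, SCOPE, NON_GOALS, CONSTRAINTS, VALIDATION, OUTPUT). Below is a minimal sketch of a pre-dispatch check. It assumes each label starts its own line, as in the templates. `check_bead_sections` is a hypothetical helper, not a `bd` or `specialists` command.

```bash
# check_bead_sections: hypothetical helper (not part of bd or specialists).
# Reads a bead description on stdin and fails if any template label is
# missing. Assumes each label starts its own line, as in the templates above.
check_bead_sections() {
  desc=$(cat)
  missing=""
  for section in PROBLEM SUCCESS SCOPE NON_GOALS CONSTRAINTS VALIDATION OUTPUT; do
    # Anchor on line start so e.g. SCOPE never matches inside NON_GOALS text.
    if ! printf '%s\n' "$desc" | grep -q "^${section}:"; then
      missing="$missing $section"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing sections:$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `printf '%s\n' "$desc" | check_bead_sections && bd create --title "..." --type task --description "$desc"` refuses to file a bead whose description dropped a section.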
@@ -432,6 +492,12 @@ Standard loop:
 ```text
 executor --worktree --bead impl
 -> waiting after turn
+optional code-sanity --bead sanity --job exec-job
+-> OK: continue
+-> FINDINGS: resume executor with exact sanity findings
+optional security-auditor --bead security --job exec-job
+-> no findings: continue
+-> findings: create/resume executor fix bead; auditor never edits
 reviewer --bead review --job exec-job
 -> PASS: verify commit, publish, stop members if needed
 -> PARTIAL: resume executor with exact findings
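The standard loop above works as a routing table from each stage's verdict to the next action. Below is a minimal sketch, assuming the verdict strings shown. `next_step` is hypothetical, and the `none`/`findings` labels for the security pass are illustrative rather than the auditor's actual output format. Reviewer outcomes other than PASS/PARTIAL fall outside this hunk, as does code-sanity `BLOCKED`, so the sketch reports them as `unknown`.

```bash
# next_step STAGE VERDICT: hypothetical router for the standard loop.
# Only reviewer PASS leads to publishing; code-sanity OK and a clean
# security pass just let the chain continue toward the reviewer.
next_step() {
  case "$1:$2" in
    code-sanity:OK)            echo "continue" ;;
    code-sanity:FINDINGS)      echo "resume-executor" ;;
    security-auditor:none)     echo "continue" ;;
    security-auditor:findings) echo "executor-fix-bead" ;;
    reviewer:PASS)             echo "publish" ;;
    reviewer:PARTIAL)          echo "resume-executor" ;;
    *)                         echo "unknown"; return 1 ;;
  esac
}
```

The sketch encodes the rule the diff states: advisory passes never substitute for reviewer acceptance. Only `reviewer:PASS` maps to `publish`.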
@@ -440,7 +506,7 @@ reviewer --bead review --job exec-job

 Prefer `sp resume <exec-job>` over a new fix executor when the original job is waiting and context is healthy. Use a new fix bead with `--job <exec-job>` only when the original executor is dead, context exhausted, or a separate audit trail is required.

-
+Code-sanity and security-auditor outputs are advisory inputs to the chain; reviewer output must still be consumed before publishing. Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer acceptance.

 ## Dependency Mapping

@@ -480,10 +546,19 @@ Use `sp ps` instead of ad-hoc polling.
 sp ps
 sp ps <job-id>
 sp ps --follow
+sp ps --running # only starting/running/waiting jobs
+sp ps --bead <bead-id> # only jobs linked to one bead
+sp ps --since 30m # only jobs started in the last 30 minutes
+sp ps --mine # only jobs whose bead is assigned to you
+sp ps --include-terminal # include merged/abandoned epics (hidden by default)
 sp feed <job-id>
 sp result <job-id>
 ```

+Filter flags compose: `sp ps --running --bead <id>` is the canonical way to inspect "what's actively working on this issue right now". By default `sp ps` hides epics in `merged` or `abandoned` state to keep the snapshot focused; use `--include-terminal` (or `--all`) to bring them back.
+
+When dead epics pile up in `failed` state (sibling-chain conflicts, manual stops), recover with `sp epic abandon <epic-id> --reason "<text>"`. The `failed -> abandoned` transition is allowed specifically for cleanup; live members still require `--force`.
+
 Read results at every stage. Every specialist (not just READ_ONLY) auto-appends per-turn output to the input bead notes on each `run_complete`, with `[WAITING]` or `[DONE]` headers — `bd show <bead-id>` shows the full handoff trail. `sp result <job-id>` works on `waiting` jobs and returns the most recent turn plus a "Session is waiting for your input" footer; use it to decide whether to resume. If result is empty, inspect feed and rerun or switch specialists before relying on it.

 Context percentage in `sp ps`/feed is an action signal:
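The `sp ps` filters added in this hunk compose, so a job is listed only when it passes every active filter. Below is a minimal sketch of that logic. The function names are hypothetical. The code models the documented behavior and does not call `sp`.

```bash
# is_active STATUS: the statuses that `sp ps --running` keeps.
is_active() {
  case "$1" in
    starting|running|waiting) return 0 ;;
    *) return 1 ;;
  esac
}

# epic_shown STATE [INCLUDE_TERMINAL]: merged/abandoned epics are hidden by
# default; pass "yes" to mimic --include-terminal (or --all).
epic_shown() {
  case "$1" in
    merged|abandoned) [ "${2:-no}" = "yes" ] ;;
    *) return 0 ;;
  esac
}

# job_listed STATUS BEAD WANT_BEAD: `--running --bead <id>` composed with AND.
job_listed() {
  is_active "$1" && [ "$2" = "$3" ]
}
```

A `failed` epic stays visible by default, which is why stale failures keep showing up until they are moved to `abandoned`.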
@@ -493,6 +568,8 @@ Context percentage in `sp ps`/feed is an action signal:
 - 65-80%: steer toward conclusion.
 - Above 80%: finish, summarize, or replace the job.

+Do not confuse raw token totals with context percentage. `sp ps` may show raw token counts around 50k-100k for large-context models; that alone is not a stop signal. Use the context percentage when available, plus stalls, repeated edit failures, or scope drift.
+
 ## Steering And Resume

 Use `steer` for running jobs:
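The thresholds in this hunk can be read as a small decision function over context percentage, not raw token totals. Below is a minimal sketch. `context_action` is hypothetical. Treating anything below 65% as `continue` is an assumption, because the bullets for that band sit outside this hunk.

```bash
# context_action PCT: map an integer context percentage to an action.
# Input is the context percentage from sp ps/feed, never a raw token count.
context_action() {
  pct=$1
  if [ "$pct" -gt 80 ]; then
    echo "finish"      # above 80%: finish, summarize, or replace the job
  elif [ "$pct" -ge 65 ]; then
    echo "steer"       # 65-80%: steer toward conclusion
  else
    echo "continue"    # assumption: below 65% needs no intervention
  fi
}
```

The other signals named in the diff (stalls, repeated edit failures, scope drift) are not numeric and are left to operator judgment.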
@@ -534,18 +611,25 @@ Rules:

 ## Release Publication

-Tagged releases go through `
+Tagged releases go through the `releasing` skill, which dispatches the
+`changelog-keeper` MEDIUM specialist. The specialist reads xt session
+reports via the releasing skill's `xt-reports.ts` helper, drafts the new
+section into `CHANGELOG.md`, bumps `package.json`, rebuilds `dist/`, commits
+with `release: vX.Y.Z`, tags, and pushes `--follow-tags`. Optional
+`gh release create` if the bead requests it.

-
-
-
-```
+Operator gate: a single `git diff --stat HEAD~1 HEAD` after the specialist
+finishes. Must show only `CHANGELOG.md`, `package.json`, `dist/`. Anything
+else means scope was violated — revert and refile.

-
+The `changelog-keeper-scope` mandatory rule enforces the edit whitelist at
+the specialist level. See `config/skills/releasing/SKILL.md` for the bead
+template, dispatch command, and recovery commands.

-
+Release helper contract:

-
+- Report extraction is provided by the `releasing` skill, so consumer repos do not need repo-local release helper scripts.
+- Release ranges support annotated tags and should be validated through the same path used by tagged releases.

 ## Epic Lifecycle

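The operator gate in this hunk is mechanical enough to script. Below is a minimal sketch, assuming paths are relative to the repository root exactly as listed (`CHANGELOG.md`, `package.json`, `dist/`). It reads `git diff --name-only` output rather than `--stat`, because the stat view can abbreviate long paths. `check_release_scope` is a hypothetical helper.

```bash
# check_release_scope: hypothetical helper. Reads changed paths (one per line)
# on stdin and fails if any path is outside the changelog-keeper whitelist.
check_release_scope() {
  # grep -v prints the lines NOT on the whitelist; "|| true" absorbs the
  # non-zero exit grep returns when every line matched.
  bad=$(grep -vE '^(CHANGELOG\.md|package\.json|dist/.*)$' || true)
  if [ -n "$bad" ]; then
    printf 'scope violated:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}

# Usage after the specialist finishes:
#   git diff --name-only HEAD~1 HEAD | check_release_scope
```

A non-zero exit corresponds to the diff's "scope was violated" case: revert the release commit and refile the bead.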
@@ -653,13 +737,17 @@ sp epic status <epic-id>
 sp epic sync <epic-id> --apply
 ```

-Specialist missing or
+Specialist missing, config skipped, or stale default snapshots:

 ```bash
 specialists list
 specialists doctor
+specialists doctor --check-drift
+sp prune-stale-defaults --dry-run
 ```

+`sp prune-stale-defaults` is intentionally operator-facing. Always run `--dry-run` first unless the bead explicitly asks to apply cleanup.
+
 Worktree already exists:

 ```text
@@ -672,6 +760,8 @@ Reviewer cannot enter job workspace:
 Check target job status with sp ps. MEDIUM/HIGH jobs are blocked from entering a running write-capable workspace unless forced.
 ```

+When resolver/catalog changes are under review inside a worktree, run `sp config show <name> --resolved --from-source` so reviewer sees local source behavior, not installed dist.
+
 Explorer produced empty output:

 ```text