npm - xtrm-tools - Versions diffs - 0.7.13 → 0.7.15 - Mend

xtrm-tools 0.7.13 → 0.7.15

Files changed (25)

package/.xtrm/config/hooks.json +10 -0
package/.xtrm/hooks/specialists-agent-guard.mjs +76 -0
package/.xtrm/registry.json +433 -413
package/.xtrm/skills/default/releasing/SKILL.md +49 -45
package/.xtrm/skills/default/releasing/scripts/xt-reports.ts +18 -0
package/.xtrm/skills/default/session-close-report/SKILL.md +85 -17
package/.xtrm/skills/default/specialists-creator/SKILL.md +133 -42
package/.xtrm/skills/default/specialists-creator/scripts/audit-spec-uniformity.mjs +86 -0
package/.xtrm/skills/default/specialists-creator/scripts/scaffold-specialist.ts +223 -0
package/.xtrm/skills/default/specialists-creator/scripts/validate-specialist.ts +1 -1
package/.xtrm/skills/default/update-specialists/SKILL.md +98 -392
package/.xtrm/skills/default/using-nodes/SKILL.md +18 -102
package/.xtrm/skills/default/using-script-specialists/SKILL.md +208 -0
package/.xtrm/skills/default/using-specialists/SKILL.md +13 -0
package/.xtrm/skills/default/using-specialists-v2/SKILL.md +105 -15
package/.xtrm/skills/default/using-xtrm/SKILL.md +14 -0
package/CHANGELOG.md +22 -0
package/README.md +5 -1
package/cli/dist/index.cjs +2991 -627
package/cli/dist/index.cjs.map +1 -1
package/cli/package.json +1 -1
package/package.json +3 -2
package/packages/pi-extensions/.serena/project.yml +11 -0
package/packages/pi-extensions/package.json +1 -1
package/scripts/patch-external-pi-tools.mjs +154 -0

package/.xtrm/skills/default/using-nodes/SKILL.md CHANGED

@@ -53,7 +53,7 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
    - Your only tool is `bash`. Your only bash commands are `sp node` plus `sp ps`/`sp result`.
    - Do not call `read`, `ls`, `find`, `grep`, or any file inspection tool. You have none.
-2. **Use only `sp node` + `sp ps` + `sp result` + `sp steer` + `sp resume` command surface for orchestration**
+2. **Use only `sp node` command surface for orchestration**
    - Do not emit legacy contract JSON plans as the primary control mechanism.
    - Do not call deprecated node action channels.
@@ -84,8 +84,6 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
 | `sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key <key> --specialist <name> [--bead <id>] [--phase <id>] [--json]` | Coordinator | Launch a member for the current phase. |
 | `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]` | Coordinator | Block until the named phase members reach terminal state. |
 | `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` | Coordinator | Read the persisted output for a specific member after a phase barrier. |
-| `sp steer <job-id> 'direction'` | Coordinator | Steer a running member with new context mid-flight. |
-| `sp resume <job-id> 'next task'` | Coordinator | Resume a waiting member with new task instructions. |
 | `sp node create-bead --node $SPECIALISTS_NODE_ID --title '...' [--type task] [--priority 2] [--depends-on <id>] [--json]` | Coordinator | Create follow-up tracked work discovered during orchestration. |
 | `sp node complete --node <node-id> --strategy <pr\|manual> [--json]` | Operator-only | Force-close node lifecycle when coordinator has reached waiting and operator decides to finalize. |
 | `sp node members <node-id> [--json]` | Operator | Inspect member registry and lineage. |
@@ -110,21 +108,13 @@ Coordinator commands should still use `$SPECIALISTS_NODE_ID` directly.
    - after `wait-phase` succeeds, call `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` for each participating member,
    - synthesize the outputs into the next decision.
-4. **Steer members dynamically**
-   - after reading a member's result, if other members need updated context, steer them with `sp steer <job-id> 'specific direction from findings'`.
-   - only steer with concrete, evidence-based direction — never speculative.
-   - example: explorer finds X → steer researcher to 'investigate X patterns in external docs'.
-5. **Re-check status**
+4. **Re-check status**
    - re-read node status after each command sequence,
    - adjust the plan from actual runtime state.
-6. **Coordinator terminal behavior**
+5. **Coordinator terminal behavior**
    - once goals are satisfied (or terminally blocked with explicit reason),
-   - synthesize ALL member evidence into a unified report,
-   - this report is your final output — it MUST integrate all member findings,
-   - 'Node completed. ok:true.' is NOT acceptable synthesis,
-   - enter/remain in `waiting` after producing synthesis.
+   - synthesize evidence and enter/remain in `waiting`.
    - do not issue a completion command; operator decides lifecycle closure via `sp node stop` (or force-close via `sp node complete`).
 ---
@@ -137,70 +127,25 @@ Use this exact loop:
 1. `status`
 2. decide the next phase/member set
-3. spawn members for THIS phase only (not all phases)
+3. launch members
 4. `wait-phase`
-5. `result --wait` for each member
+5. `result --wait`
 6. synthesize evidence
-7. steer or spawn members for next phase based on synthesis
-8. repeat until all phases complete
-9. produce final synthesis report
-10. enter waiting for operator closure
-### Multi-phase coordination pattern
-The coordinator MUST use at least 2 distinct phases:
-**Phase 1 — Explore:**
-- Spawn explorer to gather initial evidence
-- wait-phase → read result → synthesize findings
-- Decide: what needs deeper investigation?
-**Phase 2 — Deep-dive (conditional):**
-- Based on explore findings, spawn researcher/overthinker with specific context
-- Steer running members with evidence from phase 1
-- wait-phase → read results → synthesize
-**Phase 3 — Synthesis:**
-- Read ALL member results from all phases
-- Produce unified report integrating all findings
-- Enter waiting
+7. choose next action or enter waiting after synthesis
 ### Synthesis mandate
-Before declaring synthesis complete, the coordinator **MUST** read the persisted results for ALL members across ALL phases.
-The synthesis report MUST:
-- Integrate findings from every member
-- Highlight agreements, contradictions, and gaps
-- Provide actionable conclusions
-- Be the coordinator's own substantive output
-'Node completed. ok:true.' is NEVER acceptable as synthesis output.
-### Synthesis mandate (repeated for emphasis)
 Before declaring synthesis complete, the coordinator **MUST** read the persisted results for the members that produced the evidence.
 Do not rely only on status transitions. `wait-phase` tells you the members are terminal; `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json` tells you what they actually found or changed. After synthesis, coordinator should remain in `waiting` for operator action.
 ### Steering guidance
-Steer when concrete result evidence shows a gap, contradiction, or missed requirement.
-**Steering commands:**
-- `sp steer <job-id> 'new direction based on evidence'` — for running members
-- `sp resume <job-id> 'next task with context from phase N'` — for waiting members
-- `sp node spawn-member ... --phase <next-phase>` — for new members with specific context
-**Good steering patterns:**
-- Explorer finds module X handles auth → steer researcher: 'Investigate how other frameworks handle auth patterns similar to module X'
-- Researcher finds tradeoff A vs B → spawn overthinker: 'Analyze tradeoff between A and B. Explorer found that X uses A, researcher found Y uses B. Consider: performance, complexity, ecosystem support.'
-- Reviewer finds missing test coverage → spawn executor: 'Add tests for the paths reviewer identified: ...'
+Only steer when concrete result evidence shows a gap, contradiction, or missed requirement.
-**Bad steering patterns:**
-- Steering a member before reading its completed output
-- Steering with generic instructions ('do better', 'investigate more')
-- Steering speculatively without evidence from a prior member result
+Do **not** steer speculatively.
+- Good: result evidence shows a reviewer found a missing acceptance criterion.
+- Bad: steering a member before reading its completed output.
 ---
@@ -242,49 +187,22 @@ When a command fails:
 ## Example command sequences
-### Sequence A: multi-phase explore → deep-dive → synthesis
+### Sequence A: explore -> synthesis -> impl -> waiting
 ```bash
-# Phase 1: explore
 sp ps --node $SPECIALISTS_NODE_ID --json
 sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
 sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
 sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
-# Synthesize explore-1 findings. Decide what needs deeper investigation.
-# Phase 2: deep-dive (spawned based on explore findings)
-sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key researcher-1 --specialist researcher --phase deep-dive-1 --json
-sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key overthinker-1 --specialist overthinker --phase deep-dive-1 --json
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1,overthinker-1 --json
-sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
-sp result $SPECIALISTS_NODE_ID:overthinker-1 --wait --json
-# Synthesize all phase 2 evidence.
-# Phase 3: final synthesis
-# Read all member results, produce unified report, enter waiting.
-sp ps --node $SPECIALISTS_NODE_ID --json
-```
-### Sequence B: explore → steer → synthesis
-```bash
-# Phase 1: explore
-sp ps --node $SPECIALISTS_NODE_ID --json
-sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key explore-1 --specialist explorer --phase explore-1 --json
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase explore-1 --members explore-1 --json
-sp result $SPECIALISTS_NODE_ID:explore-1 --wait --json
-# Explorer found X. Researcher is running — steer it.
-# Steer researcher with explorer findings
-sp steer <researcher-job-id> 'Based on explorer findings about X, investigate Y patterns in external docs'
-sp node wait-phase --node $SPECIALISTS_NODE_ID --phase deep-dive-1 --members researcher-1 --json
-sp result $SPECIALISTS_NODE_ID:researcher-1 --wait --json
-# Final synthesis — produce unified report integrating ALL findings
+# Synthesize the explore findings and decide whether impl is required.
+sp node spawn-member --node $SPECIALISTS_NODE_ID --member-key impl-1 --specialist executor --phase impl-1 --json
+sp node wait-phase --node $SPECIALISTS_NODE_ID --phase impl-1 --members impl-1 --json
+sp result $SPECIALISTS_NODE_ID:impl-1 --wait --json
+# Synthesize impl evidence, then stay in waiting for operator closure.
 sp ps --node $SPECIALISTS_NODE_ID --json
 ```
-### Sequence C: discovered work + review synthesis + operator closure
+### Sequence B: discovered work + review synthesis + operator closure
 ```bash
 sp ps --node $SPECIALISTS_NODE_ID --json
@@ -319,8 +237,6 @@ sp ps --node $SPECIALISTS_NODE_ID --json
 - `sp node wait-phase --node $SPECIALISTS_NODE_ID --phase <id> --members <k1,k2,...> [--json]`
 - `sp result $SPECIALISTS_NODE_ID:<member-key> --wait --json`
 - `sp ps --node $SPECIALISTS_NODE_ID --json`
-- `sp steer <job-id> 'new direction or context'` — steer a running member mid-flight
-- `sp resume <job-id> 'next task'` — resume a waiting member with new instructions
 ### Operator-only closure commands
 - `sp node stop <node-id>`
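The coordinator loop in this skill can be wrapped in a small shell helper. Each phase spawns its members, runs `wait-phase`, and then calls `sp result ... --wait --json` for each member. This is a sketch only. `run_phase` and its `<member-key>=<specialist>` argument format are hypothetical and are not part of xtrm-tools. The helper uses only the `sp node spawn-member`, `sp node wait-phase`, and `sp result` flags that appear in the diff above.

```shell
# Hypothetical helper (not shipped by xtrm-tools): runs one coordinator phase
# using only the sp commands listed in using-nodes/SKILL.md.
# Usage: run_phase <phase-id> <member-key>=<specialist> [...]
run_phase() {
  local phase=$1
  shift
  local node=${SPECIALISTS_NODE_ID:?SPECIALISTS_NODE_ID must be set}
  local pair key members
  local -a keys=()

  # 1. Spawn members for THIS phase only.
  for pair in "$@"; do
    key=${pair%%=*}
    sp node spawn-member --node "$node" --member-key "$key" \
      --specialist "${pair#*=}" --phase "$phase" --json || return 1
    keys+=("$key")
  done

  # 2. Block until every member of the phase is terminal.
  members=$(IFS=,; echo "${keys[*]}")
  sp node wait-phase --node "$node" --phase "$phase" --members "$members" --json || return 1

  # 3. Read each member's persisted result before synthesizing.
  for key in "${keys[@]}"; do
    sp result "$node:$key" --wait --json || return 1
  done
}
```

The coordinator still decides the next phase and writes the synthesis itself. The helper only removes the repetition of the three commands per phase, for example `run_phase explore-1 explore-1=explorer`.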

package/.xtrm/skills/default/using-script-specialists/SKILL.md ADDED

@@ -0,0 +1,208 @@
+---
+name: using-script-specialists
+description: >
+  Use this skill for synchronous one-shot specialist invocations via `sp script`
+  (CLI) or `sp serve` (HTTP daemon). These run READ_ONLY, template-driven
+  specialists with `$var` substitution and return JSON in-process — no beads,
+  no chains, no worktrees, no job lifecycle. Trigger when integrating a
+  specialist into a service, script, or library, when the caller needs the
+  output immediately, or when the work is a single LLM call with structured
+  input/output. Do NOT use for tracked agent work — that belongs to
+  `using-specialists-v2`.
+version: 1.0
+---
+# Script-Class Specialists
+`sp script` and `sp serve` are a separate runtime from the bead-first
+orchestration covered by `using-specialists-v2`. They exist for service and
+library integration, not for agent chains.
+| Aspect | `sp run` (orchestration) | `sp script` / `sp serve` |
+| --- | --- | --- |
+| Driver | bead contract | template + variables |
+| Execution | supervised job, async | one-shot, synchronous |
+| Permissions | READ_ONLY / MEDIUM / HIGH | READ_ONLY only |
+| Worktrees | edit-capable provisions one | rejected |
+| Output | result.txt + events.jsonl + bead notes | stdout JSON / HTTP body |
+| Audit | `.specialists/jobs/<id>/` | one row in `.specialists/db/observability.db` |
+Use `sp script` from a shell or build pipeline. Use `sp serve` from a service
+that needs an HTTP endpoint backed by `pi`. The same `.specialist.json` runs
+under both.
+## When To Use This Skill
+Trigger when:
+- A service or script needs a single LLM-backed transform (summarize, classify,
+  extract) returning JSON.
+- You are integrating specialists into Python/Node code that cannot block on a
+  supervised job lifecycle.
+- The call is request/response shaped: variables in, structured output out.
+- You need a sidecar HTTP endpoint (`sp serve`) to wrap a specialist for a
+  service consumer that already speaks HTTP.
+Do NOT trigger for: code review, debugging, implementation, multi-turn work,
+keep-alive sessions, anything that should write files. Those belong to
+`using-specialists-v2`.
+## Specialist Compatibility (compatGuard)
+A spec is rejected at request time (`specialist_load_error`) if any of:
+- `execution.interactive` is `true`
+- `execution.requires_worktree` is `true`
+- `execution.permission_required` is anything other than `READ_ONLY`
+- `skills.scripts` is non-empty
+- `prompt.task_template` is missing
+- a referenced `$var` in the chosen template is not supplied (`template_variable_missing`)
+Author specs that explicitly target script-class:
+```json
+{
+  "specialist": {
+    "metadata": { "name": "summarize-event", "version": "1.0.0", "category": "ingestion" },
+    "execution": {
+      "mode": "auto",
+      "model": "anthropic/claude-haiku-4-5",
+      "timeout_ms": 30000,
+      "interactive": false,
+      "response_format": "json",
+      "output_type": "custom",
+      "permission_required": "READ_ONLY",
+      "requires_worktree": false,
+      "max_retries": 0
+    },
+    "prompt": {
+      "task_template": "Summarize event $event_id with body: $body. Return JSON {\"summary\": \"...\"}.",
+      "output_schema": { "required": ["summary"] }
+    }
+  }
+}
+```
+## `sp script` — One-Shot CLI
+```bash
+sp script <specialist-name> \
+  --vars key1=value1 --vars key2=value2 \
+  [--template task_template] \
+  [--model anthropic/claude-sonnet-4-6] \
+  [--thinking medium] \
+  [--timeout-ms 60000] \
+  [--db-path /path/to/observability.db] \
+  [--single-instance <lock-name>] \
+  [--no-trace] \
+  [--json]
+```
+Behaviour:
+- Loads the spec via `SpecialistLoader` (same loader as `sp run`).
+- Renders `prompt.task_template` (or named template) with `--vars`.
+- Spawns `pi --mode json --no-session --no-extensions --no-tools` with the
+  resolved model.
+- Returns the final assistant text on stdout. With `--json`, returns the full
+  `ScriptGenerateResult` envelope.
+- Writes one row to `.specialists/db/observability.db` (same writer as `sp run`).
+Exit codes:
+- `0` — success.
+- non-zero — failure; with `--json`, body has `success: false` and `error_type`.
+Use `--single-instance <lock>` when concurrent invocations of the same logical
+job must be serialized (cron, batch script).
+## `sp serve` — HTTP Daemon
+```bash
+sp serve \
+  [--port 8000] \
+  [--concurrency 4] \
+  [--queue-timeout-ms 5000] \
+  [--shutdown-grace-ms 30000] \
+  [--project-dir /path/to/project] \
+  [--fallback-model anthropic/claude-haiku-4-5]
+```
+POST `/v1/generate`:
+```json
+{
+  "specialist": "summarize-event",
+  "variables": { "event_id": "abc", "body": "..." },
+  "template": "task_template",
+  "model_override": "anthropic/...",
+  "timeout_ms": 60000,
+  "trace": true
+}
+```
+Response (200, success):
+```json
+{
+  "success": true,
+  "output": "<final text>",
+  "parsed_json": { "summary": "..." },
+  "meta": {
+    "specialist": "summarize-event",
+    "model": "anthropic/claude-haiku-4-5",
+    "duration_ms": 1234,
+    "trace_id": "<uuid>"
+  }
+}
+```
+Response (200, failure):
+```json
+{ "success": false, "error": "...", "error_type": "..." }
+```
+Error types: `specialist_not_found | specialist_load_error |
+template_variable_missing | auth | quota | timeout | network | invalid_json |
+output_too_large | internal`.
+`400` is reserved for malformed HTTP. `429` returns when concurrency cap is
+saturated past `queue-timeout-ms`.
+## Operational Rules
+- One `pi` subprocess per in-flight request, bounded by `--concurrency`.
+- Credentials come from `pi`'s own `~/.pi/agent/auth.json`. The service never
+  touches API keys.
+- Observability DB is shared with `sp run`. Audit trail is unified.
+- The service is sidecar-per-consumer: no multi-tenant routing, no session
+  state, no orchestration. If you need orchestration, use `sp run` + beads.
+- For container deployments, see `docs/specialists-service-install.md`. Image
+  runs as non-root UID 10001; bind-mount `~/.pi` and `.specialists/`.
+## When To Switch Back To `using-specialists-v2`
+If any of these become true mid-design, drop script-class and use the
+orchestration runtime:
+- The work needs to write files.
+- The caller wants a multi-turn / keep-alive session.
+- A reviewer pass is needed.
+- The work should be tracked as a bead with auditability beyond a single
+  observability row.
+- The output is iterative (steer / resume).
+## What Not To Put Here
+- Bead workflow, chains, epics, reviewers, worktrees — those live in
+  `using-specialists-v2`.
+- Orchestration MCP tooling (`use_specialist`).
+- Long-running multi-turn examples.
+## Reference
+- `docs/specialists-service.md` — HTTP contract and operational notes.
+- `docs/specialists-service-install.md` — Docker/Podman install path.
+- `docs/script-specialists.md` — historical context for the script-class shape.
+- `src/cli/script.ts`, `src/cli/serve.ts`, `src/specialist/script-runner.ts` — runtime.
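The compatGuard rules and the `$var` substitution step can be sketched in shell to show what a script-class request is checked for. This is an illustration under stated assumptions, not the code in `src/specialist/script-runner.ts`:

- `compat_guard` takes field values already extracted from the spec (for example with `jq`).
- Both function names are hypothetical.
- The substitution semantics are guesses: identifiers match `[A-Za-z_][A-Za-z0-9_]*`, and a lone `$` is kept as-is.
- The sketch needs bash 4 for associative arrays.

```shell
# Hypothetical sketch of the script-class request checks (bash 4+).
# compat_guard args are field values assumed to be pre-extracted from the spec:
#   <interactive> <requires_worktree> <permission_required> <skills.scripts count> <task_template>
compat_guard() {
  local interactive=$1 worktree=$2 permission=$3 scripts=$4 template=$5
  local reason=""
  if [[ $interactive == true ]]; then
    reason="execution.interactive is true"
  elif [[ $worktree == true ]]; then
    reason="execution.requires_worktree is true"
  elif [[ $permission != READ_ONLY ]]; then
    reason="execution.permission_required is $permission"
  elif (( scripts > 0 )); then
    reason="skills.scripts is non-empty"
  elif [[ -z $template ]]; then
    reason="prompt.task_template is missing"
  fi
  if [[ -n $reason ]]; then
    echo "specialist_load_error: $reason" >&2
    return 1
  fi
}

# Render a template, replacing $name with the value from a name=value argument.
# A referenced variable with no supplied value fails with template_variable_missing.
render_template() {
  local tmpl=$1
  shift
  local -A vars=()
  local kv name out=""
  local dollar='^([^$]*)\$(.*)$' ident='^([A-Za-z_][A-Za-z0-9_]*)(.*)$'
  for kv in "$@"; do vars[${kv%%=*}]=${kv#*=}; done
  while [[ $tmpl =~ $dollar ]]; do
    out+=${BASH_REMATCH[1]}          # text before the next "$"
    tmpl=${BASH_REMATCH[2]}          # text after it
    if [[ $tmpl =~ $ident ]]; then
      name=${BASH_REMATCH[1]}
      tmpl=${BASH_REMATCH[2]}
      if [[ -z ${vars[$name]+set} ]]; then
        echo "template_variable_missing: $name" >&2
        return 1
      fi
      out+=${vars[$name]}
    else
      out+='$'                       # a "$" not followed by an identifier is kept verbatim
    fi
  done
  printf '%s\n' "$out$tmpl"
}
```

In the real runtime these failures surface as the `error_type` values `specialist_load_error` and `template_variable_missing`.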

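A shell client for `sp serve` mainly has to tell the documented outcomes apart:

- HTTP 200 with `success: true`.
- HTTP 200 with `success: false` plus `error_type`.
- `400` for malformed HTTP.
- `429` when the concurrency cap stays saturated.

The sketch below assumes a daemon on `localhost:8000`. `SP_SERVE_URL`, `generate`, and `classify_response` are hypothetical names. The regex check on `success` is a shortcut; a real client should parse the JSON instead.

```shell
# Hypothetical sp serve client sketch; assumes the daemon listens on localhost:8000.
SP_SERVE_URL=${SP_SERVE_URL:-http://localhost:8000}

# POST the JSON request read from stdin to /v1/generate.
# Prints the response body, then the HTTP status code on its own last line.
generate() {
  curl -sS -X POST "$SP_SERVE_URL/v1/generate" \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    -w '\n%{http_code}\n'
}

# Map (status, body) to the outcomes documented for POST /v1/generate.
classify_response() {
  local status=$1 body=$2
  local ok_re='"success"[[:space:]]*:[[:space:]]*true'
  case $status in
    200)
      if [[ $body =~ $ok_re ]]; then
        echo ok
      else
        echo failed          # body carries "error" and "error_type"
      fi ;;
    400) echo malformed_request ;;
    429) echo retry_later ;; # concurrency cap saturated past --queue-timeout-ms
    *) echo unexpected_status ;;
  esac
}

# Example (needs a running `sp serve`):
#   resp=$(echo '{"specialist":"summarize-event","variables":{"event_id":"abc","body":"..."}}' | generate)
#   classify_response "${resp##*$'\n'}" "${resp%$'\n'*}"
```

Keeping `failed` separate from `malformed_request` matters because script-class errors arrive as HTTP 200. Only transport-level problems use a non-200 status.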
package/.xtrm/skills/default/using-specialists/SKILL.md CHANGED

@@ -62,6 +62,17 @@ Specialists are autonomous AI agents that run independently — fresh context, d
 8. **No destructive operations by specialists.** No `rm -rf`, no force pushes, no database drops, no credential rotation, no mass deletes, no history rewrites. Surface destructive requirements to the user.
 9. **Executor does not run tests.** Executor runs lint + tsc only. Tests are the reviewer's and test-runner's responsibility in the chained pipeline.
 10. **Keep specialists alive through the review cycle.** Never `sp stop` an executor or debugger before the reviewer delivers its verdict. The specialist stays in `waiting` so you can `resume` it — to commit changes, apply fixes from reviewer feedback, or continue work. Only stop after final reviewer PASS and confirmed commit.
+11. **Respect ownership layers and loader precedence.** Loader resolution order is `.specialists/user/*` > `.specialists/default/*` > package fallback `config/*`. Upstream source = package `config/*` (read-only for repo operators); managed mirror = `.specialists/default/*` (no hand edits); repo custom layer = `.specialists/user/*`; runtime/generated = `.specialists/{jobs,ready,db}`.
+12. **Keep backlog-clean isolated.** Do not mix backlog-clean changes into specialist ownership/migration tasks.
+## Mandatory-rules template sets
+Use template-driven mandatory rules for repeatable policy bundles.
+- Specialist config field: `specialist.mandatory_rules.template_sets`
+- Template source: `config/mandatory-rules/*.md`
+- Template format: YAML frontmatter + body content
+- Runtime behavior: runner resolves templates and injects rendered rules at end of prompt
 ---
@@ -127,11 +138,13 @@ specialists stop <job-id> --force             # 5s SIGTERM timeout, then pgroup
 # Management
 specialists edit <name>                       # edit specialist config (dot-path, --preset)
+specialists edit <name> --fork-from <base>   # fork non-user specialist into .specialists/user/ then edit
 specialists clean                             # purge old job dirs + worktree GC
 specialists clean --processes                 # kill all running/starting specialist jobs
 specialists db vacuum                         # compact SQLite storage (refuses if jobs running)
 specialists db prune --before <iso|duration> --dry-run|--apply  # prune old events/results/terminal jobs
 specialists doctor orphans                    # integrity scan: orphan, stale-pointer, integrity-violation
+specialists init --sync-defaults              # refresh specialists + mandatory-rules + nodes from canonical defaults
 specialists init --sync-skills                # re-sync skills only (no full init)
 specialists init --no-xtrm-check              # skip xtrm prerequisite check (CI/testing)
 ```
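Rule 11's loader precedence amounts to "first match wins" across three layers. The sketch makes several assumptions:

- Each layer holds a flat `<name>.specialist.json` file.
- The package's config directory is passed in as an argument.
- The real loader's directory layout may differ.
- `resolve_specialist` is a hypothetical name.

```shell
# Hypothetical sketch of the loader order from rule 11:
#   .specialists/user/* > .specialists/default/* > package config/*
# File naming (<name>.specialist.json) and argument layout are assumptions.
# Usage: resolve_specialist <name> <project-root> <package-config-dir>
resolve_specialist() {
  local name=$1 project=$2 pkgcfg=$3
  local candidate
  for candidate in \
    "$project/.specialists/user/$name.specialist.json" \
    "$project/.specialists/default/$name.specialist.json" \
    "$pkgcfg/$name.specialist.json"; do
    if [[ -f $candidate ]]; then
      printf '%s\n' "$candidate"   # first existing layer wins
      return 0
    fi
  done
  echo "specialist not found: $name" >&2
  return 1
}
```

Because `.specialists/user/*` is checked first, a fork created with `specialists edit <name> --fork-from <base>` shadows both the managed mirror and the package default.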

package/.xtrm/skills/default/using-specialists-v2/SKILL.md CHANGED

@@ -8,7 +8,7 @@ description: >
   work without drift. Trigger for code review, debugging, implementation,
   planning, test generation, doc sync, multi-chain epics, and any question about
   specialist orchestration.
-version: 1.1
+version: 1.4
 ---
 # Specialists V2
@@ -51,6 +51,21 @@ When the local version is behind, the latest CHANGELOG entry can be summarized v
 14. Stale-base guard: dispatch refuses to provision a worktree when sibling epic chains have unmerged substantive commits. Override only with explicit `--force-stale-base` and a reason. Merge-time rebase happens automatically.
 15. Auto-checkpoint: executor and debugger commit substantive worktree changes on `waiting` by default (`auto_commit: checkpoint_on_waiting`). Noise paths (`.xtrm/`, `.wolf/`, `.specialists/jobs/`, `.beads/`) are filtered.
 16. Per-turn output appends to the input bead notes for **all** specialists on every `run_complete`, with `[WAITING — more output may follow]` or `[DONE]` headers. `bd show <bead-id>` is a valid path to read intermediate output.
+17. Specialist jobs do not orchestrate nested specialist chains. The top-level orchestrator dispatches specialists, collects results, and advances the workflow.
+18. Treat test failures as evidence to classify against the bead scope. Validate whether failures are in-scope, pre-existing, or infrastructure-related before sending an executor into a fix loop.
+## Canonical Runtime State
+These are current operating facts, not migration notes:
+- **Asset ownership:** Cat A runtime assets — specialists, mandatory-rules, catalog, and nodes — resolve live from the specialists package after project tiers. Cat B filesystem assets — skills and hooks — are owned by xtrm-tools under `.xtrm/skills/default` and `.xtrm/hooks/default`.
+- **Resolution precedence:** project/user tiers win over managed defaults; package-live is the final fallback. Mandatory-rule indexes are not stacked across tiers; per-id mandatory-rule files may fall through to package canonical when absent locally.
+- **Drift surface:** use `sp doctor --check-drift` to inspect stale managed defaults and `sp prune-stale-defaults --dry-run` to preview cleanup.
+- **Source verification:** resolver/catalog changes in a worktree are verified with `sp config show <name> --resolved --from-source` so evidence comes from the checked-out source, not an installed dist.
+- **Worktree publication:** edit-capable specialists produce worktree branches. Before review or merge, verify the branch diff and status from that worktree.
+- **Epic publication:** epics are the merge-gated identity. Publish through `sp epic merge`; use `sp epic abandon` to deliberately close failed or cancelled epic bookkeeping.
+- **CLI safety:** command help paths are side-effect free. New commands must parse `--help`/`-h` before action and have a no-write help test.
+- **Release context:** changelog-keeper receives xt report context through the `releasing` skill's helper. Release-range logic supports annotated tags.
 ## Autonomous Drive
@@ -199,20 +214,24 @@ Run `specialists list` if you need the live registry. Choose by task, not by hab
 | Planning/decomposition | `planner` | You need beads, dependencies, file scopes, or sequencing. |
 | Design/tradeoffs | `overthinker` | The approach is risky, ambiguous, or needs critique. |
 | Implementation | `executor` | The contract is clear enough to write code or docs. |
-| Compliance/code review | `reviewer` | An executor/debugger produced changes that need a verdict. |
+| Compliance/code review | `reviewer` | An executor/debugger produced changes that need the final PASS/PARTIAL/FAIL verdict. |
+| Implementation sanity | `code-sanity` | You want a cheap READ_ONLY smell pass for simplicity, type safety, dead code, brittle async/error handling, or maintainability before reviewer. |
+| Security/dependency audit | `security-auditor` | You need threat modeling, secure-code review, package advisory triage, or agent/config security scanning. LOW: scan/read/recommend only. |
 | Multiple review perspectives | `parallel-review` | A critical diff needs independent review passes. |
 | Test execution | `test-runner` | You need suites run and failures interpreted. |
 | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization. |
-| External/live research | `researcher` | Current library/docs/media lookup is needed. |
+| External/live research | `researcher` | Current non-security library/docs/media lookup is needed. |
 | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config. |
-| Release changelog drafting | `changelog-keeper` | A new tag is being cut and a `[X.Y.Z] - YYYY-MM-DD` section is needed. Driven by `sp release prepare`, not invoked directly. |
+| Release publication (end-to-end) | `changelog-keeper` | A new tag is being cut. MEDIUM specialist: drafts CHANGELOG section from xt reports, bumps package.json, rebuilds dist, commits, tags, pushes. Use the `releasing` skill to dispatch. |
 Selection rules:
 - Explorer is READ_ONLY and should answer specific questions.
 - Debugger is better than explorer for failures because it traces causes and remediation.
 - Executor does not own full test validation; use reviewer/test-runner for that phase.
-- Reviewer always uses its own bead plus `--job <executor-job>`.
+- Code-sanity is optional and non-blocking by default: use it when a diff smells overcomplicated or type-risky, then resume executor with concrete findings. It is not a merge gate.
+- Security-auditor may run safe local audit commands and web/source research, but must not edit files, update dependencies, exfiltrate secrets, or run destructive/live-target exploit tests. Executor applies any recommended fixes in a separate bead.
+- Reviewer always uses its own bead plus `--job <executor-job>` and remains the final merge gate.
 - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
 - Specialists-creator should precede specialist config/schema edits.
@@ -224,8 +243,12 @@ Daily commands:
 specialists list
 specialists list-rules                          # rule × specialist matrix
 specialists doctor
+specialists doctor --check-drift                 # inspect stale .specialists/default snapshots
+sp prune-stale-defaults --dry-run                # preview redundant default snapshots
 specialists run <name> --bead <id> --background
 specialists run executor --bead <impl-bead> --background       # worktree auto-provisioned
+specialists run code-sanity --bead <sanity-bead> --job <exec-job> --keep-alive --background
+specialists run security-auditor --bead <security-bead> --job <exec-job> --keep-alive --background
 specialists run reviewer --bead <review-bead> --job <exec-job> --keep-alive --background
 specialists ps
 specialists ps <job-id>
@@ -245,6 +268,7 @@ sp merge <chain-root-bead>
 sp epic status <epic-id>
 sp epic sync <epic-id> --apply
 sp epic merge <epic-id>
+sp epic abandon <epic-id> --reason "..."
 sp end
 ```
@@ -319,6 +343,42 @@ specialists run executor --worktree --bead <impl> --context-depth 3 --background
 specialists result <exec-job>
 ```
+Optional code-sanity pass for implementation smell checks (use when the diff is non-trivial or likely to accumulate agent-code complexity):
+```bash
+bd create --title "Code sanity check token refresh retry" --type task --priority 3 \
+  --description "PROBLEM: Cheap READ_ONLY sanity pass for executor implementation quality before final review.
+SUCCESS: Identify concrete simplicity/type-safety/maintainability findings, or return OK.
+SCOPE: executor job <exec-job>, implementation diff only.
+NON_GOALS: No requirements verdict, no security audit, no test execution, no edits.
+CONSTRAINTS: At most 5 concrete findings; cite files/symbols/lines where possible.
+VALIDATION: Findings are suitable to paste into specialists resume <exec-job>.
+OUTPUT: OK/FINDINGS/BLOCKED with handoff."
+bd dep add <sanity> <impl>
+specialists run code-sanity --bead <sanity> --job <exec-job> --context-depth 3 --keep-alive --background
+specialists result <sanity-job>
+```
+If code-sanity returns `FINDINGS`, resume executor with those concrete instructions, then rerun code-sanity only if the fixes were substantive. Do not treat code-sanity `OK` as reviewer PASS.
+Optional security pass when the task touches auth, secrets, input handling, dependency updates, package advisories, agent config, hooks, or exposed endpoints:
+```bash
+bd create --title "Security audit token refresh retry" --type task --priority 2 \
+  --description "PROBLEM: Scoped security/dependency/config audit for executor changes.
+SUCCESS: Identify evidence-backed security findings or return no findings.
+SCOPE: executor job <exec-job>, changed files, relevant manifests/config only.
+NON_GOALS: No edits, no package updates, no destructive scans, no live exploit testing.
+CONSTRAINTS: LOW permission; recommendations only. HN/social signals are not authoritative proof.
+VALIDATION: Findings cite local evidence or OSV/GHSA/NVD/vendor/package-audit sources.
+OUTPUT: Security audit summary, findings, dependency triage, residual risk."
+bd dep add <security> <impl>
+specialists run security-auditor --bead <security> --job <exec-job> --context-depth 3 --keep-alive --background
+specialists result <security-job>
+```
+If security-auditor recommends code or dependency changes, create/resume an executor fix bead. Do not let security-auditor apply updates.
 Create review bead:
 ```bash
@@ -432,6 +492,12 @@ Standard loop:
 ```text
 executor --worktree --bead impl
   -> waiting after turn
+optional code-sanity --bead sanity --job exec-job
+  -> OK: continue
+  -> FINDINGS: resume executor with exact sanity findings
+optional security-auditor --bead security --job exec-job
+  -> no findings: continue
+  -> findings: create/resume executor fix bead; auditor never edits
 reviewer --bead review --job exec-job
   -> PASS: verify commit, publish, stop members if needed
   -> PARTIAL: resume executor with exact findings
@@ -440,7 +506,7 @@ reviewer --bead review --job exec-job
 Prefer `sp resume <exec-job>` over a new fix executor when the original job is waiting and context is healthy. Use a new fix bead with `--job <exec-job>` only when the original executor is dead, context exhausted, or a separate audit trail is required.
-Reviewer output must be consumed before publishing. Do not treat job completion as equivalent to acceptance.
+Code-sanity and security-auditor outputs are advisory inputs to the chain; reviewer output must still be consumed before publishing. Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer acceptance.
 ## Dependency Mapping
@@ -480,10 +546,19 @@ Use `sp ps` instead of ad-hoc polling.
 sp ps
 sp ps <job-id>
 sp ps --follow
+sp ps --running                       # only starting/running/waiting jobs
+sp ps --bead <bead-id>                # only jobs linked to one bead
+sp ps --since 30m                     # only jobs started in the last 30 minutes
+sp ps --mine                          # only jobs whose bead is assigned to you
+sp ps --include-terminal              # include merged/abandoned epics (hidden by default)
 sp feed <job-id>
 sp result <job-id>
 ```
+Filter flags compose: `sp ps --running --bead <id>` is the canonical way to inspect "what's actively working on this issue right now". By default `sp ps` hides epics in `merged` or `abandoned` state to keep the snapshot focused; use `--include-terminal` (or `--all`) to bring them back.
+When dead epics pile up in `failed` state (sibling-chain conflicts, manual stops), recover with `sp epic abandon <epic-id> --reason "<text>"`. The `failed -> abandoned` transition is allowed specifically for cleanup; live members still require `--force`.
 Read results at every stage. Every specialist (not just READ_ONLY) auto-appends per-turn output to the input bead notes on each `run_complete`, with `[WAITING]` or `[DONE]` headers — `bd show <bead-id>` shows the full handoff trail. `sp result <job-id>` works on `waiting` jobs and returns the most recent turn plus a "Session is waiting for your input" footer; use it to decide whether to resume. If result is empty, inspect feed and rerun or switch specialists before relying on it.
 Context percentage in `sp ps`/feed is an action signal:
@@ -493,6 +568,8 @@ Context percentage in `sp ps`/feed is an action signal:
 - 65-80%: steer toward conclusion.
 - Above 80%: finish, summarize, or replace the job.
+Do not confuse raw token totals with context percentage. `sp ps` may show raw token counts around 50k-100k for large-context models; that alone is not a stop signal. Use the context percentage when available, plus stalls, repeated edit failures, or scope drift.
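The distinction is arithmetic: the same raw token count maps to very different percentages depending on the model's context window. The following is a minimal sketch that assumes you know the window size. `context_signal` is a hypothetical helper, not an `sp` command, and the window sizes in the usage lines are illustrative only. It encodes only the 65-80% and above-80% bands listed above and does not reproduce the guidance for lower percentages.

```bash
#!/usr/bin/env bash
# context_signal TOKENS WINDOW
# Hypothetical helper: converts a raw token count into an integer
# percentage of the context window and labels it with the bands above.
context_signal() {
  local tokens=$1 window=$2
  local pct=$(( tokens * 100 / window ))   # integer division truncates
  if (( pct > 80 )); then
    echo "$pct% finish"
  elif (( pct >= 65 )); then
    echo "$pct% steer"
  else
    echo "$pct% below-steer-band"
  fi
}
```

For example, `context_signal 80000 200000` prints `40% below-steer-band`. That is 80k raw tokens, but not a stop signal on a 200k window. `context_signal 90000 128000` prints `70% steer`.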
 ## Steering And Resume
 Use `steer` for running jobs:
@@ -534,18 +611,25 @@ Rules:
 ## Release Publication
-Tagged releases go through `sp release`, not manual `git tag`:
+Tagged releases go through the `releasing` skill, which dispatches the
+`changelog-keeper` MEDIUM specialist. The specialist reads xt session
+reports via the releasing skill's `xt-reports.ts` helper, drafts the new
+section into `CHANGELOG.md`, bumps `package.json`, rebuilds `dist/`, commits
+with `release: vX.Y.Z`, tags, and pushes with `--follow-tags`. It also runs
+`gh release create` if the bead requests it.
-```bash
-sp release prepare [--major | --minor | --patch]   # default: --patch
-sp release publish
-```
+Operator gate: run a single `git diff --stat HEAD~1 HEAD` after the specialist
+finishes. It must show only `CHANGELOG.md`, `package.json`, and `dist/`. Anything
+else means scope was violated; revert and refile.
-`prepare` invokes the `changelog-keeper` specialist to draft a Keep-a-Changelog section between the previous tag and the next tag, bumps `package.json`, and stages `CHANGELOG.md` + `package.json` + `dist/index.js`. It does not commit — operator reviews and commits with `release: v<version>`.
+The `changelog-keeper-scope` mandatory rule enforces the edit whitelist at
+the specialist level. See `config/skills/releasing/SKILL.md` for the bead
+template, dispatch command, and recovery commands.
-`publish` validates the staged commit (dirty-tree refusal, HEAD message match, version match, top-section match in `CHANGELOG.md`), creates the annotated tag, pushes to origin, and optionally creates a GitHub release via `gh`. Re-emits the empty `[Unreleased]` placeholder for the next cycle.
+Release helper contract:
-The `changelog-keeper` specialist is READ_ONLY; the CLI is the file mutator. See `docs/release.md` for the operator runbook.
+- Report extraction is provided by the `releasing` skill, so consumer repos do not need repo-local release helper scripts.
+- Release ranges support annotated tags and should be validated through the same path used by tagged releases.
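The operator gate can be scripted. The following is a minimal sketch that assumes the whitelist is exactly top-level `CHANGELOG.md`, top-level `package.json`, and anything under `dist/`. If your repo builds into a nested directory such as `cli/dist/`, adjust the patterns. `check_release_scope` is a hypothetical helper name and is not part of `sp` or the `releasing` skill. It reads `git diff --name-only` instead of `--stat` so that each changed path can be matched individually.

```bash
#!/usr/bin/env bash
# check_release_scope: hypothetical helper. Reads changed paths (one per
# line) on stdin and returns non-zero if any path is outside the release
# whitelist, printing each offending path to stderr.
check_release_scope() {
  local path bad=0
  # "|| [ -n "$path" ]" also handles a final line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue
    case "$path" in
      CHANGELOG.md|package.json|dist/*) ;;   # allowed release artifacts
      *) echo "out of scope: $path" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Usage after the specialist finishes: `git diff --name-only HEAD~1 HEAD | check_release_scope || echo "scope violated: revert and refile"`.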
 ## Epic Lifecycle
@@ -653,13 +737,17 @@ sp epic status <epic-id>
 sp epic sync <epic-id> --apply
 ```
-Specialist missing or config skipped:
+Specialist missing, config skipped, or stale default snapshots:
 ```bash
 specialists list
 specialists doctor
+specialists doctor --check-drift
+sp prune-stale-defaults --dry-run
 ```
+`sp prune-stale-defaults` is intentionally operator-facing. Always run `--dry-run` first unless the bead explicitly asks to apply cleanup.
 Worktree already exists:
 ```text
@@ -672,6 +760,8 @@ Reviewer cannot enter job workspace:
 Check target job status with sp ps. MEDIUM/HIGH jobs are blocked from entering a running write-capable workspace unless forced.
 ```
+When resolver/catalog changes are under review inside a worktree, run `sp config show <name> --resolved --from-source` so reviewer sees local source behavior, not installed dist.
 Explorer produced empty output:
 ```text