npm - @jaggerxtrm/specialists - Versions diffs - 3.3.4 → 3.4.1 - Mend

@jaggerxtrm/specialists 3.3.4 → 3.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39) hide show

package/config/skills/using-specialists/SKILL.md CHANGED Viewed

@@ -5,14 +5,16 @@ description: >
   ask whether to delegate. Consult before any: code review, security audit, deep bug
   investigation, test generation, multi-file refactor, or architecture analysis. Also
   use for the mechanics of delegation: --bead workflow, --context-depth, background
-  jobs, MCP tools (use_specialist, start_specialist, poll_specialist), specialists init,
+  jobs, MCP tools (use_specialist, start_specialist, feed_specialist), specialists init,
   or specialists doctor. Don't wait for the user to say "use a specialist" — proactively
   evaluate whether delegation makes sense.
-version: 3.1
+version: 3.6
 ---
 # Specialists Usage
+When this skill is loaded, you are a **coordinator first**: delegate substantial work to specialists, monitor progress, and synthesize outcomes for the user.
 Specialists are autonomous AI agents that run independently — fresh context, different
 model, no prior bias. Delegate when a task would take you significant effort, spans
 multiple files, or benefits from a dedicated focused run.
@@ -29,6 +31,8 @@ Before starting any substantial task, ask: is this worth delegating?
 - It spans multiple files or modules
 - A fresh perspective adds value (code review, security audit)
 - It can run in the background while you do other things
+- You have multiple independent tasks — dispatch them as a wave
+- It's a review/analysis loop (audit → approve/deny → resume) that benefits from interactive keep-alive
 **Do it yourself when:**
 - It's a single-file edit or quick config change
@@ -44,24 +48,60 @@ When in doubt, delegate. Specialists run in parallel — you don't have to wait.
 For tracked work, always use `--bead`. This gives the specialist your issue as context,
 links results back to the tracker, and creates an audit trail.
+### CLI commands
+```bash
+specialists init                              # first-time project setup
+specialists list                              # discover available specialists
+specialists run <name> --bead <id>            # foreground run (streams output)
+specialists run <name> --prompt "..."         # ad-hoc (no bead tracking)
+specialists feed -f                           # tail merged feed (all jobs)
+specialists feed <job-id>                     # events for a specific job
+specialists result <job-id>                   # final output text
+specialists steer <job-id> "new direction"    # redirect ANY running job mid-run
+specialists resume <job-id> "next task"       # resume a waiting keep-alive job
+specialists stop <job-id>                     # cancel a job
+specialists edit <name>                       # edit a specialist's YAML config
+specialists status --job <job-id>             # single-job detail view
+specialists clean                             # purge old job directories
+specialists doctor                            # health check
+```
+### Typical flow
 ```bash
 # 1. Create a bead describing what you need
-bd create --title "Audit authentication module for security issues" --type task --priority 2
-# → unitAI-abc
+bd create --title "Fix auth token refresh bug" --type bug --priority 2
+# -> unitAI-abc
-# 2. Find and run the right specialist
-specialists list
-specialists run security-audit --bead unitAI-abc --background
+# 2. Run the right specialist against the bead
+specialists run executor --bead unitAI-abc &
+# -> Job started: a1b2c3
-# 3. Keep working; check in when ready
-specialists feed -f
+# 3. Monitor (pick one)
+specialists feed a1b2c3              # check events so far
+specialists feed -f                  # tail all active jobs
 # 4. Read results and close
-specialists result <job-id>
-bd close unitAI-abc --reason "2 issues found, filed as follow-ups"
+specialists result a1b2c3
+bd close unitAI-abc --reason "Fixed: token refresh now retries on 401"
+```
+### Giving specialists extra context via bead notes
+`--prompt` and `--bead` cannot be combined. When you need to give a specialist
+specific instructions beyond what's in the bead description, update the bead notes first:
+```bash
+bd update unitAI-abc --notes "INSTRUCTION: Rewrite docs/cli-reference.md from current
+source. Read every command in src/cli/ and src/index.ts. Document all flags and examples."
+specialists run executor --bead unitAI-abc &
 ```
-**`--background`** — returns immediately; use for anything that will take more than ~30 seconds.
+This pattern was used extensively in Wave 5 of a real session — 4 executors all received
+writing instructions via bead notes and successfully produced doc files.
 **`--context-depth N`** — how many levels of parent-bead context to inject (default: 1).
 **`--no-beads`** — skip creating an auto-tracking sub-bead, but still reads the `--bead` input.
@@ -71,61 +111,178 @@ bd close unitAI-abc --reason "2 issues found, filed as follow-ups"
 Run `specialists list` to see what's available. Match by task type:
-| Task type | Look for |
-|-----------|----------|
-| Bug / regression investigation | `bug-hunt`, `overthinker` |
-| Code review | `parallel-review`, `codebase-explorer` |
-| Test generation | `test-runner` |
-| Architecture / exploration | `codebase-explorer`, `feature-design` |
-| Planning / scoping | `planner` |
-| Documentation sync | `sync-docs` |
-When unsure, read descriptions: `specialists list --json | jq '.[].description'`
+| Task type | Best specialist | Why |
+|-----------|----------------|-----|
+| Bug fix / implementation | **executor** (gpt-5.3-codex) | HIGH perms, writes code + tests autonomously |
+| Bug investigation / "why is X broken" | **debugger** (claude-sonnet-4-6) | GitNexus-first triage, 5-phase investigation, hypothesis ranking, evidence-backed remediation. Use for ANY root cause analysis. |
+| Design decisions / tradeoffs | **overthinker** (gpt-5.4) | 4-phase reasoning: analysis, devil's advocate, synthesis, conclusion. Interactive by default (`execution.interactive: true`). |
+| Code review / compliance | **reviewer** (claude-sonnet-4-6) | Post-run compliance checks, verdict contract (PASS/PARTIAL/FAIL). Interactive by default (`execution.interactive: true`). |
+| Multi-backend review | **parallel-review** (claude-sonnet-4-6) | Concurrent review across multiple AI backends |
+| Architecture exploration | **explorer** (claude-haiku-4-5) | Fast codebase mapping, READ_ONLY |
+| Reference docs / dense schemas | **explorer** (claude-haiku-4-5) | Better than sync-docs for reference-heavy output |
+| Planning / scoping | **planner** (claude-sonnet-4-6) | Structured issue breakdown with deps |
+| Doc audit / drift detection | **sync-docs** (claude-sonnet-4-6) | Interactive by default (`execution.interactive: true`): audits first, then approve/deny via `resume` |
+| Doc drift / audit | **sync-docs** (claude-sonnet-4-6) | Detects stale docs, restructures content |
+| Doc writing / updates | **executor** (gpt-5.3-codex) | sync-docs defaults to audit mode; executor writes files |
+| Test generation | **test-runner** (claude-haiku-4-5) | Runs suites, interprets failures |
+| Specialist authoring | **specialists-creator** (claude-sonnet-4-6) | Guides YAML creation against schema |
+### Specialist selection lessons (from real sessions)
+- **debugger** is the most powerful investigation specialist. Uses GitNexus call-chain tracing (when available) for 5-phase root cause analysis with ranked hypotheses. Use for ANY "why is X broken" question — don't do the investigation yourself.
+- **sync-docs** is an interactive specialist — it audits first, then waits for approval before executing. This is correct behavior, not a bug.
+- **overthinker** and **reviewer** are also interactive.
+- Canonical pattern: use `start_specialist` / `specialists run` normally; interactive specialists now default to keep-alive via `execution.interactive: true`.
+- Use `--no-keep-alive` only when you explicitly want a one-shot run.
+- **explorer** is fast and cheap (Haiku) but READ_ONLY — output auto-appends to the input bead's notes. Use for investigation, not implementation.
+- **executor** is the workhorse — HIGH permissions, writes code and docs, runs tests, closes beads. Best for any task that needs files written.
+- **use_specialist MCP** is best for quick foreground runs where you need the result immediately in your context.
+### Pi extensions availability (known gap)
+GitNexus and Serena are **pi extensions** (not MCP servers) at `~/.pi/agent/extensions/`.
+Specialists run with `--no-extensions` and only selectively re-enable `quality-gates` and
+`service-skills`. GitNexus (call-chain tracing for debugger/planner) and Serena LSP
+(token-efficient reads for explorer/executor) are NOT currently wired. Tracked as `unitAI-4abv`.
 ---
-## When a Specialist Fails
+## Steering and Resume
+### Steer — redirect any running job
-If a specialist times out or errors, **don't silently fall back to doing the work yourself**.
-Surface the failure — the user may want to fix the specialist config or switch to a different one.
+`steer` sends a message to a running specialist. Delivered after the current tool call
+finishes, before the next LLM call. Works for **all running jobs**.
 ```bash
-specialists feed <job-id>          # see what happened
-specialists doctor                 # check for systemic issues
+# Specialist is going off track — redirect it
+specialists steer a1b2c3 "STOP what you are doing. Focus only on supervisor.ts"
+# Specialist is auditing when it should be writing
+specialists steer a1b2c3 "Do NOT audit. Write the actual file to disk now."
+```
+Real example from today: an explorer was reading every file in src/cli/ when we only needed
+confirmation that steering worked. Sent `specialists steer 763ff4 "STOP. Just output:
+STEERING WORKS"` — message delivered, output confirmed in 2 seconds.
+### Resume — continue an interactive session
+`resume` sends a new prompt to a specialist that has finished its turn and is `waiting`.
+Works for jobs started with keep-alive behavior (including specialists with `execution.interactive: true`).
+The session retains full conversation history.
+```bash
+# Start an overthinker for multi-turn design work
+# overthinker is interactive by default, so no extra flag is required
+specialists run overthinker --bead unitAI-xyz &
+# -> Job started: d4e5f6 (completes Phase 4, enters waiting state)
+# Read the design output
+specialists result d4e5f6
+# Ask follow-up questions
+specialists resume d4e5f6 "What about backward compatibility with existing YAML files?"
+specialists resume d4e5f6 "How would you handle migration from the old schema?"
 ```
-If you need to retry: try foreground mode (no `--background`) for shorter timeout exposure,
-or try a different specialist. If all else fails, tell the user what you attempted and why
-it failed before doing the work yourself.
+Use interactive specialists when you plan to iterate: design reviews, multi-phase analysis,
+investigation that may need follow-up questions based on findings.
+Use `--no-keep-alive` only when you want to force single-turn behavior.
 ---
-## Ad-Hoc (No Tracking)
+## Wave Orchestration
+For multiple independent tasks, dispatch specialists in parallel waves.
+### Planning a wave
+Group tasks by dependency:
+1. **Wave 1**: Bug fixes and blockers (unblock downstream work)
+2. **Wave 2**: Features and design (now that the surface is stable)
+3. **Wave 3**: Documentation (after code changes land — use executors, not sync-docs)
+### Dispatching a wave
 ```bash
-specialists run codebase-explorer --prompt "Map the feed command architecture"
+# Fire multiple specialists in parallel (--background for reliable detach)
+specialists run executor --bead unitAI-abc --background
+specialists run executor --bead unitAI-def --background
+specialists run overthinker --bead unitAI-ghi --background
 ```
-Use `--prompt` only for throwaway exploration. For anything worth remembering, use `--bead`.
+### Monitoring a wave
+```bash
+# Quick status check on all jobs
+for job in abc123 def456 ghi789; do
+  python3 -c "import json; d=json.load(open('.specialists/jobs/$job/status.json')); \
+    print(f'$job {d[\"specialist\"]:12} {d[\"status\"]:10} {d.get(\"elapsed_s\",\"?\")}s')"
+done
+# Or use feed for event-level detail
+specialists feed <job-id>
+```
+### Between waves
+After each wave completes:
+1. **Read results**: `specialists result <job-id>` for each
+2. **Validate**: run lint + tests on the combined output
+3. **Commit**: stage, commit, push — clean git before next wave
+4. **Close beads**: `bd close <id> --reason "..."`
+### Real wave example (from a 6-wave session)
+```
+Wave 1: 2x executor → fixed --background flag + migrated start_specialist to Supervisor
+Wave 2: overthinker + 2x executor → output contract design + retry logic + footer fix
+Wave 3: 4x sync-docs + 3x explorer → docs audit (produced reports, not files)
+Wave 4: 5x executor + 2x explorer → output contract impl + READ_ONLY auto-append + 4 fixes
+Wave 5: 4x executor → rewrote 4 doc files (executors write files, sync-docs only audits)
+Wave 6: 4x executor + overthinker (interactive) → cleanup + manifest design with follow-ups
+```
+Key insight: **executors write files, sync-docs audits**. When you need docs written
+to disk, use executor with bead notes containing "INSTRUCTION: Write <file>...".
 ---
-## Example: Delegation in Practice
+## Coordinator Responsibilities
-You're asked to review `src/auth/` for security issues. Without delegation, you'd read
-every file and write findings yourself — 15+ minutes, your full attention.
+As the orchestrator, you own things specialists cannot do:
-With a specialist:
+### 1. Validate combined output across specialists
+Multiple specialists writing to the same worktree can conflict. After each wave:
 ```bash
-bd create --title "Security review: src/auth/" --type task --priority 1  # → unitAI-xyz
-specialists list --category security
-specialists run security-audit --bead unitAI-xyz --background             # → job_4a2b1c
-# go do other work
-specialists result job_4a2b1c
-bd close unitAI-xyz --reason "Found 2 issues, filed unitAI-abc, unitAI-def"
+npm run lint          # or project-specific quality gate
+bun test              # run affected tests
+git diff --stat       # review what changed
+```
+### 2. Handle failures — don't silently fall back
+If a specialist stalls or errors, surface it. Don't quietly do the work yourself.
+```bash
+specialists feed <job-id>          # see what happened
+specialists doctor                 # check for systemic issues
 ```
-The specialist runs with full bead context, on a model tuned for the task, while you stay unblocked.
+Options when a specialist fails:
+- **Steer** it back on track: `specialists steer <id> "Focus on X instead"`
+- **Switch specialist** (e.g., sync-docs stalls → try explorer or executor)
+- **Stop and report** to the user before doing it yourself
+### 3. Close beads and commit between waves
+Keep git clean between waves. Specialists write to the same worktree, so stacking
+uncommitted changes from multiple waves creates merge pain.
+### 4. Run drift detection after doc-heavy sessions
+```bash
+python3 .agents/skills/sync-docs/scripts/drift_detector.py scan --json
+# Then dispatch executor for any stale docs, stamp synced_at on fresh ones:
+python3 .agents/skills/sync-docs/scripts/drift_detector.py update-sync <file>
+```
 ---
@@ -137,22 +294,76 @@ Available after `specialists init` and session restart.
 |------|---------|
 | `specialist_init` | Bootstrap once per session |
 | `use_specialist` | Foreground run; pass `bead_id` for tracked work |
-| `start_specialist` | Async: returns job ID immediately |
-| `poll_specialist` | Check status + delta output |
-| `stop_specialist` | Cancel |
-| `run_parallel` | Concurrent or pipeline execution |
+| `start_specialist` | Async: returns job ID immediately (Supervisor-backed) |
+| `feed_specialist` | Cursor-paginated run events (status + deltas) |
+| `resume_specialist` | Next-turn prompt for keep-alive jobs in `waiting` |
+| `steer_specialist` | Mid-run steering message for active jobs |
+| `stop_specialist` | Cancel (sends SIGTERM to job PID) |
 | `specialist_status` | Circuit breaker health + staleness |
+### CLI vs MCP equivalences
+| Action | CLI | MCP |
+|--------|-----|-----|
+| Run foreground (one-shot) | `specialists run <name> --bead <id>` | `use_specialist({name, bead_id})` |
+| Run background (interactive-capable) | `specialists run <name> --bead <id> --background` | `start_specialist({name, bead_id})` |
+| Monitor events | `specialists feed <job-id>` | `feed_specialist({job_id, cursor})` |
+| Read result | `specialists result <job-id>` | — (CLI only) |
+| Steer mid-run | `specialists steer <job-id> "msg"` | `steer_specialist({job_id, message})` |
+| Resume waiting | `specialists resume <job-id> "task"` | `resume_specialist({job_id, task})` |
+| Cancel | `specialists stop <job-id>` | `stop_specialist({job_id})` |
+**Canonical pattern:** `use_specialist` is one-shot. For interactive review/analysis flows,
+use `start_specialist` then `resume_specialist`.
+**Prefer CLI** for most orchestration work — it's simpler and output is easier to inspect.
+**Use MCP** (`use_specialist`) when you need a direct one-shot result in your conversation context.
+---
+## feed_specialist Observation Pattern
+Use cursor-based polling for structured progress when monitoring long specialist runs:
+```bash
+# first read
+feed_specialist({job_id: "abc123"})
+# => {events:[...], next_cursor: 12, has_more: true, is_complete: false}
+# continue from cursor
+feed_specialist({job_id: "abc123", cursor: 12})
+# => {events:[...], next_cursor: 25, has_more: false, is_complete: true}
+```
+When `is_complete: true` and `has_more: false`, fetch final text with:
+```bash
+specialists result <job-id>
+```
+---
+## Known Issues
+- **sync-docs audit-first behavior is expected** for interactive review/audit flow.
+  Start normally, read the audit output, then `specialists resume <job-id> "approved: execute phase 4-5"`
+  if you want it to apply fixes.
+- **READ_ONLY output auto-appends** to the input bead after completion. No manual piping
+  needed (fixed in the Supervisor). But the output also lives in `specialists result`.
 ---
 ## Setup and Troubleshooting
 ```bash
-specialists init        # first-time setup: creates specialists/, wires AGENTS.md
+specialists init        # first-time setup: creates .specialists/, wires AGENTS.md/CLAUDE.md
 specialists doctor      # health check: hooks, MCP, zombie jobs
+specialists edit <name> # edit a specialist's YAML config
 ```
 - **"specialist not found"** → `specialists list` (project-scope only)
-- **Job hangs** → `specialists feed <id>`; `specialists stop` to cancel
+- **Job hangs** → `specialists steer <id> "finish up"` or `specialists stop <id>`
 - **MCP tools missing** → `specialists init` then restart Claude Code
 - **YAML skipped** → stderr shows `[specialists] skipping <file>: <reason>`
+- **Stall timeout** → specialist hit 120s inactivity. Check `specialists feed <id>`, then retry or switch specialist.
+- **`--prompt` and `--bead` conflict** → use bead notes: `bd update <id> --notes "INSTRUCTION: ..."` then `--bead` only.

package/config/specialists/debugger.specialist.yaml ADDED Viewed

@@ -0,0 +1,121 @@
+specialist:
+  metadata:
+    name: debugger
+    version: 1.2.0
+    description: >-
+      Autonomous debugger: given any symptom, error, or stack trace, systematically
+      traces call chains with GitNexus, identifies root cause at file:line precision,
+      ranks hypotheses, and delivers a prioritized, evidence-backed remediation plan.
+    category: debugging
+    tags:
+      - debugging
+      - root-cause
+      - investigation
+      - remediation
+      - gitnexus
+      - call-chain
+      - autonomous
+    updated: "2026-03-27"
+  execution:
+    mode: tool
+    model: anthropic/claude-sonnet-4-6
+    fallback_model: qwen-cli/qwen3-coder-plus
+    timeout_ms: 0
+    stall_timeout_ms: 120000
+    response_format: markdown
+    permission_required: LOW
+    thinking_level: low
+  prompt:
+    system: |
+      You are an autonomous debugger specialist. Given a symptom, error message, or
+      stack trace, you conduct a disciplined, tool-driven investigation to identify
+      the root cause and deliver an actionable remediation plan.
+      ## Investigation Workflow
+      Work through these phases in order. Stop as soon as you have enough evidence.
+      ### Phase 0 — GitNexus Triage (preferred, skip if unavailable)
+      Use the knowledge graph to orient yourself before touching any source files.
+      1. `gitnexus_query({query: "<error text or symptom>"})`
+      2. `gitnexus_context({name: "<suspect symbol>"})`
+      3. Read `gitnexus://repo/{name}/process/{processName}` for execution trace details
+      4. Optional: `gitnexus_cypher({query: "MATCH path = ..."})` for custom traversal
+      Then read source files only for pinpointed suspects — never the whole codebase.
+      ### Phase 1 — File Discovery (fallback if GitNexus unavailable)
+      Parse the symptom for candidate locations:
+      - stack trace file paths + line numbers
+      - module/import names in errors
+      - error codes or exception types tied to subsystems
+      Use `grep` and `find` to locate code quickly; read only relevant sections.
+      ### Phase 2 — Root Cause Analysis
+      Determine:
+      - the exact line/expression causing failure
+      - causal explanation of observed symptom
+      - whether root cause or downstream effect
+      - likely side effects on related components
+      ### Phase 3 — Hypothesis Ranking
+      Produce 3–5 ranked hypotheses, each with:
+      - hypothesis statement
+      - supporting evidence
+      - quick confirmation experiment/command
+      - confidence (HIGH/MEDIUM/LOW)
+      ### Phase 4 — Remediation Plan
+      Produce up to 5 prioritized remediation steps with:
+      - file/line scope
+      - expected outcome
+      - verification command
+      - residual risks
+      ## Output Format
+      Always output a complete **Bug Investigation Report**:
+      - Symptoms
+      - Investigation path (GitNexus traces or files analyzed)
+      - Root cause (with file:line references)
+      - Ranked hypotheses
+      - Fix plan
+      - Concise summary
+      EFFICIENCY RULE: Stop using tools and write the final report after at most 15 tool calls.
+    task_template: |
+      Debug the following issue:
+      $prompt
+      Working directory: $cwd
+      Start with gitnexus_query for the symptom/error text if GitNexus is available.
+      Then trace call chains with gitnexus_context. Read source files for pinpointed suspects.
+      Fall back to grep/find if GitNexus is unavailable. Produce a full Bug Investigation Report.
+  skills:
+    paths:
+      - .agents/skills/xt-debugging/SKILL.md
+  capabilities:
+    required_tools: [bash, grep, find, read]
+    external_commands: [grep]
+  validation:
+    files_to_watch:
+      - src/specialist/schema.ts
+      - src/specialist/runner.ts
+      - .agents/skills/xt-debugging/SKILL.md
+    stale_threshold_days: 30