@jaggerxtrm/specialists 3.3.1 → 3.3.3

Files changed (38)
  1. package/config/hooks/specialists-complete.mjs +60 -0
  2. package/config/hooks/specialists-session-start.mjs +120 -0
  3. package/config/skills/specialists-creator/SKILL.md +506 -0
  4. package/config/skills/specialists-creator/scripts/validate-specialist.ts +41 -0
  5. package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/old_skill/outputs/result.md +105 -0
  6. package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/with_skill/outputs/result.md +93 -0
  7. package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/old_skill/outputs/result.md +113 -0
  8. package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/with_skill/outputs/result.md +131 -0
  9. package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/old_skill/outputs/result.md +159 -0
  10. package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/with_skill/outputs/result.md +150 -0
  11. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/outputs/result.md +180 -0
  12. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/timing.json +5 -0
  13. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/outputs/result.md +223 -0
  14. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/timing.json +5 -0
  15. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/with_skill/timing.json +5 -0
  16. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/outputs/result.md +146 -0
  17. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/timing.json +5 -0
  18. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/outputs/result.md +89 -0
  19. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/timing.json +5 -0
  20. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/outputs/result.md +96 -0
  21. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/timing.json +5 -0
  22. package/config/skills/specialists-usage-workspace/skill-snapshot/SKILL.md.old +237 -0
  23. package/config/skills/using-specialists/SKILL.md +158 -0
  24. package/config/skills/using-specialists/evals/evals.json +68 -0
  25. package/config/specialists/.serena/project.yml +151 -0
  26. package/config/specialists/auto-remediation.specialist.yaml +70 -0
  27. package/config/specialists/bug-hunt.specialist.yaml +96 -0
  28. package/config/specialists/explorer.specialist.yaml +79 -0
  29. package/config/specialists/memory-processor.specialist.yaml +140 -0
  30. package/config/specialists/overthinker.specialist.yaml +63 -0
  31. package/config/specialists/parallel-runner.specialist.yaml +61 -0
  32. package/config/specialists/planner.specialist.yaml +87 -0
  33. package/config/specialists/specialists-creator.specialist.yaml +82 -0
  34. package/config/specialists/sync-docs.specialist.yaml +53 -0
  35. package/config/specialists/test-runner.specialist.yaml +58 -0
  36. package/config/specialists/xt-merge.specialist.yaml +78 -0
  37. package/dist/index.js +246 -214
  38. package/package.json +2 -3
@@ -0,0 +1,146 @@
1
+ # Code Review: src/specialist/runner.ts
2
+
3
+ ## Approach
4
+
5
+ I read the following files to build full context before reviewing:
6
+
7
+ - `/home/dawid/projects/specialists/src/specialist/runner.ts` — the target file (303 lines)
8
+ - `/home/dawid/projects/specialists/src/specialist/beads.ts` — BeadsClient interface
9
+ - `/home/dawid/projects/specialists/src/specialist/jobRegistry.ts` — async job state management
10
+ - `/home/dawid/projects/specialists/src/specialist/hooks.ts` — HookEmitter
11
+ - `/home/dawid/projects/specialists/src/specialist/loader.ts` — SpecialistLoader
12
+ - `/home/dawid/projects/specialists/src/specialist/schema.ts` — Zod schema / types
13
+ - `/home/dawid/projects/specialists/src/specialist/templateEngine.ts` — renderTemplate
14
+ - `/home/dawid/projects/specialists/src/pi/session.ts` — PiAgentSession, SessionKilledError
15
+ - `/home/dawid/projects/specialists/src/utils/circuitBreaker.ts` — CircuitBreaker
16
+ - `/home/dawid/projects/specialists/tests/unit/specialist/runner.test.ts`
17
+ - `/home/dawid/projects/specialists/tests/unit/specialist/runner-scripts.test.ts`
18
+
19
+ ---
20
+
21
+ ## Findings
22
+
23
+ ### Bug: Duplicate `sessionBackend` assignment (lines 208 and 210)
24
+
25
+ ```typescript
26
+ sessionBackend = session.meta.backend;
27
+ output = await session.getLastOutput();
28
+ sessionBackend = session.meta.backend; // capture before finally calls kill()
29
+ ```
30
+
31
+ `sessionBackend` is assigned on line 208 and again, identically, on line 210. The comment on line 210 says "capture before finally calls kill()", but line 208 already does that. Because `getLastOutput()` (line 209) is async and can throw, only an assignment placed before it is guaranteed to run: if it throws, control leaves the `try` block and line 210 is never reached. The second assignment therefore adds no protection; it is redundant and its comment is misleading.
32
+
33
+ This is not a functional bug in practice (the value is the same both times), but it creates misleading intent and maintenance risk.
34
+
35
+ ### Bug: `runScript` does not sanitise or validate the script path
36
+
37
+ ```typescript
38
+ function runScript(scriptPath: string): ScriptResult {
39
+   try {
40
+     const output = execSync(scriptPath, { encoding: 'utf8', timeout: 30_000 });
41
+ ```
42
+
43
+ `execSync` is called with a raw string from the specialist YAML (`s.path`). This passes the string directly to a shell. If a specialist YAML is written with a path containing shell metacharacters or if it is loaded from an untrusted source, this enables shell injection. A safer approach is to use `spawnSync` with an args array (the same pattern already used in `beads.ts`), or at minimum validate that the path refers to an actual file before executing. The same risk applies to path values that start with `~/` or `./` that are interpolated without being resolved first.
44
+
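The `spawnSync` suggestion above could look roughly like the following. This is a minimal sketch, not the package's implementation: the `ScriptResult` shape, the helper names `resolveScriptPath`, `isRegularFile`, and `runScriptSafely`, and the exit code `127` for a missing file are all assumptions made for illustration.

```typescript
import { spawnSync } from 'node:child_process';
import { statSync } from 'node:fs';
import { homedir } from 'node:os';
import { isAbsolute, resolve } from 'node:path';

// Hypothetical result shape; the real ScriptResult in runner.ts may differ.
interface ScriptResult {
  path: string;
  output: string;
  exitCode: number;
}

// Expand `~/` and relative paths to an absolute path before executing.
function resolveScriptPath(scriptPath: string, baseDir: string = process.cwd()): string {
  if (scriptPath.startsWith('~/')) return resolve(homedir(), scriptPath.slice(2));
  return isAbsolute(scriptPath) ? scriptPath : resolve(baseDir, scriptPath);
}

// True only when the path exists and is a regular file.
function isRegularFile(path: string): boolean {
  try {
    return statSync(path).isFile();
  } catch {
    return false;
  }
}

function runScriptSafely(scriptPath: string): ScriptResult {
  const resolved = resolveScriptPath(scriptPath);
  if (!isRegularFile(resolved)) {
    // 127 is an arbitrary choice here, mirroring the shell's "command not found".
    return { path: scriptPath, output: `script not found: ${resolved}`, exitCode: 127 };
  }
  // No shell is involved: the resolved path is the executable, with an empty args array.
  const proc = spawnSync(resolved, [], { encoding: 'utf8', timeout: 30_000 });
  if (proc.error) {
    return { path: scriptPath, output: proc.error.message, exitCode: 1 };
  }
  return { path: scriptPath, output: proc.stdout, exitCode: proc.status ?? 1 };
}
```

One trade-off of this approach: a YAML `path` that currently embeds arguments (for example `scripts/check.sh --fast`) would stop working, because the whole string is treated as a file path. Supporting arguments would need a separate, explicitly parsed field.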
45
+ ### Edge case: Pre-script filter logic is subtle and fragile (lines 119–121)
46
+
47
+ ```typescript
48
+ const preResults = preScripts
49
+   .map(s => runScript(s.path))
50
+   .filter((_, i) => preScripts[i].inject_output);
51
+ ```
52
+
53
+ All pre-scripts run unconditionally (`.map`), but only the ones with `inject_output: true` are collected in `preResults`. This means all scripts execute as side effects, while only some contribute output to the prompt. This behaviour is intentional per the test in `runner-scripts.test.ts` ("Script still runs (for side effects)"), but it is not documented at the call site. The coupling between the mapped index and `preScripts[i]` is fragile: if a future refactor introduces a flatMap or reorders the chain, the index-based correlation silently breaks. Extracting this into a named function or a `reduce` would make it safer.
54
+
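The named-function extraction suggested above might be sketched as follows. The `ScriptConfig` and `ScriptResult` shapes and the name `runPreScripts` are assumptions for illustration; only `path` and `inject_output` come from the reviewed code.

```typescript
// Hypothetical shapes; only `path` and `inject_output` appear in the reviewed code.
interface ScriptConfig {
  path: string;
  inject_output?: boolean;
}

interface ScriptResult {
  path: string;
  output: string;
}

// Runs every pre-script (for side effects) and returns only the results
// whose script opted in via `inject_output`. Each result is paired with its
// own config inside the loop, so no index correlation is needed.
function runPreScripts(
  scripts: ScriptConfig[],
  run: (path: string) => ScriptResult,
): ScriptResult[] {
  const injected: ScriptResult[] = [];
  for (const script of scripts) {
    const result = run(script.path);
    if (script.inject_output) injected.push(result);
  }
  return injected;
}
```

Passing `run` as a parameter is a design choice for this sketch only; it keeps the function testable without spawning processes.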
55
+ ### Edge case: `post_execute` is emitted once per run, but not if the `finally` block throws
56
+
57
+ On success, `post_execute` is emitted at line 250 with `status: 'COMPLETE'`. On failure, it is emitted at line 233 inside the catch block with `status: 'ERROR'` or `'CANCELLED'`. This is correct.
58
+
59
+ However, there is no `post_execute` emission in the `finally` block. This means if the `finally` block itself somehow throws (unlikely since `session?.kill()` is a no-op after `close()`), no hook is emitted. This is a very low risk edge case but worth noting for observability completeness.
60
+
61
+ ### Edge case: Circuit breaker records success against `model`, not `sessionBackend`
62
+
63
+ ```typescript
64
+ circuitBreaker.recordSuccess(model);
65
+ ```
66
+
67
+ `model` is the resolved model string (possibly a fallback). `sessionBackend` is the actual backend string returned from the pi session meta (e.g. `"google-gemini-cli"`). These may differ in format. The circuit breaker is keyed by whatever string is passed — so `recordSuccess("gemini")` and `recordFailure("gemini")` must use the same key. Currently both use `model`, which is consistent. But `onMeta` can update `sessionBackend` to a different string (`provider` from pi's message_start event). If a caller were ever to rely on `sessionBackend` as a circuit-breaker key, mismatches would occur silently. The inconsistency in semantics (model string vs. backend/provider string) is worth documenting.
68
+
69
+ ### Edge case: `sessionPath` option is declared but never used
70
+
71
+ ```typescript
72
+ export interface RunOptions {
73
+   /** Path to an existing pi session file for continuation (Phase 2+) */
74
+   sessionPath?: string;
75
+ }
76
+ ```
77
+
78
+ `sessionPath` is accepted in `RunOptions` but never read or passed to the session factory. The comment says "Phase 2+" implying it is intentionally deferred, but there is no guard, warning, or error when it is provided. A caller passing `sessionPath` expecting continuation behaviour would get silent no-op behaviour instead.
79
+
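One way to avoid the silent no-op is an explicit guard at the top of `run()`. The sketch below is an assumption about how such a guard could look; `assertSupportedOptions` is a hypothetical helper, and the reduced `RunOptions` shape includes only the fields needed for the example.

```typescript
// Reduced, hypothetical version of RunOptions for illustration.
interface RunOptions {
  prompt: string;
  /** Path to an existing pi session file for continuation (Phase 2+) */
  sessionPath?: string;
}

// Fails loudly instead of silently ignoring an option that is not wired up yet.
function assertSupportedOptions(options: RunOptions): void {
  if (options.sessionPath !== undefined) {
    throw new Error(
      'sessionPath is not supported yet (planned for Phase 2+); session continuation would be ignored',
    );
  }
}
```

A warning on stderr would be a softer alternative if throwing is considered too disruptive for existing callers.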
80
+ ### Edge case: Registry `backend` field is initialised from `backendOverride` or the `'starting'` sentinel
81
+
82
+ ```typescript
83
+ registry.register(jobId, {
84
+   backend: options.backendOverride ?? 'starting',
85
+   model: '?',
86
+   specialistVersion,
87
+ });
88
+ ```
89
+
90
+ When `backendOverride` is not set, the registry records `backend: 'starting'`. This is a sentinel, not an actual backend name. If a polling client reads the snapshot before `onMeta` fires, it sees `backend: 'starting'` — which is fine for display. However if the caller passes `backendOverride`, it is used as the initial `backend` value even though the actual backend used by the pi session may differ (since `backendOverride` goes through circuit breaker fallback logic). This means an async job snapshot may show an incorrect backend name until `onMeta` fires and `setMeta` is called.
91
+
92
+ ### Code quality: Duplicated JSDoc comment (lines 274–277)
93
+
94
+ ```typescript
95
+ /** Fire-and-forget: registers job in registry, returns job_id immediately. */
96
+ /** Fire-and-forget: registers job in registry, returns job_id immediately. */
97
+ /** Fire-and-forget: registers job in registry, returns job_id immediately. */
98
+ /** Fire-and-forget: registers job in registry, returns job_id immediately. */
99
+ async startAsync(...)
100
+ ```
101
+
102
+ The same JSDoc comment is repeated four times. This is a copy-paste artifact with no functional impact, but it is noise that should be cleaned up.
103
+
104
+ ### Code quality: Dynamic import inside hot path (line 145)
105
+
106
+ ```typescript
107
+ const { readFile } = await import('node:fs/promises');
108
+ ```
109
+
110
+ `readFile` is dynamically imported inside the `run()` method. The static import of `writeFile` from the same module already exists at line 3. This is redundant — `readFile` should be added to the existing static import at the top of the file. Dynamic imports inside hot-path async functions add minor per-call overhead (though Node.js caches them) and make the dependency surface less obvious at a glance.
111
+
112
+ ### Code quality: `SessionFactory` return type picks more members than the runner uses
113
+
114
+ ```typescript
115
+ export type SessionFactory = (opts: PiSessionOptions) => Promise<Pick<PiAgentSession, 'start' | 'prompt' | 'waitForDone' | 'getLastOutput' | 'getState' | 'close' | 'kill' | 'meta'>>;
116
+ ```
117
+
118
+ `getState` is listed in the `Pick` but is never called on the session inside `runner.ts`. The test mock includes it too. Removing unused methods from the `Pick` would tighten the contract and make mocking simpler, though this is a low-priority style concern.
119
+
120
+ ### Code quality: `beadVariables` uses `options.prompt` for `bead_context` (line 125)
121
+
122
+ ```typescript
123
+ const beadVariables = options.inputBeadId
124
+   ? { bead_context: options.prompt, bead_id: options.inputBeadId }
125
+   : {};
126
+ ```
127
+
128
+ When `inputBeadId` is set, `bead_context` is set to `options.prompt`. The intention appears to be: if the task was sourced from a bead, expose the bead's content as `$bead_context`. But the caller already provides the bead content as `options.prompt` — so this is effectively duplicating the prompt. The test confirms this is deliberate ("Bead=# Task: Refactor auth" equals the prompt value). The variable naming is slightly misleading: `bead_context` sounds like extra context added to the prompt, but it is actually the same as `$prompt`. A comment clarifying this equivalence would help future readers.
129
+
130
+ ---
131
+
132
+ ## Summary Table
133
+
134
+ | Severity | Issue | Location |
135
+ |----------|-------|----------|
136
+ | Low (bug, cosmetic) | Duplicate `sessionBackend` assignment | Lines 208, 210 |
137
+ | Medium (security) | `execSync` with unsanitised shell string | Line 59, `runScript` function |
138
+ | Low (fragility) | Index-coupled filter for pre-script inject logic | Lines 119–121 |
139
+ | Low (silent no-op) | `sessionPath` option accepted but never used | `RunOptions`, `run()` |
140
+ | Low (misleading state) | `backendOverride` used as initial registry backend before `onMeta` fires | `startAsync`, lines 287–290 |
141
+ | Trivial | Quadruplicated JSDoc comment on `startAsync` | Lines 274–277 |
142
+ | Low (style) | Dynamic import of `readFile` inside hot path | Line 145 |
143
+ | Low (style) | `getState` in `SessionFactory` Pick but never called | Line 32 |
144
+ | Informational | `bead_context` === `prompt` when `inputBeadId` set — naming could mislead | Lines 125–126 |
145
+
146
+ The most impactful issue is the shell injection risk in `runScript`. Everything else is low severity or cosmetic.
@@ -0,0 +1,5 @@
1
+ {
2
+ "total_tokens": 48186,
3
+ "duration_ms": 109896,
4
+ "total_duration_seconds": 109.9
5
+ }
@@ -0,0 +1,89 @@
1
+ # Test Coverage Task — With Skill
2
+
3
+ ## Task
4
+
5
+ Identify and fill coverage gaps in `tests/unit/specialist/loader.test.ts` for `src/specialist/loader.ts`.
6
+
7
+ ## Skill Guidance Applied
8
+
9
+ Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md` before starting.
10
+
11
+ The skill says to delegate when a task:
12
+ - Would take >5 minutes of focused work
13
+ - Spans multiple files or modules
14
+ - Benefits from a fresh, focused run
15
+
16
+ This task involved analyzing a source file, cross-referencing against an existing test file, identifying untested behaviors, writing new tests across two describe blocks, and running the suite to verify. It spans 2 source files + schema context, and would take >5 minutes to do carefully. Decision: delegate.
17
+
18
+ ## What Was Done
19
+
20
+ ### Step 1: Read the skill
21
+
22
+ Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md`. Confirmed delegation is appropriate.
23
+
24
+ ### Step 2: Gathered context (files read in parallel)
25
+
26
+ - `/home/dawid/projects/specialists/src/specialist/loader.ts` — source under test
27
+ - `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — existing tests
28
+ - `/home/dawid/projects/specialists/src/specialist/schema.ts` — schema used by loader
29
+
30
+ ### Step 3: Listed available specialists
31
+
32
+ Ran `specialists list`. Found `test-runner` (anthropic/claude-haiku-4-5) — focused on running tests, interpreting failures, and writing fixes.
33
+
34
+ ### Step 4: Delegated via start_specialist
35
+
36
+ Used `mcp__specialists__start_specialist` with specialist `test-runner` and a detailed prompt containing:
37
+ - Full source of `loader.ts`
38
+ - Full source of existing `loader.test.ts`
39
+ - Explicit list of all coverage gaps identified during pre-analysis
40
+ - Instructions to write tests, run the suite, and fix failures
41
+
42
+ Job ID: `c28f2b8f-39cf-4903-8d88-c815c1716c0c`
43
+ Bead: `unitAI-laiv`
44
+
45
+ ### Step 5: Polled for results
46
+
47
+ Used `mcp__specialists__poll_specialist` to wait for completion. Job completed in ~53 seconds with status `done`.
48
+
49
+ ### Step 6: Verified
50
+
51
+ Read the updated test file and ran `bun --bun vitest run tests/unit/specialist/loader.test.ts` locally. Result: **27 tests passed, 0 failed**.
52
+
53
+ ## Coverage Gaps Filled
54
+
55
+ ### checkStaleness (entire function — zero prior coverage)
56
+
57
+ Added `describe('checkStaleness', ...)` block with 10 tests:
58
+ - Returns `OK` when `filestoWatch` is absent
59
+ - Returns `OK` when `filestoWatch` is empty
60
+ - Returns `OK` when `updated` is absent
61
+ - Returns `OK` when `updated` is an invalid date string
62
+ - Returns `OK` when watched files have NOT changed since `updated`
63
+ - Returns `OK` when watched file does not exist (stat fails gracefully)
64
+ - Returns `STALE` when a watched file was modified after `updated`
65
+ - Returns `AGED` when stale AND `daysSinceUpdate > staleThresholdDays`
66
+ - Returns `STALE` (not `AGED`) when stale but within threshold
67
+ - Returns `STALE` when stale and no `staleThresholdDays` set
68
+
69
+ ### SpecialistLoader — additional it() blocks
70
+
71
+ - Discovers specialists in `.claude/specialists/` with `scope='project'`
72
+ - Discovers specialists in `.agent-forge/specialists/` with `scope='project'`
73
+ - Discovers specialists in user dir with `scope='user'`
74
+ - `list(category)` filters by category
75
+ - `list(category)` returns empty when no matches
76
+ - Ignores non-`.specialist.yaml` files
77
+ - `invalidateCache(name)` clears only the named entry
78
+ - `invalidateCache()` with no args clears all entries
79
+ - `get()` resolves `~/` prefixed skill paths to homedir-relative absolute paths
80
+ - `get()` resolves `./` prefixed skill paths relative to specialist file directory
81
+ - `get()` leaves absolute skill paths unchanged
82
+
83
+ ## Files Modified
84
+
85
+ - `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — 6 original tests, 21 new tests added (27 total)
86
+
87
+ ## Why Delegate vs Do Myself
88
+
89
+ The skill explicitly says "when in doubt, delegate" and that delegation produces higher quality results because the specialist has no competing context. Even though I had read both files before delegating, the analysis + careful test writing across multiple describe blocks, plus the test run + verification loop, justified delegation. The specialist completed in 53 seconds with all tests passing on first attempt — no iteration needed.
@@ -0,0 +1,5 @@
1
+ {
2
+ "total_tokens": 38715,
3
+ "duration_ms": 205928,
4
+ "total_duration_seconds": 205.9
5
+ }
@@ -0,0 +1,96 @@
1
+ # Test Coverage Gap Analysis and Additions: loader.ts
2
+
3
+ ## Files Read
4
+
5
+ - `/home/dawid/projects/specialists/src/specialist/loader.ts` — source under test
6
+ - `/home/dawid/projects/specialists/src/specialist/schema.ts` — Zod schema for Specialist type
7
+ - `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — existing tests (read before and after edits)
8
+
9
+ ## Existing Coverage (6 tests)
10
+
11
+ | Test | What it covers |
12
+ |------|----------------|
13
+ | discovers specialists in project specialists/ dir | `list()` finds `specialists/` dir, returns scope=project |
14
+ | returns empty list when no specialists | `list()` on empty dirs |
15
+ | loads and caches a specialist by name | `get()` basic load + cache hit (same reference) |
16
+ | throws when specialist not found | `get()` error path |
17
+ | warns to stderr and skips invalid YAML | `list()` error handling, stderr output |
18
+ | project-level overrides user-level (same name) | deduplication, first-wins logic |
19
+
20
+ ## Coverage Gaps Identified
21
+
22
+ ### 1. `checkStaleness()` — entirely untested (0 coverage)
23
+
24
+ This is an exported async function with the following return paths:
25
+ - Returns `'OK'` when `filestoWatch` is absent or empty
26
+ - Returns `'OK'` when `updated` is absent
27
+ - Returns `'OK'` when `updated` is an invalid date string (NaN guard)
28
+ - Returns `'OK'` when watched file does not exist on disk (`.catch(() => null)`)
29
+ - Returns `'OK'` when all watched files have mtimes older than `updated`
30
+ - Returns `'STALE'` when a watched file was modified after `updated`
31
+ - Returns `'AGED'` when stale AND `daysSinceUpdate > staleThresholdDays`
32
+ - Returns `'STALE'` (not `'AGED'`) when stale but within threshold, or no threshold set
33
+
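The return paths listed above can be summarised in a short sketch. This is a reconstruction from the listed behaviours only, not the code in `loader.ts`: the input shape, the `now` and `mtimeOf` parameters, and the name `checkStalenessSketch` are assumptions, while `filestoWatch`, `updated`, `staleThresholdDays`, and the three return values come from the text.

```typescript
import { stat } from 'node:fs/promises';

type Staleness = 'OK' | 'STALE' | 'AGED';

// Hypothetical input shape; field names follow the test descriptions.
interface StalenessInput {
  filestoWatch?: string[];
  updated?: string;
  staleThresholdDays?: number;
}

type MtimeReader = (path: string) => Promise<number | null>;

// A missing or unreadable file yields null instead of an error.
const readMtime: MtimeReader = (path) =>
  stat(path).then((s) => s.mtimeMs).catch(() => null);

async function checkStalenessSketch(
  meta: StalenessInput,
  now: number = Date.now(),
  mtimeOf: MtimeReader = readMtime,
): Promise<Staleness> {
  if (!meta.filestoWatch || meta.filestoWatch.length === 0) return 'OK';
  if (!meta.updated) return 'OK';

  const updatedMs = new Date(meta.updated).getTime();
  if (Number.isNaN(updatedMs)) return 'OK'; // invalid date string

  const mtimes = await Promise.all(meta.filestoWatch.map((file) => mtimeOf(file)));
  const isStale = mtimes.some((m) => m !== null && m > updatedMs);
  if (!isStale) return 'OK';

  if (meta.staleThresholdDays !== undefined) {
    const daysSinceUpdate = (now - updatedMs) / 86_400_000; // ms per day
    if (daysSinceUpdate > meta.staleThresholdDays) return 'AGED';
  }
  return 'STALE';
}
```

Injecting `now` and `mtimeOf` is a choice made for this sketch so that each path can be exercised without touching the file system or the clock.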
34
+ ### 2. Alternate project-scope discovery dirs — untested
35
+
36
+ `getScanDirs()` scans three project directories: `specialists/`, `.claude/specialists/`, `.agent-forge/specialists/`. Only the first was tested.
37
+
38
+ ### 3. User-scope listing — untested
39
+
40
+ No test verified that a specialist found in `userDir` gets `scope: 'user'`.
41
+
42
+ ### 4. `list()` category filter — untested
43
+
44
+ The `category` parameter to `list()` was never exercised, including the case where it matches nothing.
45
+
46
+ ### 5. Non-.specialist.yaml files are ignored — untested
47
+
48
+ The `.filter(f => f.endsWith('.specialist.yaml'))` guard had no explicit test.
49
+
50
+ ### 6. `get()` skills path resolution — untested (3 branches)
51
+
52
+ `get()` resolves `~/`, `./`, and absolute paths differently. None of the three branches were tested.
53
+
54
+ ### 7. `invalidateCache()` — untested (2 branches)
55
+
56
+ - `invalidateCache(name)` — removes one entry, leaves others intact
57
+ - `invalidateCache()` — clears the entire cache
58
+
59
+ ## Tests Added (21 new tests)
60
+
61
+ ### SpecialistLoader describe block (new tests)
62
+
63
+ 1. **discovers specialists in .claude/specialists/ dir with project scope** — covers second scan dir
64
+ 2. **discovers specialists in .agent-forge/specialists/ dir with project scope** — covers third scan dir
65
+ 3. **discovers specialists in user dir with user scope** — covers user scope
66
+ 4. **filters list() by category** — category filter returns only matching specialists
67
+ 5. **list() returns an empty list when category filter matches none** — empty result from filter
68
+ 6. **ignores files that do not end with .specialist.yaml** — extension filter guard
69
+ 7. **invalidateCache() by name removes only that entry** — partial cache invalidation
70
+ 8. **invalidateCache() without name clears all cached entries** — full cache clear
71
+ 9. **get() resolves ~/ prefixed skill paths to absolute home-relative paths** — tilde expansion
72
+ 10. **get() resolves ./ prefixed skill paths relative to specialist file directory** — relative expansion
73
+ 11. **get() leaves absolute skill paths unchanged** — absolute passthrough
74
+
75
+ ### checkStaleness describe block (10 new tests)
76
+
77
+ 1. **returns OK when filestoWatch is absent** — no watch config
78
+ 2. **returns OK when filestoWatch is empty** — empty array guard
79
+ 3. **returns OK when updated is absent** — missing updated field
80
+ 4. **returns OK when updated is an invalid date string** — NaN guard
81
+ 5. **returns OK when all watched files have not changed since updated** — files older than updated
82
+ 6. **returns OK when watched file does not exist** — stat failure catch
83
+ 7. **returns STALE when a watched file was modified after updated** — core STALE path
84
+ 8. **returns AGED when file is stale and daysSinceUpdate exceeds staleThresholdDays** — AGED path
85
+ 9. **returns STALE (not AGED) when stale but daysSinceUpdate is within staleThresholdDays** — threshold not exceeded
86
+ 10. **returns STALE when stale and no staleThresholdDays is set** — stale without threshold
87
+
88
+ ## Results
89
+
90
+ ```
91
+ Tests: 27 passed (6 original + 21 new)
92
+ Test Files: 1 passed
93
+ Duration: ~67ms
94
+ ```
95
+
96
+ All tests pass. No source files were modified.
@@ -0,0 +1,5 @@
1
+ {
2
+ "total_tokens": 30589,
3
+ "duration_ms": 124787,
4
+ "total_duration_seconds": 124.8
5
+ }
@@ -0,0 +1,237 @@
1
+ ---
2
+ name: specialists-usage
3
+ description: >
4
+ How to use the specialists MCP server and CLI to delegate work to specialist AI agents.
5
+ Use this skill whenever you need to run a specialist, decide whether to delegate a task,
6
+ track work with --bead, monitor background jobs, read results, or understand the
7
+ bead-first workflow. Also use it when the user asks about specialists run, feed, result,
8
+ stop, list, init, doctor, or any MCP tool like use_specialist or start_specialist.
9
+ Consult this skill proactively when planning any task that might benefit from delegation —
10
+ don't wait for the user to explicitly say "use a specialist".
11
+ version: 2.0
12
+ ---
13
+
14
+ # Specialists Usage
15
+
16
+ > Specialists are autonomous AI agents optimised for heavy tasks. Use them instead of doing
17
+ > the work yourself when the task benefits from a dedicated expert, a second opinion, or a
18
+ > model tuned for the workload.
19
+
20
+ ## When to Use a Specialist
21
+
22
+ | Use a specialist | Do it yourself |
23
+ |-----------------|---------------|
24
+ | Code review / security audit | Single-file edit |
25
+ | Deep bug investigation | Quick config change |
26
+ | Architecture analysis | Short read-only query |
27
+ | Test generation for a module | Obvious one-liner |
28
+ | Refactoring across many files | Simple documentation update |
29
+ | Long-running analysis (>5 min) | Trivial formatting fix |
30
+
31
+ **Rule of thumb**: if the task would take you >5 minutes or benefit from a second opinion, delegate it.
32
+
33
+ ---
34
+
35
+ ## Primary Workflow: Bead-First (Tracked Work)
36
+
37
+ The canonical pattern for any real work. Always use `--bead` for tracked tasks.
38
+
39
+ ```bash
40
+ # 1. Create a bead to track the work
41
+ bd create --title "Review auth module for security issues" --type task --priority 2
42
+
43
+ # 2. Run the specialist, passing the bead ID
44
+ specialists run code-review --bead unitAI-abc [--context-depth 1] [--background]
45
+
46
+ # 3. Monitor progress
47
+ specialists feed -f # follow all active jobs
48
+
49
+ # 4. Close the bead when done
50
+ bd close unitAI-abc --reason "Review complete, 3 issues found"
51
+ ```
52
+
53
+ **`--context-depth N`** — controls how much bead context is injected into the specialist's
54
+ system prompt. Defaults to `1` (immediate bead only). Increase for deeper dependency context.
55
+
56
+ **`--no-beads`** — skips creating a new tracking bead for this run. Does **not** disable
57
+ reading the input bead passed via `--bead`. Use when you don't want an auto-created sub-issue.
58
+
59
+ ---
60
+
61
+ ## Secondary Workflow: Ad-Hoc (Untracked Work)
62
+
63
+ For quick, exploratory, or one-off tasks with no issue to track.
64
+
65
+ ```bash
66
+ specialists run codebase-explorer --prompt "Map the CLI architecture"
67
+ ```
68
+
69
+ Use `--prompt` only when there's no bead. For anything worth tracking, use `--bead`.
70
+
71
+ ---
72
+
73
+ ## Discovery
74
+
75
+ ```bash
76
+ specialists list # all specialists in this project
77
+ specialists list --category analysis # filter by category
78
+ specialists list --json # machine-readable
79
+ ```
80
+
81
+ Specialists are **project-scoped only** — loaded from `./specialists/*.specialist.yaml`.
82
+ User-scope (`~/.specialists/`) is deprecated.
83
+
84
+ ---
85
+
86
+ ## Running a Specialist
87
+
88
+ ### Foreground (streams output in real time)
89
+
90
+ ```bash
91
+ specialists run <name> --bead <id>
92
+ specialists run <name> --prompt "..." # ad-hoc only
93
+ ```
94
+
95
+ - Output streams to stdout as tokens arrive
96
+ - Ctrl+C sends SIGTERM (clean stop)
97
+ - Exit code 0 = success
98
+
99
+ ### Background (returns immediately, job runs async)
100
+
101
+ ```bash
102
+ specialists run <name> --bead <id> --background
103
+ # → Job started: job_a1b2c3d4
104
+ ```
105
+
106
+ Use background mode for tasks that will take >30 seconds, or when you want to keep working
107
+ while the specialist runs.
108
+
109
+ ### Other run flags
110
+
111
+ | Flag | Purpose |
112
+ |------|---------|
113
+ | `--model <model>` | Override model for this run only |
114
+ | `--no-beads` | Skip creating auto-tracking bead (still reads `--bead` input) |
115
+ | `--context-depth N` | Bead context depth, default 1 |
116
+ | stdin | Pipe a prompt: `cat brief.md \| specialists run code-review --bead <id>` |
117
+
118
+ ---
119
+
120
+ ## Background Job Lifecycle
121
+
122
+ ```
123
+ specialists run --background
124
+
125
+
126
+ job_a1b2c3d4 [starting]
127
+
128
+
129
+ job_a1b2c3d4 [running] ← specialists feed -f
130
+
131
+ ├─► done → specialists result <id>
132
+ └─► error → specialists feed <id> (see error event)
133
+ ```
134
+
135
+ ### Monitor with feed
136
+
137
+ ```bash
138
+ specialists feed # snapshot of all jobs
139
+ specialists feed -f # follow all active jobs (live)
140
+ specialists feed job_a1b2c3d4 --follow # follow a specific job
141
+ ```
142
+
143
+ Event types in the feed:
144
+ - `text` — streamed output token
145
+ - `thinking` — model reasoning token
146
+ - `tool` — specialist calling a tool (phase: start/end)
147
+ - `run_complete` — specialist finished
148
+
149
+ ### Read the result
150
+
151
+ ```bash
152
+ specialists result job_a1b2c3d4 # prints output, exits 1 if still running
153
+ specialists result job_a1b2c3d4 > out.md # capture to file
154
+ ```
155
+
156
+ ### Cancel
157
+
158
+ ```bash
159
+ specialists stop job_a1b2c3d4 # sends SIGTERM
160
+ ```
161
+
162
+ ### Completion Banner
163
+
164
+ When a background job completes, the next prompt you submit will show:
165
+
166
+ ```
167
+ [Specialist 'code-review' completed (job job_a1b2c3d4, 42s). Run: specialists result job_a1b2c3d4]
168
+ ```
169
+
170
+ This is injected by the `specialists-complete` hook. Retrieve the result with the shown command.
171
+
172
+ ---
173
+
174
+ ## MCP Tools (Claude Code)
175
+
176
+ Available after `specialists init` has been run and Claude Code restarted.
177
+
178
+ | Tool | When to use |
179
+ |------|-------------|
180
+ | `specialist_init` | **Start of every session** — bootstraps context, lists specialists |
181
+ | `list_specialists` | Discover specialists programmatically |
182
+ | `use_specialist` | **Preferred for foreground runs** — full lifecycle, pass `bead_id` for tracked work |
183
+ | `start_specialist` | Start async job, returns job ID immediately |
184
+ | `poll_specialist` | Check job status and read delta output |
185
+ | `stop_specialist` | Cancel a running job |
186
+ | `run_parallel` | Run multiple specialists concurrently or as a pipeline |
187
+ | `specialist_status` | Circuit breaker health + staleness info |
188
+
189
+ **Recommended pattern for tracked work:**
190
+
191
+ ```
192
+ 1. specialist_init ← bootstrap once per session
193
+ 2. use_specialist(name, prompt, bead_id=id) ← foreground, bead context injected
194
+ OR
195
+ 2. start_specialist(name, prompt, bead_id=id) ← async for long tasks
196
+ 3. poll_specialist(job_id) ← repeat until status=done
197
+ ```
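The async branch of the pattern above (start, then poll until `status=done`) could be driven by a loop like the one below. This is a sketch under stated assumptions: the `SpecialistTools` interface, the payload shapes, and the `runAndWait` helper are hypothetical, since the skill does not show the actual MCP request or response formats. Only the tool names, the `done` and `error` states, and the fact that polling returns delta output come from the text.

```typescript
// Hypothetical payload shapes; the real MCP tool responses may differ.
interface StartResult {
  job_id: string;
}

interface PollResult {
  status: string;
  output?: string;
}

interface SpecialistTools {
  start_specialist(args: { name: string; prompt: string; bead_id: string }): Promise<StartResult>;
  poll_specialist(args: { job_id: string }): Promise<PollResult>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Starts an async job, then polls until it reports done or error.
// Delta output from each poll is accumulated into one string.
async function runAndWait(
  tools: SpecialistTools,
  name: string,
  prompt: string,
  beadId: string,
  intervalMs: number = 2_000,
  maxPolls: number = 300,
): Promise<string> {
  const { job_id } = await tools.start_specialist({ name, prompt, bead_id: beadId });
  let output = '';
  for (let i = 0; i < maxPolls; i++) {
    const poll = await tools.poll_specialist({ job_id });
    output += poll.output ?? '';
    if (poll.status === 'done') return output;
    if (poll.status === 'error') throw new Error(`job ${job_id} failed`);
    await sleep(intervalMs);
  }
  throw new Error(`job ${job_id} did not finish after ${maxPolls} polls`);
}
```

The `maxPolls` cap is an assumption added so that a stalled job cannot keep the caller waiting forever.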
198
+
199
+ ---
200
+
201
+ ## Project Setup
202
+
203
+ ```bash
204
+ specialists init # creates specialists/, .specialists/, injects AGENTS.md workflow
205
+ specialists list # verify specialists are discovered
206
+ ```
207
+
208
+ Run `specialists init` once per project root. `specialists install` and `specialists setup`
209
+ are **deprecated** — they redirect to `init`.
210
+
211
+ ---
212
+
213
+ ## Editing Specialists
214
+
215
+ ```bash
216
+ specialists edit code-review --model anthropic/claude-sonnet-4-6
217
+ specialists edit code-review --timeout 180000
218
+ specialists edit code-review --permission HIGH
219
+ specialists edit code-review --description "Updated description"
220
+ specialists edit code-review --dry-run # preview without writing
221
+ ```
222
+
223
+ ---
224
+
225
+ ## Troubleshooting
226
+
227
+ ```bash
228
+ specialists doctor # detailed checks with fix hints (hooks, MCP, zombie jobs, pi)
229
+ specialists status # system health overview
230
+ ```
231
+
232
+ Common issues:
233
+ - **"specialist not found"** → run `specialists list`; only project-scope is searched
234
+ - **Job hangs** → check `specialists feed <id>` for stall; use `specialists stop`
235
+ - **MCP tools missing** → run `specialists init` then restart Claude Code
236
+ - **Hook not firing** → run `specialists doctor` to verify hook wiring
237
+ - **Invalid YAML skipped** → warnings now print to stderr with filename and reason