@jaggerxtrm/specialists 3.3.1 → 3.3.2
- package/config/hooks/specialists-complete.mjs +60 -0
- package/config/hooks/specialists-session-start.mjs +120 -0
- package/config/skills/specialists-creator/SKILL.md +506 -0
- package/config/skills/specialists-creator/scripts/validate-specialist.ts +41 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/old_skill/outputs/result.md +105 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/with_skill/outputs/result.md +93 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/old_skill/outputs/result.md +113 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/with_skill/outputs/result.md +131 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/old_skill/outputs/result.md +159 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/with_skill/outputs/result.md +150 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/outputs/result.md +180 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/outputs/result.md +223 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/outputs/result.md +146 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/outputs/result.md +89 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/outputs/result.md +96 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/skill-snapshot/SKILL.md.old +237 -0
- package/config/skills/using-specialists/SKILL.md +158 -0
- package/config/skills/using-specialists/evals/evals.json +68 -0
- package/config/specialists/.serena/project.yml +151 -0
- package/config/specialists/auto-remediation.specialist.yaml +70 -0
- package/config/specialists/bug-hunt.specialist.yaml +96 -0
- package/config/specialists/explorer.specialist.yaml +79 -0
- package/config/specialists/memory-processor.specialist.yaml +140 -0
- package/config/specialists/overthinker.specialist.yaml +63 -0
- package/config/specialists/parallel-runner.specialist.yaml +61 -0
- package/config/specialists/planner.specialist.yaml +87 -0
- package/config/specialists/specialists-creator.specialist.yaml +82 -0
- package/config/specialists/sync-docs.specialist.yaml +53 -0
- package/config/specialists/test-runner.specialist.yaml +58 -0
- package/config/specialists/xt-merge.specialist.yaml +78 -0
- package/package.json +2 -3
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
# Code Review: src/specialist/runner.ts
|
|
2
|
+
|
|
3
|
+
## Approach
|
|
4
|
+
|
|
5
|
+
I read the following files to build full context before reviewing:
|
|
6
|
+
|
|
7
|
+
- `/home/dawid/projects/specialists/src/specialist/runner.ts` — the target file (303 lines)
|
|
8
|
+
- `/home/dawid/projects/specialists/src/specialist/beads.ts` — BeadsClient interface
|
|
9
|
+
- `/home/dawid/projects/specialists/src/specialist/jobRegistry.ts` — async job state management
|
|
10
|
+
- `/home/dawid/projects/specialists/src/specialist/hooks.ts` — HookEmitter
|
|
11
|
+
- `/home/dawid/projects/specialists/src/specialist/loader.ts` — SpecialistLoader
|
|
12
|
+
- `/home/dawid/projects/specialists/src/specialist/schema.ts` — Zod schema / types
|
|
13
|
+
- `/home/dawid/projects/specialists/src/specialist/templateEngine.ts` — renderTemplate
|
|
14
|
+
- `/home/dawid/projects/specialists/src/pi/session.ts` — PiAgentSession, SessionKilledError
|
|
15
|
+
- `/home/dawid/projects/specialists/src/utils/circuitBreaker.ts` — CircuitBreaker
|
|
16
|
+
- `/home/dawid/projects/specialists/tests/unit/specialist/runner.test.ts`
|
|
17
|
+
- `/home/dawid/projects/specialists/tests/unit/specialist/runner-scripts.test.ts`
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Findings
|
|
22
|
+
|
|
23
|
+
### Bug: Duplicate `sessionBackend` assignment (lines 208 and 210)
|
|
24
|
+
|
|
25
|
+
```typescript
sessionBackend = session.meta.backend;
output = await session.getLastOutput();
sessionBackend = session.meta.backend;  // capture before finally calls kill()
```
|
|
30
|
+
|
|
31
|
+
`sessionBackend` is assigned on line 208 and again, identically, on line 210. The comment on line 210 says "capture before finally calls kill()", but line 208 already does that. Because `getLastOutput()` (line 209) is async and may throw, the capture on line 208 is the one that matters: if the call throws, control jumps to the catch block and line 210 is never reached. The second assignment is therefore redundant, and its comment is attached to the wrong line.
|
|
32
|
+
|
|
33
|
+
This is not a functional bug in practice (the value is the same both times), but it creates misleading intent and maintenance risk.
|
|
34
|
+
|
|
35
|
+
### Bug: `runScript` does not sanitise or validate the script path
|
|
36
|
+
|
|
37
|
+
```typescript
function runScript(scriptPath: string): ScriptResult {
  try {
    const output = execSync(scriptPath, { encoding: 'utf8', timeout: 30_000 });
```
|
|
42
|
+
|
|
43
|
+
`execSync` is called with a raw string from the specialist YAML (`s.path`). This passes the string directly to a shell. If a specialist YAML is written with a path containing shell metacharacters or if it is loaded from an untrusted source, this enables shell injection. A safer approach is to use `spawnSync` with an args array (the same pattern already used in `beads.ts`), or at minimum validate that the path refers to an actual file before executing. The same risk applies to path values that start with `~/` or `./` that are interpolated without being resolved first.
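
A minimal sketch of the `spawnSync` approach suggested above, assuming the script path names a single executable file with no inline arguments. `runScriptSafely`, `resolveScriptPath`, and the `SafeScriptResult` shape are hypothetical names used for illustration; they are not part of `runner.ts`.

```typescript
import { spawnSync } from 'node:child_process';
import { existsSync, statSync } from 'node:fs';
import { homedir } from 'node:os';
import { isAbsolute, join, resolve } from 'node:path';

// Hypothetical result shape; the real ScriptResult in runner.ts may differ.
interface SafeScriptResult {
  ok: boolean;
  output: string;
  error?: string;
}

// Expand "~/" and resolve relative paths against baseDir before use.
function resolveScriptPath(rawPath: string, baseDir: string): string {
  if (rawPath.startsWith('~/')) return join(homedir(), rawPath.slice(2));
  return isAbsolute(rawPath) ? rawPath : resolve(baseDir, rawPath);
}

function runScriptSafely(rawPath: string, baseDir: string = process.cwd()): SafeScriptResult {
  const scriptPath = resolveScriptPath(rawPath, baseDir);

  // Refuse anything that is not an existing regular file.
  if (!existsSync(scriptPath) || !statSync(scriptPath).isFile()) {
    return { ok: false, output: '', error: `not a regular file: ${scriptPath}` };
  }

  // No shell is involved: the path is passed as the executable itself, so
  // metacharacters such as ";" or "$(...)" are never interpreted.
  const proc = spawnSync(scriptPath, [], { encoding: 'utf8', timeout: 30_000 });
  if (proc.error) {
    return { ok: false, output: '', error: proc.error.message };
  }
  return {
    ok: proc.status === 0,
    output: proc.stdout,
    error: proc.status === 0 ? undefined : proc.stderr,
  };
}
```

Any pre-script currently declared with inline arguments would need those arguments split into the args array, so this is a behaviour change rather than a drop-in replacement.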
|
|
44
|
+
|
|
45
|
+
### Edge case: Pre-script filter logic is subtle and fragile (lines 119–121)
|
|
46
|
+
|
|
47
|
+
```typescript
const preResults = preScripts
  .map(s => runScript(s.path))
  .filter((_, i) => preScripts[i].inject_output);
```
|
|
52
|
+
|
|
53
|
+
All pre-scripts run unconditionally (`.map`), but only the ones with `inject_output: true` are collected in `preResults`. This means all scripts execute as side effects, while only some contribute output to the prompt. This behaviour is intentional per the test in `runner-scripts.test.ts` ("Script still runs (for side effects)"), but it is not documented at the call site. The coupling between the mapped index and `preScripts[i]` is fragile: if a future refactor introduces a flatMap or reorders the chain, the index-based correlation silently breaks. Extracting this into a named function or a `reduce` would make it safer.
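
One way the named-function refactor could look, as a sketch only. The `PreScript` and `ScriptResult` shapes below are simplified assumptions, and `runPreScripts` is a hypothetical name; the script runner is passed in so the example stays self-contained.

```typescript
// Hypothetical shapes for illustration; the real types live in the project.
interface PreScript {
  path: string;
  inject_output?: boolean;
}

interface ScriptResult {
  path: string;
  output: string;
}

// Runs every pre-script (side effects included) and returns only the results
// whose script opted in via inject_output. Script and result stay paired in
// a single loop iteration, so no index correlation is needed.
function runPreScripts(
  preScripts: PreScript[],
  run: (path: string) => ScriptResult,
): ScriptResult[] {
  const injected: ScriptResult[] = [];
  for (const script of preScripts) {
    const result = run(script.path);
    if (script.inject_output) injected.push(result);
  }
  return injected;
}
```

The loop keeps the "run everything, inject some" behaviour explicit at the call site, which is the part the current chain leaves undocumented.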
|
|
54
|
+
|
|
55
|
+
### Edge case: `post_execute` hook is not emitted if the `finally` block throws
|
|
56
|
+
|
|
57
|
+
On success, `post_execute` is emitted at line 250 with `status: 'COMPLETE'`. On failure, it is emitted at line 233 inside the catch block with `status: 'ERROR'` or `'CANCELLED'`. This is correct.
|
|
58
|
+
|
|
59
|
+
However, there is no `post_execute` emission in the `finally` block. This means if the `finally` block itself somehow throws (unlikely since `session?.kill()` is a no-op after `close()`), no hook is emitted. This is a very low risk edge case but worth noting for observability completeness.
|
|
60
|
+
|
|
61
|
+
### Edge case: Circuit breaker records success against `model`, not `sessionBackend`
|
|
62
|
+
|
|
63
|
+
```typescript
circuitBreaker.recordSuccess(model);
```
|
|
66
|
+
|
|
67
|
+
`model` is the resolved model string (possibly a fallback). `sessionBackend` is the actual backend string returned from the pi session meta (e.g. `"google-gemini-cli"`). These may differ in format. The circuit breaker is keyed by whatever string is passed — so `recordSuccess("gemini")` and `recordFailure("gemini")` must use the same key. Currently both use `model`, which is consistent. But `onMeta` can update `sessionBackend` to a different string (`provider` from pi's message_start event). If a caller were ever to rely on `sessionBackend` as a circuit-breaker key, mismatches would occur silently. The inconsistency in semantics (model string vs. backend/provider string) is worth documenting.
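
To make the key-mismatch risk concrete, here is a small sketch with a hypothetical `KeyedBreaker` class. It is not the project's `CircuitBreaker`; it only assumes that failures are counted per string key, as described above.

```typescript
// Hypothetical minimal breaker for illustration; not the project's CircuitBreaker.
class KeyedBreaker {
  private failures = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  recordFailure(key: string): void {
    this.failures.set(key, (this.failures.get(key) ?? 0) + 1);
  }

  recordSuccess(key: string): void {
    this.failures.delete(key);
  }

  isOpen(key: string): boolean {
    return (this.failures.get(key) ?? 0) >= this.threshold;
  }
}

const breaker = new KeyedBreaker(2);
breaker.recordFailure('gemini');
breaker.recordFailure('gemini');

// A success recorded under a different string does not reset the 'gemini' entry.
breaker.recordSuccess('google-gemini-cli');
const stillOpen = breaker.isOpen('gemini');
```

In this sketch the breaker for `'gemini'` stays open because the success was recorded under the provider-style string, which is the silent mismatch the finding warns about.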
|
|
68
|
+
|
|
69
|
+
### Edge case: `sessionPath` option is declared but never used
|
|
70
|
+
|
|
71
|
+
```typescript
export interface RunOptions {
  /** Path to an existing pi session file for continuation (Phase 2+) */
  sessionPath?: string;
}
```
|
|
77
|
+
|
|
78
|
+
`sessionPath` is accepted in `RunOptions` but never read or passed to the session factory. The comment says "Phase 2+" implying it is intentionally deferred, but there is no guard, warning, or error when it is provided. A caller passing `sessionPath` expecting continuation behaviour would get silent no-op behaviour instead.
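
A guard along these lines would surface the unsupported option instead of ignoring it. This is a sketch: `checkUnsupportedOptions` is a hypothetical helper, and the `RunOptions` interface is reduced to the one field under discussion.

```typescript
// Reduced copy of the option shape quoted above.
interface RunOptions {
  sessionPath?: string;
}

// Hypothetical guard: reports options that are accepted but not yet honoured.
function checkUnsupportedOptions(options: RunOptions): string[] {
  const warnings: string[] = [];
  if (options.sessionPath !== undefined) {
    warnings.push('sessionPath is not supported yet and will be ignored');
  }
  return warnings;
}
```

The caller can decide whether to log each warning or to throw; returning the list keeps that policy out of the helper.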
|
|
79
|
+
|
|
80
|
+
### Edge case: Registry `backend` field is initialised from `backendOverride` or the `'starting'` sentinel
|
|
81
|
+
|
|
82
|
+
```typescript
registry.register(jobId, {
  backend: options.backendOverride ?? 'starting',
  model: '?',
  specialistVersion,
});
```
|
|
89
|
+
|
|
90
|
+
When `backendOverride` is not set, the registry records `backend: 'starting'`. This is a sentinel, not an actual backend name. If a polling client reads the snapshot before `onMeta` fires, it sees `backend: 'starting'` — which is fine for display. However if the caller passes `backendOverride`, it is used as the initial `backend` value even though the actual backend used by the pi session may differ (since `backendOverride` goes through circuit breaker fallback logic). This means an async job snapshot may show an incorrect backend name until `onMeta` fires and `setMeta` is called.
|
|
91
|
+
|
|
92
|
+
### Code quality: Duplicated JSDoc comment (lines 274–277)
|
|
93
|
+
|
|
94
|
+
```typescript
/** Fire-and-forget: registers job in registry, returns job_id immediately. */
/** Fire-and-forget: registers job in registry, returns job_id immediately. */
/** Fire-and-forget: registers job in registry, returns job_id immediately. */
/** Fire-and-forget: registers job in registry, returns job_id immediately. */
async startAsync(...)
```
|
|
101
|
+
|
|
102
|
+
The same JSDoc comment is repeated four times. This is a copy-paste artifact with no functional impact, but it is noise that should be cleaned up.
|
|
103
|
+
|
|
104
|
+
### Code quality: Dynamic import inside hot path (line 145)
|
|
105
|
+
|
|
106
|
+
```typescript
const { readFile } = await import('node:fs/promises');
```
|
|
109
|
+
|
|
110
|
+
`readFile` is dynamically imported inside the `run()` method. The static import of `writeFile` from the same module already exists at line 3. This is redundant — `readFile` should be added to the existing static import at the top of the file. Dynamic imports inside hot-path async functions add minor per-call overhead (though Node.js caches them) and make the dependency surface less obvious at a glance.
|
|
111
|
+
|
|
112
|
+
### Code quality: `SessionFactory` return type `Pick` is unnecessarily wide
|
|
113
|
+
|
|
114
|
+
```typescript
export type SessionFactory = (opts: PiSessionOptions) => Promise<Pick<PiAgentSession, 'start' | 'prompt' | 'waitForDone' | 'getLastOutput' | 'getState' | 'close' | 'kill' | 'meta'>>;
```
|
|
117
|
+
|
|
118
|
+
`getState` is listed in the `Pick` but is never called on the session inside `runner.ts`. The test mock includes it too. Removing unused methods from the `Pick` would tighten the contract and make mocking simpler, though this is a low-priority style concern.
|
|
119
|
+
|
|
120
|
+
### Code quality: `beadVariables` uses `options.prompt` for `bead_context` (line 125)
|
|
121
|
+
|
|
122
|
+
```typescript
const beadVariables = options.inputBeadId
  ? { bead_context: options.prompt, bead_id: options.inputBeadId }
  : {};
```
|
|
127
|
+
|
|
128
|
+
When `inputBeadId` is set, `bead_context` is set to `options.prompt`. The intention appears to be: if the task was sourced from a bead, expose the bead's content as `$bead_context`. But the caller already provides the bead content as `options.prompt` — so this is effectively duplicating the prompt. The test confirms this is deliberate ("Bead=# Task: Refactor auth" equals the prompt value). The variable naming is slightly misleading: `bead_context` sounds like extra context added to the prompt, but it is actually the same as `$prompt`. A comment clarifying this equivalence would help future readers.
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## Summary Table
|
|
133
|
+
|
|
134
|
+
| Severity | Issue | Location |
|----------|-------|----------|
| Low (bug, cosmetic) | Duplicate `sessionBackend` assignment | Lines 208, 210 |
| Medium (security) | `execSync` with unsanitised shell string | Line 59, `runScript` function |
| Low (fragility) | Index-coupled filter for pre-script inject logic | Lines 119–121 |
| Low (silent no-op) | `sessionPath` option accepted but never used | `RunOptions`, `run()` |
| Low (misleading state) | `backendOverride` used as initial registry backend before `onMeta` fires | `startAsync`, lines 287–290 |
| Trivial | Quadruplicated JSDoc comment on `startAsync` | Lines 274–277 |
| Low (style) | Dynamic import of `readFile` inside hot path | Line 145 |
| Low (style) | `getState` in `SessionFactory` Pick but never called | Line 32 |
| Informational | `bead_context` === `prompt` when `inputBeadId` set — naming could mislead | Lines 125–126 |
|
|
145
|
+
|
|
146
|
+
The most impactful issue is the shell injection risk in `runScript`. Everything else is low severity or cosmetic.
|
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# Test Coverage Task — With Skill
|
|
2
|
+
|
|
3
|
+
## Task
|
|
4
|
+
|
|
5
|
+
Identify and fill coverage gaps in `tests/unit/specialist/loader.test.ts` for `src/specialist/loader.ts`.
|
|
6
|
+
|
|
7
|
+
## Skill Guidance Applied
|
|
8
|
+
|
|
9
|
+
Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md` before starting.
|
|
10
|
+
|
|
11
|
+
The skill says to delegate when a task:
|
|
12
|
+
- Would take >5 minutes of focused work
|
|
13
|
+
- Spans multiple files or modules
|
|
14
|
+
- Benefits from a fresh, focused run
|
|
15
|
+
|
|
16
|
+
This task involved analyzing a source file, cross-referencing against an existing test file, identifying untested behaviors, writing new tests across two describe blocks, and running the suite to verify. It spans 2 source files + schema context, and would take >5 minutes to do carefully. Decision: delegate.
|
|
17
|
+
|
|
18
|
+
## What Was Done
|
|
19
|
+
|
|
20
|
+
### Step 1: Read the skill
|
|
21
|
+
|
|
22
|
+
Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md`. Confirmed delegation is appropriate.
|
|
23
|
+
|
|
24
|
+
### Step 2: Gathered context (files read in parallel)
|
|
25
|
+
|
|
26
|
+
- `/home/dawid/projects/specialists/src/specialist/loader.ts` — source under test
|
|
27
|
+
- `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — existing tests
|
|
28
|
+
- `/home/dawid/projects/specialists/src/specialist/schema.ts` — schema used by loader
|
|
29
|
+
|
|
30
|
+
### Step 3: Listed available specialists
|
|
31
|
+
|
|
32
|
+
Ran `specialists list`. Found `test-runner` (anthropic/claude-haiku-4-5) — focused on running tests, interpreting failures, and writing fixes.
|
|
33
|
+
|
|
34
|
+
### Step 4: Delegated via start_specialist
|
|
35
|
+
|
|
36
|
+
Used `mcp__specialists__start_specialist` with specialist `test-runner` and a detailed prompt containing:
|
|
37
|
+
- Full source of `loader.ts`
|
|
38
|
+
- Full source of existing `loader.test.ts`
|
|
39
|
+
- Explicit list of all coverage gaps identified during pre-analysis
|
|
40
|
+
- Instructions to write tests, run the suite, and fix failures
|
|
41
|
+
|
|
42
|
+
Job ID: `c28f2b8f-39cf-4903-8d88-c815c1716c0c`
|
|
43
|
+
Bead: `unitAI-laiv`
|
|
44
|
+
|
|
45
|
+
### Step 5: Polled for results
|
|
46
|
+
|
|
47
|
+
Used `mcp__specialists__poll_specialist` to wait for completion. Job completed in ~53 seconds with status `done`.
|
|
48
|
+
|
|
49
|
+
### Step 6: Verified
|
|
50
|
+
|
|
51
|
+
Read the updated test file and ran `bun --bun vitest run tests/unit/specialist/loader.test.ts` locally. Result: **27 tests passed, 0 failed**.
|
|
52
|
+
|
|
53
|
+
## Coverage Gaps Filled
|
|
54
|
+
|
|
55
|
+
### checkStaleness (entire function — zero prior coverage)
|
|
56
|
+
|
|
57
|
+
Added `describe('checkStaleness', ...)` block with 10 tests:
|
|
58
|
+
- Returns `OK` when `filestoWatch` is absent
|
|
59
|
+
- Returns `OK` when `filestoWatch` is empty
|
|
60
|
+
- Returns `OK` when `updated` is absent
|
|
61
|
+
- Returns `OK` when `updated` is an invalid date string
|
|
62
|
+
- Returns `OK` when watched files have NOT changed since `updated`
|
|
63
|
+
- Returns `OK` when watched file does not exist (stat fails gracefully)
|
|
64
|
+
- Returns `STALE` when a watched file was modified after `updated`
|
|
65
|
+
- Returns `AGED` when stale AND `daysSinceUpdate > staleThresholdDays`
|
|
66
|
+
- Returns `STALE` (not `AGED`) when stale but within threshold
|
|
67
|
+
- Returns `STALE` when stale and no `staleThresholdDays` set
|
|
68
|
+
|
|
69
|
+
### SpecialistLoader — additional it() blocks
|
|
70
|
+
|
|
71
|
+
- Discovers specialists in `.claude/specialists/` with `scope='project'`
|
|
72
|
+
- Discovers specialists in `.agent-forge/specialists/` with `scope='project'`
|
|
73
|
+
- Discovers specialists in user dir with `scope='user'`
|
|
74
|
+
- `list(category)` filters by category
|
|
75
|
+
- `list(category)` returns empty when no matches
|
|
76
|
+
- Ignores non-`.specialist.yaml` files
|
|
77
|
+
- `invalidateCache(name)` clears only the named entry
|
|
78
|
+
- `invalidateCache()` with no args clears all entries
|
|
79
|
+
- `get()` resolves `~/` prefixed skill paths to homedir-relative absolute paths
|
|
80
|
+
- `get()` resolves `./` prefixed skill paths relative to specialist file directory
|
|
81
|
+
- `get()` leaves absolute skill paths unchanged
|
|
82
|
+
|
|
83
|
+
## Files Modified
|
|
84
|
+
|
|
85
|
+
- `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — 6 original tests, 21 new tests added (27 total)
|
|
86
|
+
|
|
87
|
+
## Why Delegate vs Do Myself
|
|
88
|
+
|
|
89
|
+
The skill explicitly says "when in doubt, delegate" and that delegation produces higher quality results because the specialist has no competing context. Even though I had read both files before delegating, the analysis + careful test writing across multiple describe blocks, plus the test run + verification loop, justified delegation. The specialist completed in 53 seconds with all tests passing on first attempt — no iteration needed.
|
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
# Test Coverage Gap Analysis and Additions: loader.ts
|
|
2
|
+
|
|
3
|
+
## Files Read
|
|
4
|
+
|
|
5
|
+
- `/home/dawid/projects/specialists/src/specialist/loader.ts` — source under test
|
|
6
|
+
- `/home/dawid/projects/specialists/src/specialist/schema.ts` — Zod schema for Specialist type
|
|
7
|
+
- `/home/dawid/projects/specialists/tests/unit/specialist/loader.test.ts` — existing tests (read before and after edits)
|
|
8
|
+
|
|
9
|
+
## Existing Coverage (6 tests)
|
|
10
|
+
|
|
11
|
+
| Test | What it covers |
|------|----------------|
| discovers specialists in project specialists/ dir | `list()` finds `specialists/` dir, returns scope=project |
| returns empty list when no specialists | `list()` on empty dirs |
| loads and caches a specialist by name | `get()` basic load + cache hit (same reference) |
| throws when specialist not found | `get()` error path |
| warns to stderr and skips invalid YAML | `list()` error handling, stderr output |
| project-level overrides user-level (same name) | deduplication, first-wins logic |
|
|
19
|
+
|
|
20
|
+
## Coverage Gaps Identified
|
|
21
|
+
|
|
22
|
+
### 1. `checkStaleness()` — entirely untested (0 coverage)
|
|
23
|
+
|
|
24
|
+
This is an exported async function with the following distinct return paths:
|
|
25
|
+
- Returns `'OK'` when `filestoWatch` is absent or empty
|
|
26
|
+
- Returns `'OK'` when `updated` is absent
|
|
27
|
+
- Returns `'OK'` when `updated` is an invalid date string (NaN guard)
|
|
28
|
+
- Returns `'OK'` when watched file does not exist on disk (`.catch(() => null)`)
|
|
29
|
+
- Returns `'OK'` when all watched files have mtimes older than `updated`
|
|
30
|
+
- Returns `'STALE'` when a watched file was modified after `updated`
|
|
31
|
+
- Returns `'AGED'` when stale AND `daysSinceUpdate > staleThresholdDays`
|
|
32
|
+
- Returns `'STALE'` (not `'AGED'`) when stale but within threshold, or no threshold set
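
The return paths listed above can be summarised as a pure decision function. This is a sketch of the described behaviour, not the code in `loader.ts`: `classifyStaleness` is a hypothetical helper that takes already-collected modification times (with `null` standing in for a file whose stat failed) instead of touching the file system.

```typescript
type Staleness = 'OK' | 'STALE' | 'AGED';

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical pure helper: mtimes holds one entry per watched file,
// with null standing in for a file whose stat() failed.
function classifyStaleness(
  updated: string | undefined,
  mtimes: Array<number | null> | undefined,
  staleThresholdDays: number | undefined,
  now: number,
): Staleness {
  // Nothing to watch, or no reference date: treated as OK.
  if (!mtimes || mtimes.length === 0) return 'OK';
  if (!updated) return 'OK';

  // NaN guard for unparseable dates.
  const updatedMs = new Date(updated).getTime();
  if (Number.isNaN(updatedMs)) return 'OK';

  // Missing files (null) never count as changed.
  const changed = mtimes.some(m => m !== null && m > updatedMs);
  if (!changed) return 'OK';

  // Stale; escalate to AGED only when a threshold is set and exceeded.
  const daysSinceUpdate = (now - updatedMs) / MS_PER_DAY;
  if (staleThresholdDays !== undefined && daysSinceUpdate > staleThresholdDays) {
    return 'AGED';
  }
  return 'STALE';
}
```

Separating the decision from the `stat` calls also makes each return path testable without temporary files, although the tests described here exercise the real function.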
|
|
33
|
+
|
|
34
|
+
### 2. Alternate project-scope discovery dirs — untested
|
|
35
|
+
|
|
36
|
+
`getScanDirs()` scans three project directories: `specialists/`, `.claude/specialists/`, `.agent-forge/specialists/`. Only the first was tested.
|
|
37
|
+
|
|
38
|
+
### 3. User-scope listing — untested
|
|
39
|
+
|
|
40
|
+
No test verified that a specialist found in `userDir` gets `scope: 'user'`.
|
|
41
|
+
|
|
42
|
+
### 4. `list()` category filter — untested
|
|
43
|
+
|
|
44
|
+
The `category` parameter to `list()` was never exercised, including the case where it matches nothing.
|
|
45
|
+
|
|
46
|
+
### 5. Non-.specialist.yaml files are ignored — untested
|
|
47
|
+
|
|
48
|
+
The `.filter(f => f.endsWith('.specialist.yaml'))` guard had no explicit test.
|
|
49
|
+
|
|
50
|
+
### 6. `get()` skills path resolution — untested (3 branches)
|
|
51
|
+
|
|
52
|
+
`get()` resolves `~/`, `./`, and absolute paths differently. None of the three branches were tested.
|
|
53
|
+
|
|
54
|
+
### 7. `invalidateCache()` — untested (2 branches)
|
|
55
|
+
|
|
56
|
+
- `invalidateCache(name)` — removes one entry, leaves others intact
|
|
57
|
+
- `invalidateCache()` — clears the entire cache
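
The two branches can be illustrated with a small name-keyed cache. `NamedCache` is a hypothetical stand-in for the loader's internal cache, written only to show the "one entry vs. everything" split.

```typescript
// Hypothetical name-keyed cache for illustration; not the loader's actual class.
class NamedCache<T> {
  private entries = new Map<string, T>();

  set(name: string, value: T): void {
    this.entries.set(name, value);
  }

  has(name: string): boolean {
    return this.entries.has(name);
  }

  // With a name: drop that entry only. Without a name: drop everything.
  invalidate(name?: string): void {
    if (name !== undefined) {
      this.entries.delete(name);
    } else {
      this.entries.clear();
    }
  }
}
```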
|
|
58
|
+
|
|
59
|
+
## Tests Added (21 new tests)
|
|
60
|
+
|
|
61
|
+
### SpecialistLoader describe block (new tests)
|
|
62
|
+
|
|
63
|
+
1. **discovers specialists in .claude/specialists/ dir with project scope** — covers second scan dir
|
|
64
|
+
2. **discovers specialists in .agent-forge/specialists/ dir with project scope** — covers third scan dir
|
|
65
|
+
3. **discovers specialists in user dir with user scope** — covers user scope
|
|
66
|
+
4. **filters list() by category** — category filter returns only matching specialists
|
|
67
|
+
5. **list() returns an empty list when category filter matches none** — empty result from filter
|
|
68
|
+
6. **ignores files that do not end with .specialist.yaml** — extension filter guard
|
|
69
|
+
7. **invalidateCache() by name removes only that entry** — partial cache invalidation
|
|
70
|
+
8. **invalidateCache() without name clears all cached entries** — full cache clear
|
|
71
|
+
9. **get() resolves ~/ prefixed skill paths to absolute home-relative paths** — tilde expansion
|
|
72
|
+
10. **get() resolves ./ prefixed skill paths relative to specialist file directory** — relative expansion
|
|
73
|
+
11. **get() leaves absolute skill paths unchanged** — absolute passthrough
|
|
74
|
+
|
|
75
|
+
### checkStaleness describe block (10 new tests)
|
|
76
|
+
|
|
77
|
+
1. **returns OK when filestoWatch is absent** — no watch config
|
|
78
|
+
2. **returns OK when filestoWatch is empty** — empty array guard
|
|
79
|
+
3. **returns OK when updated is absent** — missing updated field
|
|
80
|
+
4. **returns OK when updated is an invalid date string** — NaN guard
|
|
81
|
+
5. **returns OK when all watched files have not changed since updated** — files older than updated
|
|
82
|
+
6. **returns OK when watched file does not exist** — stat failure catch
|
|
83
|
+
7. **returns STALE when a watched file was modified after updated** — core STALE path
|
|
84
|
+
8. **returns AGED when file is stale and daysSinceUpdate exceeds staleThresholdDays** — AGED path
|
|
85
|
+
9. **returns STALE (not AGED) when stale but daysSinceUpdate is within staleThresholdDays** — threshold not exceeded
|
|
86
|
+
10. **returns STALE when stale and no staleThresholdDays is set** — stale without threshold
|
|
87
|
+
|
|
88
|
+
## Results
|
|
89
|
+
|
|
90
|
+
```
Tests:      27 passed (6 original + 21 new)
Test Files: 1 passed
Duration:   ~67ms
```
|
|
95
|
+
|
|
96
|
+
All tests pass. No source files were modified.
|
|
@@ -0,0 +1,237 @@
|
|
|
1
|
+
---
name: specialists-usage
description: >
  How to use the specialists MCP server and CLI to delegate work to specialist AI agents.
  Use this skill whenever you need to run a specialist, decide whether to delegate a task,
  track work with --bead, monitor background jobs, read results, or understand the
  bead-first workflow. Also use it when the user asks about specialists run, feed, result,
  stop, list, init, doctor, or any MCP tool like use_specialist or start_specialist.
  Consult this skill proactively when planning any task that might benefit from delegation —
  don't wait for the user to explicitly say "use a specialist".
version: 2.0
---
|
|
13
|
+
|
|
14
|
+
# Specialists Usage
|
|
15
|
+
|
|
16
|
+
> Specialists are autonomous AI agents optimised for heavy tasks. Use them instead of doing
> the work yourself when the task benefits from a dedicated expert, a second opinion, or a
> model tuned for the workload.
|
|
19
|
+
|
|
20
|
+
## When to Use a Specialist
|
|
21
|
+
|
|
22
|
+
| Use a specialist | Do it yourself |
|-----------------|---------------|
| Code review / security audit | Single-file edit |
| Deep bug investigation | Quick config change |
| Architecture analysis | Short read-only query |
| Test generation for a module | Obvious one-liner |
| Refactoring across many files | Simple documentation update |
| Long-running analysis (>5 min) | Trivial formatting fix |
|
|
30
|
+
|
|
31
|
+
**Rule of thumb**: if the task would take you >5 minutes or benefit from a second opinion, delegate it.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## Primary Workflow: Bead-First (Tracked Work)
|
|
36
|
+
|
|
37
|
+
The canonical pattern for any real work. Always use `--bead` for tracked tasks.
|
|
38
|
+
|
|
39
|
+
```bash
# 1. Create a bead to track the work
bd create --title "Review auth module for security issues" --type task --priority 2

# 2. Run the specialist, passing the bead ID
specialists run code-review --bead unitAI-abc [--context-depth 1] [--background]

# 3. Monitor progress
specialists feed -f          # follow all active jobs

# 4. Close the bead when done
bd close unitAI-abc --reason "Review complete, 3 issues found"
```
|
|
52
|
+
|
|
53
|
+
**`--context-depth N`** — controls how much bead context is injected into the specialist's
|
|
54
|
+
system prompt. Defaults to `1` (immediate bead only). Increase for deeper dependency context.
|
|
55
|
+
|
|
56
|
+
**`--no-beads`** — skips creating a new tracking bead for this run. Does **not** disable
|
|
57
|
+
reading the input bead passed via `--bead`. Use when you don't want an auto-created sub-issue.
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## Secondary Workflow: Ad-Hoc (Untracked Work)
|
|
62
|
+
|
|
63
|
+
For quick, exploratory, or one-off tasks with no issue to track.
|
|
64
|
+
|
|
65
|
+
```bash
specialists run codebase-explorer --prompt "Map the CLI architecture"
```
|
|
68
|
+
|
|
69
|
+
Use `--prompt` only when there's no bead. For anything worth tracking, use `--bead`.
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## Discovery
|
|
74
|
+
|
|
75
|
+
```bash
specialists list                        # all specialists in this project
specialists list --category analysis    # filter by category
specialists list --json                 # machine-readable
```
|
|
80
|
+
|
|
81
|
+
Specialists are **project-scoped only** — loaded from `./specialists/*.specialist.yaml`.
|
|
82
|
+
User-scope (`~/.specialists/`) is deprecated.
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
## Running a Specialist
|
|
87
|
+
|
|
88
|
+
### Foreground (streams output in real time)
|
|
89
|
+
|
|
90
|
+
```bash
specialists run <name> --bead <id>
specialists run <name> --prompt "..."     # ad-hoc only
```
|
|
94
|
+
|
|
95
|
+
- Output streams to stdout as tokens arrive
|
|
96
|
+
- Ctrl+C sends SIGTERM (clean stop)
|
|
97
|
+
- Exit code 0 = success
|
|
98
|
+
|
|
99
|
+
### Background (returns immediately, job runs async)
|
|
100
|
+
|
|
101
|
+
```bash
specialists run <name> --bead <id> --background
# → Job started: job_a1b2c3d4
```
|
|
105
|
+
|
|
106
|
+
Use background mode for tasks that will take >30 seconds, or when you want to keep working
|
|
107
|
+
while the specialist runs.
|
|
108
|
+
|
|
109
|
+
### Other run flags
|
|
110
|
+
|
|
111
|
+
| Flag | Purpose |
|------|---------|
| `--model <model>` | Override model for this run only |
| `--no-beads` | Skip creating auto-tracking bead (still reads `--bead` input) |
| `--context-depth N` | Bead context depth, default 1 |
| stdin | Pipe a prompt: `cat brief.md \| specialists run code-review --bead <id>` |
|
|
117
|
+
|
|
118
|
+
---
|
|
119
|
+
|
|
120
|
+
## Background Job Lifecycle
|
|
121
|
+
|
|
122
|
+
```
specialists run --background
        │
        ▼
  job_a1b2c3d4  [starting]
        │
        ▼
  job_a1b2c3d4  [running]   ←  specialists feed -f
        │
        ├─► done    →  specialists result <id>
        └─► error   →  specialists feed <id>  (see error event)
```
|
|
134
|
+
|
|
135
|
+
### Monitor with feed
|
|
136
|
+
|
|
137
|
+
```bash
specialists feed                          # snapshot of all jobs
specialists feed -f                       # follow all active jobs (live)
specialists feed job_a1b2c3d4 --follow    # follow a specific job
```
|
|
142
|
+
|
|
143
|
+
Event types in the feed:
|
|
144
|
+
- `text` — streamed output token
|
|
145
|
+
- `thinking` — model reasoning token
|
|
146
|
+
- `tool` — specialist calling a tool (phase: start/end)
|
|
147
|
+
- `run_complete` — specialist finished
|
|
148
|
+
|
|
149
|
+
### Read the result
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
specialists result job_a1b2c3d4 # prints output, exits 1 if still running
|
|
153
|
+
specialists result job_a1b2c3d4 > out.md # capture to file
|
|
154
|
+
```
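
Since `specialists result` exits 1 while the job is still running, a script can poll it until the output is ready. The sketch below is one way to do that; the helper name `wait_for_result`, the default interval, and the attempt limit are all hypothetical. It also assumes that any non-zero exit means "not ready yet", so a failed job is retried until the limit is reached.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll `specialists result` until it succeeds.
# Arguments: job ID, seconds between attempts (default 5), max attempts (default 60).
# Assumption: every non-zero exit is treated as "still running".
wait_for_result() {
  local job_id="$1" interval="${2:-5}" max_tries="${3:-60}" tries=0 output
  while [ "$tries" -lt "$max_tries" ]; do
    # The assignment takes the exit status of the command substitution.
    if output="$(specialists result "$job_id" 2>/dev/null)"; then
      printf '%s\n' "$output"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "gave up waiting for $job_id after $max_tries attempts" >&2
  return 1
}

# Usage:
#   wait_for_result job_a1b2c3d4 10 30 > out.md
```

For interactive use, `specialists feed <id> --follow` is likely the better choice; polling is mainly useful in unattended scripts.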

### Cancel

```bash
specialists stop job_a1b2c3d4    # sends SIGTERM
```

### Completion Banner

When a background job completes, the next prompt you submit will show:

```
[Specialist 'code-review' completed (job job_a1b2c3d4, 42s). Run: specialists result job_a1b2c3d4]
```

This is injected by the `specialists-complete` hook. Retrieve the result with the shown command.
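
If the banner text is captured in a log or transcript, the job ID can be extracted mechanically. This sketch assumes the banner always has the exact shape shown above; the helper names `banner_job_id` and `banner_specialist` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: parse a completion banner read from stdin.
# Assumes the format:
#   [Specialist '<name>' completed (job <job_id>, <duration>). Run: ...]

# Print the job ID from the first matching banner line.
banner_job_id() {
  sed -n "s/^\[Specialist '[^']*' completed (job \(job_[A-Za-z0-9]*\), .*$/\1/p" | head -n 1
}

# Print the specialist name from the first matching banner line.
banner_specialist() {
  sed -n "s/^\[Specialist '\([^']*\)' completed (job .*$/\1/p" | head -n 1
}

# Usage:
#   job_id="$(printf '%s\n' "$banner" | banner_job_id)"
#   specialists result "$job_id"
```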

---

## MCP Tools (Claude Code)

Available after `specialists init` has been run and Claude Code restarted.

| Tool | When to use |
|------|-------------|
| `specialist_init` | **Start of every session**: bootstraps context, lists specialists |
| `list_specialists` | Discover specialists programmatically |
| `use_specialist` | **Preferred for foreground runs**: full lifecycle, pass `bead_id` for tracked work |
| `start_specialist` | Start async job, returns job ID immediately |
| `poll_specialist` | Check job status and read delta output |
| `stop_specialist` | Cancel a running job |
| `run_parallel` | Run multiple specialists concurrently or as a pipeline |
| `specialist_status` | Circuit breaker health + staleness info |

**Recommended pattern for tracked work:**

```
1. specialist_init                               ← bootstrap once per session
2. use_specialist(name, prompt, bead_id=id)      ← foreground, bead context injected
   OR
2. start_specialist(name, prompt, bead_id=id)    ← async for long tasks
3. poll_specialist(job_id)                       ← repeat until status=done
```

---

## Project Setup

```bash
specialists init    # creates specialists/, .specialists/, injects AGENTS.md workflow
specialists list    # verify specialists are discovered
```

Run `specialists init` once per project root. `specialists install` and `specialists setup`
are **deprecated** — they redirect to `init`.
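
In setup scripts, the "once per project root" rule can be enforced with a small guard. This is a sketch under a stated assumption: it treats the presence of the `.specialists/` directory (created by `specialists init`, per the comment above) as the marker that init has already run. The function name `ensure_specialists_init` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical guard: run `specialists init` only when the project root has
# not been initialized yet, then list the discovered specialists.
# Assumption: an existing .specialists/ directory means init already ran.
ensure_specialists_init() {
  local root="${1:-.}"
  if [ -d "$root/.specialists" ]; then
    echo "already initialized: $root" >&2
  else
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$root" && specialists init ) || return 1
  fi
  ( cd "$root" && specialists list )
}

# Usage:
#   ensure_specialists_init /path/to/project
```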

---

## Editing Specialists

```bash
specialists edit code-review --model anthropic/claude-sonnet-4-6
specialists edit code-review --timeout 180000
specialists edit code-review --permission HIGH
specialists edit code-review --description "Updated description"
specialists edit code-review --dry-run   # preview without writing
```

---

## Troubleshooting

```bash
specialists doctor      # detailed checks with fix hints (hooks, MCP, zombie jobs, pi)
specialists status      # system health overview
```

Common issues:
- **"specialist not found"** → run `specialists list`; only the project scope is searched
- **Job hangs** → check `specialists feed <id>` for a stall, then cancel with `specialists stop <id>`
- **MCP tools missing** → run `specialists init`, then restart Claude Code
- **Hook not firing** → run `specialists doctor` to verify hook wiring
- **Invalid YAML skipped** → warnings now print to stderr with filename and reason