@jaggerxtrm/specialists 3.3.0 → 3.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/config/hooks/specialists-complete.mjs +60 -0
  2. package/config/hooks/specialists-session-start.mjs +120 -0
  3. package/config/skills/specialists-creator/SKILL.md +506 -0
  4. package/config/skills/specialists-creator/scripts/validate-specialist.ts +41 -0
  5. package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/old_skill/outputs/result.md +105 -0
  6. package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/with_skill/outputs/result.md +93 -0
  7. package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/old_skill/outputs/result.md +113 -0
  8. package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/with_skill/outputs/result.md +131 -0
  9. package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/old_skill/outputs/result.md +159 -0
  10. package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/with_skill/outputs/result.md +150 -0
  11. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/outputs/result.md +180 -0
  12. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/timing.json +5 -0
  13. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/outputs/result.md +223 -0
  14. package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/timing.json +5 -0
  15. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/with_skill/timing.json +5 -0
  16. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/outputs/result.md +146 -0
  17. package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/timing.json +5 -0
  18. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/outputs/result.md +89 -0
  19. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/timing.json +5 -0
  20. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/outputs/result.md +96 -0
  21. package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/timing.json +5 -0
  22. package/config/skills/specialists-usage-workspace/skill-snapshot/SKILL.md.old +237 -0
  23. package/config/skills/using-specialists/SKILL.md +158 -0
  24. package/config/skills/using-specialists/evals/evals.json +68 -0
  25. package/config/specialists/.serena/project.yml +151 -0
  26. package/config/specialists/auto-remediation.specialist.yaml +70 -0
  27. package/config/specialists/bug-hunt.specialist.yaml +96 -0
  28. package/config/specialists/explorer.specialist.yaml +79 -0
  29. package/config/specialists/memory-processor.specialist.yaml +140 -0
  30. package/config/specialists/overthinker.specialist.yaml +63 -0
  31. package/config/specialists/parallel-runner.specialist.yaml +61 -0
  32. package/config/specialists/planner.specialist.yaml +87 -0
  33. package/config/specialists/specialists-creator.specialist.yaml +82 -0
  34. package/config/specialists/sync-docs.specialist.yaml +53 -0
  35. package/config/specialists/test-runner.specialist.yaml +58 -0
  36. package/config/specialists/xt-merge.specialist.yaml +78 -0
  37. package/dist/index.js +7 -4
  38. package/package.json +2 -3
@@ -0,0 +1,150 @@
# Debugging a Specialist YAML That Doesn't Appear in `specialists list`

When a specialist YAML file you added doesn't show up in `specialists list`, work through these checks in order.

---

## 1. Verify the file is in a scanned directory

The loader scans exactly these directories (in priority order):

```
<project-root>/specialists/
<project-root>/.claude/specialists/
<project-root>/.agent-forge/specialists/
~/.agents/specialists/ (user scope)
```

Only directories that exist on disk are scanned. If your file is anywhere else — for example in `./agent-specs/` or `./config/specialists/` — it will never be found.

**Fix**: Move the file into `<project-root>/specialists/`.

---

## 2. Verify the filename ends with `.specialist.yaml`

The loader filters for files ending in exactly `.specialist.yaml`. Common mistakes:

- `my-agent.yaml` — missing `.specialist` infix
- `my-agent.specialist.yml` — wrong extension (`.yml` not `.yaml`)
- `my-agent.Specialist.yaml` — wrong casing

**Fix**: Rename to `<name>.specialist.yaml`.

---
## 3. Check stderr for a parse/validation error

When a YAML file fails to parse, the loader skips it and writes a warning to stderr:

```
[specialists] skipping /path/to/file.specialist.yaml: <reason>
```

Run `specialists list` and capture stderr explicitly:

```bash
specialists list 2>&1 | grep -i skipping
```

If your file appears there, the reason shown is the validation error to fix.

---
## 4. Validate required fields and their formats

The schema enforces strict rules. Every specialist YAML must have:

| Field | Requirement |
|---|---|
| `specialist.metadata.name` | **kebab-case** (`^[a-z][a-z0-9-]*$`) — no uppercase, no underscores |
| `specialist.metadata.version` | **semver** (`1.0.0` format — three numeric parts) |
| `specialist.metadata.description` | non-empty string |
| `specialist.metadata.category` | non-empty string |
| `specialist.execution.model` | non-empty string |
| `specialist.execution.mode` | one of `tool`, `skill`, `auto` |
| `specialist.execution.permission_required` | one of `READ_ONLY`, `LOW`, `MEDIUM`, `HIGH` |
| `specialist.prompt.task_template` | non-empty string |

Common errors that cause silent skipping:

- `name: My_Agent` — fails kebab-case (`_` and uppercase not allowed)
- `version: "1.0"` — fails semver (needs three parts: `1.0.0`)
- Missing `execution.model` entirely
- Missing `prompt.task_template` entirely
- Top-level key is not `specialist:` (e.g. accidentally used `agent:`)

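The two format rules that trip people up most can be checked in isolation (an illustrative sketch; the package's real validator lives in `scripts/validate-specialist.ts` and may differ):

```typescript
// Illustrative checks for the two strictest metadata rules.
const KEBAB_CASE = /^[a-z][a-z0-9-]*$/; // from the table above
const SEMVER = /^\d+\.\d+\.\d+$/;       // three numeric parts

function checkMetadata(name: string, version: string): string[] {
  const errors: string[] = [];
  if (!KEBAB_CASE.test(name)) {
    errors.push(`name "${name}" is not kebab-case`);
  }
  if (!SEMVER.test(version)) {
    errors.push(`version "${version}" is not X.Y.Z semver`);
  }
  return errors;
}

// checkMetadata('My_Agent', '1.0')    -> two errors
// checkMetadata('debug-test', '1.0.0') -> []
```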
---

## 5. Validate your YAML syntax independently

A YAML parse error (bad indentation, unquoted special characters, etc.) will also cause the file to be skipped. Validate with:

```bash
python3 -c "import yaml; yaml.safe_load(open('specialists/my-agent.specialist.yaml'))" && echo OK
# or
npx js-yaml specialists/my-agent.specialist.yaml
```

---
## 6. Check for a name collision

If a specialist with the same `metadata.name` already exists in a higher-priority directory, the loader's deduplication (`seen` set, first-wins) will drop your file silently. Project-scope entries win over user-scope.

```bash
specialists list --json | python3 -c "import sys,json; [print(s['name'], s['filePath']) for s in json.load(sys.stdin)]"
```

If the name appears but points to a different file, rename your specialist.

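The first-wins deduplication described above can be sketched like this (illustrative only; `dedupeByName` is a hypothetical name for the loader's `seen`-set logic):

```typescript
interface SpecialistFile {
  name: string;     // metadata.name
  filePath: string;
}

// First-wins dedup: files arrive in directory priority order,
// so a later file with a duplicate name is dropped silently.
function dedupeByName(files: SpecialistFile[]): SpecialistFile[] {
  const seen = new Set<string>();
  const kept: SpecialistFile[] = [];
  for (const f of files) {
    if (seen.has(f.name)) continue; // silent drop
    seen.add(f.name);
    kept.push(f);
  }
  return kept;
}
```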
---

## 7. Run `specialists doctor`

```bash
specialists doctor
```

This runs deeper checks — hook wiring, MCP registration, zombie jobs, and pi agent availability — and prints fix hints. It is the recommended first step for any unexplained `specialists` issue.

---
## 8. Minimal working example

If unsure what a valid file looks like, create `specialists/debug-test.specialist.yaml` with this minimal content and confirm it appears in `specialists list`:

```yaml
specialist:
  metadata:
    name: debug-test
    version: 1.0.0
    description: Minimal test specialist
    category: debug

  execution:
    mode: auto
    model: anthropic/claude-haiku-4-5
    permission_required: READ_ONLY

  prompt:
    task_template: |
      $prompt
```

Then incrementally add fields from your actual YAML until it breaks — that isolates the bad field.

---
## Summary checklist

- [ ] File is inside `specialists/`, `.claude/specialists/`, or `.agent-forge/specialists/` at the project root
- [ ] Filename ends in `.specialist.yaml`
- [ ] `specialists list 2>&1 | grep skipping` shows no entry for this file
- [ ] `metadata.name` is kebab-case (lowercase letters, digits, hyphens only)
- [ ] `metadata.version` is semver (`X.Y.Z`)
- [ ] `execution.model` is present
- [ ] `prompt.task_template` is present
- [ ] YAML syntax is valid (no indentation errors)
- [ ] No other specialist already uses the same `name`
- [ ] `specialists doctor` reports no blocking errors
@@ -0,0 +1,180 @@
# Bug Investigation: Job Status Inconsistency Between `specialists feed` and `specialists result`

## Approach: Skill Consultation + Delegation Attempt + Direct Investigation

### Step 1: Read the Skill

Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md`.

The skill guidance is clear: for deep bug investigations that span multiple files, delegate to a specialist. This task matched the criteria:
- It spans multiple files (feed.ts, result.ts, supervisor.ts, timeline-events.ts, timeline-query.ts, runner.ts, pi/session.ts)
- It would take >5 minutes of focused work
- A fresh perspective adds value

Decision: **delegate to the `bug-hunt` specialist**.

### Step 2: Delegation Attempt

Created a bead for tracking:
```
bd create --title "Investigate: job status inconsistency between specialists feed and specialists result" --type task --priority 2
# -> unitAI-s82l
```

Updated with description:
```
bd update unitAI-s82l --description "Intermittent failures: specialist jobs show status 'done' in 'specialists feed' output but 'specialists result' says they are still running..."
```

Discovered that `--prompt` and `--bead` cannot be combined (run exits with error). Used `--bead` only:
```
specialists run bug-hunt --bead unitAI-s82l --background
# -> bcyvkirdh (background job ID in Claude Code's subagent system)
```

**Important observation**: The `--background` flag here refers to Claude Code's internal background subagent system, NOT the `Supervisor`-managed `.specialists/jobs/` system. The ID `bcyvkirdh` is a Claude Code task ID, not a Supervisor job ID. So `specialists result bcyvkirdh` correctly returned "No job found" — the job was never in `.specialists/jobs/`.

The specialist timed out after 5 minutes (300000ms). Since the delegation did not produce results, I proceeded with direct investigation.

### Step 3: Direct Source Code Investigation

Read all relevant files:
- `/home/dawid/projects/specialists/src/cli/feed.ts`
- `/home/dawid/projects/specialists/src/cli/result.ts`
- `/home/dawid/projects/specialists/src/cli/format-helpers.ts`
- `/home/dawid/projects/specialists/src/specialist/supervisor.ts`
- `/home/dawid/projects/specialists/src/specialist/timeline-events.ts`
- `/home/dawid/projects/specialists/src/specialist/timeline-query.ts`
- `/home/dawid/projects/specialists/src/specialist/runner.ts`
- `/home/dawid/projects/specialists/src/pi/session.ts`

---
## Root Cause Analysis

### The Two State Stores

The job lifecycle uses two separate state stores:

1. **`status.json`** — mutable snapshot, read by `specialists result` via `supervisor.readStatus()`
2. **`events.jsonl`** — append-only timeline, read by `specialists feed` via `readAllJobEvents()`

These are written independently and there is no atomicity guarantee between them.

+ ### The Normal Write Sequence (in `supervisor.run()`)
65
+
66
+ ```
67
+ # After runner.run() completes:
68
+ line 261: writeFileSync(result.txt, result.output)
69
+ line 265-272: updateStatus(id, { status: 'done', ... }) ← status.json -> 'done'
70
+ line 275-279: appendTimelineEvent(createRunCompleteEvent(...)) ← events.jsonl gets run_complete
71
+ line 282: writeFileSync(ready marker)
72
+ ```
73
+
74
+ In the normal path, `status.json` is set to `done` BEFORE `run_complete` is written to `events.jsonl`. This means `result` would work before `feed` shows done — no bug in the happy path.
75
+
### The Race Condition (Root Cause)

The bug is caused by a late-firing `onEvent` callback that **overwrites `status.json` back to `running` after it has been set to `done`**.

In `supervisor.run()`, the `onEvent` callback is:

```typescript
(eventType) => {
  const now = Date.now();
  this.updateStatus(id, {
    status: 'running', // ← ALWAYS writes 'running', unconditionally
    current_event: eventType,
    last_event_at_ms: now,
    elapsed_s: Math.round((now - startedAtMs) / 1000),
  });
  // ...map and append to events.jsonl...
}
```

The `updateStatus` method does a read-modify-write:
```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);
  if (!current) return;
  this.writeStatusFile(id, { ...current, ...updates });
}
```

**The race sequence:**

1. `runner.run()` resolves (all events have been observed)
2. `supervisor.run()` calls `updateStatus({status: 'done'})` — status.json now says `done`
3. `appendTimelineEvent(run_complete)` — events.jsonl now has `run_complete`
4. A **late `onEvent` callback fires** (queued in the event loop before `await runner.run()` returned but dispatched after)
5. The late callback calls `updateStatus({status: 'running'})`:
   - `readStatus()` reads the current `done` status
   - spreads `{ ...current, ...{ status: 'running' } }` — **overwrites `done` back to `running`**
   - writes the overwritten status back to `status.json`

**Result state after the race:**
- `events.jsonl`: contains `run_complete` event → `feed` shows job as **done**
- `status.json`: was overwritten to `running` → `result` reports job is **still running**

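Step 5's overwrite can be reproduced with an in-memory toy model (purely illustrative; `statusFile` here stands in for the real file on disk):

```typescript
type Status = { status: string; current_event?: string };

// Toy in-memory stand-in for status.json.
let statusFile: Status = { status: 'running' };

const readStatus = (): Status => ({ ...statusFile });
const writeStatusFile = (s: Status): void => { statusFile = s; };

// Mirrors the unguarded read-modify-write shown above.
function updateStatus(updates: Partial<Status>): void {
  const current = readStatus();
  writeStatusFile({ ...current, ...updates });
}

// Completion path runs first...
updateStatus({ status: 'done' });
// ...then a late onEvent callback fires:
updateStatus({ status: 'running', current_event: 'text_delta' });

// statusFile.status is now 'running' again, even though the job finished.
```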
### Why It's Intermittent

This race depends on event loop scheduling. The `onEvent` callback is called synchronously from within `PiAgentSession`'s stdout data handler (`session.ts`). However:

1. The `agent_end` event fires `_doneResolve?.()` which resolves the `_donePromise`
2. `waitForDone()` in runner.ts awaits this promise
3. Due to microtask/macrotask ordering, there can be pending `onEvent` callbacks from late `tool_execution_end` or `text_delta` events that were buffered in the same or following stdout chunk

Additionally, the `close()` call in runner.ts (line 213) awaits `_donePromise` again after `waitForDone()`, and the process `close` event can fire additional event handlers.

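The scheduling claim can be demonstrated with a self-contained toy (illustrative only; `queueMicrotask`/`setTimeout` stand in for the real stdout and process `close` handlers):

```typescript
// Toy demonstration: an event handler scheduled on the event loop can
// run after the code awaiting the "done" promise has already continued.
async function demo(): Promise<string[]> {
  const order: string[] = [];
  let resolveDone: (() => void) | undefined;
  const done = new Promise<void>((res) => { resolveDone = res; });

  // agent_end arrives: resolve the done promise (microtask)...
  queueMicrotask(() => resolveDone?.());
  // ...but a further event (e.g. from the process 'close' handler)
  // is still pending on the event loop (macrotask).
  setTimeout(() => order.push('late onEvent: status <- running'), 0);

  await done;
  order.push('completion path: status <- done'); // runs first...
  await new Promise((r) => setTimeout(r, 0));    // ...late event lands after
  return order;
}
```

The completion path runs before the late handler, so the final write to the status store comes from the late handler.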
### Secondary Observation: `agent_end` Suppressed From events.jsonl

In `mapCallbackEventToTimelineEvent`, the `agent_end` callback event is explicitly **suppressed** (returns `null`). So `events.jsonl` never contains `agent_end`. The `isCompletionEvent` function in feed.ts checks for `agent_end`, but this can never be true for events read from disk. This is correct behavior by design, but means the only completion signal in `events.jsonl` is `run_complete`.

### Where to Fix

The fix should be in `supervisor.ts`'s `onEvent` callback. It should not overwrite `status` back to `running` if the status has already been transitioned to `done` or `error`:

```typescript
// Proposed guard in the onEvent callback:
(eventType) => {
  const current = this.readStatus(id);
  if (!current || current.status === 'done' || current.status === 'error') return; // guard
  // ...rest of update logic...
}
```

Alternatively, `updateStatus` could be made terminal-status-aware:

```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);
  if (!current) return;
  // Don't overwrite terminal states unless explicitly transitioning to one
  if ((current.status === 'done' || current.status === 'error') && !updates.status) return;
  this.writeStatusFile(id, { ...current, ...updates });
}
```

+ ---
159
+
160
+ ## Summary
161
+
162
+ | Aspect | Detail |
163
+ |--------|--------|
164
+ | **Bug type** | Race condition / state store inconsistency |
165
+ | **Root cause** | Late-firing `onEvent` callbacks overwrite `status.json` from `done` back to `running` after the job has completed |
166
+ | **Why intermittent** | Depends on event loop scheduling; only occurs when event callbacks are queued but not yet flushed at the time `runner.run()` returns |
167
+ | **feed shows done because** | `events.jsonl` gets `run_complete` written before any late event can append (events.jsonl is append-only and `run_complete` is written after `status.json`) |
168
+ | **result says running because** | `status.json` is read-modify-written by the late callback, which spreads `{status: 'running'}` over the already-written `{status: 'done'}` |
169
+ | **Key files** | `src/specialist/supervisor.ts` (lines 226-245, 265-272), `src/cli/result.ts` (line 27), `src/cli/feed.ts` (line 170-172) |
170
+ | **Fix location** | `supervisor.ts` `onEvent` callback — add guard to skip status update if terminal state already set |
171
+
172
+ ---
173
+
174
+ ## Delegation vs Direct Work Reflection
175
+
176
+ The skill correctly guided delegation as the right approach for this multi-file investigation. The delegation attempt failed due to:
177
+ 1. Specialist timeout (300s) — the `bug-hunt` specialist did not start within the timeout window
178
+ 2. The background job ran in Claude Code's subagent system, not the Supervisor job system, so `specialists result <id>` correctly could not find it in `.specialists/jobs/`
179
+
180
+ Since delegation failed, direct investigation was the fallback. The investigation required reading 8 source files and tracing the complete lifecycle across the `PiAgentSession` → `SpecialistRunner` → `Supervisor` → `feed/result CLI` stack.
@@ -0,0 +1,5 @@
{
  "total_tokens": 62637,
  "duration_ms": 399598,
  "total_duration_seconds": 399.6
}
@@ -0,0 +1,223 @@
# Bug Investigation: Job Status Inconsistency Between `specialists feed` and `specialists result`

## Summary

The intermittent inconsistency where `specialists feed` shows a job as `done` while `specialists result` says it is still running has **two root causes** operating at different layers of the system:

1. **Race condition between `events.jsonl` and `status.json` writes** — `feed` reads from `events.jsonl`, `result` reads from `status.json`, and these are updated in separate, non-atomic steps with no ordering guarantee between them.
2. **Dual state storage with divergent update timing in the Supervisor** — the `run_complete` event is appended to `events.jsonl` *after* `status.json` is updated to `done`, but critically, the `result.txt` file is written *before* `status.json` is updated, creating a window where each data source can disagree with the others.

---

## Investigation Steps

### 1. Identified the data sources for each command

**`specialists feed`** (src/cli/feed.ts):
- Reads `events.jsonl` from `.specialists/jobs/<id>/events.jsonl`
- Determines "job complete" by looking for a `run_complete` event in the event stream (via `isRunCompleteEvent` in `timeline-events.ts`)
- Does NOT read `status.json` for completion determination

**`specialists result`** (src/cli/result.ts):
- Reads `status.json` via `Supervisor.readStatus(jobId)`
- Checks `status.status === 'running' || status.status === 'starting'` and exits with code 1 if true
- If status is `done`, reads and prints `result.txt`

### 2. Traced the write sequence in Supervisor.run() (src/specialist/supervisor.ts, lines 260-283)

The completion path in `Supervisor.run()` executes these writes in sequence:

```
Step A: writeFileSync(resultPath, result.output)         // result.txt written
Step B: updateStatus(id, { status: 'done', ... })        // status.json updated to 'done'
Step C: appendTimelineEvent(createRunCompleteEvent(...)) // run_complete appended to events.jsonl
Step D: writeFileSync(readyDir/id, '')                   // ready marker written
```

### 3. Identified the race window

Looking at the actual code order in lines 260–283:

```typescript
// line 261: result.txt written
writeFileSync(this.resultPath(id), result.output, 'utf-8');

// lines 265-272: status.json updated to 'done'
this.updateStatus(id, {
  status: 'done',
  ...
});

// lines 275-279: run_complete appended to events.jsonl
appendTimelineEvent(createRunCompleteEvent('COMPLETE', elapsed, {...}));

// line 282: ready marker written
writeFileSync(join(this.readyDir(), id), '', 'utf-8');
```

So the ordering is: `result.txt` → `status.json:done` → `events.jsonl:run_complete`.

This means there is a window — however brief — where:
- `status.json` says `done`
- `events.jsonl` does NOT yet have `run_complete`

In this window, `specialists result` would succeed (status is done, result.txt exists), but `specialists feed` would NOT show the job as complete because the `run_complete` event hasn't been written yet.

But the reported symptom is the *inverse*: `feed` shows `done` while `result` says `still running`. That requires a window where:
- `events.jsonl` has `run_complete`
- `status.json` still says `running`

Based on the code, this ordering is not directly possible within a single run, since `run_complete` is only appended after `updateStatus` has already set `status.json` to `done`. The one caveat is that `updateStatus` (line 128) reads the current content, merges the patch, and writes via a temp file + rename; if this read-merge-write cycle is slow (e.g., filesystem contention), `events.jsonl` may have been appended with `run_complete` while the `status.json` rename hasn't completed yet.

### 4. Identified a second scenario: the in-progress events.jsonl vs. delayed status.json update

The `onEvent` callback fires during the run (line 224-244 of supervisor.ts):
```typescript
(eventType) => {
  const now = Date.now();
  this.updateStatus(id, {
    status: 'running',
    current_event: eventType,
    ...
  });
  const timelineEvent = mapCallbackEventToTimelineEvent(eventType, {...});
  if (timelineEvent) {
    appendTimelineEvent(timelineEvent);
  }
}
```

Each `updateStatus` call does a read-modify-write of `status.json`. If the `run_complete` append to `events.jsonl` (line 275) races with a still-pending `updateStatus` call that was triggered by an earlier callback, the result could be:

1. `run_complete` written to `events.jsonl` → `feed` shows "done"
2. A queued `updateStatus` write sets status back to `running` (stale write arrives after the `done` update)
3. `result` reads `status.json`, sees `running`, exits with code 1

### 5. Identified the third scenario: `updateStatus` is not atomic end-to-end

`updateStatus` in supervisor.ts:
```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);                  // read
  if (!current) return;
  this.writeStatusFile(id, { ...current, ...updates }); // merge + write
}
```

`writeStatusFile` does use a temp-file + rename (atomic write), so partial writes are not the issue. However, the **read-then-write** is not protected by any mutex. If two `updateStatus` calls interleave:

1. Call A reads status.json: `{ status: 'running', current_event: 'tool_execution_end' }`
2. Call B reads status.json: `{ status: 'running', current_event: 'tool_execution_end' }`
3. Call B writes: `{ status: 'done', ... }` (the final done update)
4. Call A writes: `{ status: 'running', current_event: 'tool_execution_end' }` (stale — overwrites the done!)

This is the classic read-modify-write race. JavaScript is single-threaded, so this interleaving cannot happen within one process; and because `runner.run()` callbacks fire synchronously within the await chain while `updateStatus` is a synchronous readFileSync → writeFileSync pair, there is no async interleaving here either.

**Conclusion on this path**: In Node.js single-threaded execution, this race does not actually occur within one process.

### 6. Re-examining the ordering: The real bug

After careful analysis, the primary bug is the **write ordering** in `Supervisor.run()`:

**Normal path (no race):**
- `result.txt` written
- `status.json` updated to `done`
- `events.jsonl` gets `run_complete` appended

**Window of inconsistency A** (status.json → done before events.jsonl → run_complete):
- `status.json = done` ← `specialists result` would succeed here
- `events.jsonl` missing `run_complete` ← `specialists feed` would NOT show done here

This is the opposite of the reported bug.

**The reported bug (feed shows done, result says running) points to a different scenario.** Looking at what `feed` considers "done": in follow mode, `isCompletionEvent` checks for `isRunCompleteEvent(event) || event.type === 'done' || event.type === 'agent_end'`.

The `agent_end` and `done` legacy event types are listed as completion signals in `feed.ts` (line 170-172), but these are NOT written to `events.jsonl` in the new code path — `mapCallbackEventToTimelineEvent` returns `null` for `agent_end` and `done` (lines 254-259 of timeline-events.ts). However, **legacy jobs** may still have these events on disk.

More importantly, in the **follow mode** polling loop (feed.ts lines 249-251):
```typescript
if (batch.events.some(isCompletionEvent)) {
  completedJobs.add(batch.jobId);
}
```

This runs on every poll tick. If a legacy `agent_end` event is in `events.jsonl` (written by an older version of the code), `feed` would mark the job complete, but if `status.json` was never updated to `done` (e.g., due to a crash after `events.jsonl` write but before `status.json` update), `result` would see `running` or `starting`.

**This is the primary bug scenario**: A crash or error after `events.jsonl` write but before `status.json` update leaves the two sources permanently inconsistent. The crash recovery mechanism in `crashRecovery()` only fires on the next `Supervisor.run()` call (when a new job starts), not proactively. So until another job starts, the stale state persists.

---

## Root Causes Summary

### Root Cause 1: Non-atomic dual-write between status.json and events.jsonl

The completion state is written to two separate files in two sequential writes with no atomicity guarantee between them. A process crash, OS kill, or even a slow filesystem between these two writes leaves them in permanently divergent states.

**Relevant code**: `supervisor.ts` lines 261-282 (the ordering: result.txt → status.json → events.jsonl)

### Root Cause 2: Crash recovery is deferred (not proactive)

`crashRecovery()` is only called at the start of `Supervisor.run()` (line 177). If the process that was running a job crashes after writing `events.jsonl:run_complete` but before updating `status.json:done`, the state remains inconsistent until the next job starts. During this window, `feed` shows done and `result` says running.

**Relevant code**: `supervisor.ts` lines 149-167, 173-177

### Root Cause 3: `result.ts` reads status.json; `feed` reads events.jsonl — no unified source of truth

The two CLI commands consult different data sources with no reconciliation:
- `specialists result` uses `Supervisor.readStatus()` → `status.json`
- `specialists feed` uses `readJobEvents()` → `events.jsonl`

`status.json` is the "live mutable state" per design comments (timeline-events.ts line 455), while `events.jsonl` is intended as source of truth for completed jobs. But `result.ts` uses `status.json` for both the completion gate AND the data source without checking `events.jsonl` for a `run_complete` event as a fallback.

### Root Cause 4: Legacy event types create ambiguous completion detection

`feed.ts` treats `agent_end` and `done` events (legacy) as completion signals (line 170-172). These can exist in older `events.jsonl` files even when `status.json` was not properly finalized, causing feed to conclude "done" while result rejects with "still running."

---
## Files Involved

- `/home/dawid/projects/specialists/src/specialist/supervisor.ts` — dual-write ordering bug (lines 261-282), deferred crash recovery (lines 149-177)
- `/home/dawid/projects/specialists/src/cli/result.ts` — reads only status.json, no fallback to events.jsonl
- `/home/dawid/projects/specialists/src/cli/feed.ts` — reads only events.jsonl, no fallback to status.json; legacy completion events in isCompletionEvent (line 170-172)
- `/home/dawid/projects/specialists/src/specialist/timeline-events.ts` — defines completion event types including legacy done/agent_end
- `/home/dawid/projects/specialists/src/specialist/timeline-query.ts` — job reading utilities

---
## Recommended Fixes

1. **Reverse the write order** in `Supervisor.run()`: write `events.jsonl:run_complete` first, then `status.json:done`. This ensures that if you see `done` in status.json, the timeline is already complete; the current order provides the opposite guarantee from the one the reported symptom needs.

2. **Add a fallback in `result.ts`**: If `status.json` says `running` but `events.jsonl` contains a `run_complete` event, treat the job as done and attempt to read `result.txt`. This handles the crash-recovery gap.

3. **Make crash recovery proactive**: Call `crashRecovery()` in `readStatus()` or `Supervisor.constructor()`, not only on `run()`. A stale "running" status should be auto-corrected at read time, not only when a new job starts.

4. **Reconcile on read in `readStatus()`**: Before returning a `running`/`starting` status, check if `events.jsonl` has a `run_complete` event. If it does, auto-correct `status.json` to `done` before returning.

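Fixes 2 and 4 boil down to the same reconciliation rule, sketched here with hypothetical names (`effectiveStatus` is not the real API; actual signatures in `result.ts`/`supervisor.ts` will differ):

```typescript
interface StatusFile { status: 'starting' | 'running' | 'done' | 'error' }
interface TimelineEvent { type: string }

// Reconcile on read: a run_complete event in events.jsonl outranks a
// stale 'running'/'starting' in status.json.
function effectiveStatus(
  status: StatusFile,
  events: TimelineEvent[],
): StatusFile['status'] {
  const hasRunComplete = events.some((e) => e.type === 'run_complete');
  if ((status.status === 'running' || status.status === 'starting') && hasRunComplete) {
    return 'done'; // auto-correct the stale state
  }
  return status.status;
}
```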
---

## What Was NOT a Bug

- The atomic write of `status.json` via temp-file + rename (`writeStatusFile`) correctly prevents partial/corrupt JSON.
- The single-threaded Node.js execution model prevents the read-modify-write race within a single process.
- The `events.jsonl` append-only writes are safe; the only issue is the ordering relationship with `status.json` updates.
@@ -0,0 +1,5 @@
{
  "total_tokens": 69075,
  "duration_ms": 132071,
  "total_duration_seconds": 132.1
}
@@ -0,0 +1,5 @@
{
  "total_tokens": 371,
  "duration_ms": 790094,
  "total_duration_seconds": 790.1
}