@jaggerxtrm/specialists 3.3.4 → 3.4.0
- package/README.md +4 -2
- package/bin/install.js +15 -204
- package/config/skills/using-specialists/SKILL.md +1 -1
- package/config/specialists/debugger.specialist.yaml +111 -0
- package/config/specialists/explorer.specialist.yaml +1 -1
- package/dist/index.js +791 -566
- package/package.json +1 -1
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/old_skill/outputs/result.md +0 -105
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/with_skill/outputs/result.md +0 -93
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/old_skill/outputs/result.md +0 -113
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/with_skill/outputs/result.md +0 -131
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/old_skill/outputs/result.md +0 -159
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/with_skill/outputs/result.md +0 -150
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/outputs/result.md +0 -180
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/outputs/result.md +0 -223
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/with_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/outputs/result.md +0 -146
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/outputs/result.md +0 -89
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/outputs/result.md +0 -96
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/timing.json +0 -5
- package/config/skills/specialists-usage-workspace/skill-snapshot/SKILL.md.old +0 -237
- package/config/specialists/bug-hunt.specialist.yaml +0 -96
@@ -1,150 +0,0 @@

# Debugging a Specialist YAML That Doesn't Appear in `specialists list`

When a specialist YAML file you added doesn't show up in `specialists list`, work through these checks in order.

---

## 1. Verify the file is in a scanned directory

The loader scans exactly these directories (in priority order):

```
<project-root>/specialists/
<project-root>/.claude/specialists/
<project-root>/.agent-forge/specialists/
~/.agents/specialists/ (user scope)
```

Only directories that exist on disk are scanned. If your file is anywhere else — for example in `./agent-specs/` or `./config/specialists/` — it will never be found.

**Fix**: Move the file into `<project-root>/specialists/`.
---

## 2. Verify the filename ends with `.specialist.yaml`

The loader filters for files ending in exactly `.specialist.yaml`. Common mistakes:

- `my-agent.yaml` — missing `.specialist` infix
- `my-agent.specialist.yml` — wrong extension (`.yml` not `.yaml`)
- `my-agent.Specialist.yaml` — wrong casing

**Fix**: Rename to `<name>.specialist.yaml`.
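For intuition, here is a minimal sketch of the scan-and-filter behavior described in checks 1 and 2, using Node's `fs` API. The function names and structure are illustrative, not the package's actual loader code:

```typescript
import { existsSync, readdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

// Check 1: candidate directories in priority order; only those that
// actually exist on disk get scanned.
function scannedDirs(projectRoot: string): string[] {
  return [
    join(projectRoot, 'specialists'),
    join(projectRoot, '.claude', 'specialists'),
    join(projectRoot, '.agent-forge', 'specialists'),
    join(homedir(), '.agents', 'specialists'),
  ].filter((dir) => existsSync(dir));
}

// Check 2: a case-sensitive suffix filter: `.yml`, `.Specialist.yaml`,
// or a missing `.specialist` infix all silently fail this test.
function specialistFiles(dir: string): string[] {
  return readdirSync(dir)
    .filter((file) => file.endsWith('.specialist.yaml'))
    .map((file) => join(dir, file));
}
```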
---

## 3. Check stderr for a parse/validation error

When a YAML file fails to parse, the loader silently skips it and writes a warning to stderr:

```
[specialists] skipping /path/to/file.specialist.yaml: <reason>
```

Run `specialists list` and capture stderr explicitly:

```bash
specialists list 2>&1 | grep -i skipping
```

If your file appears there, the reason shown is the validation error to fix.
---

## 4. Validate required fields and their formats

The schema enforces strict rules. Every specialist YAML must have:

| Field | Requirement |
|---|---|
| `specialist.metadata.name` | **kebab-case** (`^[a-z][a-z0-9-]*$`) — no uppercase, no underscores |
| `specialist.metadata.version` | **semver** (`1.0.0` format — three numeric parts) |
| `specialist.metadata.description` | non-empty string |
| `specialist.metadata.category` | non-empty string |
| `specialist.execution.model` | non-empty string |
| `specialist.execution.mode` | one of `tool`, `skill`, `auto` |
| `specialist.execution.permission_required` | one of `READ_ONLY`, `LOW`, `MEDIUM`, `HIGH` |
| `specialist.prompt.task_template` | non-empty string |

Common errors that cause silent skipping:

- `name: My_Agent` — fails kebab-case (`_` and uppercase not allowed)
- `version: "1.0"` — fails semver (needs three parts: `1.0.0`)
- Missing `execution.model` entirely
- Missing `prompt.task_template` entirely
- Top-level key is not `specialist:` (e.g. accidentally used `agent:`)
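To check the two format rules without running the loader, the regexes from the table can be tested directly. A quick illustrative snippet, not the package's schema code:

```typescript
// Format rules from the table above: kebab-case name, three-part semver.
const isKebabCase = (name: string): boolean => /^[a-z][a-z0-9-]*$/.test(name);
const isSemver = (version: string): boolean => /^\d+\.\d+\.\d+$/.test(version);

console.log(isKebabCase('My_Agent'));   // false: uppercase and underscore
console.log(isSemver('1.0'));           // false: only two parts
console.log(isKebabCase('debug-test')); // true
console.log(isSemver('1.0.0'));         // true
```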
---

## 5. Validate your YAML syntax independently

A YAML parse error (bad indentation, unquoted special characters, etc.) will also cause the file to be skipped. Validate with:

```bash
python3 -c "import yaml, sys; yaml.safe_load(open('specialists/my-agent.specialist.yaml'))" && echo OK
# or
npx js-yaml specialists/my-agent.specialist.yaml
```
---

## 6. Check for a name collision

If a specialist with the same `metadata.name` already exists in a higher-priority directory, the loader's deduplication (`seen` set, first-wins) will drop your file silently. Project-scope entries win over user-scope.

```bash
specialists list --json | python3 -c "import sys,json; [print(s['name'], s['filePath']) for s in json.load(sys.stdin)]"
```

If the name appears but points to a different file, rename your specialist.
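The first-wins behavior can be pictured with a small sketch (illustrative, not the actual loader):

```typescript
// First-wins deduplication: files from higher-priority directories are
// listed first, so a later duplicate name is dropped silently.
function dedupeFirstWins<T extends { name: string }>(items: T[]): T[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    if (seen.has(item.name)) return false;
    seen.add(item.name);
    return true;
  });
}
```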
---

## 7. Run `specialists doctor`

```bash
specialists doctor
```

This runs deeper checks — hook wiring, MCP registration, zombie jobs, and pi agent availability — and prints fix hints. It is the recommended first step for any unexplained `specialists` issue.
---

## 8. Minimal working example

If unsure what a valid file looks like, create `specialists/debug-test.specialist.yaml` with this minimal content and confirm it appears in `specialists list`:

```yaml
specialist:
  metadata:
    name: debug-test
    version: 1.0.0
    description: Minimal test specialist
    category: debug

  execution:
    mode: auto
    model: anthropic/claude-haiku-4-5
    permission_required: READ_ONLY

  prompt:
    task_template: |
      $prompt
```

Then incrementally add fields from your actual YAML until it breaks — that isolates the bad field.
---

## Summary checklist

- [ ] File is inside `specialists/`, `.claude/specialists/`, or `.agent-forge/specialists/` at the project root
- [ ] Filename ends in `.specialist.yaml`
- [ ] `specialists list 2>&1 | grep skipping` shows no entry for this file
- [ ] `metadata.name` is kebab-case (lowercase letters, digits, hyphens only)
- [ ] `metadata.version` is semver (`X.Y.Z`)
- [ ] `execution.model` is present
- [ ] `prompt.task_template` is present
- [ ] YAML syntax is valid (no indentation errors)
- [ ] No other specialist already uses the same `name`
- [ ] `specialists doctor` reports no blocking errors

@@ -1,180 +0,0 @@

# Bug Investigation: Job Status Inconsistency Between `specialists feed` and `specialists result`

## Approach: Skill Consultation + Delegation Attempt + Direct Investigation

### Step 1: Read the Skill

Read `/home/dawid/projects/specialists/skills/specialists-usage/SKILL.md`.

The skill guidance is clear: for deep bug investigations that span multiple files, delegate to a specialist. This task matched the criteria:

- It spans multiple files (feed.ts, result.ts, supervisor.ts, timeline-events.ts, timeline-query.ts, runner.ts, pi/session.ts)
- It would take >5 minutes of focused work
- A fresh perspective adds value

Decision: **delegate to the `bug-hunt` specialist**.
### Step 2: Delegation Attempt

Created a bead for tracking:

```
bd create --title "Investigate: job status inconsistency between specialists feed and specialists result" --type task --priority 2
# -> unitAI-s82l
```

Updated it with a description:

```
bd update unitAI-s82l --description "Intermittent failures: specialist jobs show status 'done' in 'specialists feed' output but 'specialists result' says they are still running..."
```

Discovered that `--prompt` and `--bead` cannot be combined (the run exits with an error), so used `--bead` only:

```
specialists run bug-hunt --bead unitAI-s82l --background
# -> bcyvkirdh (background job ID in Claude Code's subagent system)
```

**Important observation**: The `--background` flag here refers to Claude Code's internal background subagent system, NOT the `Supervisor`-managed `.specialists/jobs/` system. The ID `bcyvkirdh` is a Claude Code task ID, not a Supervisor job ID. So `specialists result bcyvkirdh` correctly returned "No job found" — the job was never in `.specialists/jobs/`.

The specialist timed out after 5 minutes (300000ms). Since the delegation did not produce results, I proceeded with direct investigation.
### Step 3: Direct Source Code Investigation

Read all relevant files:

- `/home/dawid/projects/specialists/src/cli/feed.ts`
- `/home/dawid/projects/specialists/src/cli/result.ts`
- `/home/dawid/projects/specialists/src/cli/format-helpers.ts`
- `/home/dawid/projects/specialists/src/specialist/supervisor.ts`
- `/home/dawid/projects/specialists/src/specialist/timeline-events.ts`
- `/home/dawid/projects/specialists/src/specialist/timeline-query.ts`
- `/home/dawid/projects/specialists/src/specialist/runner.ts`
- `/home/dawid/projects/specialists/src/pi/session.ts`

---
## Root Cause Analysis

### The Two State Stores

The job lifecycle uses two separate state stores:

1. **`status.json`** — mutable snapshot, read by `specialists result` via `supervisor.readStatus()`
2. **`events.jsonl`** — append-only timeline, read by `specialists feed` via `readAllJobEvents()`

These are written independently and there is no atomicity guarantee between them.
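To make the split concrete, here is a sketch of what each reader effectively does. Paths match the layout above; the `type === 'run_complete'` check and the helper names are assumptions for illustration, not the actual CLI code:

```typescript
import { readFileSync } from 'node:fs';

// `result`-style reader: trusts the mutable snapshot.
function statusFromSnapshot(jobDir: string): string {
  return JSON.parse(readFileSync(`${jobDir}/status.json`, 'utf-8')).status;
}

// `feed`-style reader: trusts the append-only timeline.
function doneFromTimeline(jobDir: string): boolean {
  return readFileSync(`${jobDir}/events.jsonl`, 'utf-8')
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    .some((event) => event.type === 'run_complete');
}

// Nothing forces these two answers to agree at any given instant.
```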
### The Normal Write Sequence (in `supervisor.run()`)

```
# After runner.run() completes:
line 261:     writeFileSync(result.txt, result.output)
line 265-272: updateStatus(id, { status: 'done', ... })        ← status.json -> 'done'
line 275-279: appendTimelineEvent(createRunCompleteEvent(...)) ← events.jsonl gets run_complete
line 282:     writeFileSync(ready marker)
```

In the normal path, `status.json` is set to `done` BEFORE `run_complete` is written to `events.jsonl`. This means `result` would work before `feed` shows done — no bug in the happy path.
### The Race Condition (Root Cause)

The bug is caused by a late-firing `onEvent` callback that **overwrites `status.json` back to `running` after it has been set to `done`**.

In `supervisor.run()`, the `onEvent` callback is:

```typescript
(eventType) => {
  const now = Date.now();
  this.updateStatus(id, {
    status: 'running', // ← ALWAYS writes 'running', unconditionally
    current_event: eventType,
    last_event_at_ms: now,
    elapsed_s: Math.round((now - startedAtMs) / 1000),
  });
  // ...map and append to events.jsonl...
}
```
The `updateStatus` method does a read-modify-write:

```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);
  if (!current) return;
  this.writeStatusFile(id, { ...current, ...updates });
}
```
**The race sequence:**

1. `runner.run()` resolves (all events have been observed)
2. `supervisor.run()` calls `updateStatus({status: 'done'})` — status.json now says `done`
3. `appendTimelineEvent(run_complete)` — events.jsonl now has `run_complete`
4. A **late `onEvent` callback fires** (queued in the event loop before `await runner.run()` returned but dispatched after)
5. The late callback calls `updateStatus({status: 'running'})`:
   - `readStatus()` reads the current `done` status
   - spreads `{ ...current, ...{ status: 'running' } }` — **overwrites `done` back to `running`**
   - writes the overwritten status back to `status.json`

**Result state after the race:**

- `events.jsonl`: contains `run_complete` event → `feed` shows job as **done**
- `status.json`: was overwritten to `running` → `result` reports job is **still running**
### Why It's Intermittent

This race depends on event loop scheduling. The `onEvent` callback is called synchronously from within `PiAgentSession`'s stdout data handler (`session.ts`). However:

1. The `agent_end` event fires `_doneResolve?.()`, which resolves the `_donePromise`
2. `waitForDone()` in runner.ts awaits this promise
3. Due to microtask/macrotask ordering, there can be pending `onEvent` callbacks from late `tool_execution_end` or `text_delta` events that were buffered in the same or a following stdout chunk

Additionally, the `close()` call in runner.ts (line 213) awaits `_donePromise` again after `waitForDone()`, and the process `close` event can fire additional event handlers.
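The ordering can be reproduced with a self-contained toy. Everything here is invented for illustration and has no relation to the real session code; the point is that a callback delivered in a later macrotask runs after the awaiter has already written the terminal status:

```typescript
// Toy reproduction of the race described above.
let status = 'starting'; // stand-in for status.json

function makeSession(onEvent: (type: string) => void): { done: Promise<void> } {
  let doneResolve!: () => void;
  const done = new Promise<void>((resolve) => (doneResolve = resolve));
  // First "stdout chunk": agent_end resolves the done promise.
  setTimeout(() => doneResolve(), 0);
  // A trailing tool_execution_end arrives in a later chunk (later macrotask):
  // this is the late callback from step 3 above.
  setTimeout(() => onEvent('tool_execution_end'), 0);
  return { done };
}

async function main(): Promise<void> {
  const session = makeSession(() => {
    status = 'running'; // unconditional overwrite, like the real onEvent
  });
  await session.done; // resumes as a microtask after the first timer fires
  status = 'done';    // terminal write...
  setTimeout(() => console.log(status), 10); // ...yet this prints 'running'
}

main();
```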
### Secondary Observation: `agent_end` Suppressed From events.jsonl

In `mapCallbackEventToTimelineEvent`, the `agent_end` callback event is explicitly **suppressed** (returns `null`), so `events.jsonl` never contains `agent_end`. The `isCompletionEvent` function in feed.ts checks for `agent_end`, but this can never be true for events read from disk. This is correct behavior by design, but it means the only completion signal in `events.jsonl` is `run_complete`.
### Where to Fix

The fix should be in `supervisor.ts`'s `onEvent` callback. It should not overwrite `status` back to `running` if the status has already been transitioned to `done` or `error`:

```typescript
// Proposed guard in the onEvent callback:
(eventType) => {
  const current = this.readStatus(id);
  if (!current || current.status === 'done' || current.status === 'error') return; // guard
  // ...rest of update logic...
}
```

Alternatively, `updateStatus` could be made terminal-status-aware:

```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);
  if (!current) return;
  // Don't overwrite terminal states unless explicitly transitioning to one
  if ((current.status === 'done' || current.status === 'error') && !updates.status) return;
  this.writeStatusFile(id, { ...current, ...updates });
}
```
---

## Summary

| Aspect | Detail |
|--------|--------|
| **Bug type** | Race condition / state store inconsistency |
| **Root cause** | Late-firing `onEvent` callbacks overwrite `status.json` from `done` back to `running` after the job has completed |
| **Why intermittent** | Depends on event loop scheduling; only occurs when event callbacks are queued but not yet flushed at the time `runner.run()` returns |
| **feed shows done because** | `events.jsonl` gets `run_complete` written before any late event can append (events.jsonl is append-only and `run_complete` is written after `status.json`) |
| **result says running because** | `status.json` is read-modify-written by the late callback, which spreads `{status: 'running'}` over the already-written `{status: 'done'}` |
| **Key files** | `src/specialist/supervisor.ts` (lines 226-245, 265-272), `src/cli/result.ts` (line 27), `src/cli/feed.ts` (lines 170-172) |
| **Fix location** | `supervisor.ts` `onEvent` callback — add guard to skip status update if terminal state already set |
---

## Delegation vs Direct Work Reflection

The skill correctly guided delegation as the right approach for this multi-file investigation. The delegation attempt failed due to:

1. Specialist timeout (300s) — the `bug-hunt` specialist did not start within the timeout window
2. The background job ran in Claude Code's subagent system, not the Supervisor job system, so `specialists result <id>` correctly could not find it in `.specialists/jobs/`

Since delegation failed, direct investigation was the fallback. The investigation required reading 8 source files and tracing the complete lifecycle across the `PiAgentSession` → `SpecialistRunner` → `Supervisor` → `feed/result CLI` stack.

@@ -1,223 +0,0 @@

# Bug Investigation: Job Status Inconsistency Between `specialists feed` and `specialists result`

## Summary

The intermittent inconsistency where `specialists feed` shows a job as `done` while `specialists result` says it is still running has **two root causes** operating at different layers of the system:

1. **Race condition between `events.jsonl` and `status.json` writes** — `feed` reads from `events.jsonl`, `result` reads from `status.json`, and these are updated in separate, non-atomic steps with no ordering guarantee between them.
2. **Dual state storage with divergent update timing in the Supervisor** — the `run_complete` event is appended to `events.jsonl` *after* `status.json` is updated to `done`, but critically, the `result.txt` file is written *before* `status.json` is updated, creating a window where each data source can disagree with the others.
---

## Investigation Steps

### 1. Identified the data sources for each command

**`specialists feed`** (src/cli/feed.ts):
- Reads `events.jsonl` from `.specialists/jobs/<id>/events.jsonl`
- Determines "job complete" by looking for a `run_complete` event in the event stream (via `isRunCompleteEvent` in `timeline-events.ts`)
- Does NOT read `status.json` for completion determination

**`specialists result`** (src/cli/result.ts):
- Reads `status.json` via `Supervisor.readStatus(jobId)`
- Checks `status.status === 'running' || status.status === 'starting'` and exits with code 1 if true
- If status is `done`, reads and prints `result.txt`
### 2. Traced the write sequence in Supervisor.run() (src/specialist/supervisor.ts, lines 260-283)

The completion path in `Supervisor.run()` executes these writes in sequence:

```
Step A: writeFileSync(resultPath, result.output)         // result.txt written
Step B: updateStatus(id, { status: 'done', ... })        // status.json updated to 'done'
Step C: appendTimelineEvent(createRunCompleteEvent(...)) // run_complete appended to events.jsonl
Step D: writeFileSync(readyDir/id, '')                   // ready marker written
```
### 3. Identified the race window

Between **Step B** (`status.json` updated to `done`) and **Step C** (`run_complete` appended to `events.jsonl`), there is a real ordering problem, though in the opposite direction from what one might expect.

Looking at the actual code order:

- **Step B** (`status.json` → `done`) happens at line 265
- **Step C** (`run_complete` event appended to `events.jsonl`) happens at line 275

This means there is a window — however brief — where:
- `status.json` says `done`
- `events.jsonl` does NOT yet have `run_complete`

But the reported symptom is the *inverse*: `feed` shows `done` while `result` says `still running`. That would require a window where:
- `events.jsonl` has `run_complete`
- `status.json` still says `running`

**This could happen through a different path**: `updateStatus` (line 128) reads then writes `status.json`. It reads the current content, merges the patch, and writes via a temp file + rename. If this read-merge-write cycle is slow (e.g., filesystem contention), `events.jsonl` may have been appended with `run_complete` while the status.json rename hasn't completed yet.

Specifically:

```
// supervisor.ts line 275 — run_complete appended AFTER status.json update
appendTimelineEvent(createRunCompleteEvent('COMPLETE', elapsed, {...}));
```

Wait — looking again at the actual code order in lines 260–283:

```typescript
// line 261: result.txt written
writeFileSync(this.resultPath(id), result.output, 'utf-8');

// lines 265-272: status.json updated to 'done'
this.updateStatus(id, {
  status: 'done',
  ...
});

// lines 275-279: run_complete appended to events.jsonl
appendTimelineEvent(createRunCompleteEvent('COMPLETE', elapsed, {...}));

// line 282: ready marker written
writeFileSync(join(this.readyDir(), id), '', 'utf-8');
```

So the **correct ordering** is: `result.txt` → `status.json:done` → `events.jsonl:run_complete`.

This confirms the window described above:
- `status.json` = `done`
- `events.jsonl` does NOT yet have `run_complete`

In this window, `specialists result` would succeed (status is done, result.txt exists), but `specialists feed` would NOT show the job as complete because the `run_complete` event hasn't been written yet.

**For the inverse case (feed shows done, result says running):** this would require the `run_complete` event to appear in `events.jsonl` before `status.json` is updated, and based on the code, that ordering is not directly possible within a single run.
### 4. Identified a second scenario: the in-progress events.jsonl vs. delayed status.json update

The `onEvent` callback fires during the run (lines 224-244 of supervisor.ts):

```typescript
(eventType) => {
  const now = Date.now();
  this.updateStatus(id, {
    status: 'running',
    current_event: eventType,
    ...
  });
  const timelineEvent = mapCallbackEventToTimelineEvent(eventType, {...});
  if (timelineEvent) {
    appendTimelineEvent(timelineEvent);
  }
}
```
Each `updateStatus` call does a read-modify-write of `status.json`. If the `run_complete` append to `events.jsonl` (line 275) races with a still-pending `updateStatus` call that was triggered by an earlier callback, the result could be:

1. `run_complete` written to `events.jsonl` → `feed` shows "done"
2. A queued `updateStatus` write sets status back to `running` (stale write arrives after the `done` update)
3. `result` reads `status.json`, sees `running`, exits with code 1
### 5. Identified the third scenario: `updateStatus` is not atomic end-to-end

`updateStatus` in supervisor.ts:

```typescript
private updateStatus(id: string, updates: Partial<SupervisorStatus>): void {
  const current = this.readStatus(id);                  // read
  if (!current) return;
  this.writeStatusFile(id, { ...current, ...updates }); // merge + write
}
```

`writeStatusFile` does use a temp-file + rename (atomic write), so partial writes are not the issue. However, the **read-then-write** is not protected by any mutex. If two `updateStatus` calls interleave:
1. Call A reads status.json: `{ status: 'running', current_event: 'tool_execution_end' }`
2. Call B reads status.json: `{ status: 'running', current_event: 'tool_execution_end' }`
3. Call B writes: `{ status: 'done', ... }` (the final done update)
4. Call A writes: `{ status: 'running', current_event: 'tool_execution_end' }` (stale — overwrites the done!)

This is the classic read-modify-write race condition. Since JavaScript is single-threaded, this specific race cannot happen within the same process. Moreover, because `runner.run()` callbacks fire synchronously within the await chain and `updateStatus` does a synchronous readFileSync → writeFileSync, no async interleaving is possible here either.

**Conclusion on this path**: In Node.js single-threaded execution, this race does not actually occur within one process.
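For reference, the temp-file + rename pattern mentioned above typically looks like the following generic sketch (not the package's actual `writeStatusFile`):

```typescript
import { writeFileSync, renameSync } from 'node:fs';

// Generic temp-file + rename: the rename is atomic on POSIX filesystems,
// so readers see either the old file or the new one, never a partially
// written status.json.
function writeJsonAtomic(path: string, value: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(value, null, 2), 'utf-8');
  renameSync(tmp, path);
}
```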
### 6. Re-examining the ordering: The real bug

After careful analysis, the primary bug is the **write ordering** in `Supervisor.run()`:

**Normal path (no race):**
- `result.txt` written
- `status.json` updated to `done`
- `events.jsonl` gets `run_complete` appended

**Window of inconsistency A** (status.json → done before events.jsonl → run_complete):
- `status.json = done` ← `specialists result` would succeed here
- `events.jsonl` missing `run_complete` ← `specialists feed` would NOT show done here

This is the opposite of the reported bug.

**The reported bug (feed shows done, result says running) points to a different scenario.** Looking at what `feed` considers "done": in follow mode, `isCompletionEvent` checks for `isRunCompleteEvent(event) || event.type === 'done' || event.type === 'agent_end'`.

The `agent_end` and `done` legacy event types are listed as completion signals in `feed.ts` (lines 170-172), but these are NOT written to `events.jsonl` in the new code path — `mapCallbackEventToTimelineEvent` returns `null` for `agent_end` and `done` (lines 254-259 of timeline-events.ts). However, **legacy jobs** may still have these events on disk.

More importantly, in the **follow mode** polling loop (feed.ts lines 249-251):

```typescript
if (batch.events.some(isCompletionEvent)) {
  completedJobs.add(batch.jobId);
}
```

This runs on every poll tick. If a legacy `agent_end` event is in `events.jsonl` (written by an older version of the code), `feed` would mark the job complete, but if `status.json` was never updated to `done` (e.g., due to a crash after the `events.jsonl` write but before the `status.json` update), `result` would see `running` or `starting`.

**This is the primary bug scenario**: a crash or error after the `events.jsonl` write but before the `status.json` update leaves the two sources permanently inconsistent. The crash recovery mechanism in `crashRecovery()` only fires on the next `Supervisor.run()` call (when a new job starts), not proactively. So until another job starts, the stale state persists.
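For reference, the completion predicate described above behaves like this sketch (reconstructed from the prose, not copied from feed.ts):

```typescript
// run_complete is the only signal the current writer emits, but the legacy
// 'done' and 'agent_end' types still count as completion when found on disk.
type TimelineEvent = { type: string };

const isRunCompleteEvent = (event: TimelineEvent): boolean =>
  event.type === 'run_complete';

const isCompletionEvent = (event: TimelineEvent): boolean =>
  isRunCompleteEvent(event) || event.type === 'done' || event.type === 'agent_end';
```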
---

## Root Causes Summary

### Root Cause 1: Non-atomic dual-write between status.json and events.jsonl

The completion state is written to two separate files in two sequential writes with no atomicity guarantee between them. A process crash, OS kill, or even a slow filesystem between these two writes leaves them in permanently divergent states.

**Relevant code**: `supervisor.ts` lines 261-282 (the ordering: result.txt → status.json → events.jsonl)

### Root Cause 2: Crash recovery is deferred (not proactive)

`crashRecovery()` is only called at the start of `Supervisor.run()` (line 177). If the process that was running a job crashes after writing a completion signal to `events.jsonl` but before updating `status.json` to `done`, the state remains inconsistent until the next job starts. During this window, `feed` shows done and `result` says running.

**Relevant code**: `supervisor.ts` lines 149-167, 173-177

### Root Cause 3: `result.ts` reads status.json; `feed` reads events.jsonl — no unified source of truth

The two CLI commands consult different data sources with no reconciliation:
- `specialists result` uses `Supervisor.readStatus()` → `status.json`
- `specialists feed` uses `readJobEvents()` → `events.jsonl`

`status.json` is the "live mutable state" per design comments (timeline-events.ts line 455), while `events.jsonl` is intended as the source of truth for completed jobs. But `result.ts` uses `status.json` for both the completion gate AND the data source, without checking `events.jsonl` for a `run_complete` event as a fallback.

### Root Cause 4: Legacy event types create ambiguous completion detection

`feed.ts` treats `agent_end` and `done` events (legacy) as completion signals (lines 170-172). These can exist in older `events.jsonl` files even when `status.json` was not properly finalized, causing feed to conclude "done" while result rejects with "still running."
---

## Files Involved

- `/home/dawid/projects/specialists/src/specialist/supervisor.ts` — dual-write ordering bug (lines 261-282), deferred crash recovery (lines 149-177)
- `/home/dawid/projects/specialists/src/cli/result.ts` — reads only status.json, no fallback to events.jsonl
- `/home/dawid/projects/specialists/src/cli/feed.ts` — reads only events.jsonl, no fallback to status.json; legacy completion events in isCompletionEvent (lines 170-172)
- `/home/dawid/projects/specialists/src/specialist/timeline-events.ts` — defines completion event types, including legacy done/agent_end
- `/home/dawid/projects/specialists/src/specialist/timeline-query.ts` — job reading utilities
---

## Recommended Fixes

1. **Reverse the write order** in `Supervisor.run()`: write `events.jsonl:run_complete` first, then `status.json:done`. This ensures that if you see `done` in status.json, the timeline is already complete. The current order is backwards relative to what guarantees consistency for the reported symptom.

2. **Add a fallback in `result.ts`**: If `status.json` says `running` but `events.jsonl` contains a `run_complete` event, treat the job as done and attempt to read `result.txt`. This handles the crash-recovery gap.

3. **Make crash recovery proactive**: Call `crashRecovery()` in `readStatus()` or the `Supervisor` constructor, not only in `run()`. A stale "running" status should be auto-corrected at read time, not only when a new job starts.

4. **Reconcile on read in `readStatus()`**: Before returning a `running`/`starting` status, check if `events.jsonl` has a `run_complete` event. If it does, auto-correct `status.json` to `done` before returning (see the sketch after this list).
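Fixes 2 and 4 amount to reconciling on read. A sketch under the file layout described above; function and field names are illustrative, not the actual supervisor.ts API:

```typescript
import { existsSync, readFileSync } from 'node:fs';

// Before trusting a 'running'/'starting' snapshot, check the append-only
// timeline for run_complete and treat the snapshot as stale if found.
function reconciledStatus(jobDir: string): string {
  const status = JSON.parse(readFileSync(`${jobDir}/status.json`, 'utf-8'));
  if (status.status !== 'running' && status.status !== 'starting') {
    return status.status;
  }
  const eventsPath = `${jobDir}/events.jsonl`;
  if (!existsSync(eventsPath)) return status.status;
  const hasRunComplete = readFileSync(eventsPath, 'utf-8')
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    .some((event) => event.type === 'run_complete');
  // The timeline says the job finished; the snapshot missed the update.
  return hasRunComplete ? 'done' : status.status;
}
```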
---

## What Was NOT a Bug

- The atomic write of `status.json` via temp-file + rename (`writeStatusFile`) correctly prevents partial/corrupt JSON.
- The single-threaded Node.js execution model prevents the read-modify-write race within a single process.
- The `events.jsonl` append-only writes are safe; the only issue is the ordering relationship with `status.json` updates.