oh-my-claude-sisyphus 3.5.8 → 3.6.0
- package/agents/executor-high.md +2 -0
- package/agents/executor-low.md +2 -0
- package/agents/executor.md +2 -0
- package/agents/templates/base-agent.md +9 -0
- package/commands/cancel.md +8 -8
- package/commands/swarm.md +350 -148
- package/dist/__tests__/hooks.test.js +10 -9
- package/dist/__tests__/hooks.test.js.map +1 -1
- package/dist/agents/codex-agents.d.ts +20 -0
- package/dist/agents/codex-agents.d.ts.map +1 -0
- package/dist/agents/codex-agents.js +36 -0
- package/dist/agents/codex-agents.js.map +1 -0
- package/dist/agents/preamble.d.ts +14 -0
- package/dist/agents/preamble.d.ts.map +1 -0
- package/dist/agents/preamble.js +26 -0
- package/dist/agents/preamble.js.map +1 -0
- package/dist/hooks/autopilot/__tests__/cancel.test.js +14 -4
- package/dist/hooks/autopilot/__tests__/cancel.test.js.map +1 -1
- package/dist/hooks/autopilot/__tests__/state.test.js +1 -0
- package/dist/hooks/autopilot/__tests__/state.test.js.map +1 -1
- package/dist/hooks/autopilot/__tests__/summary.test.js +38 -3
- package/dist/hooks/autopilot/__tests__/summary.test.js.map +1 -1
- package/dist/hooks/autopilot/state.d.ts +1 -1
- package/dist/hooks/autopilot/state.d.ts.map +1 -1
- package/dist/hooks/autopilot/state.js +15 -8
- package/dist/hooks/autopilot/state.js.map +1 -1
- package/dist/hooks/index.d.ts +2 -0
- package/dist/hooks/index.d.ts.map +1 -1
- package/dist/hooks/index.js +7 -0
- package/dist/hooks/index.js.map +1 -1
- package/dist/hooks/mode-registry/index.d.ts +135 -0
- package/dist/hooks/mode-registry/index.d.ts.map +1 -0
- package/dist/hooks/mode-registry/index.js +445 -0
- package/dist/hooks/mode-registry/index.js.map +1 -0
- package/dist/hooks/mode-registry/types.d.ts +31 -0
- package/dist/hooks/mode-registry/types.d.ts.map +1 -0
- package/dist/hooks/mode-registry/types.js +7 -0
- package/dist/hooks/mode-registry/types.js.map +1 -0
- package/dist/hooks/ralph/loop.js +6 -6
- package/dist/hooks/ralph/loop.js.map +1 -1
- package/dist/hooks/swarm/__tests__/claiming.test.d.ts +2 -0
- package/dist/hooks/swarm/__tests__/claiming.test.d.ts.map +1 -0
- package/dist/hooks/swarm/__tests__/claiming.test.js +170 -0
- package/dist/hooks/swarm/__tests__/claiming.test.js.map +1 -0
- package/dist/hooks/swarm/__tests__/index.test.d.ts +2 -0
- package/dist/hooks/swarm/__tests__/index.test.d.ts.map +1 -0
- package/dist/hooks/swarm/__tests__/index.test.js +157 -0
- package/dist/hooks/swarm/__tests__/index.test.js.map +1 -0
- package/dist/hooks/swarm/__tests__/mode-registry.test.d.ts +2 -0
- package/dist/hooks/swarm/__tests__/mode-registry.test.d.ts.map +1 -0
- package/dist/hooks/swarm/__tests__/mode-registry.test.js +177 -0
- package/dist/hooks/swarm/__tests__/mode-registry.test.js.map +1 -0
- package/dist/hooks/swarm/claiming.d.ts +101 -0
- package/dist/hooks/swarm/claiming.d.ts.map +1 -0
- package/dist/hooks/swarm/claiming.js +460 -0
- package/dist/hooks/swarm/claiming.js.map +1 -0
- package/dist/hooks/swarm/index.d.ts +221 -0
- package/dist/hooks/swarm/index.d.ts.map +1 -0
- package/dist/hooks/swarm/index.js +413 -0
- package/dist/hooks/swarm/index.js.map +1 -0
- package/dist/hooks/swarm/state.d.ts +94 -0
- package/dist/hooks/swarm/state.d.ts.map +1 -0
- package/dist/hooks/swarm/state.js +530 -0
- package/dist/hooks/swarm/state.js.map +1 -0
- package/dist/hooks/swarm/types.d.ts +116 -0
- package/dist/hooks/swarm/types.d.ts.map +1 -0
- package/dist/hooks/swarm/types.js +22 -0
- package/dist/hooks/swarm/types.js.map +1 -0
- package/dist/hooks/ultrapilot/decomposer.d.ts +141 -0
- package/dist/hooks/ultrapilot/decomposer.d.ts.map +1 -0
- package/dist/hooks/ultrapilot/decomposer.js +377 -0
- package/dist/hooks/ultrapilot/decomposer.js.map +1 -0
- package/dist/hooks/ultrapilot/index.d.ts +31 -0
- package/dist/hooks/ultrapilot/index.d.ts.map +1 -1
- package/dist/hooks/ultrapilot/index.js +43 -2
- package/dist/hooks/ultrapilot/index.js.map +1 -1
- package/dist/hooks/ultrapilot/state.d.ts +1 -1
- package/dist/hooks/ultrapilot/state.d.ts.map +1 -1
- package/dist/hooks/ultrapilot/state.js +7 -0
- package/dist/hooks/ultrapilot/state.js.map +1 -1
- package/dist/hooks/ultraqa/index.js +5 -5
- package/dist/hooks/ultraqa/index.js.map +1 -1
- package/dist/hooks/ultrawork/index.js +3 -3
- package/dist/hooks/ultrawork/index.js.map +1 -1
- package/package.json +3 -1
- package/skills/autopilot/SKILL.md +18 -0
- package/skills/cancel/SKILL.md +166 -141
- package/skills/ecomode/SKILL.md +14 -0
- package/skills/pipeline/SKILL.md +13 -0
- package/skills/ralph/SKILL.md +22 -1
- package/skills/swarm/SKILL.md +521 -197
- package/skills/ultrapilot/SKILL.md +82 -13
- package/skills/ultraqa/SKILL.md +13 -0
- package/skills/ultrawork/SKILL.md +14 -0
package/skills/swarm/SKILL.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: swarm
|
|
3
|
-
description: N coordinated agents on shared task list with atomic claiming
|
|
3
|
+
description: N coordinated agents on shared task list with SQLite-based atomic claiming
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# Swarm Skill
|
|
7
7
|
|
|
8
|
-
Spawn N coordinated agents working on a shared task list with atomic claiming. Like a dev team tackling multiple files in parallel.
|
|
8
|
+
Spawn N coordinated agents working on a shared task list with SQLite-based atomic claiming. Like a dev team tackling multiple files in parallel, with lease-based recovery when an agent stops responding.
|
|
9
9
|
|
|
10
10
|
## Usage
|
|
11
11
|
|
|
@@ -44,14 +44,29 @@ User: "/swarm 5:executor fix all TypeScript errors"
|
|
|
44
44
|
+--+--+--+--+
|
|
45
45
|
|
|
|
46
46
|
v
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
47
|
+
[SQLITE DATABASE]
|
|
48
|
+
┌─────────────────────┐
|
|
49
|
+
│ tasks table │
|
|
50
|
+
├─────────────────────┤
|
|
51
|
+
│ id, description │
|
|
52
|
+
│ status (pending, │
|
|
53
|
+
│ claimed, done, │
|
|
54
|
+
│ failed) │
|
|
55
|
+
│ claimed_by, claimed_at
|
|
56
|
+
│ completed_at, result│
|
|
57
|
+
│ error │
|
|
58
|
+
├─────────────────────┤
|
|
59
|
+
│ heartbeats table │
|
|
60
|
+
│ (agent monitoring) │
|
|
61
|
+
└─────────────────────┘
|
|
53
62
|
```
|
|
54
63
|
|
|
64
|
+
**Key Features:**
|
|
65
|
+
- SQLite transactions ensure only one agent can claim a task
|
|
66
|
+
- Lease-based ownership with automatic timeout and recovery
|
|
67
|
+
- Heartbeat monitoring for detecting dead agents
|
|
68
|
+
- Full ACID compliance for task state
|
|
69
|
+
|
|
55
70
|
## Workflow
|
|
56
71
|
|
|
57
72
|
### 1. Parse Input
|
|
@@ -60,243 +75,518 @@ User: "/swarm 5:executor fix all TypeScript errors"
|
|
|
60
75
|
- Extract task description
|
|
61
76
|
- Validate N <= 5
|
|
62
77
|
|
|
63
|
-
### 2. Create Task
|
|
78
|
+
### 2. Create Task Pool
|
|
64
79
|
- Analyze codebase based on task
|
|
65
80
|
- Break into file-specific subtasks
|
|
66
|
-
- Initialize
|
|
67
|
-
- Each task gets: id,
|
|
81
|
+
- Initialize SQLite database with task pool
|
|
82
|
+
- Each task gets: id, description, status (pending), and metadata columns
|
|
68
83
|
|
|
69
84
|
### 3. Spawn Agents
|
|
70
85
|
- Launch N agents via Task tool
|
|
71
86
|
- Set `run_in_background: true` for all
|
|
72
|
-
- Each agent
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
87
|
+
- Each agent connects to the SQLite database
|
|
88
|
+
- Agents enter claiming loop automatically
|
|
89
|
+
|
|
90
|
+
### 3.1. Agent Preamble (IMPORTANT)
|
|
91
|
+
|
|
92
|
+
When spawning swarm agents, ALWAYS wrap the task with the worker preamble to prevent recursive sub-agent spawning:
|
|
93
|
+
|
|
94
|
+
```typescript
|
|
95
|
+
import { wrapWithPreamble } from '../agents/preamble.js';
|
|
96
|
+
|
|
97
|
+
// When spawning each agent:
|
|
98
|
+
const agentPrompt = wrapWithPreamble(`
|
|
99
|
+
Connect to swarm at ${cwd}/.omc/state/swarm.db
|
|
100
|
+
Claim tasks with claimTask('agent-${n}')
|
|
101
|
+
Complete work with completeTask() or failTask()
|
|
102
|
+
Send heartbeat every 60 seconds
|
|
103
|
+
Exit when hasPendingWork() returns false
|
|
104
|
+
`);
|
|
105
|
+
|
|
106
|
+
Task({
|
|
107
|
+
subagent_type: 'oh-my-claudecode:executor',
|
|
108
|
+
prompt: agentPrompt,
|
|
109
|
+
run_in_background: true
|
|
110
|
+
});
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
The worker preamble ensures agents:
|
|
114
|
+
- Execute tasks directly using tools (Read, Write, Edit, Bash)
|
|
115
|
+
- Do NOT spawn sub-agents (prevents recursive agent storms)
|
|
116
|
+
- Report results with absolute file paths
|
|
76
117
|
|
|
77
|
-
### 4. Task Claiming Protocol
|
|
118
|
+
### 4. Task Claiming Protocol (SQLite Transactional)
|
|
78
119
|
Each agent follows this loop:
|
|
79
120
|
|
|
80
121
|
```
|
|
81
122
|
LOOP:
|
|
82
|
-
1.
|
|
83
|
-
2.
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
-
|
|
95
|
-
-
|
|
96
|
-
|
|
97
|
-
|
|
123
|
+
1. Call claimTask(agentId)
|
|
124
|
+
2. SQLite transaction:
|
|
125
|
+
- Find first pending task
|
|
126
|
+
- UPDATE status='claimed', claimed_by=agentId, claimed_at=now
|
|
127
|
+
- INSERT/UPDATE heartbeat record
|
|
128
|
+
- Atomically commit (only one agent succeeds)
|
|
129
|
+
3. Execute task
|
|
130
|
+
4. Call completeTask(agentId, taskId, result) or failTask()
|
|
131
|
+
5. GOTO LOOP (until hasPendingWork() returns false)
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
**Atomic Claiming Details:**
|
|
135
|
+
- SQLite `IMMEDIATE` transaction prevents race conditions
|
|
136
|
+
- Only agent updating the row successfully gets the task
|
|
137
|
+
- Heartbeat automatically updated on claim
|
|
138
|
+
- If claim fails (already claimed), agent retries with next task
|
|
139
|
+
- Lease Timeout: 5 minutes per task
|
|
140
|
+
- If timeout exceeded + no heartbeat, cleanupStaleClaims releases task back to pending
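
The claiming rules above can be modelled without a database. The sketch below is a hypothetical in-memory stand-in (the names `TaskRow` and `tryClaim` are illustrative and not part of the package). It shows why the `status = 'pending'` guard lets only one agent win a given task: the second guarded update changes zero rows.

```typescript
// Hypothetical in-memory model of the guarded UPDATE; not the package's code.
type TaskStatus = "pending" | "claimed" | "done" | "failed";

interface TaskRow {
  id: string;
  status: TaskStatus;
  claimedBy: string | null;
  claimedAt: number | null;
}

// Models: UPDATE tasks SET status = 'claimed', ... WHERE id = ? AND status = 'pending'
// Returns the number of rows changed (0 or 1), like the change count of an UPDATE.
function tryClaim(rows: TaskRow[], taskId: string, agentId: string, now: number): number {
  const row = rows.find((r) => r.id === taskId && r.status === "pending");
  if (row === undefined) return 0; // guard failed: another agent already owns the task
  row.status = "claimed";
  row.claimedBy = agentId;
  row.claimedAt = now;
  return 1;
}

const task1: TaskRow = { id: "task-1", status: "pending", claimedBy: null, claimedAt: null };
const rows: TaskRow[] = [task1];

// Both agents saw task-1 as pending, but only the first guarded update lands.
const changesForAgent1 = tryClaim(rows, "task-1", "agent-1", 1000);
const changesForAgent2 = tryClaim(rows, "task-1", "agent-2", 1001);
console.log(changesForAgent1, changesForAgent2, task1.claimedBy); // prints: 1 0 agent-1
```

In the real swarm the same guard runs inside a SQLite transaction, so the check and the write cannot be interleaved with another agent's claim.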
|
|
141
|
+
|
|
142
|
+
### 5. Heartbeat Protocol
|
|
143
|
+
- Agents call `heartbeat(agentId)` every 60 seconds (or custom interval)
|
|
144
|
+
- Heartbeat records: agent_id, last_heartbeat timestamp, current_task_id
|
|
145
|
+
- Orchestrator runs cleanupStaleClaims every 60 seconds
|
|
146
|
+
- If heartbeat is stale (>5 minutes old) and task claimed, task auto-releases
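
One way to keep a lease alive during a long task is to run the heartbeat on a timer for the duration of the work. The helper below is a sketch under that assumption: `withHeartbeat` is a hypothetical name, and `sendHeartbeat` stands in for the swarm's `heartbeat(agentId)` call, injected so the example runs on its own.

```typescript
// Hypothetical helper: keeps a lease alive while a long task runs.
async function withHeartbeat<T>(
  agentId: string,
  intervalMs: number,
  sendHeartbeat: (agentId: string) => void,
  work: () => Promise<T>,
): Promise<T> {
  sendHeartbeat(agentId); // announce liveness before starting
  const timer = setInterval(() => sendHeartbeat(agentId), intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // stop beating whether the work succeeded or threw
  }
}

// Demo with a shortened interval (10 ms instead of 60 s) and a fake 55 ms task.
const beats: string[] = [];
const demo = withHeartbeat(
  "agent-1",
  10,
  (id) => beats.push(id),
  () => new Promise<string>((resolve) => setTimeout(() => resolve("done"), 55)),
);
demo.then((result) => console.log(result, beats.length >= 2));
```

With the documented defaults, `intervalMs` would be `60 * 1000`, well inside the 5-minute lease.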
|
|
147
|
+
|
|
148
|
+
### 6. Progress Tracking
|
|
98
149
|
- Orchestrator monitors via TaskOutput
|
|
99
|
-
- Shows live progress: claimed/done/
|
|
100
|
-
-
|
|
150
|
+
- Shows live progress: pending/claimed/done/failed counts
|
|
151
|
+
- Active agent count via getActiveAgents()
|
|
152
|
+
- Reports which agent is working on which task via getAgentTasks()
|
|
101
153
|
- Detects idle agents (all tasks claimed by others)
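
The progress line described above amounts to counting tasks per status. A minimal sketch, assuming the orchestrator already has the list of task statuses in hand; `StatusCounts`, `countByStatus`, and `formatProgress` are illustrative names, not the package's API.

```typescript
type TaskStatus = "pending" | "claimed" | "done" | "failed";

interface StatusCounts {
  pending: number;
  claimed: number;
  done: number;
  failed: number;
}

// Tally how many tasks are in each state.
function countByStatus(statuses: TaskStatus[]): StatusCounts {
  const counts: StatusCounts = { pending: 0, claimed: 0, done: 0, failed: 0 };
  for (const status of statuses) {
    counts[status] += 1;
  }
  return counts;
}

// A task is finished once it is either done or failed.
function formatProgress(counts: StatusCounts): string {
  const total = counts.pending + counts.claimed + counts.done + counts.failed;
  const finished = counts.done + counts.failed;
  return `${finished}/${total} finished (${counts.pending} pending, ${counts.claimed} claimed, ${counts.done} done, ${counts.failed} failed)`;
}

const progress = formatProgress(
  countByStatus(["done", "done", "claimed", "pending", "failed"]),
);
console.log(progress); // prints: 3/5 finished (1 pending, 1 claimed, 2 done, 1 failed)
```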
|
|
102
154
|
|
|
103
|
-
###
|
|
155
|
+
### 7. Completion
|
|
104
156
|
Exit when ANY of:
|
|
105
|
-
-
|
|
106
|
-
- All agents idle (no pending tasks)
|
|
107
|
-
- User cancels via `/cancel
|
|
108
|
-
|
|
109
|
-
##
|
|
110
|
-
|
|
111
|
-
### `.omc/swarm
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
157
|
+
- isSwarmComplete() returns true (all tasks done or failed)
|
|
158
|
+
- All agents idle (no pending tasks, no claimed tasks)
|
|
159
|
+
- User cancels via `/oh-my-claudecode:cancel`
|
|
160
|
+
|
|
161
|
+
## Storage
|
|
162
|
+
|
|
163
|
+
### SQLite Database (`.omc/state/swarm.db`)
|
|
164
|
+
|
|
165
|
+
The swarm uses a single SQLite database stored at `.omc/state/swarm.db`. This provides:
|
|
166
|
+
- **ACID compliance** - All task state transitions are atomic
|
|
167
|
+
- **Concurrent access** - Multiple agents query/update safely
|
|
168
|
+
- **Persistence** - State survives agent crashes
|
|
169
|
+
- **Query efficiency** - Fast status lookups and filtering
|
|
170
|
+
|
|
171
|
+
#### `tasks` Table Schema
|
|
172
|
+
```sql
|
|
173
|
+
CREATE TABLE tasks (
|
|
174
|
+
id TEXT PRIMARY KEY,
|
|
175
|
+
description TEXT NOT NULL,
|
|
176
|
+
status TEXT NOT NULL DEFAULT 'pending',
|
|
177
|
+
-- pending: waiting to be claimed
|
|
178
|
+
-- claimed: claimed by an agent, in progress
|
|
179
|
+
-- done: completed successfully
|
|
180
|
+
-- failed: completed with error
|
|
181
|
+
claimed_by TEXT, -- agent ID that claimed this task
|
|
182
|
+
claimed_at INTEGER, -- Unix timestamp in milliseconds (Date.now()) when claimed
|
|
183
|
+
completed_at INTEGER, -- Unix timestamp when completed
|
|
184
|
+
result TEXT, -- Optional result/output from task
|
|
185
|
+
error TEXT -- Error message if task failed
|
|
186
|
+
);
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
#### `heartbeats` Table Schema
|
|
190
|
+
```sql
|
|
191
|
+
CREATE TABLE heartbeats (
|
|
192
|
+
agent_id TEXT PRIMARY KEY,
|
|
193
|
+
last_heartbeat INTEGER NOT NULL, -- Unix timestamp in milliseconds (Date.now()) of last heartbeat
|
|
194
|
+
current_task_id TEXT -- Task agent is currently working on
|
|
195
|
+
);
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
#### `session` Table Schema
|
|
199
|
+
```sql
|
|
200
|
+
CREATE TABLE session (
|
|
201
|
+
id TEXT PRIMARY KEY,
|
|
202
|
+
agent_count INTEGER NOT NULL,
|
|
203
|
+
started_at INTEGER NOT NULL,
|
|
204
|
+
completed_at INTEGER,
|
|
205
|
+
active INTEGER DEFAULT 1
|
|
206
|
+
);
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
## Task Claiming Protocol (Detailed)
|
|
210
|
+
|
|
211
|
+
### Atomic Claim Operation with SQLite
|
|
212
|
+
|
|
213
|
+
The core strength of the new implementation is transactional atomicity:
|
|
214
|
+
|
|
215
|
+
```typescript
|
|
216
|
+
function claimTask(agentId: string): ClaimResult {
|
|
217
|
+
// Transaction ensures only ONE agent succeeds
|
|
218
|
+
const claimTransaction = db.transaction(() => {
|
|
219
|
+
// Step 1: Find first pending task
|
|
220
|
+
const task = db.prepare(
|
|
221
|
+
"SELECT id, description FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
|
|
222
|
+
).get();
|
|
223
|
+
|
|
224
|
+
if (!task) {
|
|
225
|
+
return { success: false, reason: 'No pending tasks' };
|
|
226
|
+
}
|
|
227
|
+
|
|
228
|
+
// Step 2: Attempt claim (will only succeed if status is still 'pending')
|
|
229
|
+
const result = db.prepare(
|
|
230
|
+
"UPDATE tasks SET status = 'claimed', claimed_by = ?, claimed_at = ? WHERE id = ? AND status = 'pending'"
|
|
231
|
+
).run(agentId, Date.now(), task.id);
|
|
232
|
+
|
|
233
|
+
if (result.changes === 0) {
|
|
234
|
+
// Another agent claimed it between SELECT and UPDATE - try next
|
|
235
|
+
return { success: false, reason: 'Task was claimed by another agent' };
|
|
236
|
+
}
|
|
237
|
+
|
|
238
|
+
// Step 3: Update heartbeat to show we're alive and working
|
|
239
|
+
db.prepare(
|
|
240
|
+
'INSERT OR REPLACE INTO heartbeats (agent_id, last_heartbeat, current_task_id) VALUES (?, ?, ?)'
|
|
241
|
+
).run(agentId, Date.now(), task.id);
|
|
242
|
+
|
|
243
|
+
return { success: true, taskId: task.id, description: task.description };
|
|
244
|
+
}).immediate; // Select the IMMEDIATE variant (not called here) so the RESERVED lock is taken as soon as the transaction begins
|
|
245
|
+
|
|
246
|
+
return claimTransaction(); // Atomic execution
|
|
127
247
|
}
|
|
128
248
|
```
|
|
129
249
|
|
(old lines 130-161 removed; their text is not preserved in this diff view)
|
250
|
+
**Why SQLite Transactions Work:**
|
|
251
|
+
- Transactions are called with `.immediate()` to acquire RESERVED lock
|
|
252
|
+
- Prevents other agents from modifying rows between SELECT and UPDATE
|
|
253
|
+
- All-or-nothing atomicity: claim succeeds completely or fails completely
|
|
254
|
+
- No race conditions, no lost updates
|
|
255
|
+
|
|
256
|
+
### Lease Timeout & Auto-Release
|
|
257
|
+
|
|
258
|
+
Tasks are automatically released if claimed too long without heartbeat:
|
|
259
|
+
|
|
260
|
+
```typescript
|
|
261
|
+
function cleanupStaleClaims(leaseTimeout: number = 5 * 60 * 1000) {
|
|
262
|
+
// Default 5-minute timeout
|
|
263
|
+
const cutoffTime = Date.now() - leaseTimeout;
|
|
264
|
+
|
|
265
|
+
const cleanupTransaction = db.transaction(() => {
|
|
266
|
+
// Find claimed tasks where:
|
|
267
|
+
// 1. Claimed longer than timeout, AND
|
|
268
|
+
// 2. Agent hasn't sent heartbeat in that time
|
|
269
|
+
const staleTasks = db.prepare(`
|
|
270
|
+
SELECT t.id
|
|
271
|
+
FROM tasks t
|
|
272
|
+
LEFT JOIN heartbeats h ON t.claimed_by = h.agent_id
|
|
273
|
+
WHERE t.status = 'claimed'
|
|
274
|
+
AND t.claimed_at < ?
|
|
275
|
+
AND (h.last_heartbeat IS NULL OR h.last_heartbeat < ?)
|
|
276
|
+
`).all(cutoffTime, cutoffTime);
|
|
277
|
+
|
|
278
|
+
// Release each stale task back to pending
|
|
279
|
+
for (const staleTask of staleTasks) {
|
|
280
|
+
db.prepare("UPDATE tasks SET status = 'pending', claimed_by = NULL, claimed_at = NULL WHERE id = ?")
|
|
281
|
+
.run(staleTask.id);
|
|
162
282
|
}
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
"done": 2
|
|
169
|
-
}
|
|
283
|
+
|
|
284
|
+
return staleTasks.length;
|
|
285
|
+
}).immediate; // Select the IMMEDIATE variant (not called here) so the RESERVED lock is taken as soon as the transaction begins
|
|
286
|
+
|
|
287
|
+
return cleanupTransaction();
|
|
170
288
|
}
|
|
171
289
|
```
|
|
172
290
|
|
|
173
|
-
|
|
174
|
-
|
|
291
|
+
**How Recovery Works:**
|
|
292
|
+
1. Orchestrator calls cleanupStaleClaims() every 60 seconds
|
|
293
|
+
2. If agent hasn't sent heartbeat in 5 minutes, task is auto-released
|
|
294
|
+
3. Another agent picks up the orphaned task
|
|
295
|
+
4. Original agent can continue working (it doesn't know it was released)
|
|
296
|
+
5. When original agent tries to mark task as done, verification fails safely
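
Step 5 depends on completion being conditional on ownership. The sketch below models that check in memory, assuming the real update is guarded by the claiming agent's ID; `ClaimedTask`, `reassign`, and `completeIfOwner` are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory model of an ownership-checked completion; not the package's code.
interface ClaimedTask {
  id: string;
  status: "pending" | "claimed" | "done" | "failed";
  claimedBy: string | null;
}

// Stands in for cleanup releasing a stale claim and another agent claiming the task.
function reassign(task: ClaimedTask, newOwner: string): void {
  task.status = "claimed";
  task.claimedBy = newOwner;
}

// Models an UPDATE guarded by "claimed_by = ? AND status = 'claimed'".
function completeIfOwner(task: ClaimedTask, agentId: string): boolean {
  if (task.status !== "claimed" || task.claimedBy !== agentId) {
    return false; // lease was lost; leave the row untouched
  }
  task.status = "done";
  return true;
}

const task: ClaimedTask = { id: "task-1", status: "claimed", claimedBy: "agent-1" };
reassign(task, "agent-2"); // agent-1 went quiet, so agent-2 now owns the task

const lateResult = completeIfOwner(task, "agent-1"); // original owner, too late
const ownerResult = completeIfOwner(task, "agent-2"); // current owner
console.log(lateResult, ownerResult, task.status); // prints: false true done
```

The late completion is rejected without touching the row, so the current owner's result is the one that is recorded.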
|
|
175
297
|
|
(old lines 176-189 removed; their text is not preserved in this diff view)
|
298
|
+
## API Reference
|
|
299
|
+
|
|
300
|
+
Agents interact with the swarm via a TypeScript API:
|
|
301
|
+
|
|
302
|
+
### Initialization
|
|
303
|
+
|
|
304
|
+
```typescript
|
|
305
|
+
import { startSwarm, connectToSwarm } from './swarm';
|
|
306
|
+
|
|
307
|
+
// Orchestrator starts the swarm
|
|
308
|
+
await startSwarm({
|
|
309
|
+
agentCount: 5,
|
|
310
|
+
tasks: ['fix a.ts', 'fix b.ts', ...],
|
|
311
|
+
leaseTimeout: 5 * 60 * 1000, // 5 minutes (default)
|
|
312
|
+
heartbeatInterval: 60 * 1000 // 60 seconds (default)
|
|
313
|
+
});
|
|
314
|
+
|
|
315
|
+
// Agents join existing swarm
|
|
316
|
+
await connectToSwarm(process.cwd());
|
|
317
|
+
```
|
|
318
|
+
|
|
319
|
+
### Agent Loop Pattern
|
|
320
|
+
|
|
321
|
+
```typescript
|
|
322
|
+
import {
|
|
323
|
+
claimTask,
|
|
324
|
+
completeTask,
|
|
325
|
+
failTask,
|
|
326
|
+
heartbeat,
|
|
327
|
+
hasPendingWork,
|
|
328
|
+
disconnectFromSwarm
|
|
329
|
+
} from './swarm';
|
|
330
|
+
|
|
331
|
+
const agentId = 'agent-1';
|
|
332
|
+
|
|
333
|
+
// Main work loop
|
|
334
|
+
while (hasPendingWork()) {
|
|
335
|
+
// Claim next task
|
|
336
|
+
const claim = claimTask(agentId);
|
|
337
|
+
|
|
338
|
+
if (!claim.success) {
|
|
339
|
+
console.log('No tasks available:', claim.reason);
|
|
340
|
+
break;
|
|
341
|
+
}
|
|
342
|
+
|
|
343
|
+
const { taskId, description } = claim;
|
|
344
|
+
console.log(`Agent ${agentId} working on: ${description}`);
|
|
345
|
+
|
|
346
|
+
try {
|
|
347
|
+
// Do the work...
|
|
348
|
+
const result = await executeTask(description);
|
|
349
|
+
|
|
350
|
+
// Mark complete
|
|
351
|
+
completeTask(agentId, taskId, result);
|
|
352
|
+
console.log(`Agent ${agentId} completed task ${taskId}`);
|
|
353
|
+
|
|
354
|
+
} catch (error) {
|
|
355
|
+
// Mark failed
|
|
356
|
+
failTask(agentId, taskId, error instanceof Error ? error.message : String(error));
|
|
357
|
+
console.error(`Agent ${agentId} failed on ${taskId}:`, error);
|
|
190
358
|
}
|
|
359
|
+
|
|
360
|
+
// Refresh the heartbeat between tasks; a task that runs longer than 60 seconds should also call heartbeat() while it works
|
|
361
|
+
heartbeat(agentId);
|
|
191
362
|
}
|
|
363
|
+
|
|
364
|
+
// Cleanup
|
|
365
|
+
disconnectFromSwarm();
|
|
192
366
|
```
|
|
193
367
|
|
|
194
|
-
|
|
368
|
+
### Core API Functions
|
|
195
369
|
|
|
196
|
-
|
|
370
|
+
#### `startSwarm(config: SwarmConfig): Promise<boolean>`
|
|
371
|
+
Initialize the swarm with task pool and start cleanup timer.
|
|
197
372
|
|
|
198
|
-
```
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
373
|
+
```typescript
|
|
374
|
+
const success = await startSwarm({
|
|
375
|
+
agentCount: 5,
|
|
376
|
+
tasks: ['task 1', 'task 2', 'task 3'],
|
|
377
|
+
leaseTimeout: 5 * 60 * 1000,
|
|
378
|
+
heartbeatInterval: 60 * 1000
|
|
379
|
+
});
|
|
380
|
+
```
|
|
202
381
|
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
// Attempt atomic claim
|
|
206
|
-
const now = new Date().toISOString();
|
|
207
|
-
const timeout = addMinutes(now, 5).toISOString();
|
|
382
|
+
#### `stopSwarm(deleteDatabase?: boolean): boolean`
|
|
383
|
+
Stop the swarm and optionally delete the database.
|
|
208
384
|
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
task.timeout_at = timeout;
|
|
385
|
+
```typescript
|
|
386
|
+
stopSwarm(true); // Delete database on cleanup
|
|
387
|
+
```
|
|
213
388
|
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
}
|
|
217
|
-
}
|
|
389
|
+
#### `claimTask(agentId: string): ClaimResult`
|
|
390
|
+
Claim the next pending task. Returns `{ success, taskId, description, reason }`.
|
|
218
391
|
|
|
219
|
-
|
|
392
|
+
```typescript
|
|
393
|
+
const claim = claimTask('agent-1');
|
|
394
|
+
if (claim.success) {
|
|
395
|
+
console.log(`Claimed: ${claim.description}`);
|
|
220
396
|
}
|
|
221
397
|
```
|
|
222
398
|
|
|
223
|
-
|
|
399
|
+
#### `completeTask(agentId: string, taskId: string, result?: string): boolean`
|
|
400
|
+
Mark a task as done. Only succeeds if agent still owns the task.
|
|
224
401
|
|
|
225
|
-
|
|
402
|
+
```typescript
|
|
403
|
+
completeTask('agent-1', 'task-1', 'Fixed the bug');
|
|
404
|
+
```
|
|
226
405
|
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
const tasks = readJSON('.omc/state/swarm-tasks.json');
|
|
230
|
-
const now = new Date();
|
|
406
|
+
#### `failTask(agentId: string, taskId: string, error: string): boolean`
|
|
407
|
+
Mark a task as failed with error details.
|
|
231
408
|
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
409
|
+
```typescript
|
|
410
|
+
failTask('agent-1', 'task-1', 'Could not compile: missing dependency');
|
|
411
|
+
```
|
|
412
|
+
|
|
413
|
+
#### `heartbeat(agentId: string): boolean`
|
|
414
|
+
Send a heartbeat to indicate agent is alive. Call every 60 seconds during long-running tasks.
|
|
415
|
+
|
|
416
|
+
```typescript
|
|
417
|
+
heartbeat('agent-1');
|
|
418
|
+
```
|
|
419
|
+
|
|
420
|
+
#### `cleanupStaleClaims(leaseTimeout?: number): number`
|
|
421
|
+
Manually trigger cleanup of expired claims. Called automatically every 60 seconds.
|
|
422
|
+
|
|
423
|
+
```typescript
|
|
424
|
+
const released = cleanupStaleClaims(5 * 60 * 1000);
|
|
425
|
+
console.log(`Released ${released} stale tasks`);
|
|
426
|
+
```
|
|
241
427
|
|
|
242
|
-
|
|
428
|
+
#### `hasPendingWork(): boolean`
|
|
429
|
+
Check if there are unclaimed tasks available.
|
|
430
|
+
|
|
431
|
+
```typescript
|
|
432
|
+
if (!hasPendingWork()) {
|
|
433
|
+
console.log('All tasks claimed or completed');
|
|
243
434
|
}
|
|
244
435
|
```
|
|
245
436
|
|
|
246
|
-
|
|
437
|
+
#### `isSwarmComplete(): boolean`
|
|
438
|
+
Check if all tasks are done or failed.
|
|
439
|
+
|
|
440
|
+
```typescript
|
|
441
|
+
if (isSwarmComplete()) {
|
|
442
|
+
console.log('Swarm finished!');
|
|
443
|
+
}
|
|
444
|
+
```
|
|
247
445
|
|
|
248
|
-
|
|
446
|
+
#### `getSwarmStats(): SwarmStats | null`
|
|
447
|
+
Get task counts and timing info.
|
|
249
448
|
|
|
250
|
-
```
|
|
251
|
-
|
|
449
|
+
```typescript
|
|
450
|
+
const stats = getSwarmStats();
|
|
451
|
+
if (stats) console.log(`${stats.doneTasks}/${stats.totalTasks} done`); // getSwarmStats() may return null
|
|
452
|
+
```
|
|
252
453
|
|
|
253
|
-
|
|
454
|
+
#### `getActiveAgents(): number`
|
|
455
|
+
Get count of agents with recent heartbeats.
|
|
254
456
|
|
|
255
|
-
|
|
457
|
+
```typescript
|
|
458
|
+
const active = getActiveAgents();
|
|
459
|
+
console.log(`${active} agents currently active`);
|
|
460
|
+
```
|
|
256
461
|
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
2. Find first task with status="pending"
|
|
260
|
-
3. Claim it atomically (set status="claimed", owner="{id}", timestamp)
|
|
261
|
-
4. Execute the task
|
|
262
|
-
5. Mark status="done", set completed_at
|
|
263
|
-
6. Repeat until no pending tasks
|
|
462
|
+
#### `getAllTasks(): SwarmTask[]`
|
|
463
|
+
Get all tasks with current status.
|
|
264
464
|
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
- Write file back
|
|
270
|
-
- If file changed between read/write, retry
|
|
465
|
+
```typescript
|
|
466
|
+
const tasks = getAllTasks();
|
|
467
|
+
const pending = tasks.filter(t => t.status === 'pending');
|
|
468
|
+
```
|
|
271
469
|
|
|
272
|
-
|
|
273
|
-
|
|
470
|
+
#### `getTasksWithStatus(status: string): SwarmTask[]`
|
|
471
|
+
Filter tasks by status: 'pending', 'claimed', 'done', 'failed'.
|
|
274
472
|
|
|
275
|
-
|
|
276
|
-
|
|
473
|
+
```typescript
|
|
474
|
+
const failed = getTasksWithStatus('failed');
|
|
277
475
|
```
|
|
278
476
|
|
|
279
|
-
|
|
477
|
+
#### `getAgentTasks(agentId: string): SwarmTask[]`
|
|
478
|
+
Get all tasks claimed by a specific agent.
|
|
280
479
|
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
- **Auto-Release:** Timed-out claims automatically released by orchestrator
|
|
480
|
+
```typescript
|
|
481
|
+
const myTasks = getAgentTasks('agent-1');
|
|
482
|
+
```
|
|
285
483
|
|
|
286
|
-
|
|
484
|
+
#### `retryTask(agentId: string, taskId: string): ClaimResult`
|
|
485
|
+
Attempt to reclaim a failed task.
|
|
486
|
+
|
|
487
|
+
```typescript
|
|
488
|
+
const retry = retryTask('agent-1', 'task-1');
|
|
489
|
+
if (retry.success) {
|
|
490
|
+
console.log('Task reclaimed, trying again...');
|
|
491
|
+
}
|
|
492
|
+
```
|
|
287
493
|
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
|
|
494
|
+
### Configuration (SwarmConfig)
|
|
495
|
+
|
|
496
|
+
```typescript
|
|
497
|
+
interface SwarmConfig {
|
|
498
|
+
agentCount: number; // Number of agents (1-5)
|
|
499
|
+
tasks: string[]; // Task descriptions
|
|
500
|
+
agentType?: string; // Agent type (default: 'executor')
|
|
501
|
+
leaseTimeout?: number; // Milliseconds (default: 5 min)
|
|
502
|
+
heartbeatInterval?: number; // Milliseconds (default: 60 sec)
|
|
503
|
+
cwd?: string; // Working directory
|
|
504
|
+
}
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
### Types
|
|
508
|
+
|
|
509
|
+
```typescript
|
|
510
|
+
interface SwarmTask {
|
|
511
|
+
id: string;
|
|
512
|
+
description: string;
|
|
513
|
+
status: 'pending' | 'claimed' | 'done' | 'failed';
|
|
514
|
+
claimedBy: string | null;
|
|
515
|
+
claimedAt: number | null;
|
|
516
|
+
completedAt: number | null;
|
|
517
|
+
error?: string;
|
|
518
|
+
result?: string;
|
|
519
|
+
}
|
|
520
|
+
|
|
521
|
+
interface ClaimResult {
|
|
522
|
+
success: boolean;
|
|
523
|
+
taskId: string | null;
|
|
524
|
+
description?: string;
|
|
525
|
+
reason?: string;
|
|
526
|
+
}
|
|
527
|
+
|
|
528
|
+
interface SwarmStats {
|
|
529
|
+
totalTasks: number;
|
|
530
|
+
pendingTasks: number;
|
|
531
|
+
claimedTasks: number;
|
|
532
|
+
doneTasks: number;
|
|
533
|
+
failedTasks: number;
|
|
534
|
+
activeAgents: number;
|
|
535
|
+
elapsedTime: number;
|
|
536
|
+
}
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
## Key Parameters
|
|
540
|
+
|
|
541
|
+
- **Max Agents:** 5 (enforced by Claude Code background task limit)
|
|
542
|
+
- **Lease Timeout:** 5 minutes (default, configurable)
|
|
543
|
+
- Tasks claimed longer than this without heartbeat are auto-released
|
|
544
|
+
- **Heartbeat Interval:** 60 seconds (recommended)
|
|
545
|
+
- Agents should call `heartbeat()` at least this often
|
|
546
|
+
- Prevents false timeout while working on long tasks
|
|
547
|
+
- **Cleanup Interval:** 60 seconds
|
|
548
|
+
- Orchestrator automatically runs `cleanupStaleClaims()` to release orphaned tasks
|
|
549
|
+
- **Database:** SQLite (stored at `.omc/state/swarm.db`)
|
|
550
|
+
- One database per swarm session
|
|
551
|
+
- Survives agent crashes
|
|
552
|
+
- Provides ACID guarantees
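
The three timing defaults interact. The arithmetic below is illustrative only and repeats the defaults listed above; the variable names are hypothetical. It estimates roughly how long an orphaned task can stay claimed after its agent's last heartbeat.

```typescript
// Illustrative arithmetic only; the constants repeat the documented defaults.
const leaseTimeoutMs = 5 * 60 * 1000;
const heartbeatIntervalMs = 60 * 1000;
const cleanupIntervalMs = 60 * 1000;

// Number of heartbeat intervals that fit inside one lease.
const heartbeatsPerLease = Math.floor(leaseTimeoutMs / heartbeatIntervalMs);

// Rough worst case between the last heartbeat and the task's release:
// the heartbeat must age past the lease, then wait for the next cleanup pass.
const worstCaseReleaseMs = leaseTimeoutMs + cleanupIntervalMs;

console.log(heartbeatsPerLease, worstCaseReleaseMs / 60000); // prints: 5 6
```

So with the defaults, a crashed agent's task returns to `pending` after about 5 to 6 minutes, and shortening `leaseTimeout` trades faster recovery for a higher chance of releasing a task that is merely slow.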
|
|
553
|
+
|
|
554
|
+
## Error Handling & Recovery
|
|
555
|
+
|
|
556
|
+
### Agent Crash
|
|
557
|
+
- Task is claimed but agent stops sending heartbeats
|
|
558
|
+
- After 5 minutes of no heartbeat, cleanupStaleClaims() releases the task
|
|
559
|
+
- Task returns to 'pending' status for another agent to claim
|
|
560
|
+
- Original agent's incomplete work is safely abandoned
|
|
561
|
+
|
|
562
|
+
### Task Completion Failure
|
|
563
|
+
- Agent calls `completeTask()` but is no longer the owner (was released)
|
|
564
|
+
- The update changes no rows (the ownership check in the WHERE clause no longer matches)
|
|
565
|
+
- Agent can detect this by checking return value
|
|
566
|
+
- Agent should log error and continue to next task
|
|
567
|
+
|
|
568
|
+
### Database Unavailable
|
|
569
|
+
- `startSwarm()` returns false if database initialization fails
|
|
570
|
+
- `claimTask()` returns `{ success: false, reason: 'Database not initialized' }`
|
|
571
|
+
- Check `isSwarmReady()` before proceeding
|
|
572
|
+
|
|
573
|
+
### All Agents Idle
|
|
574
|
+
- Orchestrator detects via `getActiveAgents() === 0` or `hasPendingWork() === false`
|
|
575
|
+
- Triggers final cleanup and marks swarm as complete
|
|
576
|
+
- Remaining failed tasks are preserved in database
|
|
577
|
+
|
|
578
|
+
### No Tasks Available
|
|
579
|
+
- `claimTask()` returns success=false with reason 'No pending tasks available'
|
|
580
|
+
- Agent should check `hasPendingWork()` before looping
|
|
581
|
+
- Safe for agent to exit cleanly when no work remains
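
A failed claim does not always mean the work is finished: the agent may simply have lost a race while other tasks are still pending. The loop below is a sketch of one way to handle both cases. The swarm calls are injected through a hypothetical `SwarmOps` interface so the example runs stand-alone, and `drainTasks` and `maxRetries` are illustrative names.

```typescript
// Hypothetical harness: the swarm calls are injected so the loop can run stand-alone.
interface Claim {
  success: boolean;
  taskId: string | null;
  reason?: string;
}

interface SwarmOps {
  hasPendingWork: () => boolean;
  claimTask: () => Claim;
  runTask: (taskId: string) => void;
}

// Returns the ids this agent processed. A failed claim ends the loop only when
// no pending work remains or more than maxRetries claims fail in a row.
function drainTasks(ops: SwarmOps, maxRetries: number): string[] {
  const processed: string[] = [];
  let consecutiveFailures = 0;
  while (ops.hasPendingWork() && consecutiveFailures <= maxRetries) {
    const claim = ops.claimTask();
    if (!claim.success || claim.taskId === null) {
      consecutiveFailures += 1; // lost a race; re-check hasPendingWork() and retry
      continue;
    }
    consecutiveFailures = 0;
    ops.runTask(claim.taskId);
    processed.push(claim.taskId);
  }
  return processed;
}

// Fake swarm: two queued tasks, and the first claim attempt loses a race.
const queue = ["task-1", "task-2"];
let simulateLostRace = true;
const ops: SwarmOps = {
  hasPendingWork: () => queue.length > 0,
  claimTask: () => {
    if (simulateLostRace) {
      simulateLostRace = false;
      return { success: false, taskId: null, reason: "Task was claimed by another agent" };
    }
    const next = queue.shift();
    return next === undefined
      ? { success: false, taskId: null, reason: "No pending tasks" }
      : { success: true, taskId: next };
  },
  runTask: () => {},
};

const processedIds = drainTasks(ops, 3);
console.log(processedIds.join(",")); // prints: task-1,task-2
```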
|
|
292
582
|
|
|
293
583
|
## Cancel Swarm
|
|
294
584
|
|
|
295
|
-
User can cancel via `/cancel
|
|
585
|
+
User can cancel via `/oh-my-claudecode:cancel`:
|
|
296
586
|
- Stops orchestrator monitoring
|
|
297
587
|
- Signals all background agents to exit
|
|
298
|
-
- Preserves partial progress in
|
|
299
|
-
- Marks session as "cancelled" in
|
|
588
|
+
- Preserves partial progress in SQLite database
|
|
589
|
+
- Marks session as "cancelled" in database
|
|
300
590
|
|
|
301
591
|
## Use Cases
|
|
302
592
|
|
|
@@ -324,26 +614,60 @@ Spawns 4 security reviewers, each auditing different endpoints.
|
|
|
324
614
|
```
|
|
325
615
|
Spawns 2 writers, each documenting different modules.
|
|
326
616
|
|
|
327
|
-
## Benefits
|
|
617
|
+
## Benefits of SQLite-Based Implementation
|
|
618
|
+
|
|
619
|
+
### Atomicity & Safety
|
|
620
|
+
- **Race-Condition Free:** SQLite transactions guarantee only one agent claims each task
|
|
621
|
+
- **No Lost Updates:** ACID compliance means state changes are durable
|
|
622
|
+
- **Orphan Prevention:** Expired claims are automatically released without manual intervention
|
|
623
|
+
|
|
624
|
+
### Performance
|
|
625
|
+
- **Fast Queries:** Indexed lookups on task status and agent ID
|
|
626
|
+
- **Concurrent Access:** Multiple agents share one database; SQLite serializes writes, so a competing claim waits briefly or fails instead of corrupting state
|
|
627
|
+
- **Minimal Lock Time:** Each claim transaction touches one task row and one heartbeat row, so the write lock is held only briefly
|
|
628
|
+
|
|
629
|
+
### Reliability
|
|
630
|
+
- **Crash Recovery:** Database survives agent failures
|
|
631
|
+
- **Automatic Cleanup:** Stale claims don't block progress
|
|
632
|
+
- **Lease-Based:** Time-based expiration prevents indefinite hangs
|
|
633
|
+
|
|
634
|
+
### Developer Experience
|
|
635
|
+
- **Simple API:** Just `claimTask()`, `completeTask()`, `heartbeat()`
|
|
636
|
+
- **Full Visibility:** Query any task or agent status at any time
|
|
637
|
+
- **Easy Debugging:** SQL queries show exact state without decoding JSON
|
|
638
|
+
|
|
639
|
+
### Scalability
|
|
640
|
+
- **10s to 1000s of Tasks:** SQLite handles easily
|
|
641
|
+
- **Full Task Retention:** Complete history in database for analysis
|
|
642
|
+
- **Extensible Schema:** Add custom columns for task metadata
|
|
328
643
|
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
|
|
333
|
-
|
|
644
|
+
## STATE CLEANUP ON COMPLETION
|
|
645
|
+
|
|
646
|
+
**IMPORTANT: Delete state files on completion - do NOT just set `active: false`**
|
|
647
|
+
|
|
648
|
+
When all tasks are done:
|
|
649
|
+
|
|
650
|
+
```bash
|
|
651
|
+
# Delete swarm state files
|
|
652
|
+
rm -f .omc/state/swarm-state.json
|
|
653
|
+
rm -f .omc/state/swarm-tasks.json
|
|
654
|
+
rm -f .omc/state/swarm-claims.json
|
|
655
|
+
```
|
|
334
656
|
|
|
335
657
|
## Implementation Notes
|
|
336
658
|
|
|
337
659
|
The orchestrator (main skill handler) is responsible for:
|
|
338
660
|
1. Initial task decomposition (via explore/architect)
|
|
339
|
-
2. Creating
|
|
661
|
+
2. Creating and initializing SQLite database via `startSwarm()`
|
|
340
662
|
3. Spawning N background agents
|
|
341
|
-
4. Monitoring progress via
|
|
342
|
-
5.
|
|
343
|
-
6. Detecting completion
|
|
344
|
-
7. Reporting final summary
|
|
663
|
+
4. Monitoring progress via `getSwarmStats()` and `getActiveAgents()`
|
|
664
|
+
5. Running `cleanupStaleClaims()` automatically (via setInterval)
|
|
665
|
+
6. Detecting completion via `isSwarmComplete()`
|
|
666
|
+
7. Reporting final summary from database query
|
|
345
667
|
|
|
346
668
|
Each agent is a standard Task invocation with:
|
|
347
669
|
- `run_in_background: true`
|
|
348
|
-
- Agent-specific prompt with
|
|
349
|
-
-
|
|
670
|
+
- Agent-specific prompt with work loop instructions
|
|
671
|
+
- API import: `import { claimTask, completeTask, ... } from './swarm'`
|
|
672
|
+
- Connection: `await connectToSwarm(cwd)` to join existing swarm
|
|
673
|
+
- Loop: repeatedly call `claimTask()` → do work → `completeTask()` or `failTask()`
|