agent-pool-mcp 1.6.0 → 1.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +82 -143
- package/package.json +3 -2
- package/src/scheduler/daemon.js +176 -26
- package/src/scheduler/pipeline.js +69 -79
- package/src/scheduler/run-signals.js +81 -0
- package/src/server.js +61 -1
- package/src/tool-definitions.js +45 -1
- package/src/tools/messaging.js +104 -0
package/README.md
CHANGED

@@ -1,3 +1,7 @@
+[](https://www.npmjs.com/package/agent-pool-mcp)
+[](https://opensource.org/licenses/MIT)
+[](https://nodejs.org)
+
 # agent-pool-mcp
 
 **MCP server for multi-agent orchestration** — parallel task delegation, sequential pipelines, cron scheduling, and cross-model peer review via [Gemini CLI](https://github.com/google-gemini/gemini-cli).
@@ -6,13 +10,9 @@
 
 Compatible with [Antigravity](https://antigravity.dev), Cursor, Windsurf, Claude Code, and any MCP-enabled coding agent.
 
-
-
-AI coding assistants are powerful, but they work **sequentially** — one task at a time. Agent-pool turns your single Gemini subscription into a **parallel agent workforce**: your primary IDE agent delegates background tasks to Gemini CLI workers, all sharing the same authentication.
+Your primary IDE agent delegates background tasks to Gemini CLI workers in parallel — all sharing the same authentication from a single Gemini subscription.
 
-When the primary agent and Gemini workers are **different foundation models** (e.g. Claude + Gemini), `consult_peer`
-
-## How It Works
+When the primary agent and Gemini workers are **different foundation models** (e.g. Claude + Gemini), `consult_peer` gives you cross-model review — two models check each other's reasoning independently.
 
 ```
 ┌─────────────────────────────────┐
@@ -30,45 +30,18 @@ When the primary agent and Gemini workers are **different foundation models** (e
   (task1)   (task2)   (review)      (same auth, parallel)
 ```
 
-
-
-### 🚀 Task Delegation
-- **`delegate_task`** — Non-blocking task delegation to Gemini CLI (full filesystem access).
-- **`delegate_task_readonly`** — Read-only analysis (plan mode). Supports `session_id` to resume previous analyses.
-- **`get_task_result`** — Poll task status, retrieve results, and see live progress (last 200 tool/message events).
-- **`cancel_task`** — Kill a running task and its entire process group immediately.
-
-### 🔗 Pipelines — Sequential Task Chains
-Define multi-step workflows where agents execute sequentially, with automatic handoff:
-
-```
-┌─ frontend ─┐
-research ─┤            ├── deploy
-└─ backend ─┘
-```
+> [!TIP]
+> A single $20/month Google AI Ultra subscription can power dozens of parallel workers — no additional API keys required.
 
-
-- **`run_pipeline`** — Start executing a pipeline. A detached daemon manages the lifecycle.
-- **`list_pipelines`** — See all definitions, active runs, and recent completions.
-- **`get_pipeline_status`** — Step-by-step status with emoji indicators.
-- **`cancel_pipeline`** — Stop a running pipeline and kill active step processes.
+### Task Delegation
 
-
-- **`signal_step_complete`** — Mark the current step as done. Accepts optional output and `run_id`.
-- **`bounce_back`** — Return task to a previous step with feedback (e.g. "data incomplete"). Supports `maxBounces` limit.
+Non-blocking task delegation to Gemini CLI workers. The primary agent fires off a task and continues working — polling for results when ready. Workers get full filesystem access (`delegate_task`) or read-only mode (`delegate_task_readonly`). Cancel anytime with `cancel_task`.
 
-
+### Pipelines — Sequential Task Chains
 
-
-|---------|-------------|
-| `on_complete` | Start when a specific step succeeds |
-| `on_complete_all` | Fan-in: start when ALL listed steps succeed |
-| `on_file` | Start when a file appears and the producing process exits |
-| Auto-fallback | Process death without signal → auto-complete/fail |
+Multi-step workflows with automatic handoff between steps:
 
-**Example — 3-step pipeline:**
 ```javascript
-// Agent creates the pipeline
 create_pipeline({
   name: "article-workflow",
   steps: [
@@ -77,86 +50,59 @@ create_pipeline({
     { name: "review", prompt: "Review the draft for accuracy and style" }
   ]
 })
-
-// Agent starts execution — daemon handles the rest
 run_pipeline({ pipeline_id: "article-workflow" })
 ```
 
-
-Schedule agents to run automatically on a cron schedule:
+Steps support triggers: `on_complete` (chain after one step), `on_complete_all` (fan-in after several), and `on_file` (start when a file appears). Agents can `bounce_back` to a previous step with feedback if data is incomplete.
 
-
-- **`list_schedules`** — See all schedules with next run times and daemon status.
-- **`cancel_schedule`** — Remove a schedule. Daemon auto-exits when no schedules remain.
-- **`get_scheduled_results`** — Retrieve results from past scheduled executions.
+### Cron Scheduler
 
-
+Schedule agents on a cron expression — a detached daemon survives IDE/CLI restarts. Uses atomic file locks to prevent duplicate execution.
 
-
-
-
-2
-
-
-**Skill Tools:**
-- **`list_skills`** — See all available skills and their tiers.
-- **`install_skill`** — Copy a global or built-in skill to the project tier for local customization.
-- **`create_skill` / `delete_skill`** — Manage skill files in project or global scope.
+```
+"0 9 * * MON-FRI"  — 9am weekdays
+"*/30 * * * *"     — every 30 minutes
+"0 */2 * * *"      — every 2 hours
+```
 
-
+Results are saved to `.agents/scheduled-results/` and retrievable via `get_scheduled_results`.
 
-###
-Restrict tool usage for specific tasks using YAML policies. Use built-in templates or custom paths:
-- `policy: "read-only"` — Disables all file-writing and destructive shell tools.
-- `policy: "safe-edit"` — Allows file modifications but blocks arbitrary shell execution.
-- `policy: "/path/to/my-policy.yaml"` — Use a custom security policy.
+### 3-Tier Skill System
 
-
-- **`consult_peer`** — Architectural review with structured verdicts (AGREE / SUGGEST_CHANGES / DISAGREE).
-- Supports iterative rounds: propose → get feedback → revise → re-send until consensus.
+Skills are Markdown files with YAML frontmatter that extend agent behavior:
 
-
-
-
+1. **Project** — `.gemini/skills/` (local to repo, takes precedence)
+2. **Global** — `~/.gemini/skills/` (available across all projects)
+3. **Built-in** — shipped with agent-pool (`code-reviewer`, `test-writer`, `doc-fixer`, `orchestrator`)
 
-
+Install a built-in or global skill into the project for local customization with `install_skill`.
 
-
-Create `agent-pool.config.json` in your project root or `~/.config/agent-pool/config.json`:
+### Per-Task Policies
 
-
-
-
-
-  { "id": "gpu", "type": "ssh", "host": "gpu-server", "cwd": "/home/dev/project" }
-],
-"defaultRunner": "local"
-}
-```
+Restrict tool usage for specific tasks using YAML policies:
+- `"read-only"` — disables all file-writing and destructive shell tools
+- `"safe-edit"` — allows file modifications but blocks arbitrary shell execution
+- Custom path — `"/path/to/my-policy.yaml"`
 
-
+### Cross-Model Peer Review
 
-
+`consult_peer` sends architectural proposals to a Gemini worker for structured review. The worker responds with a verdict: **AGREE**, **SUGGEST_CHANGES**, or **DISAGREE**. Supports iterative rounds until consensus.
 
-
-|----------|---------|--------|
-| `AGENT_POOL_DEPTH` | Current nesting level (auto-incremented) | `0` |
-| `AGENT_POOL_MAX_DEPTH` | Max allowed depth | not set (no limit) |
+### Security
 
-
+- **Path Traversal Protection** — all skill and policy operations are sanitized to prevent access outside designated directories
+- **Process Isolation** — tasks run as detached processes; `cancel_task` and server shutdown kill entire process groups
+- **Credential Safety** — uses your local Gemini CLI authentication; no keys are stored or transmitted
 
-##
+## Quick Start
 
-
-- **[Gemini CLI](https://github.com/google-gemini/gemini-cli)** — installed and authenticated:
+**Prerequisites:** Node.js >= 20, [Gemini CLI](https://github.com/google-gemini/gemini-cli) installed and authenticated.
 
 ```bash
 npm install -g @google/gemini-cli
 gemini  # First run: opens browser for OAuth
 ```
 
-## Installation
-
 Add to your IDE's MCP configuration:
 
 ```json
@@ -173,7 +119,7 @@ Add to your IDE's MCP configuration:
 Restart your IDE — agent-pool-mcp will be downloaded and started automatically.
 
 <details>
-<summary
+<summary>Where is my MCP config file?</summary>
 
 | IDE | Config path |
 |-----|------------|
@@ -185,7 +131,7 @@ Restart your IDE — agent-pool-mcp will be downloaded and started automatically
 </details>
 
 <details>
-<summary
+<summary>Alternative: global install</summary>
 
 ```bash
 npm install -g agent-pool-mcp
@@ -201,9 +147,9 @@ Then use `"command": "agent-pool-mcp"` in your MCP config (no npx needed).
 npx agent-pool-mcp --check
 ```
 
-
+Runs diagnostics: checks Node.js, Gemini CLI, authentication, and remote runner connectivity.
 
-### CLI
+### CLI
 
 ```bash
 npx agent-pool-mcp --check    # Doctor mode: diagnose prerequisites
@@ -212,6 +158,31 @@ npx agent-pool-mcp --version # Show version
 npx agent-pool-mcp --help     # Full help
 ```
 
+## Remote Workers (SSH)
+
+Run workers on remote servers via SSH — same interface, transparent stdio forwarding. Create `agent-pool.config.json` in your project root or `~/.config/agent-pool/config.json`:
+
+```json
+{
+  "runners": [
+    { "id": "local", "type": "local" },
+    { "id": "gpu", "type": "ssh", "host": "gpu-server", "cwd": "/home/dev/project" }
+  ],
+  "defaultRunner": "local"
+}
+```
+
+### Nested Orchestration
+
+Install agent-pool inside Gemini CLI to enable hierarchical delegation — workers can spawn their own workers.
+
+| Variable | Purpose | Default |
+|----------|---------|--------|
+| `AGENT_POOL_DEPTH` | Current nesting level (auto-incremented) | `0` |
+| `AGENT_POOL_MAX_DEPTH` | Max allowed depth | not set (no limit) |
+
+See [parallel-work guide](examples/parallel-work.md) and built-in `orchestrator` skill for patterns.
+
 ## MCP Ecosystem
 
 Best used together with [**project-graph-mcp**](https://www.npmjs.com/package/project-graph-mcp) — AST-based codebase analysis:
@@ -221,8 +192,6 @@ Best used together with [**project-graph-mcp**](https://www.npmjs.com/package/pr
 | **Primary IDE agent** | Delegates tasks, consults peer | Navigates codebase, runs analysis |
 | **Gemini CLI workers** | Executes delegated tasks | Available as MCP tool inside workers |
 
-Combined config for both:
-
 ```json
 {
   "mcpServers": {
@@ -238,53 +207,23 @@ Combined config for both:
   }
 }
 ```
 
-
-
-- **Path Traversal Protection**: All skill and policy operations are sanitized to prevent access outside designated directories.
-- **Process Isolation**: Tasks run as detached processes; `cancel_task` and server shutdown ensure no zombie processes remain by killing entire process groups.
-- **Credential Safety**: Uses your local Gemini CLI authentication; no keys are stored or transmitted by this server.
+> [!IMPORTANT]
+> Each Gemini CLI worker gets its own MCP server instance but shares pipeline state via filesystem — no coordination overhead.
 
-##
+## Documentation
 
-
-
-policies/            ← Tool restriction policies (YAML)
-├── read-only.yaml
-└── safe-edit.yaml
-skills/              ← Built-in Gemini CLI skills (Markdown)
-├── code-reviewer.md
-├── doc-fixer.md
-├── orchestrator.md
-└── test-writer.md
-src/
-├── cli.js                 ← CLI commands (--check, --init, --help)
-├── server.js              ← MCP server setup + tool routing
-├── tool-definitions.js    ← Tool schemas (JSON Schema)
-├── tools/
-│   ├── consult.js         ← Peer review via Gemini CLI
-│   ├── results.js         ← Task store + result formatting (TTL cleanup, ring buffer)
-│   └── skills.js          ← 3-tier skill management (project/global/built-in)
-├── runner/
-│   ├── config.js          ← Runner config loader (local/SSH)
-│   ├── gemini-runner.js   ← Process spawning (streaming JSON, depth tracking)
-│   ├── process-manager.js ← PID tracking, system load awareness, group kill
-│   └── ssh.js             ← Shell escaping, remote PID tracking
-└── scheduler/
-    ├── cron.js            ← Minimal cron expression parser (zero-dependency)
-    ├── daemon.js          ← Detached daemon: schedule ticks + pipeline lifecycle
-    ├── pipeline.js        ← Pipeline CRUD, run state, signals, bounce-back
-    └── scheduler.js       ← Schedule management + daemon spawning
-```
+- [ARCHITECTURE.md](ARCHITECTURE.md) — Source code structure and process management details
+- [examples/parallel-work.md](examples/parallel-work.md) — Delegation patterns and best practices
 
-
--
--
--
-- **Depth Tracking**: Nested orchestration support with optional `AGENT_POOL_MAX_DEPTH` limit.
-- **Adaptive Polling**: Pipeline daemon uses 3s intervals when active, 30s when idle.
-- **File-Based Communication**: Pipeline agents communicate through `.agents/runs/` JSON files — each Gemini process has its own MCP server instance but shares state via filesystem.
+## Related Projects
+- [project-graph-mcp](https://github.com/rnd-pro/project-graph-mcp) — AST-based codebase analysis for AI agents
+- [Symbiote.js](https://github.com/symbiotejs/symbiote.js) — Isomorphic Reactive Web Components framework
+- [JSDA-Kit](https://github.com/rnd-pro/jsda-kit) — SSG/SSR toolkit for modern web applications
 
 ## License
 
-MIT
+MIT © [RND-PRO.com](https://rnd-pro.com)
+
+---
 
+**Made with ❤️ by the RND-PRO team**
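The README's scheduler section relies on standard five-field cron expressions. As a rough illustration of how such an expression can be matched against a date — the package's actual zero-dependency `cron.js` parser is not shown in this diff, so this is a hypothetical sketch that handles only `*`, `*/n`, and plain numeric fields (not named ranges like `MON-FRI`):

```javascript
// Hypothetical sketch of five-field cron matching; cron.js itself is not in this diff.
function fieldMatches(field, value) {
  if (field === '*') return true;                                   // wildcard
  if (field.startsWith('*/')) return value % Number(field.slice(2)) === 0; // step values
  return field.split(',').some(part => Number(part) === value);     // plain numbers / lists
}

function matchesCron(expr, date) {
  const [min, hour, dom, mon, dow] = expr.split(' ');
  return fieldMatches(min, date.getMinutes()) &&
    fieldMatches(hour, date.getHours()) &&
    fieldMatches(dom, date.getDate()) &&
    fieldMatches(mon, date.getMonth() + 1) &&
    fieldMatches(dow, date.getDay());
}

console.log(matchesCron('*/30 * * * *', new Date(2024, 0, 1, 9, 30))); // prints true
console.log(matchesCron('*/30 * * * *', new Date(2024, 0, 1, 9, 15))); // prints false
```

The daemon would evaluate such a predicate once per poll interval against the current time and fire any schedule whose expression matches.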
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-pool-mcp",
-  "version": "1.6.0",
+  "version": "1.7.1",
   "type": "module",
   "description": "MCP Server for multi-agent task delegation and orchestration via Gemini CLI",
   "main": "index.js",
@@ -31,7 +31,8 @@
     "cron",
     "ai"
   ],
-  "author": "
+  "author": "RND-PRO",
+  "homepage": "https://github.com/rnd-pro/agent-pool-mcp",
   "license": "MIT",
   "repository": {
     "type": "git",
package/src/scheduler/daemon.js
CHANGED

@@ -11,7 +11,7 @@
  * @module agent-pool/scheduler/daemon
  */
 
-import { readFileSync, writeFileSync, existsSync, mkdirSync, unlinkSync, readdirSync } from 'node:fs';
+import { readFileSync, writeFileSync, existsSync, mkdirSync, unlinkSync, readdirSync, renameSync } from 'node:fs';
 import { spawn } from 'node:child_process';
 import { join, dirname } from 'node:path';
 import { matchesCron } from './cron.js';
@@ -19,6 +19,7 @@ import { getGroup } from '../tools/groups.js';
 import { getRunner } from '../runner/config.js';
 import { buildSshSpawn } from '../runner/ssh.js';
 import { killGroup } from '../runner/process-manager.js';
+import { consumeSignals, deleteSignals } from './run-signals.js';
 
 const POLL_INTERVAL_MS = 30_000; // Check schedules every 30 seconds
 const PID_FILE = '.agents/scheduler.pid';
@@ -163,11 +164,125 @@ function executeSchedule(schedule) {
   console.error(`[scheduler] Started: ${schedule.id} → gemini pid ${child.pid}`);
 }
 
-// ─── Pipeline tick
+// ─── Pipeline tick ──────────────────────────────────────────────────
 
 const PIPELINES_DIR = '.agents/pipelines';
 const RUNS_DIR = '.agents/runs';
 
+/**
+ * In-memory pipeline state cache.
+ * Loaded from disk on startup, updated in-place during ticks.
+ * Written to disk on state transitions (write-through).
+ * @type {Map<string, object>}
+ */
+const runCache = new Map();
+
+/**
+ * Load all active runs from disk into the in-memory cache.
+ * Called once on daemon startup.
+ */
+function loadRunCache() {
+  const dir = join(cwd, RUNS_DIR);
+  if (!existsSync(dir)) return;
+  for (const f of readdirSync(dir).filter(f => f.endsWith('.json') && !f.includes('.signal-'))) {
+    try {
+      const run = JSON.parse(readFileSync(join(dir, f), 'utf-8'));
+      const runId = f.replace('.json', '');
+      runCache.set(runId, run);
+    } catch { /* skip corrupted */ }
+  }
+  console.error(`[pipeline] Loaded ${runCache.size} runs into memory cache`);
+}
+
+/**
+ * Persist a run to disk atomically (write-then-rename).
+ * Prevents corruption if daemon crashes mid-write.
+ * @param {string} runId
+ * @param {object} run
+ */
+function persistRun(runId, run) {
+  const dir = join(cwd, RUNS_DIR);
+  mkdirSync(dir, { recursive: true });
+  const target = join(dir, `${runId}.json`);
+  const tmp = join(dir, `${runId}.json.tmp`);
+  writeFileSync(tmp, JSON.stringify(run, null, 2));
+  // Atomic rename (same filesystem) — prevents corruption on crash
+  try { renameSync(tmp, target); }
+  catch { writeFileSync(target, JSON.stringify(run, null, 2)); }
+}
+
+/**
+ * Apply consumed signal files to a run's in-memory state.
+ * @param {object} run - Run state object (mutated in place)
+ * @param {Array} signals - Consumed signal objects
+ * @param {object} pipeline - Pipeline definition
+ * @returns {boolean} true if any signal was applied
+ */
+function applySignals(run, signals, pipeline) {
+  let modified = false;
+  for (const signal of signals) {
+    if (signal.type === 'STEP_COMPLETE') {
+      const step = run.steps[signal.stepName];
+      if (step && step.status === 'running') {
+        step.status = 'success';
+        step.signaled = true;
+        step.completedAt = new Date().toISOString();
+        if (signal.output) step.output = signal.output;
+        modified = true;
+        console.error(`[pipeline] Signal: step "${signal.stepName}" completed`);
+      }
+    } else if (signal.type === 'BOUNCE_BACK') {
+      const targetStep = run.steps[signal.stepName];
+      if (!targetStep) continue;
+
+      const stepDef = pipeline?.steps.find(s => s.name === signal.stepName);
+      const maxBounces = stepDef?.maxBounces ?? 2;
+
+      if (targetStep.bounces >= maxBounces) {
+        // Bounce limit reached
+        targetStep.status = 'failed';
+        targetStep.lastBounceReason = `Bounce limit (${maxBounces}) reached. Last: ${signal.reason}`;
+        run.status = 'failed';
+        run.completedAt = new Date().toISOString();
+        console.error(`[pipeline] Bounce limit reached for "${signal.stepName}"`);
+      } else {
+        // Reset target step
+        targetStep.status = 'bounce_pending';
+        targetStep.bounces = (targetStep.bounces || 0) + 1;
+        targetStep.lastBounceReason = signal.reason;
+
+        // Kill running processes for this step
+        const pidsToKill = [...(targetStep.pids || [])];
+        if (targetStep.pid && !pidsToKill.includes(targetStep.pid)) pidsToKill.push(targetStep.pid);
+        for (const pid of pidsToKill) killGroup(pid);
+
+        targetStep.pid = null;
+        targetStep.pids = [];
+        targetStep.exitCode = null;
+        targetStep.signaled = false;
+
+        // Reset calling step
+        if (signal.callingStepName && run.steps[signal.callingStepName]) {
+          run.steps[signal.callingStepName].status = 'waiting_bounce';
+        }
+        console.error(`[pipeline] Bounce: step "${signal.stepName}" reset (reason: ${signal.reason})`);
+      }
+      modified = true;
+    } else if (signal.type === 'CANCEL_RUN') {
+      // Cancel the entire run
+      for (const [name, step] of Object.entries(run.steps)) {
+        if (step.status === 'running') step.status = 'cancelled';
+        if (step.status === 'pending') step.status = 'skipped';
+      }
+      run.status = 'cancelled';
+      run.completedAt = new Date().toISOString();
+      console.error(`[pipeline] Signal: run cancelled`);
+      modified = true;
+    }
+  }
+  return modified;
+}
+
 /**
  * Spawn Gemini CLI agent(s) for a pipeline step.
  * @param {object} stepDef - Step definition from pipeline
@@ -251,19 +366,17 @@ function spawnStep(stepDef, run, runId, bounceReason) {
   const child = spawn(spawnCmd, spawnArgs, spawnOpts);
 
   child.on('close', (code) => {
-    // Update step exit code in
-
-
-    if (
-
-
-
-    } else if (currentRun.steps[stepDef.name].exitCode === null) {
-      currentRun.steps[stepDef.name].exitCode = 0;
-    }
+    // Update step exit code in in-memory state directly (same process)
+    const currentRun = runCache.get(runId);
+    if (currentRun?.steps[stepDef.name]) {
+      if (code !== 0) {
+        currentRun.steps[stepDef.name].exitCode = code;
+      } else if (currentRun.steps[stepDef.name].exitCode === null) {
+        currentRun.steps[stepDef.name].exitCode = 0;
       }
-
-
+      // Write-through to disk
+      persistRun(runId, currentRun);
+    }
     console.error(`[pipeline] Step "${stepDef.name}" [pid ${child.pid}] exited (code: ${code}, run: ${runId})`);
   });
 
@@ -289,33 +402,69 @@ function isAlive(pid) {
 }
 
 /**
- * Process pipeline runs — check triggers, advance steps.
+ * Process pipeline runs — consume signals, check triggers, advance steps.
+ * Uses in-memory cache for state; persists to disk on changes.
  * @returns {boolean} true if any pipeline is actively running
  */
 function tickPipelines() {
+  // Pick up new runs added to disk since last tick (e.g., from runPipeline)
   const runsDir = join(cwd, RUNS_DIR);
-  if (
+  if (existsSync(runsDir)) {
+    for (const f of readdirSync(runsDir).filter(f => f.endsWith('.json') && !f.includes('.signal-') && !f.endsWith('.tmp'))) {
+      const runId = f.replace('.json', '');
+      if (!runCache.has(runId)) {
+        try {
+          const run = JSON.parse(readFileSync(join(runsDir, f), 'utf-8'));
+          runCache.set(runId, run);
+          console.error(`[pipeline] Picked up new run: ${runId}`);
+        } catch { /* skip corrupted */ }
+      }
+    }
+  }
 
   const pipelinesDir = join(cwd, PIPELINES_DIR);
   let hasActive = false;
 
-
-
-
-
-
-    if (run.status !== 'running')
+  // Iterate over a copy of keys to allow modification of runCache during iteration
+  for (const runId of Array.from(runCache.keys())) {
+    const run = runCache.get(runId);
+
+    // Evict completed runs from cache (memory leak fix)
+    if (run.status !== 'running') {
+      // Clean up any orphaned/late signals for completed runs
+      const lateSignals = consumeSignals(cwd, runId);
+      if (lateSignals.length > 0) {
+        deleteSignals(cwd, lateSignals);
+        console.error(`[pipeline] Cleaned ${lateSignals.length} orphaned signal(s) for completed run ${runId}`);
+      }
+      runCache.delete(runId);
+      continue;
+    }
    hasActive = true;
 
    // Load pipeline definition
    let pipeline;
    try {
      pipeline = JSON.parse(readFileSync(join(pipelinesDir, `${run.pipeline}.json`), 'utf-8'));
-    } catch {
+    } catch {
+      console.error(`[pipeline] Could not load pipeline definition for run ${runId}: ${run.pipeline}.json`);
+      continue;
+    }
 
-
+    // 1. Consume and apply signal files
+    const signals = consumeSignals(cwd, runId);
    let modified = false;
 
+    if (signals.length > 0) {
+      modified = applySignals(run, signals, pipeline);
+      if (modified) {
+        // Durability: persist state BEFORE deleting signals
+        persistRun(runId, run);
+        deleteSignals(cwd, signals);
+      }
+    }
+
+    // 2. Process each step
    for (const stepDef of pipeline.steps) {
      const step = run.steps[stepDef.name];
      if (!step) continue;
@@ -455,7 +604,7 @@ function tickPipelines() {
     }
 
     if (modified) {
-
+      persistRun(runId, run);
     }
   }
 
@@ -516,9 +665,10 @@ function tick() {
   setTimeout(tick, nextTickMs);
 }
 
-// ─── Startup
+// ─── Startup ────────────────────────────────────────────────────
 
 acquireLock();
+loadRunCache();
 
 process.on('SIGINT', () => { releaseLock(); process.exit(0); });
 process.on('SIGTERM', () => { releaseLock(); process.exit(0); });
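The `persistRun` change above depends on write-then-rename for crash safety. A minimal standalone sketch of the same pattern (directory and file names here are illustrative, not the package's actual paths):

```javascript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Write the full payload to a temp file, then rename it over the target.
// On POSIX, rename within one filesystem is atomic, so a concurrent reader
// sees either the old file or the new one — never a half-written JSON.
function persistAtomically(dir, name, obj) {
  const target = join(dir, `${name}.json`);
  const tmp = join(dir, `${name}.json.tmp`);
  writeFileSync(tmp, JSON.stringify(obj, null, 2));
  renameSync(tmp, target);
  return target;
}

const dir = mkdtempSync(join(tmpdir(), 'runs-'));
const file = persistAtomically(dir, 'run-demo', { status: 'running', steps: {} });
console.log(JSON.parse(readFileSync(file, 'utf-8')).status); // prints "running"
```

This is why the diff also teaches the tick loop to skip `.tmp` files when scanning the runs directory: a temp file may legitimately exist mid-write.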
package/src/scheduler/pipeline.js
CHANGED

@@ -12,6 +12,7 @@ import { join, dirname } from 'node:path';
 import { randomUUID } from 'node:crypto';
 import { ensureDaemon } from './scheduler.js';
 import { killGroup } from '../runner/process-manager.js';
+import { writeSignal } from './run-signals.js';
 
 const PIPELINES_DIR = '.agents/pipelines';
 const RUNS_DIR = '.agents/runs';
@@ -202,7 +203,7 @@ export function listRuns(cwd, pipelineId) {
   const dir = join(cwd, RUNS_DIR);
   if (!existsSync(dir)) return [];
   return readdirSync(dir)
-    .filter(f => f.endsWith('.json'))
+    .filter(f => f.endsWith('.json') && !f.includes('.signal-'))
     .map(f => {
       try { return JSON.parse(readFileSync(join(dir, f), 'utf-8')); }
       catch { return null; }
@@ -212,7 +213,8 @@ export function listRuns(cwd, pipelineId) {
 }
 
 /**
- * Cancel a pipeline run.
+ * Cancel a pipeline run. Writes a signal file for the daemon.
+ * Kills running processes immediately for responsiveness.
  * @param {string} cwd
  * @param {string} runId
  * @returns {boolean}
@@ -221,24 +223,22 @@ export function cancelRun(cwd, runId) {
   const run = getRun(cwd, runId);
   if (!run || run.status !== 'running') return false;
 
-  // Kill
+  // Kill running processes immediately (side-effect safe)
   for (const [name, step] of Object.entries(run.steps)) {
     if (step.status === 'running') {
       const pidsToKill = [...(step.pids || [])];
       if (step.pid && !pidsToKill.includes(step.pid)) pidsToKill.push(step.pid);
-
       for (const pid of pidsToKill) {
         killGroup(pid);
       }
-      step.status = 'cancelled';
-    }
-    if (step.status === 'pending') {
-      step.status = 'skipped';
     }
   }
-
-
-
+
+  // Write signal file — daemon will apply the state change
+  writeSignal(cwd, runId, {
+    type: 'CANCEL_RUN',
+  });
+
   return true;
 }
 
@@ -254,7 +254,7 @@ export function findActiveRunByStep(cwd, stepName) {
   const dir = join(cwd, RUNS_DIR);
   if (!existsSync(dir)) return null;
 
-  for (const f of readdirSync(dir).filter(f => f.endsWith('.json'))) {
+  for (const f of readdirSync(dir).filter(f => f.endsWith('.json') && !f.includes('.signal-'))) {
     try {
       const run = JSON.parse(readFileSync(join(dir, f), 'utf-8'));
       if (run.status === 'running' && run.steps[stepName]) {
@@ -267,42 +267,42 @@ export function findActiveRunByStep(cwd, stepName) {
 
 /**
  * Signal step completion. Called by agent via MCP tool.
+ * Writes a signal file instead of mutating run state directly.
+ * The daemon will consume this signal on its next tick.
  * @param {string} cwd
  * @param {string} stepName
  * @param {string} [output]
  * @param {string} [runId] - Specific run ID (recommended)
- * @returns {{ success: boolean
+ * @returns {{ success: boolean }}
  */
 export function signalStepComplete(cwd, stepName, output, runId) {
-  let
+  let resolvedRunId = runId;
 
-  if (
-    // Direct lookup by run ID
-    run = getRun(cwd, runId);
-    resolvedRunId = runId;
-  } else {
+  if (!resolvedRunId) {
     // Fallback: search by step name
     const found = findActiveRunByStep(cwd, stepName);
     if (!found) return { success: false };
-    run = found.run;
     resolvedRunId = found.runId;
   }
 
+  // Verify run exists and is active
+  const run = getRun(cwd, resolvedRunId);
   if (!run || run.status !== 'running') return { success: false };
-
-  if (!step || step.status !== 'running') return { success: false };
+  if (!run.steps[stepName] || run.steps[stepName].status !== 'running') return { success: false };
|
|
294
292
|
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
293
|
+
// Write signal file — daemon will apply it
|
|
294
|
+
writeSignal(cwd, resolvedRunId, {
|
|
295
|
+
type: 'STEP_COMPLETE',
|
|
296
|
+
stepName,
|
|
297
|
+
output: output || null,
|
|
298
|
+
});
|
|
299
299
|
|
|
300
|
-
saveRun(cwd, resolvedRunId, run);
|
|
301
300
|
return { success: true };
|
|
302
301
|
}
|
|
303
302
|
|
|
304
303
|
/**
|
|
305
304
|
* Bounce back to a previous step. Called by agent via MCP tool.
|
|
305
|
+
* Writes a signal file instead of mutating run state directly.
|
|
306
306
|
* @param {string} cwd
|
|
307
307
|
* @param {string} targetStepName - Step to re-run
|
|
308
308
|
* @param {string} reason - Why bouncing back
|
|
@@ -310,63 +310,53 @@ export function signalStepComplete(cwd, stepName, output, runId) {
|
|
|
310
310
|
* @returns {{ success: boolean, bounceCount?: number, maxBounces?: number }}
|
|
311
311
|
*/
|
|
312
312
|
export function bounceBack(cwd, targetStepName, reason, runId) {
|
|
313
|
-
// Find active run
|
|
314
|
-
|
|
315
|
-
|
|
313
|
+
// Find the active run containing this step
|
|
314
|
+
let resolvedRunId = runId;
|
|
315
|
+
let run;
|
|
316
316
|
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
|
|
333
|
-
targetStep.lastBounceReason = `Bounce limit (${maxBounces}) reached. Last: ${reason}`;
|
|
334
|
-
run.status = 'failed';
|
|
335
|
-
run.completedAt = new Date().toISOString();
|
|
336
|
-
saveRun(cwd, f.replace('.json', ''), run);
|
|
337
|
-
return { success: false, bounceCount: targetStep.bounces, maxBounces };
|
|
338
|
-
}
|
|
317
|
+
if (resolvedRunId) {
|
|
318
|
+
run = getRun(cwd, resolvedRunId);
|
|
319
|
+
} else {
|
|
320
|
+
const dir = join(cwd, RUNS_DIR);
|
|
321
|
+
if (!existsSync(dir)) return { success: false };
|
|
322
|
+
for (const f of readdirSync(dir).filter(f => f.endsWith('.json') && !f.includes('.signal-'))) {
|
|
323
|
+
try {
|
|
324
|
+
const r = JSON.parse(readFileSync(join(dir, f), 'utf-8'));
|
|
325
|
+
if (r.status === 'running' && r.steps[targetStepName]) {
|
|
326
|
+
run = r;
|
|
327
|
+
resolvedRunId = f.replace('.json', '');
|
|
328
|
+
break;
|
|
329
|
+
}
|
|
330
|
+
} catch { /* skip */ }
|
|
331
|
+
}
|
|
332
|
+
}
|
|
339
333
|
|
|
340
|
-
|
|
341
|
-
targetStep.status = 'bounce_pending';
|
|
342
|
-
targetStep.bounces += 1;
|
|
343
|
-
targetStep.lastBounceReason = reason;
|
|
334
|
+
if (!run || run.status !== 'running') return { success: false };
|
|
344
335
|
|
|
345
|
-
|
|
346
|
-
|
|
347
|
-
if (targetStep.pid && !pidsToKill.includes(targetStep.pid)) pidsToKill.push(targetStep.pid);
|
|
348
|
-
for (const pid of pidsToKill) {
|
|
349
|
-
killGroup(pid);
|
|
350
|
-
}
|
|
336
|
+
const targetStep = run.steps[targetStepName];
|
|
337
|
+
if (!targetStep) return { success: false };
|
|
351
338
|
|
|
352
|
-
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
// Reset the calling step too
|
|
358
|
-
const callingStepName = Object.keys(run.steps).find(name => {
|
|
359
|
-
const s = run.steps[name];
|
|
360
|
-
return s.status === 'running';
|
|
361
|
-
});
|
|
362
|
-
if (callingStepName) {
|
|
363
|
-
run.steps[callingStepName].status = 'waiting_bounce';
|
|
364
|
-
}
|
|
339
|
+
// Check bounce limit (read-only check — safe without lock)
|
|
340
|
+
const pipeline = getPipeline(run.cwd || cwd, run.pipeline);
|
|
341
|
+
const stepDef = pipeline?.steps.find(s => s.name === targetStepName);
|
|
342
|
+
const maxBounces = stepDef?.maxBounces ?? 2;
|
|
365
343
|
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
} catch { /* skip */ }
|
|
344
|
+
if (targetStep.bounces >= maxBounces) {
|
|
345
|
+
return { success: false, bounceCount: targetStep.bounces, maxBounces };
|
|
369
346
|
}
|
|
370
347
|
|
|
371
|
-
|
|
348
|
+
// Find the calling step name (the step that's bouncing back)
|
|
349
|
+
const callingStepName = Object.keys(run.steps).find(name =>
|
|
350
|
+
run.steps[name].status === 'running' && name !== targetStepName,
|
|
351
|
+
);
|
|
352
|
+
|
|
353
|
+
// Write signal file — daemon will apply the state changes and kill processes
|
|
354
|
+
writeSignal(cwd, resolvedRunId, {
|
|
355
|
+
type: 'BOUNCE_BACK',
|
|
356
|
+
stepName: targetStepName,
|
|
357
|
+
callingStepName: callingStepName || null,
|
|
358
|
+
reason,
|
|
359
|
+
});
|
|
360
|
+
|
|
361
|
+
return { success: true, bounceCount: targetStep.bounces + 1, maxBounces };
|
|
372
362
|
}
|
|
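The `cancelRun`, `signalStepComplete`, and `bounceBack` changes above all replace in-place mutation of the run JSON with per-event signal files. A minimal self-contained sketch of why one-file-per-signal sidesteps the read-modify-write race; the run ID, temp directory, and `writeSignal` here are illustrative stand-ins, not the package's exports:

```javascript
// Two "writers" each create a uniquely named signal file, so neither can
// clobber the other — unlike two concurrent updates to one shared run.json.
import { writeFileSync, readdirSync, mkdtempSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { randomUUID } from 'node:crypto';

const dir = mkdtempSync(join(tmpdir(), 'signals-'));
const runId = 'run-123'; // hypothetical run ID

function writeSignal(signal) {
  // Unique suffix per signal — concurrent writers never share a path.
  const file = `${runId}.signal-${randomUUID().split('-')[0]}.json`;
  writeFileSync(join(dir, file), JSON.stringify({ ...signal, timestamp: new Date().toISOString() }));
}

// Two independent events arrive "at the same time".
writeSignal({ type: 'STEP_COMPLETE', stepName: 'build' });
writeSignal({ type: 'BOUNCE_BACK', stepName: 'plan', reason: 'tests failed' });

// Both survive; a single shared JSON updated concurrently could lose one.
const pending = readdirSync(dir).filter(f => f.startsWith(`${runId}.signal-`));
console.log(pending.length); // 2
```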
package/src/scheduler/run-signals.js
ADDED
@@ -0,0 +1,81 @@
+/**
+ * Run signal files — atomic communication between MCP server and daemon.
+ *
+ * Instead of MCP tools writing directly to run JSON (race condition),
+ * they write small signal files that the daemon consumes on each tick.
+ *
+ * Signal types: STEP_COMPLETE, BOUNCE_BACK
+ *
+ * @module agent-pool/scheduler/run-signals
+ */
+
+import { writeFileSync, readFileSync, readdirSync, unlinkSync, existsSync, mkdirSync } from 'node:fs';
+import { join } from 'node:path';
+import { randomUUID } from 'node:crypto';
+
+const RUNS_DIR = '.agents/runs';
+
+/**
+ * Write a signal file for a specific run.
+ * Signal files are atomic — no concurrent read-modify-write.
+ * @param {string} cwd
+ * @param {string} runId
+ * @param {object} signal - { type, stepName, output?, reason?, targetStep? }
+ */
+export function writeSignal(cwd, runId, signal) {
+  const dir = join(cwd, RUNS_DIR);
+  mkdirSync(dir, { recursive: true });
+
+  const id = randomUUID().split('-')[0];
+  const fileName = `${runId}.signal-${id}.json`;
+  const payload = {
+    ...signal,
+    timestamp: new Date().toISOString(),
+  };
+
+  writeFileSync(join(dir, fileName), JSON.stringify(payload));
+}
+
+/**
+ * Consume all pending signal files for a run.
+ * Returns signals sorted by timestamp. Does NOT delete them —
+ * caller must call deleteSignals() after persisting state.
+ * @param {string} cwd
+ * @param {string} runId
+ * @returns {Array<{ type: string, stepName: string, fileName: string, [key: string]: any }>}
+ */
+export function consumeSignals(cwd, runId) {
+  const dir = join(cwd, RUNS_DIR);
+  if (!existsSync(dir)) return [];
+
+  const prefix = `${runId}.signal-`;
+  const signalFiles = readdirSync(dir).filter(f => f.startsWith(prefix) && f.endsWith('.json'));
+
+  const signals = [];
+  for (const f of signalFiles) {
+    try {
+      const data = JSON.parse(readFileSync(join(dir, f), 'utf-8'));
+      signals.push({ ...data, fileName: f });
+    } catch {
+      // Include corrupted files so they get cleaned up by deleteSignals
+      signals.push({ type: '_corrupted', fileName: f });
+    }
+  }
+
+  // Sort by timestamp for deterministic processing
+  signals.sort((a, b) => (a.timestamp || '').localeCompare(b.timestamp || ''));
+  return signals;
+}
+
+/**
+ * Delete signal files after state has been persisted to disk.
+ * @param {string} cwd
+ * @param {Array<{ fileName: string }>} signals
+ */
+export function deleteSignals(cwd, signals) {
+  const dir = join(cwd, RUNS_DIR);
+  for (const s of signals) {
+    try { unlinkSync(join(dir, s.fileName)); }
+    catch { /* ignore */ }
+  }
+}
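Per the module comment, the daemon consumes these signals on each tick and deletes them only after persisting state. A sketch of such a tick; the consume/delete helpers are simplified inline versions of the functions above, and the STEP_COMPLETE handling is a hypothetical stand-in for the daemon's real state transition:

```javascript
// Sketch of a daemon tick: read all pending signals, apply them to the
// in-memory run, persist, then delete the signal files.
import { writeFileSync, readFileSync, readdirSync, unlinkSync, mkdtempSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

const dir = mkdtempSync(join(tmpdir(), 'runs-'));
const runId = 'run-abc'; // hypothetical run ID

// One pending signal on disk, as signalStepComplete would leave it.
writeFileSync(join(dir, `${runId}.signal-1a2b.json`), JSON.stringify({
  type: 'STEP_COMPLETE', stepName: 'review', output: 'LGTM',
  timestamp: new Date().toISOString(),
}));

function consumeSignals() {
  const prefix = `${runId}.signal-`;
  return readdirSync(dir)
    .filter(f => f.startsWith(prefix) && f.endsWith('.json'))
    .map(f => ({ ...JSON.parse(readFileSync(join(dir, f), 'utf-8')), fileName: f }))
    .sort((a, b) => (a.timestamp || '').localeCompare(b.timestamp || ''));
}

function deleteSignals(signals) {
  for (const s of signals) unlinkSync(join(dir, s.fileName));
}

// The tick itself.
const run = { status: 'running', steps: { review: { status: 'running' } } };
const signals = consumeSignals();
for (const s of signals) {
  if (s.type === 'STEP_COMPLETE') {
    run.steps[s.stepName].status = 'completed'; // simplified state transition
    run.steps[s.stepName].output = s.output;
  }
}
// (the real daemon would persist the run here, before deleting)
deleteSignals(signals);

console.log(run.steps.review.status); // completed
```

Deleting only after the state is persisted means a crash mid-tick leaves, at worst, a replayable signal rather than a half-written run file.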
package/src/server.js
CHANGED
@@ -23,6 +23,7 @@ import { consultPeer } from './tools/consult.js';
 import { addSchedule, listSchedules, removeSchedule, getScheduledResults, getDaemonStatus } from './scheduler/scheduler.js';
 import { createPipeline, listPipelines, runPipeline, getRun, listRuns, cancelRun, signalStepComplete, bounceBack } from './scheduler/pipeline.js';
 import { createGroup, listGroups, getGroup } from './tools/groups.js';
+import { sendMessage, getMessages } from './tools/messaging.js';
 
 import { TOOL_DEFINITIONS } from './tool-definitions.js';
 
@@ -112,7 +113,7 @@ export function createServer() {
 }
 
 const server = new Server(
-  { name: 'agent-pool', version: '1.
+  { name: 'agent-pool', version: '1.7.0' },
   { capabilities: { tools: {}, resources: {} } },
 );
 
@@ -208,6 +209,10 @@ export function createServer() {
       response = handleListGroups(args); break;
     case 'delegate_to_group':
       response = handleDelegateToGroup(args); break;
+    case 'send_message':
+      response = handleSendMessage(args); break;
+    case 'get_messages':
+      response = handleGetMessages(args); break;
     default:
       response = { content: [{ type: 'text', text: `Unknown tool: ${name}` }], isError: true };
   }
@@ -803,3 +808,58 @@ function handleDelegateToGroup(args) {
     }],
   };
 }
+
+// ─── Messaging handlers ─────────────────────────────────────
+
+function handleSendMessage(args) {
+  const cwd = args.cwd ?? defaultCwd;
+  const result = sendMessage(cwd, {
+    channel: args.channel,
+    payload: args.payload,
+    from: args.from,
+  });
+
+  if (!result.success) {
+    return {
+      content: [{ type: 'text', text: `❌ Failed to send message: ${result.error || 'unknown error'}` }],
+      isError: true,
+    };
+  }
+
+  return {
+    content: [{ type: 'text', text: `📨 Message sent to channel \`${result.channel}\`.` }],
+  };
+}
+
+function handleGetMessages(args) {
+  const cwd = args.cwd ?? defaultCwd;
+  const result = getMessages(cwd, {
+    channel: args.channel,
+    clear: args.clear,
+  });
+
+  if (result.error) {
+    return {
+      content: [{ type: 'text', text: `❌ ${result.error}` }],
+      isError: true,
+    };
+  }
+
+  if (result.count === 0) {
+    return {
+      content: [{ type: 'text', text: `📭 No messages on channel \`${args.channel}\`.` }],
+    };
+  }
+
+  const lines = result.messages.map((m, i) =>
+    `**${i + 1}.** [${m.timestamp}] from \`${m.from}\`:\n\`\`\`json\n${JSON.stringify(m.payload, null, 2)}\n\`\`\``
+  );
+
+  return {
+    content: [{
+      type: 'text',
+      text: `📬 **${result.count}** message(s) on channel \`${args.channel}\`${args.clear ? ' (cleared)' : ''}:\n\n${lines.join('\n\n')}`,
+    }],
+  };
+}
+
package/src/tool-definitions.js
CHANGED
@@ -408,5 +408,49 @@ export const TOOL_DEFINITIONS = [
       required: ['group', 'prompt'],
     },
   },
+  {
+    name: 'send_message',
+    description: [
+      'Send a message to a channel for inter-agent communication.',
+      'Use this to pass structured data between pipeline steps or between any agents.',
+      '',
+      'Channel conventions:',
+      '  - {run_id} — broadcast to all steps in a pipeline run',
+      '  - {run_id}:{step_name} — targeted to a specific step',
+      '  - any string — ad-hoc channel for custom messaging',
+      '',
+      'Messages are persisted to disk (survives restarts). Uses JSONL format for concurrent-write safety.',
+    ].join('\n'),
+    inputSchema: {
+      type: 'object',
+      properties: {
+        channel: { type: 'string', description: 'Target channel. Use run_id for broadcast, run_id:step_name for targeted.' },
+        payload: { description: 'Message payload (any JSON-serializable value).' },
+        from: { type: 'string', description: 'Sender identifier (e.g., step name or task description).' },
+        cwd: { type: 'string', description: 'Working directory. Defaults to current working directory.' },
+      },
+      required: ['channel', 'payload'],
+    },
+  },
+  {
+    name: 'get_messages',
+    description: [
+      'Read messages from a channel. Returns all messages in chronological order.',
+      '',
+      'Channel conventions:',
+      '  - {run_id} — read broadcast messages for a pipeline run',
+      '  - {run_id}:{step_name} — read messages targeted to a specific step',
+      '',
+      'Use clear=true to consume messages (delete after reading).',
+    ].join('\n'),
+    inputSchema: {
+      type: 'object',
+      properties: {
+        channel: { type: 'string', description: 'Channel to read messages from.' },
+        clear: { type: 'boolean', description: 'If true, clear the channel after reading (consume mode). Default: false.' },
+        cwd: { type: 'string', description: 'Working directory. Defaults to current working directory.' },
+      },
+      required: ['channel'],
+    },
+  },
 ];
-
package/src/tools/messaging.js
ADDED
@@ -0,0 +1,104 @@
+/**
+ * Inter-agent messaging — file-based JSONL mailboxes.
+ *
+ * Provides send_message / get_messages tools for agents
+ * to pass structured data between pipeline steps or tasks.
+ *
+ * Uses JSONL format (one JSON object per line) with appendFileSync()
+ * to avoid read-modify-write race conditions on concurrent writes.
+ *
+ * Channel addressing:
+ *   - {run_id}         → broadcast to all steps in a pipeline run
+ *   - {run_id}:{step}  → targeted to a specific step
+ *   - {custom_channel} → any string for ad-hoc messaging
+ *
+ * @module agent-pool/tools/messaging
+ */
+
+import { appendFileSync, readFileSync, writeFileSync, existsSync, mkdirSync, renameSync, unlinkSync } from 'node:fs';
+import { join, dirname } from 'node:path';
+
+const MESSAGES_DIR = '.agents/messages';
+
+/**
+ * Sanitize channel name for use as filename.
+ * @param {string} channel
+ * @returns {string}
+ */
+function sanitizeChannel(channel) {
+  return channel.replace(/[^a-zA-Z0-9_:-]/g, '_');
+}
+
+/**
+ * Send a message to a channel.
+ * Uses appendFileSync for atomic writes (no read-modify-write).
+ * @param {string} cwd
+ * @param {object} opts
+ * @param {string} opts.channel - Target channel (e.g., "run_id:step_name")
+ * @param {*} opts.payload - Message payload (any JSON-serializable value)
+ * @param {string} [opts.from] - Sender identifier
+ * @returns {{ success: boolean, channel: string }}
+ */
+export function sendMessage(cwd, { channel, payload, from }) {
+  if (!channel) return { success: false, error: 'channel is required' };
+
+  const dir = join(cwd, MESSAGES_DIR);
+  mkdirSync(dir, { recursive: true });
+
+  const filePath = join(dir, `${sanitizeChannel(channel)}.jsonl`);
+  const message = {
+    timestamp: new Date().toISOString(),
+    from: from || 'unknown',
+    payload,
+  };
+
+  // JSONL: one JSON object per line, appended atomically
+  appendFileSync(filePath, JSON.stringify(message) + '\n');
+
+  return { success: true, channel };
+}
+
+/**
+ * Get messages from a channel.
+ * @param {string} cwd
+ * @param {object} opts
+ * @param {string} opts.channel - Channel to read from
+ * @param {boolean} [opts.clear] - If true, clear the channel after reading
+ * @returns {{ messages: Array<{ timestamp: string, from: string, payload: any }>, count: number }}
+ */
+export function getMessages(cwd, { channel, clear }) {
+  if (!channel) return { messages: [], count: 0, error: 'channel is required' };
+
+  const filePath = join(cwd, MESSAGES_DIR, `${sanitizeChannel(channel)}.jsonl`);
+  if (!existsSync(filePath)) return { messages: [], count: 0 };
+
+  let content;
+  if (clear) {
+    // Atomic consume: rename file first, then read. Any new messages
+    // appended after rename go to a NEW file (no data loss).
+    const tmpPath = filePath + '.consuming';
+    try {
+      renameSync(filePath, tmpPath);
+      content = readFileSync(tmpPath, 'utf-8').trim();
+      unlinkSync(tmpPath);
+    } catch {
+      // File was deleted or renamed between check and read
+      return { messages: [], count: 0 };
+    }
+  } else {
+    try {
+      content = readFileSync(filePath, 'utf-8').trim();
+    } catch {
+      return { messages: [], count: 0 };
+    }
+  }
+
+  if (!content) return { messages: [], count: 0 };
+
+  const messages = content.split('\n').map(line => {
+    try { return JSON.parse(line); }
+    catch { return null; }
+  }).filter(Boolean);
+
+  return { messages, count: messages.length };
+}
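A short end-to-end sketch of the mailbox semantics above: append-only sends, then a rename-then-read consume when `clear` is set. The helpers are simplified re-implementations for illustration (temp directory instead of `.agents/messages`), not imports from the package:

```javascript
// JSONL mailbox sketch: sends append one JSON line; consuming renames the
// file before reading, so messages appended mid-read land in a fresh file.
import { appendFileSync, readFileSync, renameSync, unlinkSync, existsSync, mkdtempSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

const dir = mkdtempSync(join(tmpdir(), 'messages-'));
const file = channel => join(dir, `${channel.replace(/[^a-zA-Z0-9_:-]/g, '_')}.jsonl`);

function sendMessage(channel, payload, from = 'unknown') {
  appendFileSync(file(channel), JSON.stringify({ timestamp: new Date().toISOString(), from, payload }) + '\n');
}

function getMessages(channel, { clear = false } = {}) {
  const path = file(channel);
  if (!existsSync(path)) return { messages: [], count: 0 };
  let content;
  if (clear) {
    const tmp = path + '.consuming'; // atomic consume via rename
    renameSync(path, tmp);
    content = readFileSync(tmp, 'utf-8').trim();
    unlinkSync(tmp);
  } else {
    content = readFileSync(path, 'utf-8').trim();
  }
  if (!content) return { messages: [], count: 0 };
  const messages = content.split('\n').map(l => JSON.parse(l));
  return { messages, count: messages.length };
}

sendMessage('run-1:review', { verdict: 'approve' }, 'reviewer');
sendMessage('run-1:review', { verdict: 'nit' }, 'linter');

const first = getMessages('run-1:review', { clear: true }); // reads both, deletes file
const second = getMessages('run-1:review');                 // channel now empty
console.log(first.count, second.count); // 2 0
```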