@adaptic/maestro 1.1.8 → 1.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/commands/init-maestro.md +304 -8
- package/README.md +28 -0
- package/bin/maestro.mjs +1 -1
- package/docs/guides/agents-observe-setup.md +64 -0
- package/docs/guides/ccxray-diagnostics.md +65 -0
- package/docs/guides/claude-mem-setup.md +79 -0
- package/docs/guides/claude-pace-setup.md +56 -0
- package/docs/guides/claudraband-sessions.md +98 -0
- package/docs/guides/clawteam-swarm.md +116 -0
- package/docs/guides/code-review-graph-setup.md +86 -0
- package/docs/guides/self-optimization-pattern.md +82 -0
- package/docs/guides/slack-setup.md +4 -2
- package/docs/guides/twilio-subaccounts-setup.md +223 -0
- package/docs/guides/webhook-relay-setup.md +349 -0
- package/package.json +2 -1
- package/plugins/maestro-skills/plugin.json +16 -0
- package/plugins/maestro-skills/skills/agents-observe.md +110 -0
- package/plugins/maestro-skills/skills/ccxray-diagnostics.md +91 -0
- package/plugins/maestro-skills/skills/claude-pace.md +61 -0
- package/plugins/maestro-skills/skills/code-review-graph.md +99 -0
- package/scaffold/CLAUDE.md +64 -0
- package/scaffold/config/agent.ts.example +2 -1
- package/scaffold/config/known-agents.json +35 -0
- package/scripts/daemon/classifier.mjs +264 -50
- package/scripts/daemon/dispatcher.mjs +109 -5
- package/scripts/daemon/launchd-wrapper-generic.sh +96 -0
- package/scripts/daemon/launchd-wrapper-slack-events.sh +37 -0
- package/scripts/daemon/launchd-wrapper.sh +91 -0
- package/scripts/daemon/lib/session-router.mjs +274 -0
- package/scripts/daemon/lib/session-router.test.mjs +295 -0
- package/scripts/daemon/prompt-builder.mjs +51 -11
- package/scripts/daemon/responder.mjs +234 -19
- package/scripts/daemon/session-lock.mjs +194 -0
- package/scripts/daemon/sophie-daemon.mjs +16 -2
- package/scripts/email-signature.html +20 -4
- package/scripts/local-triggers/generate-plists.sh +62 -10
- package/scripts/poller/imap-client.mjs +4 -2
- package/scripts/poller/slack-poller.mjs +104 -52
- package/scripts/setup/init-agent.sh +91 -1
- package/scripts/setup/install-dev-tools.sh +150 -0
- package/scripts/spawn-session.sh +21 -6
- package/workflows/continuous/backlog-executor.yaml +141 -0
- package/workflows/daily/evening-wrap.yaml +41 -1
- package/workflows/daily/morning-brief.yaml +17 -0
- package/workflows/event-driven/agent-failure-investigation.yaml +137 -0
- package/workflows/event-driven/pr-review.yaml +104 -0
- package/workflows/weekly/engineering-health.yaml +154 -0

package/docs/guides/webhook-relay-setup.md
ADDED
@@ -0,0 +1,349 @@
# Webhook Relay Setup Guide (Railway)

How to deploy each agent's per-instance webhook relay to Railway. The relay is the canonical pattern for receiving inbound webhooks from Slack and Twilio without requiring a stable inbound connection from the Mac mini.

**Prerequisites**: Complete the [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md). The agent needs:
- A Slack app already created (see `slack-setup.md`) so we have `SLACK_SIGNING_SECRET`
- A Twilio account (or sub-account) and a purchased number so we have `TWILIO_AUTH_TOKEN`
- Admin rights in the company's Railway workspace (e.g., "Adaptic")

---

## Why a relay?

Maestro agents run on a Mac mini that doesn't have a stable public IP. External services need a permanent HTTPS URL for webhooks. Cloudflare Quick Tunnels work, but the URL changes on restart, and named tunnels require Cloudflare DNS configuration.

Railway gives every agent a permanent `*.up.railway.app` domain with HTTPS and zero infrastructure. The pattern: Slack/Twilio send to Railway → Railway buffers in memory → Mac mini polls Railway every 5 seconds → events land in `state/inbox/`.

```
┌─────────────┐                 ┌───────────────────────┐
│  Slack      │──webhook───────▶│ {agent}-webhook-relay │
│  Twilio     │                 │ (Railway, in {Org})   │
└─────────────┘                 │  in-memory buffers    │
                                └──────┬────────────────┘
                                       │ /events, /sms/messages, /whatsapp/messages
                                       ▼
                              ┌──────────────────┐
                              │  Mac mini        │
                              │  poll-relay.sh   │
                              │  every 5 sec     │
                              └────────┬─────────┘
                                       ▼
                       state/inbox/{slack,sms,whatsapp}/
```

---

## 1. Source code

The relay source is at `services/webhook-relay/` in every agent's repo (copied from the maestro framework at create time). It's a single ~250-line Node 20 HTTP server, no dependencies, deployable straight to Railway.

```
services/webhook-relay/
├── server.mjs      # The HTTP server with all endpoints
├── package.json    # Node 20+, type: module, no deps
├── railway.json    # NIXPACKS builder, /health check, restart-on-failure
├── .gitignore
└── README.md
```

### Endpoints

#### Webhook receivers (called by external services)

| Path | Method | Caller | Notes |
|---|---|---|---|
| `/slack/events` | POST | Slack Events API | HMAC verified via `SLACK_SIGNING_SECRET`. Echoes `url_verification` challenges. |
| `/sms` | POST | Twilio SMS | HMAC verified via `TWILIO_AUTH_TOKEN`. Returns empty TwiML. |
| `/whatsapp` | POST | Twilio WhatsApp | Same. |
| `/whatsapp/status` | POST | Twilio WhatsApp status callback | Same. |

#### Drain endpoints (called by the local Mac mini)

| Path | Method | Returns |
|---|---|---|
| `/events` | GET | All buffered Slack events, drains buffer |
| `/sms/messages` | GET | All buffered SMS events, drains buffer |
| `/whatsapp/messages` | GET | All buffered WhatsApp events, drains buffer |
| `/whatsapp/statuses` | GET | All buffered WhatsApp delivery status events, drains buffer |
| `/health` | GET | Service status, buffer sizes, signature flags |
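
The drain endpoints make the local poll loop trivial. A minimal sketch, assuming the drain response is a JSON body with an `events` array (the payload shape isn't documented here, and the real poller, `scripts/poll-slack-events.sh`, writes YAML into the inbox; JSON filenames are used below for brevity):

```bash
# Illustrative drain cycle — response shape and file naming are assumptions.
RELAY="https://lucas-webhook-relay-production.up.railway.app"
curl -sf -m 5 "$RELAY/events" | jq -c '.events[]?' | while read -r evt; do
  printf '%s\n' "$evt" > "state/inbox/slack/$(date +%s)-$RANDOM.json"
done
```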

---

## 2. Install Railway CLI

```bash
brew install railway 2>/dev/null || true
railway --version   # should be 4.x
```

Then log in (interactive — opens browser):

```bash
railway login
```

Confirm with `railway whoami`. If you don't see the company workspace in `railway list`, ask an existing admin to grant you access.

---

## 3. Create the project

Run from the agent's repo:

```bash
cd services/webhook-relay
railway init --name {firstname-lower}-webhook-relay --workspace {Company}
```

For Lucas, this is `lucas-webhook-relay` in the `Adaptic` workspace.

The CLI will print a project URL — note it for reference.

---

## 4. Deploy

```bash
railway up --service {firstname-lower}-webhook-relay --detach
```

This packages the local directory, uploads it to Railway, and triggers a Nixpacks build. The build typically takes 30–60 seconds. Watch the build logs at the URL the CLI prints.

---

## 5. Generate a public domain

```bash
railway domain --service {firstname-lower}-webhook-relay
```

Prints something like:

```
🚀 https://lucas-webhook-relay-production.up.railway.app
```

Save this URL — you'll use it everywhere.

---

## 6. Set environment variables

```bash
source ../../.env
railway variables --service {firstname-lower}-webhook-relay \
  --set "SLACK_SIGNING_SECRET=$SLACK_SIGNING_SECRET" \
  --set "TWILIO_AUTH_TOKEN=$TWILIO_AUTH_TOKEN" \
  --set "PUBLIC_HOSTNAME={firstname-lower}-webhook-relay-production.up.railway.app" \
  --set "BUFFER_TTL_MS=600000" \
  --set "MAX_BUFFER_SIZE=1000"
```

| Var | Required | Source | Purpose |
|---|---|---|---|
| `SLACK_SIGNING_SECRET` | yes | Slack app → Basic Information | HMAC verify Slack webhooks |
| `TWILIO_AUTH_TOKEN` | yes | Twilio Console → Account | HMAC verify Twilio webhooks |
| `PUBLIC_HOSTNAME` | yes | Railway domain (without https://) | Used in Twilio HMAC base string |
| `BUFFER_TTL_MS` | no | default 600000 (10 min) | Drop events older than this |
| `MAX_BUFFER_SIZE` | no | default 1000 | Max events per channel |
| `POLL_AUTH_TOKEN` | no | (unset) | If set, GET endpoints require a Bearer token |

The `PORT` variable is provided automatically by Railway — don't set it.
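
To confirm the values landed, list them back; with no `--set` flags, recent Railway CLI versions print the service's current variables:

```bash
railway variables --service {firstname-lower}-webhook-relay
```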

After setting variables, **trigger a redeploy** so the running container picks them up. The variables exist in the project, but the running container was built with the previous values:

```bash
railway up --service {firstname-lower}-webhook-relay --detach
```

---

## 7. Verify deployment

```bash
for i in {1..12}; do
  RESP=$(curl -sf -m 5 https://{firstname-lower}-webhook-relay-production.up.railway.app/health)
  if echo "$RESP" | grep -q '"slack_signature":true' && echo "$RESP" | grep -q '"twilio_signature":true'; then
    echo "✅ Relay live with signature verification"
    break
  fi
  sleep 10
done
```

Expected output of `/health`:

```json
{
  "ok": true,
  "service": "{firstname-lower}-webhook-relay",
  "uptime_sec": 12.3,
  "buffers": {
    "slack": { "events": 0, "seen": 0 },
    "sms": { "events": 0, "seen": 0 },
    "whatsapp": { "events": 0, "seen": 0 },
    "whatsapp_status": { "events": 0, "seen": 0 }
  },
  "config": {
    "slack_signature": true,
    "twilio_signature": true,
    "poll_auth": false,
    "buffer_ttl_ms": 600000,
    "max_buffer_size": 1000
  }
}
```

If `slack_signature` or `twilio_signature` is `false`, the env vars didn't reach the running container — re-run `railway up`.
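
For a quicker spot-check than the retry loop, pull just the `config` block (assumes `jq` is installed locally):

```bash
curl -s https://{firstname-lower}-webhook-relay-production.up.railway.app/health | jq '.config'
```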

---

## 8. Configure external services to point at the relay

### Slack Events Subscription

Update via the App Manifest editor (more reliable than the events page):

1. Navigate to `https://app.slack.com/app-settings/{TEAM_ID}/{APP_ID}/app-manifest`
2. Read the JSON, add this block to `settings`:

```json
"event_subscriptions": {
  "request_url": "https://{firstname-lower}-webhook-relay-production.up.railway.app/slack/events",
  "bot_events": [
    "app_mention",
    "message.channels",
    "message.groups",
    "message.im",
    "message.mpim"
  ]
}
```

3. Click **Save Changes**
4. Navigate to `https://api.slack.com/apps/{APP_ID}/event-subscriptions` and click **Click here to verify** if the yellow banner appears
5. **Reinstall the app** at `https://api.slack.com/apps/{APP_ID}/install-on-team` so the new event scopes activate
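
If verification misbehaves, you can poke the endpoint by hand. A hedged sketch: with `SLACK_SIGNING_SECRET` set, the relay should reject this unsigned request, so a 401/403 here is actually evidence that signature checking is active (only a relay without a signing secret would echo the challenge back):

```bash
curl -si -X POST "https://{firstname-lower}-webhook-relay-production.up.railway.app/slack/events" \
  -H 'Content-Type: application/json' \
  -d '{"type":"url_verification","challenge":"test123"}'
```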

### Twilio SMS webhook

```bash
RELAY_URL="https://{firstname-lower}-webhook-relay-production.up.railway.app"
curl -s -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN" -X POST \
  "https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID/IncomingPhoneNumbers/$TWILIO_PHONE_SID.json" \
  --data-urlencode "SmsUrl=$RELAY_URL/sms" --data-urlencode "SmsMethod=POST"
```

### Twilio WhatsApp sandbox

This requires a per-agent Twilio sub-account (the WhatsApp sandbox webhook is account-wide). See `docs/guides/twilio-subaccounts-setup.md`.

Once you have the sub-account credentials, configure the sandbox via the Twilio Console UI (no API support):

1. Navigate to `https://console.twilio.com/us1/develop/sms/try-it-out/whatsapp-learn?frameUrl=%2Fconsole%2Fsms%2Fwhatsapp%2Flearn`
2. Click the **Sandbox settings** tab
3. Set the inbound webhook URL to `https://{firstname-lower}-webhook-relay-production.up.railway.app/whatsapp` (HTTP POST)
4. Set the status callback URL to `https://{firstname-lower}-webhook-relay-production.up.railway.app/whatsapp/status` (HTTP POST)
5. Save

---

## 9. Local poll job

The Mac mini runs a launchd job that polls the relay every 5 seconds. Install it:

```bash
# 1. Update scripts/poll-slack-events.sh — set EVENTS_URL to your relay's /events endpoint
# 2. Update scripts/comms-monitor.sh — set RAILWAY_URL similarly
# 3. Install the launchd plist:

cat > scripts/local-triggers/plists/ai.adaptic.{firstname-lower}-poll-relay.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>ai.adaptic.{firstname-lower}-poll-relay</string>
  <key>ProgramArguments</key><array>
    <string>/bin/bash</string>
    <string>{REPO_ROOT}/scripts/poll-slack-events.sh</string>
  </array>
  <key>WorkingDirectory</key><string>{REPO_ROOT}</string>
  <key>StartInterval</key><integer>5</integer>
  <key>RunAtLoad</key><true/>
  <key>StandardOutPath</key><string>{REPO_ROOT}/logs/polling/poll-relay-stdout.log</string>
  <key>StandardErrorPath</key><string>{REPO_ROOT}/logs/polling/poll-relay-stderr.log</string>
</dict>
</plist>
EOF

cp scripts/local-triggers/plists/ai.adaptic.{firstname-lower}-poll-relay.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.adaptic.{firstname-lower}-poll-relay.plist
launchctl list | grep poll-relay   # verify loaded
```

---

## 10. End-to-end test

1. Have a teammate (or yourself, from a separate Slack account) send a DM to the agent's bot in Slack, OR @-mention the bot in a public channel where the bot is a member
2. Within ~5 seconds:
   - `railway logs --service {firstname-lower}-webhook-relay` should show `[slack] buffered message...`
   - `state/inbox/slack/` should contain a new YAML file
   - `logs/polling/poll-relay-stdout.log` should show the inbox write
3. The inbox processor (running as the launchd job `ai.adaptic.{firstname-lower}-inbox-processor`) picks up the YAML and routes it through the daemon
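
A quick way to check the local half of the pipeline after sending the test message (paths as configured in step 9):

```bash
ls -lt state/inbox/slack/ | head -5            # the new YAML file should top the list
tail -n 20 logs/polling/poll-relay-stdout.log  # the inbox write should be logged here
```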

---

## Troubleshooting

### `railway up` returns "Service not found"

Run `railway add --service {firstname-lower}-webhook-relay` first to create the service container. The `--service` flag is required even when the project already exists.

### `railway init` returns "Unauthorized"

Your Railway account doesn't have admin rights in the workspace. Ask an existing admin (or your principal) to grant you the Admin role in the workspace's Members settings.

### `slack_signature: false` in /health after deploy

The env vars exist in the project, but the running container was built before they were set. Re-run `railway up --service {firstname-lower}-webhook-relay --detach` to redeploy.

### Slack URL verification fails with "URL didn't respond"

The relay is reachable, but Slack still reports a failure. Causes:

- `SLACK_SIGNING_SECRET` mismatch (check Slack app → Basic Information vs. the Railway env var)
- The relay is responding correctly to `url_verification` (returns the challenge), but Slack's caching layer hasn't flushed. Re-save the manifest, or click "Click here to verify" on the event subscriptions page.

### Twilio webhook returns 401 "invalid signature"

The `PUBLIC_HOSTNAME` env var is wrong. It must match the actual host Twilio sends to (typically `{firstname-lower}-webhook-relay-production.up.railway.app`, no scheme, no trailing slash). Twilio includes the full URL in its signature base string.
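
To see why the hostname matters, here is Twilio's documented signing scheme in miniature, as a sketch with made-up parameter values rather than the relay's actual code:

```bash
# X-Twilio-Signature = base64(HMAC-SHA1(auth_token, url + sorted POST params)).
# If PUBLIC_HOSTNAME differs from the host in this URL, the relay rebuilds a
# different base string and the comparison fails.
URL="https://lucas-webhook-relay-production.up.railway.app/sms"
PARAMS="Bodyhello"   # params sorted by key, then key and value concatenated
printf '%s' "${URL}${PARAMS}" \
  | openssl dgst -sha1 -hmac "$TWILIO_AUTH_TOKEN" -binary | base64
```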

### Buffer fills up but Mac mini doesn't drain

Check the launchd job:

```bash
launchctl list | grep poll-relay        # should show the label with a non-error exit code
tail logs/polling/poll-relay-stderr.log
```

Also check that `EVENTS_URL` in `scripts/poll-slack-events.sh` matches the deployed Railway domain.

### The relay restarts and loses buffered events

In-memory buffers are TTL'd at 10 minutes. If the relay restarts (Railway redeploy, instance recycle, OOM), buffered events are lost. The local Slack poller (`scripts/poller/slack-poller.mjs`) runs every 60 seconds as a fallback and will catch any missed channel/DM messages — at most a 60s gap. For SMS/WhatsApp this is acceptable because Twilio retries on failure.

If you need persistence, add a Postgres or Redis service in the same Railway project and persist buffers there. Not currently necessary at our message volume.

---

## Why one service per agent?

Each agent has its own Slack signing secret and Twilio account/sub-account. A shared relay would have to multiplex by header or path, which adds complexity and a single point of failure. Per-agent deployment is simpler, isolated, and gives independent scaling and restarts.

## Related guides

- [Slack Setup](slack-setup.md) — Slack app creation and OAuth
- [Voice & SMS Setup](voice-sms-setup.md) — Twilio account, SMS handlers
- [WhatsApp Setup](whatsapp-setup.md) — WhatsApp sandbox / production
- [Twilio Subaccounts Setup](twilio-subaccounts-setup.md) — per-agent isolation for WhatsApp

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@adaptic/maestro",
-  "version": "1.1.8",
+  "version": "1.4.1",
   "description": "Maestro — Autonomous AI agent operating system. Deploy AI employees on dedicated Mac minis.",
   "type": "module",
   "bin": {
@@ -11,6 +11,7 @@
     "./tools": "./lib/tool-definitions.js",
     "./executor": "./lib/action-executor.js",
     "./singleton": "./lib/singleton.js",
+    "./tts": "./lib/tts.mjs",
     "./package.json": "./package.json"
   },
   "files": [

package/plugins/maestro-skills/plugin.json
CHANGED
@@ -50,6 +50,22 @@
     {
       "name": "schedule-meeting",
       "description": "Schedule a meeting using Google Calendar MCP or Calendly. Handle timezone conversion, find availability, send invites."
+    },
+    {
+      "name": "claude-pace",
+      "description": "Check Claude API rate limit quota — 5-hour/7-day usage, reset countdowns, pace delta. Use before heavy backlog cycles or when quota warnings appear."
+    },
+    {
+      "name": "ccxray-diagnostics",
+      "description": "Run token/cost X-ray diagnostics on Claude API sessions — per-turn breakdown, context window heatmaps, system prompt diffs. Use for debugging expensive sessions."
+    },
+    {
+      "name": "code-review-graph",
+      "description": "Structural code analysis via tree-sitter knowledge graph — blast radius, review context, architecture overview, semantic search. Use for PR reviews and refactoring."
+    },
+    {
+      "name": "agents-observe",
+      "description": "Multi-agent observability dashboard — monitor parallel agent teams, tool calls, session state, and performance in real time. Use when debugging agent failures."
     }
   ]
 }

package/plugins/maestro-skills/skills/agents-observe.md
ADDED
@@ -0,0 +1,110 @@
---
name: agents-observe
description: Multi-agent observability dashboard — monitor parallel agent teams, tool calls, session state, and performance in real time. Use when debugging agent failures or profiling backlog execution.
---

# Multi-Agent Observability (agents-observe)

You are a Maestro agent. Use agents-observe to monitor, debug, and profile Claude Code agent teams in real time.

## When to Invoke

- **Monitoring parallel execution** — watching multiple backlog executor agents running simultaneously
- **Debugging agent failures** — an agent crashed, hung, or produced unexpected results
- **Performance profiling** — identifying which agents or tool calls are slow
- **Post-mortem analysis** — reviewing what happened in a completed session
- **Orchestrator oversight** — the orchestrator agent needs visibility into sub-agent activity

## Background

agents-observe captures tool calls, agent hierarchy, and session state via background hooks. Data is stored in SQLite and streamed to a WebSocket-powered dashboard with 3–5 ms latency. It is NOT always-on due to SQLite write overhead — enable it on demand for specific debugging sessions.

## Steps

### Starting Observation

1. **Launch the dashboard**:
   ```bash
   agents-observe serve
   ```
   Opens the dashboard at `http://localhost:3847`. Active Claude Code sessions appear automatically.

2. **Monitor live sessions** — the dashboard shows:
   - Agent hierarchy (parent orchestrator and child sub-agents)
   - Real-time tool call stream with timing
   - Session state (active, idle, completed, errored)
   - Per-agent resource usage

### Querying Historical Data

3. **Query by session** — inspect a specific session's full history:
   ```bash
   agents-observe query --session <session-id>
   ```

4. **Query by tool** — find all uses of a specific tool in a time window:
   ```bash
   agents-observe query --tool Write --last 1h
   ```

5. **Query by agent** — filter to a specific agent's activity:
   ```bash
   agents-observe query --agent backlog-executor --last 2h
   ```

### Analysis Workflows

6. **Debugging a failed agent**:
   - Find the session ID from workflow logs or the dashboard
   - Query the session to see the full tool call sequence
   - Identify the last successful tool call before the failure
   - Check for error patterns: timeout, rate limit, permission denied, context overflow

7. **Performance profiling**:
   - Compare tool call durations across agents
   - Identify bottleneck tools (slow reads, large writes)
   - Check for redundant tool calls (same file read multiple times)
   - Measure time-to-first-action for each sub-agent

8. **Parallel execution monitoring**:
   - Verify all expected sub-agents launched successfully
   - Watch for agents blocking on shared resources (file locks, API limits)
   - Detect agents that completed quickly vs. those that are stuck
   - Identify ordering issues in agent dependencies

## Output

Report findings appropriate to the workflow:

**For debugging:**
```
## Agent Failure Analysis: [session-id]
- Agent: [name]
- Failed at: [timestamp]
- Last successful tool: [tool name] at [timestamp]
- Error: [error message or pattern]
- Root cause: [analysis]
- Recommendation: [fix]
```

**For performance profiling:**
```
## Agent Performance Profile
| Agent | Duration | Tool Calls | Slowest Tool | Outcome |
|---|---|---|---|---|
| [name] | [Xs] | [N] | [tool, Xms] | [success/fail] |

### Bottlenecks
- [Specific finding, e.g., "Agent X read config.yaml 7 times — cache recommended"]

### Recommendations
- [Actionable optimization]
```

## Integration Notes

- Do NOT leave agents-observe running permanently — SQLite writes add overhead to every tool call
- Enable it for specific debugging sessions, then shut it down
- Dashboard port is 3847 — ensure no conflicts with other local services (see the check after this list)
- Pairs well with ccxray for combined tool-call and token-level analysis
- Session data persists in SQLite after the dashboard is closed — queryable later
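
A quick pre-flight for the port-conflict note above, using standard macOS tooling:

```bash
# Is anything already listening on the dashboard port?
lsof -nP -iTCP:3847 -sTCP:LISTEN || echo "port 3847 is free"
```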

package/plugins/maestro-skills/skills/ccxray-diagnostics.md
ADDED
@@ -0,0 +1,91 @@
---
name: ccxray-diagnostics
description: Run token/cost X-ray diagnostics on Claude API sessions — per-turn breakdown, context window heatmaps, system prompt diffs. Use for debugging expensive sessions or comparing sub-agent efficiency.
---

# Token/Cost X-Ray Diagnostics (ccxray)

You are a Maestro agent. Use ccxray to diagnose token usage, cost, and context window utilization across agent sessions.

## When to Invoke

- **Debugging expensive sessions** — a backlog cycle consumed unexpectedly high tokens
- **Sub-agent efficiency comparison** — comparing token cost across parallel agents doing similar work
- **Context window analysis** — investigating whether agents are hitting context limits or carrying stale context
- **System prompt auditing** — diffing system prompts across agents to find bloat
- **Post-incident analysis** — understanding what went wrong in a failed or degraded session

## Background

ccxray is a transparent HTTP proxy that intercepts all Claude API calls and provides a live dashboard with per-turn token/cost breakdowns. It is an on-demand diagnostic tool, not always-on infrastructure.

- Proxy + dashboard: `npx ccxray claude`
- Logs stored at: `~/.ccxray/logs/`
- Dashboard shows live data during the session

## Steps

1. **Launch a diagnostic session** — run the target agent through ccxray:
   ```bash
   npx ccxray claude
   ```
   This wraps the Claude Code session in the ccxray proxy. All API calls are intercepted and logged.

2. **Open the dashboard** — ccxray serves a local dashboard showing:
   - Per-turn input/output token counts
   - Cumulative cost tracking
   - Context window heatmap (what is filling the window)
   - System prompt content and diffs between turns

3. **Analyze token distribution** — identify where tokens are being spent:
   - Large system prompts eating context budget
   - Repeated tool outputs inflating input tokens
   - Verbose agent responses consuming output tokens
   - Context window near capacity causing cache misses

4. **Compare sub-agents** — if multiple agents ran in parallel, compare their ccxray logs:
   - Which agent consumed the most tokens per unit of work?
   - Are any agents carrying unnecessary context?
   - Are system prompts appropriately sized for each agent's role?

5. **Review historical logs** — check `~/.ccxray/logs/` for past session data:
   ```bash
   ls -lt ~/.ccxray/logs/
   ```

6. **Produce findings** — summarize token/cost insights and recommend optimizations.

## Output

Report the following:

```
## Token/Cost X-Ray Report

### Session Summary
- Total input tokens: [N]
- Total output tokens: [N]
- Estimated cost: $[X.XX]
- Turns: [N]
- Context window peak: [N]% utilization

### Top Token Consumers
1. [Turn/phase] — [N] tokens ([reason])
2. [Turn/phase] — [N] tokens ([reason])

### Optimization Recommendations
- [Specific recommendation, e.g., "Reduce system prompt by removing unused policy sections"]
- [Specific recommendation, e.g., "Cache tool outputs instead of re-reading files"]

### Sub-Agent Comparison (if applicable)
| Agent | Input Tokens | Output Tokens | Cost | Efficiency |
|---|---|---|---|---|
| [name] | [N] | [N] | $[X] | [notes] |
```

## Integration Notes

- This is an on-demand tool — do NOT run ccxray on every session, only when diagnosing issues
- ccxray adds minimal overhead (~5 ms per request) but writes logs to disk
- Use for periodic token budget audits per backlog cycle (e.g., weekly)
- Pair with claude-pace to correlate token usage with quota consumption

package/plugins/maestro-skills/skills/claude-pace.md
ADDED
@@ -0,0 +1,61 @@
---
name: claude-pace
description: Check Claude API rate limit quota — 5-hour/7-day usage, reset countdowns, pace delta. Use before heavy backlog cycles, when quota warnings appear, or to decide whether to defer non-urgent work.
---

# Rate Limit Monitoring (claude-pace)

You are a Maestro agent. Use the claude-pace status line plugin to monitor API quota and make informed scheduling decisions.

## When to Invoke

- **Before heavy backlog cycles** — check headroom before spawning parallel agents
- **When quota warnings appear** — "hit your limit" or rate-limit errors in logs
- **Capacity planning** — deciding whether to defer improvement-backlog or non-urgent items
- **Post-incident** — confirming quota has recovered after a rate-limit outage

## Background

claude-pace is a Claude Code status line plugin (Bash + jq, ~10 ms latency). It is auto-installed by `init-agent.sh` via `claude plugin marketplace add Astro-Han/claude-pace`. It displays real-time quota data in the Claude Code status bar.

The key signal is **pace delta**:
- **Green** — headroom available, safe to proceed with full parallelism
- **Yellow** — approaching limits, reduce parallel agent count
- **Red** — near or at limit, defer all non-urgent work immediately

## Steps

1. **Read the status line** — claude-pace displays 5-hour window usage, 7-day rolling usage, and time until the next reset directly in the Claude Code status bar.

2. **Evaluate pace delta** — determine current capacity:
   - **Green (>40% remaining)**: Proceed normally. Full parallel agent spawning is safe.
   - **Yellow (15–40% remaining)**: Reduce to 2 parallel agents max. Defer improvement-backlog items.
   - **Red (<15% remaining)**: Critical items only. No parallel spawning. Defer all non-urgent work.

3. **Check the reset countdown** — if quota is constrained, note when the 5-hour window resets. Schedule deferred work for after the reset.

4. **Adjust the execution plan** based on findings:
   - Prioritize critical/high-priority queue items only when constrained
   - Move improvement-backlog and low-priority items to the next cycle
   - Reduce `parallel_with` fan-out in workflow steps
   - Log the capacity decision in the session audit

5. **If a rate-limit outage is suspected**, check workflow logs for the pattern: a `"started"` entry without a matching `"completed"` entry, and inspect `.log` files for `"hit your limit"` strings (see the sketch after this list).
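
A hedged sketch of that check; the log locations are illustrative and should be adjusted to the agent repo's actual layout:

```bash
# Workflow runs that logged "started" but never "completed"
for f in logs/workflows/*.log; do
  grep -q '"started"' "$f" && ! grep -q '"completed"' "$f" && echo "incomplete: $f"
done

# Recent rate-limit hits anywhere under logs/
grep -rn "hit your limit" logs/ | tail -5
```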

## Output

Report the following to the calling agent or session:

```
## Quota Status
- 5-hour window: [X]% used — resets in [T]
- 7-day rolling: [X]% used
- Pace delta: [green/yellow/red]
- Recommendation: [proceed normally / reduce parallelism / critical-only mode]
```

## Integration Notes

- Complements Maestro's reactive rate-limit detection (checking logs post-facto) with proactive visibility
- Does NOT require launching a separate process — data is in the status bar
- When pace delta is red, the backlog executor should automatically shrink its batch size from 3–5 items to 1–2