onkol 0.4.0 → 0.5.0

package/README.md ADDED
@@ -0,0 +1,322 @@
1
+ # Onkol
2
+
3
+ Your AI on-call team. One command per VM, and you get an autonomous agent on Discord that handles bugs, features, analysis, and ops so you don't have to.
4
+
5
+ Onkol turns Claude Code into a decentralized on-call system. Each VM runs an orchestrator that listens on Discord. You describe a problem in plain English, it spins up a dedicated worker session to solve it, and reports back when it's done.
6
+
7
+ ## How it works
8
+
9
+ ```
10
+ You on Discord: "the auth endpoint is returning 403 after token refresh"
11
+ |
12
+ Orchestrator (Claude Code)
13
+ reads your message, understands intent,
14
+ prepares context, spawns a worker
15
+ |
16
+ Worker (new Claude Code session)
17
+ diagnoses the bug, fixes auth.py,
18
+ runs tests, commits to a branch
19
+ |
20
+ You on Discord: "Fixed. Clock skew between auth server and app server.
21
+ Added 5s tolerance. Tests pass. Branch: fix/auth-403"
22
+ ```
23
+
24
+ **What makes it different:**
25
+ - **Decentralized.** Each VM is self-contained. No central server. 10 VMs = 10 independent agents.
26
+ - **Intent-driven.** Say "fix this" and it fixes autonomously. Say "look into this" and it investigates without touching code. Your phrasing controls the behavior.
27
+ - **Gets smarter.** Every resolved task leaves behind a learning. Next time a similar issue comes up, the agent already knows what to look for.
28
+ - **Works behind firewalls.** All connections are outbound to Discord. No inbound ports, no SSH tunnels, no VPN required.
29
+
30
+ ## Real-world setup
31
+
32
+ The intended way to use Onkol is with a **dedicated Discord server** that becomes your ops control center.
33
+
34
+ I manage about 10 applications across prod and staging. I created one Discord server and set it up exclusively for Onkol. Each VM I onboard creates its own category with an orchestrator channel. My Discord sidebar looks like this:
35
+
36
+ ```
37
+ MY-INFRA (Discord server)
38
+
39
+ ├── API-SERVER-PROD ← VM running in GCP
40
+ │ ├── #orchestrator ← talk to this VM's brain here
41
+ │ ├── #fix-auth-403 ← active worker (auto-created)
42
+ │ └── #analyze-error-logs ← active worker (auto-created)
43
+
44
+ ├── WEB-APP-STAGING ← VM running in AWS
45
+ │ └── #orchestrator
46
+
47
+ ├── BACKEND-PROD ← VM behind corporate VPN
48
+ │ ├── #orchestrator
49
+ │ └── #add-export-endpoint ← active worker
50
+
51
+ ├── DATA-PIPELINE-STAGING ← Another GCP VM
52
+ │ └── #orchestrator
53
+
54
+ └── ... (as many VMs as you have)
55
+ ```
56
+
57
+ ### The workflow
58
+
59
+ From your phone, laptop, or anywhere with Discord:
60
+
61
+ 1. Open the server, go to `#orchestrator` under the VM you care about
62
+ 2. Type what you need: "there's a bug where users get 403 after token refresh"
63
+ 3. The orchestrator creates a new channel `#fix-auth-403` and spawns a worker
64
+ 4. The worker posts its progress and findings in `#fix-auth-403`
65
+ 5. You can jump into that channel to give more context or redirect
66
+ 6. When it's done, the orchestrator dissolves the worker, the channel disappears, learnings are saved
67
+
68
+ You can do this from a party, a flight, or bed at 2 AM. You're just texting on Discord. The agent does the SSH, the debugging, the code reading, the fixing.
69
+
70
+ ### Multiple VMs, one view
71
+
72
+ Every VM is a category. Every task is a channel. You see your entire infrastructure at a glance in the Discord sidebar. No dashboards to build, no web apps to deploy. Discord IS the dashboard.
73
+
74
+ The VMs don't need to know about each other. Each one connects outbound to Discord independently. If a VM sits behind a VPN that you can reach from only one specific laptop, that doesn't matter: as long as it has outbound HTTPS, it can connect to Discord and you can talk to it.
75
+
76
+ ### Setting up a new VM
77
+
78
+ ```bash
79
+ # SSH into the VM (one time only)
80
+ ssh user@my-new-vm
81
+
82
+ # Run setup (2 minutes)
83
+ npx onkol@latest setup
84
+
85
+ # Answer the questions, done.
86
+ # A new category appears in your Discord server.
87
+ # You never need to SSH into this VM again.
88
+ ```
89
+
90
+ ## Quick start
91
+
92
+ ### Prerequisites
93
+
94
+ You need these on the VM where you're setting up:
95
+
96
+ | Tool | Why | Install |
97
+ |------|-----|---------|
98
+ | **Node.js 18+** | Runs the setup CLI | [nodejs.org](https://nodejs.org) |
99
+ | **Bun** | Runs the Discord channel plugin | `curl -fsSL https://bun.sh/install \| bash` |
100
+ | **Claude Code** | The AI that does the work | [docs.anthropic.com](https://docs.anthropic.com/en/docs/claude-code/getting-started) |
101
+ | **tmux** | Keeps sessions alive | `apt install tmux` / `yum install tmux` |
102
+ | **jq** | JSON processing in scripts | `apt install jq` / `yum install jq` |
103
+
104
+ Claude Code must be logged in via `claude.ai` OAuth on the VM (not API key).
105
+
106
+ The setup wizard checks all dependencies before asking any questions. If something's missing, it tells you exactly what to install and exits without wasting your time.
107
+
108
+ ### Create a Discord bot
109
+
110
+ 1. Go to [discord.com/developers/applications](https://discord.com/developers/applications)
111
+ 2. New Application, name it, Create
112
+ 3. Bot, Reset Token, **copy it** (you only see it once)
113
+ 4. Bot, Privileged Gateway Intents, enable **Message Content Intent**, Save
114
+ 5. OAuth2, URL Generator, check `bot`, check permissions:
115
+ - View Channels, Send Messages, Read Message History, Attach Files, Manage Channels
116
+ 6. Copy the URL, open in browser, invite to your Discord server
117
+
118
+ The setup wizard validates your bot token and checks that Message Content Intent is enabled before proceeding. If something's wrong, it tells you exactly what to fix.
119
+
120
+ ### Run setup
121
+
122
+ ```bash
123
+ npx onkol@latest setup
124
+ ```
125
+
126
+ The wizard walks you through everything:
127
+
128
+ ```
129
+ Welcome to Onkol Setup
130
+
131
+ Checking dependencies...
132
+ ✓ claude
133
+ ✓ bun
134
+ ✓ tmux
135
+ ✓ jq
136
+ ✓ curl
137
+
138
+ All dependencies found.
139
+
140
+ ✔ Where should Onkol live? ~/onkol
141
+ ✔ What should this node be called? api-server-prod
142
+ ✔ Discord bot token: ****
143
+ ✔ Discord server (guild) ID: 1234567890
144
+ ✔ Your Discord user ID: 9876543210
145
+ ✔ Registry file? Write a prompt — tell Claude what to find
146
+ ✔ Describe: Find the API endpoints and database URLs from .env
147
+ ✔ Service summary? Auto-discover
148
+ ✔ CLAUDE.md? Yes — This is a Node.js API server deployed via docker...
149
+ ✔ Plugins? context7, superpowers, code-simplifier
150
+
151
+ ✓ Bot token is valid
152
+ ✓ Message Content intent is enabled
153
+ ✓ Discord category and #orchestrator channel created
154
+ ✓ 6 scripts installed
155
+ ✓ Plugin installed with 4 files + dependencies
156
+ ✓ Systemd service installed and enabled
157
+ ✓ Orchestrator started in tmux session "onkol-api-server-prod"
158
+
159
+ ✓ Onkol node "api-server-prod" is live!
160
+ ```
161
+
162
+ Go to your Discord server. You'll see a new category with an `#orchestrator` channel. Send it a message.
163
+
164
+ ## Usage
165
+
166
+ ### Talking to the orchestrator
167
+
168
+ The orchestrator lives in the `#orchestrator` channel of your node's category. It reads your intent from how you phrase things:
169
+
170
+ | You say | What happens |
171
+ |---------|-------------|
172
+ | "fix the 403 bug in auth" | Spawns a worker that diagnoses, fixes, tests, and commits |
173
+ | "look into why response times are high" | Spawns a worker that investigates and reports, no code changes |
174
+ | "add retry logic to the webhook handler" | Spawns a worker that implements, tests, and waits for your approval |
175
+ | "analyze transferred calls for the last 3 weeks" | Spawns a worker that reads logs/data and produces an analysis |
176
+ | "just ship it" | Fully autonomous, pushes and deploys (asks for confirmation first) |
177
+
178
+ ### How workers work
179
+
180
+ When the orchestrator spawns a worker:
181
+
182
+ 1. A new Discord channel appears in your category (e.g., `#fix-auth-bug`)
183
+ 2. A new Claude Code session starts in tmux on the VM
184
+ 3. The worker posts progress and results in its Discord channel
185
+ 4. You can talk to the worker directly in that channel
186
+ 5. When done, tell the orchestrator to dissolve it. The channel disappears, learnings are saved.
187
+
188
+ ### Managing workers
189
+
190
+ From the orchestrator channel:
191
+ - "dissolve fix-auth-bug" kills the worker, saves learnings, deletes channel
192
+ - "list workers" shows all active workers
193
+ - "check on fix-auth-bug" gets the worker's current status
194
+
195
+ ### Setup prompts
196
+
197
+ During setup, you can describe things in plain English instead of providing config files:
198
+
199
+ - **Registry**: "Find the API endpoints from .env and the S3 bucket from AWS CLI"
200
+ - **Services**: Auto-discovers running services, or you describe what to look for
201
+ - **CLAUDE.md**: "This is a Node.js API server, Express, deployed via docker..."
202
+
203
+ The orchestrator executes these prompts on first boot and generates the structured files.
204
+
205
+ ## Architecture
206
+
207
+ ```
208
+ Your Discord Server
209
+ ├── Category: api-server-prod ← VM 1
210
+ │ ├── #orchestrator ← persistent Claude Code session
211
+ │ ├── #fix-auth-bug ← worker (temporary)
212
+ │ └── #analyze-error-logs ← worker (temporary)
213
+ ├── Category: web-app-staging ← VM 2
214
+ │ └── #orchestrator
215
+ └── Category: backend-prod ← VM 3
216
+ └── #orchestrator
217
+ ```
218
+
219
+ Each VM runs independently:
220
+ - **Orchestrator.** Long-running Claude Code session in tmux. Receives Discord messages, spawns workers, manages lifecycle.
221
+ - **Workers.** Ephemeral Claude Code sessions. One per task. Each gets its own Discord channel, its own context, its own instructions.
222
+ - **discord-filtered plugin.** Custom MCP channel server that routes Discord messages by channel ID. All sessions share one bot but each only hears its own channel.
223
+
224
+ ### On-disk structure
225
+
226
+ ```
227
+ ~/onkol/
228
+ ├── config.json # Node config (bot token, server ID, etc.)
229
+ ├── registry.json # VM-specific secrets, endpoints, ports
230
+ ├── services.md # What runs on this VM
231
+ ├── CLAUDE.md # Orchestrator instructions
232
+ ├── knowledge/ # Learnings from dissolved workers
233
+ │ ├── index.json
234
+ │ └── 2026-03-22-fix-auth-clock-skew.md
235
+ ├── workers/
236
+ │ ├── tracking.json # Active workers
237
+ │ └── fix-auth-bug/ # Worker directory (while active)
238
+ ├── scripts/ # Lifecycle scripts
239
+ └── plugins/
240
+ └── discord-filtered/ # MCP channel plugin
241
+ ```
242
+
243
+ ### Knowledge base
244
+
245
+ Every dissolved worker leaves behind a learning:
246
+
247
+ ```markdown
248
+ ## What happened
249
+ Token validation rejected valid tokens for 2-3 seconds after refresh.
250
+
251
+ ## Root cause
252
+ No clock skew tolerance between auth server and app server.
253
+
254
+ ## Fix
255
+ Added 5-second CLOCK_SKEW_TOLERANCE in auth.py:47.
256
+
257
+ ## For next time
258
+ If 403 errors appear after token operations, check clock sync first.
259
+ ```
260
+
261
+ The orchestrator includes relevant past learnings when spawning new workers. The system gets better at diagnosing issues over time.
262
+
263
+ ## Resumable setup
264
+
265
+ If setup fails midway (missing dependency, network error, wrong bot token), your answers are saved automatically. Next time you run `npx onkol setup`, it offers to resume:
266
+
267
+ ```
268
+ ? Found a previous setup attempt (4 steps completed). What do you want to do?
269
+ ❯ Resume from where it left off (node: api-server-prod)
270
+ Start fresh
271
+ ```
272
+
273
+ No re-entering bot tokens or server IDs. It picks up right where it left off.
274
+
275
+ ## Commands
276
+
277
+ ```bash
278
+ npx onkol setup # Interactive setup wizard
279
+ npx onkol@latest setup # Force latest version
280
+ ```
281
+
282
+ On the VM after setup:
283
+
284
+ ```bash
285
+ # Attach to the orchestrator
286
+ tmux attach -t onkol-<node-name>
287
+
288
+ # Check service status
289
+ systemctl status onkol-<node-name>
290
+
291
+ # Restart orchestrator
292
+ sudo systemctl restart onkol-<node-name>
293
+
294
+ # View active workers
295
+ bash ~/onkol/scripts/list-workers.sh
296
+
297
+ # Manually dissolve a worker
298
+ bash ~/onkol/scripts/dissolve-worker.sh --name "worker-name"
299
+ ```
300
+
301
+ ## Requirements
302
+
303
+ - Claude Code with `claude.ai` OAuth login (Max plan recommended for concurrent sessions)
304
+ - Node.js 18+ and Bun on each VM
305
+ - tmux and jq on each VM
306
+ - A Discord server with a bot that has Manage Channels permission
307
+ - VMs need outbound HTTPS access (no inbound ports needed)
308
+
309
+ ## How it's built
310
+
311
+ | Component | Tech | Lines |
312
+ |-----------|------|-------|
313
+ | Setup wizard | Node.js, TypeScript, Inquirer | ~500 |
314
+ | Discord channel plugin | Bun, MCP SDK, discord.js | ~300 |
315
+ | Worker lifecycle scripts | Bash | ~400 |
316
+ | Orchestrator/worker templates | Handlebars | ~150 |
317
+
318
+ The core mechanism is [Claude Code Channels](https://code.claude.com/docs/en/channels), an MCP-based system that pushes Discord messages into Claude Code sessions. The `discord-filtered` plugin is a custom channel that routes by Discord channel ID, allowing multiple sessions to share one bot.
319
+
320
+ ## License
321
+
322
+ MIT
package/dist/cli/index.js CHANGED
@@ -212,6 +212,14 @@ program
212
212
  maxWorkers: 3,
213
213
  installDir: dir,
214
214
  plugins: answers.plugins,
215
+ ...(answers.watchdogProvider !== 'skip' ? {
216
+ watchdog: {
217
+ provider: answers.watchdogProvider,
218
+ model: answers.watchdogModel,
219
+ apiKey: answers.watchdogApiKey,
220
+ ...(answers.watchdogApiUrl ? { apiUrl: answers.watchdogApiUrl } : {}),
221
+ },
222
+ } : {}),
215
223
  };
216
224
  writeFileSync(resolve(dir, 'config.json'), JSON.stringify(config, null, 2), { mode: 0o600 });
217
225
  markStep(homeDir, checkpoint, 'config');
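The hunk above writes a `watchdog` key into `config.json` only when a provider was chosen, and adds `apiUrl` only when one was supplied. A minimal sketch of that conditional-spread pattern, assuming the answer field names shown in the diff; `buildWatchdogConfig` is a hypothetical helper name and the sample values are made up:

```javascript
// Hypothetical helper (not part of the package): returns either {} or
// { watchdog: {...} } so the result can be spread into a larger config.
function buildWatchdogConfig(answers) {
  if (answers.watchdogProvider === 'skip') {
    return {}; // spreading {} adds no keys
  }
  return {
    watchdog: {
      provider: answers.watchdogProvider,
      model: answers.watchdogModel,
      apiKey: answers.watchdogApiKey,
      // apiUrl is only present for truthy values
      ...(answers.watchdogApiUrl ? { apiUrl: answers.watchdogApiUrl } : {}),
    },
  };
}

// Illustrative usage with made-up answers
const exampleConfig = {
  maxWorkers: 3,
  ...buildWatchdogConfig({
    watchdogProvider: 'custom',
    watchdogModel: 'example-model',
    watchdogApiKey: 'example-key',
    watchdogApiUrl: 'https://api.example.com/v1/chat/completions',
  }),
};
console.log(Object.keys(exampleConfig));
```

With `watchdogProvider: 'skip'` the resulting object has no `watchdog` key at all, so a reader of `config.json` can tell "disabled" apart from "misconfigured".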
@@ -252,6 +260,7 @@ program
252
260
  DISCORD_BOT_TOKEN: answers.botToken,
253
261
  DISCORD_CHANNEL_ID: orchChannelId,
254
262
  DISCORD_ALLOWED_USERS: JSON.stringify(allowedUsers),
263
+ TMUX_TARGET: `onkol-${answers.nodeName}`,
255
264
  },
256
265
  },
257
266
  },
@@ -294,7 +303,7 @@ program
294
303
  console.log(chalk.gray(' Config files already written, skipping'));
295
304
  }
296
305
  // --- CRITICAL: Copy scripts ---
297
- const requiredScripts = ['spawn-worker.sh', 'dissolve-worker.sh', 'list-workers.sh', 'check-worker.sh', 'healthcheck.sh', 'start-orchestrator.sh'];
306
+ const requiredScripts = ['spawn-worker.sh', 'dissolve-worker.sh', 'list-workers.sh', 'check-worker.sh', 'healthcheck.sh', 'worker-watchdog.sh', 'start-orchestrator.sh'];
298
307
  const scriptsSource = resolve(__dirname, '../../scripts');
299
308
  if (skip('scripts')) {
300
309
  console.log(chalk.gray(' Scripts already installed, skipping'));
@@ -424,12 +433,16 @@ program
424
433
  const timerDir = resolve(homeDir, '.config/systemd/user');
425
434
  mkdirSync(timerDir, { recursive: true });
426
435
  const healthcheckPath = resolve(dir, 'scripts/healthcheck.sh');
436
+ const watchdogPath = resolve(dir, 'scripts/worker-watchdog.sh');
427
437
  writeFileSync(resolve(timerDir, 'onkol-healthcheck.service'), `[Unit]\nDescription=Onkol healthcheck\n[Service]\nType=oneshot\nExecStart=${healthcheckPath}\n`);
428
438
  writeFileSync(resolve(timerDir, 'onkol-healthcheck.timer'), `[Unit]\nDescription=Onkol healthcheck every 5min\n[Timer]\nOnBootSec=2min\nOnUnitActiveSec=5min\n[Install]\nWantedBy=timers.target\n`);
439
+ writeFileSync(resolve(timerDir, 'onkol-worker-watchdog.service'), `[Unit]\nDescription=Onkol worker watchdog\n[Service]\nType=oneshot\nExecStart=${watchdogPath}\n`);
440
+ writeFileSync(resolve(timerDir, 'onkol-worker-watchdog.timer'), `[Unit]\nDescription=Onkol worker watchdog every 3min\n[Timer]\nOnBootSec=3min\nOnUnitActiveSec=3min\n[Install]\nWantedBy=timers.target\n`);
429
441
  writeFileSync(resolve(timerDir, 'onkol-cleanup.service'), `[Unit]\nDescription=Onkol archive cleanup\n[Service]\nType=oneshot\nExecStart=/usr/bin/find ${resolve(dir, 'workers/.archive')} -maxdepth 1 -mtime +30 -exec rm -rf {} \\;\n`);
430
442
  writeFileSync(resolve(timerDir, 'onkol-cleanup.timer'), `[Unit]\nDescription=Onkol archive cleanup daily\n[Timer]\nOnCalendar=*-*-* 04:00:00\n[Install]\nWantedBy=timers.target\n`);
431
443
  execSync('systemctl --user daemon-reload', { stdio: 'pipe' });
432
444
  execSync('systemctl --user enable --now onkol-healthcheck.timer', { stdio: 'pipe' });
445
+ execSync('systemctl --user enable --now onkol-worker-watchdog.timer', { stdio: 'pipe' });
433
446
  execSync('systemctl --user enable --now onkol-cleanup.timer', { stdio: 'pipe' });
434
447
  }
435
448
  console.log(chalk.green(`✓ Systemd user timers installed (healthcheck every 5min, cleanup daily)`));
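Each periodic job in the hunk above is a pair of systemd user units: a oneshot service and a same-named timer that triggers it. The following is a sketch of how such a pair could be rendered from a few parameters. `renderTimerPair`, its argument names, and the example script path are assumptions for illustration, not code from the package:

```javascript
// Hypothetical helper: builds the text of a oneshot .service unit and the
// .timer unit that fires it. A timer named foo.timer activates foo.service
// by default, so the two files only need matching base names.
function renderTimerPair({ description, execStart, onBootSec, onUnitActiveSec }) {
  const service =
    `[Unit]\nDescription=${description}\n` +
    `[Service]\nType=oneshot\nExecStart=${execStart}\n`;
  const timer =
    `[Unit]\nDescription=${description} timer\n` +
    `[Timer]\nOnBootSec=${onBootSec}\nOnUnitActiveSec=${onUnitActiveSec}\n` +
    `[Install]\nWantedBy=timers.target\n`;
  return { service, timer };
}

// Illustrative usage; the path is an assumed install location
const watchdogUnits = renderTimerPair({
  description: 'Onkol worker watchdog',
  execStart: '/home/example/onkol/scripts/worker-watchdog.sh',
  onBootSec: '3min',
  onUnitActiveSec: '3min',
});
console.log(watchdogUnits.timer);
```

`OnBootSec` delays the first run after boot, and `OnUnitActiveSec` schedules each later run relative to the last activation of the service.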
@@ -519,4 +532,148 @@ program
519
532
  console.log(chalk.gray(`\n To attach to the session: tmux attach -t onkol-${answers.nodeName}`));
520
533
  console.log(chalk.gray(` To check status: systemctl status onkol-${answers.nodeName}`));
521
534
  });
535
+ program
536
+ .command('update')
537
+ .description('Update plugin + scripts and restart workers with conversation history preserved')
538
+ .option('--skip-update', 'Skip pulling latest npm package, just restart workers')
539
+ .option('--dir <path>', 'Onkol install directory', '')
540
+ .action(async (opts) => {
541
+ // Find install directory
542
+ let dir = opts.dir;
543
+ if (!dir) {
544
+ // Try common locations
545
+ const homeDir = process.env.HOME || '';
546
+ const candidates = [
547
+ resolve(homeDir, 'onkol'),
548
+ resolve(homeDir, '.onkol'),
549
+ '/opt/onkol',
550
+ ];
551
+ for (const c of candidates) {
552
+ if (existsSync(resolve(c, 'config.json'))) {
553
+ dir = c;
554
+ break;
555
+ }
556
+ }
557
+ }
558
+ if (!dir || !existsSync(resolve(dir, 'config.json'))) {
559
+ console.log(chalk.red('Could not find Onkol install. Use --dir <path> to specify.'));
560
+ process.exit(1);
561
+ }
562
+ const config = JSON.parse(readFileSync(resolve(dir, 'config.json'), 'utf-8'));
563
+ const nodeName = config.nodeName;
564
+ console.log(chalk.bold('=== Onkol Update & Restart ==='));
565
+ console.log(chalk.gray(`Node: ${nodeName}`));
566
+ console.log(chalk.gray(`Install dir: ${dir}`));
567
+ console.log('');
568
+ // Step 1: Update files
569
+ if (!opts.skipUpdate) {
570
+ console.log(chalk.cyan('[1/3] Updating files from npm package...'));
571
+ try {
572
+ // Find where this CLI is running from — that's the latest package
573
+ const pkgRoot = resolve(__dirname, '..');
574
+ const pluginSrc = existsSync(resolve(pkgRoot, 'src/plugin'))
575
+ ? resolve(pkgRoot, 'src/plugin')
576
+ : resolve(pkgRoot, 'dist/plugin');
577
+ const scriptsSrc = resolve(pkgRoot, 'scripts');
578
+ // Copy plugin files
579
+ if (existsSync(pluginSrc)) {
580
+ const pluginDest = resolve(dir, 'plugins/discord-filtered');
581
+ const { readdirSync } = await import('fs');
582
+ for (const f of readdirSync(pluginSrc)) {
583
+ if (f.endsWith('.ts') || f.endsWith('.js')) {
584
+ copyFileSync(resolve(pluginSrc, f), resolve(pluginDest, f));
585
+ }
586
+ }
587
+ console.log(chalk.green(' ✓ Plugin files updated'));
588
+ }
589
+ // Copy scripts
590
+ if (existsSync(scriptsSrc)) {
591
+ const { readdirSync, chmodSync } = await import('fs');
592
+ for (const f of readdirSync(scriptsSrc)) {
593
+ if (f.endsWith('.sh')) {
594
+ copyFileSync(resolve(scriptsSrc, f), resolve(dir, 'scripts', f));
595
+ chmodSync(resolve(dir, 'scripts', f), 0o755);
596
+ }
597
+ }
598
+ console.log(chalk.green(' ✓ Scripts updated'));
599
+ }
600
+ }
601
+ catch (err) {
602
+ console.log(chalk.yellow(` ⚠ Update failed: ${err instanceof Error ? err.message : err}`));
603
+ console.log(chalk.yellow(' Continuing with restart...'));
604
+ }
605
+ }
606
+ else {
607
+ console.log(chalk.gray('[1/3] Skipping update (--skip-update)'));
608
+ }
609
+ console.log('');
610
+ // Step 2: Find active workers and their session IDs
611
+ console.log(chalk.cyan('[2/3] Dissolving active workers...'));
612
+ const trackingPath = resolve(dir, 'workers/tracking.json');
613
+ if (!existsSync(trackingPath)) {
614
+ console.log(chalk.gray(' No active workers.'));
615
+ console.log(chalk.green.bold('\n✓ Update complete. No workers to restart.'));
616
+ return;
617
+ }
618
+ const tracking = JSON.parse(readFileSync(trackingPath, 'utf-8'));
619
+ const active = tracking.filter((w) => w.status === 'active');
620
+ if (active.length === 0) {
621
+ console.log(chalk.gray(' No active workers.'));
622
+ console.log(chalk.green.bold('\n✓ Update complete. No workers to restart.'));
623
+ return;
624
+ }
625
+ const workers = [];
626
+ for (const w of active) {
627
+ // Find session ID: look in ~/.claude/projects/<encoded-path>/
628
+ const encoded = '-' + w.workDir.replace(/^\//, '').replace(/\//g, '-');
629
+ const sessionDir = resolve(process.env.HOME || '', '.claude/projects', encoded);
630
+ let sessionId = '';
631
+ try {
632
+ const { readdirSync, statSync } = await import('fs');
633
+ const jsonls = readdirSync(sessionDir)
634
+ .filter((f) => f.endsWith('.jsonl'))
635
+ .map((f) => ({ name: f, mtime: statSync(resolve(sessionDir, f)).mtimeMs }))
636
+ .sort((a, b) => a.mtime - b.mtime);
637
+ if (jsonls.length > 0) {
638
+ sessionId = jsonls[jsonls.length - 1].name.replace('.jsonl', '');
639
+ }
640
+ }
641
+ catch { /* session dir may not exist */ }
642
+ workers.push({ name: w.name, workDir: w.workDir, intent: w.intent, sessionId });
643
+ console.log(chalk.gray(` ${w.name} → session: ${sessionId || 'none'}`));
644
+ }
645
+ console.log('');
646
+ // Dissolve
647
+ for (const w of workers) {
648
+ try {
649
+ execSync(`bash "${resolve(dir, 'scripts/dissolve-worker.sh')}" --name "${w.name}"`, { stdio: 'pipe' });
650
+ console.log(chalk.gray(` ✓ ${w.name} dissolved`));
651
+ }
652
+ catch (err) {
653
+ console.log(chalk.yellow(` ⚠ Failed to dissolve ${w.name}: ${err instanceof Error ? err.message : err}`));
654
+ }
655
+ }
656
+ console.log('');
657
+ // Step 3: Respawn with --resume
658
+ console.log(chalk.cyan('[3/3] Respawning workers with --resume...'));
659
+ for (const w of workers) {
660
+ const resumeArg = w.sessionId ? `--resume ${w.sessionId}` : '';
661
+ const cmd = `bash "${resolve(dir, 'scripts/spawn-worker.sh')}" \
662
+ --name "${w.name}" \
663
+ --dir "${w.workDir}" \
664
+ --task "Continue the previous work. Check your conversation history for context." \
665
+ --intent "${w.intent}" \
666
+ ${resumeArg}`;
667
+ try {
668
+ execSync(cmd, { stdio: 'pipe' });
669
+ console.log(chalk.green(` ✓ ${w.name} respawned${w.sessionId ? ' (resumed)' : ''}`));
670
+ }
671
+ catch (err) {
672
+ console.log(chalk.red(` ✗ Failed to spawn ${w.name}: ${err instanceof Error ? err.message : err}`));
673
+ }
674
+ // Small delay to avoid Discord rate limits
675
+ await new Promise(r => setTimeout(r, 2000));
676
+ }
677
+ console.log(chalk.green.bold(`\n✓ Update complete. ${workers.length} worker(s) restarted.`));
678
+ });
522
679
  program.parse();
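The `update` command above finds each worker's most recent Claude Code session by encoding the worker directory into a folder name and picking the newest `.jsonl` file in it. The two pure steps can be sketched in isolation as below. The function names are hypothetical, the file entries are made-up sample data, and the `~/.claude/projects/<encoded-path>/` layout is taken from the diff's own comment rather than verified here:

```javascript
// Mirrors the path encoding in the diff: drop the leading slash,
// turn remaining slashes into dashes, and prefix with a dash.
function encodeProjectPath(workDir) {
  return '-' + workDir.replace(/^\//, '').replace(/\//g, '-');
}

// Given directory entries as { name, mtime }, return the session ID of the
// most recently modified .jsonl file, or '' when there is none.
function pickLatestSessionId(entries) {
  const jsonls = entries
    .filter((e) => e.name.endsWith('.jsonl'))
    .sort((a, b) => a.mtime - b.mtime);
  if (jsonls.length === 0) return '';
  return jsonls[jsonls.length - 1].name.replace('.jsonl', '');
}

// Illustrative usage with made-up data
console.log(encodeProjectPath('/home/dev/api'));
console.log(pickLatestSessionId([
  { name: 'aaa.jsonl', mtime: 100 },
  { name: 'bbb.jsonl', mtime: 300 },
  { name: 'notes.txt', mtime: 900 },
]));
```

An empty session ID means the worker is respawned without `--resume`, which matches the `'none'` fallback printed by the command.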
@@ -13,5 +13,9 @@ export interface SetupAnswers {
13
13
  claudeMdMode: 'prompt' | 'skip';
14
14
  claudeMdPrompt: string | null;
15
15
  plugins: string[];
16
+ watchdogProvider: 'openrouter' | 'gemini' | 'custom' | 'skip';
17
+ watchdogModel: string | null;
18
+ watchdogApiKey: string | null;
19
+ watchdogApiUrl: string | null;
16
20
  }
17
21
  export declare function runSetupPrompts(homeDir: string): Promise<SetupAnswers>;
@@ -165,8 +165,70 @@ export async function runSetupPrompts(homeDir) {
165
165
  { name: 'frontend-design', value: 'frontend-design', checked: false },
166
166
  ],
167
167
  },
168
+ {
169
+ type: 'list',
170
+ name: 'watchdogProvider',
171
+ message: 'Worker watchdog LLM (monitors workers, nudges if stuck/silent):',
172
+ choices: [
173
+ { name: 'OpenRouter (recommended — use any model via openrouter.ai)', value: 'openrouter' },
174
+ { name: 'Google Gemini (direct API)', value: 'gemini' },
175
+ { name: 'Custom OpenAI-compatible endpoint', value: 'custom' },
176
+ { name: 'Skip (disable LLM watchdog)', value: 'skip' },
177
+ ],
178
+ },
179
+ {
180
+ type: 'list',
181
+ name: 'watchdogModel',
182
+ message: 'Watchdog model:',
183
+ choices: (a) => {
184
+ const base = [
185
+ { name: 'google/gemini-2.5-flash (fast, cheap)', value: 'google/gemini-2.5-flash' },
186
+ { name: 'google/gemini-2.0-flash-001 (fast, cheap)', value: 'google/gemini-2.0-flash-001' },
187
+ { name: 'anthropic/claude-haiku (fast)', value: 'anthropic/claude-3-5-haiku-20241022' },
188
+ { name: 'Custom — enter model ID', value: '__custom__' },
189
+ ];
190
+ if (a.watchdogProvider === 'gemini') {
191
+ return [
192
+ { name: 'gemini-2.5-flash-preview-05-20 (recommended)', value: 'gemini-2.5-flash-preview-05-20' },
193
+ { name: 'gemini-2.0-flash', value: 'gemini-2.0-flash' },
194
+ { name: 'Custom — enter model ID', value: '__custom__' },
195
+ ];
196
+ }
197
+ return base;
198
+ },
199
+ when: (a) => a.watchdogProvider !== 'skip',
200
+ },
201
+ {
202
+ type: 'input',
203
+ name: 'watchdogModelCustom',
204
+ message: 'Enter model ID:',
205
+ when: (a) => a.watchdogProvider !== 'skip' && a.watchdogModel === '__custom__',
206
+ },
207
+ {
208
+ type: 'password',
209
+ name: 'watchdogApiKey',
210
+ message: (a) => {
211
+ if (a.watchdogProvider === 'openrouter')
212
+ return 'OpenRouter API key (sk-or-...):';
213
+ if (a.watchdogProvider === 'gemini')
214
+ return 'Google Gemini API key:';
215
+ return 'API key:';
216
+ },
217
+ mask: '*',
218
+ when: (a) => a.watchdogProvider !== 'skip',
219
+ },
220
+ {
221
+ type: 'input',
222
+ name: 'watchdogApiUrl',
223
+ message: 'API base URL (OpenAI-compatible, e.g. https://api.example.com/v1/chat/completions):',
224
+ when: (a) => a.watchdogProvider === 'custom',
225
+ },
168
226
  ]);
169
227
  const answers = { ...preDiscordAnswers, ...discordAndRestAnswers };
228
+ // Resolve custom model selection
229
+ const watchdogModel = answers.watchdogModel === '__custom__'
230
+ ? (answers.watchdogModelCustom || null)
231
+ : (answers.watchdogModel || null);
170
232
  return {
171
233
  ...answers,
172
234
  registryPath: answers.registryPath || null,
@@ -174,5 +236,8 @@ export async function runSetupPrompts(homeDir) {
174
236
  serviceSummaryPath: answers.serviceSummaryPath || null,
175
237
  servicesPrompt: answers.servicesPrompt || null,
176
238
  claudeMdPrompt: answers.claudeMdPrompt || null,
239
+ watchdogModel,
240
+ watchdogApiKey: answers.watchdogApiKey || null,
241
+ watchdogApiUrl: answers.watchdogApiUrl || null,
177
242
  };
178
243
  }
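The model prompt above uses a `__custom__` sentinel: choosing it triggers a follow-up free-text question, and the two answers are merged afterwards. A small sketch of that resolution step, with `resolveWatchdogModel` as a hypothetical function name and made-up sample answers:

```javascript
// Hypothetical helper mirroring the resolution logic in the hunk above.
// Returns the typed-in ID when the sentinel was chosen, the listed ID
// otherwise, and null when nothing usable was provided.
function resolveWatchdogModel(answers) {
  if (answers.watchdogModel === '__custom__') {
    return answers.watchdogModelCustom || null;
  }
  return answers.watchdogModel || null;
}

// Illustrative usage
console.log(resolveWatchdogModel({ watchdogModel: 'google/gemini-2.5-flash' }));
console.log(resolveWatchdogModel({ watchdogModel: '__custom__', watchdogModelCustom: 'example/model-id' }));
console.log(resolveWatchdogModel({})); // provider skipped, so the model question was never asked
```

Because the model questions are gated by `when`, a skipped watchdog leaves `watchdogModel` undefined, and the `|| null` fallback normalizes that to `null`.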
@@ -35,6 +35,7 @@ WantedBy=multi-user.target
35
35
  }
36
36
  export function generateCrontab(onkolDir) {
37
37
  return `*/5 * * * * ${onkolDir}/scripts/healthcheck.sh
38
+ */3 * * * * ${onkolDir}/scripts/worker-watchdog.sh
38
39
  0 4 * * * find ${onkolDir}/workers/.archive -maxdepth 1 -mtime +30 -exec rm -rf {} \\;
39
40
  `;
40
41
  }
@@ -5,7 +5,7 @@ export interface DiscordClientConfig {
5
5
  allowedUsers: string[];
6
6
  }
7
7
  export declare function shouldForwardMessage(messageChannelId: string, authorId: string, isBot: boolean, targetChannelId: string, allowedUsers: string[]): boolean;
8
- export declare function createDiscordClient(config: DiscordClientConfig, onMessage: (message: Message) => void): {
8
+ export declare function createDiscordClient(config: DiscordClientConfig, onMessage: (content: string, message: Message) => void): {
9
9
  login: () => Promise<string>;
10
10
  client: Client<boolean>;
11
11
  sendMessage(channelId: string, text: string): Promise<void>;
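The declaration above shows the signature of `shouldForwardMessage` but not its body. As a rough illustration of how a per-channel filter with that argument list could work, here is a sketch under stated assumptions: messages from bots, from other channels, and from users outside the allow-list are dropped. The function name `shouldForward` and the treatment of an empty allow-list (reject everyone) are assumptions, not the package's confirmed behavior:

```javascript
// Illustrative filter only; the real implementation is not shown in this diff.
function shouldForward(messageChannelId, authorId, isBot, targetChannelId, allowedUsers) {
  if (isBot) return false;                              // ignore bot messages
  if (messageChannelId !== targetChannelId) return false; // not this session's channel
  if (!allowedUsers.includes(authorId)) return false;   // assumption: strict allow-list
  return true;
}

// Illustrative usage with made-up IDs
console.log(shouldForward('chan-1', 'user-1', false, 'chan-1', ['user-1']));
console.log(shouldForward('chan-2', 'user-1', false, 'chan-1', ['user-1']));
```

Filtering by channel ID is what would let several Claude Code sessions share one bot token while each session only receives messages from its own channel.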
@@ -8,6 +8,25 @@ export function shouldForwardMessage(messageChannelId, authorId, isBot, targetCh
8
8
  return false;
9
9
  return true;
10
10
  }
11
+ // When a message is too long, Discord auto-converts it to a .txt file attachment
12
+ // with empty message content. This fetches the text from those attachments.
13
+ async function resolveTextAttachments(message) {
14
+ let content = message.content;
15
+ const textAttachments = message.attachments.filter((a) => a.contentType?.startsWith('text/') || a.name?.endsWith('.txt'));
16
+ for (const attachment of textAttachments.values()) {
17
+ try {
18
+ const res = await fetch(attachment.url);
19
+ if (res.ok) {
20
+ const text = await res.text();
21
+ content = content ? `${content}\n\n${text}` : text;
22
+ }
23
+ }
24
+ catch (err) {
25
+ console.error(`[discord-filtered] Failed to fetch attachment ${attachment.name}: ${err}`);
26
+ }
27
+ }
28
+ return content;
29
+ }
11
30
  export function createDiscordClient(config, onMessage) {
12
31
  const client = new Client({
13
32
  intents: [
@@ -16,9 +35,12 @@ export function createDiscordClient(config, onMessage) {
16
35
  GatewayIntentBits.MessageContent,
17
36
  ],
18
37
  });
19
- client.on('messageCreate', (message) => {
38
+ client.on('messageCreate', async (message) => {
20
39
  if (shouldForwardMessage(message.channel.id, message.author.id, message.author.bot, config.channelId, config.allowedUsers)) {
21
- onMessage(message);
40
+ const content = await resolveTextAttachments(message);
41
+ if (content) {
42
+ onMessage(content, message);
43
+ }
22
44
  }
23
45
  });
24
46
  client.on('ready', () => {
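The hunk above recovers long messages that arrive as text-file attachments and appends their text to the message content. The selection and merge rules can be sketched without any network access as follows. `isTextAttachment` and `mergeAttachmentText` are hypothetical names, and the sample attachment objects are made up to resemble the `contentType` and `name` fields used in the diff:

```javascript
// Same predicate as in the diff: a text/* content type or a .txt file name.
function isTextAttachment(attachment) {
  return Boolean(
    attachment.contentType?.startsWith('text/') || attachment.name?.endsWith('.txt')
  );
}

// Append each fetched text to the content, separated by a blank line.
// When the content is empty, the attachment text becomes the content.
function mergeAttachmentText(content, texts) {
  let merged = content;
  for (const text of texts) {
    merged = merged ? `${merged}\n\n${text}` : text;
  }
  return merged;
}

// Illustrative usage
console.log(isTextAttachment({ contentType: 'text/plain; charset=utf-8', name: 'message.txt' }));
console.log(JSON.stringify(mergeAttachmentText('', ['long pasted log'])));
```

Since the handler only calls `onMessage` when the merged content is non-empty, a message whose only payload is a non-text attachment is not forwarded.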
@@ -1,11 +1,13 @@
1
1
  #!/usr/bin/env bun
2
2
  import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
3
+ import { execSync } from 'child_process';
3
4
  import { createMcpServer } from './mcp-server.js';
4
5
  import { createDiscordClient } from './discord-client.js';
5
6
  import { MessageBatcher } from './message-batcher.js';
6
7
  const BOT_TOKEN = process.env.DISCORD_BOT_TOKEN;
7
8
  const CHANNEL_ID = process.env.DISCORD_CHANNEL_ID;
8
9
  const ALLOWED_USERS = JSON.parse(process.env.DISCORD_ALLOWED_USERS || '[]');
10
+ const TMUX_TARGET = process.env.TMUX_TARGET || '';
9
11
  if (!BOT_TOKEN) {
10
12
  console.error('[discord-filtered] DISCORD_BOT_TOKEN is required');
11
13
  process.exit(1);
@@ -14,11 +16,60 @@ if (!CHANNEL_ID) {
14
16
  console.error('[discord-filtered] DISCORD_CHANNEL_ID is required');
15
17
  process.exit(1);
16
18
  }
17
- const discord = createDiscordClient({ botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS }, async (message) => {
19
+ function sendInterrupt() {
20
+ if (!TMUX_TARGET) {
21
+ console.error('[discord-filtered] !stop received but TMUX_TARGET not set — cannot interrupt');
22
+ return false;
23
+ }
24
+ try {
25
+ // Escape is Claude Code's interrupt key
26
+ execSync(`tmux send-keys -t ${JSON.stringify(TMUX_TARGET)} Escape`, { stdio: 'pipe' });
27
+ console.error(`[discord-filtered] Sent interrupt (Escape) to ${TMUX_TARGET}`);
28
+ return true;
29
+ }
30
+ catch (err) {
31
+ console.error(`[discord-filtered] Failed to send interrupt: ${err}`);
32
+ return false;
33
+ }
34
+ }
35
+ const discord = createDiscordClient({ botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS }, async (content, message) => {
36
+ // Instant acknowledgment — user knows the message reached the session
37
+ try {
38
+ await message.react('👀');
39
+ }
40
+ catch { /* ignore */ }
41
+ const isInterrupt = /^!stop\b/i.test(content);
42
+ if (isInterrupt) {
43
+ sendInterrupt();
44
+ // Strip the !stop prefix and forward the rest as a normal message
45
+ const rest = content.replace(/^!stop\s*/i, '').trim();
46
+ // React to confirm the interrupt was received
47
+ try {
48
+ await message.react('🛑');
49
+ }
50
+ catch { /* ignore */ }
51
+ // Small delay to let Claude Code process the Escape before the new message arrives
52
+ await new Promise(r => setTimeout(r, 1500));
53
+ // Forward the message (with or without remaining text)
54
+ await mcpServer.notification({
55
+ method: 'notifications/claude/channel',
56
+ params: {
57
+ content: rest || '[interrupted by user]',
58
+ meta: {
59
+ channel_id: message.channel.id,
60
+ sender: message.author.username,
61
+ sender_id: message.author.id,
62
+ message_id: message.id,
63
+ interrupt: true,
64
+ },
65
+ },
66
+ });
67
+ return;
68
+ }
18
69
  await mcpServer.notification({
19
70
  method: 'notifications/claude/channel',
20
71
  params: {
21
- content: message.content,
72
+ content: content,
22
73
  meta: {
23
74
  channel_id: message.channel.id,
24
75
  sender: message.author.username,
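The `!stop` handling added in this hunk can be read as a small parsing step followed by side effects (the tmux Escape, the reactions, the 1500 ms delay). A minimal sketch of the parsing step alone follows; it reuses the two regular expressions from the diff, while the function name `parseStopCommand` and its return shape are hypothetical and not part of the package.

```javascript
// Hypothetical helper: isolates the text handling of the !stop prefix.
// The regexes are copied from the diff; everything else is illustrative.
function parseStopCommand(content) {
  // \b requires a word boundary after "stop", so "!stopping" is not an interrupt.
  const isInterrupt = /^!stop\b/i.test(content);
  if (!isInterrupt) {
    return { isInterrupt: false, forwarded: content };
  }
  // Strip the prefix and forward whatever follows as a normal message.
  const rest = content.replace(/^!stop\s*/i, '').trim();
  // A bare "!stop" is forwarded as a placeholder, as in the diff.
  return { isInterrupt: true, forwarded: rest || '[interrupted by user]' };
}

console.log(parseStopCommand('!stop'));
console.log(parseStopCommand('!STOP now fix the tests instead'));
console.log(parseStopCommand('!stopping is not a command'));
```

The prefix only counts at the start of the message, so a `!stop` mentioned mid-sentence is forwarded unchanged.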
@@ -1,5 +1,4 @@
1
1
  const DISCORD_MAX_LENGTH = 2000;
2
- const TRUNCATION_SUFFIX = '\n... (truncated)';
3
2
  export class MessageBatcher {
4
3
  buffer = [];
5
4
  timer = null;
@@ -18,12 +17,55 @@ export class MessageBatcher {
18
17
  async flush() {
19
18
  if (this.buffer.length === 0)
20
19
  return;
21
- let combined = this.buffer.join('\n');
20
+ const combined = this.buffer.join('\n');
22
21
  this.buffer = [];
23
22
  this.timer = null;
24
- if (combined.length > DISCORD_MAX_LENGTH) {
25
- combined = combined.slice(0, DISCORD_MAX_LENGTH - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX;
23
+ // Split into multiple messages instead of truncating
24
+ const chunks = splitMessage(combined);
25
+ for (const chunk of chunks) {
26
+ await this.sendFn(chunk);
26
27
  }
27
- await this.sendFn(combined);
28
28
  }
29
29
  }
30
+ // Split long text into Discord-safe chunks, preferring line breaks as split points
31
+ function splitMessage(text) {
32
+ if (text.length <= DISCORD_MAX_LENGTH)
33
+ return [text];
34
+ const chunks = [];
35
+ let remaining = text;
36
+ while (remaining.length > 0) {
37
+ if (remaining.length <= DISCORD_MAX_LENGTH) {
38
+ chunks.push(remaining);
39
+ break;
40
+ }
41
+ // Find a good split point: prefer double newline, then single newline, then space
42
+ let splitAt = -1;
43
+ const searchWindow = remaining.slice(0, DISCORD_MAX_LENGTH);
44
+ // Try splitting at last paragraph break
45
+ const lastParagraph = searchWindow.lastIndexOf('\n\n');
46
+ if (lastParagraph > DISCORD_MAX_LENGTH * 0.3) {
47
+ splitAt = lastParagraph;
48
+ }
49
+ // Fall back to last line break
50
+ if (splitAt === -1) {
51
+ const lastNewline = searchWindow.lastIndexOf('\n');
52
+ if (lastNewline > DISCORD_MAX_LENGTH * 0.3) {
53
+ splitAt = lastNewline;
54
+ }
55
+ }
56
+ // Fall back to last space
57
+ if (splitAt === -1) {
58
+ const lastSpace = searchWindow.lastIndexOf(' ');
59
+ if (lastSpace > DISCORD_MAX_LENGTH * 0.3) {
60
+ splitAt = lastSpace;
61
+ }
62
+ }
63
+ // Hard split as last resort
64
+ if (splitAt === -1) {
65
+ splitAt = DISCORD_MAX_LENGTH;
66
+ }
67
+ chunks.push(remaining.slice(0, splitAt));
68
+ remaining = remaining.slice(splitAt).replace(/^\n+/, '');
69
+ }
70
+ return chunks;
71
+ }
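To see how the split points are chosen, the splitter above can be condensed and run against short strings. This is a sketch, not the packaged function: `maxLength` and `minFill` are hypothetical parameters standing in for `DISCORD_MAX_LENGTH` and the `0.3` threshold, and unlike the original it returns an empty array for empty input.

```javascript
// Condensed variant of the splitter in this diff. `maxLength` and `minFill`
// are hypothetical parameters so the behaviour is visible on short strings.
function splitMessageSketch(text, maxLength = 2000, minFill = 0.3) {
  const chunks = [];
  let remaining = text;
  while (remaining.length > maxLength) {
    const searchWindow = remaining.slice(0, maxLength);
    // Prefer a paragraph break, then a line break, then a space.
    let splitAt = -1;
    for (const separator of ['\n\n', '\n', ' ']) {
      const index = searchWindow.lastIndexOf(separator);
      if (index > maxLength * minFill) {
        splitAt = index;
        break;
      }
    }
    // Hard split when no separator sits far enough into the window.
    if (splitAt === -1) splitAt = maxLength;
    chunks.push(remaining.slice(0, splitAt));
    // As in the diff, only leading newlines are stripped from the remainder.
    remaining = remaining.slice(splitAt).replace(/^\n+/, '');
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}

console.log(splitMessageSketch('alpha beta\n\ngamma delta epsilon', 20));
console.log(splitMessageSketch('aaaaaaaa bbbbbbbb cccccccc', 20));
```

The second call shows a side effect of the space fallback: because only newlines are stripped, the following chunk starts with the space that was used as the split point.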
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "onkol",
3
- "version": "0.4.0",
3
+ "version": "0.5.0",
4
4
  "description": "Decentralized on-call agent system powered by Claude Code",
5
5
  "type": "module",
6
6
  "bin": {
@@ -9,6 +9,7 @@ while [[ $# -gt 0 ]]; do
9
9
  --task) TASK_DESC="$2"; shift 2 ;;
10
10
  --intent) INTENT="$2"; shift 2 ;;
11
11
  --context) CONTEXT="$2"; shift 2 ;;
12
+ --resume) RESUME_SESSION="$2"; shift 2 ;;
12
13
  *) echo "Unknown arg: $1"; exit 1 ;;
13
14
  esac
14
15
  done
@@ -19,6 +20,7 @@ done
19
20
  : "${TASK_DESC:?--task is required}"
20
21
  : "${INTENT:=fix}"
21
22
  : "${CONTEXT:=No additional context.}"
23
+ : "${RESUME_SESSION:=}"
22
24
 
23
25
  # Load config
24
26
  ONKOL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
@@ -82,8 +84,7 @@ cat > "$WORKER_DIR/.mcp.json" << MCPEOF
82
84
  "env": {
83
85
  "DISCORD_BOT_TOKEN": "$BOT_TOKEN",
84
86
  "DISCORD_CHANNEL_ID": "$CHANNEL_ID",
85
- "DISCORD_ALLOWED_USERS": "$ALLOWED_USERS_ESCAPED",
86
- "TMUX_TARGET": "${TMUX_SESSION}:${WORKER_NAME}"
87
+ "DISCORD_ALLOWED_USERS": "$ALLOWED_USERS_ESCAPED"
87
88
  }
88
89
  }
89
90
  }
@@ -178,21 +179,23 @@ cat >> "$WORKER_DIR/CLAUDE.md" << STARTEOF
178
179
  Immediately when you start:
179
180
  1. Read $WORKER_DIR/task.md for your task
180
181
  2. Read $WORKER_DIR/context.md for context
181
- 3. Use the \`reply\` tool to send "Starting work on: <brief task summary>" to Discord
182
- 4. Begin work; send progress updates via \`reply\` every few steps
183
- 5. When done, send your full results/summary via \`reply\` (split into <2000 char messages)
184
- 6. For file deliverables, use \`replyWithFile\` to attach them
185
-
186
- IMPORTANT: The user CANNOT see your terminal. The ONLY way to communicate is the reply tool.
187
- If you complete work without sending results via reply, the user will never see your output.
182
+ 3. Begin work according to your intent
183
+ 4. Report progress and results using the reply tool to your Discord channel
188
184
  Do NOT wait for a message. Start working as soon as you boot.
189
185
  STARTEOF
190
186
 
187
+ # Build the resume flags and initial prompt
188
+ RESUME_FLAGS=""
189
+ if [ -n "$RESUME_SESSION" ]; then
190
+ RESUME_FLAGS="--resume $RESUME_SESSION --fork-session"
191
+ fi
192
+
191
193
  # Create a self-contained wrapper script with all paths baked in
192
194
  WRAPPER="$WORKER_DIR/start-worker.sh"
193
195
  cat > "$WRAPPER" << WRAPEOF
194
196
  #!/bin/bash
195
197
  TMUX_TARGET="${TMUX_SESSION}:${WORKER_NAME}"
198
+ RESUMING="$RESUME_SESSION"
196
199
 
197
200
  # Auto-accept prompts in the background
198
201
  (
@@ -200,9 +203,14 @@ TMUX_TARGET="${TMUX_SESSION}:${WORKER_NAME}"
200
203
  sleep 2
201
204
  PANE_CONTENT=\$(tmux capture-pane -t "\$TMUX_TARGET" -p 2>/dev/null || echo "")
202
205
  if echo "\$PANE_CONTENT" | grep -q "^❯"; then
203
- # Claude is ready — send the initial prompt via tmux keys
204
206
  sleep 1
205
- tmux send-keys -t "\$TMUX_TARGET" "Read $WORKER_DIR/task.md and $WORKER_DIR/context.md, then begin work. IMPORTANT: You MUST use the reply tool from the discord-filtered MCP server for ALL communication — send a starting message now, progress updates as you work, and final results when done. The user cannot see your terminal." Enter
207
+ if [ -n "\$RESUMING" ]; then
208
+ # Resuming a previous session — tell it to continue and use the new Discord channel
209
+ tmux send-keys -t "\$TMUX_TARGET" "You have been resumed in a new session. Your Discord channel has changed — use the reply tool to communicate. Check $WORKER_DIR/task.md for your task. Continue where you left off and report progress via Discord." Enter
210
+ else
211
+ # Fresh session — send the initial task prompt
212
+ tmux send-keys -t "\$TMUX_TARGET" "Read $WORKER_DIR/task.md and $WORKER_DIR/context.md, then begin work per CLAUDE.md." Enter
213
+ fi
206
214
  break
207
215
  fi
208
216
  tmux send-keys -t "\$TMUX_TARGET" Enter 2>/dev/null || true
@@ -234,7 +242,8 @@ trap cleanup EXIT
234
242
  # and the auto-acceptor sends the first prompt via tmux keys once claude is ready)
235
243
  cd "$WORK_DIR" && claude \\
236
244
  --dangerously-skip-permissions \\
237
- --dangerously-load-development-channels server:discord-filtered
245
+ --dangerously-load-development-channels server:discord-filtered \\
246
+ $RESUME_FLAGS
238
247
  WRAPEOF
239
248
  chmod +x "$WRAPPER"
240
249
 
@@ -0,0 +1,183 @@
1
+ #!/bin/bash
2
+ # Update Onkol plugin + scripts from the latest npm package, then
3
+ # dissolve all active workers and respawn them with --resume so they
4
+ # keep their conversation history but pick up the new code.
5
+ #
6
+ # Usage:
7
+ # onkol-update # update + restart all workers
8
+ # onkol-update --skip-update # just restart workers (no npm pull)
9
+ # onkol-update --workers-only # alias for --skip-update
10
+
11
+ set -uo pipefail
12
+
13
+ ONKOL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
14
+ CONFIG="$ONKOL_DIR/config.json"
15
+ TRACKING="$ONKOL_DIR/workers/tracking.json"
16
+
17
+ if [ ! -f "$CONFIG" ]; then
18
+ echo "ERROR: No config.json found at $ONKOL_DIR. Is Onkol installed here?"
19
+ exit 1
20
+ fi
21
+
22
+ NODE_NAME=$(jq -r '.nodeName' "$CONFIG")
23
+ SKIP_UPDATE=false
24
+
25
+ while [[ $# -gt 0 ]]; do
26
+ case $1 in
27
+ --skip-update|--workers-only) SKIP_UPDATE=true; shift ;;
28
+ *) echo "Unknown arg: $1"; exit 1 ;;
29
+ esac
30
+ done
31
+
32
+ echo "=== Onkol Update & Restart ==="
33
+ echo "Node: $NODE_NAME"
34
+ echo "Install dir: $ONKOL_DIR"
35
+ echo ""
36
+
37
+ # ── Step 1: Update files from npm ──────────────────────────────────────────
38
+
39
+ if [ "$SKIP_UPDATE" = false ]; then
40
+ echo "[1/3] Updating from latest npm package..."
41
+
42
+ # Create a temp dir, download latest package, extract the files we need
43
+ TMPDIR=$(mktemp -d)
44
+ trap 'rm -rf "$TMPDIR"' EXIT
45
+
46
+ # Use npm pack to download the tarball without installing
47
+ if command -v npm &>/dev/null; then
48
+ npm pack onkol --pack-destination "$TMPDIR" &>/dev/null
49
+ TARBALL=$(ls "$TMPDIR"/onkol-*.tgz 2>/dev/null | head -1)
50
+ fi
51
+
52
+ if [ -z "${TARBALL:-}" ] || [ ! -f "${TARBALL:-}" ]; then
53
+ echo "WARNING: Could not download npm package. Trying npx..."
54
+ # Fallback: use npx to find the package cache
55
+ npx --yes onkol@latest --help &>/dev/null
56
+ PKG_DIR=$(find ~/.npm/_npx -name "onkol" -path "*/node_modules/*" -type d 2>/dev/null | head -1)
57
+ if [ -z "$PKG_DIR" ]; then
58
+ echo "ERROR: Could not find onkol package. Skipping update."
59
+ echo "You can update manually: copy plugin/ and scripts/ from the repo."
60
+ SKIP_UPDATE=true
61
+ fi
62
+ fi
63
+
64
+ if [ "$SKIP_UPDATE" = false ]; then
65
+ if [ -n "${TARBALL:-}" ] && [ -f "${TARBALL:-}" ]; then
66
+ # Extract from tarball
67
+ tar xzf "$TARBALL" -C "$TMPDIR"
68
+ PKG_DIR="$TMPDIR/package"
69
+ fi
70
+
71
+ if [ -d "$PKG_DIR" ]; then
72
+ # Update plugin files
73
+ if [ -d "$PKG_DIR/src/plugin" ]; then
74
+ cp "$PKG_DIR/src/plugin/"*.ts "$ONKOL_DIR/plugins/discord-filtered/" 2>/dev/null && \
75
+ echo " ✓ Plugin files updated"
76
+ elif [ -d "$PKG_DIR/dist/plugin" ]; then
77
+ cp "$PKG_DIR/dist/plugin/"*.js "$ONKOL_DIR/plugins/discord-filtered/" 2>/dev/null && \
78
+ echo " ✓ Plugin files updated (dist)"
79
+ fi
80
+
81
+ # Update scripts
82
+ if [ -d "$PKG_DIR/scripts" ]; then
83
+ for script in "$PKG_DIR/scripts/"*.sh; do
84
+ name=$(basename "$script")
85
+ cp "$script" "$ONKOL_DIR/scripts/$name"
86
+ chmod +x "$ONKOL_DIR/scripts/$name"
87
+ done
88
+ echo " ✓ Scripts updated"
89
+ fi
90
+
91
+ echo " Done."
92
+ fi
93
+ fi
94
+ else
95
+ echo "[1/3] Skipping update (--skip-update)"
96
+ fi
97
+
98
+ echo ""
99
+
100
+ # ── Step 2: Dissolve active workers (saving session IDs) ──────────────────
101
+
102
+ echo "[2/3] Dissolving active workers..."
103
+
104
+ if [ ! -f "$TRACKING" ] || [ "$(jq length "$TRACKING" 2>/dev/null)" -eq 0 ]; then
105
+ echo " No active workers to restart."
106
+ echo ""
107
+ echo "=== Update complete. No workers to restart. ==="
108
+ exit 0
109
+ fi
110
+
111
+ # Build a list of workers with their session IDs before dissolving
112
+ declare -a WORKER_NAMES=()
113
+ declare -a WORKER_DIRS=()
114
+ declare -a WORKER_INTENTS=()
115
+ declare -a WORKER_SESSIONS=()
116
+
117
+ while IFS= read -r line; do
118
+ W_NAME=$(echo "$line" | jq -r '.name')
119
+ W_DIR=$(echo "$line" | jq -r '.workDir')
120
+ W_INTENT=$(echo "$line" | jq -r '.intent')
121
+
122
+ # Find the latest session ID for this worker's project directory
123
+ # Claude Code stores sessions in ~/.claude/projects/<encoded-path>/
124
+ ENCODED_PATH=$(echo "$W_DIR" | sed 's|^/||; s|/|-|g; s|^|-|')
125
+ SESSION_DIR="$HOME/.claude/projects/$ENCODED_PATH"
126
+ SESSION_ID=""
127
+
128
+ if [ -d "$SESSION_DIR" ]; then
129
+ LATEST_JSONL=$(find "$SESSION_DIR" -maxdepth 1 -name "*.jsonl" \
130
+ -not -path "*/subagents/*" -printf '%T@ %f\n' 2>/dev/null \
131
+ | sort -n | tail -1 | awk '{print $2}')
132
+ if [ -n "$LATEST_JSONL" ]; then
133
+ SESSION_ID="${LATEST_JSONL%.jsonl}"
134
+ fi
135
+ fi
136
+
137
+ WORKER_NAMES+=("$W_NAME")
138
+ WORKER_DIRS+=("$W_DIR")
139
+ WORKER_INTENTS+=("$W_INTENT")
140
+ WORKER_SESSIONS+=("$SESSION_ID")
141
+
142
+ echo " $W_NAME → session: ${SESSION_ID:-none}"
143
+ done < <(jq -c '.[] | select(.status == "active")' "$TRACKING")
144
+
145
+ echo ""
146
+
147
+ # Dissolve all workers
148
+ for name in "${WORKER_NAMES[@]}"; do
149
+ "$ONKOL_DIR/scripts/dissolve-worker.sh" --name "$name" 2>&1 | sed 's/^/ /'
150
+ done
151
+
152
+ echo ""
153
+
154
+ # ── Step 3: Respawn with --resume ──────────────────────────────────────────
155
+
156
+ echo "[3/3] Respawning workers with --resume..."
157
+
158
+ for i in "${!WORKER_NAMES[@]}"; do
159
+ W_NAME="${WORKER_NAMES[$i]}"
160
+ W_DIR="${WORKER_DIRS[$i]}"
161
+ W_INTENT="${WORKER_INTENTS[$i]}"
162
+ W_SESSION="${WORKER_SESSIONS[$i]}"
163
+
164
+ RESUME_ARG=""
165
+ if [ -n "$W_SESSION" ]; then
166
+ RESUME_ARG="--resume $W_SESSION"
167
+ fi
168
+
169
+ echo " Spawning $W_NAME (intent: $W_INTENT, resume: ${W_SESSION:-fresh})..."
170
+
171
+ "$ONKOL_DIR/scripts/spawn-worker.sh" \
172
+ --name "$W_NAME" \
173
+ --dir "$W_DIR" \
174
+ --task "Continue the previous work. Check your conversation history for context." \
175
+ --intent "$W_INTENT" \
176
+ $RESUME_ARG 2>&1 | sed 's/^/ /'
177
+
178
+ # Small delay between spawns to avoid Discord rate limits
179
+ sleep 2
180
+ done
181
+
182
+ echo ""
183
+ echo "=== Update complete. ${#WORKER_NAMES[@]} worker(s) restarted. ==="
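Step 2 of the script derives a session ID in two moves: it turns the worker directory into a project folder name with `sed`, then takes the newest `.jsonl` file in that folder. The sketch below mirrors those two moves as pure functions. The function names and the `{ name, mtimeMs }` file shape are hypothetical, and the folder naming is the script's own assumption about how Claude Code lays out `~/.claude/projects/`, not a documented contract.

```javascript
// Mirrors: sed 's|^/||; s|/|-|g; s|^|-|'
// Drop one leading slash, turn the remaining slashes into dashes, prepend a dash.
function encodeProjectPath(workDir) {
  return '-' + workDir.replace(/^\//, '').replace(/\//g, '-');
}

// Mirrors: find ... -name "*.jsonl" -printf '%T@ %f\n' | sort -n | tail -1
// `files` is a hypothetical list of { name, mtimeMs } entries for one folder.
function latestSessionId(files) {
  const sessions = files.filter((file) => file.name.endsWith('.jsonl'));
  if (sessions.length === 0) return '';
  const newest = sessions.reduce((a, b) => (b.mtimeMs > a.mtimeMs ? b : a));
  // The session ID is the file name without its .jsonl extension.
  return newest.name.slice(0, -'.jsonl'.length);
}

console.log(encodeProjectPath('/home/dev/projects/api'));
console.log(latestSessionId([
  { name: 'older.jsonl', mtimeMs: 1000 },
  { name: 'newer.jsonl', mtimeMs: 2000 },
  { name: 'notes.txt', mtimeMs: 3000 },
]));
```

Because distinct directories such as `/a/b-c` and `/a/b/c` encode to the same name under this scheme, the lookup is best treated as a heuristic.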
@@ -76,15 +76,18 @@ llm_analyze() {
76
76
 
77
77
  Keys:
78
78
  - status: one of: working, done_replied, done_silent, error, idle
79
- - action: one of: none, nudge_reply, nudge_error, nudge_idle
79
+ - action: one of: none, nudge_reply, nudge_error, nudge_idle, progress_update
80
80
  - reason: one short sentence explaining your assessment
81
+ - summary: (ONLY when action is progress_update) A brief 1-2 sentence user-facing summary of what the worker is currently doing. Be specific — mention file names, tools being run, or operations in progress. Example: \"Reading agent config files and analyzing the call flow pipeline.\" or \"Running TypeScript type-check after modifying 4 frontend components.\"
81
82
 
82
- Rules:
83
- - working: Claude is actively executing tools, thinking, or generating output. Action: none
84
- - done_replied: Worker finished AND used the discord-filtered reply MCP tool (you'll see 'discord-filtered - reply (MCP)' with result 'sent'). Action: none
85
- - done_silent: Worker finished work (wrote files, completed analysis, etc.) but NEVER used the reply MCP tool to send results to Discord. Action: nudge_reply
86
- - error: Worker hit a fatal error and stopped (Traceback, FATAL, crash at the prompt). Action: nudge_error. Note: errors from EARLIER that the worker recovered from do NOT count.
87
- - idle: Worker is sitting at the prompt with no clear completion or error. Action: nudge_idle"
83
+ Rules (check in this order):
84
+ 1. done_replied: If ANYWHERE in the output you see 'discord-filtered - reply (MCP)' or 'discord-filtered - reply_with_file (MCP)' followed by 'sent', the worker HAS replied. Status=done_replied, Action=none. This takes priority — even if the worker is now idle at the prompt, if it replied earlier it is done_replied NOT idle.
85
+ 2. working: Claude is actively executing tools, thinking, or generating output (not at the idle prompt). Action=progress_update. Include a summary field.
86
+ 3. error: Worker hit a fatal error and stopped (Traceback, FATAL, crash at the prompt). Action: nudge_error. Errors from EARLIER that the worker recovered from do NOT count; only errors right before the current prompt do.
87
+ 4. done_silent: Worker finished work (wrote files, completed analysis, etc.) but NEVER used the reply MCP tool anywhere in the visible output. Action: nudge_reply
88
+ 5. idle: Worker is sitting at the prompt with no clear completion, no error, and no reply tool usage. Action: nudge_idle
89
+
90
+ CRITICAL: If you see ANY 'discord-filtered - reply (MCP)' with 'sent' in the output, the answer is ALWAYS done_replied with action none, regardless of current prompt state."
88
91
 
89
92
  # Use jq to build the payload — handles all JSON escaping correctly
90
93
  local payload
@@ -102,7 +105,7 @@ ${pane_content}" \
102
105
  {role: "user", content: $user}
103
106
  ],
104
107
  temperature: 0,
105
- max_tokens: 150
108
+ max_tokens: 250
106
109
  }')
107
110
 
108
111
  local response
@@ -151,9 +154,9 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
151
154
  continue
152
155
  fi
153
156
 
154
- # Check nudge cooldown (don't analyze more than once per 10 minutes per worker)
157
+ # Check nudge cooldown (don't analyze more than once per 3 minutes per worker)
155
158
  NUDGE_FLAG="$WORKER_DIR/.watchdog-last-nudge"
156
- if [ -f "$NUDGE_FLAG" ] && [ -z "$(find "$NUDGE_FLAG" -mmin +10 2>/dev/null)" ]; then
159
+ if [ -f "$NUDGE_FLAG" ] && [ -z "$(find "$NUDGE_FLAG" -mmin +3 2>/dev/null)" ]; then
157
160
  continue
158
161
  fi
159
162
 
@@ -163,7 +166,16 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
163
166
  STATUS=$(echo "$ANALYSIS" | jq -r '.status // "unknown"')
164
167
  REASON=$(echo "$ANALYSIS" | jq -r '.reason // ""')
165
168
 
169
+ SUMMARY=$(echo "$ANALYSIS" | jq -r '.summary // ""')
170
+
166
171
  case "$ACTION" in
172
+ progress_update)
173
+ # Worker is actively working — post a progress summary to its channel
174
+ if [ -n "$SUMMARY" ]; then
175
+ touch "$NUDGE_FLAG"
176
+ discord_msg "$WORKER_CHANNEL" "⏳ $SUMMARY"
177
+ fi
178
+ ;;
167
179
  nudge_reply)
168
180
  touch "$NUDGE_FLAG"
169
181
  tmux send-keys -t "$TMUX_TARGET" \
@@ -186,7 +198,7 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
186
198
  "[watchdog] Worker **${WORKER}** — $REASON. Nudged it to respond."
187
199
  ;;
188
200
  none|*)
189
- # Worker is fine (working or already replied) — do nothing
201
+ # Worker is fine (already replied) — do nothing
190
202
  ;;
191
203
  esac
192
204
  done
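The revised prompt asks the model to check its rules in a fixed order, with an earlier reply outranking everything else. One way to read that ordering is as a chain of early returns. The sketch below is only an interpretation: the watchdog delegates this judgment to an LLM reading the tmux pane, and the boolean inputs (`replySent`, `atPrompt`, `fatalError`, `finishedWork`) are hypothetical stand-ins for what the model infers.

```javascript
// Hypothetical, deterministic reading of the rule order in the prompt above.
function classifyWorker({ replySent, atPrompt, fatalError, finishedWork }) {
  // Rule 1: a sent reply anywhere in the output wins, even if the worker is now idle.
  if (replySent) return { status: 'done_replied', action: 'none' };
  // Rule 2: not at the idle prompt means still working, so post a progress summary.
  if (!atPrompt) return { status: 'working', action: 'progress_update' };
  // Rule 3: a fatal error right before the current prompt.
  if (fatalError) return { status: 'error', action: 'nudge_error' };
  // Rule 4: work finished but nothing was sent to Discord.
  if (finishedWork) return { status: 'done_silent', action: 'nudge_reply' };
  // Rule 5: at the prompt with no completion, error, or reply.
  return { status: 'idle', action: 'nudge_idle' };
}

console.log(classifyWorker({ replySent: true, atPrompt: true, fatalError: false, finishedWork: true }));
console.log(classifyWorker({ replySent: false, atPrompt: false, fatalError: false, finishedWork: false }));
```

Read this way, the ordering explains why `none` in the `case` statement now only covers workers that have already replied: an actively working session maps to `progress_update` instead.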
@@ -1,4 +1,4 @@
1
- import { Client, GatewayIntentBits, type Message } from 'discord.js'
1
+ import { Client, GatewayIntentBits, type Message, type Attachment } from 'discord.js'
2
2
 
3
3
  export interface DiscordClientConfig {
4
4
  botToken: string
@@ -19,9 +19,30 @@ export function shouldForwardMessage(
19
19
  return true
20
20
  }
21
21
 
22
+ // When a message is too long, Discord auto-converts it to a .txt file attachment
23
+ // with empty message content. This fetches the text from those attachments.
24
+ async function resolveTextAttachments(message: Message): Promise<string> {
25
+ let content = message.content
26
+ const textAttachments = message.attachments.filter(
27
+ (a: Attachment) => a.contentType?.startsWith('text/') || a.name?.endsWith('.txt')
28
+ )
29
+ for (const attachment of textAttachments.values()) {
30
+ try {
31
+ const res = await fetch(attachment.url)
32
+ if (res.ok) {
33
+ const text = await res.text()
34
+ content = content ? `${content}\n\n${text}` : text
35
+ }
36
+ } catch (err) {
37
+ console.error(`[discord-filtered] Failed to fetch attachment ${attachment.name}: ${err}`)
38
+ }
39
+ }
40
+ return content
41
+ }
42
+
22
43
  export function createDiscordClient(
23
44
  config: DiscordClientConfig,
24
- onMessage: (message: Message) => void
45
+ onMessage: (content: string, message: Message) => void
25
46
  ) {
26
47
  const client = new Client({
27
48
  intents: [
@@ -31,7 +52,7 @@ export function createDiscordClient(
31
52
  ],
32
53
  })
33
54
 
34
- client.on('messageCreate', (message) => {
55
+ client.on('messageCreate', async (message) => {
35
56
  if (
36
57
  shouldForwardMessage(
37
58
  message.channel.id,
@@ -41,7 +62,10 @@ export function createDiscordClient(
41
62
  config.allowedUsers
42
63
  )
43
64
  ) {
44
- onMessage(message)
65
+ const content = await resolveTextAttachments(message)
66
+ if (content) {
67
+ onMessage(content, message)
68
+ }
45
69
  }
46
70
  })
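The attachment handling in this file combines two decisions: which attachments count as text, and how fetched text is merged with any typed message content. A minimal sketch of both, without the network call, is shown here. The helper names `isTextAttachment` and `mergeContent` are hypothetical; the predicate and the merge expression are taken from `resolveTextAttachments` in the diff.

```javascript
// Same predicate as the filter in resolveTextAttachments.
function isTextAttachment(attachment) {
  return Boolean(
    attachment.contentType?.startsWith('text/') || attachment.name?.endsWith('.txt')
  );
}

// Same merge rule as the diff: append each fetched text after a blank line,
// or use it directly when the message body is empty.
function mergeContent(content, fetchedTexts) {
  let merged = content;
  for (const text of fetchedTexts) {
    merged = merged ? `${merged}\n\n${text}` : text;
  }
  return merged;
}

console.log(isTextAttachment({ contentType: 'text/plain; charset=utf-8', name: 'message.txt' }));
console.log(isTextAttachment({ contentType: 'image/png', name: 'screenshot.png' }));
console.log(mergeContent('', ['full text of the long message']));
```

When a message has neither typed content nor a text attachment, the merged result stays empty, which is why the caller checks `if (content)` before invoking `onMessage`.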
47
71
 
@@ -37,8 +37,10 @@ function sendInterrupt(): boolean {
37
37
 
38
38
  const discord = createDiscordClient(
39
39
  { botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS },
40
- async (message) => {
41
- const content = message.content
40
+ async (content, message) => {
41
+ // Instant acknowledgment — user knows the message reached the session
42
+ try { await message.react('👀') } catch { /* ignore */ }
43
+
42
44
  const isInterrupt = /^!stop\b/i.test(content)
43
45
 
44
46
  if (isInterrupt) {
@@ -1,5 +1,4 @@
1
1
  const DISCORD_MAX_LENGTH = 2000
2
- const TRUNCATION_SUFFIX = '\n... (truncated)'
3
2
 
4
3
  export class MessageBatcher {
5
4
  private buffer: string[] = []
@@ -20,14 +19,65 @@ export class MessageBatcher {
20
19
 
21
20
  private async flush(): Promise<void> {
22
21
  if (this.buffer.length === 0) return
23
- let combined = this.buffer.join('\n')
22
+ const combined = this.buffer.join('\n')
24
23
  this.buffer = []
25
24
  this.timer = null
26
25
 
27
- if (combined.length > DISCORD_MAX_LENGTH) {
28
- combined = combined.slice(0, DISCORD_MAX_LENGTH - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX
26
+ // Split into multiple messages instead of truncating
27
+ const chunks = splitMessage(combined)
28
+ for (const chunk of chunks) {
29
+ await this.sendFn(chunk)
30
+ }
31
+ }
32
+ }
33
+
34
+ // Split long text into Discord-safe chunks, preferring line breaks as split points
35
+ function splitMessage(text: string): string[] {
36
+ if (text.length <= DISCORD_MAX_LENGTH) return [text]
37
+
38
+ const chunks: string[] = []
39
+ let remaining = text
40
+
41
+ while (remaining.length > 0) {
42
+ if (remaining.length <= DISCORD_MAX_LENGTH) {
43
+ chunks.push(remaining)
44
+ break
45
+ }
46
+
47
+ // Find a good split point: prefer double newline, then single newline, then space
48
+ let splitAt = -1
49
+ const searchWindow = remaining.slice(0, DISCORD_MAX_LENGTH)
50
+
51
+ // Try splitting at last paragraph break
52
+ const lastParagraph = searchWindow.lastIndexOf('\n\n')
53
+ if (lastParagraph > DISCORD_MAX_LENGTH * 0.3) {
54
+ splitAt = lastParagraph
55
+ }
56
+
57
+ // Fall back to last line break
58
+ if (splitAt === -1) {
59
+ const lastNewline = searchWindow.lastIndexOf('\n')
60
+ if (lastNewline > DISCORD_MAX_LENGTH * 0.3) {
61
+ splitAt = lastNewline
62
+ }
63
+ }
64
+
65
+ // Fall back to last space
66
+ if (splitAt === -1) {
67
+ const lastSpace = searchWindow.lastIndexOf(' ')
68
+ if (lastSpace > DISCORD_MAX_LENGTH * 0.3) {
69
+ splitAt = lastSpace
70
+ }
71
+ }
72
+
73
+ // Hard split as last resort
74
+ if (splitAt === -1) {
75
+ splitAt = DISCORD_MAX_LENGTH
29
76
  }
30
77
 
31
- await this.sendFn(combined)
78
+ chunks.push(remaining.slice(0, splitAt))
79
+ remaining = remaining.slice(splitAt).replace(/^\n+/, '')
32
80
  }
81
+
82
+ return chunks
33
83
  }