onkol 0.4.0 → 0.5.1

Files changed (16)

package/README.md +322 -0
package/dist/cli/index.js +177 -1
package/dist/cli/prompts.d.ts +4 -0
package/dist/cli/prompts.js +65 -0
package/dist/cli/systemd.js +1 -0
package/dist/plugin/discord-client.d.ts +1 -1
package/dist/plugin/discord-client.js +24 -2
package/dist/plugin/index.js +53 -2
package/dist/plugin/message-batcher.js +47 -5
package/package.json +1 -1
package/scripts/spawn-worker.sh +21 -12
package/scripts/update-and-restart.sh +183 -0
package/scripts/worker-watchdog.sh +23 -11
package/src/plugin/discord-client.ts +28 -4
package/src/plugin/index.ts +4 -2
package/src/plugin/message-batcher.ts +55 -5

package/README.md ADDED

@@ -0,0 +1,322 @@
+# Onkol
+Your AI on-call team. One command per VM, and you get an autonomous agent on Discord that handles bugs, features, analysis, and ops so you don't have to.
+Onkol turns Claude Code into a decentralized on-call system. Each VM runs an orchestrator that listens on Discord. You describe a problem in plain English, it spins up a dedicated worker session to solve it, and reports back when it's done.
+## How it works
+```
+You on Discord:  "the auth endpoint is returning 403 after token refresh"
+                              |
+                    Orchestrator (Claude Code)
+                    reads your message, understands intent,
+                    prepares context, spawns a worker
+                              |
+                    Worker (new Claude Code session)
+                    diagnoses the bug, fixes auth.py,
+                    runs tests, commits to a branch
+                              |
+You on Discord:  "Fixed. Clock skew between auth server and app server.
+                  Added 5s tolerance. Tests pass. Branch: fix/auth-403"
+```
+**What makes it different:**
+- **Decentralized.** Each VM is self-contained. No central server. 10 VMs = 10 independent agents.
+- **Intent-driven.** Say "fix this" and it fixes autonomously. Say "look into this" and it investigates without touching code. Your phrasing controls the behavior.
+- **Gets smarter.** Every resolved task leaves behind a learning. Next time a similar issue comes up, the agent already knows what to look for.
+- **Works behind firewalls.** All connections are outbound to Discord. No inbound ports, no SSH tunnels, no VPN required.
+## Real-world setup
+The intended way to use Onkol is with a **dedicated Discord server** that becomes your ops control center.
+I manage about 10 applications across prod and staging. I created one Discord server and set it up exclusively for Onkol. Each VM I onboard creates its own category with an orchestrator channel. My Discord sidebar looks like this:
+```
+MY-INFRA (Discord server)
+│
+├── API-SERVER-PROD                ← VM running in GCP
+│   ├── #orchestrator              ← talk to this VM's brain here
+│   ├── #fix-auth-403              ← active worker (auto-created)
+│   └── #analyze-error-logs        ← active worker (auto-created)
+│
+├── WEB-APP-STAGING                ← VM running in AWS
+│   └── #orchestrator
+│
+├── BACKEND-PROD                   ← VM behind corporate VPN
+│   ├── #orchestrator
+│   └── #add-export-endpoint       ← active worker
+│
+├── DATA-PIPELINE-STAGING          ← Another GCP VM
+│   └── #orchestrator
+│
+└── ... (as many VMs as you have)
+```
+### The workflow
+From your phone, laptop, or anywhere with Discord:
+1. Open the server, go to `#orchestrator` under the VM you care about
+2. Type what you need: "there's a bug where users get 403 after token refresh"
+3. The orchestrator creates a new channel `#fix-auth-403` and spawns a worker
+4. The worker posts its progress and findings in `#fix-auth-403`
+5. You can jump into that channel to give more context or redirect
+6. When it's done, the orchestrator dissolves the worker, the channel disappears, learnings are saved
+You can do this from a party, a flight, or bed at 2 AM. You're just texting on Discord. The agent does the SSH, the debugging, the code reading, the fixing.
+### Multiple VMs, one view
+Every VM is a category. Every task is a channel. You see your entire infrastructure at a glance in the Discord sidebar. No dashboards to build, no web apps to deploy. Discord IS the dashboard.
+The VMs don't need to know about each other. Each one connects outbound to Discord independently. If a VM is behind a VPN you can only reach from one specific laptop, doesn't matter. As long as it has outbound HTTPS, it can connect to Discord and you can talk to it.
+### Setting up a new VM
+```bash
+# SSH into the VM (one time only)
+ssh user@my-new-vm
+# Run setup (2 minutes)
+npx onkol@latest setup
+# Answer the questions, done.
+# A new category appears in your Discord server.
+# You never need to SSH into this VM again.
+```
+## Quick start
+### Prerequisites
+You need these on the VM where you're setting up:
+| Tool | Why | Install |
+|------|-----|---------|
+| **Node.js 18+** | Runs the setup CLI | [nodejs.org](https://nodejs.org) |
+| **Bun** | Runs the Discord channel plugin | `curl -fsSL https://bun.sh/install \| bash` |
+| **Claude Code** | The AI that does the work | [docs.anthropic.com](https://docs.anthropic.com/en/docs/claude-code/getting-started) |
+| **tmux** | Keeps sessions alive | `apt install tmux` / `yum install tmux` |
+| **jq** | JSON processing in scripts | `apt install jq` / `yum install jq` |
+Claude Code must be logged in via `claude.ai` OAuth on the VM (not API key).
+The setup wizard checks all dependencies before asking any questions. If something's missing, it tells you exactly what to install and exits without wasting your time.
+### Create a Discord bot
+1. Go to [discord.com/developers/applications](https://discord.com/developers/applications)
+2. New Application, name it, Create
+3. Bot, Reset Token, **copy it** (you only see it once)
+4. Bot, Privileged Gateway Intents, enable **Message Content Intent**, Save
+5. OAuth2, URL Generator, check `bot`, check permissions:
+   - View Channels, Send Messages, Read Message History, Attach Files, Manage Channels
+6. Copy the URL, open in browser, invite to your Discord server
+The setup wizard validates your bot token and checks that Message Content Intent is enabled before proceeding. If something's wrong, it tells you exactly what to fix.
+### Run setup
+```bash
+npx onkol@latest setup
+```
+The wizard walks you through everything:
+```
+Welcome to Onkol Setup
+Checking dependencies...
+  ✓ claude
+  ✓ bun
+  ✓ tmux
+  ✓ jq
+  ✓ curl
+  All dependencies found.
+✔ Where should Onkol live? ~/onkol
+✔ What should this node be called? api-server-prod
+✔ Discord bot token: ****
+✔ Discord server (guild) ID: 1234567890
+✔ Your Discord user ID: 9876543210
+✔ Registry file? Write a prompt — tell Claude what to find
+✔ Describe: Find the API endpoints and database URLs from .env
+✔ Service summary? Auto-discover
+✔ CLAUDE.md? Yes — This is a Node.js API server deployed via docker...
+✔ Plugins? context7, superpowers, code-simplifier
+✓ Bot token is valid
+✓ Message Content intent is enabled
+✓ Discord category and #orchestrator channel created
+✓ 6 scripts installed
+✓ Plugin installed with 4 files + dependencies
+✓ Systemd service installed and enabled
+✓ Orchestrator started in tmux session "onkol-api-server-prod"
+✓ Onkol node "api-server-prod" is live!
+```
+Go to your Discord server. You'll see a new category with an `#orchestrator` channel. Send it a message.
+## Usage
+### Talking to the orchestrator
+The orchestrator lives in the `#orchestrator` channel of your node's category. It reads your intent from how you phrase things:
+| You say | What happens |
+|---------|-------------|
+| "fix the 403 bug in auth" | Spawns a worker that diagnoses, fixes, tests, and commits |
+| "look into why response times are high" | Spawns a worker that investigates and reports, no code changes |
+| "add retry logic to the webhook handler" | Spawns a worker that implements, tests, and waits for your approval |
+| "analyze transferred calls for the last 3 weeks" | Spawns a worker that reads logs/data and produces an analysis |
+| "just ship it" | Fully autonomous, pushes and deploys (asks for confirmation first) |
+### How workers work
+When the orchestrator spawns a worker:
+1. A new Discord channel appears in your category (e.g., `#fix-auth-bug`)
+2. A new Claude Code session starts in tmux on the VM
+3. The worker posts progress and results in its Discord channel
+4. You can talk to the worker directly in that channel
+5. When done, tell the orchestrator to dissolve it. The channel disappears, learnings are saved.
+### Managing workers
+From the orchestrator channel:
+- "dissolve fix-auth-bug" kills the worker, saves learnings, deletes channel
+- "list workers" shows all active workers
+- "check on fix-auth-bug" gets the worker's current status
+### Setup prompts
+During setup, you can describe things in plain English instead of providing config files:
+- **Registry**: "Find the API endpoints from .env and the S3 bucket from AWS CLI"
+- **Services**: Auto-discovers running services, or you describe what to look for
+- **CLAUDE.md**: "This is a Node.js API server, Express, deployed via docker..."
+The orchestrator executes these prompts on first boot and generates the structured files.
+## Architecture
+```
+Your Discord Server
+├── Category: api-server-prod           ← VM 1
+│   ├── #orchestrator                   ← persistent Claude Code session
+│   ├── #fix-auth-bug                   ← worker (temporary)
+│   └── #analyze-error-logs             ← worker (temporary)
+├── Category: web-app-staging           ← VM 2
+│   └── #orchestrator
+└── Category: backend-prod              ← VM 3
+    └── #orchestrator
+```
+Each VM runs independently:
+- **Orchestrator.** Long-running Claude Code session in tmux. Receives Discord messages, spawns workers, manages lifecycle.
+- **Workers.** Ephemeral Claude Code sessions. One per task. Each gets its own Discord channel, its own context, its own instructions.
+- **discord-filtered plugin.** Custom MCP channel server that routes Discord messages by channel ID. All sessions share one bot but each only hears its own channel.
+### On-disk structure
+```
+~/onkol/
+├── config.json          # Node config (bot token, server ID, etc.)
+├── registry.json        # VM-specific secrets, endpoints, ports
+├── services.md          # What runs on this VM
+├── CLAUDE.md            # Orchestrator instructions
+├── knowledge/           # Learnings from dissolved workers
+│   ├── index.json
+│   └── 2026-03-22-fix-auth-clock-skew.md
+├── workers/
+│   ├── tracking.json    # Active workers
+│   └── fix-auth-bug/    # Worker directory (while active)
+├── scripts/             # Lifecycle scripts
+└── plugins/
+    └── discord-filtered/  # MCP channel plugin
+```
+### Knowledge base
+Every dissolved worker leaves behind a learning:
+```markdown
+## What happened
+Token validation rejected valid tokens for 2-3 seconds after refresh.
+## Root cause
+No clock skew tolerance between auth server and app server.
+## Fix
+Added 5-second CLOCK_SKEW_TOLERANCE in auth.py:47.
+## For next time
+If 403 errors appear after token operations, check clock sync first.
+```
+The orchestrator includes relevant past learnings when spawning new workers. The system gets better at diagnosing issues over time.
+## Resumable setup
+If setup fails midway (missing dependency, network error, wrong bot token), your answers are saved automatically. Next time you run `npx onkol setup`, it offers to resume:
+```
+? Found a previous setup attempt (4 steps completed). What do you want to do?
+  ❯ Resume from where it left off (node: api-server-prod)
+    Start fresh
+```
+No re-entering bot tokens or server IDs. It picks up right where it left off.
+## Commands
+```bash
+npx onkol setup          # Interactive setup wizard
+npx onkol@latest setup   # Force latest version
+```
+On the VM after setup:
+```bash
+# Attach to the orchestrator
+tmux attach -t onkol-<node-name>
+# Check service status
+systemctl status onkol-<node-name>
+# Restart orchestrator
+sudo systemctl restart onkol-<node-name>
+# View active workers
+bash ~/onkol/scripts/list-workers.sh
+# Manually dissolve a worker
+bash ~/onkol/scripts/dissolve-worker.sh --name "worker-name"
+```
+## Requirements
+- Claude Code with `claude.ai` OAuth login (Max plan recommended for concurrent sessions)
+- Node.js 18+ and Bun on each VM
+- tmux and jq on each VM
+- A Discord server with a bot that has Manage Channels permission
+- VMs need outbound HTTPS access (no inbound ports needed)
+## How it's built
+| Component | Tech | Lines |
+|-----------|------|-------|
+| Setup wizard | Node.js, TypeScript, Inquirer | ~500 |
+| Discord channel plugin | Bun, MCP SDK, discord.js | ~300 |
+| Worker lifecycle scripts | Bash | ~400 |
+| Orchestrator/worker templates | Handlebars | ~150 |
+The core mechanism is [Claude Code Channels](https://code.claude.com/docs/en/channels), an MCP-based system that pushes Discord messages into Claude Code sessions. The `discord-filtered` plugin is a custom channel that routes by Discord channel ID, allowing multiple sessions to share one bot.
+## License
+MIT

package/dist/cli/index.js CHANGED

@@ -212,6 +212,14 @@ program
             maxWorkers: 3,
             installDir: dir,
             plugins: answers.plugins,
+            ...(answers.watchdogProvider !== 'skip' ? {
+                watchdog: {
+                    provider: answers.watchdogProvider,
+                    model: answers.watchdogModel,
+                    apiKey: answers.watchdogApiKey,
+                    ...(answers.watchdogApiUrl ? { apiUrl: answers.watchdogApiUrl } : {}),
+                },
+            } : {}),
         };
         writeFileSync(resolve(dir, 'config.json'), JSON.stringify(config, null, 2), { mode: 0o600 });
         markStep(homeDir, checkpoint, 'config');
@@ -252,6 +260,7 @@ program
                         DISCORD_BOT_TOKEN: answers.botToken,
                         DISCORD_CHANNEL_ID: orchChannelId,
                         DISCORD_ALLOWED_USERS: JSON.stringify(allowedUsers),
+                        TMUX_TARGET: `onkol-${answers.nodeName}`,
                     },
                 },
             },
@@ -294,7 +303,7 @@ program
         console.log(chalk.gray('  Config files already written, skipping'));
     }
     // --- CRITICAL: Copy scripts ---
-    const requiredScripts = ['spawn-worker.sh', 'dissolve-worker.sh', 'list-workers.sh', 'check-worker.sh', 'healthcheck.sh', 'start-orchestrator.sh'];
+    const requiredScripts = ['spawn-worker.sh', 'dissolve-worker.sh', 'list-workers.sh', 'check-worker.sh', 'healthcheck.sh', 'worker-watchdog.sh', 'start-orchestrator.sh'];
     const scriptsSource = resolve(__dirname, '../../scripts');
     if (skip('scripts')) {
         console.log(chalk.gray('  Scripts already installed, skipping'));
@@ -424,12 +433,16 @@ program
                 const timerDir = resolve(homeDir, '.config/systemd/user');
                 mkdirSync(timerDir, { recursive: true });
                 const healthcheckPath = resolve(dir, 'scripts/healthcheck.sh');
+                const watchdogPath = resolve(dir, 'scripts/worker-watchdog.sh');
                 writeFileSync(resolve(timerDir, 'onkol-healthcheck.service'), `[Unit]\nDescription=Onkol healthcheck\n[Service]\nType=oneshot\nExecStart=${healthcheckPath}\n`);
                 writeFileSync(resolve(timerDir, 'onkol-healthcheck.timer'), `[Unit]\nDescription=Onkol healthcheck every 5min\n[Timer]\nOnBootSec=2min\nOnUnitActiveSec=5min\n[Install]\nWantedBy=timers.target\n`);
+                writeFileSync(resolve(timerDir, 'onkol-worker-watchdog.service'), `[Unit]\nDescription=Onkol worker watchdog\n[Service]\nType=oneshot\nExecStart=${watchdogPath}\n`);
+                writeFileSync(resolve(timerDir, 'onkol-worker-watchdog.timer'), `[Unit]\nDescription=Onkol worker watchdog every 3min\n[Timer]\nOnBootSec=3min\nOnUnitActiveSec=3min\n[Install]\nWantedBy=timers.target\n`);
                 writeFileSync(resolve(timerDir, 'onkol-cleanup.service'), `[Unit]\nDescription=Onkol archive cleanup\n[Service]\nType=oneshot\nExecStart=/usr/bin/find ${resolve(dir, 'workers/.archive')} -maxdepth 1 -mtime +30 -exec rm -rf {} \\;\n`);
                 writeFileSync(resolve(timerDir, 'onkol-cleanup.timer'), `[Unit]\nDescription=Onkol archive cleanup daily\n[Timer]\nOnCalendar=*-*-* 04:00:00\n[Install]\nWantedBy=timers.target\n`);
                 execSync('systemctl --user daemon-reload', { stdio: 'pipe' });
                 execSync('systemctl --user enable --now onkol-healthcheck.timer', { stdio: 'pipe' });
+                execSync('systemctl --user enable --now onkol-worker-watchdog.timer', { stdio: 'pipe' });
                 execSync('systemctl --user enable --now onkol-cleanup.timer', { stdio: 'pipe' });
             }
             console.log(chalk.green(`✓ Systemd user timers installed (healthcheck every 5min, cleanup daily)`));
@@ -519,4 +532,167 @@ program
     console.log(chalk.gray(`\n  To attach to the session: tmux attach -t onkol-${answers.nodeName}`));
     console.log(chalk.gray(`  To check status: systemctl status onkol-${answers.nodeName}`));
 });
+program
+    .command('update')
+    .description('Update plugin + scripts and restart workers with conversation history preserved')
+    .option('--skip-update', 'Skip pulling latest npm package, just restart workers')
+    .option('--dir <path>', 'Onkol install directory', '')
+    .action(async (opts) => {
+    // Find install directory
+    let dir = opts.dir;
+    if (!dir) {
+        // Try common locations
+        const homeDir = process.env.HOME || '';
+        const candidates = [
+            resolve(homeDir, 'onkol'),
+            resolve(homeDir, '.onkol'),
+            '/opt/onkol',
+        ];
+        for (const c of candidates) {
+            if (existsSync(resolve(c, 'config.json'))) {
+                dir = c;
+                break;
+            }
+        }
+    }
+    if (!dir || !existsSync(resolve(dir, 'config.json'))) {
+        console.log(chalk.red('Could not find Onkol install. Use --dir <path> to specify.'));
+        process.exit(1);
+    }
+    const config = JSON.parse(readFileSync(resolve(dir, 'config.json'), 'utf-8'));
+    const nodeName = config.nodeName;
+    console.log(chalk.bold('=== Onkol Update & Restart ==='));
+    console.log(chalk.gray(`Node: ${nodeName}`));
+    console.log(chalk.gray(`Install dir: ${dir}`));
+    console.log('');
+    // Step 1: Update files
+    if (!opts.skipUpdate) {
+        console.log(chalk.cyan('[1/3] Updating files from npm package...'));
+        try {
+            // Find where this CLI is running from — that's the latest package
+            // __dirname is dist/cli/, so pkgRoot is the npm package root
+            const pkgRoot = resolve(__dirname, '..');
+            const { readdirSync, chmodSync } = await import('fs');
+            // Try src/plugin first (has .ts files), then dist/plugin (.js files)
+            let pluginUpdated = false;
+            for (const candidate of ['src/plugin', 'dist/plugin']) {
+                const pluginSrc = resolve(pkgRoot, candidate);
+                if (existsSync(pluginSrc)) {
+                    const pluginDest = resolve(dir, 'plugins/discord-filtered');
+                    mkdirSync(pluginDest, { recursive: true });
+                    for (const f of readdirSync(pluginSrc)) {
+                        if (f.endsWith('.ts') || f.endsWith('.js')) {
+                            copyFileSync(resolve(pluginSrc, f), resolve(pluginDest, f));
+                        }
+                    }
+                    console.log(chalk.green(`  ✓ Plugin files updated (from ${candidate})`));
+                    pluginUpdated = true;
+                    break;
+                }
+            }
+            if (!pluginUpdated) {
+                console.log(chalk.yellow(`  ⚠ No plugin source found in package (looked in ${pkgRoot})`));
+            }
+            // Copy scripts
+            const scriptsSrc = resolve(pkgRoot, 'scripts');
+            if (existsSync(scriptsSrc)) {
+                mkdirSync(resolve(dir, 'scripts'), { recursive: true });
+                let count = 0;
+                for (const f of readdirSync(scriptsSrc)) {
+                    if (f.endsWith('.sh')) {
+                        copyFileSync(resolve(scriptsSrc, f), resolve(dir, 'scripts', f));
+                        chmodSync(resolve(dir, 'scripts', f), 0o755);
+                        count++;
+                    }
+                }
+                console.log(chalk.green(`  ✓ ${count} scripts updated`));
+            }
+            else {
+                console.log(chalk.yellow(`  ⚠ No scripts dir found at ${scriptsSrc}`));
+            }
+        }
+        catch (err) {
+            console.log(chalk.yellow(`  ⚠ Update failed: ${err instanceof Error ? err.message : err}`));
+            console.log(chalk.yellow('  Continuing with restart...'));
+        }
+    }
+    else {
+        console.log(chalk.gray('[1/3] Skipping update (--skip-update)'));
+    }
+    console.log('');
+    // Step 2: Find active workers and their session IDs
+    console.log(chalk.cyan('[2/3] Dissolving active workers...'));
+    const trackingPath = resolve(dir, 'workers/tracking.json');
+    if (!existsSync(trackingPath)) {
+        console.log(chalk.gray('  No active workers.'));
+        console.log(chalk.green.bold('\n✓ Update complete. No workers to restart.'));
+        return;
+    }
+    const tracking = JSON.parse(readFileSync(trackingPath, 'utf-8'));
+    const active = tracking.filter((w) => w.status === 'active');
+    if (active.length === 0) {
+        console.log(chalk.gray('  No active workers.'));
+        console.log(chalk.green.bold('\n✓ Update complete. No workers to restart.'));
+        return;
+    }
+    const workers = [];
+    for (const w of active) {
+        // Find session ID: look in ~/.claude/projects/<encoded-path>/
+        const encoded = '-' + w.workDir.replace(/^\//, '').replace(/\//g, '-');
+        const sessionDir = resolve(process.env.HOME || '', '.claude/projects', encoded);
+        let sessionId = '';
+        try {
+            const { readdirSync, statSync } = await import('fs');
+            const jsonls = readdirSync(sessionDir)
+                .filter((f) => f.endsWith('.jsonl'))
+                .map((f) => ({ name: f, mtime: statSync(resolve(sessionDir, f)).mtimeMs }))
+                .sort((a, b) => a.mtime - b.mtime);
+            if (jsonls.length > 0) {
+                sessionId = jsonls[jsonls.length - 1].name.replace('.jsonl', '');
+            }
+        }
+        catch { /* session dir may not exist */ }
+        workers.push({ name: w.name, workDir: w.workDir, intent: w.intent, sessionId });
+        console.log(chalk.gray(`  ${w.name} → session: ${sessionId || 'none'}`));
+    }
+    console.log('');
+    // Dissolve
+    for (const w of workers) {
+        try {
+            execSync(`bash "${resolve(dir, 'scripts/dissolve-worker.sh')}" --name "${w.name}"`, { stdio: 'pipe' });
+            console.log(chalk.gray(`  ✓ ${w.name} dissolved`));
+        }
+        catch (err) {
+            console.log(chalk.yellow(`  ⚠ Failed to dissolve ${w.name}: ${err instanceof Error ? err.message : err}`));
+        }
+    }
+    console.log('');
+    // Step 3: Respawn with --resume
+    console.log(chalk.cyan('[3/3] Respawning workers with --resume...'));
+    for (const w of workers) {
+        const resumeArg = w.sessionId ? `--resume ${w.sessionId}` : '';
+        const cmd = `bash "${resolve(dir, 'scripts/spawn-worker.sh')}" \
+        --name "${w.name}" \
+        --dir "${w.workDir}" \
+        --task "Continue the previous work. Check your conversation history for context." \
+        --intent "${w.intent}" \
+        ${resumeArg}`;
+        try {
+            const output = execSync(cmd, { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] });
+            console.log(chalk.green(`  ✓ ${w.name} respawned${w.sessionId ? ' (resumed)' : ''}`));
+            if (output.trim())
+                console.log(chalk.gray(`    ${output.trim()}`));
+        }
+        catch (err) {
+            console.log(chalk.red(`  ✗ Failed to spawn ${w.name}`));
+            if (err.stderr)
+                console.log(chalk.red(`    stderr: ${err.stderr.toString().trim()}`));
+            if (err.stdout)
+                console.log(chalk.gray(`    stdout: ${err.stdout.toString().trim()}`));
+        }
+        // Small delay to avoid Discord rate limits
+        await new Promise(r => setTimeout(r, 2000));
+    }
+    console.log(chalk.green.bold(`\n✓ Update complete. ${workers.length} worker(s) restarted.`));
+});
 program.parse();
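
The `update` command above recovers each worker's Claude Code session by encoding the worker directory into a folder name under `~/.claude/projects` and taking the most recently modified `.jsonl` file there. A minimal sketch of just that lookup, with the filesystem calls replaced by plain data so it can be read in isolation. The names `SessionFile`, `encodeProjectDir`, and `latestSessionId` are hypothetical and do not appear in the package.

```typescript
// Hypothetical helper names; the logic mirrors the lookup in the `update` command.
interface SessionFile {
  name: string;    // file name, e.g. "3f2a9c.jsonl"
  mtimeMs: number; // last-modified time in milliseconds
}

// Encode an absolute working directory the way the diff does:
// drop the leading slash, turn remaining slashes into dashes, prefix a dash.
function encodeProjectDir(workDir: string): string {
  return "-" + workDir.replace(/^\//, "").replace(/\//g, "-");
}

// Return the ID (file name without ".jsonl") of the newest session file,
// or an empty string when the directory holds no session files.
function latestSessionId(files: SessionFile[]): string {
  const jsonls = files
    .filter((f) => f.name.endsWith(".jsonl"))
    .sort((a, b) => a.mtimeMs - b.mtimeMs);
  const newest = jsonls[jsonls.length - 1];
  return newest ? newest.name.replace(/\.jsonl$/, "") : "";
}
```

An empty result corresponds to the case where the command respawns the worker without a `--resume` argument.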

package/dist/cli/prompts.d.ts CHANGED

@@ -13,5 +13,9 @@ export interface SetupAnswers {
     claudeMdMode: 'prompt' | 'skip';
     claudeMdPrompt: string | null;
     plugins: string[];
+    watchdogProvider: 'openrouter' | 'gemini' | 'custom' | 'skip';
+    watchdogModel: string | null;
+    watchdogApiKey: string | null;
+    watchdogApiUrl: string | null;
 }
 export declare function runSetupPrompts(homeDir: string): Promise<SetupAnswers>;

package/dist/cli/prompts.js CHANGED

@@ -165,8 +165,70 @@ export async function runSetupPrompts(homeDir) {
                 { name: 'frontend-design', value: 'frontend-design', checked: false },
             ],
         },
+        {
+            type: 'list',
+            name: 'watchdogProvider',
+            message: 'Worker watchdog LLM (monitors workers, nudges if stuck/silent):',
+            choices: [
+                { name: 'OpenRouter (recommended — use any model via openrouter.ai)', value: 'openrouter' },
+                { name: 'Google Gemini (direct API)', value: 'gemini' },
+                { name: 'Custom OpenAI-compatible endpoint', value: 'custom' },
+                { name: 'Skip (disable LLM watchdog)', value: 'skip' },
+            ],
+        },
+        {
+            type: 'list',
+            name: 'watchdogModel',
+            message: 'Watchdog model:',
+            choices: (a) => {
+                const base = [
+                    { name: 'google/gemini-2.5-flash (fast, cheap)', value: 'google/gemini-2.5-flash' },
+                    { name: 'google/gemini-2.0-flash-001 (fast, cheap)', value: 'google/gemini-2.0-flash-001' },
+                    { name: 'anthropic/claude-haiku (fast)', value: 'anthropic/claude-3-5-haiku-20241022' },
+                    { name: 'Custom — enter model ID', value: '__custom__' },
+                ];
+                if (a.watchdogProvider === 'gemini') {
+                    return [
+                        { name: 'gemini-2.5-flash-preview-05-20 (recommended)', value: 'gemini-2.5-flash-preview-05-20' },
+                        { name: 'gemini-2.0-flash', value: 'gemini-2.0-flash' },
+                        { name: 'Custom — enter model ID', value: '__custom__' },
+                    ];
+                }
+                return base;
+            },
+            when: (a) => a.watchdogProvider !== 'skip',
+        },
+        {
+            type: 'input',
+            name: 'watchdogModelCustom',
+            message: 'Enter model ID:',
+            when: (a) => a.watchdogProvider !== 'skip' && a.watchdogModel === '__custom__',
+        },
+        {
+            type: 'password',
+            name: 'watchdogApiKey',
+            message: (a) => {
+                if (a.watchdogProvider === 'openrouter')
+                    return 'OpenRouter API key (sk-or-...):';
+                if (a.watchdogProvider === 'gemini')
+                    return 'Google Gemini API key:';
+                return 'API key:';
+            },
+            mask: '*',
+            when: (a) => a.watchdogProvider !== 'skip',
+        },
+        {
+            type: 'input',
+            name: 'watchdogApiUrl',
+            message: 'API base URL (OpenAI-compatible, e.g. https://api.example.com/v1/chat/completions):',
+            when: (a) => a.watchdogProvider === 'custom',
+        },
     ]);
     const answers = { ...preDiscordAnswers, ...discordAndRestAnswers };
+    // Resolve custom model selection
+    const watchdogModel = answers.watchdogModel === '__custom__'
+        ? (answers.watchdogModelCustom || null)
+        : (answers.watchdogModel || null);
     return {
         ...answers,
         registryPath: answers.registryPath || null,
@@ -174,5 +236,8 @@ export async function runSetupPrompts(homeDir) {
         serviceSummaryPath: answers.serviceSummaryPath || null,
         servicesPrompt: answers.servicesPrompt || null,
         claudeMdPrompt: answers.claudeMdPrompt || null,
+        watchdogModel,
+        watchdogApiKey: answers.watchdogApiKey || null,
+        watchdogApiUrl: answers.watchdogApiUrl || null,
     };
 }
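
The watchdog answers collected above end up as an optional `watchdog` block in `config.json` (see the `dist/cli/index.js` hunk). The sketch below traces that path under simplified types: the `__custom__` sentinel is resolved to the typed-in model ID, and the block is omitted entirely when the provider is `skip`. The names `WatchdogAnswers`, `resolveWatchdogModel`, and `buildWatchdogConfig` are hypothetical, introduced only for illustration.

```typescript
// Hypothetical, simplified shape of the relevant setup answers.
interface WatchdogAnswers {
  watchdogProvider: "openrouter" | "gemini" | "custom" | "skip";
  watchdogModel?: string;
  watchdogModelCustom?: string;
  watchdogApiKey?: string;
  watchdogApiUrl?: string;
}

// Resolve the "__custom__" menu choice to the model ID the user typed.
function resolveWatchdogModel(a: WatchdogAnswers): string | null {
  if (a.watchdogModel === "__custom__") return a.watchdogModelCustom || null;
  return a.watchdogModel || null;
}

// Build the fragment that is spread into the config object.
// Returns {} for "skip", so no `watchdog` key is written at all.
function buildWatchdogConfig(a: WatchdogAnswers): Record<string, unknown> {
  if (a.watchdogProvider === "skip") return {};
  return {
    watchdog: {
      provider: a.watchdogProvider,
      model: resolveWatchdogModel(a),
      apiKey: a.watchdogApiKey || null,
      // apiUrl is only present when the user supplied one (custom provider).
      ...(a.watchdogApiUrl ? { apiUrl: a.watchdogApiUrl } : {}),
    },
  };
}
```

Spreading an empty object keeps `config.json` unchanged for nodes that skip the watchdog, which is the same conditional-spread pattern the setup command uses.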

package/dist/cli/systemd.js CHANGED

@@ -35,6 +35,7 @@ WantedBy=multi-user.target
 }
 export function generateCrontab(onkolDir) {
     return `*/5 * * * * ${onkolDir}/scripts/healthcheck.sh
+*/3 * * * * ${onkolDir}/scripts/worker-watchdog.sh
 0 4 * * * find ${onkolDir}/workers/.archive -maxdepth 1 -mtime +30 -exec rm -rf {} \\;
 `;
 }

package/dist/plugin/discord-client.d.ts CHANGED

@@ -5,7 +5,7 @@ export interface DiscordClientConfig {
     allowedUsers: string[];
 }
 export declare function shouldForwardMessage(messageChannelId: string, authorId: string, isBot: boolean, targetChannelId: string, allowedUsers: string[]): boolean;
-export declare function createDiscordClient(config: DiscordClientConfig, onMessage: (message: Message) => void): {
+export declare function createDiscordClient(config: DiscordClientConfig, onMessage: (content: string, message: Message) => void): {
     login: () => Promise<string>;
     client: Client<boolean>;
     sendMessage(channelId: string, text: string): Promise<void>;
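
The declaration above exposes `shouldForwardMessage`, the filter that lets several sessions share one bot while each hears only its own channel. Its body is mostly outside this diff, so the following is only a guess at the rule, written under a different name (`shouldForwardSketch`) to make that clear. It assumes a message is forwarded when it arrives in the session's channel, was not written by a bot, and comes from a user on the allow-list. How the real function treats an empty allow-list is not visible here.

```typescript
// Assumed filter logic; the real body of shouldForwardMessage is not shown in the diff.
function shouldForwardSketch(
  messageChannelId: string,
  authorId: string,
  isBot: boolean,
  targetChannelId: string,
  allowedUsers: string[],
): boolean {
  if (messageChannelId !== targetChannelId) return false; // another session's channel
  if (isBot) return false;                                // ignore bot output, including our own
  if (!allowedUsers.includes(authorId)) return false;     // assumption: allow-list is mandatory
  return true;
}
```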

package/dist/plugin/discord-client.js CHANGED

@@ -8,6 +8,25 @@ export function shouldForwardMessage(messageChannelId, authorId, isBot, targetCh
         return false;
     return true;
 }
+// When a message is too long, Discord auto-converts it to a .txt file attachment
+// with empty message content. This fetches the text from those attachments.
+async function resolveTextAttachments(message) {
+    let content = message.content;
+    const textAttachments = message.attachments.filter((a) => a.contentType?.startsWith('text/') || a.name?.endsWith('.txt'));
+    for (const attachment of textAttachments.values()) {
+        try {
+            const res = await fetch(attachment.url);
+            if (res.ok) {
+                const text = await res.text();
+                content = content ? `${content}\n\n${text}` : text;
+            }
+        }
+        catch (err) {
+            console.error(`[discord-filtered] Failed to fetch attachment ${attachment.name}: ${err}`);
+        }
+    }
+    return content;
+}
 export function createDiscordClient(config, onMessage) {
     const client = new Client({
         intents: [
@@ -16,9 +35,12 @@ export function createDiscordClient(config, onMessage) {
             GatewayIntentBits.MessageContent,
         ],
     });
-    client.on('messageCreate', (message) => {
+    client.on('messageCreate', async (message) => {
         if (shouldForwardMessage(message.channel.id, message.author.id, message.author.bot, config.channelId, config.allowedUsers)) {
-            onMessage(message);
+            const content = await resolveTextAttachments(message);
+            if (content) {
+                onMessage(content, message);
+            }
         }
     });
     client.on('ready', () => {
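The attachment handling added in this hunk comes down to two small decisions: which attachments count as text, and how fetched text is merged with any existing message content. A minimal sketch of both follows. `AttachmentLike`, `isTextAttachment` and `appendText` are hypothetical names introduced here for illustration; the package itself works on discord.js `Attachment` objects inside `resolveTextAttachments`.

```typescript
// Hypothetical minimal shape; the real code receives discord.js Attachment objects.
interface AttachmentLike {
  name: string | null;
  contentType: string | null;
}

// Same predicate as the filter in the diff: a text/* MIME type or a .txt file name.
function isTextAttachment(a: AttachmentLike): boolean {
  return Boolean(a.contentType?.startsWith("text/") || a.name?.endsWith(".txt"));
}

// Same merge rule as the diff: fetched text is appended after a blank line,
// or used on its own when the message body was empty.
function appendText(content: string, text: string): string {
  return content ? `${content}\n\n${text}` : text;
}
```

Because the handler only forwards when the merged content is non-empty, a message with an empty body and no readable text attachment is dropped silently.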

package/dist/plugin/index.js CHANGED

@@ -1,11 +1,13 @@
 #!/usr/bin/env bun
 import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
+import { execSync } from 'child_process';
 import { createMcpServer } from './mcp-server.js';
 import { createDiscordClient } from './discord-client.js';
 import { MessageBatcher } from './message-batcher.js';
 const BOT_TOKEN = process.env.DISCORD_BOT_TOKEN;
 const CHANNEL_ID = process.env.DISCORD_CHANNEL_ID;
 const ALLOWED_USERS = JSON.parse(process.env.DISCORD_ALLOWED_USERS || '[]');
+const TMUX_TARGET = process.env.TMUX_TARGET || '';
 if (!BOT_TOKEN) {
     console.error('[discord-filtered] DISCORD_BOT_TOKEN is required');
     process.exit(1);
@@ -14,11 +16,60 @@ if (!CHANNEL_ID) {
     console.error('[discord-filtered] DISCORD_CHANNEL_ID is required');
     process.exit(1);
 }
-const discord = createDiscordClient({ botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS }, async (message) => {
+function sendInterrupt() {
+    if (!TMUX_TARGET) {
+        console.error('[discord-filtered] !stop received but TMUX_TARGET not set — cannot interrupt');
+        return false;
+    }
+    try {
+        // Escape is Claude Code's interrupt key
+        execSync(`tmux send-keys -t ${JSON.stringify(TMUX_TARGET)} Escape`, { stdio: 'pipe' });
+        console.error(`[discord-filtered] Sent interrupt (Escape) to ${TMUX_TARGET}`);
+        return true;
+    }
+    catch (err) {
+        console.error(`[discord-filtered] Failed to send interrupt: ${err}`);
+        return false;
+    }
+}
+const discord = createDiscordClient({ botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS }, async (content, message) => {
+    // Instant acknowledgment — user knows the message reached the session
+    try {
+        await message.react('👀');
+    }
+    catch { /* ignore */ }
+    const isInterrupt = /^!stop\b/i.test(content);
+    if (isInterrupt) {
+        sendInterrupt();
+        // Strip the !stop prefix and forward the rest as a normal message
+        const rest = content.replace(/^!stop\s*/i, '').trim();
+        // React to confirm the interrupt was received
+        try {
+            await message.react('🛑');
+        }
+        catch { /* ignore */ }
+        // Small delay to let Claude Code process the Escape before the new message arrives
+        await new Promise(r => setTimeout(r, 1500));
+        // Forward the message (with or without remaining text)
+        await mcpServer.notification({
+            method: 'notifications/claude/channel',
+            params: {
+                content: rest || '[interrupted by user]',
+                meta: {
+                    channel_id: message.channel.id,
+                    sender: message.author.username,
+                    sender_id: message.author.id,
+                    message_id: message.id,
+                    interrupt: true,
+                },
+            },
+        });
+        return;
+    }
     await mcpServer.notification({
         method: 'notifications/claude/channel',
         params: {
-            content: message.content,
+            content: content,
             meta: {
                 channel_id: message.channel.id,
                 sender: message.author.username,
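The `!stop` handling above can be read as a small pure function: detect the prefix, strip it, and fall back to a placeholder when nothing is left. The sketch below isolates that logic using the two regular expressions from the hunk. `parseStop` and `StopCommand` are hypothetical names, and the tmux interrupt, reactions and 1500 ms delay are deliberately left out.

```typescript
interface StopCommand {
  isInterrupt: boolean;
  rest: string;
}

function parseStop(content: string): StopCommand {
  // \b means "!stop" must end at a word boundary, so "!stopping" is not a command.
  const isInterrupt = /^!stop\b/i.test(content);
  if (!isInterrupt) return { isInterrupt: false, rest: content };
  // Strip the prefix; an empty remainder becomes the placeholder used in the diff.
  const rest = content.replace(/^!stop\s*/i, "").trim();
  return { isInterrupt: true, rest: rest || "[interrupted by user]" };
}
```

For example, `parseStop("!STOP fix the tests")` yields an interrupt with `rest` set to `fix the tests`, while a bare `!stop` forwards `[interrupted by user]`.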

package/dist/plugin/message-batcher.js CHANGED

@@ -1,5 +1,4 @@
 const DISCORD_MAX_LENGTH = 2000;
-const TRUNCATION_SUFFIX = '\n... (truncated)';
 export class MessageBatcher {
     buffer = [];
     timer = null;
@@ -18,12 +17,55 @@ export class MessageBatcher {
     async flush() {
         if (this.buffer.length === 0)
             return;
-        let combined = this.buffer.join('\n');
+        const combined = this.buffer.join('\n');
         this.buffer = [];
         this.timer = null;
-        if (combined.length > DISCORD_MAX_LENGTH) {
-            combined = combined.slice(0, DISCORD_MAX_LENGTH - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX;
+        // Split into multiple messages instead of truncating
+        const chunks = splitMessage(combined);
+        for (const chunk of chunks) {
+            await this.sendFn(chunk);
         }
-        await this.sendFn(combined);
     }
 }
+// Split long text into Discord-safe chunks, preferring line breaks as split points
+function splitMessage(text) {
+    if (text.length <= DISCORD_MAX_LENGTH)
+        return [text];
+    const chunks = [];
+    let remaining = text;
+    while (remaining.length > 0) {
+        if (remaining.length <= DISCORD_MAX_LENGTH) {
+            chunks.push(remaining);
+            break;
+        }
+        // Find a good split point: prefer double newline, then single newline, then space
+        let splitAt = -1;
+        const searchWindow = remaining.slice(0, DISCORD_MAX_LENGTH);
+        // Try splitting at last paragraph break
+        const lastParagraph = searchWindow.lastIndexOf('\n\n');
+        if (lastParagraph > DISCORD_MAX_LENGTH * 0.3) {
+            splitAt = lastParagraph;
+        }
+        // Fall back to last line break
+        if (splitAt === -1) {
+            const lastNewline = searchWindow.lastIndexOf('\n');
+            if (lastNewline > DISCORD_MAX_LENGTH * 0.3) {
+                splitAt = lastNewline;
+            }
+        }
+        // Fall back to last space
+        if (splitAt === -1) {
+            const lastSpace = searchWindow.lastIndexOf(' ');
+            if (lastSpace > DISCORD_MAX_LENGTH * 0.3) {
+                splitAt = lastSpace;
+            }
+        }
+        // Hard split as last resort
+        if (splitAt === -1) {
+            splitAt = DISCORD_MAX_LENGTH;
+        }
+        chunks.push(remaining.slice(0, splitAt));
+        remaining = remaining.slice(splitAt).replace(/^\n+/, '');
+    }
+    return chunks;
+}
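To see how the split-point preference behaves, it helps to run the same algorithm with a small limit. The sketch below is a condensed restatement of `splitMessage` with the limit passed as a parameter; `splitAtLimit` is a hypothetical name and is not part of the package. It keeps the same order (paragraph break, newline, space, hard split) and the same 30% threshold.

```typescript
function splitAtLimit(text: string, limit: number): string[] {
  if (text.length <= limit) return [text];
  const chunks: string[] = [];
  let remaining = text;
  while (remaining.length > limit) {
    const searchWindow = remaining.slice(0, limit);
    // Same preference order as the diff: paragraph break, newline, space.
    let splitAt = -1;
    for (const sep of ["\n\n", "\n", " "]) {
      const idx = searchWindow.lastIndexOf(sep);
      if (idx > limit * 0.3) {
        splitAt = idx;
        break;
      }
    }
    if (splitAt === -1) splitAt = limit; // hard split as last resort
    chunks.push(remaining.slice(0, splitAt));
    // Only leading newlines are stripped; a leading space survives.
    remaining = remaining.slice(splitAt).replace(/^\n+/, "");
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}

// With a limit of 10:
// splitAtLimit("aaaa bbbb cccc", 10)     -> ["aaaa bbbb", " cccc"]
// splitAtLimit("line one\nline two", 10) -> ["line one", "line two"]
```

Two consequences follow from the original logic: a chunk that was split at a space starts with that space, and newlines at a split point are dropped, so joining the chunks does not always reproduce the input exactly.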

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "onkol",
-  "version": "0.4.0",
+  "version": "0.5.1",
   "description": "Decentralized on-call agent system powered by Claude Code",
   "type": "module",
   "bin": {

package/scripts/spawn-worker.sh CHANGED

@@ -9,6 +9,7 @@ while [[ $# -gt 0 ]]; do
     --task) TASK_DESC="$2"; shift 2 ;;
     --intent) INTENT="$2"; shift 2 ;;
     --context) CONTEXT="$2"; shift 2 ;;
+    --resume) RESUME_SESSION="$2"; shift 2 ;;
     *) echo "Unknown arg: $1"; exit 1 ;;
   esac
 done
@@ -19,6 +20,7 @@ done
 : "${TASK_DESC:?--task is required}"
 : "${INTENT:=fix}"
 : "${CONTEXT:=No additional context.}"
+: "${RESUME_SESSION:=}"
 # Load config
 ONKOL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
@@ -82,8 +84,7 @@ cat > "$WORKER_DIR/.mcp.json" << MCPEOF
       "env": {
         "DISCORD_BOT_TOKEN": "$BOT_TOKEN",
         "DISCORD_CHANNEL_ID": "$CHANNEL_ID",
-        "DISCORD_ALLOWED_USERS": "$ALLOWED_USERS_ESCAPED",
-        "TMUX_TARGET": "${TMUX_SESSION}:${WORKER_NAME}"
+        "DISCORD_ALLOWED_USERS": "$ALLOWED_USERS_ESCAPED"
       }
     }
   }
@@ -178,21 +179,23 @@ cat >> "$WORKER_DIR/CLAUDE.md" << STARTEOF
 Immediately when you start:
 1. Read $WORKER_DIR/task.md for your task
 2. Read $WORKER_DIR/context.md for context
-3. Use the \`reply\` tool to send "Starting work on: <brief task summary>" to Discord
-4. Begin work — send progress updates via \`reply\` every few steps
-5. When done, send your full results/summary via \`reply\` (split into <2000 char messages)
-6. For file deliverables, use \`replyWithFile\` to attach them
-IMPORTANT: The user CANNOT see your terminal. The ONLY way to communicate is the reply tool.
-If you complete work without sending results via reply, the user will never see your output.
+3. Begin work according to your intent
+4. Report progress and results using the reply tool to your Discord channel
 Do NOT wait for a message. Start working as soon as you boot.
 STARTEOF
+# Build the resume flags and initial prompt
+RESUME_FLAGS=""
+if [ -n "$RESUME_SESSION" ]; then
+  RESUME_FLAGS="--resume $RESUME_SESSION --fork-session"
+fi
 # Create a self-contained wrapper script with all paths baked in
 WRAPPER="$WORKER_DIR/start-worker.sh"
 cat > "$WRAPPER" << WRAPEOF
 #!/bin/bash
 TMUX_TARGET="${TMUX_SESSION}:${WORKER_NAME}"
+RESUMING="$RESUME_SESSION"
 # Auto-accept prompts in the background
 (
@@ -200,9 +203,14 @@ TMUX_TARGET="${TMUX_SESSION}:${WORKER_NAME}"
     sleep 2
     PANE_CONTENT=\$(tmux capture-pane -t "\$TMUX_TARGET" -p 2>/dev/null || echo "")
     if echo "\$PANE_CONTENT" | grep -q "^❯"; then
-      # Claude is ready — send the initial prompt via tmux keys
       sleep 1
-      tmux send-keys -t "\$TMUX_TARGET" "Read $WORKER_DIR/task.md and $WORKER_DIR/context.md, then begin work. IMPORTANT: You MUST use the reply tool from the discord-filtered MCP server for ALL communication — send a starting message now, progress updates as you work, and final results when done. The user cannot see your terminal." Enter
+      if [ -n "\$RESUMING" ]; then
+        # Resuming a previous session — tell it to continue and use the new Discord channel
+        tmux send-keys -t "\$TMUX_TARGET" "You have been resumed in a new session. Your Discord channel has changed — use the reply tool to communicate. Check $WORKER_DIR/task.md for your task. Continue where you left off and report progress via Discord." Enter
+      else
+        # Fresh session — send the initial task prompt
+        tmux send-keys -t "\$TMUX_TARGET" "Read $WORKER_DIR/task.md and $WORKER_DIR/context.md, then begin work per CLAUDE.md." Enter
+      fi
       break
     fi
     tmux send-keys -t "\$TMUX_TARGET" Enter 2>/dev/null || true
@@ -234,7 +242,8 @@ trap cleanup EXIT
 # and the auto-acceptor sends the first prompt via tmux keys once claude is ready)
 cd "$WORK_DIR" && claude \\
   --dangerously-skip-permissions \\
-  --dangerously-load-development-channels server:discord-filtered
+  --dangerously-load-development-channels server:discord-filtered \\
+  $RESUME_FLAGS
 WRAPEOF
 chmod +x "$WRAPPER"
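The effect of the new `--resume` option on the final `claude` invocation can be sketched as argument-list construction. This is an illustration only: `buildClaudeArgs` is a hypothetical helper, and the package assembles the flags as a shell string in `RESUME_FLAGS` rather than as an array. The flag names themselves are taken from the hunk above.

```typescript
function buildClaudeArgs(resumeSession: string): string[] {
  const args = [
    "--dangerously-skip-permissions",
    "--dangerously-load-development-channels",
    "server:discord-filtered",
  ];
  // An empty session ID means a fresh start, matching the `[ -n "$RESUME_SESSION" ]` test.
  if (resumeSession) {
    args.push("--resume", resumeSession, "--fork-session");
  }
  return args;
}
```

Passing the values as separate array elements sidesteps the word splitting that the unquoted `$RESUME_FLAGS` expansion relies on in the shell version.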

package/scripts/update-and-restart.sh ADDED

@@ -0,0 +1,183 @@
+#!/bin/bash
+# Update Onkol plugin + scripts from the latest npm package, then
+# dissolve all active workers and respawn them with --resume so they
+# keep their conversation history but pick up the new code.
+#
+# Usage:
+#   onkol-update                  # update + restart all workers
+#   onkol-update --skip-update    # just restart workers (no npm pull)
+#   onkol-update --workers-only   # alias for --skip-update
+set -uo pipefail
+ONKOL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
+CONFIG="$ONKOL_DIR/config.json"
+TRACKING="$ONKOL_DIR/workers/tracking.json"
+if [ ! -f "$CONFIG" ]; then
+  echo "ERROR: No config.json found at $ONKOL_DIR. Is Onkol installed here?"
+  exit 1
+fi
+NODE_NAME=$(jq -r '.nodeName' "$CONFIG")
+SKIP_UPDATE=false
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    --skip-update|--workers-only) SKIP_UPDATE=true; shift ;;
+    *) echo "Unknown arg: $1"; exit 1 ;;
+  esac
+done
+echo "=== Onkol Update & Restart ==="
+echo "Node: $NODE_NAME"
+echo "Install dir: $ONKOL_DIR"
+echo ""
+# ── Step 1: Update files from npm ──────────────────────────────────────────
+if [ "$SKIP_UPDATE" = false ]; then
+  echo "[1/3] Updating from latest npm package..."
+  # Create a temp dir, download latest package, extract the files we need
+  TMPDIR=$(mktemp -d)
+  trap "rm -rf $TMPDIR" EXIT
+  # Use npm pack to download the tarball without installing
+  if command -v npm &>/dev/null; then
+    npm pack onkol --pack-destination "$TMPDIR" &>/dev/null
+    TARBALL=$(ls "$TMPDIR"/onkol-*.tgz 2>/dev/null | head -1)
+  fi
+  if [ -z "${TARBALL:-}" ] || [ ! -f "${TARBALL:-}" ]; then
+    echo "WARNING: Could not download npm package. Trying npx..."
+    # Fallback: use npx to find the package cache
+    npx --yes onkol@latest --help &>/dev/null 2>&1
+    PKG_DIR=$(find ~/.npm/_npx -name "onkol" -path "*/node_modules/*" -type d 2>/dev/null | head -1)
+    if [ -z "$PKG_DIR" ]; then
+      echo "ERROR: Could not find onkol package. Skipping update."
+      echo "You can update manually: copy plugin/ and scripts/ from the repo."
+      SKIP_UPDATE=true
+    fi
+  fi
+  if [ "$SKIP_UPDATE" = false ]; then
+    if [ -n "${TARBALL:-}" ] && [ -f "${TARBALL:-}" ]; then
+      # Extract from tarball
+      tar xzf "$TARBALL" -C "$TMPDIR"
+      PKG_DIR="$TMPDIR/package"
+    fi
+    if [ -d "$PKG_DIR" ]; then
+      # Update plugin files
+      if [ -d "$PKG_DIR/src/plugin" ]; then
+        cp "$PKG_DIR/src/plugin/"*.ts "$ONKOL_DIR/plugins/discord-filtered/" 2>/dev/null && \
+          echo "  ✓ Plugin files updated"
+      elif [ -d "$PKG_DIR/dist/plugin" ]; then
+        cp "$PKG_DIR/dist/plugin/"*.js "$ONKOL_DIR/plugins/discord-filtered/" 2>/dev/null && \
+          echo "  ✓ Plugin files updated (dist)"
+      fi
+      # Update scripts
+      if [ -d "$PKG_DIR/scripts" ]; then
+        for script in "$PKG_DIR/scripts/"*.sh; do
+          name=$(basename "$script")
+          cp "$script" "$ONKOL_DIR/scripts/$name"
+          chmod +x "$ONKOL_DIR/scripts/$name"
+        done
+        echo "  ✓ Scripts updated"
+      fi
+      echo "  Done."
+    fi
+  fi
+else
+  echo "[1/3] Skipping update (--skip-update)"
+fi
+echo ""
+# ── Step 2: Dissolve active workers (saving session IDs) ──────────────────
+echo "[2/3] Dissolving active workers..."
+if [ ! -f "$TRACKING" ] || [ "$(jq length "$TRACKING" 2>/dev/null)" -eq 0 ]; then
+  echo "  No active workers to restart."
+  echo ""
+  echo "=== Update complete. No workers to restart. ==="
+  exit 0
+fi
+# Build a list of workers with their session IDs before dissolving
+declare -a WORKER_NAMES=()
+declare -a WORKER_DIRS=()
+declare -a WORKER_INTENTS=()
+declare -a WORKER_SESSIONS=()
+while IFS= read -r line; do
+  W_NAME=$(echo "$line" | jq -r '.name')
+  W_DIR=$(echo "$line" | jq -r '.workDir')
+  W_INTENT=$(echo "$line" | jq -r '.intent')
+  # Find the latest session ID for this worker's project directory
+  # Claude Code stores sessions in ~/.claude/projects/<encoded-path>/
+  ENCODED_PATH=$(echo "$W_DIR" | sed 's|^/||; s|/|-|g; s|^|-|')
+  SESSION_DIR="$HOME/.claude/projects/$ENCODED_PATH"
+  SESSION_ID=""
+  if [ -d "$SESSION_DIR" ]; then
+    LATEST_JSONL=$(find "$SESSION_DIR" -maxdepth 1 -name "*.jsonl" \
+      -not -path "*/subagents/*" -printf '%T@ %f\n' 2>/dev/null \
+      | sort -n | tail -1 | awk '{print $2}')
+    if [ -n "$LATEST_JSONL" ]; then
+      SESSION_ID="${LATEST_JSONL%.jsonl}"
+    fi
+  fi
+  WORKER_NAMES+=("$W_NAME")
+  WORKER_DIRS+=("$W_DIR")
+  WORKER_INTENTS+=("$W_INTENT")
+  WORKER_SESSIONS+=("$SESSION_ID")
+  echo "  $W_NAME → session: ${SESSION_ID:-none}"
+done < <(jq -c '.[] | select(.status == "active")' "$TRACKING")
+echo ""
+# Dissolve all workers
+for name in "${WORKER_NAMES[@]}"; do
+  "$ONKOL_DIR/scripts/dissolve-worker.sh" --name "$name" 2>&1 | sed 's/^/  /'
+done
+echo ""
+# ── Step 3: Respawn with --resume ──────────────────────────────────────────
+echo "[3/3] Respawning workers with --resume..."
+for i in "${!WORKER_NAMES[@]}"; do
+  W_NAME="${WORKER_NAMES[$i]}"
+  W_DIR="${WORKER_DIRS[$i]}"
+  W_INTENT="${WORKER_INTENTS[$i]}"
+  W_SESSION="${WORKER_SESSIONS[$i]}"
+  RESUME_ARG=""
+  if [ -n "$W_SESSION" ]; then
+    RESUME_ARG="--resume $W_SESSION"
+  fi
+  echo "  Spawning $W_NAME (intent: $W_INTENT, resume: ${W_SESSION:-fresh})..."
+  "$ONKOL_DIR/scripts/spawn-worker.sh" \
+    --name "$W_NAME" \
+    --dir "$W_DIR" \
+    --task "Continue the previous work. Check your conversation history for context." \
+    --intent "$W_INTENT" \
+    $RESUME_ARG 2>&1 | sed 's/^/    /'
+  # Small delay between spawns to avoid Discord rate limits
+  sleep 2
+done
+echo ""
+echo "=== Update complete. ${#WORKER_NAMES[@]} worker(s) restarted. ==="
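The session lookup in step 2 depends on two string transformations: the `sed` expression that turns a worker directory into a project directory name, and the `${LATEST_JSONL%.jsonl}` expansion that turns a file name into a session ID. A minimal sketch of both follows, with hypothetical function names. It mirrors only what the script does; whether Claude Code names its project directories this way for every path (for example, paths containing characters other than slashes) is not confirmed by the diff.

```typescript
// Mirrors: sed 's|^/||; s|/|-|g; s|^|-|'
function encodeProjectPath(workDir: string): string {
  return "-" + workDir.replace(/^\//, "").replace(/\//g, "-");
}

// Mirrors: "${LATEST_JSONL%.jsonl}" (strip a trailing .jsonl suffix)
function sessionIdFromFile(fileName: string): string {
  return fileName.replace(/\.jsonl$/, "");
}

// Example with a hypothetical worker directory:
// encodeProjectPath("/home/dev/app") -> "-home-dev-app"
```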

package/scripts/worker-watchdog.sh CHANGED

@@ -76,15 +76,18 @@ llm_analyze() {
 Keys:
 - status: one of: working, done_replied, done_silent, error, idle
-- action: one of: none, nudge_reply, nudge_error, nudge_idle
+- action: one of: none, nudge_reply, nudge_error, nudge_idle, progress_update
 - reason: one short sentence explaining your assessment
+- summary: (ONLY when action is progress_update) A brief 1-2 sentence user-facing summary of what the worker is currently doing. Be specific — mention file names, tools being run, or operations in progress. Example: \"Reading agent config files and analyzing the call flow pipeline.\" or \"Running TypeScript type-check after modifying 4 frontend components.\"
-Rules:
-- working: Claude is actively executing tools, thinking, or generating output. Action: none
-- done_replied: Worker finished AND used the discord-filtered reply MCP tool (you'll see 'discord-filtered - reply (MCP)' with result 'sent'). Action: none
-- done_silent: Worker finished work (wrote files, completed analysis, etc.) but NEVER used the reply MCP tool to send results to Discord. Action: nudge_reply
-- error: Worker hit a fatal error and stopped (Traceback, FATAL, crash at the prompt). Action: nudge_error. Note: errors from EARLIER that the worker recovered from do NOT count.
-- idle: Worker is sitting at the prompt with no clear completion or error. Action: nudge_idle"
+Rules (check in this order):
+1. done_replied: If ANYWHERE in the output you see 'discord-filtered - reply (MCP)' or 'discord-filtered - reply_with_file (MCP)' followed by 'sent', the worker HAS replied. Status=done_replied, Action=none. This takes priority — even if the worker is now idle at the prompt, if it replied earlier it is done_replied NOT idle.
+2. working: Claude is actively executing tools, thinking, or generating output (not at the idle prompt). Action=progress_update. Include a summary field.
+3. error: Worker hit a fatal error and stopped (Traceback, FATAL, crash at the prompt). Action: nudge_error. Errors from EARLIER that the worker recovered from do NOT count — only errors right before the current prompt.
+4. done_silent: Worker finished work (wrote files, completed analysis, etc.) but NEVER used the reply MCP tool anywhere in the visible output. Action: nudge_reply
+5. idle: Worker is sitting at the prompt with no clear completion, no error, and no reply tool usage. Action: nudge_idle
+CRITICAL: If you see ANY 'discord-filtered - reply (MCP)' with 'sent' in the output, the answer is ALWAYS done_replied with action none, regardless of current prompt state."
   # Use jq to build the payload — handles all JSON escaping correctly
   local payload
@@ -102,7 +105,7 @@ ${pane_content}" \
         {role: "user", content: $user}
       ],
       temperature: 0,
-      max_tokens: 150
+      max_tokens: 250
     }')
   local response
@@ -151,9 +154,9 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
     continue
   fi
-  # Check nudge cooldown (don't analyze more than once per 10 minutes per worker)
+  # Check nudge cooldown (don't analyze more than once per 3 minutes per worker)
   NUDGE_FLAG="$WORKER_DIR/.watchdog-last-nudge"
-  if [ -f "$NUDGE_FLAG" ] && [ -z "$(find "$NUDGE_FLAG" -mmin +10 2>/dev/null)" ]; then
+  if [ -f "$NUDGE_FLAG" ] && [ -z "$(find "$NUDGE_FLAG" -mmin +3 2>/dev/null)" ]; then
     continue
   fi
@@ -163,7 +166,16 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
   STATUS=$(echo "$ANALYSIS" | jq -r '.status // "unknown"')
   REASON=$(echo "$ANALYSIS" | jq -r '.reason // ""')
+  SUMMARY=$(echo "$ANALYSIS" | jq -r '.summary // ""')
   case "$ACTION" in
+    progress_update)
+      # Worker is actively working — post a progress summary to its channel
+      if [ -n "$SUMMARY" ]; then
+        touch "$NUDGE_FLAG"
+        discord_msg "$WORKER_CHANNEL" "⏳ $SUMMARY"
+      fi
+      ;;
     nudge_reply)
       touch "$NUDGE_FLAG"
       tmux send-keys -t "$TMUX_TARGET" \
@@ -186,7 +198,7 @@ jq -r '.[] | select(.status == "active") | .name' "$TRACKING" | while read -r WO
         "[watchdog] Worker **${WORKER}** — $REASON. Nudged it to respond."
       ;;
     none|*)
-      # Worker is fine (working or already replied) — do nothing
+      # Worker is fine (already replied) — do nothing
       ;;
   esac
 done
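The watchdog changes combine a shorter cooldown with a new `progress_update` action. The sketch below models those two checks as pure functions. The names are hypothetical, and timestamps stand in for the `.watchdog-last-nudge` flag file's modification time. It approximates the `find -mmin +3` test and does not reproduce `find`'s exact rounding.

```typescript
const COOLDOWN_MINUTES = 3;

// True while the last nudge is still inside the cooldown window.
// null stands for "no flag file yet", which never blocks analysis.
function inCooldown(lastNudgeMs: number | null, nowMs: number): boolean {
  if (lastNudgeMs === null) return false;
  return nowMs - lastNudgeMs <= COOLDOWN_MINUTES * 60000;
}

// Returns the Discord message for a progress update, or null when nothing is posted.
// As in the script, an empty summary suppresses the message.
function progressMessage(action: string, summary: string): string | null {
  if (action !== "progress_update" || summary === "") return null;
  return `⏳ ${summary}`;
}
```

Since posting a progress update also touches the flag file, a busy worker reports at most once per cooldown window.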

package/src/plugin/discord-client.ts CHANGED

@@ -1,4 +1,4 @@
-import { Client, GatewayIntentBits, type Message } from 'discord.js'
+import { Client, GatewayIntentBits, type Message, type Attachment } from 'discord.js'
 export interface DiscordClientConfig {
   botToken: string
@@ -19,9 +19,30 @@ export function shouldForwardMessage(
   return true
 }
+// When a message is too long, Discord auto-converts it to a .txt file attachment
+// with empty message content. This fetches the text from those attachments.
+async function resolveTextAttachments(message: Message): Promise<string> {
+  let content = message.content
+  const textAttachments = message.attachments.filter(
+    (a: Attachment) => a.contentType?.startsWith('text/') || a.name?.endsWith('.txt')
+  )
+  for (const attachment of textAttachments.values()) {
+    try {
+      const res = await fetch(attachment.url)
+      if (res.ok) {
+        const text = await res.text()
+        content = content ? `${content}\n\n${text}` : text
+      }
+    } catch (err) {
+      console.error(`[discord-filtered] Failed to fetch attachment ${attachment.name}: ${err}`)
+    }
+  }
+  return content
+}
 export function createDiscordClient(
   config: DiscordClientConfig,
-  onMessage: (message: Message) => void
+  onMessage: (content: string, message: Message) => void
 ) {
   const client = new Client({
     intents: [
@@ -31,7 +52,7 @@ export function createDiscordClient(
     ],
   })
-  client.on('messageCreate', (message) => {
+  client.on('messageCreate', async (message) => {
     if (
       shouldForwardMessage(
         message.channel.id,
@@ -41,7 +62,10 @@ export function createDiscordClient(
         config.allowedUsers
       )
     ) {
-      onMessage(message)
+      const content = await resolveTextAttachments(message)
+      if (content) {
+        onMessage(content, message)
+      }
     }
   })

package/src/plugin/index.ts CHANGED

@@ -37,8 +37,10 @@ function sendInterrupt(): boolean {
 const discord = createDiscordClient(
   { botToken: BOT_TOKEN, channelId: CHANNEL_ID, allowedUsers: ALLOWED_USERS },
-  async (message) => {
-    const content = message.content
+  async (content, message) => {
+    // Instant acknowledgment — user knows the message reached the session
+    try { await message.react('👀') } catch { /* ignore */ }
     const isInterrupt = /^!stop\b/i.test(content)
     if (isInterrupt) {

package/src/plugin/message-batcher.ts CHANGED

@@ -1,5 +1,4 @@
 const DISCORD_MAX_LENGTH = 2000
-const TRUNCATION_SUFFIX = '\n... (truncated)'
 export class MessageBatcher {
   private buffer: string[] = []
@@ -20,14 +19,65 @@ export class MessageBatcher {
   private async flush(): Promise<void> {
     if (this.buffer.length === 0) return
-    let combined = this.buffer.join('\n')
+    const combined = this.buffer.join('\n')
     this.buffer = []
     this.timer = null
-    if (combined.length > DISCORD_MAX_LENGTH) {
-      combined = combined.slice(0, DISCORD_MAX_LENGTH - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX
+    // Split into multiple messages instead of truncating
+    const chunks = splitMessage(combined)
+    for (const chunk of chunks) {
+      await this.sendFn(chunk)
+    }
+  }
+}
+// Split long text into Discord-safe chunks, preferring line breaks as split points
+function splitMessage(text: string): string[] {
+  if (text.length <= DISCORD_MAX_LENGTH) return [text]
+  const chunks: string[] = []
+  let remaining = text
+  while (remaining.length > 0) {
+    if (remaining.length <= DISCORD_MAX_LENGTH) {
+      chunks.push(remaining)
+      break
+    }
+    // Find a good split point: prefer double newline, then single newline, then space
+    let splitAt = -1
+    const searchWindow = remaining.slice(0, DISCORD_MAX_LENGTH)
+    // Try splitting at last paragraph break
+    const lastParagraph = searchWindow.lastIndexOf('\n\n')
+    if (lastParagraph > DISCORD_MAX_LENGTH * 0.3) {
+      splitAt = lastParagraph
+    }
+    // Fall back to last line break
+    if (splitAt === -1) {
+      const lastNewline = searchWindow.lastIndexOf('\n')
+      if (lastNewline > DISCORD_MAX_LENGTH * 0.3) {
+        splitAt = lastNewline
+      }
+    }
+    // Fall back to last space
+    if (splitAt === -1) {
+      const lastSpace = searchWindow.lastIndexOf(' ')
+      if (lastSpace > DISCORD_MAX_LENGTH * 0.3) {
+        splitAt = lastSpace
+      }
+    }
+    // Hard split as last resort
+    if (splitAt === -1) {
+      splitAt = DISCORD_MAX_LENGTH
     }
-    await this.sendFn(combined)
+    chunks.push(remaining.slice(0, splitAt))
+    remaining = remaining.slice(splitAt).replace(/^\n+/, '')
   }
+  return chunks
 }