onkol 0.3.0 → 0.5.0

Files changed (19)

package/README.md +322 -0
package/dist/cli/discord-api.d.ts +20 -0
package/dist/cli/discord-api.js +102 -0
package/dist/cli/index.js +205 -10
package/dist/cli/prompts.d.ts +4 -0
package/dist/cli/prompts.js +65 -0
package/dist/cli/systemd.js +22 -3
package/dist/plugin/discord-client.d.ts +1 -1
package/dist/plugin/discord-client.js +24 -2
package/dist/plugin/index.js +53 -2
package/dist/plugin/message-batcher.js +47 -5
package/package.json +7 -5
package/scripts/spawn-worker.sh +18 -3
package/scripts/start-orchestrator.sh +60 -13
package/scripts/update-and-restart.sh +183 -0
package/scripts/worker-watchdog.sh +204 -0
package/src/plugin/discord-client.ts +28 -4
package/src/plugin/index.ts +50 -2
package/src/plugin/message-batcher.ts +55 -5

package/README.md ADDED

@@ -0,0 +1,322 @@
+# Onkol
+Your AI on-call team. One command per VM, and you get an autonomous agent on Discord that handles bugs, features, analysis, and ops so you don't have to.
+Onkol turns Claude Code into a decentralized on-call system. Each VM runs an orchestrator that listens on Discord. You describe a problem in plain English, it spins up a dedicated worker session to solve it, and reports back when it's done.
+## How it works
+```
+You on Discord:  "the auth endpoint is returning 403 after token refresh"
+                              |
+                    Orchestrator (Claude Code)
+                    reads your message, understands intent,
+                    prepares context, spawns a worker
+                              |
+                    Worker (new Claude Code session)
+                    diagnoses the bug, fixes auth.py,
+                    runs tests, commits to a branch
+                              |
+You on Discord:  "Fixed. Clock skew between auth server and app server.
+                  Added 5s tolerance. Tests pass. Branch: fix/auth-403"
+```
+**What makes it different:**
+- **Decentralized.** Each VM is self-contained. No central server. 10 VMs = 10 independent agents.
+- **Intent-driven.** Say "fix this" and it fixes autonomously. Say "look into this" and it investigates without touching code. Your phrasing controls the behavior.
+- **Gets smarter.** Every resolved task leaves behind a learning. Next time a similar issue comes up, the agent already knows what to look for.
+- **Works behind firewalls.** All connections are outbound to Discord. No inbound ports, no SSH tunnels, no VPN required.
+## Real-world setup
+The intended way to use Onkol is with a **dedicated Discord server** that becomes your ops control center.
+I manage about 10 applications across prod and staging. I created one Discord server and set it up exclusively for Onkol. Each VM I onboard creates its own category with an orchestrator channel. My Discord sidebar looks like this:
+```
+MY-INFRA (Discord server)
+│
+├── API-SERVER-PROD                ← VM running in GCP
+│   ├── #orchestrator              ← talk to this VM's brain here
+│   ├── #fix-auth-403              ← active worker (auto-created)
+│   └── #analyze-error-logs        ← active worker (auto-created)
+│
+├── WEB-APP-STAGING                ← VM running in AWS
+│   └── #orchestrator
+│
+├── BACKEND-PROD                   ← VM behind corporate VPN
+│   ├── #orchestrator
+│   └── #add-export-endpoint       ← active worker
+│
+├── DATA-PIPELINE-STAGING          ← Another GCP VM
+│   └── #orchestrator
+│
+└── ... (as many VMs as you have)
+```
+### The workflow
+From your phone, laptop, or anywhere with Discord:
+1. Open the server, go to `#orchestrator` under the VM you care about
+2. Type what you need: "there's a bug where users get 403 after token refresh"
+3. The orchestrator creates a new channel `#fix-auth-403` and spawns a worker
+4. The worker posts its progress and findings in `#fix-auth-403`
+5. You can jump into that channel to give more context or redirect
+6. When it's done, the orchestrator dissolves the worker, the channel disappears, learnings are saved
+You can do this from a party, a flight, or bed at 2 AM. You're just texting on Discord. The agent does the SSH, the debugging, the code reading, the fixing.
+### Multiple VMs, one view
+Every VM is a category. Every task is a channel. You see your entire infrastructure at a glance in the Discord sidebar. No dashboards to build, no web apps to deploy. Discord IS the dashboard.
+The VMs don't need to know about each other. Each one connects outbound to Discord independently. If a VM is behind a VPN you can only reach from one specific laptop, doesn't matter. As long as it has outbound HTTPS, it can connect to Discord and you can talk to it.
+### Setting up a new VM
+```bash
+# SSH into the VM (one time only)
+ssh user@my-new-vm
+# Run setup (2 minutes)
+npx onkol@latest setup
+# Answer the questions, done.
+# A new category appears in your Discord server.
+# You never need to SSH into this VM again.
+```
+## Quick start
+### Prerequisites
+You need these on the VM where you're setting up:
+| Tool | Why | Install |
+|------|-----|---------|
+| **Node.js 18+** | Runs the setup CLI | [nodejs.org](https://nodejs.org) |
+| **Bun** | Runs the Discord channel plugin | `curl -fsSL https://bun.sh/install \| bash` |
+| **Claude Code** | The AI that does the work | [docs.anthropic.com](https://docs.anthropic.com/en/docs/claude-code/getting-started) |
+| **tmux** | Keeps sessions alive | `apt install tmux` / `yum install tmux` |
+| **jq** | JSON processing in scripts | `apt install jq` / `yum install jq` |
+Claude Code must be logged in via `claude.ai` OAuth on the VM (not API key).
+The setup wizard checks all dependencies before asking any questions. If something's missing, it tells you exactly what to install and exits without wasting your time.
+### Create a Discord bot
+1. Go to [discord.com/developers/applications](https://discord.com/developers/applications)
+2. New Application, name it, Create
+3. Bot, Reset Token, **copy it** (you only see it once)
+4. Bot, Privileged Gateway Intents, enable **Message Content Intent**, Save
+5. OAuth2, URL Generator, check `bot`, check permissions:
+   - View Channels, Send Messages, Read Message History, Attach Files, Manage Channels
+6. Copy the URL, open in browser, invite to your Discord server
+The setup wizard validates your bot token and checks that Message Content Intent is enabled before proceeding. If something's wrong, it tells you exactly what to fix.
+### Run setup
+```bash
+npx onkol@latest setup
+```
+The wizard walks you through everything:
+```
+Welcome to Onkol Setup
+Checking dependencies...
+  ✓ claude
+  ✓ bun
+  ✓ tmux
+  ✓ jq
+  ✓ curl
+  All dependencies found.
+✔ Where should Onkol live? ~/onkol
+✔ What should this node be called? api-server-prod
+✔ Discord bot token: ****
+✔ Discord server (guild) ID: 1234567890
+✔ Your Discord user ID: 9876543210
+✔ Registry file? Write a prompt — tell Claude what to find
+✔ Describe: Find the API endpoints and database URLs from .env
+✔ Service summary? Auto-discover
+✔ CLAUDE.md? Yes — This is a Node.js API server deployed via docker...
+✔ Plugins? context7, superpowers, code-simplifier
+✓ Bot token is valid
+✓ Message Content intent is enabled
+✓ Discord category and #orchestrator channel created
+✓ 6 scripts installed
+✓ Plugin installed with 4 files + dependencies
+✓ Systemd service installed and enabled
+✓ Orchestrator started in tmux session "onkol-api-server-prod"
+✓ Onkol node "api-server-prod" is live!
+```
+Go to your Discord server. You'll see a new category with an `#orchestrator` channel. Send it a message.
+## Usage
+### Talking to the orchestrator
+The orchestrator lives in the `#orchestrator` channel of your node's category. It reads your intent from how you phrase things:
+| You say | What happens |
+|---------|-------------|
+| "fix the 403 bug in auth" | Spawns a worker that diagnoses, fixes, tests, and commits |
+| "look into why response times are high" | Spawns a worker that investigates and reports, no code changes |
+| "add retry logic to the webhook handler" | Spawns a worker that implements, tests, and waits for your approval |
+| "analyze transferred calls for the last 3 weeks" | Spawns a worker that reads logs/data and produces an analysis |
+| "just ship it" | Fully autonomous, pushes and deploys (asks for confirmation first) |
+### How workers work
+When the orchestrator spawns a worker:
+1. A new Discord channel appears in your category (e.g., `#fix-auth-bug`)
+2. A new Claude Code session starts in tmux on the VM
+3. The worker posts progress and results in its Discord channel
+4. You can talk to the worker directly in that channel
+5. When done, tell the orchestrator to dissolve it. The channel disappears, learnings are saved.
+### Managing workers
+From the orchestrator channel:
+- "dissolve fix-auth-bug" kills the worker, saves learnings, deletes channel
+- "list workers" shows all active workers
+- "check on fix-auth-bug" gets the worker's current status
+### Setup prompts
+During setup, you can describe things in plain English instead of providing config files:
+- **Registry**: "Find the API endpoints from .env and the S3 bucket from AWS CLI"
+- **Services**: Auto-discovers running services, or you describe what to look for
+- **CLAUDE.md**: "This is a Node.js API server, Express, deployed via docker..."
+The orchestrator executes these prompts on first boot and generates the structured files.
+## Architecture
+```
+Your Discord Server
+├── Category: api-server-prod           ← VM 1
+│   ├── #orchestrator                   ← persistent Claude Code session
+│   ├── #fix-auth-bug                   ← worker (temporary)
+│   └── #analyze-error-logs             ← worker (temporary)
+├── Category: web-app-staging           ← VM 2
+│   └── #orchestrator
+└── Category: backend-prod              ← VM 3
+    └── #orchestrator
+```
+Each VM runs independently:
+- **Orchestrator.** Long-running Claude Code session in tmux. Receives Discord messages, spawns workers, manages lifecycle.
+- **Workers.** Ephemeral Claude Code sessions. One per task. Each gets its own Discord channel, its own context, its own instructions.
+- **discord-filtered plugin.** Custom MCP channel server that routes Discord messages by channel ID. All sessions share one bot but each only hears its own channel.
+### On-disk structure
+```
+~/onkol/
+├── config.json          # Node config (bot token, server ID, etc.)
+├── registry.json        # VM-specific secrets, endpoints, ports
+├── services.md          # What runs on this VM
+├── CLAUDE.md            # Orchestrator instructions
+├── knowledge/           # Learnings from dissolved workers
+│   ├── index.json
+│   └── 2026-03-22-fix-auth-clock-skew.md
+├── workers/
+│   ├── tracking.json    # Active workers
+│   └── fix-auth-bug/    # Worker directory (while active)
+├── scripts/             # Lifecycle scripts
+└── plugins/
+    └── discord-filtered/  # MCP channel plugin
+```
+### Knowledge base
+Every dissolved worker leaves behind a learning:
+```markdown
+## What happened
+Token validation rejected valid tokens for 2-3 seconds after refresh.
+## Root cause
+No clock skew tolerance between auth server and app server.
+## Fix
+Added 5-second CLOCK_SKEW_TOLERANCE in auth.py:47.
+## For next time
+If 403 errors appear after token operations, check clock sync first.
+```
+The orchestrator includes relevant past learnings when spawning new workers. The system gets better at diagnosing issues over time.
+## Resumable setup
+If setup fails midway (missing dependency, network error, wrong bot token), your answers are saved automatically. Next time you run `npx onkol setup`, it offers to resume:
+```
+? Found a previous setup attempt (4 steps completed). What do you want to do?
+  ❯ Resume from where it left off (node: api-server-prod)
+    Start fresh
+```
+No re-entering bot tokens or server IDs. It picks up right where it left off.
+## Commands
+```bash
+npx onkol setup          # Interactive setup wizard
+npx onkol@latest setup   # Force latest version
+```
+On the VM after setup:
+```bash
+# Attach to the orchestrator
+tmux attach -t onkol-<node-name>
+# Check service status
+systemctl status onkol-<node-name>
+# Restart orchestrator
+sudo systemctl restart onkol-<node-name>
+# View active workers
+bash ~/onkol/scripts/list-workers.sh
+# Manually dissolve a worker
+bash ~/onkol/scripts/dissolve-worker.sh --name "worker-name"
+```
+## Requirements
+- Claude Code with `claude.ai` OAuth login (Max plan recommended for concurrent sessions)
+- Node.js 18+ and Bun on each VM
+- tmux and jq on each VM
+- A Discord server with a bot that has Manage Channels permission
+- VMs need outbound HTTPS access (no inbound ports needed)
+## How it's built
+| Component | Tech | Lines |
+|-----------|------|-------|
+| Setup wizard | Node.js, TypeScript, Inquirer | ~500 |
+| Discord channel plugin | Bun, MCP SDK, discord.js | ~300 |
+| Worker lifecycle scripts | Bash | ~400 |
+| Orchestrator/worker templates | Handlebars | ~150 |
+The core mechanism is [Claude Code Channels](https://code.claude.com/docs/en/channels), an MCP-based system that pushes Discord messages into Claude Code sessions. The `discord-filtered` plugin is a custom channel that routes by Discord channel ID, allowing multiple sessions to share one bot.
+## License
+MIT
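
Reviewer note (not part of the package): the README says the `discord-filtered` plugin lets several sessions share one bot while each session only hears its own channel. The plugin source is not shown in this part of the diff, so the following is only a minimal sketch of that idea. The `IncomingMessage` type and the `shouldDeliver` function are hypothetical names chosen for illustration, and the channel IDs are made up.

```typescript
// Hypothetical shape of an incoming Discord message; only the fields the filter needs.
interface IncomingMessage {
  channelId: string;
  content: string;
}

// Returns true only when the message was posted in the channel this session owns.
// Every session would run the same check with its own channel ID.
function shouldDeliver(msg: IncomingMessage, ownChannelId: string): boolean {
  return msg.channelId === ownChannelId;
}

// Two sessions sharing one bot (IDs are illustrative).
const orchestratorChannel = "100";
const workerChannel = "200";

const incoming: IncomingMessage = { channelId: "200", content: "try the staging DB" };

console.log(shouldDeliver(incoming, orchestratorChannel)); // false
console.log(shouldDeliver(incoming, workerChannel)); // true
```

Under this assumption, a message posted in a worker channel never reaches the orchestrator session, which matches the per-channel isolation the README describes.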

package/dist/cli/discord-api.d.ts CHANGED

@@ -17,3 +17,23 @@ export declare function createChannel(token: string, guildId: string, name: stri
 }>;
 export declare function deleteChannel(token: string, channelId: string): Promise<void>;
 export declare function sendMessage(token: string, channelId: string, content: string): Promise<void>;
+/**
+ * Validates the bot token and checks if it can connect to the Discord gateway
+ * with the required intents (Guilds, GuildMessages, MessageContent).
+ * Returns { ok: true } or { ok: false, error: string }.
+ */
+export declare function validateBotToken(token: string): Promise<{
+    ok: true;
+} | {
+    ok: false;
+    error: string;
+}>;
+/**
+ * Performs a lightweight check for MessageContent intent by attempting a
+ * test gateway connection. Returns a warning message if the intent appears
+ * to be disabled, or null if everything looks good.
+ *
+ * Note: The Discord REST API doesn't expose which intents are enabled.
+ * We do a quick WebSocket handshake to the gateway to detect DisallowedIntents.
+ */
+export declare function checkGatewayIntents(token: string): Promise<string | null>;
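
Reviewer note (not part of the package): the two new declarations return different shapes, a discriminated union for `validateBotToken` and `string | null` for `checkGatewayIntents`. The sketch below shows one way a caller might consume both. `TokenCheck` and `summarizeChecks` are hypothetical names, and the success strings are copied from the wizard output in the README rather than from the CLI source, which is not shown here.

```typescript
// Same result shape as the validateBotToken declaration.
type TokenCheck = { ok: true } | { ok: false; error: string };

// Hypothetical helper: turns the two check results into lines a setup wizard could print.
function summarizeChecks(tokenCheck: TokenCheck, intentWarning: string | null): string[] {
  if (tokenCheck.ok === false) {
    // Narrowed to the failure branch, so `error` is available here.
    return [`✗ ${tokenCheck.error}`];
  }
  const lines = ["✓ Bot token is valid"];
  // A null warning means the intent check found nothing to report.
  lines.push(intentWarning === null ? "✓ Message Content intent is enabled" : `✗ ${intentWarning}`);
  return lines;
}

console.log(summarizeChecks({ ok: false, error: "Invalid bot token." }, null));
console.log(summarizeChecks({ ok: true }, null));
```

Comparing `ok` against the literal `false` keeps the narrowing explicit, so the `error` field is only read on the failure branch.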

package/dist/cli/discord-api.js CHANGED

@@ -51,3 +51,105 @@ export async function sendMessage(token, channelId, content) {
     if (!res.ok)
         throw new Error(`Failed to send message: ${res.status} ${await res.text()}`);
 }
+/**
+ * Validates the bot token and checks if it can connect to the Discord gateway
+ * with the required intents (Guilds, GuildMessages, MessageContent).
+ * Returns { ok: true } or { ok: false, error: string }.
+ */
+export async function validateBotToken(token) {
+    // Step 1: Check the token is valid via /users/@me
+    const meRes = await fetch(`${DISCORD_API}/users/@me`, {
+        headers: { Authorization: `Bot ${token}` },
+    });
+    if (!meRes.ok) {
+        const body = await meRes.text();
+        if (meRes.status === 401)
+            return { ok: false, error: 'Invalid bot token.' };
+        return { ok: false, error: `Discord API error (${meRes.status}): ${body}` };
+    }
+    // Step 2: Get the bot's application to check if it's a bot token
+    const me = await meRes.json();
+    if (!me.bot)
+        return { ok: false, error: 'This token belongs to a user account, not a bot.' };
+    // Step 3: Try connecting to the gateway with the required intents to check for DisallowedIntents
+    // Intents: Guilds (1) | GuildMessages (512) | MessageContent (32768) = 33281
+    const gatewayRes = await fetch(`${DISCORD_API}/gateway/bot`, {
+        headers: { Authorization: `Bot ${token}` },
+    });
+    if (!gatewayRes.ok) {
+        const body = await gatewayRes.text();
+        return { ok: false, error: `Cannot fetch gateway info (${gatewayRes.status}): ${body}` };
+    }
+    return { ok: true };
+}
+/**
+ * Performs a lightweight check for MessageContent intent by attempting a
+ * test gateway connection. Returns a warning message if the intent appears
+ * to be disabled, or null if everything looks good.
+ *
+ * Note: The Discord REST API doesn't expose which intents are enabled.
+ * We do a quick WebSocket handshake to the gateway to detect DisallowedIntents.
+ */
+export function checkGatewayIntents(token) {
+    return new Promise(async (resolve) => {
+        const timeout = setTimeout(() => resolve(null), 10000); // assume OK if no response in 10s
+        try {
+            const gatewayRes = await fetch(`${DISCORD_API}/gateway/bot`, {
+                headers: { Authorization: `Bot ${token}` },
+            });
+            if (!gatewayRes.ok) {
+                clearTimeout(timeout);
+                resolve('Could not fetch gateway URL. Check your bot token.');
+                return;
+            }
+            const { url } = await gatewayRes.json();
+            // Dynamic import for WebSocket (works in both Node and Bun)
+            const WebSocket = (await import('ws')).default;
+            const ws = new WebSocket(`${url}?v=10&encoding=json`);
+            ws.on('message', (data) => {
+                try {
+                    const payload = JSON.parse(data.toString());
+                    if (payload.op === 10) {
+                        // Send IDENTIFY with the intents we need
+                        // Guilds=1, GuildMessages=512, MessageContent=32768
+                        ws.send(JSON.stringify({
+                            op: 2,
+                            d: {
+                                token,
+                                intents: 1 | 512 | 32768,
+                                properties: { os: 'linux', browser: 'onkol-setup', device: 'onkol-setup' },
+                            },
+                        }));
+                    }
+                    else if (payload.op === 0 && payload.t === 'READY') {
+                        // All good — intents accepted
+                        ws.close();
+                        clearTimeout(timeout);
+                        resolve(null);
+                    }
+                }
+                catch { /* ignore parse errors */ }
+            });
+            ws.on('close', (code) => {
+                clearTimeout(timeout);
+                if (code === 4014) {
+                    resolve('MessageContent intent is not enabled for this bot.\n' +
+                        '    Go to https://discord.com/developers/applications → your bot → Bot settings\n' +
+                        '    → Privileged Gateway Intents → enable "Message Content Intent" → Save');
+                }
+                else if (code === 4004) {
+                    resolve('Invalid bot token (gateway rejected authentication).');
+                }
+                // Other close codes are fine (we close it ourselves on READY)
+            });
+            ws.on('error', () => {
+                clearTimeout(timeout);
+                resolve(null); // network error, don't block setup
+            });
+        }
+        catch {
+            clearTimeout(timeout);
+            resolve(null);
+        }
+    });
+}
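
Reviewer note (not part of the package): two details in the hunk above may be worth a second look. The doc comment on `validateBotToken` mentions connecting to the gateway, but the function body only calls the REST endpoints `/users/@me` and `/gateway/bot`. The WebSocket handshake lives in `checkGatewayIntents`. Also, the `close` handler in `checkGatewayIntents` clears the timeout and resolves only for codes 4014 and 4004, so a close with any other code before READY appears to leave the promise pending. The sketch below shows the intent arithmetic and one possible way to map every close code to a result. `INTENTS` and `warningForCloseCode` are hypothetical names, and the messages are shortened from the ones in the diff.

```typescript
// Gateway intent bits used in the IDENTIFY payload.
const INTENTS = { Guilds: 1, GuildMessages: 512, MessageContent: 32768 } as const;

// The bits do not overlap, so OR-ing them equals their sum.
const requiredIntents = INTENTS.Guilds | INTENTS.GuildMessages | INTENTS.MessageContent;
console.log(requiredIntents); // 33281

// Hypothetical helper: maps a gateway close code to a warning, or null when
// the code should not block setup. Every code yields a value.
function warningForCloseCode(code: number): string | null {
  switch (code) {
    case 4014: // Disallowed intent(s)
      return "MessageContent intent is not enabled for this bot.";
    case 4004: // Authentication failed
      return "Invalid bot token (gateway rejected authentication).";
    default:
      return null;
  }
}

console.log(warningForCloseCode(4014));
console.log(warningForCloseCode(1000)); // null
```

With a helper like this, the `close` handler could call `resolve(warningForCloseCode(code))` unconditionally. A promise settles only once, so the extra call after the READY path has already resolved would have no effect.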