npm - @sarkar-ai/deskmate - Versions diffs - 0.2.1 → 0.4.0 - Mend

@sarkar-ai/deskmate 0.2.1 → 0.4.0

Files changed (17)

package/.env.example +16 -13
package/README.md +150 -43
package/dist/cli/init.js +480 -77
package/dist/cli/ops.js +312 -0
package/dist/cli/tray.js +141 -0
package/dist/cli.js +91 -29
package/dist/core/agent/factory.js +6 -0
package/dist/core/agent/index.js +10 -2
package/dist/core/agent/providers/base-cli.js +128 -0
package/dist/core/agent/providers/codex.js +92 -0
package/dist/core/agent/providers/gemini.js +29 -0
package/dist/core/agent/providers/opencode.js +47 -0
package/dist/index.js +56 -27
package/install.sh +167 -48
package/package.json +5 -4
package/src/cli/tray-mac.swift +125 -0
package/dist/telegram/bot.js +0 -333

package/.env.example CHANGED

@@ -4,29 +4,32 @@
 # Run `deskmate init` for interactive setup, or copy this file to .env and edit.
 # Alternative: ./install.sh
-# Telegram Bot Token (get from @BotFather)
-# https://t.me/BotFather → /newbot → copy token
-TELEGRAM_BOT_TOKEN=your_bot_token_here
-# Your Telegram User ID (get from @userinfobot)
-# https://t.me/userinfobot → send any message → copy Id
-# Only this user can interact with the bot
-ALLOWED_USER_ID=your_user_id_here
-# Multi-client allowed users (for gateway mode)
+# Allowed users (required)
 # Format: clientType:userId, comma-separated
 # Example: telegram:123456,discord:987654321,slack:U12345
 ALLOWED_USERS=telegram:your_user_id_here
-# Anthropic API Key
-# https://console.anthropic.com/
+# Telegram Bot Token (get from @BotFather)
+# https://t.me/BotFather -> /newbot -> copy token
+TELEGRAM_BOT_TOKEN=your_bot_token_here
+# API Keys — only the key matching your AGENT_PROVIDER is required
+# Anthropic API Key (for claude-code) — https://console.anthropic.com/
 ANTHROPIC_API_KEY=your_anthropic_key_here
+# OpenAI API Key (for codex) — https://platform.openai.com/api-keys
+# OPENAI_API_KEY=your_openai_key_here
+# Gemini API Key (for gemini) — https://aistudio.google.com/apikey
+# GEMINI_API_KEY=your_gemini_key_here
+# OpenCode manages its own auth — no key needed here
+# Legacy single-user Telegram ID (still supported, converted to telegram:<id> internally)
+# ALLOWED_USER_ID=your_user_id_here
 # ===========================================
 # Optional Configuration
 # ===========================================
-# Agent provider (only claude-code is supported)
+# Agent provider: claude-code, codex, gemini, opencode
 AGENT_PROVIDER=claude-code
 # Working directory for command execution (defaults to $HOME)
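
The `ALLOWED_USERS` format in this hunk (`clientType:userId`, comma-separated, with the legacy `ALLOWED_USER_ID` converted to `telegram:<id>`) could be parsed roughly as follows. This is a minimal sketch, not the package's actual code: `parseAllowedUsers` and `AllowedUser` are hypothetical names, and skipping malformed entries and de-duplicating are assumptions.

```typescript
// Hypothetical helper: parses "clientType:userId" entries from ALLOWED_USERS.
// Names and types are illustrative and not taken from the deskmate source.
interface AllowedUser {
  clientType: string;
  userId: string;
}

function parseAllowedUsers(allowedUsers = "", legacyUserId = ""): AllowedUser[] {
  const entries = allowedUsers
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  // Assumed behaviour: a legacy ALLOWED_USER_ID is treated as telegram:<id>.
  if (legacyUserId.trim().length > 0) {
    entries.push(`telegram:${legacyUserId.trim()}`);
  }

  const users: AllowedUser[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    const separator = entry.indexOf(":");
    // Skip malformed entries that lack a client type or a user ID.
    if (separator <= 0 || separator === entry.length - 1) continue;
    const user: AllowedUser = {
      clientType: entry.slice(0, separator),
      userId: entry.slice(separator + 1),
    };
    const key = `${user.clientType}:${user.userId}`;
    if (!seen.has(key)) {
      seen.add(key);
      users.push(user);
    }
  }
  return users;
}
```

Called as `parseAllowedUsers(process.env.ALLOWED_USERS, process.env.ALLOWED_USER_ID)`, this yields one entry per distinct client/user pair.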

package/README.md CHANGED

@@ -6,18 +6,23 @@ Control your Local Machine from anywhere using natural language.
   <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=for-the-badge" alt="MIT License"></a>
   <a href="#requirements"><img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20WSL2-lightgrey.svg?style=for-the-badge" alt="Platform"></a>
   <a href="#requirements"><img src="https://img.shields.io/badge/node-%3E%3D18-green.svg?style=for-the-badge" alt="Node"></a>
+  <a href="https://discord.com/channels/1467923903597908244/1467926060195778692"><img src="https://img.shields.io/badge/Discord-Join%20us-5865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
 </p>
-Deskmate is a personal AI assistant that runs on your personal machine and talks to you on the channels you already use. Send a Telegram message from your phone, and it executes on your machine. Powered by the [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) with full local tool access — no sandboxed command set, no artificial limits.
+Deskmate is a local execution agent that lets you control your personal machine using natural language and talks to you on the channels you already use. Deskmate focuses on execution, not autonomy or orchestration. Send a Telegram message from your phone, and it executes on your machine. Supports multiple agent backends — [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Codex (OpenAI)](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), and [OpenCode](https://github.com/opencode-ai/opencode) — with full local tool access, no sandboxed command set, no artificial limits.
 A passion project developed, born from a simple goal: staying in creative and developer flow even when I'm not sitting at my desk. Inspired by [gen-shell](https://github.com/sarkarsaurabh27/gen-shell).
-[Getting Started](#quick-start) · [Gateway Mode](#gateway-mode) · [Architecture](#architecture) · [Contributing](#contributing)
+[Getting Started](#quick-start) · [Gateway Mode](#gateway-mode) · [Agent Providers](#agent-providers) · [Architecture](#architecture) · [Discord](https://discord.com/channels/1467923903597908244/1467926060195778692)
 ---
 ## Demo
+<p align="center">
+  <img src="assets/deskmate-screenshot.jpeg" alt="Deskmate Screenshot" width="500">
+</p>
 | Telegram Conversation | Installation |
 |:---:|:---:|
 | ![Telegram Demo](assets/deskmate-tg.gif) | ![Installation Demo](assets/deskmate-install.gif) |
@@ -35,8 +40,8 @@ Telegram / Discord* / Slack* / ...
            |
            v
   +-------------------+
-  |   Claude Code     |    full local tool access (Bash, Read, Write, Edit, ...)
-  |   Agent (SDK)     |
+  |   Agent Provider  |    Claude Code | Codex | Gemini | OpenCode
+  |   (pluggable)     |    full local tool access
   +-------------------+
            |
            v
@@ -47,25 +52,36 @@ Telegram / Discord* / Slack* / ...
 The Gateway is the control plane. Each messaging platform is a thin I/O adapter. The agent has unrestricted access to your machine (approve-by-default), with optional approval gating for protected folders.
+## Responsibility Boundary
+Deskmate’s responsibility is **execution**.
+- It turns intent into concrete system actions
+- It does not coordinate other agents
+- It does not monitor agent health or resource usage
+If you want visibility into what agents are doing on your machine,
+see **Riva**, the local observability layer.
 ## Highlights
 - **Full local access** — the agent can run any command, read/write any file, take screenshots. No artificial 6-tool sandbox.
 - **Multi-channel gateway** — Telegram today, Discord/Slack/WhatsApp tomorrow. One Gateway, many clients.
 - **Conversation memory** — session continuity across messages. Ask follow-up questions naturally.
-- **Extensible model layer** — Claude Code agent supports any provider that speaks the Anthropic Messages API (including [Ollama](https://ollama.com) for local models). See [Claude Code docs](https://docs.anthropic.com/en/docs/claude-code) for model configuration.
+- **Multi-agent backends** — ships with Claude Code (default), Codex (OpenAI), Gemini CLI (Google), and OpenCode. Switch with `AGENT_PROVIDER=codex` in `.env`.
 - **Approve-by-default** — safe commands auto-approve. Protected folders (Desktop, Documents, etc.) prompt for confirmation via inline buttons.
 - **MCP server** — expose your machine as a tool server for Claude Desktop or any MCP client.
 - **Runs as service** — launchd (macOS) or systemd (Linux) integration, starts on boot, restarts on crash.
-- **Extensible agent layer** — ships with Claude Code agent. Bring your own via `registerProvider()`.
+- **Extensible agent layer** — bring your own agent via `registerProvider()`.
 ## Requirements
 - **macOS** (tested on Ventura, Sonoma, Sequoia) or **Linux** (with systemd)
 - Windows via [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install)
 - Node.js 18+
-- [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) installed (`which claude`)
+- One of the supported agent CLIs installed (see [Agent Providers](#agent-providers))
 - Telegram account (for Telegram mode)
-- Anthropic API key (or configure Claude Code CLI for [alternative providers](https://docs.anthropic.com/en/docs/claude-code))
+- API key for your chosen provider (Anthropic, OpenAI, or Google — OpenCode manages its own auth)
 ### Linux Prerequisites
@@ -87,25 +103,34 @@ The installer guides you through these (macOS only). You can also configure them
 ## Quick Start
-### Install from npm (recommended for users)
+### Option A: Install from npm (recommended)
 ```bash
-npm install -g deskmate
+npm install -g @sarkar-ai/deskmate
 deskmate init
 ```
 The wizard walks you through everything: API keys, Telegram credentials,
-platform permissions, and background service setup.
+platform permissions, and background service setup. Config is stored in
+`~/.config/deskmate/.env`.
+After setup, run manually with `deskmate` or let the background service handle it.
-### Install from source (for contributors)
+### Option B: Install from source (for contributors)
 ```bash
 git clone https://github.com/sarkar-ai-taken/deskmate.git
 cd deskmate
 npm install --legacy-peer-deps
-cp .env.example .env  # edit with your credentials
 npm run build
-./install.sh          # or: npx deskmate init
+./install.sh          # interactive: configures .env, service, permissions
+```
+Or use the TypeScript wizard instead of the shell installer:
+```bash
+cp .env.example .env  # edit with your credentials
+npx deskmate init     # or: npm link && deskmate init
 ```
 To reconfigure later: `deskmate init`
@@ -114,30 +139,26 @@ To reconfigure later: `deskmate init`
 | Mode | Command | Description |
 |------|---------|-------------|
-| Telegram | `deskmate telegram` | Standalone Telegram bot (legacy) |
-| Gateway | `deskmate gateway` | Multi-client gateway (recommended for new setups) |
+| Gateway | `deskmate` | Multi-client gateway (default) |
 | MCP | `deskmate mcp` | MCP server for Claude Desktop |
-| Both | `deskmate both` | Telegram + MCP simultaneously |
+| Both | `deskmate both` | Gateway + MCP simultaneously |
+> **Note:** `deskmate telegram` still works but is a deprecated alias that starts the gateway.
 ## Gateway Mode
-The gateway is the recommended way to run Deskmate. It separates platform I/O from agent logic, so adding a new messaging client doesn't require touching auth, sessions, or the agent layer.
+The gateway is the default way to run Deskmate. It separates platform I/O from agent logic, so adding a new messaging client doesn't require touching auth, sessions, or the agent layer.
 ```bash
 # Configure multi-client auth
 ALLOWED_USERS=telegram:123456,discord:987654321
 # Start
-deskmate gateway
+deskmate
 ```
 The gateway auto-registers clients based on available env vars. If `TELEGRAM_BOT_TOKEN` is set, Telegram is active. Future clients (Discord, Slack) follow the same pattern.
-### Gateway vs Telegram mode
-- **`deskmate telegram`** — original standalone bot. Simple, self-contained, no gateway overhead. Good for single-user Telegram-only setups.
-- **`deskmate gateway`** — centralized architecture. Auth, sessions, and agent orchestration are shared. Required for multi-client setups and recommended for new installations.
 ## Bot Commands
 | Command | Description |
@@ -187,32 +208,99 @@ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
 Restart Claude Desktop. You can now ask Claude to interact with your local machine.
-### Combined Mode (MCP + Telegram)
+### Combined Mode (Gateway + MCP)
-Run both with `deskmate both`. MCP handles Claude Desktop requests; Telegram sends approval notifications to your phone so you can approve sensitive operations from anywhere.
+Run both with `deskmate both`. MCP handles Claude Desktop requests; the gateway handles Telegram (and future clients), sending approval notifications to your phone so you can approve sensitive operations from anywhere.
+### Observability
+Deskmate focuses on executing actions safely.
+For monitoring agent behavior, resource usage, and failures across
+multiple local agents, see **Riva** (local-first agent observability).
 ## Security
 > **Important**: The agent can execute arbitrary commands on your machine. This is by design — the strategy is approve-by-default for read-only operations, with approval gating for protected folders and write operations.
-**Built-in protections:**
+### Built-in protections
 | Layer | What it does |
 |-------|-------------|
-| **User authentication** | Only allowlisted user IDs can interact (per-client) |
-| **Folder protection** | Desktop, Documents, Downloads, etc. require explicit approval |
-| **No sudo by default** | The agent won't use sudo unless you explicitly ask |
-| **No open ports** | The bot polls Telegram's servers, doesn't expose any ports |
-| **Structured logging** | All actions are logged with timestamps for audit |
-| **Session isolation** | Gateway sessions are keyed by `clientType:channelId` |
-**Recommendations:**
+| **User authentication** | Allowlist-based access control via `SecurityManager`. Only users in `ALLOWED_USERS` can interact. Supports per-client auth (`telegram:123`, `discord:456`) and wildcards (`*:*`). |
+| **Action approval** | `ApprovalManager` gates sensitive operations. Write commands, file writes, and folder access require explicit human approval with configurable timeouts (default 5 min). |
+| **Protected folders** | OS-aware folder protection. Desktop, Documents, Downloads, Pictures, Movies/Videos, Music, and iCloud (macOS) require approval. Session-based caching avoids repeated prompts. |
+| **Safe command auto-approval** | Read-only commands (`ls`, `cat`, `git status`, `docker ps`, `node -v`, etc.) auto-approve. Full list in `src/core/approval.ts`. |
+| **Command execution limits** | 2-minute timeout and 10 MB output buffer per command. Prevents runaway processes and memory exhaustion. |
+| **Session isolation** | Sessions keyed by `clientType:channelId`. 30-minute idle timeout with automatic pruning. Optional disk persistence survives restarts. |
+| **Input validation** | MCP tools use Zod schema validation. Telegram callbacks validated via regex patterns. |
+| **No open ports** | The bot polls Telegram's servers — no inbound ports exposed. |
+| **No sudo by default** | The agent won't use sudo unless you explicitly ask. |
+| **Structured logging** | All actions logged with timestamps, context hierarchy, and configurable log levels for audit trails. |
+| **Stale message protection** | Telegram client drops pending updates on startup (`drop_pending_updates: true`), preventing replay of messages received while offline. |
+### Approval workflow
+1. User sends a message that triggers a sensitive operation (e.g., writing to `~/Documents`)
+2. `ApprovalManager` checks if the action matches a safe auto-approve pattern
+3. If not safe, a pending approval is created with a timeout countdown
+4. Approval request is broadcast to all clients with recent activity (last 30 min)
+5. User taps Approve/Reject via inline buttons (Telegram) or equivalent
+6. Action executes on approval, or is cancelled on rejection/timeout
+Set `REQUIRE_APPROVAL_FOR_ALL=true` to gate every operation, including reads.
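
The approval workflow described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the contents of `src/core/approval.ts`: the pattern list, the `decide` and `isExpired` names, and the rule that shell metacharacters always force approval are all hypothetical. Only the example commands, the 5-minute default timeout, and the `REQUIRE_APPROVAL_FOR_ALL` switch come from the README text.

```typescript
// Illustrative patterns only; the README says the full list lives in src/core/approval.ts.
const SAFE_COMMAND_PATTERNS: RegExp[] = [
  /^ls(\s|$)/,
  /^cat\s/,
  /^git status(\s|$)/,
  /^docker ps(\s|$)/,
  /^node -v$/,
];

type Decision = "auto-approve" | "needs-approval";

function decide(command: string, requireApprovalForAll = false): Decision {
  // REQUIRE_APPROVAL_FOR_ALL=true gates every operation, including reads.
  if (requireApprovalForAll) return "needs-approval";
  const trimmed = command.trim();
  // Assumption: anything that chains, pipes, redirects, or substitutes is never auto-approved.
  if (/[;&|><`$]/.test(trimmed)) return "needs-approval";
  return SAFE_COMMAND_PATTERNS.some((pattern) => pattern.test(trimmed))
    ? "auto-approve"
    : "needs-approval";
}

const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000; // README: default 5 min

// A pending approval is cancelled once its timeout has elapsed.
function isExpired(createdAtMs: number, nowMs: number, timeoutMs = DEFAULT_TIMEOUT_MS): boolean {
  return nowMs - createdAtMs >= timeoutMs;
}
```

Checking for metacharacters before matching the allowlist keeps a command such as `cat a > b` from slipping through on its `cat` prefix.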
+### Recommendations
 - Set `WORKING_DIR` to limit default command scope
-- Use `ALLOWED_USERS` (gateway mode) for multi-client allowlisting
+- Use `ALLOWED_USERS` for multi-client allowlisting
+- Use `ALLOWED_FOLDERS` to pre-approve specific directories
 - Review logs regularly (`logs/stdout.log`)
 - Keep `.env` secure and never commit it
 - Use `REQUIRE_APPROVAL_FOR_ALL=true` if you want to approve every operation
+### Execution Philosophy
+Deskmate follows an **approve-by-default, visible-by-design** model.
+- Read-only operations are auto-approved
+- Sensitive operations require explicit confirmation
+- All actions are logged locally
+The goal is speed without hidden behavior.
+## Non-goals
+Deskmate is intentionally not:
+- A multi-agent orchestration framework
+- A cloud-hosted control plane
+- A long-running autonomous system
+- A monitoring or observability tool
+These constraints are deliberate.
+## Agent Providers
+Deskmate supports multiple agent backends. Set `AGENT_PROVIDER` in your `.env` or select one during `deskmate init`.
+| Provider | Binary | Env Var | Install |
+|----------|--------|---------|---------|
+| **Claude Code** (default) | `claude` | `ANTHROPIC_API_KEY` | [docs.anthropic.com](https://docs.anthropic.com/en/docs/claude-code) |
+| **Codex** (OpenAI) | `codex` | `OPENAI_API_KEY` | [github.com/openai/codex](https://github.com/openai/codex) |
+| **Gemini CLI** (Google) | `gemini` | `GEMINI_API_KEY` | [github.com/google-gemini/gemini-cli](https://github.com/google-gemini/gemini-cli) |
+| **OpenCode** | `opencode` | *(manages own auth)* | [github.com/opencode-ai/opencode](https://github.com/opencode-ai/opencode) |
+```bash
+# Switch provider
+AGENT_PROVIDER=codex
+OPENAI_API_KEY=sk-...
+# Or use the wizard
+deskmate init
+```
+Only the API key matching your selected provider is required. Keys for other providers are preserved in `.env` if you switch back.
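
As a rough sketch of the rule that only the key for the selected provider is required, the check below maps each provider ID to its environment variable, following the table above. The names `PROVIDER_KEY_ENV` and `missingKeyFor` are hypothetical, and falling back to `claude-code` when `AGENT_PROVIDER` is unset is an assumption based on the README calling it the default.

```typescript
// Mapping taken from the Agent Providers table; null means the CLI manages its own auth.
const PROVIDER_KEY_ENV = new Map<string, string | null>([
  ["claude-code", "ANTHROPIC_API_KEY"],
  ["codex", "OPENAI_API_KEY"],
  ["gemini", "GEMINI_API_KEY"],
  ["opencode", null],
]);

// Returns the name of the missing key, or null when nothing is missing.
function missingKeyFor(env: Record<string, string | undefined>): string | null {
  // Assumption: an unset AGENT_PROVIDER falls back to claude-code.
  const provider = env["AGENT_PROVIDER"] ?? "claude-code";
  const keyName = PROVIDER_KEY_ENV.get(provider);
  if (keyName === undefined) {
    throw new Error(`Unknown AGENT_PROVIDER: ${provider}`);
  }
  if (keyName === null) return null;
  return env[keyName] ? null : keyName;
}
```

Keys for other providers are simply ignored by this check, which matches the README's note that they can stay in `.env` for later.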
 ## Architecture
 ```
@@ -222,7 +310,11 @@ src/
 │   │   ├── types.ts              # AgentProvider interface
 │   │   ├── factory.ts            # Provider factory + registerProvider()
 │   │   └── providers/
-│   │       └── claude-code.ts    # Claude Code SDK (default)
+│   │       ├── claude-code.ts    # Claude Code SDK (default)
+│   │       ├── base-cli.ts       # Base class for CLI-spawned providers
+│   │       ├── codex.ts          # Codex (OpenAI)
+│   │       ├── gemini.ts         # Gemini CLI (Google)
+│   │       └── opencode.ts       # OpenCode
 │   ├── approval.ts               # Approval manager (auto-approve + manual)
 │   ├── executor.ts               # Command execution, file I/O, screenshots
 │   └── logger.ts                 # Structured logging
@@ -233,13 +325,11 @@ src/
 │   └── session.ts                # Session manager (composite keys, idle pruning)
 ├── clients/
 │   └── telegram.ts               # Telegram adapter (grammY)
-├── telegram/
-│   └── bot.ts                    # Legacy standalone Telegram bot
 └── mcp/
     └── server.ts                 # MCP server
 ```
-**Agent layer** — ships with Claude Code (`@anthropic-ai/claude-agent-sdk`). Full built-in tool access: Bash, Read, Write, Edit, Glob, Grep. Custom agent providers can be registered via `registerProvider()`.
+**Agent layer** — ships with four providers: Claude Code (via `@anthropic-ai/claude-agent-sdk`), Codex, Gemini CLI, and OpenCode. The three non-Claude providers extend `BaseCliProvider`, which handles subprocess spawning and stdout streaming. Custom agent providers can be registered via `registerProvider()`.
 **Gateway layer** — central coordinator handling auth (`SecurityManager`), sessions (`SessionManager`), agent orchestration, approval routing, and screenshot delivery. Platform adapters implement the `MessagingClient` interface and do only I/O.
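
A provider factory with a `registerProvider()` hook, as the agent-layer paragraph describes, might look roughly like this. The diff does not show the real signatures, so everything here is an assumption: the `AgentProvider` shape, the `run` method, the `createProvider` lookup, and the `echo` provider are hypothetical stand-ins for whatever `types.ts` and `factory.ts` actually define.

```typescript
// Hypothetical shapes; the real AgentProvider interface lives in src/core/agent/types.ts
// and may differ.
interface AgentProvider {
  readonly name: string;
  run(prompt: string): Promise<string>;
}

type ProviderFactory = () => AgentProvider;

const registry = new Map<string, ProviderFactory>();

// Assumed signature: a name plus a factory that builds the provider on demand.
function registerProvider(name: string, factory: ProviderFactory): void {
  registry.set(name, factory);
}

function createProvider(name: string): AgentProvider {
  const factory = registry.get(name);
  if (!factory) {
    throw new Error(`Unknown agent provider: ${name}`);
  }
  return factory();
}

// Example registration of a trivial custom provider.
registerProvider("echo", () => ({
  name: "echo",
  run: async (prompt: string) => `echo: ${prompt}`,
}));
```

Registering factories rather than instances means a provider's CLI is only touched when that provider is actually selected.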
@@ -312,8 +402,9 @@ systemctl --user status deskmate.service
 **Bot not responding?**
 1. Check logs: `tail -f logs/stderr.log`
-2. Verify your `ALLOWED_USER_ID` matches your Telegram ID
-3. Ensure Claude Code CLI is installed: `which claude`
+2. Verify your `ALLOWED_USERS` includes your Telegram ID (e.g. `telegram:123456`)
+3. Ensure your agent CLI is installed (e.g. `which claude`, `which codex`, `which gemini`, `which opencode`)
+4. Run `deskmate doctor` to diagnose configuration issues
 **Commands timing out?**
 - Default timeout is 2 minutes
@@ -355,6 +446,22 @@ MIT License — see [LICENSE](LICENSE) for details.
 ## Acknowledgments
-- [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) — agent runtime
+- [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) — default agent runtime
+- [Codex](https://github.com/openai/codex) — OpenAI agent backend
+- [Gemini CLI](https://github.com/google-gemini/gemini-cli) — Google agent backend
+- [OpenCode](https://github.com/opencode-ai/opencode) — OpenCode agent backend
 - [grammY](https://grammy.dev/) — Telegram bot framework
 - [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/sdk) — MCP support
+---
+## Community
+- [Discord](https://discord.com/channels/1467923903597908244/1467926060195778692) — join the community, ask questions, share your setup
+## Share
+If you find Deskmate useful, feel free to share:
+- [Share on X](https://x.com/intent/post?text=Running%20real%20system%20actions%20with%20a%20local-first%20AI%20agent.%20Deskmate%20lets%20you%20control%20your%20machine%20using%20natural%20language.&url=https%3A%2F%2Fgithub.com%2Fsarkar-ai-taken%2Fdeskmate&via=sarkar_ai)
+- [Post to Hacker News](https://news.ycombinator.com/submitlink?u=https%3A%2F%2Fgithub.com%2Fsarkar-ai-taken%2Fdeskmate&t=Deskmate%3A%20A%20local-first%20AI%20agent%20for%20executing%20real%20system%20actions)