@sarkar-ai/deskmate 0.2.1 → 0.4.0

package/.env.example CHANGED
@@ -4,29 +4,32 @@
  # Run `deskmate init` for interactive setup, or copy this file to .env and edit.
  # Alternative: ./install.sh

- # Telegram Bot Token (get from @BotFather)
- # https://t.me/BotFather → /newbot → copy token
- TELEGRAM_BOT_TOKEN=your_bot_token_here
-
- # Your Telegram User ID (get from @userinfobot)
- # https://t.me/userinfobot → send any message → copy Id
- # Only this user can interact with the bot
- ALLOWED_USER_ID=your_user_id_here
-
- # Multi-client allowed users (for gateway mode)
+ # Allowed users (required)
  # Format: clientType:userId, comma-separated
  # Example: telegram:123456,discord:987654321,slack:U12345
  ALLOWED_USERS=telegram:your_user_id_here

- # Anthropic API Key
- # https://console.anthropic.com/
+ # Telegram Bot Token (get from @BotFather)
+ # https://t.me/BotFather -> /newbot -> copy token
+ TELEGRAM_BOT_TOKEN=your_bot_token_here
+
+ # API Keys — only the key matching your AGENT_PROVIDER is required
+ # Anthropic API Key (for claude-code) — https://console.anthropic.com/
  ANTHROPIC_API_KEY=your_anthropic_key_here
+ # OpenAI API Key (for codex) — https://platform.openai.com/api-keys
+ # OPENAI_API_KEY=your_openai_key_here
+ # Gemini API Key (for gemini) — https://aistudio.google.com/apikey
+ # GEMINI_API_KEY=your_gemini_key_here
+ # OpenCode manages its own auth — no key needed here
+
+ # Legacy single-user Telegram ID (still supported, converted to telegram:<id> internally)
+ # ALLOWED_USER_ID=your_user_id_here

  # ===========================================
  # Optional Configuration
  # ===========================================

- # Agent provider (only claude-code is supported)
+ # Agent provider: claude-code, codex, gemini, opencode
  AGENT_PROVIDER=claude-code

  # Working directory for command execution (defaults to $HOME)
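
The `ALLOWED_USERS` format above (`clientType:userId`, comma-separated, with the legacy `ALLOWED_USER_ID` converted to `telegram:<id>`) could be parsed roughly as follows. This is a minimal sketch and not Deskmate's actual code: the `AllowedUser` type and the `parseAllowedUsers` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape for one allowlist entry; Deskmate's real types may differ.
interface AllowedUser {
  clientType: string;
  userId: string;
}

// Hypothetical helper: turns "telegram:123456,discord:987654321" into entries.
function parseAllowedUsers(env: Record<string, string | undefined>): AllowedUser[] {
  const entries: AllowedUser[] = [];

  for (const raw of (env["ALLOWED_USERS"] ?? "").split(",")) {
    const item = raw.trim();
    if (item === "") continue; // tolerate empty values and trailing commas

    const sep = item.indexOf(":");
    if (sep <= 0 || sep === item.length - 1) {
      throw new Error(`Invalid ALLOWED_USERS entry "${item}" (expected clientType:userId)`);
    }
    entries.push({ clientType: item.slice(0, sep), userId: item.slice(sep + 1) });
  }

  // Legacy single-user variable: treated as telegram:<id>, skipping duplicates.
  const legacy = (env["ALLOWED_USER_ID"] ?? "").trim();
  if (legacy !== "" && !entries.some((e) => e.clientType === "telegram" && e.userId === legacy)) {
    entries.push({ clientType: "telegram", userId: legacy });
  }
  return entries;
}
```

In a Node.js process this would typically be called as `parseAllowedUsers(process.env)`. Failing loudly on a malformed entry is a design choice of the sketch; an allowlist that silently drops entries is harder to debug than one that refuses to start.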
package/README.md CHANGED
@@ -6,18 +6,23 @@ Control your Local Machine from anywhere using natural language.
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=for-the-badge" alt="MIT License"></a>
  <a href="#requirements"><img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20WSL2-lightgrey.svg?style=for-the-badge" alt="Platform"></a>
  <a href="#requirements"><img src="https://img.shields.io/badge/node-%3E%3D18-green.svg?style=for-the-badge" alt="Node"></a>
+ <a href="https://discord.com/channels/1467923903597908244/1467926060195778692"><img src="https://img.shields.io/badge/Discord-Join%20us-5865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
  </p>

- Deskmate is a personal AI assistant that runs on your personal machine and talks to you on the channels you already use. Send a Telegram message from your phone, and it executes on your machine. Powered by the [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) with full local tool access: no sandboxed command set, no artificial limits.
+ Deskmate is a local execution agent that lets you control your personal machine using natural language and talks to you on the channels you already use. Deskmate focuses on execution, not autonomy or orchestration. Send a Telegram message from your phone, and it executes on your machine. Supports multiple agent backends — [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Codex (OpenAI)](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), and [OpenCode](https://github.com/opencode-ai/opencode) — with full local tool access, no sandboxed command set, no artificial limits.

  A passion project born from a simple goal: staying in creative and developer flow even when I'm not sitting at my desk. Inspired by [gen-shell](https://github.com/sarkarsaurabh27/gen-shell).

- [Getting Started](#quick-start) · [Gateway Mode](#gateway-mode) · [Architecture](#architecture) · [Contributing](#contributing)
+ [Getting Started](#quick-start) · [Gateway Mode](#gateway-mode) · [Agent Providers](#agent-providers) · [Architecture](#architecture) · [Discord](https://discord.com/channels/1467923903597908244/1467926060195778692)

  ---

  ## Demo

+ <p align="center">
+ <img src="assets/deskmate-screenshot.jpeg" alt="Deskmate Screenshot" width="500">
+ </p>
+
  | Telegram Conversation | Installation |
  |:---:|:---:|
  | ![Telegram Demo](assets/deskmate-tg.gif) | ![Installation Demo](assets/deskmate-install.gif) |
@@ -35,8 +40,8 @@ Telegram / Discord* / Slack* / ...
  |
  v
  +-------------------+
- | Claude Code | full local tool access (Bash, Read, Write, Edit, ...)
- | Agent (SDK) |
+ | Agent Provider | Claude Code | Codex | Gemini | OpenCode
+ | (pluggable) | full local tool access
  +-------------------+
  |
  v
@@ -47,25 +52,36 @@ Telegram / Discord* / Slack* / ...

  The Gateway is the control plane. Each messaging platform is a thin I/O adapter. The agent has unrestricted access to your machine (approve-by-default), with optional approval gating for protected folders.

+ ## Responsibility Boundary
+
+ Deskmate’s responsibility is **execution**.
+
+ - It turns intent into concrete system actions
+ - It does not coordinate other agents
+ - It does not monitor agent health or resource usage
+
+ If you want visibility into what agents are doing on your machine,
+ see **Riva**, the local observability layer.
+
  ## Highlights

  - **Full local access** — the agent can run any command, read/write any file, take screenshots. No artificial 6-tool sandbox.
  - **Multi-channel gateway** — Telegram today, Discord/Slack/WhatsApp tomorrow. One Gateway, many clients.
  - **Conversation memory** — session continuity across messages. Ask follow-up questions naturally.
- - **Extensible model layer** — Claude Code agent supports any provider that speaks the Anthropic Messages API (including [Ollama](https://ollama.com) for local models). See [Claude Code docs](https://docs.anthropic.com/en/docs/claude-code) for model configuration.
+ - **Multi-agent backends** — ships with Claude Code (default), Codex (OpenAI), Gemini CLI (Google), and OpenCode. Switch with `AGENT_PROVIDER=codex` in `.env`.
  - **Approve-by-default** — safe commands auto-approve. Protected folders (Desktop, Documents, etc.) prompt for confirmation via inline buttons.
  - **MCP server** — expose your machine as a tool server for Claude Desktop or any MCP client.
  - **Runs as service** — launchd (macOS) or systemd (Linux) integration, starts on boot, restarts on crash.
- - **Extensible agent layer** — ships with Claude Code agent. Bring your own via `registerProvider()`.
+ - **Extensible agent layer** — bring your own agent via `registerProvider()`.

  ## Requirements

  - **macOS** (tested on Ventura, Sonoma, Sequoia) or **Linux** (with systemd)
  - Windows via [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install)
  - Node.js 18+
- - [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) installed (`which claude`)
+ - One of the supported agent CLIs installed (see [Agent Providers](#agent-providers))
  - Telegram account (for Telegram mode)
- - Anthropic API key (or configure Claude Code CLI for [alternative providers](https://docs.anthropic.com/en/docs/claude-code))
+ - API key for your chosen provider (Anthropic, OpenAI, or Google; OpenCode manages its own auth)

  ### Linux Prerequisites

@@ -87,25 +103,34 @@ The installer guides you through these (macOS only). You can also configure them

  ## Quick Start

- ### Install from npm (recommended for users)
+ ### Option A: Install from npm (recommended)

  ```bash
- npm install -g deskmate
+ npm install -g @sarkar-ai/deskmate
  deskmate init
  ```

  The wizard walks you through everything: API keys, Telegram credentials,
- platform permissions, and background service setup.
+ platform permissions, and background service setup. Config is stored in
+ `~/.config/deskmate/.env`.
+
+ After setup, run manually with `deskmate` or let the background service handle it.

- ### Install from source (for contributors)
+ ### Option B: Install from source (for contributors)

  ```bash
  git clone https://github.com/sarkar-ai-taken/deskmate.git
  cd deskmate
  npm install --legacy-peer-deps
- cp .env.example .env # edit with your credentials
  npm run build
- ./install.sh # or: npx deskmate init
+ ./install.sh # interactive: configures .env, service, permissions
+ ```
+
+ Or use the TypeScript wizard instead of the shell installer:
+
+ ```bash
+ cp .env.example .env # edit with your credentials
+ npx deskmate init # or: npm link && deskmate init
  ```

  To reconfigure later: `deskmate init`
@@ -114,30 +139,26 @@ To reconfigure later: `deskmate init`

  | Mode | Command | Description |
  |------|---------|-------------|
- | Telegram | `deskmate telegram` | Standalone Telegram bot (legacy) |
- | Gateway | `deskmate gateway` | Multi-client gateway (recommended for new setups) |
+ | Gateway | `deskmate` | Multi-client gateway (default) |
  | MCP | `deskmate mcp` | MCP server for Claude Desktop |
- | Both | `deskmate both` | Telegram + MCP simultaneously |
+ | Both | `deskmate both` | Gateway + MCP simultaneously |
+
+ > **Note:** `deskmate telegram` still works but is a deprecated alias that starts the gateway.

  ## Gateway Mode

- The gateway is the recommended way to run Deskmate. It separates platform I/O from agent logic, so adding a new messaging client doesn't require touching auth, sessions, or the agent layer.
+ The gateway is the default way to run Deskmate. It separates platform I/O from agent logic, so adding a new messaging client doesn't require touching auth, sessions, or the agent layer.

  ```bash
  # Configure multi-client auth
  ALLOWED_USERS=telegram:123456,discord:987654321

  # Start
- deskmate gateway
+ deskmate
  ```

  The gateway auto-registers clients based on available env vars. If `TELEGRAM_BOT_TOKEN` is set, Telegram is active. Future clients (Discord, Slack) follow the same pattern.

- ### Gateway vs Telegram mode
-
- - **`deskmate telegram`** — original standalone bot. Simple, self-contained, no gateway overhead. Good for single-user Telegram-only setups.
- - **`deskmate gateway`** — centralized architecture. Auth, sessions, and agent orchestration are shared. Required for multi-client setups and recommended for new installations.
-
  ## Bot Commands

  | Command | Description |
@@ -187,32 +208,99 @@ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

  Restart Claude Desktop. You can now ask Claude to interact with your local machine.

- ### Combined Mode (MCP + Telegram)
+ ### Combined Mode (Gateway + MCP)

- Run both with `deskmate both`. MCP handles Claude Desktop requests; Telegram sends approval notifications to your phone so you can approve sensitive operations from anywhere.
+ Run both with `deskmate both`. MCP handles Claude Desktop requests; the gateway handles Telegram (and future clients), sending approval notifications to your phone so you can approve sensitive operations from anywhere.
+
+ ### Observability
+
+ Deskmate focuses on executing actions safely.
+
+ For monitoring agent behavior, resource usage, and failures across
+ multiple local agents, see **Riva** (local-first agent observability).

  ## Security

  > **Important**: The agent can execute arbitrary commands on your machine. This is by design — the strategy is approve-by-default for read-only operations, with approval gating for protected folders and write operations.

- **Built-in protections:**
+ ### Built-in protections

  | Layer | What it does |
  |-------|-------------|
- | **User authentication** | Only allowlisted user IDs can interact (per-client) |
- | **Folder protection** | Desktop, Documents, Downloads, etc. require explicit approval |
- | **No sudo by default** | The agent won't use sudo unless you explicitly ask |
- | **No open ports** | The bot polls Telegram's servers, doesn't expose any ports |
- | **Structured logging** | All actions are logged with timestamps for audit |
- | **Session isolation** | Gateway sessions are keyed by `clientType:channelId` |
-
- **Recommendations:**
+ | **User authentication** | Allowlist-based access control via `SecurityManager`. Only users in `ALLOWED_USERS` can interact. Supports per-client auth (`telegram:123`, `discord:456`) and wildcards (`*:*`). |
+ | **Action approval** | `ApprovalManager` gates sensitive operations. Write commands, file writes, and folder access require explicit human approval with configurable timeouts (default 5 min). |
+ | **Protected folders** | OS-aware folder protection. Desktop, Documents, Downloads, Pictures, Movies/Videos, Music, and iCloud (macOS) require approval. Session-based caching avoids repeated prompts. |
+ | **Safe command auto-approval** | Read-only commands (`ls`, `cat`, `git status`, `docker ps`, `node -v`, etc.) auto-approve. Full list in `src/core/approval.ts`. |
+ | **Command execution limits** | 2-minute timeout and 10 MB output buffer per command. Prevents runaway processes and memory exhaustion. |
+ | **Session isolation** | Sessions keyed by `clientType:channelId`. 30-minute idle timeout with automatic pruning. Optional disk persistence survives restarts. |
+ | **Input validation** | MCP tools use Zod schema validation. Telegram callbacks validated via regex patterns. |
+ | **No open ports** | The bot polls Telegram's servers — no inbound ports exposed. |
+ | **No sudo by default** | The agent won't use sudo unless you explicitly ask. |
+ | **Structured logging** | All actions logged with timestamps, context hierarchy, and configurable log levels for audit trails. |
+ | **Stale message protection** | Telegram client drops pending updates on startup (`drop_pending_updates: true`), preventing replay of messages received while offline. |
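
The allowlist check described in the **User authentication** row might look like the sketch below. It is not the real `SecurityManager`: the `isAllowed` function is a hypothetical name, and treating `*` as a wildcard in either position on its own is an assumption, since the README only shows `*:*`.

```typescript
// Returns true when clientType:userId matches at least one allowlist entry.
// Assumption: "*" in either position matches anything (the README only shows "*:*").
function isAllowed(allowlist: string[], clientType: string, userId: string): boolean {
  return allowlist.some((entry) => {
    const sep = entry.indexOf(":");
    if (sep < 0) return false; // malformed entries never match

    const allowedClient = entry.slice(0, sep);
    const allowedUser = entry.slice(sep + 1);
    const clientOk = allowedClient === "*" || allowedClient === clientType;
    const userOk = allowedUser === "*" || allowedUser === userId;
    return clientOk && userOk;
  });
}
```

Because the check is per client, the same numeric ID on a different platform is rejected: with `["telegram:123"]`, the pair `discord`/`123` does not match.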
+
+ ### Approval workflow
+
+ 1. User sends a message that triggers a sensitive operation (e.g., writing to `~/Documents`)
+ 2. `ApprovalManager` checks if the action matches a safe auto-approve pattern
+ 3. If not safe, a pending approval is created with a timeout countdown
+ 4. Approval request is broadcast to all clients with recent activity (last 30 min)
+ 5. User taps Approve/Reject via inline buttons (Telegram) or equivalent
+ 6. Action executes on approval, or is cancelled on rejection/timeout
+
+ Set `REQUIRE_APPROVAL_FOR_ALL=true` to gate every operation, including reads.
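
Steps 2, 3, and 6 of the workflow above can be sketched as a small state machine. This is an illustration under stated assumptions, not the real `ApprovalManager`: `isSafeCommand`, `needsApproval`, and `PendingApproval` are hypothetical names, the safe list is only the subset quoted in the README, and the shell-operator guard is an extra precaution of this sketch. Time is passed in explicitly so the timeout logic is easy to test.

```typescript
type Decision = "pending" | "approved" | "rejected" | "expired";

// Illustrative subset; per the README the full list lives in src/core/approval.ts.
const SAFE_COMMANDS = ["ls", "cat", "git status", "docker ps", "node -v"];

function isSafeCommand(command: string): boolean {
  const cmd = command.trim();
  // Conservative guard (an assumption of this sketch): anything containing shell
  // operators such as pipes, redirects, or substitutions is never auto-approved.
  if (/[;&|<>`$]/.test(cmd)) return false;
  return SAFE_COMMANDS.some((safe) => cmd === safe || cmd.startsWith(safe + " "));
}

// Step 2: decide whether a human has to confirm the command.
function needsApproval(command: string, requireApprovalForAll = false): boolean {
  return requireApprovalForAll || !isSafeCommand(command);
}

// Steps 3 and 6: a pending request that expires after a timeout (default 5 min).
class PendingApproval {
  private decision: Decision = "pending";
  private readonly createdAt: number;
  private readonly timeoutMs: number;

  constructor(createdAt: number, timeoutMs: number = 5 * 60_000) {
    this.createdAt = createdAt;
    this.timeoutMs = timeoutMs;
  }

  // Current state; flips to "expired" once the deadline has passed.
  status(now: number): Decision {
    if (this.decision === "pending" && now - this.createdAt >= this.timeoutMs) {
      this.decision = "expired";
    }
    return this.decision;
  }

  // Record the user's answer; answers after the deadline are ignored.
  resolve(approved: boolean, now: number): Decision {
    if (this.status(now) === "pending") {
      this.decision = approved ? "approved" : "rejected";
    }
    return this.decision;
  }
}
```

The second argument of `needsApproval` mirrors `REQUIRE_APPROVAL_FOR_ALL=true`: when set, even commands on the safe list go through the pending state.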
+
+ ### Recommendations
+
  - Set `WORKING_DIR` to limit default command scope
- - Use `ALLOWED_USERS` (gateway mode) for multi-client allowlisting
+ - Use `ALLOWED_USERS` for multi-client allowlisting
+ - Use `ALLOWED_FOLDERS` to pre-approve specific directories
  - Review logs regularly (`logs/stdout.log`)
  - Keep `.env` secure and never commit it
  - Use `REQUIRE_APPROVAL_FOR_ALL=true` if you want to approve every operation

+ ### Execution Philosophy
+
+ Deskmate follows an **approve-by-default, visible-by-design** model.
+
+ - Read-only operations are auto-approved
+ - Sensitive operations require explicit confirmation
+ - All actions are logged locally
+
+ The goal is speed without hidden behavior.
+
+ ## Non-goals
+
+ Deskmate is intentionally not:
+ - A multi-agent orchestration framework
+ - A cloud-hosted control plane
+ - A long-running autonomous system
+ - A monitoring or observability tool
+
+ These constraints are deliberate.
+
+ ## Agent Providers
+
+ Deskmate supports multiple agent backends. Set `AGENT_PROVIDER` in your `.env` or select one during `deskmate init`.
+
+ | Provider | Binary | Env Var | Install |
+ |----------|--------|---------|---------|
+ | **Claude Code** (default) | `claude` | `ANTHROPIC_API_KEY` | [docs.anthropic.com](https://docs.anthropic.com/en/docs/claude-code) |
+ | **Codex** (OpenAI) | `codex` | `OPENAI_API_KEY` | [github.com/openai/codex](https://github.com/openai/codex) |
+ | **Gemini CLI** (Google) | `gemini` | `GEMINI_API_KEY` | [github.com/google-gemini/gemini-cli](https://github.com/google-gemini/gemini-cli) |
+ | **OpenCode** | `opencode` | *(manages own auth)* | [github.com/opencode-ai/opencode](https://github.com/opencode-ai/opencode) |
+
+ ```bash
+ # Switch provider
+ AGENT_PROVIDER=codex
+ OPENAI_API_KEY=sk-...
+
+ # Or use the wizard
+ deskmate init
+ ```
+
+ Only the API key matching your selected provider is required. Keys for other providers are preserved in `.env` if you switch back.
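
The rule that only the selected provider's key is required can be expressed as a lookup over the provider table. The sketch below is illustrative only: `REQUIRED_KEY` and `checkProviderConfig` are hypothetical names, and Deskmate's own validation (for example in `deskmate init` or `deskmate doctor`) may work differently.

```typescript
type Provider = "claude-code" | "codex" | "gemini" | "opencode";

// Env var each provider needs, taken from the table in the README;
// null means the CLI manages its own auth.
const REQUIRED_KEY: Record<Provider, string | null> = {
  "claude-code": "ANTHROPIC_API_KEY",
  codex: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  opencode: null,
};

// Hypothetical helper: reports which key, if any, is missing for the chosen provider.
function checkProviderConfig(
  env: Record<string, string | undefined>,
): { provider: Provider; missing: string | null } {
  const name = env["AGENT_PROVIDER"] ?? "claude-code"; // claude-code is the documented default
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_KEY, name)) {
    throw new Error(`Unknown AGENT_PROVIDER: ${name}`);
  }
  const provider = name as Provider;
  const key = REQUIRED_KEY[provider];
  const missing = key !== null && !env[key] ? key : null;
  return { provider, missing };
}
```

Keys for providers that are not selected are simply ignored, which matches the README's note that they can stay in `.env` for switching back later.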
+
  ## Architecture

  ```
@@ -222,7 +310,11 @@ src/
  │ │ ├── types.ts # AgentProvider interface
  │ │ ├── factory.ts # Provider factory + registerProvider()
  │ │ └── providers/
- │ │ └── claude-code.ts # Claude Code SDK (default)
+ │ │ ├── claude-code.ts # Claude Code SDK (default)
+ │ │ ├── base-cli.ts # Base class for CLI-spawned providers
+ │ │ ├── codex.ts # Codex (OpenAI)
+ │ │ ├── gemini.ts # Gemini CLI (Google)
+ │ │ └── opencode.ts # OpenCode
  │ ├── approval.ts # Approval manager (auto-approve + manual)
  │ ├── executor.ts # Command execution, file I/O, screenshots
  │ └── logger.ts # Structured logging
@@ -233,13 +325,11 @@ src/
  │ └── session.ts # Session manager (composite keys, idle pruning)
  ├── clients/
  │ └── telegram.ts # Telegram adapter (grammY)
- ├── telegram/
- │ └── bot.ts # Legacy standalone Telegram bot
  └── mcp/
  └── server.ts # MCP server
  ```

- **Agent layer** — ships with Claude Code (`@anthropic-ai/claude-agent-sdk`). Full built-in tool access: Bash, Read, Write, Edit, Glob, Grep. Custom agent providers can be registered via `registerProvider()`.
+ **Agent layer** — ships with four providers: Claude Code (via `@anthropic-ai/claude-agent-sdk`), Codex, Gemini CLI, and OpenCode. The three non-Claude providers extend `BaseCliProvider`, which handles subprocess spawning and stdout streaming. Custom agent providers can be registered via `registerProvider()`.

  **Gateway layer** — central coordinator handling auth (`SecurityManager`), sessions (`SessionManager`), agent orchestration, approval routing, and screenshot delivery. Platform adapters implement the `MessagingClient` interface and do only I/O.
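
The session handling mentioned here (composite `clientType:channelId` keys, pruning after 30 idle minutes) could be sketched as follows. This is a simplified illustration, not the real `SessionManager`: the `SessionStore` class and its methods are hypothetical, and disk persistence is left out.

```typescript
interface Session {
  key: string;
  lastActive: number; // epoch milliseconds of the most recent message
}

class SessionStore {
  private readonly sessions = new Map<string, Session>();
  private readonly idleTimeoutMs: number;

  constructor(idleTimeoutMs: number = 30 * 60_000) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Composite key, so the same channel ID on two platforms never collides.
  static keyFor(clientType: string, channelId: string): string {
    return `${clientType}:${channelId}`;
  }

  // Create the session on first contact, otherwise refresh its activity time.
  touch(clientType: string, channelId: string, now: number): Session {
    const key = SessionStore.keyFor(clientType, channelId);
    const session = this.sessions.get(key) ?? { key, lastActive: now };
    session.lastActive = now;
    this.sessions.set(key, session);
    return session;
  }

  // Drop sessions idle for at least the timeout; returns how many were removed.
  prune(now: number): number {
    let removed = 0;
    for (const [key, session] of this.sessions) {
      if (now - session.lastActive >= this.idleTimeoutMs) {
        this.sessions.delete(key);
        removed++;
      }
    }
    return removed;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

A gateway built this way would call `touch` on every incoming message and run `prune` on a timer.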
 
@@ -312,8 +402,9 @@ systemctl --user status deskmate.service

  **Bot not responding?**
  1. Check logs: `tail -f logs/stderr.log`
- 2. Verify your `ALLOWED_USER_ID` matches your Telegram ID
- 3. Ensure Claude Code CLI is installed: `which claude`
+ 2. Verify your `ALLOWED_USERS` includes your Telegram ID (e.g. `telegram:123456`)
+ 3. Ensure your agent CLI is installed (e.g. `which claude`, `which codex`, `which gemini`, `which opencode`)
+ 4. Run `deskmate doctor` to diagnose configuration issues

  **Commands timing out?**
  - Default timeout is 2 minutes
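
For context on where a 2-minute timeout and a 10 MB output cap can come from, Node's `child_process` functions accept `timeout` and `maxBuffer` options. The sketch below shows those two options with `execFileSync`; whether Deskmate's `executor.ts` uses this exact call is an assumption, and `runLimited` is a hypothetical helper name.

```typescript
import { execFileSync } from "node:child_process";

// Limits quoted in the README: 2 minutes and 10 MB of output per command.
const TIMEOUT_MS = 2 * 60_000;
const MAX_BUFFER_BYTES = 10 * 1024 * 1024;

// Runs a program without a shell and returns its stdout as text.
// execFileSync throws if the process exits non-zero, exceeds the timeout,
// or writes more than maxBuffer bytes of output.
function runLimited(file: string, args: string[]): string {
  const output = execFileSync(file, args, {
    timeout: TIMEOUT_MS,
    maxBuffer: MAX_BUFFER_BYTES,
  });
  return output.toString("utf8");
}
```

A command that runs longer than the limit is terminated and surfaces as an error, which is one plausible cause of the timeout symptom described above.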
@@ -355,6 +446,22 @@ MIT License — see [LICENSE](LICENSE) for details.

  ## Acknowledgments

- - [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) — agent runtime
+ - [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-code/agent-sdk) — default agent runtime
+ - [Codex](https://github.com/openai/codex) — OpenAI agent backend
+ - [Gemini CLI](https://github.com/google-gemini/gemini-cli) — Google agent backend
+ - [OpenCode](https://github.com/opencode-ai/opencode) — OpenCode agent backend
  - [grammY](https://grammy.dev/) — Telegram bot framework
  - [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/sdk) — MCP support
+
+ ---
+
+ ## Community
+
+ - [Discord](https://discord.com/channels/1467923903597908244/1467926060195778692) — join the community, ask questions, share your setup
+
+ ## Share
+
+ If you find Deskmate useful, feel free to share:
+
+ - [Share on X](https://x.com/intent/post?text=Running%20real%20system%20actions%20with%20a%20local-first%20AI%20agent.%20Deskmate%20lets%20you%20control%20your%20machine%20using%20natural%20language.&url=https%3A%2F%2Fgithub.com%2Fsarkar-ai-taken%2Fdeskmate&via=sarkar_ai)
+ - [Post to Hacker News](https://news.ycombinator.com/submitlink?u=https%3A%2F%2Fgithub.com%2Fsarkar-ai-taken%2Fdeskmate&t=Deskmate%3A%20A%20local-first%20AI%20agent%20for%20executing%20real%20system%20actions)