@zhihand/mcp 0.20.0 → 0.22.0

package/README.md CHANGED
@@ -2,7 +2,7 @@
 
  ZhiHand MCP Server — let AI agents see and control your phone.
 
- Version: `0.16.0`
+ Version: `0.20.0`
 
  ## What is this?
 
@@ -74,6 +74,8 @@ zhihand start -d # Start daemon in background (detached)
 
  The daemon runs the MCP Server on `localhost:18686/mcp` (HTTP Streamable transport), maintains a brain heartbeat every 30 seconds (keeps the phone Brain indicator green), and listens for phone-initiated prompts.
 
+ When started with `-d`, daemon logs are written to `~/.zhihand/daemon.log`.
+
  ### 3. Start using it
 
  Once configured, your AI agent can use ZhiHand tools directly. For example, in Claude Code:
@@ -90,18 +92,19 @@ Once configured, your AI agent can use ZhiHand tools directly. For example, in C
  ```
  zhihand setup                Interactive setup: pair + detect tools + auto-select + configure MCP + start daemon
  zhihand start                Start daemon (MCP Server + Relay + Config API)
- zhihand start -d             Start daemon in background (detached)
+ zhihand start -d             Start daemon in background (logs to ~/.zhihand/daemon.log)
  zhihand stop                 Stop the running daemon
- zhihand status               Show daemon status, pairing info, device, and active backend
+ zhihand status               Show daemon status, pairing info, device, backend, and model
 
  zhihand pair                 Pair with a phone (QR code in terminal)
  zhihand detect               List detected CLI tools and their login status
  zhihand serve                Start MCP Server (stdio mode, backward compatible)
  zhihand --help               Show help
 
- zhihand claude               Switch backend to Claude Code (sends IPC to daemon, auto-configures MCP)
- zhihand codex                Switch backend to Codex CLI (sends IPC to daemon, auto-configures MCP)
- zhihand gemini               Switch backend to Gemini CLI (sends IPC to daemon, auto-configures MCP)
+ zhihand gemini               Switch backend to Gemini CLI (default model: flash)
+ zhihand claude               Switch backend to Claude Code (default model: sonnet)
+ zhihand codex                Switch backend to Codex CLI (default model: gpt-5.4-mini)
+ zhihand gemini --model pro   Switch backend with custom model
  ```
 
  ### Daemon Lifecycle
@@ -123,15 +126,28 @@ The daemon is a single persistent process that runs:
  Use `zhihand claude`, `zhihand codex`, or `zhihand gemini` to switch the active backend:
 
  ```bash
- zhihand gemini               # Switch to Gemini CLI
- zhihand claude               # Switch to Claude Code
- zhihand codex                # Switch to Codex CLI
+ zhihand gemini               # Switch to Gemini CLI (model: flash)
+ zhihand claude               # Switch to Claude Code (model: sonnet)
+ zhihand codex                # Switch to Codex CLI (model: gpt-5.4-mini)
+ zhihand gemini --model pro   # Use a custom model
+ zhihand claude -m opus       # Short flag form
  ```
 
+ Each backend has a **default model alias** that resolves to the latest version:
+
+ | Backend | Default | Alias examples | Resolution |
+ |---------|---------|---------------|------------|
+ | Gemini CLI | `flash` | `flash`, `pro` | Gemini CLI resolves natively (e.g. flash → gemini-2.5-flash) |
+ | Claude Code | `sonnet` | `sonnet`, `opus`, `haiku` | Claude Code resolves natively (e.g. sonnet → claude-sonnet-4) |
+ | Codex CLI | `gpt-5.4-mini` | any full model name | Codex requires full model names |
+
+ Model resolution priority: `--model` flag > `ZHIHAND_MODEL` env > `ZHIHAND_<BACKEND>_MODEL` env > default.
+
  When you switch:
  - The command sends an **IPC message to the running daemon**
  - MCP config is **automatically added** to the new backend
  - MCP config is **automatically removed** from the previous backend
+ - The model selection is **persisted** to `~/.zhihand/backend.json`
  - If the tool is not installed, an error is shown
 
  ### Options
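The priority order added to the README in the hunk above can be sketched as a small pure function. This is an illustration only, not the package's code: `pickModel`, its `env` parameter, and the `DEFAULTS` table are hypothetical names, and the backend keys are simplified (the daemon's own key for Claude appears to be `claudecode`). The default aliases and environment variable names are taken from the README text in this diff.

```javascript
// Hypothetical defaults, copied from the README table in this diff.
const DEFAULTS = { gemini: "flash", claude: "sonnet", codex: "gpt-5.4-mini" };

// Sketch of: --model flag > ZHIHAND_MODEL > ZHIHAND_<BACKEND>_MODEL > default.
// `env` is passed in (rather than reading process.env) so the function is easy to test.
function pickModel(backend, flagValue, env = {}) {
  if (flagValue) return flagValue;                 // 1. explicit --model / -m
  if (env.ZHIHAND_MODEL) return env.ZHIHAND_MODEL; // 2. global override
  const perBackend = env[`ZHIHAND_${backend.toUpperCase()}_MODEL`];
  if (perBackend) return perBackend;               // 3. per-backend override
  return DEFAULTS[backend];                        // 4. default alias
}

console.log(pickModel("gemini", "pro", { ZHIHAND_MODEL: "flash" }));
console.log(pickModel("claude", undefined, { ZHIHAND_CLAUDE_MODEL: "opus" }));
console.log(pickModel("codex", undefined, {}));
```

Note that, as documented, the global `ZHIHAND_MODEL` takes precedence over the backend-specific variables, so setting both means the backend-specific value is ignored.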
@@ -139,6 +155,9 @@ When you switch:
  | Option | Description |
  |---|---|
  | `--device <name>` | Use a specific paired device (if you have multiple) |
+ | `--model, -m <name>` | Set model alias (e.g. `flash`, `pro`, `sonnet`, `opus`, `gpt-5.4-mini`) |
+ | `--port <port>` | Override daemon port (default: 18686) |
+ | `-d, --detach` | Run daemon in background |
  | `-h, --help` | Show help |
 
  ### Environment Variables
@@ -147,6 +166,10 @@ When you switch:
  |---|---|
  | `ZHIHAND_DEVICE` | Default device name (same as `--device`) |
  | `ZHIHAND_CLI` | Override CLI tool selection for mobile-initiated tasks |
+ | `ZHIHAND_MODEL` | Override model for all backends |
+ | `ZHIHAND_GEMINI_MODEL` | Override model for Gemini only |
+ | `ZHIHAND_CLAUDE_MODEL` | Override model for Claude only |
+ | `ZHIHAND_CODEX_MODEL` | Override model for Codex only |
 
  ## MCP Tools
 
@@ -160,12 +183,17 @@ The main phone control tool. Supports these actions:
  |---|---|---|
  | `click` | `xRatio`, `yRatio` | Tap at normalized coordinates [0,1] |
  | `doubleclick` | `xRatio`, `yRatio` | Double-tap |
- | `rightclick` | `xRatio`, `yRatio` | Right-click (long press) |
- | `middleclick` | `xRatio`, `yRatio` | Middle-click |
+ | `longclick` | `xRatio`, `yRatio`, `durationMs` | Long press (default 800ms) |
+ | `rightclick` | `xRatio`, `yRatio` | Right-click (desktop/BLE HID) |
+ | `middleclick` | `xRatio`, `yRatio` | Middle-click (desktop/BLE HID) |
  | `type` | `text` | Type text into the focused field |
- | `swipe` | `startXRatio`, `startYRatio`, `endXRatio`, `endYRatio` | Swipe gesture |
+ | `swipe` | `startXRatio`, `startYRatio`, `endXRatio`, `endYRatio`, `durationMs` | Swipe gesture (default 300ms) |
  | `scroll` | `xRatio`, `yRatio`, `direction`, `amount` | Scroll up/down/left/right |
  | `keycombo` | `keys` | Key combination (e.g. `"ctrl+c"`, `"alt+tab"`) |
+ | `back` | — | Press system Back button |
+ | `home` | — | Press system Home button |
+ | `enter` | — | Press Enter key |
+ | `open_app` | `appPackage`, `bundleId`, `urlScheme`, `appName` | Open an application |
  | `clipboard` | `clipboardAction` (`get`/`set`), `text` | Read or write clipboard |
  | `wait` | `durationMs` | Wait (local sleep, no server round-trip) |
  | `screenshot` | — | Capture screen immediately |
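Coordinates in the action table above are normalized to [0,1], so a caller working from screenshot pixels has to convert first. A minimal sketch, assuming hypothetical helper names (`toRatio`, `buildSwipe`) and an assumed 1080×2400 screenshot; only the parameter names, the `action` field, and the 300 ms swipe default come from the text in this diff.

```javascript
// Convert a pixel coordinate to a normalized ratio, clamped to [0, 1].
function toRatio(pixel, size) {
  return Math.min(1, Math.max(0, pixel / size));
}

// Build the arguments for a swipe action from pixel coordinates.
// `from`, `to`, and `size` are illustrative shapes, not part of the package API.
function buildSwipe(from, to, size, durationMs = 300) {
  return {
    action: "swipe",
    startXRatio: toRatio(from.x, size.width),
    startYRatio: toRatio(from.y, size.height),
    endXRatio: toRatio(to.x, size.width),
    endYRatio: toRatio(to.y, size.height),
    durationMs,
  };
}

// Swipe up the middle of an assumed 1080x2400 screenshot.
const screenshotSize = { width: 1080, height: 2400 };
console.log(buildSwipe({ x: 540, y: 1800 }, { x: 540, y: 600 }, screenshotSize));
```

Because the ratios are resolution-independent, the same arguments stay valid if the device reports a different screenshot size.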
@@ -234,8 +262,9 @@ Pairing credentials are stored at:
  ```
  ~/.zhihand/
  ├── credentials.json # Device credentials (credentialId, controllerToken, endpoint)
- ├── backend.json # Active backend selection (claudecode/codex/gemini)
+ ├── backend.json # Active backend + model selection
  ├── daemon.pid # Daemon PID file (for zhihand stop)
+ ├── daemon.log # Daemon log output (when started with -d)
  └── state.json # Current pairing session state
  ```
 
@@ -267,7 +296,8 @@ packages/mcp/
  │ ├── index.ts # MCP Server (stdio transport, legacy)
  │ ├── openclaw.adapter.ts # OpenClaw Plugin adapter (thin wrapper)
  │ ├── core/
- │ │ ├── config.ts # Credential & config management (~/.zhihand/)
+ │ │ ├── config.ts # Credential & config management (~/.zhihand/), default models
+ │ │ ├── resolve-path.ts # Platform-aware executable path resolution (gemini/claude/codex)
  │ │ ├── command.ts # Command creation, enqueue, ACK formatting
  │ │ ├── screenshot.ts # Binary screenshot fetch (JPEG)
  │ │ ├── sse.ts # SSE client + hybrid ACK (SSE push + polling fallback)
@@ -2,7 +2,7 @@
   * Platform-aware executable path resolution.
   * Shared by both the CLI detection layer and the daemon dispatcher.
   */
- import { execSync } from "node:child_process";
+ import { execFileSync } from "node:child_process";
  import fs from "node:fs";
  import path from "node:path";
  import os from "node:os";
@@ -19,7 +19,7 @@ export function resolveExecutable(name, fallbackPaths) {
          return cached;
      // Try `which` first (works when the binary is in PATH)
      try {
-         const resolved = execSync(`which ${name}`, { encoding: "utf8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] }).trim();
+         const resolved = execFileSync("which", [name], { encoding: "utf8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] }).trim();
          if (resolved) {
              cache.set(name, resolved);
              return resolved;
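The diff does not state why `execSync` was replaced with `execFileSync`, but a likely motivation is that `execSync` hands its string to a shell, while `execFileSync` passes an argument vector directly to the program. The sketch below only builds the two invocation shapes without running anything; `shellForm` and `argvForm` are hypothetical helper names used for illustration.

```javascript
// What execSync(`which ${name}`) hands to the shell: one interpolated string.
function shellForm(name) {
  return `which ${name}`;
}

// What execFileSync("which", [name]) passes: a program plus an argument vector.
// No shell parses the arguments, so metacharacters in `name` stay literal.
function argvForm(name) {
  return { file: "which", args: [name] };
}

const hostile = "gemini; echo pwned";
console.log(shellForm(hostile)); // which gemini; echo pwned
console.log(argvForm(hostile));
```

In the shell form a `;` in the name would start a second command; in the argv form the whole string is a single argument to `which`, which simply fails to find such a binary.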
@@ -5,8 +5,7 @@ export interface DispatchResult {
      durationMs: number;
  }
  /**
-  * Kill the active child process. Returns a promise that resolves
-  * when the child has exited (or immediately if no child).
+  * Kill the active session. Called by daemon on shutdown or backend switch.
   */
  export declare function killActiveChild(): Promise<void>;
  export declare function dispatchToCLI(backend: Exclude<BackendName, "openclaw">, prompt: string, log: (msg: string) => void, model?: string): Promise<DispatchResult>;
@@ -5,9 +5,10 @@ import os from "node:os";
  import { fileURLToPath } from "node:url";
  import { DEFAULT_MODELS } from "../core/config.js";
  import { resolveGemini, resolveClaude, resolveCodex } from "../core/resolve-path.js";
- const CLI_TIMEOUT = 120_000; // 120s
+ const CLI_TIMEOUT = 120_000; // 120s per prompt
  const SIGKILL_DELAY = 2_000; // 2s after SIGTERM
- const MAX_OUTPUT_BYTES = 100 * 1024; // 100KB
+ const MAX_OUTPUT_BYTES = 100 * 1024; // 100KB (for one-shot backends)
+ const MAX_HISTORY_TURNS = 20; // keep last N exchanges in conversation history
  // Gemini session file polling
  const SESSION_POLL_INTERVAL = 1_000; // 1s
  const SESSION_STABILITY_DELAY = 2_000; // wait 2s after outcome before returning
@@ -16,7 +17,8 @@ const __dirname = path.dirname(fileURLToPath(import.meta.url));
  const PTY_WRAP_SCRIPT = path.resolve(__dirname, "../../scripts/pty-wrap.py");
  // Gemini session directories
  const GEMINI_TMP_DIR = path.join(os.homedir(), ".gemini", "tmp");
- let activeChild = null;
+ let session = null;
+ const conversationHistory = [];
  // ── Gemini Session File Monitoring ─────────────────────────
  /** Safely read and parse a JSON file (single attempt, async). */
  async function loadJsonFile(filePath) {
@@ -26,7 +28,6 @@ async function loadJsonFile(filePath) {
26
28
  return typeof parsed === "object" && parsed !== null ? parsed : null;
27
29
  }
28
30
  catch {
29
- // File locked or partial write — next poll cycle will retry
30
31
  return null;
31
32
  }
32
33
  }
@@ -56,7 +57,6 @@ function extractMessageText(message) {
56
57
  if (typeof obj.text === "string")
57
58
  return obj.text;
58
59
  }
59
- // Fallback to displayContent
60
60
  const display = message.displayContent;
61
61
  if (typeof display === "string")
62
62
  return display;
@@ -86,7 +86,6 @@ function hasActiveToolCalls(message) {
86
86
  function checkSessionOutcome(messages) {
87
87
  if (messages.length === 0)
88
88
  return null;
89
- // Get the latest turn messages (trailing messages from last user input)
90
89
  const trailing = [];
91
90
  for (let i = messages.length - 1; i >= 0; i--) {
92
91
  const msg = messages[i];
@@ -96,22 +95,18 @@ function checkSessionOutcome(messages) {
96
95
  }
97
96
  if (trailing.length === 0)
98
97
  return null;
99
- // If any message has active tool calls, still in progress
100
98
  for (const msg of trailing) {
101
99
  if (hasActiveToolCalls(msg))
102
100
  return null;
103
101
  }
104
- // Check from last message backwards for a result
105
102
  for (let i = trailing.length - 1; i >= 0; i--) {
106
103
  const msg = trailing[i];
107
104
  const msgType = String(msg.type ?? "").trim();
108
- // Error/warning/info messages
109
105
  if (["error", "warning", "info"].includes(msgType)) {
110
106
  const text = extractMessageText(msg).trim();
111
107
  if (text)
112
108
  return ["error", text];
113
109
  }
114
- // Gemini response message
115
110
  if (msgType === "gemini") {
116
111
  const text = extractMessageText(msg).trim();
117
112
  if (text)
@@ -152,21 +147,19 @@ async function findLatestSessionFile(afterTime, promptText) {
152
147
  }
153
148
  }
154
149
  }
155
- // Sort newest first, then validate content matches our prompt
156
150
  candidates.sort((a, b) => b.mtime - a.mtime);
157
151
  const promptPrefix = promptText.slice(0, 50);
158
152
  for (const candidate of candidates) {
159
153
  const data = await loadJsonFile(candidate.path);
160
154
  if (!data || !Array.isArray(data.messages))
161
155
  continue;
162
- // Check first user message matches our prompt
163
156
  for (const msg of data.messages) {
164
157
  if (String(msg.type ?? "").trim() !== "user")
165
158
  continue;
166
159
  const text = extractMessageText(msg);
167
160
  if (text.startsWith(promptPrefix))
168
161
  return candidate.path;
169
- break; // Only check first user message
162
+ break;
170
163
  }
171
164
  }
172
165
  return null;
@@ -175,33 +168,44 @@ async function findLatestSessionFile(afterTime, promptText) {
175
168
  return null;
176
169
  }
177
170
  }
171
+ /** Count how many "user" type messages are in the session */
172
+ function countUserMessages(messages) {
173
+ return messages.filter(m => String(m.type ?? "").trim() === "user").length;
174
+ }
178
175
  /**
179
- * Poll gemini session files for the response.
180
- * Returns the final text when gemini completes, or null on timeout.
176
+ * Poll gemini session file for the response to the current prompt.
177
+ *
178
+ * For persistent sessions:
179
+ * - First prompt: find the session file, wait for first response, keep process alive
180
+ * - Subsequent: session file known, wait for new user message + response
181
181
  */
182
- function pollGeminiSession(child, startTime, promptText, log) {
182
+ function pollGeminiSession(child, startTime, promptText, log, knownSessionFile, expectedUserCount) {
183
183
  return new Promise((resolve) => {
184
- let sessionFile = null;
184
+ let sessionFile = knownSessionFile;
185
185
  let outcomeAt = null;
186
186
  let finalResult = null;
187
187
  let settled = false;
188
188
  let pollTimeout = null;
189
+ let newUserSeen = knownSessionFile === null; // first prompt: don't wait for user msg
189
190
  function settle(result) {
190
191
  if (settled)
191
192
  return;
192
193
  settled = true;
193
194
  if (pollTimeout)
194
195
  clearTimeout(pollTimeout);
195
- // Kill the gemini process now that we have the answer
196
- closeChild(child);
196
+ // DON'T kill the child: the persistent session keeps it alive
197
197
  resolve(result);
198
198
  }
199
199
  async function poll() {
200
200
  if (settled)
201
201
  return;
202
202
  const elapsed = Date.now() - startTime;
203
- // Timeout
204
203
  if (elapsed > CLI_TIMEOUT) {
204
+ // Kill the timed-out session to prevent zombie processes
205
+ if (session?.child === child) {
206
+ session.alive = false;
207
+ log(`[gemini] Session timed out — killing process`);
208
+ }
205
209
  closeChild(child);
206
210
  settle({
207
211
  text: "Gemini timed out after 120s.",
@@ -210,16 +214,18 @@ function pollGeminiSession(child, startTime, promptText, log) {
210
214
  });
211
215
  return;
212
216
  }
213
- // Find session file if not yet found
217
+ // Find session file if not yet found (first prompt only)
214
218
  if (!sessionFile) {
215
219
  sessionFile = await findLatestSessionFile(startTime, promptText);
216
220
  if (sessionFile) {
217
221
  log(`[gemini] Session file found: ${path.basename(sessionFile)}`);
222
+ if (session)
223
+ session.geminiSessionFile = sessionFile;
218
224
  }
219
225
  schedulePoll();
220
226
  return;
221
227
  }
222
- // Read session file and check for outcome
228
+ // Read session file
223
229
  const conversation = await loadJsonFile(sessionFile);
224
230
  if (!conversation) {
225
231
  schedulePoll();
@@ -230,15 +236,23 @@ function pollGeminiSession(child, startTime, promptText, log) {
                  schedulePoll();
                  return;
              }
+             // For subsequent prompts: wait until the new user message appears
+             if (!newUserSeen) {
+                 const userCount = countUserMessages(messages);
+                 if (userCount < expectedUserCount) {
+                     schedulePoll();
+                     return;
+                 }
+                 newUserSeen = true;
+                 log(`[gemini] New user message detected (turn #${expectedUserCount})`);
+             }
              const outcome = checkSessionOutcome(messages);
              if (!outcome) {
-                 // Still in progress, reset stability timer
                  outcomeAt = null;
                  finalResult = null;
                  schedulePoll();
                  return;
              }
-             // Outcome detected — wait for stability (2s) before returning
              if (!outcomeAt) {
                  outcomeAt = Date.now();
                  finalResult = outcome;
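The gate added in the hunk above waits until the session file holds at least `expectedUserCount` user messages before looking for an outcome. Presumably this keeps the previous turn's reply from being read as the answer to the new prompt when a session file is reused. A minimal sketch of that check, reusing the counting rule from `countUserMessages` in this diff; `newTurnVisible` and the sample messages are hypothetical.

```javascript
// Same counting rule as countUserMessages in the diff.
function countUserMessages(messages) {
  return messages.filter((m) => String(m.type ?? "").trim() === "user").length;
}

// Hypothetical helper: has the user message for turn `expectedUserCount`
// been written to the session file yet?
function newTurnVisible(messages, expectedUserCount) {
  return countUserMessages(messages) >= expectedUserCount;
}

// Session file contents after turn 1 has completed.
const sessionMessages = [
  { type: "user", text: "open settings" },
  { type: "gemini", text: "Done." },
];
console.log(newTurnVisible(sessionMessages, 2)); // false

// Once the second prompt is flushed to the file, polling for the reply can begin.
sessionMessages.push({ type: "user", text: "go back" });
console.log(newTurnVisible(sessionMessages, 2)); // true
```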
@@ -262,12 +276,16 @@ function pollGeminiSession(child, startTime, promptText, log) {
262
276
  return;
263
277
  pollTimeout = setTimeout(() => { poll(); }, SESSION_POLL_INTERVAL);
264
278
  }
265
- // Start polling
266
279
  schedulePoll();
267
- // Also handle process exit (in case it crashes before producing session file)
268
- child.on("close", (code) => {
280
+ // Handle unexpected process exit
281
+ const onClose = (code) => {
269
282
  if (settled)
270
283
  return;
284
+ // Mark session as dead
285
+ if (session?.child === child) {
286
+ session.alive = false;
287
+ log(`[gemini] Session process exited with code ${code}`);
288
+ }
271
289
  // Give a final chance to read the session file
272
290
  setTimeout(async () => {
273
291
  if (settled)
@@ -292,14 +310,14 @@ function pollGeminiSession(child, startTime, promptText, log) {
292
310
  durationMs: Date.now() - startTime,
293
311
  });
294
312
  }, 500);
295
- });
313
+ };
314
+ child.on("close", onClose);
296
315
  });
297
316
  }
298
- /** Gracefully close a child process: EOF → SIGTERM → SIGKILL. */
317
+ /** Gracefully close a child process: SIGTERM → SIGKILL. */
299
318
  function closeChild(child) {
300
319
  if (child.killed || child.exitCode !== null)
301
320
  return;
302
- // Try SIGTERM first
303
321
  child.kill("SIGTERM");
304
322
  setTimeout(() => {
305
323
  if (!child.killed && child.exitCode === null) {
@@ -307,29 +325,29 @@ function closeChild(child) {
307
325
  }
308
326
  }, SIGKILL_DELAY);
309
327
  }
310
- /**
311
- * Kill the active child process. Returns a promise that resolves
312
- * when the child has exited (or immediately if no child).
313
- */
314
- export function killActiveChild() {
315
- if (!activeChild || activeChild.killed) {
328
+ /** Close the persistent session and clear conversation history. */
329
+ function closeSession() {
330
+ if (!session)
331
+ return Promise.resolve();
332
+ const s = session;
333
+ session = null;
334
+ if (!s.alive)
316
335
  return Promise.resolve();
317
- }
318
336
  return new Promise((resolve) => {
319
- const child = activeChild;
320
- child.once("close", () => resolve());
321
- closeChild(child);
322
- // Safety: resolve after SIGKILL_DELAY + 1s even if no close event
337
+ s.child.once("close", () => resolve());
338
+ closeChild(s.child);
323
339
  setTimeout(() => resolve(), SIGKILL_DELAY + 1000);
324
340
  });
325
341
  }
326
- // ── System Prompt ─────────────────────────────────────────
327
342
  /**
328
- * Wrap the user's raw prompt with system context so the CLI backend
329
- * knows about the connected phone and how to use zhihand MCP tools.
343
+ * Kill the active session. Called by daemon on shutdown or backend switch.
330
344
  */
331
- function wrapPrompt(userPrompt) {
332
- return `You are ZhiHand, an AI assistant connected to the user's mobile phone via MCP tools.
345
+ export async function killActiveChild() {
346
+ await closeSession();
347
+ conversationHistory.length = 0;
348
+ }
349
+ // ── System Prompt ─────────────────────────────────────────
350
+ const SYSTEM_CONTEXT = `You are ZhiHand, an AI assistant connected to the user's mobile phone via MCP tools.
333
351
 
334
352
  ## Available MCP Tools
335
353
 
@@ -359,26 +377,54 @@ Control the phone. Requires "action" parameter. All coordinates use normalized r
359
377
  - When the user asks to see their screen, ALWAYS call zhihand_screenshot first.
360
378
  - When the user asks to open an app (e.g. WeChat, Settings), use open_app action.
361
379
  - When the user asks to go back/home, use back/home actions.
362
- - For all tap/click operations, use xRatio and yRatio (0-1 normalized coordinates based on the screenshot).
363
-
364
- User message:
365
- ${userPrompt}`;
380
+ - For all tap/click operations, use xRatio and yRatio (0-1 normalized coordinates based on the screenshot).`;
381
+ /**
382
+ * Build the full system prompt with optional conversation history.
383
+ * Used for first prompt in persistent sessions and all one-shot calls.
384
+ */
385
+ function wrapPrompt(userPrompt, history) {
386
+ let result = SYSTEM_CONTEXT;
387
+ if (history && history.length > 0) {
388
+ result += "\n\n## Recent Conversation\n";
389
+ for (const turn of history) {
390
+ const label = turn.role === "user" ? "User" : "Assistant";
391
+ // Truncate long assistant responses in history to save tokens
392
+ const text = turn.text.length > 500 ? turn.text.slice(0, 500) + "..." : turn.text;
393
+ result += `\n${label}: ${text}\n`;
394
+ }
395
+ }
396
+ result += `\nUser message:\n${userPrompt}`;
397
+ return result;
398
+ }
399
+ // ── Conversation History Helpers ─────────────────────────────
400
+ function recordTurn(role, text) {
401
+ conversationHistory.push({ role, text });
402
+ // Trim to keep last N exchanges (2 turns per exchange)
403
+ while (conversationHistory.length > MAX_HISTORY_TURNS * 2) {
404
+ conversationHistory.shift();
405
+ }
366
406
  }
367
407
  // ── Dispatch Entrypoint ────────────────────────────────────
368
408
  export function dispatchToCLI(backend, prompt, log, model) {
369
409
  const startTime = Date.now();
370
- const wrappedPrompt = wrapPrompt(prompt);
371
- // Resolve model: explicit > env > default
372
410
  const resolvedModel = resolveModel(backend, model);
373
- log(`[dispatch] Backend: ${backend}, Model: ${resolvedModel}`);
411
+ // Check if existing session matches — if not, close it
412
+ const canReuse = session?.alive && session.backend === backend && session.model === resolvedModel;
413
+ if (session && !canReuse) {
414
+ log(`[dispatch] Session mismatch (was ${session.backend}/${session.model}), closing old session`);
415
+ closeSession();
416
+ conversationHistory.length = 0;
417
+ }
418
+ const sessionLabel = canReuse ? `#${session.promptCount + 1}` : "new";
419
+ log(`[dispatch] Backend: ${backend}, Model: ${resolvedModel}, Session: ${sessionLabel}`);
374
420
  if (backend === "gemini") {
375
- return dispatchGemini(wrappedPrompt, startTime, log, resolvedModel);
421
+ return dispatchGeminiPersistent(prompt, startTime, log, resolvedModel);
376
422
  }
377
423
  if (backend === "codex") {
378
- return dispatchCodex(wrappedPrompt, startTime, resolvedModel);
424
+ return dispatchCodexWithHistory(prompt, startTime, log, resolvedModel);
379
425
  }
380
426
  if (backend === "claudecode") {
381
- return dispatchClaude(wrappedPrompt, startTime, resolvedModel);
427
+ return dispatchClaudeWithHistory(prompt, startTime, log, resolvedModel);
382
428
  }
383
429
  return Promise.resolve({
384
430
  text: `Unsupported backend: ${backend}`,
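The trimming rule in `recordTurn` above keeps at most `MAX_HISTORY_TURNS * 2` entries, that is, 20 user/assistant exchanges. The standalone sketch below reproduces that rule outside the dispatcher to show its effect; the `history` array and the simulated prompts are illustrative stand-ins, not package code.

```javascript
const MAX_HISTORY_TURNS = 20; // same value as in the diff
const history = [];

// Same trimming rule as recordTurn in the diff: drop the oldest entries
// once the list exceeds two entries per kept exchange.
function recordTurn(role, text) {
  history.push({ role, text });
  while (history.length > MAX_HISTORY_TURNS * 2) {
    history.shift();
  }
}

// Simulate 25 exchanges (one user turn and one assistant turn each).
for (let i = 1; i <= 25; i++) {
  recordTurn("user", `question ${i}`);
  recordTurn("assistant", `answer ${i}`);
}

console.log(history.length);  // 40
console.log(history[0].text); // question 6
```

Entries are trimmed one at a time, so between a user turn and its reply the oldest retained entry can briefly be an assistant turn whose question has already been dropped.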
@@ -389,20 +435,13 @@ export function dispatchToCLI(backend, prompt, log, model) {
  /**
   * Resolve the model to use for a backend.
   * Priority: explicit parameter > ZHIHAND_MODEL env > backend-specific env > default alias.
-  *
-  * Each backend CLI handles alias→full-name resolution natively:
-  * - Gemini CLI: "flash" → gemini-2.5-flash, "pro" → gemini-2.5-pro
-  * - Claude Code: "sonnet" → claude-sonnet-4-*, "opus" → claude-opus-4-*, "haiku" → claude-haiku-4-*
-  * - Codex CLI: no alias support — pass full model name directly (e.g. "o4-mini", "codex-mini")
   */
  function resolveModel(backend, explicit) {
      if (explicit)
          return explicit;
-     // Global env override
      const globalEnv = process.env.ZHIHAND_MODEL;
      if (globalEnv)
          return globalEnv;
-     // Per-backend env override
      const envMap = {
          gemini: process.env.ZHIHAND_GEMINI_MODEL,
          claudecode: process.env.ZHIHAND_CLAUDE_MODEL,
@@ -413,12 +452,26 @@ function resolveModel(backend, explicit) {
413
452
  return perBackend;
414
453
  return DEFAULT_MODELS[backend];
415
454
  }
416
- // ── Gemini Dispatch (PTY + Session File Monitoring) ────────
417
- function dispatchGemini(prompt, startTime, log, model) {
455
+ // ── Gemini Dispatch (Persistent PTY Session) ─────────────────
456
+ async function dispatchGeminiPersistent(prompt, startTime, log, model) {
457
+ // Reuse existing session?
458
+ if (session?.alive && session.backend === "gemini") {
459
+ session.promptCount++;
460
+ const turnNum = session.promptCount;
461
+ log(`[gemini] Reusing session — sending prompt #${turnNum}`);
462
+ // Write raw prompt to PTY stdin (gemini already has system context from first prompt)
463
+ session.child.stdin?.write(prompt + "\n");
464
+ const result = await pollGeminiSession(session.child, startTime, prompt, log, session.geminiSessionFile, turnNum);
465
+ recordTurn("user", prompt);
466
+ recordTurn("assistant", result.text);
467
+ return result;
468
+ }
469
+ // New session — spawn gemini with first prompt
470
+ const wrappedPrompt = wrapPrompt(prompt);
418
471
  const cliArgs = [
419
472
  "--approval-mode", "yolo",
420
473
  "--model", model,
421
- "-i", prompt,
474
+ "-i", wrappedPrompt,
422
475
  ];
423
476
  const env = {
424
477
  ...process.env,
@@ -426,51 +479,80 @@ function dispatchGemini(prompt, startTime, log, model) {
426
479
  TERM: "xterm-256color",
427
480
  COLORTERM: "truecolor",
428
481
  };
429
- // Wrap with PTY so gemini sees isatty()==true
430
482
  const geminiPath = resolveGemini();
483
+ log(`[gemini] Starting new persistent session (model: ${model})`);
431
484
  const child = spawn("python3", [PTY_WRAP_SCRIPT, geminiPath, ...cliArgs], {
432
485
  env,
433
- stdio: ["ignore", "pipe", "pipe"],
486
+ stdio: ["pipe", "pipe", "pipe"], // stdin=pipe for subsequent prompts
434
487
  detached: false,
435
488
  });
436
- activeChild = child;
437
- // Drain PTY output (discard — we read from session file instead)
489
+ session = {
490
+ child,
491
+ backend: "gemini",
492
+ model,
493
+ promptCount: 1,
494
+ alive: true,
495
+ geminiSessionFile: null,
496
+ };
497
+ // Handle unexpected exit — mark session dead
498
+ child.on("close", (code) => {
499
+ if (session?.child === child) {
500
+ session.alive = false;
501
+ log(`[gemini] Session process exited (code ${code})`);
502
+ }
503
+ });
504
+ // Drain PTY stdout/stderr (we read from session file, not stdout)
438
505
  child.stdout?.resume();
439
506
  child.stderr?.resume();
440
- return pollGeminiSession(child, startTime, prompt, log);
507
+ const result = await pollGeminiSession(child, startTime, wrappedPrompt, log, null, // no known session file yet
508
+ 1);
509
+ recordTurn("user", prompt);
510
+ recordTurn("assistant", result.text);
511
+ return result;
441
512
  }
442
- // ── Codex Dispatch ─────────────────────────────────────────
443
- function dispatchCodex(prompt, startTime, model) {
444
- // --dangerously-bypass-approvals-and-sandbox is required so MCP tool calls
445
- // are not auto-cancelled in non-interactive mode (--full-auto cancels them)
513
+ // ── Codex Dispatch (One-shot with History) ────────────────────
514
+ async function dispatchCodexWithHistory(prompt, startTime, log, model) {
515
+ // Include conversation history in the prompt for context
516
+ const fullPrompt = wrapPrompt(prompt, conversationHistory);
446
517
  const args = ["exec", "--dangerously-bypass-approvals-and-sandbox", "--skip-git-repo-check", "--json"];
447
518
  args.push("-m", model);
448
- args.push(prompt);
519
+ // Pass prompt via stdin to avoid ARG_MAX limit with long conversation history
520
+ args.push("-");
449
521
  const codexPath = resolveCodex();
522
+ log(`[codex] One-shot dispatch (history: ${conversationHistory.length} turns)`);
450
523
  const child = spawn(codexPath, args, {
451
524
  env: process.env,
452
- stdio: ["ignore", "pipe", "pipe"],
525
+ stdio: ["pipe", "pipe", "pipe"],
453
526
  detached: false,
454
527
  });
455
- activeChild = child;
456
- return collectCodexOutput(child, startTime);
528
+ // Write prompt to stdin, then close to signal EOF
529
+ child.stdin?.write(fullPrompt);
530
+ child.stdin?.end();
531
+ const result = await collectCodexOutput(child, startTime);
532
+ recordTurn("user", prompt);
533
+ recordTurn("assistant", result.text);
534
+ return result;
457
535
  }
458
- // ── Claude Dispatch ────────────────────────────────────────
459
- function dispatchClaude(prompt, startTime, model) {
536
+ // ── Claude Dispatch (One-shot with History) ───────────────────
537
+ async function dispatchClaudeWithHistory(prompt, startTime, log, model) {
538
+ const fullPrompt = wrapPrompt(prompt, conversationHistory);
460
539
  const claudePath = resolveClaude();
461
- const child = spawn(claudePath, ["-p", prompt, "--model", model, "--output-format", "json"], {
540
+ log(`[claude] One-shot dispatch (history: ${conversationHistory.length} turns)`);
541
+ // Pass prompt via stdin (-p -) to avoid ARG_MAX limit with long conversation history
542
+ const child = spawn(claudePath, ["-p", "-", "--model", model, "--output-format", "json"], {
462
543
  env: process.env,
463
- stdio: ["ignore", "pipe", "pipe"],
544
+ stdio: ["pipe", "pipe", "pipe"],
464
545
  detached: false,
465
546
  });
466
- activeChild = child;
467
- return collectChildOutput(child, startTime);
547
+ // Write prompt to stdin, then close to signal EOF
548
+ child.stdin?.write(fullPrompt);
549
+ child.stdin?.end();
550
+ const result = await collectChildOutput(child, startTime);
551
+ recordTurn("user", prompt);
552
+ recordTurn("assistant", result.text);
553
+ return result;
468
554
  }
469
- /**
470
- * Collect codex JSONL output with streaming line parsing.
471
- * Processes each JSONL line as it arrives so we extract agent text
472
- * without buffering large binary payloads (e.g. base64 screenshots).
473
- */
555
+ // ── Codex JSONL Output Collector ──────────────────────────────
474
556
  function collectCodexOutput(child, startTime) {
475
557
  return new Promise((resolve) => {
476
558
  const texts = [];
@@ -505,50 +587,35 @@ function collectCodexOutput(child, startTime) {
  hasError = true;
  }
  }
- catch {
- // Not valid JSON — skip
- }
+ catch { /* skip non-JSON */ }
  }
  const timer = setTimeout(() => { closeChild(child); }, CLI_TIMEOUT);
- const onData = (data) => {
+ child.stdout?.on("data", (data) => {
  lineBuffer += data.toString("utf8");
  const lines = lineBuffer.split("\n");
- // Keep the last (possibly incomplete) line in the buffer
  lineBuffer = lines.pop() ?? "";
- for (const line of lines) {
+ for (const line of lines)
  processLine(line);
- }
- };
- child.stdout?.on("data", onData);
- // stderr is not JSONL, just discard
+ });
  child.stderr?.resume();
  child.on("close", (code) => {
  clearTimeout(timer);
- activeChild = null;
- // Process any remaining data in the buffer
  if (lineBuffer.trim())
  processLine(lineBuffer);
  const durationMs = Date.now() - startTime;
  let text = texts.join("\n\n");
  if (!text) {
- text = code === 0
- ? "Task completed (no output)."
- : `CLI process exited with code ${code}.`;
+ text = code === 0 ? "Task completed (no output)." : `CLI process exited with code ${code}.`;
  }
  settle({ text, success: !hasError && code === 0, durationMs });
  });
  child.on("error", (err) => {
  clearTimeout(timer);
- activeChild = null;
- settle({
- text: `CLI launch failed: ${err.message}`,
- success: false,
- durationMs: Date.now() - startTime,
- });
+ settle({ text: `CLI launch failed: ${err.message}`, success: false, durationMs: Date.now() - startTime });
  });
  });
  }
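The line-buffering pattern in `collectCodexOutput` (append each chunk, split on newlines, hold back the last fragment, flush on close) is easier to follow outside the event handlers. This is a sketch under assumptions: `createLineSplitter` and `parseJsonlChunks` are hypothetical names, and the array of chunks stands in for the child's stdout `data` events.

```javascript
// Hypothetical helper: emits complete lines only, holding back a trailing fragment.
function createLineSplitter(onLine) {
  let buffer = "";
  return {
    feed(chunk) {
      buffer += chunk.toString("utf8");
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // last piece may be incomplete; keep it for the next chunk
      for (const line of lines) onLine(line);
    },
    flush() {
      // Called once the stream closes: whatever is left is the final line.
      if (buffer.trim()) onLine(buffer);
      buffer = "";
    },
  };
}

// Parse JSONL delivered in arbitrary chunks, skipping lines that are not valid JSON.
function parseJsonlChunks(chunks) {
  const events = [];
  const splitter = createLineSplitter((line) => {
    try { events.push(JSON.parse(line)); } catch { /* skip non-JSON */ }
  });
  for (const chunk of chunks) splitter.feed(chunk);
  splitter.flush();
  return events;
}

// The second object is split across two chunks and still parses as one line.
console.log(parseJsonlChunks(['{"a":1}\n{"b"', ':2}\nnot json\n', '{"c":3}']));
```

One caveat that applies to any per-chunk `toString("utf8")`: a multi-byte character split across two chunks is decoded incorrectly. Node's `string_decoder` module handles that case if it matters for the payload.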
- // ── Shared: Collect stdout/stderr from a child process ─────
+ // ── Shared: Collect stdout/stderr from a child process ───────
  function collectChildOutput(child, startTime) {
  return new Promise((resolve) => {
  const chunks = [];
@@ -561,10 +628,7 @@ function collectChildOutput(child, startTime) {
  settled = true;
  resolve(result);
  }
- // Timeout with two-stage kill
- const timer = setTimeout(() => {
- closeChild(child);
- }, CLI_TIMEOUT);
+ const timer = setTimeout(() => { closeChild(child); }, CLI_TIMEOUT);
  const collectOutput = (data) => {
  if (truncated)
  return;
@@ -581,27 +645,18 @@ function collectChildOutput(child, startTime) {
  child.stderr?.on("data", collectOutput);
  child.on("close", (code) => {
  clearTimeout(timer);
- activeChild = null;
  const durationMs = Date.now() - startTime;
  let text = Buffer.concat(chunks).toString("utf8").trim();
- if (truncated) {
+ if (truncated)
  text += "\n\n[Output truncated at 100KB]";
- }
  if (!text) {
- text = code === 0
- ? "Task completed (no output)."
- : `CLI process exited with code ${code}.`;
+ text = code === 0 ? "Task completed (no output)." : `CLI process exited with code ${code}.`;
  }
  settle({ text, success: code === 0, durationMs });
  });
  child.on("error", (err) => {
  clearTimeout(timer);
- activeChild = null;
- settle({
- text: `CLI launch failed: ${err.message}`,
- success: false,
- durationMs: Date.now() - startTime,
- });
+ settle({ text: `CLI launch failed: ${err.message}`, success: false, durationMs: Date.now() - startTime });
  });
  });
  }
@@ -618,7 +673,6 @@ export async function postReply(config, promptId, text) {
  body: JSON.stringify({ role: "assistant", text }),
  signal: AbortSignal.timeout(30_000),
  });
- // 4xx = prompt cancelled, that's OK
  return response.ok || (response.status >= 400 && response.status < 500);
  }
  catch {
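The return expression in `postReply` treats any 4xx response as a settled reply; the comment removed in this hunk explained that a 4xx means the prompt was cancelled. A small sketch of that status check on its own, with `isReplySettled` as a hypothetical name that does not appear in the package:

```javascript
// Hypothetical helper mirroring the return expression in postReply.
function isReplySettled(status) {
  const ok = status >= 200 && status <= 299; // same range that fetch's response.ok covers
  // 4xx counts as settled: per the removed comment, the prompt was cancelled.
  return ok || (status >= 400 && status < 500);
}

for (const status of [200, 404, 409, 500]) {
  console.log(status, isReplySettled(status));
}
```

Under this rule only 5xx responses, other non-2xx statuses below 400, and thrown network errors leave the reply unsettled.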
@@ -1,4 +1,11 @@
  import type { ZhiHandConfig } from "../core/config.ts";
+ /** Brain metadata included in every heartbeat, so the app always knows the current backend/model. */
+ export interface BrainMeta {
+ backend?: string | null;
+ model?: string | null;
+ }
+ /** Update the backend/model metadata that will be sent with the next heartbeat. */
+ export declare function setBrainMeta(meta: BrainMeta): void;
  export declare function sendBrainOnline(config: ZhiHandConfig): Promise<boolean>;
  export declare function sendBrainOffline(config: ZhiHandConfig): Promise<boolean>;
  export declare function startHeartbeatLoop(config: ZhiHandConfig, log: (msg: string) => void): void;
@@ -2,18 +2,28 @@ const HEARTBEAT_INTERVAL = 30_000; // 30s
  const HEARTBEAT_RETRY_INTERVAL = 5_000; // 5s on failure
  let heartbeatTimer;
  let retryTimer;
+ let currentMeta = {};
+ /** Update the backend/model metadata that will be sent with the next heartbeat. */
+ export function setBrainMeta(meta) {
+ currentMeta = meta;
+ }
  function buildUrl(config) {
  return `${config.controlPlaneEndpoint}/v1/credentials/${encodeURIComponent(config.credentialId)}/brain-status`;
  }
  async function sendHeartbeat(config, online) {
  try {
+ const body = { plugin_online: online };
+ if (currentMeta.backend)
+ body.backend = currentMeta.backend;
+ if (currentMeta.model)
+ body.model = currentMeta.model;
  const response = await fetch(buildUrl(config), {
  method: "POST",
  headers: {
  "Content-Type": "application/json",
  "x-zhihand-controller-token": config.controllerToken,
  },
- body: JSON.stringify({ plugin_online: online }),
+ body: JSON.stringify(body),
  signal: AbortSignal.timeout(10_000),
  });
  return response.ok;
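The payload construction added to `sendHeartbeat` only attaches `backend` and `model` when they are truthy, so a daemon with no backend configured sends the same body as before. A minimal sketch of that logic as a pure function follows; `buildHeartbeatBody` is a hypothetical name, and the backend and model strings are placeholder values.

```javascript
// Hypothetical pure version of the body construction inside sendHeartbeat.
function buildHeartbeatBody(online, meta = {}) {
  const body = { plugin_online: online };
  if (meta.backend) body.backend = meta.backend; // skipped for null, undefined, ""
  if (meta.model) body.model = meta.model;
  return body;
}

// No metadata set yet: identical to the pre-change payload.
console.log(JSON.stringify(buildHeartbeatBody(true)));
// Backend known, model left unset (placeholder values).
console.log(JSON.stringify(buildHeartbeatBody(true, { backend: "claude", model: null })));
// Both present.
console.log(JSON.stringify(buildHeartbeatBody(false, { backend: "codex", model: "example-model" })));
```

Because the fields are omitted rather than sent as `null`, a receiver cannot distinguish "no backend configured" from an older client by the body alone.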
@@ -7,7 +7,7 @@ import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/
  import { createServer as createMcpServer } from "../index.js";
  import { resolveConfig, loadBackendConfig, saveBackendConfig, resolveZhiHandDir, ensureZhiHandDir, DEFAULT_MODELS, } from "../core/config.js";
  import { PACKAGE_VERSION } from "../index.js";
- import { startHeartbeatLoop, stopHeartbeatLoop, sendBrainOffline } from "./heartbeat.js";
+ import { startHeartbeatLoop, stopHeartbeatLoop, sendBrainOffline, setBrainMeta } from "./heartbeat.js";
  import { PromptListener } from "./prompt-listener.js";
  import { dispatchToCLI, postReply, killActiveChild } from "./dispatcher.js";
  const DEFAULT_PORT = 18686;
@@ -82,6 +82,7 @@ function handleInternalAPI(req, res) {
  activeModel = model ?? null;
  saveBackendConfig({ activeBackend, model: activeModel });
  const effectiveModel = activeModel ?? DEFAULT_MODELS[activeBackend];
+ setBrainMeta({ backend: activeBackend, model: effectiveModel });
  log(`[config] Backend switched to ${activeBackend}, model: ${effectiveModel}`);
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ ok: true, backend: activeBackend, model: effectiveModel }));
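The `effectiveModel` fallback above relies on `??`, which substitutes the per-backend default only when no model was chosen (`null` or `undefined`). A short sketch of that resolution step, assuming a made-up defaults table: `EXAMPLE_DEFAULT_MODELS` and `resolveEffectiveModel` are hypothetical, and the real `DEFAULT_MODELS` values are not shown in this diff.

```javascript
// Hypothetical defaults table; the real DEFAULT_MODELS contents are not in the diff.
const EXAMPLE_DEFAULT_MODELS = {
  claude: "example-claude-model",
  codex: "example-codex-model",
};

// Hypothetical helper mirroring `activeModel ?? DEFAULT_MODELS[activeBackend]`.
function resolveEffectiveModel(activeBackend, activeModel, defaults = EXAMPLE_DEFAULT_MODELS) {
  return activeModel ?? defaults[activeBackend];
}

console.log(resolveEffectiveModel("claude", null));           // default for the backend
console.log(resolveEffectiveModel("claude", "custom-model")); // explicit choice wins
console.log(resolveEffectiveModel("gemini", null));           // no entry in this example table
```

Note that `??` does not treat an empty string as missing, so an empty model name would be passed through as-is, and a backend with no entry in the table resolves to `undefined`.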
@@ -166,11 +167,12 @@ export async function startDaemon(options) {
  const backendConfig = loadBackendConfig();
  activeBackend = backendConfig.activeBackend ?? null;
  activeModel = backendConfig.model ?? null;
- // Log startup info
+ // Log startup info + set brain meta for heartbeat
  log(`ZhiHand v${PACKAGE_VERSION} starting...`);
  if (activeBackend) {
  const effectiveModel = activeModel ?? DEFAULT_MODELS[activeBackend];
  log(`[config] Backend: ${activeBackend}, Model: ${effectiveModel}`);
+ setBrainMeta({ backend: activeBackend, model: effectiveModel });
  }
  else {
  log(`[config] No backend configured. Use: zhihand gemini / zhihand claude / zhihand codex`);
package/dist/index.d.ts CHANGED
@@ -1,4 +1,4 @@
  import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
- export declare const PACKAGE_VERSION = "0.20.0";
+ export declare const PACKAGE_VERSION = "0.22.0";
  export declare function createServer(deviceName?: string): McpServer;
  export declare function startStdioServer(deviceName?: string): Promise<void>;
package/dist/index.js CHANGED
@@ -5,7 +5,7 @@ import { controlSchema, screenshotSchema, pairSchema } from "./tools/schemas.js"
  import { executeControl } from "./tools/control.js";
  import { handleScreenshot } from "./tools/screenshot.js";
  import { handlePair } from "./tools/pair.js";
- export const PACKAGE_VERSION = "0.20.0";
+ export const PACKAGE_VERSION = "0.22.0";
  export function createServer(deviceName) {
  const server = new McpServer({
  name: "zhihand",
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@zhihand/mcp",
- "version": "0.20.0",
+ "version": "0.22.0",
  "private": false,
  "type": "module",
  "description": "ZhiHand MCP Server — phone control tools for Claude Code, Codex, Gemini CLI, and OpenClaw",