npm - ucu-mcp - Versions diffs - 0.4.3 → 0.5.0 - Mend

ucu-mcp 0.4.3 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23) hide show

package/CHANGELOG.md +32 -0
package/README.md +28 -3
package/dist/src/mcp/server.js +1 -0
package/dist/src/mcp/tools/element-tools.js +11 -0
package/dist/src/mcp/tools/index.d.ts +1 -1
package/dist/src/mcp/tools/index.js +1 -1
package/dist/src/mcp/tools/keyboard-tools.js +2 -2
package/dist/src/mcp/tools/screen-tools.js +139 -3
package/dist/src/platform/base.d.ts +31 -0
package/dist/src/platform/macos/base.d.ts +3 -1
package/dist/src/platform/macos/base.js +7 -1
package/dist/src/platform/macos/element.d.ts +20 -0
package/dist/src/platform/macos/element.js +255 -0
package/dist/src/platform/macos/screen.js +21 -18
package/dist/src/platform/macos/window.js +22 -0
package/dist/src/safety/guard.js +17 -2
package/dist/src/utils/input.js +24 -16
package/package.json +2 -1
package/skills/ucu-mcp/SKILL.md +100 -0
package/skills/ucu-mcp/agents/openai.yaml +4 -0
package/skills/ucu-mcp/references/tool-reference.md +255 -0
package/skills/ucu-mcp/references/troubleshooting.md +142 -0
package/skills/ucu-mcp/references/workflows.md +146 -0

package/CHANGELOG.md CHANGED Viewed

@@ -5,6 +5,38 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.5.0] - 2026-06-15
+cc-switch 操控诊断与视觉通道修复：打通 OCR + 字母快捷键 + 托盘 + 视觉降级 fallback + 内置 agent skill。
+### Added
+- **方向3 `describe_screen` 工具 + screenshot `describe` 选项（25→26 工具）**：结构化文本屏幕描述（OCR blocks + AX tree + foreground window），是 image content 在中转环境被降级为 URL 时的视觉 fallback。OCR/AX 各自 try/catch，失败聚合到 `errors[]` 不抛出。密码字段（`AXSecureTextField` 或 name 匹配 `/password|secret|token/i`）自动脱敏为 `[REDACTED]`。`screenshot(describe:true)` 在 image block 后追加 text block；独立 `describe_screen` 工具纯文本无 image。新增 `ScreenDescription` 类型（`base.ts`）；`axDepth` 用 `Math.min(depth,10)` 与 ax-tree 一致；`ocrBlocks` 默认 50 cap。
+- **方向4a `click_menu_bar_extra` 托盘支持 + SystemUIServer 遍历**：`findMenuBarExtra` JXA 两阶段——先查 app 自身 menuBarItems（`host:"self"`），为空或仅 Apple 菜单时追加遍历 `SystemUIServer` 进程托管的状态项（`host:"systemuiserver"`），按 description/name 双向 includes 匹配。`clickMenuBarExtra` 按 `host` 在正确进程重定位（SystemUIServer 顺序不稳定，按 name/description 二次匹配）。`MenuBarExtraItem` 加 `host` 字段。纯 LSUIElement 托盘应用现在可达。
+- **Agent Skill 三层**：`skills/ucu-mcp/SKILL.md`（精简入口 + YAML frontmatter）+ `references/{tool-reference,workflows,troubleshooting}.md`（深度文档，按需读取）+ `agents/openai.yaml`（Codex 接口元数据）。随 npm 包发布（`files` 字段加 `skills/`）。可通过 `npx skills add ucu-mcp -g -a codex/claude-code` 安装。README 加 "Agent Skill" 节。
+- `UCU_MCP_INSTRUCTIONS` 加 describe_screen 指引（image 不可见时的 fallback）。
+- `OBSERVE_ACTIONS` 加 `describe_screen`（observe 类，不触发 user-activity pause）。
+- 工具数断言 25→26 同步更新（tools-layer / cli-mcp / client-cli-smoke）。
+- +19 测试：describe_screen（OCR/AX 失败聚合、ocrBlocks cap、密码脱敏 AXSecureTextField/token、ocr=false/includeAx=false）+ screenshot describe=true + SystemUIServer 遍历（source 断言 / host 字段 / click 重定位）。
+### Fixed
+- **方向1 OCR 路径修复（两 bug 叠加）**：`ocrNative` candidates 加 4 级 `../../../../native/ocr/ocr-helper`（npm prod 下 screen.js 在 4 级深，原 3 级候选全 MISSING）；`ocrJxa` 重写绕开 CGImage 脆弱路径（`VNImageRequestHandler.initWithURLOptions` + `Ref()` + `ObjC.unwrap` + 像素维度 `pixelsWide/pixelsHigh`），消除 `NSNull unrecognized selector` 崩溃。删 `ocrJxa` 死代码 `scaleFactorX`（`buf.readUInt32BE(16)` 在权限缺失时抛 RangeError 绕过 hint）。`window.ts` `resolveNativeHelper` 同源 4 级路径修复。
+- **方向2 press_key 字母/数字键**：`typeText` 的 letterMap/digitMap 提升为模块级 `MAC_LETTER_KEY_CODES`/`MAC_DIGIT_KEY_CODES`；`pressKey` falsy-safe 三元回退（先特殊键 → 字母 `in` 判定 → 数字；用 `in` 防 `'a'`=0 穿透）。`keyboard-tools.ts` zod describe 更新（支持 a-z / 0-9）。
+- **SafetyGuard blocklist 修正（方向2 引入的回归）**：字母 q 可解析后 `cmd+shift+q`（注销）绕过 blocklist（只有 cmd+q）。blocklist 加 `cmd+shift+q` + `cmd+option+q`；`normalizeShortcut` 加修饰键别名归一化（alt→option / ctrl→control / cmd→command）根治 `cmd+alt+esc` 绕过 `cmd+option+esc`。
+- `findMenuBarExtra` JXA 进程名大小写/空格/连字符/下划线容差（`_norm`）；`matchMenuBarExtra` 无 selector 时过滤 Apple 菜单；`clickMenuBarExtra` null deref 防护。
+### Changed
+- 工具数 24→26（+`describe_screen`、+`click_menu_bar_extra`）。README/index.ts 注释同步。
+- `focus_app` 托盘回退建立 `windowId:"tray"` 的 activeTarget（pid:0），`validateActiveTarget` 对 tray 直接 return（不查 listWindows）。
+## [0.4.3] - 2026-06-14
+### Fixed
+- `registerTool` 回调漏调 `registry.register(name)`，导致启动日志 `Registered tools count:0`。一行修复（行为无变化，24 工具始终正常注册到 MCP server）。
 ## [0.4.2] - 2026-06-13
 ### Security

package/README.md CHANGED Viewed

@@ -72,16 +72,17 @@ If you installed globally, you can use the shorter form:
 ## Tool List
-UCU-MCP provides 22 tools across five categories:
+UCU-MCP provides 26 tools across five categories:
 ### Screen & Window
 | Tool | Description | Key Parameters |
 |------|-------------|----------------|
-| `screenshot` | Capture screen, window, or region as PNG/JPEG image content | `display?`, `windowId?`, `region?`, `maxWidth?`, `format?` |
+| `screenshot` | Capture screen, window, or region as PNG/JPEG image content; `describe=true` appends a structured text description (OCR + AX) for vision-degraded environments | `display?`, `windowId?`, `region?`, `maxWidth?`, `format?`, `describe?`, `describeOptions?` |
+| `describe_screen` | Structured text description of the screen (OCR blocks + AX tree + foreground window) — the vision-degraded fallback when image content is not visible to the model. Password fields are masked. | `display?`, `ocr?`, `includeAx?`, `axDepth?`, `ocrBlocks?`, `windowId?` |
 | `list_windows` | List all on-screen windows with IDs, titles, bounds | `includeMinimized?` |
 | `list_apps` | List visible macOS apps with pid, frontmost state, and window count | — |
-| `focus_app` | Select an app/window target context for later AX tools; returns `targetId`, `appName`, `pid`, `windowId`, `title`, and `capturedAt` | `app` |
+| `focus_app` | Select an app/window target context for later AX tools; returns `targetId`, `appName`, `pid`, `windowId`, `title`, and `capturedAt`. Falls back to a tray target for menu-bar-only apps. | `app` |
 | `get_window_state` | Get accessibility tree of a window, or the prior focus_app target when windowId is omitted | `windowId?`, `depth?`, `includeBounds?` |
 | `get_screen_size` | Get screen dimensions | `display?` |
 | `ocr` | Perform OCR on screen or region; returns text with bounding boxes and confidence | `display?`, `region?` |
@@ -112,6 +113,7 @@ UCU-MCP provides 22 tools across five categories:
 | `click_element` | Click an AX element by its id (from find_element), using the current focus_app target when app is omitted; refetches equivalent elements after UI updates | `elementId`, `app?`, `captureAfter?` |
 | `set_value` | Set an AX element's value directly without focusing it, using the current focus_app target when app is omitted | `elementId`, `value`, `app?`, `captureAfter?` |
 | `type_in_element` | Type text into a specific AX text field element; may focus the element and refetches equivalent elements after UI updates | `elementId`, `text`, `app?`, `clearFirst?`, `captureAfter?` |
+| `click_menu_bar_extra` | Click a menu-bar status item (tray icon) — for menu-bar-only apps (e.g. cc-switch) that focus_app cannot target. Finds items in the app's own menu bar or hosted by SystemUIServer. | `app`, `description?`, `name?`, `index?`, `captureAfter?` |
 ### Runtime & Synchronization
@@ -120,6 +122,8 @@ UCU-MCP provides 22 tools across five categories:
 | `doctor` | Check platform readiness, permissions, lock-screen state, and client integration hints | — |
 | `wait` | Wait for UI state to settle after launches, animations, or navigation | `ms` |
 | `wait_for_element` | Poll the AX tree until a matching element appears | `text?`, `role?`, `app?`, `timeout?`, `timeoutMs?`, `interval?`, `intervalMs?` |
+| `clipboard_read` | Read the current contents of the system clipboard | — |
+| `clipboard_write` | Write text to the system clipboard (text-injection patterns are blocked) | `text`, `captureAfter?` |
 Action tools accept `captureAfter`, `captureMaxWidth`, and `captureFormat` so an agent can receive a post-action screenshot as a second MCP image content item in the same response instead of spending another round trip on `screenshot`. When `captureAfter` is requested and the action succeeds, the tool returns an `ActionReceipt` (see the Action Receipt section below) with `capture.status: "ok"`. If post-action capture fails, the receipt has `status: "partial"` and `capture.status: "error"` with the error details. If `captureAfter` is omitted, `capture.status` is `"skipped"`.
@@ -415,6 +419,27 @@ ucu-mcp doctor
 The same readiness report is also available as the MCP `doctor` tool.
+## Agent Skill
+UCU-MCP ships an installable **agent skill** that gives Codex, Claude Code, and
+other agent runtimes richer guidance than the embedded MCP `instructions:` field
+alone — structured tool-selection rules, task playbooks, and an error-recovery
+reference. The skill lives at [`skills/ucu-mcp/`](skills/ucu-mcp/SKILL.md) and
+is included in the npm tarball.
+Install it for your agent runtime with the [`skills` CLI](https://www.npmjs.com/package/skills):
+```bash
+# Codex
+npx skills add ucu-mcp -g -a codex --skill ucu-mcp -y
+# Claude Code
+npx skills add ucu-mcp -g -a claude-code --skill ucu-mcp -y
+```
+Or reference it directly: the entry point is
+[`skills/ucu-mcp/SKILL.md`](skills/ucu-mcp/SKILL.md), with deeper content in
+`skills/ucu-mcp/references/` (tool reference, workflows, troubleshooting).
 ## Safety
 ### Built-in safety rules

package/dist/src/mcp/server.js CHANGED Viewed

@@ -12,6 +12,7 @@ Pick the right tool sequence for the task:
 • Fill a form field → find_element (text/role) + type_in_element or set_value. Prefer AX over coordinates.
 • Click a menu bar item → get_screen_size + click with coordinates (menu bar is not in the AX tree).
 • Read what's on screen → screenshot; for text not in AX use ocr; for a structured tree use get_window_state.
+• When image content is not visible (relayed/downgraded to URL) → screenshot with describe=true, or describe_screen for a text-only structured view (OCR + AX tree + foreground window).
 • Switch between apps → list_apps, then focus_app; subsequent tools use the active target context.
 • Verify an action succeeded → captureAfter=true on action tools, or call screenshot afterwards.
 • Wait for UI to change → wait_for_element (until: "appear" default; also "disappear" or "value_change").

package/dist/src/mcp/tools/element-tools.js CHANGED Viewed

@@ -56,4 +56,15 @@ export function registerElementTools(registerTool) {
         await withSafety({ action: "type_in_element", params: { text: params.text, ...safetyCtx }, requiresAccessibility: true, execute: () => getPlatform().typeInElement(params.elementId, params.text, effectiveApp, params.clearFirst) });
         return actionResponse("type_in_element", { typed: true, elementId: params.elementId, charCount: params.text.length }, { elementId: params.elementId, app: effectiveApp }, params.captureAfter, params.captureFormat, params.captureMaxWidth);
     });
+    registerTool("click_menu_bar_extra", "Click a menu bar status item (tray icon) — for menu-bar-only apps (e.g. cc-switch) that focus_app cannot target. After clicking, the menu opens; use find_element to locate menu items, or screenshot + ocr if the menu's AX tree is opaque.", {
+        app: z.string().describe("Target app name"),
+        description: z.string().optional().describe("Match menu bar item by description/name substring"),
+        name: z.string().optional().describe("Match menu bar item by name/description substring"),
+        index: z.number().int().nonnegative().optional().describe("0-based index among matched items (default 0)"),
+        ...captureAfterFields,
+    }, async (params) => {
+        const safetyCtx = await getSafetyContext();
+        await withSafety({ action: "click_menu_bar_extra", params: { ...safetyCtx }, requiresAccessibility: true, execute: () => getPlatform().clickMenuBarExtra(params.app, { description: params.description, name: params.name, index: params.index }) });
+        return actionResponse("click_menu_bar_extra", { clicked: true, app: params.app }, { app: params.app }, params.captureAfter, params.captureFormat, params.captureMaxWidth);
+    });
 }

package/dist/src/mcp/tools/index.d.ts CHANGED Viewed

@@ -1,7 +1,7 @@
 /**
  * Tool registry for UCU-MCP.
  *
- * Registers 24 MCP tools on the server and dispatches each call through
+ * Registers 26 MCP tools on the server and dispatches each call through
  * a shared safety/permission/retry pipeline (`withSafety`).
  */
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

package/dist/src/mcp/tools/index.js CHANGED Viewed

@@ -1,7 +1,7 @@
 /**
  * Tool registry for UCU-MCP.
  *
- * Registers 24 MCP tools on the server and dispatches each call through
+ * Registers 26 MCP tools on the server and dispatches each call through
  * a shared safety/permission/retry pipeline (`withSafety`).
  */
 import { createLogger } from "../../util/logger.js";

package/dist/src/mcp/tools/keyboard-tools.js CHANGED Viewed

@@ -14,8 +14,8 @@ export function registerKeyboardTools(registerTool) {
         return actionResponse("type_text", { typed: true, charCount: params.text.length }, {}, params.captureAfter, params.captureFormat, params.captureMaxWidth);
     });
     registerTool("press_key", "Press a keyboard shortcut", {
-        keys: z.array(z.string()).optional().describe("Keys to press simultaneously"),
-        key: z.string().optional().describe("Single key to press (alias for keys)"),
+        keys: z.array(z.string()).optional().describe("Keys to press simultaneously — special keys (enter/escape/tab/f1-f12/arrows...), single letters a-z, or single digits 0-9"),
+        key: z.string().optional().describe("Single key (alias for keys) — special keys, single letter a-z, or single digit 0-9"),
         modifiers: z.array(z.string()).optional().describe("Modifier keys used with key, such as cmd, shift, alt, or ctrl"),
         windowId: z.string().optional().describe("UNSUPPORTED: windowId-targeted key events are not implemented"),
         ...captureAfterFields,

package/dist/src/mcp/tools/screen-tools.js CHANGED Viewed

@@ -1,14 +1,111 @@
 import { z } from "zod";
 import { checkPermission } from "../../safety/permissions.js";
 import { UnsupportedParameterError } from "../../util/errors.js";
-import { getPlatform, getActiveTarget, withSafety, } from "./helpers.js";
+import { getPlatform, getActiveTarget, withSafety, jsonText, } from "./helpers.js";
+/** Sensitive-field detection regex for AX tree masking (password fields, secrets). */
+const SENSITIVE_NAME_RE = /password|passwd|secret|pincode|pin\b|token|credential/i;
+const SENSITIVE_ROLES = new Set(["AXSecureTextField", "AXPasswordField"]);
+/**
+ * Recursively mask values of password / secret fields in an AX subtree.
+ * Mutates in place. Covers AXSecureTextField role and heuristic name matches.
+ */
+function maskSensitiveFields(el) {
+    if (!el)
+        return;
+    const isSensitive = SENSITIVE_ROLES.has(el.role) || SENSITIVE_NAME_RE.test(el.name || "") || SENSITIVE_NAME_RE.test(el.description || "");
+    if (isSensitive && el.value)
+        el.value = "[REDACTED]";
+    if (el.children)
+        for (const child of el.children)
+            maskSensitiveFields(child);
+}
+/**
+ * Build a structured ScreenDescription — each source (screen / foreground / OCR / AX)
+ * is collected independently inside try/catch. Failures go to `errors` and set the
+ * corresponding status; the function never throws. This is the text fallback for
+ * environments where image content blocks are downgraded to URLs.
+ */
+async function buildScreenDescription(opts) {
+    const platform = getPlatform();
+    const errors = [];
+    const capturedAt = new Date().toISOString();
+    // screen — sync, almost never fails
+    let screen;
+    try {
+        screen = platform.getScreenSize(opts.display);
+    }
+    catch (e) {
+        screen = { width: 0, height: 0, scaleFactor: 1, estimated: true };
+        errors.push({ source: "screen", message: `getScreenSize failed: ${e.message}` });
+    }
+    // foreground window — listApps() isFrontmost → listWindows() filter by processName + isOnScreen
+    let foregroundWindow;
+    try {
+        if (platform.listApps) {
+            const apps = await platform.listApps();
+            const front = apps.find((a) => a.isFrontmost);
+            if (front) {
+                const wins = await platform.listWindows(true);
+                foregroundWindow = wins.find((w) => w.processName === front.name && w.isOnScreen);
+            }
+        }
+    }
+    catch (e) {
+        errors.push({ source: "foreground", message: `foreground window resolution failed: ${e.message}` });
+    }
+    // OCR — cap blocks to ocrBlocks via slice
+    let ocr;
+    if (opts.runOcr) {
+        try {
+            const result = await platform.ocr(opts.display);
+            ocr = {
+                blocks: result.elements.slice(0, opts.ocrBlocks),
+                fullText: result.fullText,
+                status: "ok",
+            };
+        }
+        catch (e) {
+            ocr = { blocks: [], fullText: "", status: "failed" };
+            errors.push({ source: "ocr", message: `ocr failed: ${e.message}` });
+        }
+    }
+    else {
+        ocr = { blocks: [], fullText: "", status: "skipped" };
+    }
+    // AX — getWindowState with depth cap, password masking applied
+    let ax;
+    if (opts.includeAx) {
+        const effectiveWindowId = opts.windowId ?? getActiveTarget()?.windowId;
+        try {
+            const cappedDepth = Math.min(opts.axDepth, 10);
+            const state = await platform.getWindowState(effectiveWindowId, cappedDepth, true);
+            maskSensitiveFields(state.tree);
+            maskSensitiveFields(state.focusedElement);
+            ax = { elements: state.tree, status: "ok", windowId: effectiveWindowId };
+        }
+        catch (e) {
+            ax = { status: "failed", windowId: effectiveWindowId };
+            errors.push({ source: "ax", message: `getWindowState failed: ${e.message}` });
+        }
+    }
+    else {
+        ax = { status: "skipped" };
+    }
+    return { capturedAt, screen, foregroundWindow, ocr, ax, errors };
+}
 export function registerScreenTools(registerTool) {
-    registerTool("screenshot", "Capture a screenshot of the entire screen or a region", {
+    registerTool("screenshot", "Capture a screenshot of the entire screen or a region. Set describe=true to also append a structured text description (OCR + AX tree) — useful when image content blocks may not be visible to the model.", {
         display: z.number().optional().describe("Display index (default 0)"),
         windowId: z.string().optional().describe("Window ID from list_windows; when set, captures that window"),
         region: z.object({ x: z.number(), y: z.number(), width: z.number(), height: z.number() }).optional().describe("Region to capture"),
         format: z.enum(["png", "jpeg"]).default("png").describe("Image format"),
         maxWidth: z.number().default(1280).describe("Maximum output width in pixels. Aspect ratio is preserved."),
+        describe: z.boolean().default(false).describe("When true, append a text content block with a structured screen description (OCR + AX tree) after the image"),
+        describeOptions: z.object({
+            axDepth: z.number().int().positive().default(3).describe("AX tree depth (capped at 10)"),
+            ocrBlocks: z.number().int().positive().default(50).describe("Max OCR blocks to include"),
+            includeAx: z.boolean().default(true).describe("Include the AX tree in the description"),
+        }).optional().describe("Options for the appended description (only used when describe=true)"),
     }, async (params) => {
         if (params.windowId && params.region)
             throw new UnsupportedParameterError("screenshot windowId cannot be combined with region");
@@ -23,7 +120,20 @@ export function registerScreenTools(registerTool) {
                     : Promise.reject(new UnsupportedParameterError("window screenshots are not implemented on this platform"))
                 : getPlatform().screenshot(params.display, params.region, options),
         });
-        return { content: [{ type: "image", data: buf.toString("base64"), mimeType: `image/${params.format}` }] };
+        const content = [{ type: "image", data: buf.toString("base64"), mimeType: `image/${params.format}` }];
+        if (params.describe) {
+            const opts = params.describeOptions ?? { axDepth: 3, ocrBlocks: 50, includeAx: true };
+            const desc = await buildScreenDescription({
+                display: params.display,
+                runOcr: true,
+                includeAx: opts.includeAx ?? true,
+                axDepth: opts.axDepth ?? 3,
+                ocrBlocks: opts.ocrBlocks ?? 50,
+                windowId: params.windowId,
+            });
+            content.push(jsonText(desc));
+        }
+        return { content };
     });
     registerTool("list_windows", "List all visible windows on screen", {
         includeMinimized: z.boolean().optional().describe("Include minimized windows"),
@@ -66,4 +176,30 @@ export function registerScreenTools(registerTool) {
         const result = await withSafety({ action: "ocr", params: {}, requiresScreenRecording: true, execute: () => getPlatform().ocr(params.display, params.region) });
         return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
     });
+    registerTool("describe_screen", "Get a structured text description of the screen (OCR blocks + AX tree + foreground window). Use this when image content blocks are not visible to the model (e.g. relayed/downgraded to URLs) or when you need a machine-readable screen layout. With ocr=false it does not require Screen Recording. Sensitive fields (passwords) are masked.", {
+        display: z.number().optional().describe("Display index (default 0)"),
+        ocr: z.boolean().default(true).describe("Run OCR and include text blocks (requires Screen Recording)"),
+        includeAx: z.boolean().default(true).describe("Include the AX tree of the foreground/active window"),
+        axDepth: z.number().int().positive().default(3).describe("AX tree depth (capped at 10)"),
+        ocrBlocks: z.number().int().positive().default(50).describe("Max OCR blocks to include"),
+        windowId: z.string().optional().describe("Window ID for AX traversal (defaults to active target's window)"),
+    }, async (params) => {
+        const desc = await withSafety({
+            action: "describe_screen",
+            params,
+            requiresScreenRecording: params.ocr,
+            requiresAccessibility: params.includeAx,
+            // describe_screen never throws from inner source failures (they go to errors[]),
+            // but withSafety still needs to enforce rate-limit / lock / dry-run semantics around it.
+            execute: () => buildScreenDescription({
+                display: params.display,
+                runOcr: params.ocr,
+                includeAx: params.includeAx,
+                axDepth: params.axDepth,
+                ocrBlocks: params.ocrBlocks,
+                windowId: params.windowId,
+            }),
+        });
+        return { content: [jsonText(desc)] };
+    });
 }

package/dist/src/platform/base.d.ts CHANGED Viewed

@@ -126,6 +126,31 @@ export interface WindowState {
     focusedElement?: ElementInfo;
     tree?: ElementInfo;
 }
+/**
+ * Structured text description of the screen — a fallback for environments where
+ * image content blocks are downgraded to URLs (so the model cannot see screenshots).
+ * Each source (OCR / AX / foreground) is collected independently; failures are
+ * aggregated in `errors` rather than thrown.
+ */
+export interface ScreenDescription {
+    capturedAt: string;
+    screen: ScreenSize;
+    foregroundWindow?: WindowInfo;
+    ocr: {
+        blocks: OcrElement[];
+        fullText: string;
+        status: "ok" | "skipped" | "failed";
+    };
+    ax: {
+        elements?: ElementInfo;
+        status: "ok" | "skipped" | "failed";
+        windowId?: string;
+    };
+    errors: Array<{
+        source: "ocr" | "ax" | "foreground" | "screen";
+        message: string;
+    }>;
+}
 export interface Platform {
     screenshot(display?: number, region?: ScreenRegion, options?: ScreenshotOptions): Promise<Buffer>;
     screenshotWindow?(windowId: string, options?: ScreenshotOptions): Promise<Buffer>;
@@ -147,6 +172,12 @@ export interface Platform {
     clickElement(elementId: string, app?: string): Promise<void>;
     typeInElement(elementId: string, text: string, app?: string, clearFirst?: boolean): Promise<void>;
     setElementValue?(elementId: string, value: string, app?: string): Promise<void>;
+    findMenuBarExtra?(app: string): Promise<unknown[]>;
+    clickMenuBarExtra?(app: string, selector?: {
+        description?: string;
+        name?: string;
+        index?: number;
+    }): Promise<void>;
     isScreenLocked?(): boolean;
     saveFocus?(): Promise<void>;
     restoreFocus?(): Promise<void>;

package/dist/src/platform/macos/base.d.ts CHANGED Viewed

@@ -5,7 +5,7 @@ import { screenshot, screenshotWindow, getScreenSize, isScreenLocked, ocr } from
 import { listApps, focusApp, getActiveBrowserContext, listWindows } from "./window.js";
 import { getWindowState, findElement } from "./ax-tree.js";
 import { click, move, drag, scroll, getCursorPosition, type as typeMethod, key } from "./input.js";
-import { clickElement, typeInElement, setElementValue } from "./element.js";
+import { clickElement, typeInElement, setElementValue, findMenuBarExtra, clickMenuBarExtra } from "./element.js";
 import { readClipboard, writeClipboard } from "./clipboard.js";
 export type { MacOSPlatformOptions } from "./helpers.js";
 export declare class MacOSPlatform implements Platform {
@@ -52,6 +52,8 @@ export declare class MacOSPlatform implements Platform {
     clickElement: typeof clickElement;
     typeInElement: typeof typeInElement;
     setElementValue: typeof setElementValue;
+    findMenuBarExtra: typeof findMenuBarExtra;
+    clickMenuBarExtra: typeof clickMenuBarExtra;
     readClipboard: typeof readClipboard;
     writeClipboard: typeof writeClipboard;
 }

package/dist/src/platform/macos/base.js CHANGED Viewed

@@ -4,7 +4,7 @@ import { screenshot, screenshotWindow, getScreenSize, isScreenLocked, ocr } from
 import { listApps, focusApp, getActiveBrowserContext, listWindows } from "./window.js";
 import { getWindowState, findElement } from "./ax-tree.js";
 import { click, move, drag, scroll, getCursorPosition, type as typeMethod, key } from "./input.js";
-import { clickElement, typeInElement, setElementValue } from "./element.js";
+import { clickElement, typeInElement, setElementValue, findMenuBarExtra, clickMenuBarExtra } from "./element.js";
 import { readClipboard, writeClipboard } from "./clipboard.js";
 export class MacOSPlatform {
     _nativeHelperPaths;
@@ -53,6 +53,10 @@ export class MacOSPlatform {
     async validateActiveTarget() {
         if (!this.activeTarget?.windowId)
             return;
+        // 托盘 target（focus_app 找不到窗口时回退到 status item）没有真实窗口，
+        // 恒有效直到模型显式 focus_app 其他应用——不查 listWindows（查了必然失配）。
+        if (this.activeTarget.windowId === "tray")
+            return;
         this.windowCache = undefined;
         const windows = await this.listWindows(true);
         const match = windows.find(w => w.id === this.activeTarget.windowId);
@@ -87,6 +91,8 @@ export class MacOSPlatform {
     clickElement = clickElement;
     typeInElement = typeInElement;
     setElementValue = setElementValue;
+    findMenuBarExtra = findMenuBarExtra;
+    clickMenuBarExtra = clickMenuBarExtra;
     readClipboard = readClipboard;
     writeClipboard = writeClipboard;
 }

package/dist/src/platform/macos/element.d.ts CHANGED Viewed

@@ -2,3 +2,23 @@ import type { MacOSPlatform } from "./base.js";
 export declare function clickElement(this: MacOSPlatform, elementId: string, app?: string): Promise<void>;
 export declare function typeInElement(this: MacOSPlatform, elementId: string, text: string, app?: string, clearFirst?: boolean): Promise<void>;
 export declare function setElementValue(this: MacOSPlatform, elementId: string, value: string, app?: string): Promise<void>;
+export interface MenuBarExtraItem {
+    menuBar: number;
+    index: number;
+    name: string;
+    description: string;
+    x: number;
+    y: number;
+    width: number;
+    height: number;
+    /** Which process hosts this status item. "self" = app's own menu bar; "systemuiserver" = third-party tray hosted by SystemUIServer. */
+    host: "self" | "systemuiserver";
+}
+export interface MenuBarExtraSelector {
+    description?: string;
+    name?: string;
+    index?: number;
+}
+export declare function findMenuBarExtra(this: MacOSPlatform, app: string): Promise<MenuBarExtraItem[]>;
+export declare function matchMenuBarExtra(items: MenuBarExtraItem[], selector: MenuBarExtraSelector): MenuBarExtraItem | undefined;
+export declare function clickMenuBarExtra(this: MacOSPlatform, app: string, selector?: MenuBarExtraSelector): Promise<void>;