@vortex-os/computer-use 0.7.1 → 0.7.2

package/README.md CHANGED
@@ -1,19 +1,20 @@
1
1
  # @vortex-os/computer-use
2
2
 
3
- <!-- docs-version: 0.7.1 -->
3
+ <!-- docs-version: 0.7.2 -->
4
4
 
5
5
  Read-only **screen perception** for VortEX agents, exposed as an MCP server. It lets an agent *see* what is on screen — read a window's structure, capture a region as an image, and watch for on-screen changes — without ever moving the mouse or typing. It layers on `@vortex-os/base` but also works standalone.
6
6
 
7
- > **Status: 0.5.0, Windows-first, read-only.** Mouse/keyboard **control is intentionally out of scope** for this release — this package only *perceives*. macOS/Linux backends are not yet implemented.
7
+ > **Status: Windows-first, read-only.** Mouse/keyboard **control is intentionally out of scope** — this package only *perceives* (and speaks). macOS/Linux backends are not yet implemented.
8
8
 
9
9
  ## What it is
10
10
 
11
- An MCP (Model Context Protocol) server that exposes nine perception tools over stdio:
11
+ An MCP (Model Context Protocol) server that exposes eleven tools over stdio (nine perception, plus `beep`/`speak` for output):
12
12
 
13
13
  | Tool | What it does | Cost |
14
14
  |---|---|---|
15
15
  | `probe` | Reports whether this environment can perceive the screen (displays, DPI, capture latency). Never captures real screen content. | ~0 |
16
16
  | `read_ui` | Reads the active/target window as a **structured accessibility tree** (UI Automation): element roles, coordinates, text. No image. | ~0 image tokens |
17
+ | `classify_activity` | Classifies the on-screen activity (game / dev / media / browsing / productivity) so a companion can branch its help. | metadata |
17
18
  | `capture_screen` | Pixel capture (PNG) for what structure can't reach — canvases, games, remote desktops. Target by window, region, monitor, or cursor box. | image |
18
19
  | `watch_capture` | Captures N frames at an interval in one process; with `changeOnly`, keeps only changed frames. | image(s) |
19
20
  | `poll_change` | One non-blocking "did it change?" probe; returns a change percentage and (optionally) an image. Poll it on an interval to watch without blocking. | metadata, image optional |
@@ -21,6 +22,7 @@ An MCP (Model Context Protocol) server that exposes nine perception tools over s
21
22
  | `get_events` | Collect the buffered changes a `start_watch` has accumulated — batched (a few looks for a long watch); each event carries the settled frame. | metadata + image(s) |
22
23
  | `stop_watch` | Stop a background watch and discard its buffer. | — |
23
24
  | `beep` | A system beep, to get the user's attention while they look elsewhere. | — |
25
+ | `speak` | Speaks a short utterance locally (built-in Windows voice, or the optional Supertonic neural voice). | — |
24
26
 
25
27
  The design favors **structure first, pixels as fallback**: `read_ui` is cheap and precise for ordinary apps; `capture_screen` is for content that has no accessibility tree (games, custom canvases).
26
28
 
@@ -71,13 +73,13 @@ Once the models are present, the speak path uses them automatically (`engine: "a
71
73
 
72
74
  **Audio ducking.** While the companion speaks, other apps' audio (game / music / video) is briefly lowered per-app and **restored exactly** when it finishes, so the voice stands out. On by default. *DRM-protected audio (e.g. Netflix) cannot be ducked* — that protected path bypasses Windows volume control; normal app/game audio ducks fine.
73
75
 
74
- Configure in `computer-use.config.json` (`tts` section) or via env (env wins). Defaults shown:
76
+ Configure in your instance-root `computer-use.config.json` (`tts` section — see *Privacy & redaction* for placement) or via env (env wins). Defaults shown:
75
77
 
76
78
  ```json
77
79
  { "tts": { "engine": "auto", "voice": "F1", "speed": 1.05, "duck": true, "duckFactor": 0.3 } }
78
80
  ```
79
81
 
80
- `engine` `auto|supertonic|heami` · `voice` `F1..F5/M1..M5` · `speed` rate multiplier (~1.0 = normal, higher = faster; clamped 0.5..2.0, applied to both the neural and built-in voices) · `duckFactor` `0..1` (others drop to this fraction; lower = quieter). Env: `VORTEX_CU_TTS_ENGINE` / `VORTEX_CU_TTS_VOICE` / `VORTEX_CU_TTS_SPEED` / `VORTEX_CU_DUCK=off` / `VORTEX_CU_DUCK_FACTOR`. Restart the server after changing.
82
+ `engine` `auto|supertonic|heami` · `voice` `F1..F5/M1..M5` (Supertonic only; the built-in Windows voice picks by system language) · `speed` rate multiplier (~1.0 = normal, higher = faster; clamped 0.5..2.0, applied to both the neural and built-in voices) · `duckFactor` `0..1` (clamped; other apps drop to this fraction, lower = quieter). Env: `VORTEX_CU_TTS_ENGINE` / `VORTEX_CU_TTS_VOICE` / `VORTEX_CU_TTS_SPEED` / `VORTEX_CU_DUCK=off` / `VORTEX_CU_DUCK_FACTOR`. Restart the server after changing.
81
83
 
82
84
  ### Optional: a local vision model (the `vision` trigger)
83
85
 
@@ -118,14 +120,14 @@ The profiles branch the behavior (full design in [`docs/adaptive-companion.md`](
118
120
 
119
121
  For a `GAME`, `needsChangeRate` tells the agent to take a couple of `poll_change` reads to split **fast-action** (too fast to coach — break-gated only) from **strategy** (coachable, periodic). Honesty is built in: it never pretends to coach a game it can't follow, and it won't talk over media. The **interruptibility state** gates every utterance, on top of the global speech budget. Explicit user requests ("tell me when X happens", "be quieter") layer on as reflex triggers / cadence overrides.
120
122
 
121
- Tune it in `computer-use.config.json` (`companion` section): `uiaCanvasMax` (the canvas cutoff) and per-class `profiles` (e.g. `GAME.cadenceSec: 20` for chattier coaching). Env: `VORTEX_CU_UIA_CANVAS_MAX`.
123
+ Tune it in your instance-root `computer-use.config.json` (`companion` section): `uiaCanvasMax` (the canvas cutoff) and per-class `profiles` (e.g. `GAME.cadenceSec: 20` for chattier coaching). Env: `VORTEX_CU_UIA_CANVAS_MAX`.
122
124
 
123
125
  ## What it is NOT
124
126
 
125
127
  - **Not control.** No clicking, typing, or app automation. Perception only.
126
128
  - **Not real-time for *judgment*.** Reflex `triggers` deliver a sub-second beep / fixed phrase / OCR readout, but anything the agent has to *think* about (a judged message) is seconds-scale — it makes a cloud call. Good for alerts, translation, and watching-alongside; not for reflex-speed decisions.
127
129
  - **Not comprehensive secret protection.** See *Privacy & redaction* below — the denylist is the real control; field-level masking is best-effort and does not catch plaintext secrets sitting in arbitrary windows.
128
- - **Not cross-platform yet.** Windows only in 0.3.0.
130
+ - **Not cross-platform yet.** Windows only (for now).
129
131
 
130
132
  ## Install
131
133
 
@@ -159,9 +161,9 @@ Whatever you point this at is sent to your AI model. Two controls reduce acciden
159
161
  1. **Denylist (the primary control).** List window titles or process names that must never be captured. If a listed window is visible anywhere inside a capture region, the whole capture is refused (`{ "redacted": true }` — no image, no text). This is the reliable defense against accidentally capturing a password manager or banking window during a watch.
160
162
  2. **Password-field masking.** In `read_ui`, fields the OS reports as password inputs are dropped (no value, no text, children not traversed).
161
163
 
162
- Copy `computer-use.config.example.json` to `computer-use.config.json` (next to the server) to configure the denylist, or set `VORTEX_CU_DENY_TITLES` / `VORTEX_CU_DENY_PROCS` (JSON arrays). **The denylist is read once at startup — restart the server after changing it.**
164
+ Copy `computer-use.config.example.json` to **`computer-use.config.json` in your instance root** (the folder you launch the agent from — i.e. next to `.mcp.json`), or point `VORTEX_CU_CONFIG` at an explicit path, to configure the denylist; or set `VORTEX_CU_DENY_TITLES` / `VORTEX_CU_DENY_PROCS` (JSON arrays). Do **not** put it inside `node_modules` — that is wiped on every reinstall. **The denylist is read once at startup — restart the server after changing it.**
163
165
 
164
- **Honest limits.** This is *not* comprehensive secret-scanning. A plaintext token shown in a text editor or terminal (not a password field, not a denylisted window) will still be captured. Pixel-level password masking is intentionally out of scope for 0.1.0. Capture images are volatile — held only long enough to send, then deleted; they are never written to disk persistently.
166
+ **Honest limits.** This is *not* comprehensive secret-scanning. A plaintext token shown in a text editor or terminal (not a password field, not a denylisted window) will still be captured. Pixel-level password masking is intentionally out of scope. Capture images are volatile — held only long enough to send, then deleted; they are never written to disk persistently.
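A minimal sketch of the fail-closed denylist behavior described above, written in JavaScript purely for illustration. The shipped check runs in the PowerShell backend (`Test-AxDenylist`), so the names `matchesAny` and `captureAllowed`, the `titles`/`procs` fields, the window objects, and the `{ redacted: false }` result are all hypothetical. Case-insensitive substring matching is assumed from the example config's `_comment`.

```javascript
// Hypothetical helper: true when `value` contains any non-empty needle, ignoring case.
function matchesAny(value, needles) {
  const haystack = String(value ?? '').toLowerCase();
  return needles.some((n) => n !== '' && haystack.includes(String(n).toLowerCase()));
}

// Hypothetical check: one denylisted window inside the region refuses the whole capture.
function captureAllowed(visibleWindows, deny) {
  const blocked = visibleWindows.some(
    (w) => matchesAny(w.title, deny.titles) || matchesAny(w.process, deny.procs)
  );
  return blocked ? { redacted: true } : { redacted: false };
}

const visible = [
  { title: 'notes.txt - Editor', process: 'editor.exe' },
  { title: 'KeePass - vault.kdbx', process: 'KeePass.exe' },
];
const refused = captureAllowed(visible, { titles: ['keepass'], procs: [] });
const allowed = captureAllowed(visible, { titles: [], procs: [] });
console.log(refused, allowed);
```

With an empty denylist nothing is blocked, which is why a config file that silently fails to load leaves captures unprotected.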
165
167
 
166
168
  ### Audit
167
169
 
@@ -1,5 +1,5 @@
1
1
  {
2
- "_comment": "Copy to computer-use.config.json to enable redaction. Empty by default — nothing is blocked until you add entries. The denylist is the primary control: any listed window/process that appears inside a capture region makes the whole capture fail-closed (no image, no structured text). Matching is case-insensitive substring. This is NOT comprehensive secret-scanning: plaintext secrets visible in non-listed windows (editors, terminals) are still captured. Env overrides: VORTEX_CU_DENY_TITLES / VORTEX_CU_DENY_PROCS (JSON arrays).",
2
+ "_comment": "Copy to computer-use.config.json IN YOUR INSTANCE ROOT (the folder you launch the agent from, next to .mcp.json), or point VORTEX_CU_CONFIG at a path. Do NOT leave it inside node_modules — that copy is wiped on reinstall and is no longer the primary lookup. Empty by default — nothing is blocked until you add entries. The denylist is the primary control: any listed window/process that appears inside a capture region makes the whole capture fail-closed (no image, no structured text). Matching is case-insensitive substring. This is NOT comprehensive secret-scanning: plaintext secrets visible in non-listed windows (editors, terminals) are still captured. Env overrides: VORTEX_CU_DENY_TITLES / VORTEX_CU_DENY_PROCS (JSON arrays).",
3
3
  "_restart": "The denylist is read once at server start; RESTART the MCP server (restart the agent / reload its MCP servers) after changing this file or the env vars for the change to take effect.",
4
4
  "redaction": {
5
5
  "denyWindowTitles": [],
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@vortex-os/computer-use",
3
- "version": "0.7.1",
3
+ "version": "0.7.2",
4
4
  "description": "Add-on — read-only screen perception (structured UIA tree + pixel fallback + noise-filtered background watch with an event buffer + sub-second reflex alerts: beep / fixed-phrase / OCR or optional local-VLM description spoken locally, optional higher-quality Supertonic neural TTS with per-app audio ducking, adaptive companion that classifies the on-screen activity and branches its help) exposed as an MCP server, layered on @vortex-os/base. Windows-first. Control (mouse/keyboard) is intentionally out of scope.",
5
5
  "license": "MIT",
6
6
  "author": "vortex-os-project",
@@ -22,13 +22,50 @@ import { classifyActivity } from './activity.mjs';
22
22
  const dir = dirname(fileURLToPath(import.meta.url));
23
23
  const plat = process.platform;
24
24
 
25
+ // Package version (read from the shipped package.json — `dir` is scripts/, so the manifest is one up),
26
+ // reported as the MCP server version so the host sees the real version (was hardcoded + stale before).
27
+ const PKG_VERSION = (() => {
28
+ try { return JSON.parse(readFileSync(join(dir, '..', 'package.json'), 'utf8')).version || '0.0.0'; }
29
+ catch { return '0.0.0'; }
30
+ })();
31
+
32
+ // Resolve the user config file `computer-use.config.json`. HISTORY: it was read only from `dir` (this
33
+ // scripts/ folder INSIDE node_modules) — a path the docs never named and that npm wipes on every reinstall,
34
+ // so a documented redaction denylist / tts / companion config silently never loaded. NOW, in order:
35
+ // 1. VORTEX_CU_CONFIG — explicit absolute path override.
36
+ // 2. <cwd>/computer-use.config.json — the instance root (the MCP host launches us with cwd = instance root);
37
+ // durable, outside node_modules, and where users are now told to put it.
38
+ // 3. <dir>/computer-use.config.json — legacy scripts/ location, kept for backward compatibility.
39
+ // When none exists we return the cwd path (the canonical "put it here" location), so callers treat config as absent.
40
+ // Because a missing config means an EMPTY redaction denylist (no privacy protection), every "expected
41
+ // config not loaded" case warns to stderr rather than failing silently — a silent inert denylist is the
42
+ // exact trap this release fixes (codex r1, MEDIUM x2).
43
+ function resolveConfigPath() {
44
+ const env = process.env.VORTEX_CU_CONFIG;
45
+ const cwd = join(process.cwd(), 'computer-use.config.json');
46
+ const legacy = join(dir, 'computer-use.config.json');
47
+ if (env && env.trim()) {
48
+ if (existsSync(env)) return env;
49
+ // Explicit path that doesn't exist: warn LOUD (likely a typo) and fall back, never silently honor a dead path.
50
+ process.stderr.write(`[computer-use MCP] WARNING: VORTEX_CU_CONFIG="${env}" does not exist — config (incl. the redaction denylist) is NOT loaded from it; falling back to the instance root.\n`);
51
+ }
52
+ if (existsSync(cwd)) {
53
+ if (existsSync(legacy)) process.stderr.write(`[computer-use MCP] NOTE: using the instance-root computer-use.config.json; a legacy ${legacy} also exists and is IGNORED.\n`);
54
+ return cwd;
55
+ }
56
+ if (existsSync(legacy)) return legacy;
57
+ return cwd;
58
+ }
59
+ // Resolve ONCE at load (so the warnings above fire at most once, not per config section).
60
+ const CONFIG_PATH = resolveConfigPath();
61
+
25
62
  // ── redaction config (§8·§14) ─────────────────────────────────────────────
26
63
  // Normalize the denylist into env (JSON array) so children (worker / per-call spawn) inherit it -> no per-call args. Config source: env > config file.
27
64
  // The actual blocking is done by the backend (lib.ps1 Test-AxDenylist) right before CopyFromScreen (Node doesn't know which windows are inside the region/monitor).
28
65
  function loadRedactionConfig() {
29
66
  let titles = [], procs = [];
30
67
  try {
31
- const cfgPath = join(dir, 'computer-use.config.json');
68
+ const cfgPath = CONFIG_PATH;
32
69
  if (existsSync(cfgPath)) {
33
70
  const r = (JSON.parse(readFileSync(cfgPath, 'utf8')) || {}).redaction || {};
34
71
  if (Array.isArray(r.denyWindowTitles)) titles = r.denyWindowTitles;
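The lookup order that `resolveConfigPath` implements earlier in this hunk can be sketched as a pure function, which makes the precedence easier to check without touching the disk. This is an illustrative rewrite, not the shipped code: `pickConfigPath`, the injected `exists` predicate, the collected `warnings` array, and the example paths are all hypothetical.

```javascript
// Hypothetical pure version of the lookup order: env override, then instance root, then legacy.
function pickConfigPath({ envPath, cwdPath, legacyPath, exists }) {
  const warnings = [];
  if (envPath && envPath.trim()) {
    if (exists(envPath)) return { path: envPath, warnings };
    // An explicit path that does not exist is reported, then the search falls through.
    warnings.push(`VORTEX_CU_CONFIG="${envPath}" does not exist; falling back`);
  }
  if (exists(cwdPath)) {
    if (exists(legacyPath)) warnings.push(`legacy ${legacyPath} is ignored`);
    return { path: cwdPath, warnings };
  }
  if (exists(legacyPath)) return { path: legacyPath, warnings };
  // Nothing found: return the canonical location so callers treat config as absent.
  return { path: cwdPath, warnings };
}

const onDisk = new Set([
  '/inst/computer-use.config.json',
  '/pkg/scripts/computer-use.config.json',
]);
const result = pickConfigPath({
  envPath: '/typo/config.json',
  cwdPath: '/inst/computer-use.config.json',
  legacyPath: '/pkg/scripts/computer-use.config.json',
  exists: (p) => onDisk.has(p),
});
console.log(result.path, result.warnings.length);
```

In this run the mistyped override produces one warning, the instance-root file wins, and the legacy copy produces a second warning because it is ignored.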
@@ -52,7 +89,7 @@ const REDACTION = loadRedactionConfig();
52
89
  function loadTtsConfig() {
53
90
  let cfg = {};
54
91
  try {
55
- const cfgPath = join(dir, 'computer-use.config.json');
92
+ const cfgPath = CONFIG_PATH;
56
93
  if (existsSync(cfgPath)) cfg = (JSON.parse(readFileSync(cfgPath, 'utf8')) || {}).tts || {};
57
94
  } catch {}
58
95
  const setIfUnset = (k, v) => { if (v !== undefined && v !== null && (process.env[k] === undefined || process.env[k] === '')) process.env[k] = String(v); };
@@ -62,7 +99,9 @@ function loadTtsConfig() {
62
99
  setIfUnset('VORTEX_CU_TTS_LANG', cfg.lang); // spoken language (defaults to the OCR language)
63
100
  setIfUnset('VORTEX_CU_TTS_SPEED', cfg.speed); // speech-rate multiplier (~1.0 = normal; empty = engine default)
64
101
  if (cfg.duck === false) setIfUnset('VORTEX_CU_DUCK', 'off'); // lower other apps while speaking (default on)
65
- setIfUnset('VORTEX_CU_DUCK_FACTOR', cfg.duckFactor); // others -> original*factor during speech (0..1; default 0.3)
102
+ // others -> original*factor during speech; clamp a finite value to 0..1 so a typo (e.g. 30) can't pass through (default 0.3).
103
+ const df = Number.isFinite(Number(cfg.duckFactor)) ? Math.max(0, Math.min(1, Number(cfg.duckFactor))) : cfg.duckFactor;
104
+ setIfUnset('VORTEX_CU_DUCK_FACTOR', df);
66
105
  }
67
106
  loadTtsConfig();
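To see what the new `duckFactor` clamp does with different inputs, the expression can be pulled into a standalone function. This is a sketch for illustration only: `clampDuckFactor` is a hypothetical name, and only the expression inside it mirrors the added line.

```javascript
// Hypothetical wrapper around the clamp expression from the hunk above.
function clampDuckFactor(raw) {
  const n = Number(raw);
  // Finite values are clamped to 0..1; anything else is passed through unchanged.
  return Number.isFinite(n) ? Math.max(0, Math.min(1, n)) : raw;
}

const samples = [30, 0.3, -2, undefined, 'loud', null];
const clamped = samples.map(clampDuckFactor);
console.log(clamped);
```

A typo such as `30` becomes `1`, a negative value becomes `0`, and `undefined` or a non-numeric string passes through untouched. One edge worth noticing: `Number(null)` is `0`, so an explicit `null` in the config comes out as `0` rather than being passed through as "unset".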
68
107
 
@@ -73,7 +112,7 @@ let COMPANION_PROFILES = {};
73
112
  function loadCompanionConfig() {
74
113
  let cfg = {};
75
114
  try {
76
- const cfgPath = join(dir, 'computer-use.config.json');
115
+ const cfgPath = CONFIG_PATH;
77
116
  if (existsSync(cfgPath)) cfg = (JSON.parse(readFileSync(cfgPath, 'utf8')) || {}).companion || {};
78
117
  } catch {}
79
118
  if (cfg.uiaCanvasMax != null && (process.env.VORTEX_CU_UIA_CANVAS_MAX === undefined || process.env.VORTEX_CU_UIA_CANVAS_MAX === '')) {
@@ -1329,7 +1368,7 @@ if (process.argv.slice(2).includes('install')) {
1329
1368
  const { Server } = await import('@modelcontextprotocol/sdk/server/index.js');
1330
1369
  const { StdioServerTransport } = await import('@modelcontextprotocol/sdk/server/stdio.js');
1331
1370
  const { ListToolsRequestSchema, CallToolRequestSchema } = await import('@modelcontextprotocol/sdk/types.js');
1332
- const server = new Server({ name: 'computer-use', version: '0.5.0' }, { capabilities: { tools: {} } });
1371
+ const server = new Server({ name: 'computer-use', version: PKG_VERSION }, { capabilities: { tools: {} } });
1333
1372
  server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: TOOLS }));
1334
1373
  server.setRequestHandler(CallToolRequestSchema, handleCallTool);
1335
1374
  await server.connect(new StdioServerTransport());